February 2025 – Page 11

Transforming credit decisions using generative AI with Rich Data Co and AWS

This post is co-written with Gordon Campbell, Charles Guan, and Hendra Suryanto from RDC.

The mission of Rich Data Co (RDC) is to broaden access to sustainable credit globally. Its software-as-a-service (SaaS) solution empowers leading banks and lenders with deep customer insights and AI-driven decision-making capabilities.

Making credit decisions using AI can be challenging, requiring data science and portfolio teams to synthesize complex subject matter information and collaborate productively. To solve this challenge, RDC used generative AI, enabling teams to use its solution more effectively:

Data science assistant – Designed for data science teams, this agent assists teams in developing, building, and deploying AI models within a regulated environment. It aims to boost team efficiency by answering complex technical queries across the machine learning operations (MLOps) lifecycle, drawing from a comprehensive knowledge base that includes environment documentation, AI and data science expertise, and Python code generation.
Portfolio assistant – Designed for portfolio managers and analysts, this agent facilitates natural language inquiries about loan portfolios. It provides critical insights on performance, risk exposures, and credit policy alignment, enabling informed commercial decisions without requiring in-depth analysis skills. The assistant is adept at high-level questions (such as identifying high-risk segments or potential growth opportunities) and one-time queries, allowing the portfolio to be diversified.

In this post, we discuss how RDC uses generative AI on Amazon Bedrock to build these assistants and accelerate its overall mission of democratizing access to sustainable credit.

Solution overview: Building a multi-agent generative AI solution

We began with a carefully crafted evaluation set of over 200 prompts, anticipating common user questions. Our initial approach combined prompt engineering and traditional Retrieval Augmented Generation (RAG). However, we encountered a challenge: accuracy fell below 90%, especially for more complex questions.

To overcome the challenge, we adopted an agentic approach, breaking down the problem into specialized use cases. This strategy equipped us to align each task with the most suitable foundation model (FM) and tools. Our multi-agent framework is orchestrated using LangGraph, and it consisted of:

Orchestrator – The orchestrator is responsible for routing user questions to the appropriate agent. In this example, we start with the data science or portfolio agent. However, we envision many more agents in the future. The orchestrator can also use user context, such as the user’s role, to determine routing to the appropriate agent.
Agent – The agent is designed for a specialized task. It’s equipped with the appropriate FM for the task and the necessary tools to perform actions and access knowledge. It can also handle multiturn conversations and orchestrate multiple calls to the FM to reach a solution.
Tools – Tools extend agent capabilities beyond the FM. They provide access to external data and APIs or enable specific actions and computation. To efficiently use the model’s context window, we construct a tool selector that retrieves only the relevant tools based on the information in the agent state. This helps simplify debugging in the case of errors, ultimately making the agent more effective and cost-efficient.

This approach gives us the right tool for the right job. It enhances our ability to handle complex queries efficiently and accurately while providing flexibility for future improvements and agents.

The following image is a high-level architecture diagram of the solution.

Data science agent: RAG and code generation

To boost productivity of data science teams, we focused on rapid comprehension of advanced knowledge, including industry-specific models from a curated knowledge base. Here, RDC provides an integrated development environment (IDE) for Python coding, catering to various team roles. One role is model validator, who rigorously assesses whether a model aligns with bank or lender policies. To support the assessment process, we designed an agent with two tools:

Content retriever tool – Amazon Bedrock Knowledge Bases powers our intelligent content retrieval through a streamlined RAG implementation. The service automatically converts text documents to their vector representation using Amazon Titan Text Embeddings and stores them in Amazon OpenSearch Serverless. Because the knowledge is vast, it performs semantic chunking, making sure that the knowledge is organized by topic and can fit within the FM’s context window. When users interact with the agent, Amazon Bedrock Knowledge Bases using OpenSearch Serverless provides fast, in-memory semantic search, enabling the agent to retrieve the most relevant chunks of knowledge for relevant and contextual responses to users.
Code generator tool – With code generation, we selected Anthropic’s Claude model on Amazon Bedrock due to its inherent ability to understand and generate code. This tool is grounded to answer queries related to data science and can generate Python code for quick implementation. It’s also adept at troubleshooting coding errors.

Portfolio agent: Text-to-SQL and self-correction

To boost the productivity of credit portfolio teams, we focused on two key areas. For portfolio managers, we prioritized high-level commercial insights. For analysts, we enabled deep-dive data exploration. This approach empowered both roles with rapid understanding and actionable insights, streamlining decision-making processes across teams.

Our solution required natural language understanding of structured portfolio data stored in Amazon Aurora. This led us to base our solution on a text-to-SQL model to efficiently bridge the gap between natural language and SQL.

To reduce errors and tackle complex queries beyond the model’s capabilities, we developed three tools using Anthropic’s Claude model on Amazon Bedrock for self-correction:

Check query tool – Verifies and corrects SQL queries, addressing common issues such as data type mismatches or incorrect function usage
Check result tool – Validates query results, providing relevance and prompting retries or user clarification when needed
Retry from user tool – Engages users for additional information when queries are too broad or lack detail, guiding the interaction based on database information and user input

These tools operate in an agentic system, enabling accurate database interactions and improved query results through iterative refinement and user engagement.

To improve accuracy, we tested model fine-tuning, training the model on common queries and context (such as database schemas and their definitions). This approach reduces inference costs and improves response times compared to prompting at each call. Using Amazon SageMaker JumpStart, we fine-tuned Meta’s Llama model by providing a set of anticipated prompts, intended answers, and associated context. Amazon SageMaker Jumpstart offers a cost-effective alternative to third-party models, providing a viable pathway for future applications. However, we didn’t end up deploying the fine-tuned model because we experimentally observed that prompting with Anthropic’s Claude model provided better generalization, especially for complex questions. To reduce operational overhead, we will also evaluate structured data retrieval on Amazon Bedrock Knowledge Bases.

Conclusion and next steps with RDC

To expedite development, RDC collaborated with AWS Startups and the AWS Generative AI Innovation Center. Through an iterative approach, RDC rapidly enhanced its generative AI capabilities, deploying the initial version to production in just 3 months. The solution successfully met the stringent security standards required in regulated banking environments, providing both innovation and compliance.

“The integration of generative AI into our solution marks a pivotal moment in our mission to revolutionize credit decision-making. By empowering both data scientists and portfolio managers with AI assistants, we’re not just improving efficiency—we’re transforming how financial institutions approach lending.”

–Gordon Campbell, Co-Founder & Chief Customer Officer at RDC

RDC envisions generative AI playing a significant role in boosting the productivity of the banking and credit industry. By using this technology, RDC can provide key insights to customers, improve solution adoption, accelerate the model lifecycle, and reduce the customer support burden. Looking ahead, RDC plans to further refine and expand its AI capabilities, exploring new use cases and integrations as the industry evolves.

For more information about how to work with RDC and AWS and to understand how we’re supporting banking customers around the world to use AI in credit decisions, contact your AWS Account Manager or visit Rich Data Co.

For more information about generative AI on AWS, refer to the following resources:

About the Authors

Daniel Wirjo is a Solutions Architect at AWS, focused on FinTech and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Iman Abbasnejad is a computer scientist at the Generative AI Innovation Center at Amazon Web Services (AWS) working on Generative AI and complex multi-agents systems.

Gordon Campbell is the Chief Customer Officer and Co-Founder of RDC, where he leverages over 30 years in enterprise software to drive RDC’s leading AI Decisioning platform for business and commercial lenders. With a proven track record in product strategy and development across three global software firms, Gordon is committed to customer success, advocacy, and advancing financial inclusion through data and AI.

Charles Guan is the Chief Technology Officer and Co-founder of RDC. With more than 20 years of experience in data analytics and enterprise applications, he has driven technological innovation across both the public and private sectors. At RDC, Charles leads research, development, and product advancement—collaborating with universities to leverage advanced analytics and AI. He is dedicated to promoting financial inclusion and delivering positive community impact worldwide.

Hendra Suryanto is the Chief Data Scientist at RDC with more than 20 years of experience in data science, big data, and business intelligence. Before joining RDC, he served as a Lead Data Scientist at KPMG, advising clients globally. At RDC, Hendra designs end-to-end analytics solutions within an Agile DevOps framework. He holds a PhD in Artificial Intelligence and has completed postdoctoral research in machine learning.

Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI

AI agents are rapidly becoming the next frontier in enterprise transformation, with 82% of organizations planning adoption within the next 3 years. According to a Capgemini survey of 1,100 executives at large enterprises, 10% of organizations already use AI agents, and more than half plan to use them in the next year. The recent release of the DeepSeek-R1 models brings state-of-the-art reasoning capabilities to the open source community. Organizations can build agentic applications using these reasoning models to execute complex tasks with advanced decision-making capabilities, enhancing efficiency and adaptability.

In this post, we dive into how organizations can use Amazon SageMaker AI, a fully managed service that allows you to build, train, and deploy ML models at scale, and can build AI agents using CrewAI, a popular agentic framework and open source models like DeepSeek-R1.

Agentic design vs. traditional software design

Agentic systems offer a fundamentally different approach compared to traditional software, particularly in their ability to handle complex, dynamic, and domain-specific challenges. Unlike traditional systems, which rely on rule-based automation and structured data, agentic systems, powered by large language models (LLMs), can operate autonomously, learn from their environment, and make nuanced, context-aware decisions. This is achieved through modular components including reasoning, memory, cognitive skills, and tools, which enable them to perform intricate tasks and adapt to changing scenarios.

Traditional software platforms, though effective for routine tasks and horizontal scaling, often lack the domain-specific intelligence and flexibility that agentic systems provide. For example, in a manufacturing setting, traditional systems might track inventory but lack the ability to anticipate supply chain disruptions or optimize procurement using real-time market insights. In contrast, an agentic system can process live data such as inventory fluctuations, customer preferences, and environmental factors to proactively adjust strategies and reroute supply chains during disruptions.

Enterprises should strategically consider deploying agentic systems in scenarios where adaptability and domain-specific expertise are critical. For instance, consider customer service. Traditional chatbots are limited to preprogrammed responses to expected customer queries, but AI agents can engage with customers using natural language, offer personalized assistance, and resolve queries more efficiently. AI agents can significantly improve productivity by automating repetitive tasks, such as generating reports, emails, and software code. The deployment of agentic systems should focus on well-defined processes with clear success metrics and where there is potential for greater flexibility and less brittleness in process management.

DeepSeek-R1

In this post, we show you how to deploy DeepSeek-R1 on SageMaker, particularly the Llama-70b distilled variant DeepSeek-R1-Distill-Llama-70B to a SageMaker real-time endpoint. DeepSeek-R1 is an advanced LLM developed by the AI startup DeepSeek. It employs reinforcement learning techniques to enhance its reasoning capabilities, enabling it to perform complex tasks such as mathematical problem-solving and coding. To learn more about DeepSeek-R1, refer to DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart and deep dive into the thesis behind building DeepSeek-R1.

Generative AI on SageMaker AI

SageMaker AI, a fully managed service, provides a comprehensive suite of tools designed to deliver high-performance, cost-efficient machine learning (ML) and generative AI solutions for diverse use cases. SageMaker AI empowers you to build, train, deploy, monitor, and govern ML and generative AI models through an extensive range of services, including notebooks, jobs, hosting, experiment tracking, a curated model hub, and MLOps features, all within a unified integrated development environment (IDE).

SageMaker AI simplifies the process for generative AI model builders of all skill levels to work with foundation models (FMs):

Amazon SageMaker Canvas enables data scientists to seamlessly use their own datasets alongside FMs to create applications and architectural patterns, such as chatbots and Retrieval Augmented Generation (RAG), in a low-code or no-code environment.
Amazon SageMaker JumpStart offers a diverse selection of open and proprietary FMs from providers like Hugging Face, Meta, and Stability AI. You can deploy or fine-tune models through an intuitive UI or APIs, providing flexibility for all skill levels.
SageMaker AI features like notebooks, Amazon SageMaker Training, inference, Amazon SageMaker for MLOps, and Partner AI Apps enable advanced model builders to adapt FMs using LoRA, full fine-tuning, or training from scratch. These services support single GPU to HyperPods (cluster of GPUs) for training and include built-in FMOps tools for tracking, debugging, and deployment.

With SageMaker AI, you can build generative AI-powered agentic workflows using a framework of your choice. Some of the key benefits of using SageMaker AI for fine-tuning and hosting LLMs or FMs include:

Ease of deployment – SageMaker AI offers access to SageMaker JumpStart, a curated model hub where models with open weights are made available for seamless deployment through a few clicks or API calls. Additionally, for Hugging Face Hub models, SageMaker AI provides pre-optimized containers built on popular open source hosting frameworks such as vLLM, NVIDIA Triton, and Hugging Face Text Generation Inference (TGI). You simply need to specify the model ID, and the model can be deployed quickly.
Instance-based deterministic pricing – SageMaker AI hosted models are billed based on instance-hours rather than token usage. This pricing model enables you to more accurately predict and manage generative AI inference costs while scaling resources to accommodate incoming request loads.
Deployments with quantization – SageMaker AI enables you to optimize models prior to deployment using advanced strategies such as quantized deployments (such as AWQ, GPTQ, float16, int8, or int4). This flexibility allows you to efficiently deploy large models, such as a 32-billion parameter model, onto smaller instance types like ml.g5.2xlarge with 24 GB of GPU memory, significantly reducing resource requirements while maintaining performance.
Inference load balancing and optimized routing – SageMaker endpoints support load balancing and optimized routing with various strategies, providing users with enhanced flexibility and adaptability to accommodate diverse use cases effectively.
SageMaker fine-tuning recipes – SageMaker offers ready-to-use recipes for quickly training and fine-tuning publicly available FMs such as Meta’s Llama 3, Mistral, and Mixtral. These recipes use Amazon SageMaker HyperPod (a SageMaker AI service that provides resilient, self-healing clusters optimized for large-scale ML workloads), enabling efficient and resilient training on a GPU cluster for scalable and robust performance.

Solution overview

CrewAI provides a robust framework for developing multi-agent systems that integrate with AWS services, particularly SageMaker AI. CrewAI’s role-based agent architecture and comprehensive performance monitoring capabilities work in tandem with Amazon CloudWatch.

The framework excels in workflow orchestration and maintains enterprise-grade security standards aligned with AWS best practices, making it an effective solution for organizations implementing sophisticated agent-based systems within their AWS infrastructure.

In this post, we demonstrate how to use CrewAI to create a multi-agent research workflow. This workflow creates two agents: one that researches on a topic on the internet, and a writer agent takes this research and acts like an editor by formatting it in a readable format. Additionally, we guide you through deploying and integrating one or multiple LLMs into structured workflows, using tools for automated actions, and deploying these workflows on SageMaker AI for a production-ready deployment.

The following diagram illustrates the solution architecture.

Prerequisites

To follow along with the code examples in the rest of this post, make sure the following prerequisites are met:

Integrated development environment – This includes the following:
- (Optional) Access to Amazon SageMaker Studio and the JupyterLab IDE – We will use a Python runtime environment to build agentic workflows and deploy LLMs. Having access to a JupyterLab IDE with Python 3.9, 3.10, or 3.11 runtimes is recommended. You can also set up Amazon SageMaker Studio for single users. For more details, see Use quick setup for Amazon SageMaker AI. Create a new SageMaker JupyterLab Space for a quick JupyterLab notebook for experimentation. To learn more, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools.
- Local IDE – You can also follow along in your local IDE (such as PyCharm or VSCode), provided that Python runtimes have been configured for site to AWS VPC connectivity (to deploy models on SageMaker AI).
Permission to deploy models – Make sure that your user execution role has the necessary permissions to deploy models to a SageMaker real-time endpoint for inference. For more information, refer to Deploy models for inference.
Access to Hugging Face Hub – You must have access to Hugging Face Hub’s deepseek-ai/DeepSeek-R1-Distill-Llama-8B model weights from your environment.
Access to code – The code used in this post is available in the following GitHub repo.

Simplified LLM hosting on SageMaker AI

Before orchestrating agentic workflows with CrewAI powered by an LLM, the first step is to host and query an LLM using SageMaker real-time inference endpoints. There are two primary methods to host LLMs on SageMaker AI:

Deploy from SageMaker JumpStart
Deploy from Hugging Face Hub

Deploy DeepSeek from SageMaker JumpStart

SageMaker JumpStart offers access to a diverse array of state-of-the-art FMs for a wide range of tasks, including content writing, code generation, question answering, copywriting, summarization, classification, information retrieval, and more. It simplifies the onboarding and maintenance of publicly available FMs, allowing you to access, customize, and seamlessly integrate them into your ML workflows. Additionally, SageMaker JumpStart provides solution templates that configure infrastructure for common use cases, along with executable example notebooks to streamline ML development with SageMaker AI.

The following screenshot shows an example of available models on SageMaker JumpStart.

To get started, complete the following steps:

Install the latest version of the sagemaker-python-sdk using pip.
Run the following command in a Jupyter cell or the SageMaker Studio terminal:

pip install -U sagemaker

List all available LLMs under the Hugging Face or Meta JumpStart hub. The following code is an example of how to do this programmatically using the SageMaker Python SDK:

from sagemaker.jumpstart.filters import (And, Or)
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# generate a conditional filter to only select LLMs from HF or Meta
filter_value = Or(
    And("task == llm", "framework == huggingface"), 
    "framework == meta", "framework == deekseek"
)

# Retrieve all available JumpStart models
all_models = list_jumpstart_models(filter=filter_value)

For example, deploying the deepseek-llm-r1 model directly from SageMaker JumpStart requires only a few lines of code:

from sagemaker.jumpstart.model import JumpStartModel

model_id = " deepseek-llm-r1" 
model_version = "*"

# instantiate a new JS meta model
model = JumpStartModel(
    model_id=model_id, 
    model_version=model_version
)

# deploy model on a 1 x p5e instance 
predictor = model.deploy(
    accept_eula=True, 
    initial_instance_count=1, 
    # endpoint_name="deepseek-r1-endpoint" # optional endpoint name
)

We recommend deploying your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for enhanced security.

We also recommend you integrate with Amazon Bedrock Guardrails for increased safeguards against harmful content. For more details on how to implement Amazon Bedrock Guardrails on a self-hosted LLM, see Implement model-independent safety measures with Amazon Bedrock Guardrails.

Deploy DeepSeek from Hugging Face Hub

Alternatively, you can deploy your preferred model directly from the Hugging Face Hub or the Hugging Face Open LLM Leaderboard to a SageMaker endpoint. Hugging Face LLMs can be hosted on SageMaker using a variety of supported frameworks, such as NVIDIA Triton, vLLM, and Hugging Face TGI. For a comprehensive list of supported deep learning container images, refer to the available Amazon SageMaker Deep Learning Containers. In this post, we use a DeepSeek-R1-Distill-Llama-70B SageMaker endpoint using the TGI container for agentic AI inference. We deploy the model from Hugging Face Hub using Amazon’s optimized TGI container, which provides enhanced performance for LLMs. This container is specifically optimized for text generation tasks and automatically selects the most performant parameters for the given hardware configuration. To deploy from Hugging Face Hub, refer to the GitHub repo or the following code snippet:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import os
from datetime import datetime

# Model configuration
hub = {'HF_MODEL_ID':'deepseek-ai/DeepSeek-R1-Distill-Llama-70B', #Llama-3.3-70B-Instruct
       'SM_NUM_GPUS': json.dumps(number_of_gpu),
       'HF_TOKEN': HUGGING_FACE_HUB_TOKEN,
       'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',  # Set to INFO level
       'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True'  # configure CUDA memory to use expandable memory segments
}
# Create and deploy model
huggingface_model =   HuggingFaceModel(image_uri=get_huggingface_llm_image_uri("huggingface", 
version="2.3.1"),
env=hub,
role=role,sagemaker_session=sagemaker_session)
predictor = huggingface_model.deploy(
               initial_instance_count=1,
               instance_type="ml.p4d.24xlarge"
               endpoint_name=custom_endpoint_name,
               container_startup_health_check_timeout=900)

A new DeepSeek-R1-Distill-Llama-70B endpoint should be InService in under 10 minutes. If you want to change the model from DeepSeek to another model from the hub, simply replace the following parameter or refer to the DeepSeek deploy example in the following GitHub repo. To learn more about deployment parameters that can be reconfigured inside TGI containers at runtime, refer to the following GitHub repo on TGI arguments.

...
"HF_MODEL_ID": "deepseek-ai/...", # replace with any HF hub models
# "HF_TOKEN": "hf_..." # add your token id for gated models
...

For open-weight models deployed directly from hubs, we strongly recommend placing your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for a secure deployment.

Build a simple agent with CrewAI

CrewAI offers the ability to create multi-agent and very complex agentic orchestrations using LLMs from several LLM providers, including SageMaker AI and Amazon Bedrock. In the following steps, we create a simple blocks counting agent to serve as an example.

Create a blocks counting agent

The following code sets up a simple blocks counter workflow using CrewAI with two main components:

Agent creation (blocks_counter_agent) – The agent is configured with a specific role, goal, and capabilities. This agent is equipped with a tool called BlocksCounterTool.
Task definition (count_task) – This is a task that we want this agent to execute. The task includes a template for counting how many of each color of blocks are present, where {color} will be replaced with actual color of the block. The task is assigned to blocks_counter_agent.

from crewai import Agent, Task
from pydantic import BaseModel, Field

# 1. Configure agent
blocks_counter_agent = Agent(
    role="Blocks Inventory Manager",
    goal="Maintain accurate block counts",
    tools=[BlocksCounterTool],
    verbose=True
)

# 2. Create counting task
count_task = Task(
    description="Count {color} play blocks in storage",
    expected_output="Exact inventory count for specified color",
    agent=blocks_counter_agent
)

As you can see in the preceding code, each agent begins with two essential components: an agent definition that establishes the agent’s core characteristics (including its role, goal, backstory, available tools, LLM model endpoint, and so on), and a task definition that specifies what the agent needs to accomplish, including the detailed description of work, expected outputs, and the tools it can use during execution.

This structured approach makes sure that agents have both a clear identity and purpose (through the agent definition) and a well-defined scope of work (through the task definition), enabling them to operate effectively within their designated responsibilities.

Tools for agentic AI

Tools are special functions that give AI agents the ability to perform specific actions, like searching the internet or analyzing data. Think of them as apps on a smartphone—each tool serves a specific purpose and extends what the agent can do. In our example, BlocksCounterTool helps the agent count the number of blocks organized by color.

Tools are essential because they let agents do real-world tasks instead of just thinking about them. Without tools, agents would be like smart speakers that can only talk—they could process information but couldn’t take actual actions. By adding tools, we transform agents from simple chat programs into practical assistants that can accomplish real tasks.

Out-of-the-box tools with CrewAI
Crew AI offers a range of tools out of the box for you to use along with your agents and tasks. The following table lists some of the available tools.

Category	Tool	Description
Data Processing Tools	FileReadTool	For reading various file formats
Web Interaction Tools	WebsiteSearchTool	For web content extraction
Media Tools	YoutubeChannelSearchTool	For searching YouTube channels
Document Processing	PDFSearchTool	For searching PDF documents
Development Tools	CodeInterpreterTool	For Python code interpretation
AI Services	DALL-E Tool	For image generation

Build custom tools with CrewAI
You can build custom tools in CrewAI in two ways: by subclassing BaseTool or using the @tool decorator. Let’s look at the following BaseTool subclassing option to create the BlocksCounterTool we used earlier:

from crewai.tools import BaseTool

class BlocksCounterTool(BaseTool):
    name = "blocks_counter" 
    description = "Simple tool to count play blocks"

    def _run(self, color: str) -> str:
        return f"There are 10 {color} play blocks available"

Build a multi-agent workflow with CrewAI, DeepSeek-R1, and SageMaker AI

Multi-agent AI systems represent a powerful approach to complex problem-solving, where specialized AI agents work together under coordinated supervision. By combining CrewAI’s workflow orchestration capabilities with SageMaker AI based LLMs, developers can create sophisticated systems where multiple agents collaborate efficiently toward a specific goal. The code used in this post is available in the following GitHub repo.

Let’s build a research agent and writer agent that work together to create a PDF about a topic. We will use a DeepSeek-R1 Distilled Llama 3.3 70B model as a SageMaker endpoint for the LLM inference.

Define your own DeepSeek SageMaker LLM (using LLM base class)

The following code integrates SageMaker hosted LLMs with CrewAI by creating a custom inference tool that formats prompts with system instructions for factual responses, uses Boto3, an AWS core library, to call SageMaker endpoints, and processes responses by separating reasoning (before </think>) from final answers. This enables CrewAI agents to use deployed models while maintaining structured output patterns.

# Calls SageMaker endpoint for DeepSeek inference
def deepseek_llama_inference(prompt: dict, endpoint_name: str, region: str = "us-east-2") -> dict:
    try:
        # ... Response parsing Code...

    except Exception as e:
        raise RuntimeError(f"Error while calling SageMaker endpoint: {e}")

# CrewAI-compatible LLM implementation for DeepSeek models on SageMaker.
class DeepSeekSageMakerLLM(LLM):
    def __init__(self, endpoint: str):
        # <... Initialize LLM with SageMaker endpoint ...>

    def call(self, prompt: Union[List[Dict[str, str]], str], **kwargs) -> str:
        # <... Format and return the final response ...>

Name the DeepSeek-R1 Distilled endpoint
Set the endpoint name as defined earlier when you deployed DeepSeek from the Hugging Face Hub:

deepseek_endpoint = "deepseek-r1-dist-v3-llama70b-2025-01-22"

Create a DeepSeek inference tool
Just like how we created the BlocksCounterTool earlier, let’s create a tool that uses the DeepSeek endpoint for our agents to use. We use the same BaseTool subclass here, but we hide it in the CustomTool class implementation in sage_tools.py in the tools folder. For more information, refer to the GitHub repo.

from crewai import Crew, Agent, Task, Process 

# Create the Tool for LLaMA inference
deepseek_tool = CustomTool(
    name="deepseek_llama_3.3_70B",
    func=lambda inputs: deepseek_llama_inference(
        prompt=inputs,
        endpoint_name=deepseek_endpoint
    ),
    description="A tool to generate text using the DeepSeek LLaMA model deployed on SageMaker."
)

Create a research agent
Just like the simple blocks agent we defined earlier, we follow the same template here to define the research agent. The difference here is that we give more capabilities to this agent. We attach a SageMaker AI based DeepSeek-R1 model as an endpoint for the LLM.

This helps the research agent think critically about information processing by combining the scalable infrastructure of SageMaker with DeepSeek-R1’s advanced reasoning capabilities.

The agent uses the SageMaker hosted LLM to analyze patterns in research data, evaluate source credibility, and synthesize insights from multiple inputs. By using the deepseek_tool, the agent can dynamically adjust its research strategy based on intermediate findings, validate hypotheses through iterative questioning, and maintain context awareness across complex information it gathers.

# Research Agent

research_agent = Agent(
    role="Research Bot",
    goal="Scan sources, extract relevant information, and compile a research summary.",
    backstory="An AI agent skilled in finding relevant information from a variety of sources.",
    tools=[deepseek_tool],
    allow_delegation=True,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Create a writer agent
The writer agent is configured as a specialized content editor that takes research data and transforms it into polished content. This agent works as part of a workflow where it takes research from a research agent and acts like an editor by formatting the content into a readable format. The agent is used for writing and formatting, and unlike the research agent, it doesn’t delegate tasks to other agents.

writer_agent = Agent(
    role="Writer Bot",
    goal="Receive research summaries and transform them into structured content.",
    backstory="A talented writer bot capable of producing high-quality, structured content based on research.",
    tools=[deepseek_tool],
    allow_delegation=False,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Define tasks for the agents
Tasks in CrewAI define specific operations that agents need to perform. In this example, we have two tasks: a research task that processes queries and gathers information, and a writing task that transforms research data into polished content.

Each task includes a clear description of what needs to be done, the expected output format, and specifies which agent will perform the work. This structured approach makes sure that agents have well-defined responsibilities and clear deliverables.

Together, these tasks create a workflow where one agent researches a topic on the internet, and another agent takes this research and formats it into readable content. The tasks are integrated with the DeepSeek tool for advanced language processing capabilities, enabling a production-ready deployment on SageMaker AI.

research_task = Task(
    description=(
        "Your task is to conduct research based on the following query: {prompt}.n"
    ),
    expected_output="A comprehensive research summary based on the provided query.",
    agent=research_agent,
    tools=[deepseek_tool]
)

writing_task = Task(
    description=(
              "Your task is to create structured content based on the research provided.n""),
    expected_output="A well-structured article based on the research summary.",
    agent=research_agent,
    tools=[deepseek_tool]
)

Define a crew in CrewAI
A crew in CrewAI represents a collaborative group of agents working together to achieve a set of tasks. Each crew defines the strategy for task execution, agent collaboration, and the overall workflow. In this specific example, the sequential process makes sure tasks are executed one after the other, following a linear progression. There are other more complex orchestrations of agents working together, which we will discuss in future blog posts.

This approach is ideal for projects requiring tasks to be completed in a specific order. The workflow creates two agents: a research agent and a writer agent. The research agent researches a topic on the internet, then the writer agent takes this research and acts like an editor by formatting it into a readable format.

Let’s call the crew scribble_bots:

# Define the Crew for Sequential Workflow # 

scribble_bots = Crew( agents=[research_agent, writer_agent], 
       tasks=[research_task, writing_task], 
       process=Process.sequential # Ensure tasks execute in sequence)

Use the crew to run a task
We have our endpoint deployed, agents created, and crew defined. Now we’re ready to use the crew to get some work done. Let’s use the following prompt:

result = scribble_bots.kickoff(inputs={"prompt": "What is DeepSeek?"})

Our result is as follows:

**DeepSeek: Pioneering AI Solutions for a Smarter Tomorrow**

In the rapidly evolving landscape of artificial intelligence, 
DeepSeek stands out as a beacon of innovation and practical application. 
As an AI company, DeepSeek is dedicated to advancing the field through cutting-edge research and real-world applications, 
making AI accessible and beneficial across various industries.

**Focus on AI Research and Development**

………………….. ………………….. ………………….. …………………..

Clean up

Complete the following steps to clean up your resources:

Delete your GPU DeekSeek-R1 endpoint:

import boto3

# Create a low-level SageMaker service client.
sagemaker_client = boto3.client('sagemaker', region_name=<region>)

# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

If you’re using a SageMaker Studio JupyterLab notebook, shut down the JupyterLab notebook instance.

Conclusion

In this post, we demonstrated how you can deploy an LLM such as DeepSeek-R1—or another FM of your choice—from popular model hubs like SageMaker JumpStart or Hugging Face Hub to SageMaker AI for real-time inference. We explored inference frameworks like Hugging Face TGI which helps streamline deployment while integrating built-in performance optimizations to minimize latency and maximize throughput. Additionally, we showcased how the SageMaker developer-friendly Python SDK simplifies endpoint orchestration, allowing seamless experimentation and scaling of LLM-powered applications.

Beyond deployment, this post provided an in-depth exploration of agentic AI, guiding you through its conceptual foundations, practical design principles using CrewAI, and the seamless integration of state-of-the-art LLMs like DeepSeek-R1 as the intelligent backbone of an autonomous agentic workflow. We outlined a sequential CrewAI workflow design, illustrating how to equip LLM-powered agents with specialized tools that enable autonomous data retrieval, real-time processing, and interaction with complex external systems.

Now, it’s your turn to experiment! Dive into our publicly available code on GitHub, and start building your own DeepSeek-R1-powered agentic AI system on SageMaker. Unlock the next frontier of AI-driven automation—seamlessly scalable, intelligent, and production-ready.

Special thanks to Giuseppe Zappia, Poli Rao, and Siamak Nariman for their support with this blog post.

About the Authors

Surya Kari is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions leveraging state-of-the-art foundation models. He has extensive experience working with advanced language models including DeepSeek-R1, the LLama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using AWS SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.

Bobby Lindsey is a Machine Learning Specialist at Amazon Web Services. He’s been in technology for over a decade, spanning various technologies and multiple roles. He is currently focused on combining his background in software engineering, DevOps, and machine learning to help customers deliver machine learning workflows at scale. In his spare time, he enjoys reading, research, hiking, biking, and trail running.

Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundation model (FM) providers to develop and execute joint Go-To-Market strategies, enabling customers to effectively train, deploy, and scale FMs to solve industry specific challenges. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University, a master’s in science in Electrical Engineering from Northwestern University and is currently an MBA Candidate at the Haas School of Business at University of California, Berkeley.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Automate bulk image editing with Crop.photo and Amazon Rekognition

Evolphin Software, Inc. is a leading provider of digital and media asset management solutions based in Silicon Valley, California. Crop.photo from Evolphin Software is a cloud-based service that offers powerful bulk processing tools for automating image cropping, content resizing, background removal, and listing image analysis.

Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. The solution has created a unique offering for bulk image editing through its advanced AI-driven solutions. In this post, we explore how Crop.photo uses Amazon Rekognition to provide sophisticated image analysis, enabling automated and precise editing of large volumes of images. This integration streamlines the image editing process for clients, providing speed and accuracy, which is crucial in the fast-paced environments of ecommerce and sports.

Automation: The way out of bulk image editing challenges

Bulk image editing isn’t just about handling a high volume of images, it’s about delivering flawless results with speed at scale. Large retail brands, marketplaces, and sports industries process thousands of images weekly. Each image must be catalog-ready or broadcast-worthy in minutes, not hours.

The challenge lies not just in the quantity but in maintaining high-quality images and brand integrity. Speed and accuracy are non-negotiable. Retailers and sports organizations expect rapid turnaround without compromising image integrity.

This is where Crop.photo’s smart automations come in with an innovative solution for high-volume image processing needs. The platform’s advanced AI algorithms can automatically detect subjects of interest, crop the images, and optimize thousands of images simultaneously while providing consistent quality and brand compliance. By automating repetitive editing tasks, Crop.photo enables enterprises to reduce image processing time from hours to minutes, allowing creative teams to focus on higher-value activities.

Challenges in the ecommerce industry

The ecommerce industry often encounters the following challenges:

Inefficiencies and delays in manual image editing – Ecommerce companies rely on manual editing for tasks like resizing, alignment, and background removal. This process can be time-consuming and prone to delays and inconsistencies. A more efficient solution is needed to streamline the editing process, especially during platform migrations or large updates.
Maintaining uniformity across diverse image types – Companies work with a variety of image types, from lifestyle shots to product close-ups, across different categories. Maintaining uniformity and professionalism in all image types is essential to meet the diverse needs of marketing, product cataloging, and overall brand presentation.
Large-scale migration and platform transition – Transitioning to a new ecommerce platform involves migrating thousands of images, which presents significant logistical challenges. Providing consistency and quality across a diverse range of images during such a large-scale migration is crucial for maintaining brand standards and a seamless user experience.

For a US top retailer, wholesale distribution channels posed a unique challenge. Thousands of fashion images need to be made for the marketplace with less than a day’s notice for flash sales. Their director of creative operations said,

“Crop.photo is an essential part of our ecommerce fashion marketplace workflow. With over 3,000 on-model product images to bulk crop each month, we rely on Crop.photo to enable our wholesale team to quickly publish new products on popular online marketplaces such as Macy’s, Nordstrom, and Bloomingdales. By increasing our retouching team’s productivity by over 70%, Crop.photo has been a game changer for us. Bulk crop images used to take days can now be done in a matter of seconds!”

Challenges in the sports industry

The sports industry often contends with the following challenges:

Bulk player headshot volume and consistency – Sports organizations face the challenge of bulk cropping and resizing hundreds of player headshots for numerous teams, frequently on short notice. Maintaining consistency and quality across a large volume of images can be difficult without AI.
Diverse player facial features – Players have varying facial features, such as different hair lengths, forehead sizes, and face dimensions. Adapting cropping processes to accommodate these differences traditionally requires manual adjustments for each image, which leads to inconsistencies and significant time investment.
Editorial time constraints – Tight editorial schedules and resource limitations are common in sports organizations. The time-consuming nature of manual cropping tasks strains editorial teams, particularly during high-volume periods like tournaments, where delays and rushed work can impact quality and timing.

An Imaging Manager at Europe’s Premier Football Organization expressed,

“We recently found ourselves with 40 images from a top flight English premier league club needing to be edited just 2 hours before kick-off. Using the Bulk AI headshot cropping for sports feature from Crop.photo, we had perfectly cropped headshots of the squad in just 5 minutes, making them ready for publishing in our website CMS just in time. We would never have met this deadline using manual processes. This level of speed was unthinkable before, and it’s why we’re actively recommending Crop.photo to other sports leagues.”

Solution overview

Crop.photo uses Amazon Rekognition to power a robust solution for bulk image editing. Amazon Rekognition offers features like object and scene detection, facial analysis, and image labeling, which they use to generate markers that drive a fully automated image editing workflow.

The following diagram presents a high-level architectural data flow highlighting several of the AWS services used in building the solution.

The solution consists of the following key components:

User authentication – Amazon Cognito is used for user authentication and user management.
Infrastructure deployment – Frontend and backend servers are used on Amazon Elastic Container Service (Amazon ECS) for container deployment, orchestration, and scaling.
Content delivery and caching – Amazon CloudFront is used to cache content, improving performance and routing traffic efficiently.
File uploads – Amazon Simple Storage Service (Amazon S3) enables transfer acceleration for fast, direct uploads to Amazon S3.
Media and job storage – Information about uploaded files and job execution is stored in Amazon Aurora.
Image processing – AWS Batch processes thousands of images in bulk.
Job management – Amazon Simple Queue Service (Amazon SQS) manages and queues jobs for processing, making sure they’re run in the correct order by AWS Batch.
Media analysis – Amazon Rekognition services analyze media files, including:
- Face Analysis to generate headless crops.
- Moderation to detect and flag profanity and explicit content.
- Label Detection to provide context for image processing and focus on relevant objects.
- Custom Labels to identify and verify brand logos and adhere to brand guidelines.
Asynchronous job notifications – Amazon Simple Notification Service (Amazon SNS), Amazon EventBridge, and Amazon SQS deliver asynchronous job completion notifications, manage events, and provide reliable and scalable processing.

Amazon Rekognition is an AWS computer vision service that powers Crop.photo’s automated image analysis. It enables object detection, facial recognition, and content moderation capabilities:

Face detection – The Amazon Rekognition face detection feature automatically identifies and analyzes faces in product images. You can use this feature for face-based cropping and optimization through adjustable bounding boxes in the interface.
Image color analysis – The color analysis feature examines image composition, identifying dominant colors and balance. This integrates with Crop.photo’s brand guidelines checker to provide consistency across product images.
Object detection – Object detection automatically identifies key elements in images, enabling smart cropping suggestions. The interface highlights detected objects, allowing you to prioritize specific elements during cropping.
Custom label detection – Custom label detection recognizes brand-specific items and assets. Companies can train models for their unique needs, automatically applying brand-specific cropping rules to maintain consistency.
Text detection (OCR) – The OCR capabilities of Amazon Recognition detect and preserve text within images during editing. The system highlights text areas to make sure critical product information remains legible after cropping.

Within the Crop.photo interface, users can upload videos through the standard interface, and the speech-to-text functionality will automatically transcribe any audio content. This transcribed text can then be used to enrich the metadata and descriptions associated with the product images or videos, improving searchability and accessibility for customers. Additionally, the brand guidelines check feature can be applied to the transcribed text, making sure that the written content aligns with the company’s branding and communication style.

The Crop.photo service follows a transparent pricing model that combines unlimited automations with a flexible image credit system. Users have unrestricted access to create and run as many automation workflows as needed, without any additional charges. The service includes a range of features at no extra cost, such as basic image operations, storage, and behind-the-scenes processing.

For advanced AI-powered image processing tasks, like smart cropping or background removal, users consume image credits. The number of credits required for each operation is clearly specified, allowing users to understand the costs upfront. Crop.photo offers several subscription plans with varying image credit allowances, enabling users to choose the plan that best fits their needs.

Results: Improved speed and precision

The automated image editing capabilities of Crop.photo with the integration of Amazon Rekognition has increased speed in editing, with 70% faster image retouching for ecommerce. With a 75% reduction in manual work, the turnaround time for new product images is reduced from 2–3 days to just 1 hour. Similarly, the bulk image editing process has been streamlined, allowing over 100,000 image collections to be processed per day using AWS Fargate. Advanced AI-powered image analysis and editing features provide consistent, high-quality images at scale, eliminating the need for manual review and approval of thousands of product images.

For instance, in the ecommerce industry, this integration facilitates automatic product detection and precise cropping, making sure every image meets specific marketplace and brand standards. In sports, it enables quick identification and cropping of player facial features, including head, eyes, and mouth, adapting to varying backgrounds and maintaining brand consistency.

The following images are before and after pictures for an ecommerce use case.

For a famous wine retailer in the United Kingdom, the integration of Amazon Rekognition with Crop.photo streamlined the processing of over 1,700 product images, achieving a 95% reduction in bulk image editing time, a confirmation to the efficiency of AI-powered enhancement.

Similarly, a top 10 global specialty retailer experienced a transformative impact on their ecommerce fashion marketplace workflow. By automating the cropping of over 3,000 on-model product images monthly, they boosted their retouching team’s productivity by over 70%, maintaining compliance with the varied image standards of multiple online marketplaces.

Conclusion

These case studies illustrate the tangible benefits of integrating Crop.photo with Amazon Rekognition, demonstrating how automation and AI can revolutionize the bulk image editing landscape for ecommerce and sports industries.

Crop.photo, from AWS Partner Evolphin Software, offers powerful bulk processing tools for automating image cropping, content resizing, and listing image analysis, using advanced AI-driven solutions. Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. Its integration with Amazon Rekognition aims to streamline the image editing process for clients, providing speed and accuracy in the high-stakes environment of ecommerce and sports. Crop.photo plans additional AI capabilities with Amazon Bedrock generative AI frameworks to adapt to emerging digital imaging trends, so it remains an indispensable tool for its clients.

To learn more about Evolphin Software and Crop.photo, visit their website.

To learn more about Amazon Rekognition, refer to the Amazon Rekognition Developer Guide.

About the Authors

Rahul Bhargava, founder & CTO of Evolphin Software and Crop.photo, is reshaping how brands produce and manage visual content at scale. Through Crop.photo’s AI-powered tools, global names like Lacoste and Urban Outfitters, as well as ambitious Shopify retailers, are rethinking their creative production workflows. By leveraging cutting-edge Generative AI, he’s enabling brands of all sizes to scale their content creation efficiently while maintaining brand consistency.

Vaishnavi Ganesan is a Solutions Architect specializing in Cloud Security at AWS based in the San Francisco Bay Area. As a trusted technical advisor, Vaishnavi helps customers to design secure, scalable and innovative cloud solutions that drive both business value and technical excellence. Outside of work, Vaishnavi enjoys traveling and exploring different artisan coffee roasters.

John Powers is an Account Manager at AWS, who provides guidance to Evolphin Software and other organizations to help accelerate business outcomes leveraging AWS Technologies. John has a degree in Business Administration and Management with a concentration in Finance from Gonzaga University, and enjoys snowboarding in the Sierras in his free time.

Revolutionizing business processes with Amazon Bedrock and Appian’s generative AI skills

This blog post is co-written with Louis Prensky and Philip Kang from Appian.

The digital transformation wave has compelled enterprises to seek innovative solutions to streamline operations, enhance efficiency, and maintain a competitive edge. Recognizing the growing complexity of business processes and the increasing demand for automation, the integration of generative AI skills into environments has become essential. This strategic move addresses key challenges such as managing vast amounts of unstructured data, adhering to regulatory compliance, and automating repetitive tasks to boost productivity. Using robust infrastructure and advanced language models, these AI-driven tools enhance decision-making by providing valuable insights, improving operational efficiency by automating routine tasks, and helping with data privacy through built-in detection and management of sensitive information. For enterprises, this means achieving higher levels of operational excellence, significant cost savings, and scalable solutions that adapt to business growth. For customers, it translates to improved service quality, enhanced data protection, and a more dynamic, responsive service, ultimately driving better experiences and satisfaction.

Appian has led the charge by offering generative AI skills powered by a collaboration with Amazon Bedrock and Anthropic’s Claude large language models (LLMs). This partnership allows organizations to:

Enhance decision making with valuable insights
Improve operational efficiency by automating tasks
Help protect data privacy through built-in detection and management of sensitive information
Maintain compliance with HIPAA and FedRAMP compliant AI skills

Critically, by placing AI in the context of a wider environment, organizations can operationalize AI in processes that seamlessly integrate with existing software, pass work between digital workers and humans, and help achieve strong security and compliance.

Background

Appian, an AWS Partner with competencies in financial services, healthcare, and life sciences, is a leading provider of low-code automation software to streamline and optimize complex business processes for enterprises. The Appian AI Process Platform includes everything you need to design, automate, and optimize even the most complex processes, from start to finish. The world’s most innovative organizations trust Appian to improve their workflows, unify data, and optimize operations—resulting in accelerated growth and superior customer experiences.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Appian uses the robust infrastructure of Amazon Bedrock and Anthropic’s Claude LLMs to offer fully integrated, pre-built generative AI skills that help developers enhance and automate business processes using low-code development. These use case-driven tools automate common tasks in business processes, making AI-powered applications faster and easier to develop.

This blog post will cover how Appian AI skills build automation into organizations’ mission-critical processes to improve operational excellence, reduce costs, and build scalable solutions. Additionally, we’ll cover real-world examples of processes such as:

A mortgage lender that used AI-driven data extraction to reduce mortgage processing times from 16 weeks to 10 weeks.
A financial services company that achieved a four-fold reduction in data extraction time from trade-related emails.
A legal institution that used AI to reduce attorney time spent on contract review, enabling them to focus on other, high-value work.

Current challenges faced by enterprises

Modern enterprises face numerous challenges, including:

Managing vast amounts of unstructured data: Enterprises deal with immense volumes of data generated from various sources such as emails, documents, and customer interactions. Organizing, analyzing, and extracting valuable insights from unstructured data can be overwhelming without advanced AI capabilities.
Help protect data privacy and compliance: With increasing regulatory requirements around data privacy and protection, organizations must safeguard sensitive information, such as personally identifiable information (PII). Manual processes for data redaction and compliance checks are often error-prone and resource-intensive.
Streamlining repetitive and time-consuming tasks: Routine tasks such as data entry, document processing, and content classification consume significant time and effort. Automating these tasks can lead to substantial productivity gains and allow employees to focus on more strategic activities.
Adapting to rapidly changing market conditions: In a fast-paced business environment, organizations need to be agile and responsive. This requires real-time data analysis and decision-making capabilities that traditional systems might not provide. AI helps businesses quickly adapt to industry changes and customer demands.
Enhancing decision-making with accurate data insights: Making informed decisions requires access to accurate and timely data. However, extracting meaningful insights from large datasets can be challenging without advanced analytical tools. AI-powered solutions can process and analyze data at scale, providing valuable insights that drive better decision-making.

Appian AI service architecture

The architecture of the generative AI skills integrates both the Amazon Bedrock and Amazon Textract scalable infrastructure with Appian’s process management capabilities. This generative AI architecture is designed with private AI as the foundation and upholds those principles.

If a customer site isn’t located in an AWS Region that supports a feature, customers can send their data to a supported Region, as shown in the following figure.

The key components of this architecture include:

Appian AI Process Platform instances: The frontend serves as the primary application environment where users interact with the system application to upload documents, initiate workflows, and view processed results.
Appian AI service: This service functions as an intermediary layer between the Appian instances and AWS AI services (Amazon Textract and Amazon Bedrock). This layer encapsulates the logic required to interact with the AWS AI services to manage API calls, data formatting, and error handling.
Amazon Textract: This AWS service is used to automate the extraction of text and structured data from scanned documents and images and provide the extracted data in a structured format.
Amazon Bedrock: This AWS service provides advanced AI capabilities using FMs for tasks such as text summarization, sentiment analysis, and natural language understanding. This helps enhance the extracted data with deeper insights and contextual understanding.

Solution

Appian generative AI skills, powered by Amazon Bedrock with Anthropic’s Claude family of LLMs, are designed to jump-start the use of generative AI in your processes. The following figure showcases the diverse capabilities of Appian’s generative AI skills, demonstrating how they enable enterprises to seamlessly automate complex tasks.

Selecting an AI skill

Editing an AI skill

Each new skill provides a pre-populated prompt template tailored to specific tasks, alleviating the need to start from scratch. Businesses can select the desired action and customize the prompt for a perfect fit, enabling the automation of tasks such as:

Content analysis and processing: With Appian’s generative AI skills, businesses can automatically generate, summarize, and classify content across various formats. This capability is particularly useful for managing large volumes of customer feedback, generating reports, and creating content summaries, significantly reducing the time and effort required for manual content processing.
Text and data extraction: Organizations generate mountains of data and documents. Extracting this information manually can be both burdensome and error-prone. Appian’s AI skills can perform highly accurate text extraction from PDF files and scanned images and pull relevant data from both structured and unstructured data sources such as invoices, forms, and emails. This speeds up data processing and promotes higher accuracy and consistency.
PII extraction and redaction: Identifying and managing PII within large datasets is crucial for data governance and compliance. Appian’s AI skills can automatically identify and extract sensitive information from documents and communication channels. Additionally, Appian supports plugins that can redact this content for further privacy. This assists your compliance efforts without extensive manual intervention.
Document summarization: Appian’s AI skills can summarize documents to give users an overview before digging into the details. Whether it’s summarizing research papers, legal documents, or internal reports, AI can generate concise summaries, saving time and making sure that critical information is highlighted for quick review.

The following figure shows an example of a prompt-builder skill used to extract unstructured data from a bond certificate.

Each AI skill offers pre-populated prompt templates, allowing you to deploy AI without starting from scratch. Each template caters to specific business needs, making implementation straightforward and efficient. Plus, users can customize these prompts to fit their unique requirements and operational needs.

Key takeaways

In this solution, Appian Cloud seamlessly integrates and customizes Amazon Bedrock and Claude LLMs behind the scenes, abstracting complexity to deliver enterprise-grade AI capabilities tailored to its cloud environment. It provides pre-built, use case specific prompt templates for tasks like text summarization and data extractions, dynamically customized based on user inputs and business context. Using the scalability of the Amazon Bedrock infrastructure, Appian Cloud provides optimal performance and efficient handling of enterprise-scale workflows, all within a fully managed cloud service.

By addressing these complexities, Appian Cloud empowers businesses to focus solely on using AI to achieve operational excellence and business outcomes without the burdens of technical setup, integration challenges, or ongoing maintenance efforts.

Customer success stories

Appian’s AI skills have proven effective across multiple industries. Here are a few real-world examples:

Mortgage processing: This organization automated the extraction of over 60 data fields from inconsistent document formats, reducing the process timeline from 16 weeks to 10 weeks and achieving 98.33% accuracy. The implementation of Appian’s generative AI skills allowed the mortgage processor to streamline their workflow, significantly cutting down on processing time and improving data accuracy, which led to faster loan approvals and increased customer satisfaction.
Financial services: A financial service company received over 1,000 loosely structured emails about trades. Manually annotating these emails led to significant human errors. With an Appian generative AI skill, the customer revamped the entity tagging process by automatically extracting approximately 40 data fields from unstructured emails. This resulted in a four-fold reduction in extraction time and achieved over 95% accuracy, improving the user experience compared to traditional ML extraction tools. The automated process not only reduced errors but also enhanced the speed and reliability of data extraction, leading to more accurate and timely trading decisions.
Legal review: A legal institution had to review negotiated contracts against the original contracts to determine whether the outlined risks had been resolved. This manual process was error prone and labor intensive. By deploying a generative AI skill, they automated the extraction of changes between contracts to find the differences and whether risks had been resolved. This streamlined the attorney review process and provided insights and reasoning into the differences found. The automated solution significantly reduced the time attorneys spent on contract review, allowing them to focus on more strategic tasks and improving the overall efficiency of the legal department.

Conclusion

AWS and Appian’s collaboration marks a significant advancement in business process automation. By using the power of Amazon Bedrock and Anthropic’s Claude models, Appian empowers enterprises to optimize and automate processes for greater efficiency and effectiveness. This partnership sets a new standard for AI-driven business solutions, leading to greater growth and enhanced customer experiences. The ability to quickly deploy and customize AI skills allows businesses to stay agile and responsive in a dynamic environment.

Appian solutions are available as software as a service (SaaS) offerings in AWS Marketplace. Check out the Appian website to learn more about how to use the AI skills.

About the Authors

Sunil Bemarkar is a Senior Partner Solutions Architect at Amazon Web Services. He works with various Independent Software Vendors (ISVs) and Strategic customers across industries to accelerate their digital transformation journey and cloud adoption.

John Klacynski is a Principal Customer Solution Manager within the AWS Independent Software Vendor (ISV) team. In this role, he programmatically helps ISV customers adopt AWS technologies and services to reach their business goals more quickly.

Louis Prensky is a Senior Product Manager at Appian. He is responsible for driving product strategy and feature design for AI Skills within Appian’s Cognitive Automation Group.

Philip Kang is a Principal Solutions Consultant in Partner Technology & Innovation centers with Appian. In this role, he spearheads technical innovation with a focus on AI/ML and cloud solutions.

The AI Action Summit: A golden age of innovation

Today CEO Sundar Pichai spoke in Paris, France at the AI Action Summit. Read a transcript of his speech.Read More

Unlocking the Latest Features in PyTorch 2.6 for Intel Platforms

PyTorch* 2.6 has just been released with a set of exciting new features including torch.compile compatibility with Python 3.13, new security and performance enhancements, and a change in the default parameter for torch.load. PyTorch also announced the deprecation of its official Anaconda channel.

Among the performance features are three that enhance developer productivity on Intel platforms:

Improved Intel GPU availability
FlexAttention optimization on x86 CPU for LLM
FP16 on x86 CPU support for eager and Inductor modes

Improved Intel GPU Availability

To provide developers working in artificial intelligence (AI) with better support for Intel GPUs, the PyTorch user experience on these GPUs has been enhanced. This improvement includes simplified installation steps, a Windows* release binary distribution, and expanded coverage of supported GPU models, including the latest Intel® Arc™ B-Series discrete graphics.

These new features help promote accelerated machine learning workflows within the PyTorch ecosystem, providing a consistent developer experience and support. Application developers and researchers seeking to fine-tune, perform inference, and develop with PyTorch models on Intel® Core™ Ultra AI PCs  and Intel® Arc™ discrete graphics will now be able to install PyTorch directly with binary releases for Windows, Linux*, and Windows Subsystem for Linux 2.

The new features include:

Simplified Intel GPU software stack setup to enable one-click installation of the torch-xpu PIP wheels to run deep learning workloads in a ready-to-use fashion, thus eliminating the complexity of installing and activating Intel GPU development software bundles.
Windows binary releases for torch core, torchvision and torchaudio have been made available for Intel GPUs, expanding from Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics and Intel® Arc™ A-Series graphics to the latest GPU hardware Intel® Arc™ B-Series graphics support.
Further enhanced coverage of Aten operators on Intel GPUs with SYCL* kernels for smooth eager mode execution, as well as bug fixes and performance optimizations for torch.compile on Intel GPUs.

Get a tour of new environment setup, PIP wheels installation, and examples on Intel® Client GPUs and Intel® Data Center GPU Max Series in the Getting Started Guide.

FlexAttention Optimization on X86 CPU for LLM

FlexAttention was first introduced in PyTorch 2.5, to address the need to support various Attentions or even combinations of them. This PyTorch API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations.

Previously, FlexAttention was implemented for CUDA* devices based on the Triton backend. Since PyTorch 2.6, X86 CPU support of FlexAttention was added through TorchInductor CPP backend. This new feature leverages and extends current CPP template abilities to support broad attention variants (e.g., PageAttention, which is critical for LLMs inference) based on the existing FlexAttention API, and brings optimized performance on x86 CPUs. With this feature, user can easily use FlexAttention API to compose their Attention solutions on CPU platforms and achieve good performance.

Typically, FlexAttention is utilized by popular LLM ecosystem projects, such as Hugging Face transformers and vLLM in their LLM related modeling (e.g., PagedAttention) to achieve better out-of-the-box performance. Before the official adoption happens, this enabling PR in Hugging Face can help us the performance benefits that FlexAttention can bring on x86 CPU platforms.

The graph below shows the performance comparison of PyTorch 2.6 (with this feature) and PyTorch 2.5 (without this feature) on typical Llama models. For real-time mode (Batch Size = 1), there is about 1.13x-1.42x performance improvement for next token across different input token lengths. As for best throughput under a typical SLA (P99 token latency <=50ms), PyTorch 2.6 achieves more than 7.83x performance than PyTorch 2.5 as PyTorch 2.6 can run at 8 inputs (Batch Size = 8) together and still keep SLA while PyTorch 2.5 can only run 1 input, because FlexAttention based PagedAttention in PyTorch 2.6 provides more efficiency during multiple batch size scenarios.

Figure 1. Performance comparison of PyTorch 2.6 and PyTorch 2.5 on Typical Llama Models

FP16 on X86 CPU Support for Eager and Inductor Modes

Float16 is a commonly used reduced floating-point type that improves performance in neural network inference and training. CPUs like recently launched Intel® Xeon® 6 with P-Cores support Float16 datatype with native accelerator AMX, which highly improves the Float16 performance. Float16 support on x86 CPU was first introduced in PyTorch 2.5 as a prototype feature. Now it has been further improved for both eager mode and Torch.compile + Inductor mode, which is pushed to Beta level for broader adoption. This helps the deployment on the CPU side without the need to modify the model weights when the model is pre-trained with mixed precision of Float16/Float32. On platforms that support AMX Float16 (i.e., the Intel® Xeon® 6 processors with P-cores), Float16 has the same pass rate as Bfloat16 across the typical PyTorch benchmark suites: TorchBench, Hugging Face, and Timms. It also shows good performance comparable to 16 bit datatype Bfloat16.

Summary

In this blog, we discussed three features to enhance developer productivity on Intel platforms in PyTorch 2.6. These three features are designed to improve Intel GPU availability, optimize FlexAttention for x86 CPUs tailored for large language models (LLMs), and support FP16 on x86 CPUs in both eager and Inductor modes. Get PyTorch 2.6 and try them for yourself or you can access PyTorch 2.6 on the Intel® Tiber™ AI Cloud to take advantage of hosted notebooks that are optimized for Intel hardware and software.

Acknowledgements

The release of PyTorch 2.6 is an exciting milestone for Intel platforms, and it would not have been possible without the deep collaboration and contributions from the community. We extend our heartfelt thanks to Alban, Andrey, Bin, Jason, Jerry and Nikita for sharing their invaluable ideas, meticulously reviewing PRs, and providing insightful feedback on RFCs. Their dedication has driven continuous improvements and pushed the ecosystem forward for Intel platforms.

References

Product and Performance Information

Measurement on AWS EC2 m7i.metal-48xl using: 2x Intel® Xeon® Platinum 8488C, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB [8], DSA [8], IAA[8], QAT[on CPU, 8], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4400 MT/s]), BIOS Amazon EC2 1.0, microcode 0x2b000603, 1x Elastic Network Adapter (ENA) 1x Amazon Elastic Block Store 800G, Ubuntu 24.04.1 LTS 6.8.0-1018-aws Test by Intel on Jan 15^th 2025.

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

AI disclaimer:

AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at www.intel.com/AIPC. Results may vary.

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

*Primary Contributors
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. Recent work has explored alternatives to softmax attention in transformers, such as ReLU and sigmoid activations. In this work, we revisit sigmoid attention and conduct an in-depth theoretical and empirical analysis. Theoretically, we prove that transformers with sigmoid attention are universal function approximators and…Apple Machine Learning Research

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

Data science teams often face challenges when transitioning models from the development environment to production. These include difficulties integrating data science team’s models into the IT team’s production environment, the need to retrofit data science code to meet enterprise security and governance standards, gaining access to production grade data, and maintaining repeatability and reproducibility in machine learning (ML) pipelines, which can be difficult without a proper platform infrastructure and standardized templates.

This post, part of the “Governing the ML lifecycle at scale” series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. The platform provides self-service provisioning of secure environments for ML teams, accelerated model development with predefined templates, a centralized model registry for collaboration and reuse, and standardized model approval and deployment processes.

An enterprise might have the following roles involved in the ML lifecycles. The functions for each role can vary from company to company. In this post, we assign the functions in terms of the ML lifecycle to each role as follows:

Lead data scientist – Provision accounts for ML development teams, govern access to the accounts and resources, and promote standardized model development and approval process to eliminate repeated engineering effort. Usually, there is one lead data scientist for a data science group in a business unit, such as marketing.
Data scientists – Perform data analysis, model development, model evaluation, and registering the models in a model registry.
ML engineers – Develop model deployment pipelines and control the model deployment processes.
Governance officer – Review the model’s performance including documentation, accuracy, bias and access, and provide final approval for models to be deployed.
Platform engineers – Define a standardized process for creating development accounts that conform to the company’s security, monitoring, and governance standards; create templates for model development; and manage the infrastructure and mechanisms for sharing model artifacts.

This ML platform provides several key benefits. First, it enables every step in the ML lifecycle to conform to the organization’s security, monitoring, and governance standards, reducing overall risk. Second, the platform gives data science teams the autonomy to create accounts, provision ML resources and access ML resources as needed, reducing resource constraints that often hinder their work.

Additionally, the platform automates many of the repetitive manual steps in the ML lifecycle, allowing data scientists to focus their time and efforts on building ML models and discovering insights from the data rather than managing infrastructure. The centralized model registry also promotes collaboration across teams, enables centralized model governance, increasing visibility into models developed throughout the organization and reducing duplicated work.

Finally, the platform standardizes the process for business stakeholders to review and consume models, smoothing the collaboration between the data science and business teams. This makes sure models can be quickly tested, approved, and deployed to production to deliver value to the organization.

Overall, this holistic approach to governing the ML lifecycle at scale provides significant benefits in terms of security, agility, efficiency, and cross-functional alignment.

In the next section, we provide an overview of the multi-account ML platform and how the different roles collaborate to scale MLOps.

Solution overview

The following architecture diagram illustrates the solutions for a multi-account ML platform and how different personas collaborate within this platform.

There are five accounts illustrated in the diagram:

ML Shared Services Account – This is the central hub of the platform. This account manages templates for setting up new ML Dev Accounts, as well as SageMaker Projects templates for model development and deployment, in AWS Service Catalog. It also hosts a model registry to store ML models developed by data science teams, and provides a single location to approve models for deployment.
ML Dev Account – This is where data scientists perform their work. In this account, data scientists can create new SageMaker notebooks based on the needs, connect to data sources such as Amazon Simple Storage Service (Amazon S3) buckets, analyze data, build models and create model artifacts (for example, a container image), and more. The SageMaker projects, provisioned using the templates in the ML Shared Services Account, can speed up the model development process because it has steps (such as connecting to an S3 bucket) configured. The diagram shows one ML Dev Account, but there can be multiple ML Dev Accounts in an organization.
ML Test Account – This is the test environment for new ML models, where stakeholders can review and approve models before deployment to production.
ML Prod Account – This is the production account for new ML models. After the stakeholders approve the models in the ML Test Account, the models are automatically deployed to this production account.
Data Governance Account – This account hosts data governance services for data lake, central feature store, and fine-grained data access.

Key activities and actions are numbered in the preceding diagram. Some of these activities are performed by various personas, whereas others are automatically triggered by AWS services.

ML engineers create the pipelines in Github repositories, and the platform engineer converts them into two different Service Catalog portfolios: ML Admin Portfolio and SageMaker Project Portfolio. The ML Admin Portfolio will be used by the lead data scientist to create AWS resources (for example, SageMaker domains). The SageMaker Project Portfolio has SageMaker projects that data scientists and ML engineers can use to accelerate model training and deployment.
The platform engineer shares the two Service Catalog portfolios with workload accounts in the organization.
Data engineer prepares and governs datasets using services such as Amazon S3, AWS Lake Formation, and Amazon DataZone for ML.
The lead data scientist uses the ML Admin Portfolio to set up SageMaker domains and the SageMaker Project Portfolio to set up SageMaker projects for their teams.
Data scientists subscribe to datasets, and use SageMaker notebooks to analyze data and develop models.
Data scientists use the SageMaker projects to build model training pipelines. These SageMaker projects automatically register the models in the model registry.
The lead data scientist approves the model locally in the ML Dev Account.
This step consists of the following sub-steps:
1. After the data scientists approve the model, it triggers an event bus in Amazon EventBridge that ships the event to the ML Shared Services Account.
2. The event in EventBridge triggers the AWS Lambda function that copies model artifacts (managed by SageMaker, or Docker images) from the ML Dev Account into the ML Shared Services Account, creates a model package in the ML Shared Services Account, and registers the new model in the model registry in the ML Shared Services account.
ML engineers review and approve the new model in the ML Shared Services account for testing and deployment. This action triggers a pipeline that was set up using a SageMaker project.
The approved models are first deployed to the ML Test Account. Integration tests will be run and endpoint validated before being approved for production deployment.
After testing, the governance officer approves the new model in the CodePipeline.
After the model is approved, the pipeline will continue to deploy the new model into the ML Prod Account, and creates a SageMaker endpoint.

The following sections provide details on the key components of this diagram, how to set them up, and sample code.

Set up the ML Shared Services Account

The ML Shared Services Account helps the organization standardize management of artifacts and resources across data science teams. This standardization also helps enforce controls across resources consumed by data science teams.

The ML Shared Services Account has the following features:

Service Catalog portfolios – This includes the following portfolios:

ML Admin Portfolio – This is intended to be used by the project admins of the workload accounts. It is used to create AWS resources for their teams. These resources can include SageMaker domains, Amazon Redshift clusters, and more.
SageMaker Projects Portfolio – This portfolio contains the SageMaker products to be used by the ML teams to accelerate their ML models’ development while complying with the organization’s best practices.
Central model registry – This is the centralized place for ML models developed and approved by different teams. For details on setting this up, refer to Part 2 of this series.

The following diagram illustrates this architecture.

As the first step, the cloud admin sets up the ML Shared Services Account by using one of the blueprints for customizations in AWS Control Tower account vending, as described in Part 1.

In the following sections, we walk through how to set up the ML Admin Portfolio. The same steps can be used to set up the SageMaker Projects Portfolio.

Bootstrap the infrastructure for two portfolios

After the ML Shared Services Account has been set up, the ML platform admin can bootstrap the infrastructure for the ML Admin Portfolio using sample code in the GitHub repository. The code contains AWS CloudFormation templates that can be later deployed to create the SageMaker Projects Portfolio.

Complete the following steps:

Clone the GitHub repo to a local directory:

git clone https://github.com/aws-samples/data-and-ml-governance-workshop.git

Change the directory to the portfolio directory:

cd data-and-ml-governance-workshop/module-3/ml-admin-portfolio

Install dependencies in a separate Python environment using your preferred Python packages manager:
```
python3 -m venv env
source env/bin/activate pip 
install -r requirements.txt
```
Bootstrap your deployment target account using the following command:
```
cdk bootstrap aws://<target account id>/<target region> --profile <target account profile>
```
If you already have a role and AWS Region from the account set up, you can use the following command instead:
```
cdk bootstrap
```

Lastly, deploy the stack:

cdk deploy --all --require-approval never

When it’s ready, you can see the MLAdminServicesCatalogPipeline pipeline in AWS CloudFormation.

Navigate to AWS CodeStar Connections of the Service Catalog page, you can see there’s a connection named “codeconnection-service-catalog”. If you click the connection, you will notice that we need to connect it to GitHub to allow you to integrate it with your pipelines and start pushing code. Click the ‘Update pending connection’ to integrate with your GitHub account.

Once that is done, you need to create empty GitHub repositories to start pushing code to. For example, you can create a repository called “ml-admin-portfolio-repo”. Every project you deploy will need a repository created in GitHub beforehand.

Trigger CodePipeline to deploy the ML Admin Portfolio

Complete the following steps to trigger the pipeline to deploy the ML Admin Portfolio. We recommend creating a separate folder for the different repositories that will be created in the platform.

Get out of the cloned repository and create a parallel folder called platform-repositories:
```
cd ../../.. # (as many .. as directories you have moved in)
mkdir platform-repositories
```

Clone and fill the empty created repository:

cd platform-repositories
git clone https://github.com/example-org/ml-admin-service-catalog-repo.git
cd ml-admin-service-catalog-repo
cp -aR ../../ml-platform-shared-services/module-3/ml-admin-portfolio/. .

Push the code to the Github repository to create the Service Catalog portfolio:
```
git add .
git commit -m "Initial commit"
git push -u origin main
```

After it is pushed, the Github repository we created earlier is no longer empty. The new code push triggers the pipeline named cdk-service-catalog-pipeline to build and deploy artifacts to Service Catalog.

It takes about 10 minutes for the pipeline to finish running. When it’s complete, you can find a portfolio named ML Admin Portfolio on the Portfolios page on the Service Catalog console.

Repeat the same steps to set up the SageMaker Projects Portfolio, make sure you’re using the sample code (sagemaker-projects-portfolio) and create a new code repository (with a name such as sm-projects-service-catalog-repo).

Share the portfolios with workload accounts

You can share the portfolios with workload accounts in Service Catalog. Again, we use ML Admin Portfolio as an example.

On the Service Catalog console, choose Portfolios in the navigation pane.
Choose the ML Admin Portfolio.
On the Share tab, choose Share.
In the Account info section, provide the following information:
1. For Select how to share, select Organization node.
2. Choose Organizational Unit, then enter the organizational unit (OU) ID of the workloads OU.
In the Share settings section, select Principal sharing.
Choose Share.
Selecting the Principal sharing option allows you to specify the AWS Identity and Access Management (IAM) roles, users, or groups by name for which you want to grant permissions in the shared accounts.
On the portfolio details page, on the Access tab, choose Grant access.
For Select how to grant access, select Principal Name.
In the Principal Name section, choose role/ for Type and enter the name of the role that the ML admin will assume in the workload accounts for Name.
Choose Grant access.
Repeat these steps to share the SageMaker Projects Portfolio with workload accounts.

Confirm available portfolios in workload accounts

If the sharing was successful, you should see both portfolios available on the Service Catalog console, on the Portfolios page under Imported portfolios.

Now that the service catalogs in the ML Shared Services Account have been shared with the workloads OU, the data science team can provision resources such as SageMaker domains using the templates and set up SageMaker projects to accelerate ML models’ development while complying with the organization’s best practices.

We demonstrated how to create and share portfolios with workload accounts. However, the journey doesn’t stop here. The ML engineer can continue to evolve existing products and develop new ones based on the organization’s requirements.

The following sections describe the processes involved in setting up ML Development Accounts and running ML experiments.

Set up the ML Development Account

The ML Development account setup consists of the following tasks and stakeholders:

The team lead requests the cloud admin to provision the ML Development Account.
The cloud admin provisions the account.
The team lead uses shared Service Catalog portfolios to provisions SageMaker domains, set up IAM roles and give access, and get access to data in Amazon S3, or Amazon DataZone or AWS Lake Formation, or a central feature group, depending on which solution the organization decides to use.

Run ML experiments

Part 3 in this series described multiple ways to share data across the organization. The current architecture allows data access using the following methods:

Option 1: Train a model using Amazon DataZone – If the organization has Amazon DataZone in the central governance account or data hub, a data publisher can create an Amazon DataZone project to publish the data. Then the data scientist can subscribe to the Amazon DataZone published datasets from Amazon SageMaker Studio, and use the dataset to build an ML model. Refer to the sample code for details on how to use subscribed data to train an ML model.
Option 2: Train a model using Amazon S3 – Make sure the user has access to the dataset in the S3 bucket. Follow the sample code to run an ML experiment pipeline using data stored in an S3 bucket.
Option 3: Train a model using a data lake with Athena – Part 2 introduced how to set up a data lake. Follow the sample code to run an ML experiment pipeline using data stored in a data lake with Amazon Athena.
Option 4: Train a model using a central feature group – Part 2 introduced how to set up a central feature group. Follow the sample code to run an ML experiment pipeline using data stored in a central feature group.

You can choose which option to use depending on your setup. For options 2, 3, and 4, the SageMaker Projects Portfolio provides project templates to run ML experiment pipelines, steps including data ingestion, model training, and registering the model in the model registry.

In the following example, we use option 2 to demonstrate how to build and run an ML pipeline using a SageMaker project that was shared from the ML Shared Services Account.

On the SageMaker Studio domain, under Deployments in the navigation pane, choose Projects
Choose Create project.
There is a list of projects that serve various purposes. Because we want to access data stored in an S3 bucket for training the ML model, choose the project that uses data in an S3 bucket on the Organization templates tab.
Follow the steps to provide the necessary information, such as Name, Tooling Account(ML Shared Services account id), S3 bucket(for MLOPS) and then create the project.

It takes a few minutes to create the project.

After the project is created, a SageMaker pipeline is triggered to perform the steps specified in the SageMaker project. Choose Pipelines in the navigation pane to see the pipeline.You can choose the pipeline to see the Directed Acyclic Graph (DAG) of the pipeline. When you choose a step, its details show in the right pane.

The last step of the pipeline is registering the model in the current account’s model registry. As the next step, the lead data scientist will review the models in the model registry, and decide if a model should be approved to be promoted to the ML Shared Services Account.

Approve ML models

The lead data scientist should review the trained ML models and approve the candidate model in the model registry of the development account. After an ML model is approved, it triggers a local event, and the event buses in EventBridge will send model approval events to the ML Shared Services Account, and the artifacts of the models will be copied to the central model registry. A model card will be created for the model if it’s a new one, or the existing model card will update the version.

The following architecture diagram shows the flow of model approval and model promotion.

Model deployment

After the previous step, the model is available in the central model registry in the ML Shared Services Account. ML engineers can now deploy the model.

If you had used the sample code to bootstrap the SageMaker Projects portfolio, you can use the Deploy real-time endpoint from ModelRegistry – Cross account, test and prod option in SageMaker Projects to set up a project to set up a pipeline to deploy the model to the target test account and production account.

On the SageMaker Studio console, choose Projects in the navigation pane.
Choose Create project.
On the Organization templates tab, you can view the templates that were populated earlier from Service Catalog when the domain was created.
Select the template Deploy real-time endpoint from ModelRegistry – Cross account, test and prod and choose Select project template.
Fill in the template:
1. The SageMakerModelPackageGroupName is the model group name of the model promoted from the ML Dev Account in the previous step.
2. Enter the Deployments Test Account ID for PreProdAccount, and the Deployments Prod Account ID for ProdAccount.

The pipeline for deployment is ready. The ML engineer will review the newly promoted model in the ML Shared Services Account. If the ML engineer approves model, it will trigger the deployment pipeline. You can see the pipeline on the CodePipeline console.

The pipeline will first deploy the model to the test account, and then pause for manual approval to deploy to the production account. ML engineer can test the performance and Governance officer can validate the model results in the test account. If the results are satisfactory, Governance officer can approve in CodePipeline to deploy the model to production account.

Conclusion

This post provided detailed steps for setting up the key components of a multi-account ML platform. This includes configuring the ML Shared Services Account, which manages the central templates, model registry, and deployment pipelines; sharing the ML Admin and SageMaker Projects Portfolios from the central Service Catalog; and setting up the individual ML Development Accounts where data scientists can build and train models.

The post also covered the process of running ML experiments using the SageMaker Projects templates, as well as the model approval and deployment workflows. Data scientists can use the standardized templates to speed up their model development, and ML engineers and stakeholders can review, test, and approve the new models before promoting them to production.

This multi-account ML platform design follows a federated model, with a centralized ML Shared Services Account providing governance and reusable components, and a set of development accounts managed by individual lines of business. This approach gives data science teams the autonomy they need to innovate, while providing enterprise-wide security, governance, and collaboration.

We encourage you to test this solution by following the AWS Multi-Account Data & ML Governance Workshop to see the platform in action and learn how to implement it in your own organization.

About the authors

Jia (Vivian) Li is a Senior Solutions Architect in AWS, with specialization in AI/ML. She currently supports customers in financial industry. Prior to joining AWS in 2022, she had 7 years of experience supporting enterprise customers use AI/ML in the cloud to drive business results. Vivian has a BS from Peking University and a PhD from University of Southern California. In her spare time, she enjoys all the water activities, and hiking in the beautiful mountains in her home state, Colorado.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he enjoys riding motorcycle and walking with his dogs.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Alberto Menendez is a DevOps Consultant in Professional Services at AWS. He helps accelerate customers’ journeys to the cloud and achieve their digital transformation goals. In his free time, he enjoys playing sports, especially basketball and padel, spending time with family and friends, and learning about technology.

Sovik Kumar Nath is an AI/ML and Generative AI senior solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double masters degrees from the University of South Florida, University of Fribourg, Switzerland, and a bachelors degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Viktor Malesevic is a Senior Machine Learning Engineer within AWS Professional Services, leading teams to build advanced machine learning solutions in the cloud. He’s passionate about making AI impactful, overseeing the entire process from modeling to production. In his spare time, he enjoys surfing, cycling, and traveling.

Accelerate your Amazon Q implementation: starter kits for SMBs

Whether you’re a small or medium-sized business (SMB) or a managed service provider at the beginning of your cloud journey, you might be wondering how to get started. Questions like “Am I following best practices?”, “Am I optimizing my cloud costs?”, and “How difficult is the learning curve?” are quite common. AWS is here to provide a concept called starter kits.

Starter kits are complete, deployable solutions that address common, repeatable business problems. They deploy the services that make up a solution according to best practices, helping you optimize costs and become familiar with these kinds of architectural patterns without a large investment in training. Most of all, starter kits save you time—time that can be better spent on your business or with your customers.

In this post, we showcase a starter kit for Amazon Q Business. If you have a repository of documents that you need to turn into a knowledge base quickly, or simply want to test out the capabilities of Amazon Q Business without a large investment of time at the console, then this solution is for you.

This deployment guide covers the steps to set up an Amazon Q solution that connects to Amazon Simple Storage Service (Amazon S3) and a web crawler data source, and integrates with AWS IAM Identity Center for authentication. An AWS CloudFormation template automates the deployment of this solution.

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Solution overview

The following diagram illustrates the solution architecture.

The workflow involves the following steps:

The user authenticates using an AWS Identity and Access Management (IAM) identity user name and password before accessing the Amazon Q web application.
Upon successful authentication, the user can access the Amazon Q web UI and ask a question.
Amazon Q retrieves relevant information from its index, which is populated using data from the connected data sources (Amazon S3 and a web crawler).
Amazon Q then generates a response using its internal large language model (LLM) and presents it to the user through the Amazon Q web UI.
The user can provide feedback on the response through the Amazon Q web UI.

Prerequisites

Before deploying the solution, make sure you have the following in place:

AWS account – You will need an active AWS account with the necessary permissions to deploy CloudFormation stacks and create the required resources.
Amazon S3 bucket – Make sure you have an existing S3 bucket that will be used as the data source for Amazon Q. To create a S3 bucket, refer to Create your first S3 bucket.
AWS IAM Identity Center – Configure AWS IAM Identity Center in your AWS environment. You will need to provide the necessary details, such as the IAM Identity Center instance Amazon Resource Name (ARN), during the deployment process.

Deploy the solution using AWS CloudFormation

Complete the following steps to deploy the CloudFormation template:

Sign in to the AWS Management Console.
Choose one of the following Launch Stack options for your desired AWS Region to open the AWS CloudFormation console and create a new stack. Please note that this stack will default to us-east-1.
For Stack name, enter a name for your application (for example, AMAZON-Q-STARTER-KIT).
In the Parameters section, for IAMIdentityCenterARN, enter the ARN of your IAM Identity Center instance.
For QBusinessApplicationName, enter a name for the Amazon Q Business application.
For S3DataSourceBucket, enter the name of the S3 bucket you created earlier.
For WebCrawlerDataSourceUrl, enter the URL of the web crawler data source.
Choose Next.

On the Configure stack options page, leave everything as default, select I acknowledge that AWS CloudFormation might create IAM resources and and choose Next.

On the Review and create page, choose Submit.
On the Amazon Q Business console, you will see the new application you created.
Choose the new Amazon Q Business application, and in the Data sources section, select the data source s3_datasource and choose Sync now.
Select the data source webpage-datasource and choose Sync now.
To add groups and users to your Amazon Q application, refer to instructions.

Test the solution

To validate the Amazon Q solution is functioning as expected, perform the following tests:

Test data ingestion:
1. Upload a test file to the S3 bucket.
2. Verify that the file is successfully ingested and processed by Amazon Q.
3. Check the Amazon Q web experience UI for the processed data.
Test web crawler functionality:
Verify that the web crawler is able to retrieve and ingest the data from the website.
Make sure the data is displayed correctly in the Amazon Q web experience UI.

Clean up

To clean up, delete the CloudFormation stack and the S3 bucket you created.

Conclusion

The Amazon Q starter kit provides a streamlined solution for SMBs to use the power of generative AI and intelligent question-answering. By automating the deployment and integration with key data sources, this kit eases the complexity of setting up Amazon Q, empowering businesses to quickly unlock insights and improve productivity.

If your SMB has a repository of documents that need to be transformed into a valuable knowledge base, or you simply want to explore the capabilities of Amazon Q, we encourage you to take advantage of this starter kit. Get started today and experience the transformative benefits of enterprise-grade question-answering tailored for your business needs, and let us know what you think in the comments. To explore more generative AI use cases, refer to AI Use Case Explorer.

About the Authors

Nneoma Okoroafor is a Partner Solutions Architect focused on AI/ML and generative AI. Nneoma is passionate about providing guidance to AWS Partners on using the latest technologies and techniques to deliver innovative solutions to customers.

Joshua Amah is a Partner Solutions Architect with Amazon Web Services. He primarily serves consulting partners, providing architectural guidance and recommendations for new and existing workloads. Outside of work, he enjoys playing soccer, golf, and spending time with family and friends.

Jason Brown is a Partner Solutions Architect focused on helping AWS Distribution Partners and their Seller Partners build and grow their AWS practices. Jason is passionate about building solutions for MSPs and VARs in the small business space. Outside the office, Jason is an avid traveler and enjoys offshore fishing.

Building the future of construction analytics: CONXAI’s AI inference on Amazon EKS

This is a guest post co-written with Tim Krause, Lead MLOps Architect at CONXAI.

CONXAI Technology GmbH is pioneering the development of an advanced AI platform for the Architecture, Engineering, and Construction (AEC) industry. Our platform uses advanced AI to empower construction domain experts to create complex use cases efficiently.

Construction sites typically employ multiple CCTV cameras, generating vast amounts of visual data. These camera feeds can be analyzed using AI to extract valuable insights. However, to comply with GDPR regulations, all individuals captured in the footage must be anonymized by masking or blurring their identities.

In this post, we dive deep into how CONXAI hosts the state-of-the-art OneFormer segmentation model on AWS using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Kubernetes Service (Amazon EKS), KServe, and NVIDIA Triton.

Our AI solution is offered in two forms:

Model as a service (MaaS) – Our AI model is accessible through an API, enabling seamless integration. Pricing is based on processing batches of 1,000 images, offering flexibility and scalability for users.
Software as a service (SaaS) – This option provides a user-friendly dashboard, acting as a central control panel. Users can add and manage new cameras, view footage, perform analytical searches, and enforce GDPR compliance with automatic person anonymization.

Our AI model, fine-tuned with a proprietary dataset of over 50,000 self-labeled images from construction sites, achieves significantly greater accuracy compared to other MaaS solutions. With the ability to recognize more than 40 specialized object classes—such as cranes, excavators, and portable toilets—our AI solution is uniquely designed and optimized for the construction industry.

Our journey to AWS

Initially, CONXAI started with a small cloud provider specializing in offering affordable GPUs. However, it lacked essential services required for machine learning (ML) applications, such as frontend and backend infrastructure, DNS, load balancers, scaling, blob storage, and managed databases. At that time, the application was deployed as a single monolithic container, which included Kafka and a database. This setup was neither scalable nor maintainable.

After migrating to AWS, we gained access to a robust ecosystem of services. Initially, we deployed the all-in-one AI container on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. Although this provided a basic solution, it wasn’t scalable, necessitating the development of a new architecture.

Our top reasons for choosing AWS were primarily driven by the team’s extensive experience with AWS. Additionally, the initial cloud credits provided by AWS were invaluable for us as a startup. We now use AWS managed services wherever possible, particularly for data-related tasks, to minimize maintenance overhead and pay only for the resources we actually use.

At the same time, we aimed to remain cloud-agnostic. To achieve this, we chose Kubernetes, enabling us to deploy our stack directly on a customer’s edge—such as on construction sites—when needed. Some customers are potentially very compliance-restrictive, not allowing data to leave the construction site. Another opportunity is federated learning, training on the customer’s edge and only transferring model weights, without sensitive data, into the cloud. In the future, this approach might lead to having one model fine-tuned for each camera to achieve the best accuracy, which requires hardware resources on-site. For the time being, we use Amazon EKS to offload the management overhead to AWS, but we could easily deploy on a standard Kubernetes cluster if needed.

Our previous model was running on TorchServe. With our new model, we first tried performing inference in Python with Flask and PyTorch, as well as with BentoML. Achieving high inference throughput with high GPU utilization for cost-efficiency was very challenging. Exporting the model to ONNX format was particularly difficult because the OneFormer model lacks strong community support. It took us some time to identify why the OneFormer model was so slow in the ONNX Runtime with NVIDIA Triton. We ultimately resolved the issue by converting ONNX to TensorRT.

Defining the final architecture, training the model, and optimizing costs took approximately 2–3 months. Currently, we improve our model by incorporating increasingly accurate labeled data, a process that takes around 3–4 weeks of training on a single GPU. Deployment is fully automated with GitLab CI/CD pipelines, Terraform, and Helm, requiring less than an hour to complete without any downtime. New model versions are typically rolled out in shadow mode for 1–2 weeks to provide stability and accuracy before full deployment.

Solution overview

The following diagram illustrates the solution architecture.

The architecture consists of the following key components:

The S3 bucket (1) is the most important data source. It is cost-effective, scalable, and provides almost unlimited blob storage. We encrypt the S3 bucket, and we delete all data with privacy concerns after processing took place. Almost all microservices read and write files from and to Amazon S3, which ultimately triggers (2) Amazon EventBridge (3). The process begins when a customer uploads an image on Amazon S3 using a presigned URL provided by our API handling user authentication and authorization through Amazon Cognito.
The S3 bucket is configured in such a way that it forwards (2) all events into EventBridge.
TriggerMesh is a Kubernetes controller where we use AWSEventBridgeSource (6). It abstracts the infrastructure automation and automatically creates an Amazon Simple Queue Service (Amazon SQS) (5) processing queue, which acts as a processing buffer. Additionally, it creates an EventBridge rule (4) to forward the S3 event from the event bus into the SQS processing queue. Finally, TriggerMesh creates a Kubernetes Pod to poll events from the processing queue to feed it into the Knative broker (7). The resources in the Kubernetes cluster are deployed in a private subnet.
The central place for Knative Eventing is the Knative broker (7). It is backed by Amazon Managed Streaming for Apache Kafka (Amazon MSK) (8).
The Knative trigger (9) polls the Knative broker based on a specific CloudEventType and forwards it accordingly to the KServe InferenceService (10).
KServe is a standard model inference platform on Kubernetes that uses Knative Serving as its foundation and is fully compatible with Knative Eventing. It also pulls models from a model repository into the container before the model server starts, eliminating the need to build a new container image for each model version.
We use KServe’s “Collocate transformer and predictor in same pod” feature to maximize inference speed and throughput because containers within the same pod can communicate over localhost and the network traffic never leaves the CPU.
After many performance tests, we achieved best performance with the NVIDIA Triton Inference Server (11) after converting our model first into ONNX and then into TensorRT.
Our transformer (12) uses Flask with Gunicorn and is optimized for the number of workers and CPU cores to maintain GPU utilization over 90%. The transformer gets a CloudEvent with the reference of the image Amazon S3 path, downloads it, and performs model inference over HTTP. After getting back the model results, it performs preprocessing and finally uploads the processed model results back to Amazon S3.
We use Karpenter as the cluster auto scaler. Karpenter is responsible for scaling the inference component to handle high user request loads. Karpenter launches new EC2 instances when the system experiences increased demand. This allows the system to automatically scale up computing resources to meet the increased workload.

All this divides our architecture mainly in AWS managed data service and the Kubernetes cluster:

The S3 bucket, EventBridge, and SQS queue as well as Amazon MSK are all fully managed services on AWS. This keeps our data management effort low.
We use Amazon EKS for everything else. TriggerMesh, AWSEventBridgeSource, Knative Broker, Knative Trigger, KServe with our Python transformer, and the Triton Inference Server are also within the same EKS cluster on a dedicated EC2 instance with a GPU. Because our EKS cluster is just used for processing, it is fully stateless.

Summary

From initially having our own highly customized model, transitioning to AWS, improving our architecture, and introducing our new Oneformer model, CONXAI is now proud to provide scalable, reliable, and secure ML inference to customers, enabling construction site improvements and accelerations. We achieved a GPU utilization of over 90%, and the number of processing errors has dropped almost to zero in recent months. One of the major design choices was the separation of the model from the preprocessing and postprocessing code in the transformer. With this technology stack, we gained the ability to scale down to zero on Kubernetes using the Knative serverless feature, while our scale-up time from a cold state is just 5–10 minutes, which can save significant infrastructure costs for potential batch inference use cases.

The next important step is to use these model results with proper analytics and data science. These results can also serve as a data source for generative AI features such as automated report generation. Furthermore, we want to label more diverse images and train the model on additional construction domain classes as part of a continuous improvement process. We also work closely with AWS specialists to bring our model in AWS Inferentia chipsets for better cost-efficiency.

To learn more about the services in this solution, refer to the following resources:

About the Authors

Tim Krause is Lead MLOps Architect at CONXAI. He takes care of all activities when AI meets infrastructure. He joined the company with previous Platform, Kubernetes, DevOps, and Big Data knowledge and was training LLMs from scratch.

Mehdi Yosofie is a Solutions Architect at AWS, working with startup customers, and leveraging his expertise to help startup customers design their workloads on AWS.

Solution overview: Building a multi-agent generative AI solution

Data science agent: RAG and code generation

Portfolio agent: Text-to-SQL and self-correction

Conclusion and next steps with RDC

About the Authors

Agentic design vs. traditional software design

DeepSeek-R1

Generative AI on SageMaker AI

Solution overview

Prerequisites

Simplified LLM hosting on SageMaker AI

Deploy DeepSeek from SageMaker JumpStart

Deploy DeepSeek from Hugging Face Hub

Build a simple agent with CrewAI

Tools for agentic AI

Build a multi-agent workflow with CrewAI, DeepSeek-R1, and SageMaker AI

Clean up

Conclusion

About the Authors

Automation: The way out of bulk image editing challenges

Challenges in the ecommerce industry

Challenges in the sports industry

Solution overview

Results: Improved speed and precision

Conclusion

About the Authors

Background

Current challenges faced by enterprises

Appian AI service architecture

Solution

Selecting an AI skill

Editing an AI skill

Key takeaways

Customer success stories

Conclusion

About the Authors

Improved Intel GPU Availability

FlexAttention Optimization on X86 CPU for LLM

FP16 on X86 CPU Support for Eager and Inductor Modes

Summary

Acknowledgements

References

Product and Performance Information

Notices and Disclaimers

AI disclaimer:

Solution overview

Set up the ML Shared Services Account

Bootstrap the infrastructure for two portfolios

Trigger CodePipeline to deploy the ML Admin Portfolio

Share the portfolios with workload accounts

Confirm available portfolios in workload accounts

Set up the ML Development Account

Run ML experiments

Approve ML models

Model deployment

Conclusion

About the authors

Solution overview

Prerequisites

Deploy the solution using AWS CloudFormation

Test the solution

Clean up

Conclusion

About the Authors

Our journey to AWS

Solution overview

Summary

About the Authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.