Falcon 3 models now available in Amazon SageMaker JumpStart

Today, we are excited to announce that the Falcon 3 family of models from TII is available in Amazon SageMaker JumpStart. In this post, we explore how to deploy these models efficiently on Amazon SageMaker AI.

Overview of the Falcon 3 family of models

The Falcon 3 family, developed by the Technology Innovation Institute (TII) in Abu Dhabi, represents a significant advancement in open source language models. This collection includes five base models ranging from 1 billion to 10 billion parameters, with a focus on enhancing science, math, and coding capabilities. The family consists of Falcon3-1B-Base, Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and Falcon3-10B-Base.

These models showcase innovations such as efficient pre-training techniques, scaling for improved reasoning, and knowledge distillation for better performance in smaller models. Notably, the Falcon3-10B-Base model achieves state-of-the-art performance for models under 13 billion parameters in zero-shot and few-shot tasks. The Falcon 3 family also includes various fine-tuned versions like Instruct models and supports different quantization formats, making them versatile for a wide range of applications.

Currently, SageMaker JumpStart offers the base versions of Falcon3-3B, Falcon3-7B, and Falcon3-10B, along with their corresponding instruct variants, as well as Falcon3-1B-Instruct.

Get started with SageMaker JumpStart

SageMaker JumpStart is a machine learning (ML) hub that can help accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select pre-trained foundation models (FMs), including Falcon 3 models. These models are fully customizable for your use case with your data.

Deploying a Falcon 3 model through SageMaker JumpStart offers two convenient approaches: using the intuitive SageMaker JumpStart UI or deploying programmatically through the SageMaker Python SDK. Let’s explore both methods to help you choose the approach that best suits your needs.

Deploy Falcon 3 using the SageMaker JumpStart UI

Complete the following steps to deploy Falcon 3 through the JumpStart UI:

  1. To access SageMaker JumpStart, use one of the following methods:
    1. In Amazon SageMaker Unified Studio, on the Build menu, choose JumpStart models under Model development.
    2. Alternatively, in Amazon SageMaker Studio, choose JumpStart in the navigation pane.
  2. Search for Falcon3-10B-Base in the model browser.
  3. Choose the model and choose Deploy.
  4. For Instance type, either use the default instance or choose a different instance.
  5. Choose Deploy.
    After some time, the endpoint status will show as InService and you will be able to run inference against it.
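
If you deployed through the UI, you can test the endpoint with the low-level SageMaker runtime client. The following is a minimal sketch; the endpoint name is a placeholder, so replace it with the name shown on the endpoint details page:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "inputs": "Hello, I'm a language model,",
    "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
}

# Replace with the endpoint name shown on the SageMaker console
response = runtime.invoke_endpoint(
    EndpointName="falcon3-10b-base-endpoint",
    ContentType="application/json",
    Body=json.dumps(payload),
)

print(json.loads(response["Body"].read()))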

Deploy Falcon 3 programmatically using the SageMaker Python SDK

For teams looking to automate deployment or integrate with existing MLOps pipelines, you can use the SageMaker Python SDK:

from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.jumpstart.model import ModelAccessConfig
from sagemaker.session import Session
import logging

sagemaker_session = Session()

artifacts_bucket_name = sagemaker_session.default_bucket()
execution_role_arn = sagemaker_session.get_caller_identity_arn()

# SageMaker JumpStart model ID for Falcon3-10B-Base
js_model_id = "huggingface-llm-falcon-3-10B-base"

gpu_instance_type = "ml.g5.12xlarge"

# Sample input and output used by SchemaBuilder to infer the request/response schema
response = "Hello, I'm a language model, and I'm here to help you with your English."

sample_input = {
    "inputs": "Hello, I'm a language model,",
    "parameters": {"max_new_tokens": 128, "top_p": 0.9, "temperature": 0.6},
}

sample_output = [{"generated_text": response}]

schema_builder = SchemaBuilder(sample_input, sample_output)

model_builder = ModelBuilder(
    model=js_model_id,
    schema_builder=schema_builder,
    sagemaker_session=sagemaker_session,
    role_arn=execution_role_arn,
    log_level=logging.ERROR,
)

model = model_builder.build()

# Deploy the model; accepting the EULA is required for JumpStart models
predictor = model.deploy(
    model_access_configs={js_model_id: ModelAccessConfig(accept_eula=True)},
    accept_eula=True,
)

Run inference on the predictor:

predictor.predict(sample_input)
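
The predictor returns the deserialized model output. Given the sample_output shape defined earlier, a minimal way to extract the generated text looks like the following (treat this as a sketch, because the exact response shape can vary with the serving container):

result = predictor.predict(sample_input)

# Expected shape (per sample_output above): [{"generated_text": "..."}]
print(result[0]["generated_text"])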

If you want to set up the ability to scale down to zero after deployment, refer to Unlock cost savings with the new scale down to zero feature in SageMaker Inference.

Clean up

To clean up the model and endpoint, use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and run a wide range of pre-trained FMs for inference, including the Falcon 3 family of models. Visit SageMaker JumpStart in SageMaker Studio now to get started. For more information, refer to SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.


About the authors

Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Raghu Ramesha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.

Banu Nagasundaram leads product, engineering, and strategic partnerships for SageMaker JumpStart, SageMaker’s machine learning and GenAI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.

Building a virtual meteorologist using Amazon Bedrock Agents

The integration of generative AI capabilities is driving transformative changes across many industries. Although weather information is accessible through multiple channels, businesses that heavily rely on meteorological data require robust and scalable solutions to effectively manage and use these critical insights and reduce manual processes. This solution demonstrates how to create an AI-powered virtual meteorologist that can answer complex weather-related queries in natural language. We use various AWS services to deploy a complete solution that you can use to interact with an API providing real-time weather information. In this solution, we use Amazon Bedrock Agents.

Amazon Bedrock Agents helps streamline workflows and automate repetitive tasks. It can securely connect to your company’s data sources and augment the user’s request with accurate responses. You can use Amazon Bedrock Agents to architect an action schema tailored to your requirements, giving you control whenever the agent initiates the specified action. This versatile approach lets you integrate and run business logic in your preferred backend service, combining functionality with flexibility. Memory retention across the interaction also allows for a more personalized user experience.

In this post, we present a streamlined approach to deploying an AI-powered agent by combining Amazon Bedrock Agents and a foundation model (FM). We guide you through the process of configuring the agent and implementing the specific logic required for the virtual meteorologist to provide accurate weather-related responses. Additionally, we use various AWS services, including AWS Amplify for hosting the front end, AWS Lambda functions for handling request logic, Amazon Cognito for user authentication, and AWS Identity and Access Management (IAM) for controlling access to the agent.

Solution overview

The following diagram gives an overview of the solution and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our front-end application. Amazon Bedrock Agents forwards the details from the user query to the action groups, which in turn invoke custom Lambda functions. Each action group and Lambda function handles a specific task:

  1. geo-coordinates – Processes geographic coordinates (geo-coordinates) to get details about a specific location
  2. weather – Gathers weather information for the provided location
  3. date-time – Obtains the current date and time

Solution Architecture

Prerequisites

You must have the following in place to complete the solution in this post:

Deploy solution resources using AWS CloudFormation

When you run the AWS CloudFormation template, the following resources are deployed (note that costs will be incurred for the AWS resources used):

  • Amazon Cognito resources:
  • Lambda resources:
    • Function – <Stack name>-geo-coordinates-<auto-generated>
    • Function – <Stack name>-weather-<auto-generated>
    • Function – <Stack name>-date-time-<auto-generated>
  • Amazon Bedrock Agents: virtual-meteorologist
    • Action group 1 – obtain-latitude-longitude-from-place-name
    • Action group 2 – obtain-weather-information-with-coordinates
    • Action group 3 – get-current-date-time-from-timezone

After you deploy the CloudFormation template, copy the following from the Outputs tab on the CloudFormation console to be used during the configuration of your application after it’s deployed in AWS Amplify.

  • AWSRegion
  • BedrockAgentAliasId
  • BedrockAgentId
  • BedrockAgentName
  • IdentityPoolId
  • UserPoolClientId
  • UserPoolId

CloudFormation Output Tab

Deploy the AWS Amplify application

You need to manually deploy the Amplify application using the front-end code found on GitHub. Complete the following steps:

  1. Download the front-end code AWS-Amplify-Frontend.zip from GitHub.
  2. Use the .zip file to manually deploy the application in Amplify.
  3. Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication

Amazon Cognito is an identity service that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify users before they can access the application. We also use an identity pool to provide temporary AWS credentials for the user while they interact with the Amazon Bedrock API.

Use Amazon Bedrock Agents to automate application tasks

With Amazon Bedrock Agents, you can build and configure autonomous agents in your application. An agent helps your end users complete actions based on organization data and user input. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations.

Use action group to define actions that Amazon Bedrock agents perform

An action group defines a set of related actions that an Amazon Bedrock agent can perform to assist users. When configuring an action group, you have options for handling user-provided information, including adding user input to the agent’s action group, passing data to a Lambda function for custom business logic, or returning control directly through the InvokeAgent response. In our application, we created three action groups to give the Amazon Bedrock agent these essential functionalities: retrieving coordinates for specific locations, obtaining current date and time information, and fetching weather data for given locations. These action groups enable the agent to access and process crucial information, enhancing its ability to respond accurately and comprehensively to user queries related to location-based services and weather conditions.

Use Lambda for Amazon Bedrock action group

As part of this solution, three Lambda functions are deployed to support the action groups defined for our Amazon Bedrock agent:

  1. Location coordinates Lambda function – This function is triggered by the obtain-latitude-longitude-from-place-name action group. It takes a place name as input and returns the corresponding latitude and longitude coordinates. The function uses a geocoding service or database to perform this lookup.
  2. Date and time Lambda function – Invoked by the get-current-date-time-from-timezone action group, this function provides the current date and time information.
  3. Weather information Lambda function – This function is called by the obtain-weather-information-with-coordinates action group. It accepts geo-coordinates from the first Lambda function and returns current weather conditions and forecasts for the specified area. This Lambda function uses a weather API to fetch up-to-date meteorological data.

Each of these Lambda functions receives an input event containing relevant metadata and populated fields from the Amazon Bedrock agent’s API operation or function parameters. The functions process this input, perform their specific tasks, and return a response with the required information. This response is then used by the Amazon Bedrock agent to formulate its reply to the user’s query. By using these Lambda functions, our Amazon Bedrock agent gains the ability to access external data sources and perform complex computations, significantly enhancing its capabilities in handling user requests related to location, time, and weather information.
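
To make that contract concrete, the following is a minimal sketch of what one of these functions could look like, assuming the action group is defined with function details (rather than an OpenAPI schema); the get_weather helper and its parameters are hypothetical stand-ins for your own lookup logic:

import json

def get_weather(latitude, longitude):
    # Hypothetical helper: call your weather API of choice here
    return {"temperature_c": 24, "conditions": "clear"}

def lambda_handler(event, context):
    # Amazon Bedrock Agents passes the action group, function name, and parameters
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    weather = get_weather(params.get("latitude"), params.get("longitude"))

    # Return the result in the function-details response format expected by the agent
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {
                "responseBody": {"TEXT": {"body": json.dumps(weather)}}
            },
        },
    }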

Use AWS Amplify for front-end code

Amplify offers a development environment for building secure, scalable mobile and web applications. Developers can focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our front-end code using the method outlined earlier in this post.

Application walkthrough

Navigate to the URL provided after you created the application in Amplify. Upon accessing the application URL, you’ll be prompted to provide information related to Amazon Cognito and Amazon Bedrock Agents. This information is required to securely authenticate users and allow the front end to interact with the Amazon Bedrock agent. It enables the application to manage user sessions and make authorized API calls to AWS services on behalf of the user.

You can enter information with the values you collected from the CloudFormation stack outputs. You’ll be required to enter the following fields, as shown in the following screenshot:

  • User Pool ID
  • User Pool ClientID
  • Identity Pool ID
  • Region
  • Agent Name
  • Agent ID
  • Agent Alias ID

Front-end configuration Page

You need to sign in with your username and password. A temporary password was automatically generated during deployment and sent to the email address you provided when launching the CloudFormation template. On your first sign-in attempt, you’ll be asked to reset your password, as shown in the following video.

Password reset video

Now you can start asking questions in the application, for example, “Can we do barbecue today in Dallas, TX?” In a few seconds, the application will provide detailed results indicating whether you can barbecue in Dallas, TX. The following video shows this chat.

Chat example video

Example use cases

Here are a few sample queries to demonstrate the capabilities of your virtual meteorologist:

  1. “What’s the weather like in New York City today?”
  2. “Should I plan an outdoor birthday party in Miami next weekend?”
  3. “Will it snow in Denver on Christmas Day?”
  4. “Can I go swimming on a beach in Chicago today?”

These queries showcase the agent’s ability to provide current weather information, offer advice based on weather forecasts, and predict future weather conditions. You can even ask about an activity such as swimming, and the agent will use the weather conditions to tell you whether that activity is advisable.

Clean up

If you decide to discontinue using the virtual meteorologist, you can follow these steps to remove it, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

  1. Delete the CloudFormation stack:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

This solution demonstrates the power of combining Amazon Bedrock Agents with other AWS services to create an intelligent, conversational weather assistant. By using AI and cloud technologies, businesses can automate complex queries and provide valuable insights to their users.

Additional resources

To learn more about Amazon Bedrock, refer to the following resources:

To learn more about Anthropic’s Claude 3.5 Sonnet model, refer to the following resources:


About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.

Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Amazon Q Business simplifies integration of enterprise knowledge bases at scale

In this new era of emerging AI technologies, we have the opportunity to build AI-powered assistants tailored to specific business requirements. Amazon Q Business, a new generative AI-powered assistant, can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in an enterprise’s systems.

Large-scale data ingestion is crucial for applications such as document analysis, summarization, research, and knowledge management. These tasks often involve processing vast amounts of documents, which can be time-consuming and labor-intensive. However, ingesting large volumes of enterprise data poses significant challenges, particularly in orchestrating workflows to gather data from diverse sources.

In this post, we propose an end-to-end solution using Amazon Q Business to simplify integration of enterprise knowledge bases at scale.

Enhancing AWS Support Engineering efficiency

The AWS Support Engineering team faced the daunting task of manually sifting through numerous tools, internal sources, and AWS public documentation to find solutions for customer inquiries. For complex customer issues, the process was especially time-consuming, laborious, and at times extended the wait time for customers seeking resolutions. To address this, the team implemented a chat assistant using Amazon Q Business. This solution ingests and processes data from hundreds of thousands of support tickets, escalation notices, public AWS documentation, re:Post articles, and AWS blog posts.

By using Amazon Q Business, which simplifies the complexity of developing and managing ML infrastructure and models, the team rapidly deployed their chat solution. The Amazon Q Business pre-built connectors like Amazon Simple Storage Service (Amazon S3), document retrievers, and upload capabilities streamlined data ingestion and processing, enabling the team to provide swift, accurate responses to both basic and advanced customer queries.

In this post, we propose an end-to-end solution using Amazon Q Business to address similar enterprise data challenges, showcasing how it can streamline operations and enhance customer service across various industries. First we discuss end-to-end large-scale data integration with Amazon Q Business, covering data preprocessing, security guardrail implementation, and Amazon Q Business best practices. Then we introduce the solution deployment using three AWS CloudFormation templates.

Solution overview

The following architecture diagram represents the high-level design of a solution proven effective in production environments for AWS Support Engineering. This solution uses the powerful capabilities of Amazon Q Business. We will walk through the implementation of key components, including configuring enterprise data sources to build our knowledge base, document indexing and boosting, and implementing comprehensive security controls.

Solution Overview

Amazon Q Business supports three user types as part of identity and access management:

  • Service user – An end-user who accesses Amazon Q Business applications with permissions granted by their administrator to perform their job duties
  • Service administrator – A user who manages Amazon Q Business resources and determines feature access for service users within the organization
  • IAM administrator – A user responsible for creating and managing access policies for Amazon Q Business through AWS IAM Identity Center

The following workflow details how a service user accesses the application:

  1. The service user initiates an interaction with the Amazon Q Business application, accessible through the web experience, which is an endpoint URL.
  2. The service user’s permissions are authenticated using IAM Identity Center, an AWS solution that connects workforce users to AWS managed applications like Amazon Q Business. It enables end-user authentication and streamlines access management.
  3. The authenticated service user submits queries in natural language to the Amazon Q Business application.
  4. The Amazon Q Business application generates and returns answers drawing from the enterprise data uploaded to an S3 bucket, which is connected as a data source to Amazon Q Business. This S3 bucket data is continuously refreshed, making sure that Amazon Q Business accesses the most current information for query responses by using a retriever to pull data from the index.
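
Although this solution surfaces Amazon Q Business through the web experience, the same question-answering flow can also be exercised programmatically with the ChatSync API. The following is a minimal sketch with a placeholder application ID; the exact credential flow depends on how your application is secured, and applications using IAM Identity Center require identity-aware credentials:

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder application ID; find yours on the Amazon Q Business console
response = qbusiness.chat_sync(
    applicationId="your-application-id",
    userMessage="What are the supported data source connectors?",
)

print(response["systemMessage"])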

Large-scale data ingestion

Before ingesting the data to Amazon Q Business, the data might need transformation into formats supported by Amazon Q Business. Furthermore, it might contain sensitive data or personally identifiable information (PII) requiring redaction. These data ingestion challenges create a need to orchestrate tasks like transformation, redaction, and secure ingestion.

Data ingestion workflow

To facilitate orchestration, this solution incorporates AWS Step Functions. Step Functions provides a visual workflow service to orchestrate tasks and workloads resiliently and efficiently through built-in AWS integrations and error handling. The solution uses the Step Functions Map state, which allows for parallel processing of multiple items in a dataset, thereby efficiently orchestrating workflows and speeding up overall processing.
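
The following is a minimal sketch of how such a state machine could be defined and created with boto3; the Lambda ARNs, role ARN, state names, and items path are placeholders, and the actual workflow in this solution is deployed through CloudFormation:

import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder Lambda ARNs for the ingestion workflow
INGEST_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:IngestData"
PROCESS_LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ProcessData"

definition = {
    "StartAt": "IngestChunks",
    "States": {
        "IngestChunks": {
            "Type": "Map",                  # Fan out over the work items in parallel
            "ItemsPath": "$.work_items",    # Placeholder: list of chunks (for example, date periods)
            "MaxConcurrency": 10,
            "Iterator": {
                "StartAt": "IngestData",
                "States": {
                    "IngestData": {
                        "Type": "Task",
                        "Resource": INGEST_LAMBDA_ARN,
                        "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2}],
                        "Next": "ProcessData",
                    },
                    "ProcessData": {"Type": "Task", "Resource": PROCESS_LAMBDA_ARN, "End": True},
                },
            },
            "End": True,
        }
    },
}

sfn.create_state_machine(
    name="qbusiness-ingestion-demo",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder
)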

The following diagram illustrates an example architecture for ingesting data through an endpoint interfacing with a large corpus.

Data Ingestion Workflow

Step Functions orchestrates AWS services like AWS Lambda and organization APIs like DataStore to ingest, process, and store data securely. The workflow includes the following steps:

  1. The Prepare Map Input Lambda function prepares the required input for the Map state. For example, the Datastore API might require certain input like date periods to query data. This step can be used to define the date periods to be used by the Map state as an input.
  2. The Ingest Data Lambda function fetches data from the Datastore API—which can be in or outside of the virtual private cloud (VPC)—based on the inputs from the Map state. To handle large volumes, the data is split into smaller chunks to mitigate Lambda function overload. This enables Step Functions to manage the workload, retry failed chunks, and isolate failures to individual chunks instead of disrupting the entire ingestion process.
  3. The fetched data is put into an S3 data store bucket for processing.
  4. The Process Data Lambda function redacts sensitive data through Amazon Comprehend. Amazon Comprehend provides real-time APIs, such as DetectPiiEntities and DetectEntities, which use natural language processing (NLP) machine learning (ML) models to identify text portions for redaction. When Amazon Comprehend detects PII, the terms are redacted and replaced by a character of your choice (such as *). You can also use regular expressions to remove identifiers with predetermined formats (see the redaction sketch after this list).
  5. Finally, the Lambda function creates two separate files:
    1. A sanitized data document in an Amazon Q Business supported format that will be parsed to generate chat responses.
    2. A JSON metadata file for each document containing additional information to customize chat results for end-users and apply boosting techniques to enhance user experience (which we discuss more in the next section).
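
As referenced in step 4, the following is a minimal sketch of the kind of redaction logic the Process Data function could apply, assuming English text and the synchronous DetectPiiEntities API; note that Amazon Comprehend enforces a per-request text size limit, so large documents need to be chunked:

import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text, mask_char="*"):
    """Replace PII spans detected by Amazon Comprehend with a mask character."""
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    # Redact from the end of the string so earlier offsets stay valid
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        text = text[:start] + mask_char * (end - start) + text[end:]
    return text

print(redact_pii("Contact Jane Doe at jane.doe@example.com for access."))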

The following is the sample metadata file:

{
    "DocumentId": "qbusiness-ug.pdf.txt",
    "Attributes": {
        "_created_at": "2024-10-29T20:27:45+00:00",
        "_last_updated_at": "2024-10-29T20:27:45+00:00",
        "_source_uri": "https://docs.aws.amazon.com/pdfs/amazonq/latest/qbusiness-ug/qbusiness-ug.pdf",
        "author": "AWS",
        "services": ["Q Business"]
    },
    "Title": "Amazon Q Business - User Guide",
    "ContentType": "plain/text"
}

In the preceding JSON file, the DocumentId for each data document must be unique. All the other attributes are optional; however, the file has additional attributes like services, _created_at, and _last_updated_at with values defined.

The two files are placed in a new S3 folder for Amazon Q to index. Additionally, the raw unprocessed data is deleted from the S3 bucket. You can further restrict access to documents uploaded to an S3 bucket for specific users or groups using Amazon S3 access control lists (ACLs).

Using the Amazon Q Business data source connector feature, we integrated the S3 bucket with our application. This connector functionality enables the consolidation of data from multiple sources into a unified index for the Amazon Q Business application. The service offers various integration options, with Amazon S3 being one of the supported data sources.

Boosting performance

When working with your specific dataset in Amazon Q Business, you can use relevance tuning to enhance the performance and accuracy of search results. This feature allows you to customize how Amazon Q Business prioritizes information within your ingested documents. For example, if your dataset includes product descriptions, customer reviews, and technical specifications, you can use relevance tuning to boost the importance of certain fields. You might choose to prioritize product names in titles, give more weight to recent customer reviews, or emphasize specific technical attributes that are crucial for your business. By adjusting these parameters, you can influence the ranking of search results to better align with your dataset’s unique characteristics and your users’ information needs, ultimately providing more relevant answers to their queries.

For the metadata file used in this example, we focus on boosting two key metadata attributes: _document_title and services. By assigning higher weights to these attributes, we made sure documents with specific titles or services received greater prominence in the search results, improving their visibility and relevance for users.

The following code is the sample CloudFormation template snippet to enable higher weights to _document_title and services:

BoostOverrideConfiguration:
  Fn::Sub: |
    {
      "nativeIndexConfiguration": {
        "indexId": "${QBusinessIndex.IndexId}",
        "boostingOverride": {
          "_document_title": {
            "stringConfiguration": {
              "boostingLevel": "MEDIUM"
            }
          },
          "services": {
            "stringListConfiguration": {
              "boostingLevel": "HIGH"
            }
          }
        }
      }
    }

Amazon Q Business guardrails

Implementing robust security measures is crucial to protect sensitive information. In this regard, Amazon Q Business guardrails or chat controls proved invaluable, offering a powerful solution to maintain data privacy and security.

Amazon Q Business guardrails provide configurable rules designed to control the application’s behavior. These guardrails act as a safety net, minimizing access, processing, or revealing of sensitive or inappropriate information. By defining boundaries for the application’s operations, organizations can maintain compliance with internal policies and external regulations. You can enable global- or topic-level controls, which control how Amazon Q Business responds to specific topics in chat.

The following is the sample CloudFormation template snippet to enable topic-level controls:

TopicConfigurations:
  - name: topic
    rules:
      - ruleType: CONTENT_BLOCKER_RULE
        ruleConfiguration:
          contentBlockerRule:
            systemMessageOverride: This message is blocked as it contains secure content
    exampleChatMessages:
      - arn:*:ec2:us-east-1:123456789012:instance/i-abcdef123
      - arn:*:ec2:us-west-2:123456789012:vpc/vpc-abcdef123
      - arn:*:kms:eu-west-1:123456789012:key/12345678-1234-12345678-abc12345678
      - s3://bucket/prefix/file.csv
      - arn:*:s3::::bucket-name

This topic-level control blocks Amazon Q Business chat conversations that contain AWS service Amazon Resource Names (ARNs). When the application detects chat messages similar to the configured examples, it blocks the response and returns the message “This message is blocked as it contains secure content.”

For information about deploying the Amazon Q Business application with sample boosting and guardrails, refer to the GitHub repo.

The following screenshot shows an example of the Amazon Q Business assistant chat landing page.

Q Business landing Page

The following screenshot illustrates the assistant’s behavior if a user includes text that matches one of the similarity-based examples specified in the guardrail topic control.

Q Business Guardrail

Notification system

To enhance data security, you can deploy Amazon Macie classification jobs to scan for sensitive or PII data stored in S3 buckets. The following diagram illustrates a sample notification architecture to alert users on sensitive information that might be inadvertently stored. Macie uses machine learning to automatically discover, classify, and protect sensitive data stored in AWS. It focuses on identifying PII, intellectual property, and other sensitive data types to help organizations meet compliance requirements and protect their data from unauthorized access or breaches.

NotificationSystem

The workflow includes the following steps:

  1. Macie reviews the data store S3 bucket for sensitive information before the data is ingested.
  2. If Macie detects sensitive information, it publishes its findings to Amazon EventBridge.
  3. An EventBridge rule invokes the Rectify & Notify Lambda function.
  4. The Lambda function processes the alert, remediates it by removing the affected files from the S3 bucket, and sends a notification using Amazon Simple Notification Service (Amazon SNS) to the subscribed email addresses.

This system enables rapid response to potential security alerts, allowing for immediate action to protect sensitive data.
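
The following is a minimal sketch of what the Rectify & Notify function could look like, assuming the EventBridge event carries a standard Macie finding with the affected bucket and object under detail.resourcesAffected, and that the SNS topic ARN is supplied through an environment variable:

import json
import os
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

def lambda_handler(event, context):
    finding = event["detail"]
    bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
    key = finding["resourcesAffected"]["s3Object"]["key"]

    # Remediate: remove the object that triggered the finding
    s3.delete_object(Bucket=bucket, Key=key)

    # Notify subscribers with a summary and the full finding for context
    message = (
        f"Amazon Macie published a new finding: {finding['title']}\n"
        f"Deleted s3://{bucket}/{key}\n\n"
        f"Full finding:\n{json.dumps(finding, indent=2, default=str)}"
    )
    sns.publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],  # assumed environment variable
        Subject="Macie finding remediated",
        Message=message,
    )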

The Macie detection and subsequent notification system can be demonstrated by uploading a new file to the S3 bucket, such as sample-file-with-credentials.txt, containing the PII data types monitored by Macie, such as fake temporary AWS credentials. After the file is uploaded to Amazon S3 and the scheduled Macie detection job discovers it, the Lambda function immediately removes the file and sends the following notification email to the SNS topic subscribers:

Amazon Macie published a new Finding: "The S3 object contains credentials data"
Description: "The S3 object contains credentials data such as AWS secret access keys or private keys."
Severity: {'score': 3, 'description': 'High'}
Type: SensitiveData:S3Object/Credentials
Category: CLASSIFICATION
Origin Type: "SENSITIVE_DATA_DISCOVERY_JOB"
Sensitive Data Categories: "['CREDENTIALS']"
Resources affected:
Bucket="<BUCKET_NAME>",
Key="processed/sample-file-with-credentials.txt"
Trying to delete S3 Object:  s3://<BUCKET_NAME>/processed/sample-file-with-credentials.txt
File deletion succeeded.

-------------
Full Macie finding event:
{
   ...
}

The notification contains the full Macie finding event, which is omitted from the preceding excerpt. For more information on Macie finding events format, refer to Amazon EventBridge event schema for Macie findings.

Additionally, the findings are visible on the Macie console, as shown in the following screenshot.

Macie Job

Additional recommendations

To further enhance the security and reliability of the Amazon Q Business application, we recommend implementing the following measures. These additional security and logging controls help protect the data, send alerts for potential issues, and enable timely action on security incidents.

  • Amazon CloudWatch logging for Amazon Q Business – You can use Amazon CloudWatch logging for Amazon Q Business to save the logs for the data source connectors and document-level errors, focusing particularly on failed ingestion jobs. This practice is vital from a security perspective because it allows monitoring and quick identification of issues in the data ingestion process. By tracking failed jobs, potential data loss or corruption can be mitigated, maintaining the reliability and completeness of the knowledge base.
  • Unauthorized access monitoring on Amazon S3 – You can implement EventBridge rules to monitor mutating API actions on the S3 buckets. These rules are configured to invoke SNS notifications when such actions are performed by unauthorized users. Enable Amazon S3 server access logging to store detailed access records in a designated bucket, which can be analyzed using Amazon Athena for deeper insights. This approach provides real-time alerts for immediate response to potential security breaches, while also maintaining a detailed audit trail for thorough security analysis, making sure that only authorized entities can modify critical data.
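
To illustrate the second recommendation, the following is a minimal sketch that creates an EventBridge rule matching mutating Amazon S3 API calls recorded by CloudTrail and routes them to an SNS topic. The rule name, topic ARN, and event names are placeholders; object-level events require CloudTrail data event logging to be enabled, and the topic's resource policy must allow EventBridge to publish:

import json
import boto3

events = boto3.client("events")

# Placeholder SNS topic that receives the alerts
topic_arn = "arn:aws:sns:us-east-1:123456789012:s3-mutation-alerts"

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        # Example mutating actions to watch; extend as needed
        "eventName": ["PutObject", "DeleteObject", "PutBucketPolicy"],
    },
}

events.put_rule(Name="s3-mutating-api-calls", EventPattern=json.dumps(event_pattern))
events.put_targets(
    Rule="s3-mutating-api-calls",
    Targets=[{"Id": "sns-alert", "Arn": topic_arn}],
)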

Prerequisites

In the following sections, we walk through implementing the end-to-end solution. For this solution to work, the following prerequisites are needed:

  • A new or existing AWS account that will be the data collection account
  • Corresponding AWS Identity and Access Management (IAM) permissions to create S3 buckets and deploy CloudFormation stacks

Configure the data ingestion

In this post, we demonstrate the solution using publicly available documentation as our sample dataset. In your implementation, you can adapt this solution to work with your organization’s specific content sources, such as support tickets, JIRA issues, internal wikis, or other relevant documentation.

Deploy the following CloudFormation template to create the data ingestion resources:

  • S3 data bucket
  • Ingestion Lambda function
  • Processing Lambda function
  • Step Functions workflow

The data ingestion workflow in this example fetches and processes public data from the Amazon Q Business and Amazon SageMaker official documentation in PDF format. Specifically, the Ingest Data Lambda function downloads the raw PDF documents, temporarily stores them in Amazon S3, and passes their Amazon S3 URLs to the Process Data Lambda function, which performs the PII redaction (if enabled) and stores the processed documents and their metadata to the S3 path indexed by the Amazon Q Business application.

You can adapt the Step Functions Lambda code for ingestion and processing according to your own internal data, making sure that the documents and metadata are in a valid format for Amazon Q Business to index, and are properly redacted for PII data.

Configure IAM Identity Center

You can only have one IAM Identity Center instance per account. If your account already has an Identity Center instance, skip this step and proceed to configuring the Amazon Q Business application.

Deploy the following CloudFormation template to configure IAM Identity Center.

You will need to add details for a user such as user name, email, first name, and surname.

After deploying the CloudFormation template, you will receive an email where you will need to accept the invitation and change the password for the user.

Before logging in, you will need to deploy the Amazon Q Business application.

Configure the Amazon Q Business application

Deploy the following CloudFormation template to configure the Amazon Q Business application.

You will need to add details such as the IAM Identity Center stack name deployed previously and the S3 bucket name provisioned by the data ingestion stack.

After you deploy the CloudFormation template, complete the following steps to manage user access:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Choose the application you provisioned (workshop-app-01).
  3. Under User access, choose Manage user access.
  4. On the Users tab, choose the user you specified when deploying the CloudFormation stack.
  5. Choose Edit subscription.
  6. Under New subscription, choose Business Lite or Business Pro.
  7. Choose Confirm, and then choose Confirm again.

Now you can log in using the user you have specified. You can find the URL for the web experience under Web experience settings.

If you are unable to log in, make sure that the user has been verified.

Sync the data source

Before you can use the Amazon Q Business application, the data source needs to be synchronized. The application’s data source is configured to sync hourly. It might take some time to synchronize.
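
If you don’t want to wait for the hourly schedule, you can also start a sync on demand. The following is a minimal sketch using the StartDataSourceSyncJob API; the application, index, and data source IDs are placeholders that you can find on the Amazon Q Business console:

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder IDs from the Amazon Q Business console
qbusiness.start_data_source_sync_job(
    applicationId="your-application-id",
    indexId="your-index-id",
    dataSourceId="your-data-source-id",
)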

When the synchronization is complete, you should now be able to access the application and ask questions.

Clean up

After you’re done testing the solution, you can delete the resources to avoid incurring additional charges. See the Amazon Q Business pricing page for more information. Follow the instructions in the GitHub repository to delete the resources and corresponding CloudFormation templates. Make sure to delete the CloudFormation stacks provisioned as follows:

  1. Delete the Amazon Q Business application stack.
  2. Delete the IAM Identity Center stack.
  3. Delete the data ingestion stack.
  4. For each deleted stack, check for any resources that were skipped in the deletion process, such as S3 buckets.

Delete any skipped resources on the console.

Conclusion

In this post, we demonstrated how to build a knowledge base solution by integrating enterprise data with Amazon Q Business using Amazon S3. This approach helps organizations improve operational efficiency, reduce response times, and gain valuable insights from their historical data. The solution uses AWS security best practices to promote data protection while enabling teams to create a comprehensive knowledge base from various data sources.

Whether you’re managing support tickets, internal documentation, or other business content, this solution can handle multiple data sources and scale according to your needs, making it suitable for organizations of different sizes. By implementing this solution, you can enhance your operations with AI-powered assistance, automated responses, and intelligent routing of complex queries.

Try this solution with your own use case, and let us know about your experience in the comments section.


About the Authors

Omar Elkharbotly is a Senior Cloud Support Engineer at AWS, specializing in Data, Machine Learning, and Generative AI solutions. With extensive experience in helping customers architect and optimize their cloud-based AI/ML/GenAI workloads, Omar works closely with AWS customers to solve complex technical challenges and implement best practices across the AWS AI/ML/GenAI service portfolio. He is passionate about helping organizations leverage the full potential of cloud computing to drive innovation in generative AI and machine learning.

Vania Toma is a Principal Cloud Support Engineer at AWS, focused on Networking and Generative AI solutions. He has deep expertise in resolving complex, cross-domain technical challenges through systematic problem-solving methodologies. With a customer-obsessed mindset, he leverages emerging technologies to drive innovation and deliver exceptional customer experiences.

Bhavani Kanneganti is a Principal Cloud Support Engineer at AWS. She specializes in solving complex customer issues on the AWS Cloud, focusing on infrastructure-as-code, container orchestration, and generative AI technologies. She collaborates with teams across AWS to design solutions that enhance the customer experience. Outside of work, Bhavani enjoys cooking and traveling.

Mattia Sandrini is a Senior Cloud Support Engineer at AWS, specialized in Machine Learning technologies and Generative AI solutions, helping customers operate and optimize their ML workloads. With a deep passion for driving performance improvements, he dedicates himself to empowering both customers and teams through innovative ML-enabled solutions. Away from his technical pursuits, Mattia embraces his passion for travel and adventure.

Kevin Draai is a Senior Cloud Support Engineer at AWS who specializes in Serverless technologies and development within the AWS cloud. Kevin has a passion for creating solutions through code while ensuring it is built on solid infrastructure. Outside of work, Kevin enjoys art and sport.

Tipu Qureshi is a Senior Principal Engineer in AWS Support & Managed Services, supporting customers with designing and optimizing their cloud technology strategy. For over 15 years, he has designed, operated, and supported diverse distributed systems at scale with a passion for operational excellence. He currently works on generative AI and operational excellence.

Faster distributed graph neural network training with GraphStorm v0.4

GraphStorm is a low-code enterprise graph machine learning (ML) framework that provides ML practitioners a simple way of building, training, and deploying graph ML solutions on industry-scale graph data. Although GraphStorm can run efficiently on single instances for small graphs, it truly shines when scaling to enterprise-level graphs in distributed mode using a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon SageMaker.

Today, AWS AI released GraphStorm v0.4. This release introduces integration with DGL-GraphBolt, a new graph storage and sampling framework that uses a compact graph representation and pipelined sampling to reduce memory requirements and speed up Graph Neural Network (GNN) training and inference. For the large-scale dataset examined in this post, inference is 3.6 times faster and per-epoch training is 1.4 times faster, with even larger speedups possible.

To achieve this, GraphStorm v0.4 with DGL-GraphBolt addresses two crucial challenges of graph learning:

  • Memory constraints – GraphStorm v0.4 provides compact and distributed storage of graph structure and features, which may grow in the multi-TB range. For example, a graph with 1 billion nodes with 512 features per node and 10 billion edges will require more than 4 TB of memory to store, which necessitates distributed computation.
  • Graph sampling – In multi-layer GNNs, you need to sample neighbors of each node to propagate their representations. This can lead to exponential growth in the number of nodes sampled, potentially visiting the entire graph for a single node’s representation. GraphStorm v0.4 provides efficient, pipelined graph sampling.

In this post, we demonstrate how GraphBolt enhances GraphStorm’s performance in distributed settings. We provide a hands-on example of using GraphStorm with GraphBolt on SageMaker for distributed training. Lastly, we share how to use Amazon SageMaker Pipelines with GraphStorm.

GraphBolt: Pipeline-driven graph sampling

GraphBolt is a new data loading and graph sampling framework developed by the DGL team. It streamlines the operations needed to sample efficiently from a heterogeneous graph and fetch the corresponding features. GraphBolt introduces a new, more compact graph structure representation for heterogeneous graphs, called fused Compressed Sparse Column (fCSC). This can reduce the memory cost of storing a heterogeneous graph by up to 56%, allowing users to fit larger graphs in memory and potentially use smaller, more cost-efficient instances for GNN model training.

GraphStorm v0.4 seamlessly integrates with GraphBolt, allowing users to take advantage of its performance improvements in their GNN workflows. The user just needs to provide the additional argument --use-graphbolt true when launching graph construction and training jobs.

Solution overview

A common model development process is to perform model exploration locally on a subset of your full data, and when you’re satisfied with the results, train the full-scale model. This setup allows for cheaper exploration before training on the full dataset. GraphStorm and SageMaker Pipelines allow you to do that by creating a model pipeline you can run locally to retrieve model metrics, and, when you’re ready, running the same pipeline on your full data on SageMaker to produce models, predictions, and graph embeddings for use in downstream tasks. In the next section, we show how to set up such pipelines for GraphStorm.

We demonstrate such a setup in the following diagram, where a user can perform model development and initial training on a single EC2 instance, and when they’re ready to train on their full data, hand off the heavy lifting to SageMaker for distributed training. Using SageMaker Pipelines to train models provides several benefits, like reduced costs, auditability, and lineage tracking.

GraphStorm SageMaker Architecture Diagram

Prerequisites

To run this example, you will need an AWS account, an Amazon SageMaker Studio domain, and the necessary permissions to run BYOC SageMaker jobs.

Set up the environment for SageMaker distributed training

You will use the example code available in the GraphStorm repository to run through this example.

Setting up your environment should take around 10 minutes. First, set up your Python environment to run the examples:

conda init
eval $SHELL
# Create a new env for the post
conda create --name gsf python=3.10
conda activate gsf

# Install dependencies for local scripts
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cpu
pip install sagemaker boto3 ogb pyarrow
# Verify installation, might take a few minutes for first run
python -c "import sagemaker; import torch"

# Clone the GraphStorm repository to access the example code
git clone https://github.com/awslabs/graphstorm.git ~/graphstorm

Build a GraphStorm SageMaker CPU image

Next, build and push the GraphStorm PyTorch Docker image that you will use to run the graph construction, training, and inference jobs for smaller-scale data. Your role will need to be able to pull images from the Amazon ECR Public Gallery and create Amazon Elastic Container Registry (Amazon ECR) repositories and push images to your private ECR registry.

# Enter your account ID here
ACCOUNT_ID=<aws-account-id>
REGION=us-east-1

cd ~/graphstorm
bash docker/build_graphstorm_image.sh --environment sagemaker --device cpu
bash docker/push_graphstorm_image.sh -e sagemaker -r $REGION -a $ACCOUNT_ID -d cpu
# This will create an ECR repository and push an image to
# ${ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/graphstorm:sagemaker-cpu

Download and prepare datasets

In this post, we use two citation datasets to demonstrate the scalability of GraphStorm. The Open Graph Benchmark (OGB) project hosts a number of graph datasets that can be used to benchmark the performance of graph learning systems. For a small-scale demo, we use the ogbn-arxiv dataset, and for a demonstration of GraphStorm’s large-scale learning capabilities, we use the ogbn-papers100M dataset.

Prepare the ogbn-arxiv dataset

Download the smaller-scale ogbn-arxiv dataset to run a local test before launching larger-scale SageMaker jobs on AWS. This dataset has approximately 170,000 nodes and 1.2 million edges. Use the following code to download the data and prepare it for GraphStorm:

# Provide the S3 bucket to use for output
BUCKET_NAME=<your-s3-bucket>

You use the following script to directly download, transform and upload the data to Amazon Simple Storage Service (Amazon S3):

cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt
python convert_arxiv_to_gconstruct.py \
    --output-s3-prefix s3://$BUCKET_NAME/ogb-arxiv-input

This will create the tabular graph data in Amazon S3, which you can verify by running the following code:

aws s3 ls s3://$BUCKET_NAME/ogb-arxiv-input/ 
edges/
nodes/
splits/
gconstruct_config_arxiv.json

Finally, upload GraphStorm training configuration files for arxiv to use for training and inference:

# Upload the training configurations to S3
aws s3 cp ~/graphstorm/training_scripts/gsgnn_np/arxiv_nc.yaml \
    s3://$BUCKET_NAME/yaml/arxiv_nc_train.yaml
aws s3 cp ~/graphstorm/inference_scripts/np_infer/arxiv_nc.yaml \
    s3://$BUCKET_NAME/yaml/arxiv_nc_inference.yaml

Prepare the ogbn-papers100M dataset on SageMaker

The papers-100M dataset is a large-scale graph dataset, with 111 million nodes and 3.2 billion edges after adding reverse edges.

To download and preprocess the data as an Amazon SageMaker Processing step, use the following code. You can launch and let the job run in the background while proceeding through the rest of the post, and return to this dataset later. The job should take approximately 45 minutes to run.

# Navigate to the example code
cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt

# Build and push a Docker image to download and process the papers100M data
bash build_and_push_papers100M_image.sh -a $ACCOUNT_ID -r $REGION

# This creates an ECR repository and pushes an image to
# $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/papers100m-processor

# Run a SageMaker job to do the processing and upload the output to S3
SAGEMAKER_EXECUTION_ROLE_ARN=<your-sagemaker-execution-role-arn>
aws configure set region $REGION
python sagemaker_convert_papers100m.py \
    --output-bucket $BUCKET_NAME \
    --execution-role-arn $SAGEMAKER_EXECUTION_ROLE_ARN \
    --region $REGION \
    --instance-type ml.m5.4xlarge \
    --image-uri $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/papers100m-processor

This will produce the processed data in s3://$BUCKET_NAME/ogb-papers100M-input, which can then be used as input to GraphStorm. While this job is running, you can create the GraphStorm pipelines.

Create a SageMaker pipeline

Run the following command to create a SageMaker pipeline:

# Navigate to the example code
cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt

PIPELINE_NAME="ogbn-arxiv-gs-pipeline"

bash deploy_arxiv_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME \
    --use-graphbolt false

Inspect the pipeline

Running the preceding code will create a SageMaker pipeline configured to run three SageMaker jobs in sequence:

  • A GConstruct job that converts the tabular file input to a binary partitioned graph on Amazon S3
  • A GraphStorm training job that trains a node classification model and saves the model to Amazon S3
  • A GraphStorm inference job that produces predictions for all nodes in the test set, and creates embeddings for all nodes

To review the pipeline, navigate to SageMaker AI Studio, choose the domain and user profile you used to create the pipeline, then choose Open Studio.

In the navigation pane, choose Pipelines. There should be a pipeline named ogbn-arxiv-gs-pipeline. Choose the pipeline, which will take you to the Executions tab for the pipeline. Choose Graph to view the pipeline steps.

GraphStorm SageMaker Pipeline on SageMaker Studio

Run the SageMaker pipeline locally for ogbn-arxiv

The ogbn-arxiv dataset is small enough that you can run the pipeline locally. Run the following command to start a local execution of the pipeline:

# Allow the local containers to inherit AWS credentials
export USE_SHORT_LIVED_CREDENTIALS=1
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name ogbn-arxiv-gs-pipeline \
    --region us-east-1 \
    --local-execution | tee arxiv-local-logs.txt

We save the log output to arxiv-local-logs.txt. You will use that later to analyze the training speed.

Running the pipeline should take approximately 5 minutes. When the pipeline is complete, it will print a message like the following:

Pipeline execution 655b9357-xxx-xxx-xxx-4fc691fcce94 SUCCEEDED

You can inspect the mean epoch and evaluation time using the provided analyze_training_time.py script and the log file you created:

python analyze_training_time.py --log-file arxiv-local-logs.txt

Reading logs from file: arxiv-local-logs.txt

=== Training Epochs Summary ===
Total epochs completed: 10
Average epoch time: 4.70 seconds

=== Evaluation Summary ===
Total evaluations: 11
Average evaluation time: 1.90 seconds

These numbers will vary depending on your instance type; in this case, these are values reported on an m6in.4xlarge instance.

Create a GraphBolt pipeline

Now that you have established a performance baseline, you can create another pipeline that uses the GraphBolt graph representation to compare performance.

You can use the same pipeline creation script, but change two variables, providing a new pipeline name and setting --use-graphbolt to “true”:

# Deploy a GraphBolt-enabled pipeline
PIPELINE_NAME_GB="ogbn-arxiv-gs-graphbolt-pipeline"
bash deploy_arxiv_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME_GB \
    --use-graphbolt true

# Execute the pipeline locally
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --region us-east-1 \
    --local-execution | tee arxiv-local-gb-logs.txt

Analyzing the training logs, you can see the per-epoch time has dropped somewhat:

python analyze_training_time.py --log-file arxiv-local-gb-logs.txt

Reading logs from file: arxiv-local-gb-logs.txt

=== Training Epochs Summary ===
Total epochs completed: 10
Average epoch time: 4.21 seconds

=== Evaluation Summary ===
Total evaluations: 11
Average evaluation time: 1.63 seconds

For such a small graph, the performance gains are modest, around 13% in per-epoch time. With large data, the potential gains are much greater. In the next section, you will create a pipeline and train a model for papers-100M, a citation graph with 111 million nodes and 3.2 billion edges.

Create a SageMaker pipeline for distributed training

After the SageMaker processing job that prepares the papers-100M data has finished processing and the data is stored in Amazon S3, you can set up a pipeline to train a model on that dataset.

Build the GraphStorm GPU image

For this job, you will use large GPU instances, so you will build and push the GPU image this time:

cd ~/graphstorm

bash ./docker/build_graphstorm_image.sh --environment sagemaker --device gpu

bash docker/push_graphstorm_image.sh -e sagemaker -r $REGION -a $ACCOUNT_ID -d gpu

Deploy and run pipelines for papers-100M

Before you deploy your new pipeline, upload the training YAML configuration for papers-100M to Amazon S3:

aws s3 cp \
    ~/graphstorm/training_scripts/gsgnn_np/papers100M_nc.yaml \
    s3://$BUCKET_NAME/yaml/

Now you are ready to deploy your initial pipeline for papers-100M:

# Navigate to the example code
cd ~/graphstorm/examples/sagemaker-pipelines-graphbolt

PIPELINE_NAME="ogb-papers100M-pipeline"
bash deploy_papers100M_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME \
    --use-graphbolt false

Run the pipeline on SageMaker and let it run in the background:

# Execute the pipeline on SageMaker in the background
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME \
    --region us-east-1 \
    --async-execution

Your account needs to meet the required quotas for the requested instances. For this post, the defaults are four ml.g5.48xlarge instances for training jobs and one ml.r5.24xlarge instance for the processing job. To adjust your SageMaker service quotas, you can use the Service Quotas console. To run both pipelines in parallel, that is, with and without GraphBolt, you will need 8 x $TRAIN_GPU_INSTANCE and 2 x $GCONSTRUCT_INSTANCE.
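
If you want to check your current limits programmatically before launching the pipelines, the following is a minimal sketch using the Service Quotas API through boto3. The quota name filter is an assumption for illustration; confirm the exact quota names on the Service Quotas console.

import boto3

# List SageMaker quotas whose names mention the training instance type you need.
# The substring filter below is illustrative; verify quota names in the console.
client = boto3.client("service-quotas", region_name="us-east-1")

paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "g5.48xlarge" in quota["QuotaName"]:
            print(f'{quota["QuotaName"]}: {quota["Value"]}')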

Next, you can deploy and run another pipeline, with GraphBolt enabled:

# Deploy the GraphBolt-enabled pipeline
PIPELINE_NAME_GB="ogb-papers100M-graphbolt-pipeline"
bash deploy_papers100M_pipeline.sh \
    --account $ACCOUNT_ID \
    --bucket-name $BUCKET_NAME --execution-role $SAGEMAKER_EXECUTION_ROLE_ARN \
    --pipeline-name $PIPELINE_NAME_GB \
    --use-graphbolt true

# Execute the GraphBolt pipeline on SageMaker
python ~/graphstorm/sagemaker/pipeline/execute_sm_pipeline.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --region us-east-1 \
    --async-execution

Compare performance for GraphBolt-enabled training

After both pipelines are complete, which should take approximately 4 hours, you can compare the training times for both cases.

On the Pipelines page of the SageMaker console, there should be two new pipelines named ogb-papers100M-pipeline and ogb-papers100M-graphbolt-pipeline. Choose ogb-papers100M-pipeline, which will take you to the Executions tab for the pipeline. Copy the name of the latest successful execution and use that to run the training analysis script:

python analyze_training_time.py \
    --pipeline-name $PIPELINE_NAME \
    --execution-name execution-1734404366941

Your output will look like the following code:

=== Training Epochs Summary ===
Total epochs completed: 15
Average epoch time: 73.95 seconds

=== Evaluation Summary ===
Total evaluations: 15
Average evaluation time: 15.07 seconds

Now do the same for the GraphBolt-enabled pipeline:

python analyze_training_time.py \
    --pipeline-name $PIPELINE_NAME_GB \
    --execution-name execution-1734463209078

You will see the improved per-epoch and evaluation times:

=== Training Epochs Summary ===
Total epochs completed: 15
Average epoch time: 54.54 seconds

=== Evaluation Summary ===
Total evaluations: 15
Average evaluation time: 4.13 seconds

Without loss in accuracy, the latest version of GraphStorm achieved a speedup of approximately 1.4 times per training epoch and 3.6 times in evaluation time! Depending on the dataset, the speedups can be even greater, as shown by the DGL team’s benchmarking.

Conclusion

This post showcased how GraphStorm 0.4, integrated with DGL-GraphBolt, significantly speeds up large-scale GNN training and inference, with speedups of 1.4 times and 3.6 times, respectively, as measured on the papers-100M dataset. As shown in the DGL benchmarks, even larger speedups are possible depending on the dataset.

We encourage ML practitioners working with large graph data to try GraphStorm. Its low-code interface simplifies building, training, and deploying graph ML solutions on AWS, allowing you to focus on modeling rather than infrastructure.

To get started, visit the GraphStorm documentation and GraphStorm GitHub repository.


About the author

Theodore Vasiloudis is a Senior Applied Scientist at Amazon Web Services, where he works on distributed machine learning systems and algorithms. He led the development of GraphStorm Processing, the distributed graph processing library for GraphStorm and is a core developer for GraphStorm. He received his PhD in Computer Science from the KTH Royal Institute of Technology, Stockholm, in 2019.

Xiang Song is a Senior Applied Scientist at Amazon Web Services, where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in a Neptune graph database. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture at Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting science teams like the graph machine learning group, and ML Systems teams working on large-scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist, a field in which he holds a PhD.

Read More

Transforming credit decisions using generative AI with Rich Data Co and AWS

Transforming credit decisions using generative AI with Rich Data Co and AWS

This post is co-written with Gordon Campbell, Charles Guan, and Hendra Suryanto from RDC. 

The mission of Rich Data Co (RDC) is to broaden access to sustainable credit globally. Its software-as-a-service (SaaS) solution empowers leading banks and lenders with deep customer insights and AI-driven decision-making capabilities.

Making credit decisions using AI can be challenging, requiring data science and portfolio teams to synthesize complex subject matter information and collaborate productively. To solve this challenge, RDC used generative AI, enabling teams to use its solution more effectively:

  • Data science assistant – Designed for data science teams, this agent assists teams in developing, building, and deploying AI models within a regulated environment. It aims to boost team efficiency by answering complex technical queries across the machine learning operations (MLOps) lifecycle, drawing from a comprehensive knowledge base that includes environment documentation, AI and data science expertise, and Python code generation.
  • Portfolio assistant – Designed for portfolio managers and analysts, this agent facilitates natural language inquiries about loan portfolios. It provides critical insights on performance, risk exposures, and credit policy alignment, enabling informed commercial decisions without requiring in-depth analysis skills. The assistant is adept at high-level questions (such as identifying high-risk segments or potential growth opportunities) and one-time queries, allowing the portfolio to be diversified.

In this post, we discuss how RDC uses generative AI on Amazon Bedrock to build these assistants and accelerate its overall mission of democratizing access to sustainable credit.

Solution overview: Building a multi-agent generative AI solution

We began with a carefully crafted evaluation set of over 200 prompts, anticipating common user questions. Our initial approach combined prompt engineering and traditional Retrieval Augmented Generation (RAG). However, we encountered a challenge: accuracy fell below 90%, especially for more complex questions.

To overcome the challenge, we adopted an agentic approach, breaking down the problem into specialized use cases. This strategy equipped us to align each task with the most suitable foundation model (FM) and tools. Our multi-agent framework is orchestrated using LangGraph and consists of the following components:

  1. Orchestrator – The orchestrator is responsible for routing user questions to the appropriate agent. In this example, we start with the data science or portfolio agent. However, we envision many more agents in the future. The orchestrator can also use user context, such as the user’s role, to determine routing to the appropriate agent.
  2. Agent – The agent is designed for a specialized task. It’s equipped with the appropriate FM for the task and the necessary tools to perform actions and access knowledge. It can also handle multiturn conversations and orchestrate multiple calls to the FM to reach a solution.
  3. Tools – Tools extend agent capabilities beyond the FM. They provide access to external data and APIs or enable specific actions and computation. To efficiently use the model’s context window, we construct a tool selector that retrieves only the relevant tools based on the information in the agent state. This helps simplify debugging in the case of errors, ultimately making the agent more effective and cost-efficient.
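
As an illustration, the tool selector described above can be a simple filter over the agent state. The following is a minimal sketch; the state fields, tool tags, and scoring are assumptions for illustration, not RDC's actual implementation.

from typing import List

class Tool:
    """Minimal stand-in for a tool with descriptive tags."""
    def __init__(self, name: str, tags: List[str]):
        self.name = name
        self.tags = tags

def select_tools(agent_state: dict, available_tools: List[Tool], limit: int = 3) -> List[Tool]:
    """Keep only the tools whose tags overlap with the current task keywords,
    so the FM's context window is used efficiently."""
    keywords = set(agent_state.get("task_keywords", []))
    scored = [(len(keywords & set(tool.tags)), tool) for tool in available_tools]
    relevant = [tool for score, tool in sorted(scored, key=lambda pair: -pair[0]) if score > 0]
    return relevant[:limit]

# Hypothetical usage
tools = [Tool("sql_checker", ["sql", "validation"]), Tool("code_generator", ["python", "code"])]
state = {"task_keywords": ["sql", "portfolio"]}
print([tool.name for tool in select_tools(state, tools)])  # ['sql_checker']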

This approach gives us the right tool for the right job. It enhances our ability to handle complex queries efficiently and accurately while providing flexibility for future improvements and agents.

The following image is a high-level architecture diagram of the solution.

High-level architecture diagram

Data science agent: RAG and code generation

To boost productivity of data science teams, we focused on rapid comprehension of advanced knowledge, including industry-specific models from a curated knowledge base. Here, RDC provides an integrated development environment (IDE) for Python coding, catering to various team roles. One role is model validator, who rigorously assesses whether a model aligns with bank or lender policies. To support the assessment process, we designed an agent with two tools:

  1. Content retriever tool – Amazon Bedrock Knowledge Bases powers our intelligent content retrieval through a streamlined RAG implementation (see the retrieval sketch after this list). The service automatically converts text documents to their vector representation using Amazon Titan Text Embeddings and stores them in Amazon OpenSearch Serverless. Because the knowledge is vast, it performs semantic chunking, making sure that the knowledge is organized by topic and can fit within the FM’s context window. When users interact with the agent, Amazon Bedrock Knowledge Bases using OpenSearch Serverless provides fast, in-memory semantic search, enabling the agent to retrieve the most relevant chunks of knowledge for relevant and contextual responses to users.
  2. Code generator tool – With code generation, we selected Anthropic’s Claude model on Amazon Bedrock due to its inherent ability to understand and generate code. This tool is grounded to answer queries related to data science and can generate Python code for quick implementation. It’s also adept at troubleshooting coding errors.
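
To make the content retriever concrete, the following is a minimal sketch of querying an Amazon Bedrock knowledge base with boto3. The knowledge base ID, Region, query, and result count are placeholders, not the values used in RDC's deployment.

import boto3

# Placeholder knowledge base ID and Region
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId="XXXXXXXXXX",
    retrievalQuery={"text": "What does the model validation policy require for credit models?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

for result in response["retrievalResults"]:
    print(result["content"]["text"][:200])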

Portfolio agent: Text-to-SQL and self-correction

To boost the productivity of credit portfolio teams, we focused on two key areas. For portfolio managers, we prioritized high-level commercial insights. For analysts, we enabled deep-dive data exploration. This approach empowered both roles with rapid understanding and actionable insights, streamlining decision-making processes across teams.

Our solution required natural language understanding of structured portfolio data stored in Amazon Aurora. This led us to base our solution on a text-to-SQL model to efficiently bridge the gap between natural language and SQL.

To reduce errors and tackle complex queries beyond the model’s capabilities, we developed three tools using Anthropic’s Claude model on Amazon Bedrock for self-correction:

  1. Check query tool – Verifies and corrects SQL queries, addressing common issues such as data type mismatches or incorrect function usage
  2. Check result tool – Validates query results, providing relevance and prompting retries or user clarification when needed
  3. Retry from user tool – Engages users for additional information when queries are too broad or lack detail, guiding the interaction based on database information and user input

These tools operate in an agentic system, enabling accurate database interactions and improved query results through iterative refinement and user engagement.
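
The following is a simplified sketch of this self-correction loop. The helper functions are stand-ins for the three tools described above and the text-to-SQL call; they are not RDC's actual code.

# Stand-in implementations; in the real system these invoke the Claude-based tools
def generate_sql(question: str) -> str:
    return f"SELECT segment, SUM(exposure) FROM loans GROUP BY segment -- for: {question}"

def check_query(sql: str) -> str:
    return sql  # would fix data type mismatches, incorrect function usage, etc.

def execute_sql(sql: str) -> list:
    return [("segment_a", 0.12)]

def check_result(question: str, result: list) -> str:
    return "ok" if result else "needs_clarification"

def ask_user(question: str) -> str:
    return question + " (please add a date range)"

def run_text_to_sql(question: str, max_retries: int = 3) -> dict:
    """Generate SQL, validate it, run it, and iterate until the result looks relevant."""
    sql = ""
    for _ in range(max_retries):
        sql = check_query(generate_sql(question))
        result = execute_sql(sql)
        if check_result(question, result) == "ok":
            return {"sql": sql, "result": result}
        question = ask_user(question)  # retry-from-user tool
    return {"error": "Could not produce a reliable answer", "last_sql": sql}

print(run_text_to_sql("Which segments have the highest risk exposure?"))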

To improve accuracy, we tested model fine-tuning, training the model on common queries and context (such as database schemas and their definitions). This approach reduces inference costs and improves response times compared to prompting at each call. Using Amazon SageMaker JumpStart, we fine-tuned Meta’s Llama model by providing a set of anticipated prompts, intended answers, and associated context. Amazon SageMaker JumpStart offers a cost-effective alternative to third-party models, providing a viable pathway for future applications. However, we didn’t end up deploying the fine-tuned model because we experimentally observed that prompting with Anthropic’s Claude model provided better generalization, especially for complex questions. To reduce operational overhead, we will also evaluate structured data retrieval on Amazon Bedrock Knowledge Bases.
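
For reference, a fine-tuning job like the one described above can be started with a few lines of the SageMaker Python SDK. This is a sketch under assumptions: the model ID, hyperparameter names, and S3 path are illustrative, not the exact configuration RDC used.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Illustrative model ID and settings; check SageMaker JumpStart for the values you need
estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-8b",
    instance_type="ml.g5.12xlarge",
    environment={"accept_eula": "true"},  # Llama models require accepting the EULA
)

# Hyperparameter names vary by model; these are examples
estimator.set_hyperparameters(instruction_tuned="True", epoch="3")

# The training channel points to anticipated prompts, intended answers, and schema context
estimator.fit({"training": "s3://your-bucket/text-to-sql-finetuning-data/"})

predictor = estimator.deploy()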

Conclusion and next steps with RDC

To expedite development, RDC collaborated with AWS Startups and the AWS Generative AI Innovation Center. Through an iterative approach, RDC rapidly enhanced its generative AI capabilities, deploying the initial version to production in just 3 months. The solution successfully met the stringent security standards required in regulated banking environments, providing both innovation and compliance.

“The integration of generative AI into our solution marks a pivotal moment in our mission to revolutionize credit decision-making. By empowering both data scientists and portfolio managers with AI assistants, we’re not just improving efficiency—we’re transforming how financial institutions approach lending.”

–Gordon Campbell, Co-Founder & Chief Customer Officer at RDC

RDC envisions generative AI playing a significant role in boosting the productivity of the banking and credit industry. By using this technology, RDC can provide key insights to customers, improve solution adoption, accelerate the model lifecycle, and reduce the customer support burden. Looking ahead, RDC plans to further refine and expand its AI capabilities, exploring new use cases and integrations as the industry evolves.

For more information about how to work with RDC and AWS and to understand how we’re supporting banking customers around the world to use AI in credit decisions, contact your AWS Account Manager or visit Rich Data Co.

For more information about generative AI on AWS, refer to the following resources:


About the Authors

Daniel Wirjo is a Solutions Architect at AWS, focused on FinTech and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Iman Abbasnejad is a computer scientist at the Generative AI Innovation Center at Amazon Web Services (AWS) working on generative AI and complex multi-agent systems.

Gordon Campbell is the Chief Customer Officer and Co-Founder of RDC, where he leverages over 30 years in enterprise software to drive RDC’s leading AI Decisioning platform for business and commercial lenders. With a proven track record in product strategy and development across three global software firms, Gordon is committed to customer success, advocacy, and advancing financial inclusion through data and AI.

Charles Guan is the Chief Technology Officer and Co-founder of RDC. With more than 20 years of experience in data analytics and enterprise applications, he has driven technological innovation across both the public and private sectors. At RDC, Charles leads research, development, and product advancement—collaborating with universities to leverage advanced analytics and AI. He is dedicated to promoting financial inclusion and delivering positive community impact worldwide.

Hendra Suryanto is the Chief Data Scientist at RDC with more than 20 years of experience in data science, big data, and business intelligence. Before joining RDC, he served as a Lead Data Scientist at KPMG, advising clients globally. At RDC, Hendra designs end-to-end analytics solutions within an Agile DevOps framework. He holds a PhD in Artificial Intelligence and has completed postdoctoral research in machine learning.

Read More

Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI

Build agentic AI solutions with DeepSeek-R1, CrewAI, and Amazon SageMaker AI

AI agents are rapidly becoming the next frontier in enterprise transformation, with 82% of organizations planning adoption within the next 3 years. According to a Capgemini survey of 1,100 executives at large enterprises, 10% of organizations already use AI agents, and more than half plan to use them in the next year. The recent release of the DeepSeek-R1 models brings state-of-the-art reasoning capabilities to the open source community. Organizations can build agentic applications using these reasoning models to execute complex tasks with advanced decision-making capabilities, enhancing efficiency and adaptability.

In this post, we dive into how organizations can use Amazon SageMaker AI, a fully managed service that allows you to build, train, and deploy ML models at scale, to build AI agents using CrewAI, a popular agentic framework, and open source models like DeepSeek-R1.

Agentic design vs. traditional software design

Agentic systems offer a fundamentally different approach compared to traditional software, particularly in their ability to handle complex, dynamic, and domain-specific challenges. Unlike traditional systems, which rely on rule-based automation and structured data, agentic systems, powered by large language models (LLMs), can operate autonomously, learn from their environment, and make nuanced, context-aware decisions. This is achieved through modular components including reasoning, memory, cognitive skills, and tools, which enable them to perform intricate tasks and adapt to changing scenarios.

Traditional software platforms, though effective for routine tasks and horizontal scaling, often lack the domain-specific intelligence and flexibility that agentic systems provide. For example, in a manufacturing setting, traditional systems might track inventory but lack the ability to anticipate supply chain disruptions or optimize procurement using real-time market insights. In contrast, an agentic system can process live data such as inventory fluctuations, customer preferences, and environmental factors to proactively adjust strategies and reroute supply chains during disruptions.

Enterprises should strategically consider deploying agentic systems in scenarios where adaptability and domain-specific expertise are critical. For instance, consider customer service. Traditional chatbots are limited to preprogrammed responses to expected customer queries, but AI agents can engage with customers using natural language, offer personalized assistance, and resolve queries more efficiently. AI agents can significantly improve productivity by automating repetitive tasks, such as generating reports, emails, and software code. The deployment of agentic systems should focus on well-defined processes with clear success metrics and where there is potential for greater flexibility and less brittleness in process management.

DeepSeek-R1

In this post, we show you how to deploy DeepSeek-R1 on SageMaker, particularly the Llama-70b distilled variant DeepSeek-R1-Distill-Llama-70B to a SageMaker real-time endpoint. DeepSeek-R1 is an advanced LLM developed by the AI startup DeepSeek. It employs reinforcement learning techniques to enhance its reasoning capabilities, enabling it to perform complex tasks such as mathematical problem-solving and coding. To learn more about DeepSeek-R1, refer to DeepSeek-R1 model now available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart and deep dive into the thesis behind building DeepSeek-R1.

Generative AI on SageMaker AI

SageMaker AI, a fully managed service, provides a comprehensive suite of tools designed to deliver high-performance, cost-efficient machine learning (ML) and generative AI solutions for diverse use cases. SageMaker AI empowers you to build, train, deploy, monitor, and govern ML and generative AI models through an extensive range of services, including notebooks, jobs, hosting, experiment tracking, a curated model hub, and MLOps features, all within a unified integrated development environment (IDE).

SageMaker AI simplifies the process for generative AI model builders of all skill levels to work with foundation models (FMs):

  • Amazon SageMaker Canvas enables data scientists to seamlessly use their own datasets alongside FMs to create applications and architectural patterns, such as chatbots and Retrieval Augmented Generation (RAG), in a low-code or no-code environment.
  • Amazon SageMaker JumpStart offers a diverse selection of open and proprietary FMs from providers like Hugging Face, Meta, and Stability AI. You can deploy or fine-tune models through an intuitive UI or APIs, providing flexibility for all skill levels.
  • SageMaker AI features like notebooks, Amazon SageMaker Training, inference, Amazon SageMaker for MLOps, and Partner AI Apps enable advanced model builders to adapt FMs using LoRA, full fine-tuning, or training from scratch. These services support single GPU to HyperPods (cluster of GPUs) for training and include built-in FMOps tools for tracking, debugging, and deployment.

With SageMaker AI, you can build generative AI-powered agentic workflows using a framework of your choice. Some of the key benefits of using SageMaker AI for fine-tuning and hosting LLMs or FMs include:

  • Ease of deployment – SageMaker AI offers access to SageMaker JumpStart, a curated model hub where models with open weights are made available for seamless deployment through a few clicks or API calls. Additionally, for Hugging Face Hub models, SageMaker AI provides pre-optimized containers built on popular open source hosting frameworks such as vLLM, NVIDIA Triton, and Hugging Face Text Generation Inference (TGI). You simply need to specify the model ID, and the model can be deployed quickly.
  • Instance-based deterministic pricing – SageMaker AI hosted models are billed based on instance-hours rather than token usage. This pricing model enables you to more accurately predict and manage generative AI inference costs while scaling resources to accommodate incoming request loads.
  • Deployments with quantization – SageMaker AI enables you to optimize models prior to deployment using advanced strategies such as quantized deployments (such as AWQ, GPTQ, float16, int8, or int4). This flexibility allows you to efficiently deploy large models, such as a 32-billion parameter model, onto smaller instance types like ml.g5.2xlarge with 24 GB of GPU memory, significantly reducing resource requirements while maintaining performance.
  • Inference load balancing and optimized routing – SageMaker endpoints support load balancing and optimized routing with various strategies, providing users with enhanced flexibility and adaptability to accommodate diverse use cases effectively.
  • SageMaker fine-tuning recipes – SageMaker offers ready-to-use recipes for quickly training and fine-tuning publicly available FMs such as Meta’s Llama 3, Mistral, and Mixtral. These recipes use Amazon SageMaker HyperPod (a SageMaker AI service that provides resilient, self-healing clusters optimized for large-scale ML workloads), enabling efficient and resilient training on a GPU cluster for scalable and robust performance.

Solution overview

CrewAI provides a robust framework for developing multi-agent systems that integrate with AWS services, particularly SageMaker AI. CrewAI’s role-based agent architecture and comprehensive performance monitoring capabilities work in tandem with Amazon CloudWatch.

The framework excels in workflow orchestration and maintains enterprise-grade security standards aligned with AWS best practices, making it an effective solution for organizations implementing sophisticated agent-based systems within their AWS infrastructure.

In this post, we demonstrate how to use CrewAI to create a multi-agent research workflow. This workflow creates two agents: one that researches on a topic on the internet, and a writer agent takes this research and acts like an editor by formatting it in a readable format. Additionally, we guide you through deploying and integrating one or multiple LLMs into structured workflows, using tools for automated actions, and deploying these workflows on SageMaker AI for a production-ready deployment.

The following diagram illustrates the solution architecture.

Prerequisites

To follow along with the code examples in the rest of this post, make sure the following prerequisites are met:

  • Integrated development environment – This includes the following:
    • (Optional) Access to Amazon SageMaker Studio and the JupyterLab IDE – We will use a Python runtime environment to build agentic workflows and deploy LLMs. Having access to a JupyterLab IDE with Python 3.9, 3.10, or 3.11 runtimes is recommended. You can also set up Amazon SageMaker Studio for single users. For more details, see Use quick setup for Amazon SageMaker AI. Create a new SageMaker JupyterLab Space for a quick JupyterLab notebook for experimentation. To learn more, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools.
    • Local IDE – You can also follow along in your local IDE (such as PyCharm or VSCode), provided that your Python runtime has been configured with site-to-AWS VPC connectivity (to deploy models on SageMaker AI).
  • Permission to deploy models – Make sure that your user execution role has the necessary permissions to deploy models to a SageMaker real-time endpoint for inference. For more information, refer to Deploy models for inference.
  • Access to Hugging Face Hub – You must have access to Hugging Face Hub’s deepseek-ai/DeepSeek-R1-Distill-Llama-8B model weights from your environment.
  • Access to code – The code used in this post is available in the following GitHub repo.

Simplified LLM hosting on SageMaker AI

Before orchestrating agentic workflows with CrewAI powered by an LLM, the first step is to host and query an LLM using SageMaker real-time inference endpoints. There are two primary methods to host LLMs on SageMaker AI:

  • Deploy from SageMaker JumpStart
  • Deploy from Hugging Face Hub

Deploy DeepSeek from SageMaker JumpStart

SageMaker JumpStart offers access to a diverse array of state-of-the-art FMs for a wide range of tasks, including content writing, code generation, question answering, copywriting, summarization, classification, information retrieval, and more. It simplifies the onboarding and maintenance of publicly available FMs, allowing you to access, customize, and seamlessly integrate them into your ML workflows. Additionally, SageMaker JumpStart provides solution templates that configure infrastructure for common use cases, along with executable example notebooks to streamline ML development with SageMaker AI.

The following screenshot shows an example of available models on SageMaker JumpStart.

To get started, complete the following steps:

  1. Install the latest version of the sagemaker-python-sdk using pip.
  2. Run the following command in a Jupyter cell or the SageMaker Studio terminal:
pip install -U sagemaker
  3. List all available LLMs under the Hugging Face or Meta JumpStart hub. The following code is an example of how to do this programmatically using the SageMaker Python SDK:
from sagemaker.jumpstart.filters import (And, Or)
from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# generate a conditional filter to only select LLMs from HF or Meta
filter_value = Or(
    And("task == llm", "framework == huggingface"),
    "framework == meta", "framework == deepseek"
)

# Retrieve all available JumpStart models
all_models = list_jumpstart_models(filter=filter_value)

For example, deploying the deepseek-llm-r1 model directly from SageMaker JumpStart requires only a few lines of code:

from sagemaker.jumpstart.model import JumpStartModel

model_id = " deepseek-llm-r1" 
model_version = "*"

# instantiate a new JumpStart model object
model = JumpStartModel(
    model_id=model_id, 
    model_version=model_version
)

# deploy model on a 1 x p5e instance 
predictor = model.deploy(
    accept_eula=True, 
    initial_instance_count=1, 
    # endpoint_name="deepseek-r1-endpoint" # optional endpoint name
)

We recommend deploying your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for enhanced security.

We also recommend you integrate with Amazon Bedrock Guardrails for increased safeguards against harmful content. For more details on how to implement Amazon Bedrock Guardrails on a self-hosted LLM, see Implement model-independent safety measures with Amazon Bedrock Guardrails.

Deploy DeepSeek from Hugging Face Hub

Alternatively, you can deploy your preferred model directly from the Hugging Face Hub or the Hugging Face Open LLM Leaderboard to a SageMaker endpoint. Hugging Face LLMs can be hosted on SageMaker using a variety of supported frameworks, such as NVIDIA Triton, vLLM, and Hugging Face TGI. For a comprehensive list of supported deep learning container images, refer to the available Amazon SageMaker Deep Learning Containers. In this post, we use a DeepSeek-R1-Distill-Llama-70B SageMaker endpoint using the TGI container for agentic AI inference. We deploy the model from Hugging Face Hub using Amazon’s optimized TGI container, which provides enhanced performance for LLMs. This container is specifically optimized for text generation tasks and automatically selects the most performant parameters for the given hardware configuration. To deploy from Hugging Face Hub, refer to the GitHub repo or the following code snippet:

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
import os
from datetime import datetime

# Model configuration
hub = {
    'HF_MODEL_ID': 'deepseek-ai/DeepSeek-R1-Distill-Llama-70B',  # or another hub model such as Llama-3.3-70B-Instruct
    'SM_NUM_GPUS': json.dumps(number_of_gpu),
    'HF_TOKEN': HUGGING_FACE_HUB_TOKEN,
    'SAGEMAKER_CONTAINER_LOG_LEVEL': '20',  # Set to INFO level
    'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True'  # Use expandable CUDA memory segments
}

# Create and deploy model
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface", version="2.3.1"),
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    endpoint_name=custom_endpoint_name,
    container_startup_health_check_timeout=900
)

A new DeepSeek-R1-Distill-Llama-70B endpoint should be InService in under 10 minutes. If you want to change the model from DeepSeek to another model from the hub, simply replace the following parameter or refer to the DeepSeek deploy example in the following GitHub repo. To learn more about deployment parameters that can be reconfigured inside TGI containers at runtime, refer to the following GitHub repo on TGI arguments.

...
"HF_MODEL_ID": "deepseek-ai/...", # replace with any HF hub models
# "HF_TOKEN": "hf_..." # add your token id for gated models
...

For open-weight models deployed directly from hubs, we strongly recommend placing your SageMaker endpoints within a VPC and a private subnet with no egress, making sure that the models remain accessible only within your VPC for a secure deployment.

Build a simple agent with CrewAI

CrewAI offers the ability to create multi-agent and very complex agentic orchestrations using LLMs from several LLM providers, including SageMaker AI and Amazon Bedrock. In the following steps, we create a simple blocks counting agent to serve as an example.

Create a blocks counting agent

The following code sets up a simple blocks counter workflow using CrewAI with two main components:

  • Agent creation (blocks_counter_agent) – The agent is configured with a specific role, goal, and capabilities. This agent is equipped with a tool called BlocksCounterTool.
  • Task definition (count_task) – This is a task that we want this agent to execute. The task includes a template for counting how many of each color of blocks are present, where {color} will be replaced with the actual color of the block. The task is assigned to blocks_counter_agent.
from crewai import Agent, Task
from pydantic import BaseModel, Field

# 1. Configure agent
blocks_counter_agent = Agent(
    role="Blocks Inventory Manager",
    goal="Maintain accurate block counts",
    tools=[BlocksCounterTool],
    verbose=True
)

# 2. Create counting task
count_task = Task(
    description="Count {color} play blocks in storage",
    expected_output="Exact inventory count for specified color",
    agent=blocks_counter_agent
)

As you can see in the preceding code, each agent begins with two essential components: an agent definition that establishes the agent’s core characteristics (including its role, goal, backstory, available tools, LLM model endpoint, and so on), and a task definition that specifies what the agent needs to accomplish, including the detailed description of work, expected outputs, and the tools it can use during execution.

This structured approach makes sure that agents have both a clear identity and purpose (through the agent definition) and a well-defined scope of work (through the task definition), enabling them to operate effectively within their designated responsibilities.

Tools for agentic AI

Tools are special functions that give AI agents the ability to perform specific actions, like searching the internet or analyzing data. Think of them as apps on a smartphone—each tool serves a specific purpose and extends what the agent can do. In our example, BlocksCounterTool helps the agent count the number of blocks organized by color.

Tools are essential because they let agents do real-world tasks instead of just thinking about them. Without tools, agents would be like smart speakers that can only talk—they could process information but couldn’t take actual actions. By adding tools, we transform agents from simple chat programs into practical assistants that can accomplish real tasks.

Out-of-the-box tools with CrewAI
CrewAI offers a range of tools out of the box for you to use along with your agents and tasks. The following list shows some of the available tools, grouped by category.

  • FileReadTool (Data Processing Tools) – For reading various file formats
  • WebsiteSearchTool (Web Interaction Tools) – For web content extraction
  • YoutubeChannelSearchTool (Media Tools) – For searching YouTube channels
  • PDFSearchTool (Document Processing) – For searching PDF documents
  • CodeInterpreterTool (Development Tools) – For Python code interpretation
  • DALL-E Tool (AI Services) – For image generation

Build custom tools with CrewAI
You can build custom tools in CrewAI in two ways: by subclassing BaseTool or using the @tool decorator. Let’s look at the following BaseTool subclassing option to create the BlocksCounterTool we used earlier:

from crewai.tools import BaseTool

class BlocksCounterTool(BaseTool):
    name = "blocks_counter" 
    description = "Simple tool to count play blocks"

    def _run(self, color: str) -> str:
        return f"There are 10 {color} play blocks available"

Build a multi-agent workflow with CrewAI, DeepSeek-R1, and SageMaker AI

Multi-agent AI systems represent a powerful approach to complex problem-solving, where specialized AI agents work together under coordinated supervision. By combining CrewAI’s workflow orchestration capabilities with SageMaker AI based LLMs, developers can create sophisticated systems where multiple agents collaborate efficiently toward a specific goal. The code used in this post is available in the following GitHub repo.

Let’s build a research agent and writer agent that work together to create a PDF about a topic. We will use a DeepSeek-R1 Distilled Llama 3.3 70B model as a SageMaker endpoint for the LLM inference.

Define your own DeepSeek SageMaker LLM (using LLM base class)

The following code integrates SageMaker hosted LLMs with CrewAI by creating a custom inference tool that formats prompts with system instructions for factual responses, uses Boto3, an AWS core library, to call SageMaker endpoints, and processes responses by separating reasoning (before </think>) from final answers. This enables CrewAI agents to use deployed models while maintaining structured output patterns.

# Calls SageMaker endpoint for DeepSeek inference
def deepseek_llama_inference(prompt: dict, endpoint_name: str, region: str = "us-east-2") -> dict:
    try:
        # ... Response parsing Code...

    except Exception as e:
        raise RuntimeError(f"Error while calling SageMaker endpoint: {e}")

# CrewAI-compatible LLM implementation for DeepSeek models on SageMaker.
class DeepSeekSageMakerLLM(LLM):
    def __init__(self, endpoint: str):
        # <... Initialize LLM with SageMaker endpoint ...>

    def call(self, prompt: Union[List[Dict[str, str]], str], **kwargs) -> str:
        # <... Format and return the final response ...>
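
As an illustration of the response handling described above, separating the model's reasoning from its final answer can be as simple as splitting on the closing </think> tag. This is a sketch, not the exact parsing code from the repository.

def split_reasoning(generated_text: str) -> dict:
    """Separate DeepSeek-R1 reasoning (before </think>) from the final answer."""
    if "</think>" in generated_text:
        reasoning, answer = generated_text.split("</think>", 1)
        return {"reasoning": reasoning.replace("<think>", "").strip(), "answer": answer.strip()}
    return {"reasoning": "", "answer": generated_text.strip()}

# Example
print(split_reasoning("<think>The user asks about DeepSeek...</think>DeepSeek is an AI company."))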

Name the DeepSeek-R1 Distilled endpoint
Set the endpoint name as defined earlier when you deployed DeepSeek from the Hugging Face Hub:

deepseek_endpoint = "deepseek-r1-dist-v3-llama70b-2025-01-22"

Create a DeepSeek inference tool
Just like how we created the BlocksCounterTool earlier, let’s create a tool that uses the DeepSeek endpoint for our agents to use. We use the same BaseTool subclass here, but we hide it in the CustomTool class implementation in sage_tools.py in the tools folder. For more information, refer to the GitHub repo.

from crewai import Crew, Agent, Task, Process 

# Create the Tool for LLaMA inference
deepseek_tool = CustomTool(
    name="deepseek_llama_3.3_70B",
    func=lambda inputs: deepseek_llama_inference(
        prompt=inputs,
        endpoint_name=deepseek_endpoint
    ),
    description="A tool to generate text using the DeepSeek LLaMA model deployed on SageMaker."
)

Create a research agent
Just like the simple blocks agent we defined earlier, we follow the same template here to define the research agent. The difference here is that we give more capabilities to this agent. We attach a SageMaker AI based DeepSeek-R1 model as an endpoint for the LLM.

This helps the research agent think critically about information processing by combining the scalable infrastructure of SageMaker with DeepSeek-R1’s advanced reasoning capabilities.

The agent uses the SageMaker hosted LLM to analyze patterns in research data, evaluate source credibility, and synthesize insights from multiple inputs. By using the deepseek_tool, the agent can dynamically adjust its research strategy based on intermediate findings, validate hypotheses through iterative questioning, and maintain context awareness across complex information it gathers.

# Research Agent

research_agent = Agent(
    role="Research Bot",
    goal="Scan sources, extract relevant information, and compile a research summary.",
    backstory="An AI agent skilled in finding relevant information from a variety of sources.",
    tools=[deepseek_tool],
    allow_delegation=True,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Create a writer agent
The writer agent is configured as a specialized content editor that takes research data and transforms it into polished content. This agent works as part of a workflow where it takes research from a research agent and acts like an editor by formatting the content into a readable format. The agent is used for writing and formatting, and unlike the research agent, it doesn’t delegate tasks to other agents.

writer_agent = Agent(
    role="Writer Bot",
    goal="Receive research summaries and transform them into structured content.",
    backstory="A talented writer bot capable of producing high-quality, structured content based on research.",
    tools=[deepseek_tool],
    allow_delegation=False,
    llm=DeepSeekSageMakerLLM(endpoint=deepseek_endpoint),
    verbose=False
)

Define tasks for the agents
Tasks in CrewAI define specific operations that agents need to perform. In this example, we have two tasks: a research task that processes queries and gathers information, and a writing task that transforms research data into polished content.

Each task includes a clear description of what needs to be done, the expected output format, and specifies which agent will perform the work. This structured approach makes sure that agents have well-defined responsibilities and clear deliverables.

Together, these tasks create a workflow where one agent researches a topic on the internet, and another agent takes this research and formats it into readable content. The tasks are integrated with the DeepSeek tool for advanced language processing capabilities, enabling a production-ready deployment on SageMaker AI.

research_task = Task(
    description=(
        "Your task is to conduct research based on the following query: {prompt}.n"
    ),
    expected_output="A comprehensive research summary based on the provided query.",
    agent=research_agent,
    tools=[deepseek_tool]
)

writing_task = Task(
    description=(
        "Your task is to create structured content based on the research provided.\n"
    ),
    expected_output="A well-structured article based on the research summary.",
    agent=writer_agent,
    tools=[deepseek_tool]
)

Define a crew in CrewAI
A crew in CrewAI represents a collaborative group of agents working together to achieve a set of tasks. Each crew defines the strategy for task execution, agent collaboration, and the overall workflow. In this specific example, the sequential process makes sure tasks are executed one after the other, following a linear progression. There are other more complex orchestrations of agents working together, which we will discuss in future blog posts.

This approach is ideal for projects requiring tasks to be completed in a specific order. The workflow creates two agents: a research agent and a writer agent. The research agent researches a topic on the internet, then the writer agent takes this research and acts like an editor by formatting it into a readable format.

Let’s call the crew scribble_bots:

# Define the Crew for Sequential Workflow # 

scribble_bots = Crew(
    agents=[research_agent, writer_agent],
    tasks=[research_task, writing_task],
    process=Process.sequential  # Ensure tasks execute in sequence
)

Use the crew to run a task
We have our endpoint deployed, agents created, and crew defined. Now we’re ready to use the crew to get some work done. Let’s use the following prompt:

result = scribble_bots.kickoff(inputs={"prompt": "What is DeepSeek?"})

Our result is as follows:

**DeepSeek: Pioneering AI Solutions for a Smarter Tomorrow**

In the rapidly evolving landscape of artificial intelligence, 
DeepSeek stands out as a beacon of innovation and practical application. 
As an AI company, DeepSeek is dedicated to advancing the field through cutting-edge research and real-world applications, 
making AI accessible and beneficial across various industries.

**Focus on AI Research and Development**

………………….. ………………….. ………………….. …………………..

Clean up

Complete the following steps to clean up your resources:

  1. Delete your GPU DeepSeek-R1 endpoint:
import boto3

# Create a low-level SageMaker service client.
sagemaker_client = boto3.client('sagemaker', region_name=<region>)

# Delete endpoint
sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
  2. If you’re using a SageMaker Studio JupyterLab notebook, shut down the JupyterLab notebook instance.
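
Optionally, if you also created a dedicated endpoint configuration and model for the endpoint, you can remove those as well. The names below are assumptions; look them up on the SageMaker console if they differ.

# Optional cleanup; the endpoint config often shares the endpoint name when deployed via the SDK
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_name)

# Replace the placeholder with your model name from the SageMaker console
sagemaker_client.delete_model(ModelName="<model-name>")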

Conclusion

In this post, we demonstrated how you can deploy an LLM such as DeepSeek-R1—or another FM of your choice—from popular model hubs like SageMaker JumpStart or Hugging Face Hub to SageMaker AI for real-time inference. We explored inference frameworks like Hugging Face TGI, which help streamline deployment while integrating built-in performance optimizations to minimize latency and maximize throughput. Additionally, we showcased how the developer-friendly SageMaker Python SDK simplifies endpoint orchestration, allowing seamless experimentation and scaling of LLM-powered applications.

Beyond deployment, this post provided an in-depth exploration of agentic AI, guiding you through its conceptual foundations, practical design principles using CrewAI, and the seamless integration of state-of-the-art LLMs like DeepSeek-R1 as the intelligent backbone of an autonomous agentic workflow. We outlined a sequential CrewAI workflow design, illustrating how to equip LLM-powered agents with specialized tools that enable autonomous data retrieval, real-time processing, and interaction with complex external systems.

Now, it’s your turn to experiment! Dive into our publicly available code on GitHub, and start building your own DeepSeek-R1-powered agentic AI system on SageMaker. Unlock the next frontier of AI-driven automation—seamlessly scalable, intelligent, and production-ready.

Special thanks to Giuseppe Zappia, Poli Rao, and Siamak Nariman for their support with this blog post.


About the Authors

Surya Kari is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions leveraging state-of-the-art foundation models. He has extensive experience working with advanced language models including DeepSeek-R1, the LLama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using AWS SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.

Bobby Lindsey is a Machine Learning Specialist at Amazon Web Services. He’s been in technology for over a decade, spanning various technologies and multiple roles. He is currently focused on combining his background in software engineering, DevOps, and machine learning to help customers deliver machine learning workflows at scale. In his spare time, he enjoys reading, research, hiking, biking, and trail running.

Karan Singh is a Generative AI Specialist for third-party models at AWS, where he works with top-tier third-party foundation model (FM) providers to develop and execute joint Go-To-Market strategies, enabling customers to effectively train, deploy, and scale FMs to solve industry specific challenges. Karan holds a Bachelor of Science in Electrical and Instrumentation Engineering from Manipal University, a master’s in science in Electrical Engineering from Northwestern University and is currently an MBA Candidate at the Haas School of Business at University of California, Berkeley.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Read More

Automate bulk image editing with Crop.photo and Amazon Rekognition

Automate bulk image editing with Crop.photo and Amazon Rekognition

Evolphin Software, Inc. is a leading provider of digital and media asset management solutions based in Silicon Valley, California. Crop.photo from Evolphin Software is a cloud-based service that offers powerful bulk processing tools for automating image cropping, content resizing, background removal, and listing image analysis.

Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. The solution has created a unique offering for bulk image editing through its advanced AI-driven solutions. In this post, we explore how Crop.photo uses Amazon Rekognition to provide sophisticated image analysis, enabling automated and precise editing of large volumes of images. This integration streamlines the image editing process for clients, providing speed and accuracy, which is crucial in the fast-paced environments of ecommerce and sports.

Automation: The way out of bulk image editing challenges

Bulk image editing isn’t just about handling a high volume of images; it’s about delivering flawless results with speed at scale. Large retail brands, marketplaces, and sports organizations process thousands of images weekly. Each image must be catalog-ready or broadcast-worthy in minutes, not hours.

The challenge lies not just in the quantity but in maintaining high-quality images and brand integrity. Speed and accuracy are non-negotiable. Retailers and sports organizations expect rapid turnaround without compromising image integrity.

This is where Crop.photo’s smart automations come in with an innovative solution for high-volume image processing needs. The platform’s advanced AI algorithms can automatically detect subjects of interest, crop the images, and optimize thousands of images simultaneously while providing consistent quality and brand compliance. By automating repetitive editing tasks, Crop.photo enables enterprises to reduce image processing time from hours to minutes, allowing creative teams to focus on higher-value activities.

Challenges in the ecommerce industry

The ecommerce industry often encounters the following challenges:

  • Inefficiencies and delays in manual image editing – Ecommerce companies rely on manual editing for tasks like resizing, alignment, and background removal. This process can be time-consuming and prone to delays and inconsistencies. A more efficient solution is needed to streamline the editing process, especially during platform migrations or large updates.
  • Maintaining uniformity across diverse image types – Companies work with a variety of image types, from lifestyle shots to product close-ups, across different categories. Maintaining uniformity and professionalism in all image types is essential to meet the diverse needs of marketing, product cataloging, and overall brand presentation.
  • Large-scale migration and platform transition – Transitioning to a new ecommerce platform involves migrating thousands of images, which presents significant logistical challenges. Providing consistency and quality across a diverse range of images during such a large-scale migration is crucial for maintaining brand standards and a seamless user experience.

For a US top retailer, wholesale distribution channels posed a unique challenge. Thousands of fashion images need to be made for the marketplace with less than a day’s notice for flash sales. Their director of creative operations said,

“Crop.photo is an essential part of our ecommerce fashion marketplace workflow. With over 3,000 on-model product images to bulk crop each month, we rely on Crop.photo to enable our wholesale team to quickly publish new products on popular online marketplaces such as Macy’s, Nordstrom, and Bloomingdales. By increasing our retouching team’s productivity by over 70%, Crop.photo has been a game changer for us. Bulk crop images used to take days can now be done in a matter of seconds!”

Challenges in the sports industry

The sports industry often contends with the following challenges:

  • Bulk player headshot volume and consistency – Sports organizations face the challenge of bulk cropping and resizing hundreds of player headshots for numerous teams, frequently on short notice. Maintaining consistency and quality across a large volume of images can be difficult without AI.
  • Diverse player facial features – Players have varying facial features, such as different hair lengths, forehead sizes, and face dimensions. Adapting cropping processes to accommodate these differences traditionally requires manual adjustments for each image, which leads to inconsistencies and significant time investment.
  • Editorial time constraints – Tight editorial schedules and resource limitations are common in sports organizations. The time-consuming nature of manual cropping tasks strains editorial teams, particularly during high-volume periods like tournaments, where delays and rushed work can impact quality and timing.

An Imaging Manager at Europe’s Premier Football Organization expressed,

“We recently found ourselves with 40 images from a top flight English premier league club needing to be edited just 2 hours before kick-off. Using the Bulk AI headshot cropping for sports feature from Crop.photo, we had perfectly cropped headshots of the squad in just 5 minutes, making them ready for publishing in our website CMS just in time. We would never have met this deadline using manual processes. This level of speed was unthinkable before, and it’s why we’re actively recommending Crop.photo to other sports leagues.”

Solution overview

Crop.photo uses Amazon Rekognition to power a robust solution for bulk image editing. Amazon Rekognition offers features like object and scene detection, facial analysis, and image labeling, which they use to generate markers that drive a fully automated image editing workflow.

The following diagram presents a high-level architectural data flow highlighting several of the AWS services used in building the solution.

Architecture diagram showing the end-to-end workflow for Crop.photo’s automated bulk image editing using AWS services.

The solution consists of the following key components:

  • User authentication – Amazon Cognito is used for user authentication and user management.
  • Infrastructure deployment – Frontend and backend servers are used on Amazon Elastic Container Service (Amazon ECS) for container deployment, orchestration, and scaling.
  • Content delivery and caching – Amazon CloudFront is used to cache content, improving performance and routing traffic efficiently.
  • File uploads – Amazon Simple Storage Service (Amazon S3) enables transfer acceleration for fast, direct uploads to Amazon S3.
  • Media and job storage – Information about uploaded files and job execution is stored in Amazon Aurora.
  • Image processing – AWS Batch processes thousands of images in bulk (see the job submission sketch after this list).
  • Job management – Amazon Simple Queue Service (Amazon SQS) manages and queues jobs for processing, making sure they’re run in the correct order by AWS Batch.
  • Media analysis – Amazon Rekognition services analyze media files, including:
    • Face Analysis to generate headless crops.
    • Moderation to detect and flag profanity and explicit content.
    • Label Detection to provide context for image processing and focus on relevant objects.
    • Custom Labels to identify and verify brand logos and adhere to brand guidelines.
  • Asynchronous job notifications – Amazon Simple Notification Service (Amazon SNS), Amazon EventBridge, and Amazon SQS deliver asynchronous job completion notifications, manage events, and provide reliable and scalable processing.
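
To illustrate how a queued bulk editing job might be handed off to AWS Batch, the following is a minimal boto3 sketch; the job queue, job definition, and environment variable names are placeholders, not Crop.photo's actual resources.

import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Placeholder names for the job queue, job definition, and input location
response = batch.submit_job(
    jobName="bulk-crop-collection-12345",
    jobQueue="image-processing-queue",
    jobDefinition="crop-photo-image-editor",
    containerOverrides={
        "environment": [
            {"name": "COLLECTION_S3_PREFIX", "value": "s3://example-bucket/uploads/collection-12345/"}
        ]
    },
)
print(response["jobId"])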

Amazon Rekognition is an AWS computer vision service that powers Crop.photo’s automated image analysis. It enables object detection, facial recognition, and content moderation capabilities:

    • Face detection – The Amazon Rekognition face detection feature automatically identifies and analyzes faces in product images. You can use this feature for face-based cropping and optimization through adjustable bounding boxes in the interface (see the sketch after this list).
  • Image color analysis – The color analysis feature examines image composition, identifying dominant colors and balance. This integrates with Crop.photo’s brand guidelines checker to provide consistency across product images.
  • Object detection – Object detection automatically identifies key elements in images, enabling smart cropping suggestions. The interface highlights detected objects, allowing you to prioritize specific elements during cropping.
  • Custom label detection – Custom label detection recognizes brand-specific items and assets. Companies can train models for their unique needs, automatically applying brand-specific cropping rules to maintain consistency.
  • Text detection (OCR) – The OCR capabilities of Amazon Rekognition detect and preserve text within images during editing. The system highlights text areas to make sure critical product information remains legible after cropping. (Example API calls for these capabilities follow this list.)
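For reference, the following Python (boto3) calls show how each of these Rekognition capabilities is typically invoked. This is a minimal sketch assuming a hypothetical S3 image location; it is not Crop.photo’s implementation, and the Custom Labels call additionally requires a trained, running project version.

import boto3

rekognition = boto3.client("rekognition")

# Hypothetical image location used for illustration.
image = {"S3Object": {"Bucket": "sports-media-assets", "Name": "players/headshot-001.jpg"}}

# Face detection: bounding boxes and landmarks drive headshot cropping.
faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])
for face in faces["FaceDetails"]:
    print(face["BoundingBox"])  # relative Left/Top/Width/Height used to compute the crop

# Moderation: flag explicit or unwanted content before publishing.
moderation = rekognition.detect_moderation_labels(Image=image)

# Label detection: scene and object context for smart cropping; IMAGE_PROPERTIES
# also returns dominant colors for color analysis.
labels = rekognition.detect_labels(
    Image=image, MaxLabels=10, Features=["GENERAL_LABELS", "IMAGE_PROPERTIES"]
)

# Text detection (OCR): keep detected text regions legible inside the crop.
text = rekognition.detect_text(Image=image)

# Custom Labels (brand logos) requires a trained project version, for example:
# logos = rekognition.detect_custom_labels(ProjectVersionArn="<project-version-arn>", Image=image)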

Within Crop.photo, users can upload videos through the standard interface, and the speech-to-text functionality automatically transcribes any audio content. This transcribed text can then be used to enrich the metadata and descriptions associated with the product images or videos, improving searchability and accessibility for customers. Additionally, the brand guidelines check feature can be applied to the transcribed text, making sure that the written content aligns with the company’s branding and communication style.

The Crop.photo service follows a transparent pricing model that combines unlimited automations with a flexible image credit system. Users have unrestricted access to create and run as many automation workflows as needed, without any additional charges. The service includes a range of features at no extra cost, such as basic image operations, storage, and behind-the-scenes processing.

For advanced AI-powered image processing tasks, like smart cropping or background removal, users consume image credits. The number of credits required for each operation is clearly specified, allowing users to understand the costs upfront. Crop.photo offers several subscription plans with varying image credit allowances, enabling users to choose the plan that best fits their needs.

Results: Improved speed and precision

The automated image editing capabilities of Crop.photo, integrated with Amazon Rekognition, have increased editing speed, delivering 70% faster image retouching for ecommerce. With a 75% reduction in manual work, the turnaround time for new product images drops from 2–3 days to just 1 hour. Similarly, the bulk image editing process has been streamlined, allowing over 100,000 image collections to be processed per day using AWS Fargate. Advanced AI-powered image analysis and editing features provide consistent, high-quality images at scale, eliminating the need for manual review and approval of thousands of product images.

For instance, in the ecommerce industry, this integration facilitates automatic product detection and precise cropping, making sure every image meets specific marketplace and brand standards. In sports, it enables quick identification and cropping of player facial features, including head, eyes, and mouth, adapting to varying backgrounds and maintaining brand consistency.

The following images are before and after pictures for an ecommerce use case.

For a famous wine retailer in the United Kingdom, the integration of Amazon Rekognition with Crop.photo streamlined the processing of over 1,700 product images, achieving a 95% reduction in bulk image editing time, a testament to the efficiency of AI-powered enhancement.

Similarly, a top 10 global specialty retailer experienced a transformative impact on their ecommerce fashion marketplace workflow. By automating the cropping of over 3,000 on-model product images monthly, they boosted their retouching team’s productivity by over 70%, maintaining compliance with the varied image standards of multiple online marketplaces.

Conclusion

These case studies illustrate the tangible benefits of integrating Crop.photo with Amazon Rekognition, demonstrating how automation and AI can revolutionize the bulk image editing landscape for ecommerce and sports industries.

Crop.photo, from AWS Partner Evolphin Software, offers powerful bulk processing tools for automating image cropping, content resizing, and listing image analysis, using advanced AI-driven solutions. Crop.photo is tailored for high-end retailers, ecommerce platforms, and sports organizations. Its integration with Amazon Rekognition aims to streamline the image editing process for clients, providing speed and accuracy in the high-stakes environment of ecommerce and sports. Crop.photo plans additional AI capabilities with Amazon Bedrock generative AI frameworks to adapt to emerging digital imaging trends, so it remains an indispensable tool for its clients.

To learn more about Evolphin Software and Crop.photo, visit their website.

To learn more about Amazon Rekognition, refer to the Amazon Rekognition Developer Guide.


About the Authors

Rahul Bhargava, founder & CTO of Evolphin Software and Crop.photo, is reshaping how brands produce and manage visual content at scale. Through Crop.photo’s AI-powered tools, global names like Lacoste and Urban Outfitters, as well as ambitious Shopify retailers, are rethinking their creative production workflows. By leveraging cutting-edge Generative AI, he’s enabling brands of all sizes to scale their content creation efficiently while maintaining brand consistency.

Vaishnavi Ganesan is a Solutions Architect specializing in Cloud Security at AWS based in the San Francisco Bay Area. As a trusted technical advisor, Vaishnavi helps customers to design secure, scalable and innovative cloud solutions that drive both business value and technical excellence. Outside of work, Vaishnavi enjoys traveling and exploring different artisan coffee roasters.

John Powers is an Account Manager at AWS, who provides guidance to Evolphin Software and other organizations to help accelerate business outcomes leveraging AWS Technologies. John has a degree in Business Administration and Management with a concentration in Finance from Gonzaga University, and enjoys snowboarding in the Sierras in his free time.

Read More

Revolutionizing business processes with Amazon Bedrock and Appian’s generative AI skills

Revolutionizing business processes with Amazon Bedrock and Appian’s generative AI skills

This blog post is co-written with Louis Prensky and Philip Kang from Appian. 

The digital transformation wave has compelled enterprises to seek innovative solutions to streamline operations, enhance efficiency, and maintain a competitive edge. Recognizing the growing complexity of business processes and the increasing demand for automation, the integration of generative AI skills into environments has become essential. This strategic move addresses key challenges such as managing vast amounts of unstructured data, adhering to regulatory compliance, and automating repetitive tasks to boost productivity. Using robust infrastructure and advanced language models, these AI-driven tools enhance decision-making by providing valuable insights, improving operational efficiency by automating routine tasks, and helping with data privacy through built-in detection and management of sensitive information. For enterprises, this means achieving higher levels of operational excellence, significant cost savings, and scalable solutions that adapt to business growth. For customers, it translates to improved service quality, enhanced data protection, and a more dynamic, responsive service, ultimately driving better experiences and satisfaction.

Appian has led the charge by offering generative AI skills powered by a collaboration with Amazon Bedrock and Anthropic’s Claude large language models (LLMs). This partnership allows organizations to:

  • Enhance decision making with valuable insights
  • Improve operational efficiency by automating tasks
  • Help protect data privacy through built-in detection and management of sensitive information
  • Maintain compliance with HIPAA- and FedRAMP-compliant AI skills

Critically, by placing AI in the context of a wider environment, organizations can operationalize AI in processes that seamlessly integrate with existing software, pass work between digital workers and humans, and help achieve strong security and compliance.

Background

Appian, an AWS Partner with competencies in financial services, healthcare, and life sciences, is a leading provider of low-code automation software to streamline and optimize complex business processes for enterprises. The Appian AI Process Platform includes everything you need to design, automate, and optimize even the most complex processes, from start to finish. The world’s most innovative organizations trust Appian to improve their workflows, unify data, and optimize operations—resulting in accelerated growth and superior customer experiences.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Appian uses the robust infrastructure of Amazon Bedrock and Anthropic’s Claude LLMs to offer fully integrated, pre-built generative AI skills that help developers enhance and automate business processes using low-code development. These use case-driven tools automate common tasks in business processes, making AI-powered applications faster and easier to develop.

This blog post will cover how Appian AI skills build automation into organizations’ mission-critical processes to improve operational excellence, reduce costs, and build scalable solutions. Additionally, we’ll cover real-world examples of processes such as:

  • A mortgage lender that used AI-driven data extraction to reduce mortgage processing times from 16 weeks to 10 weeks.
  • A financial services company that achieved a four-fold reduction in data extraction time from trade-related emails.
  • A legal institution that used AI to reduce attorney time spent on contract review, enabling them to focus on other, high-value work.

Current challenges faced by enterprises

Modern enterprises face numerous challenges, including:

  • Managing vast amounts of unstructured data: Enterprises deal with immense volumes of data generated from various sources such as emails, documents, and customer interactions. Organizing, analyzing, and extracting valuable insights from unstructured data can be overwhelming without advanced AI capabilities.
  • Protecting data privacy and maintaining compliance: With increasing regulatory requirements around data privacy and protection, organizations must safeguard sensitive information, such as personally identifiable information (PII). Manual processes for data redaction and compliance checks are often error-prone and resource-intensive.
  • Streamlining repetitive and time-consuming tasks: Routine tasks such as data entry, document processing, and content classification consume significant time and effort. Automating these tasks can lead to substantial productivity gains and allow employees to focus on more strategic activities.
  • Adapting to rapidly changing market conditions: In a fast-paced business environment, organizations need to be agile and responsive. This requires real-time data analysis and decision-making capabilities that traditional systems might not provide. AI helps businesses quickly adapt to industry changes and customer demands.
  • Enhancing decision-making with accurate data insights: Making informed decisions requires access to accurate and timely data. However, extracting meaningful insights from large datasets can be challenging without advanced analytical tools. AI-powered solutions can process and analyze data at scale, providing valuable insights that drive better decision-making.

Appian AI service architecture

The architecture of the generative AI skills integrates both the Amazon Bedrock and Amazon Textract scalable infrastructure with Appian’s process management capabilities. This generative AI architecture is designed with private AI as the foundation and upholds those principles.

If a customer site isn’t located in an AWS Region that supports a feature, customers can send their data to a supported Region, as shown in the following figure.

Appian Architecture diagram

The key components of this architecture include:

  1. Appian AI Process Platform instances: The frontend serves as the primary application environment where users interact with the system application to upload documents, initiate workflows, and view processed results.
  2. Appian AI service: This service functions as an intermediary layer between the Appian instances and AWS AI services (Amazon Textract and Amazon Bedrock). This layer encapsulates the logic required to interact with the AWS AI services to manage API calls, data formatting, and error handling.
  3. Amazon Textract: This AWS service is used to automate the extraction of text and structured data from scanned documents and images and provide the extracted data in a structured format.
  4. Amazon Bedrock: This AWS service provides advanced AI capabilities using FMs for tasks such as text summarization, sentiment analysis, and natural language understanding. This helps enhance the extracted data with deeper insights and contextual understanding. (A minimal sketch of this Textract-to-Bedrock flow follows the list.)
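To make the Textract-to-Bedrock flow concrete, here is a minimal Python (boto3) sketch of the pattern described by components 3 and 4. It is not Appian’s AI service code; the bucket, document name, model ID, and prompt are placeholder assumptions.

import boto3

textract = boto3.client("textract")
bedrock = boto3.client("bedrock-runtime")

# Placeholder document location and model ID.
DOC = {"S3Object": {"Bucket": "appian-demo-docs", "Name": "contracts/nda-page-1.png"}}
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

# 1. Extract raw text from the scanned document with Amazon Textract.
extraction = textract.detect_document_text(Document=DOC)
lines = [b["Text"] for b in extraction["Blocks"] if b["BlockType"] == "LINE"]
document_text = "\n".join(lines)

# 2. Enrich the extracted text with an LLM on Amazon Bedrock (summarization here).
response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{
        "role": "user",
        "content": [{"text": f"Summarize the key obligations in this document:\n\n{document_text}"}],
    }],
)
summary = response["output"]["message"]["content"][0]["text"]
print(summary)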

Solution

Appian generative AI skills, powered by Amazon Bedrock with Anthropic’s Claude family of LLMs, are designed to jump-start the use of generative AI in your processes. The following figure showcases the diverse capabilities of Appian’s generative AI skills, demonstrating how they enable enterprises to seamlessly automate complex tasks.

Selecting an AI skill

Appian select AI skills

Editing an AI skill

Edit Appian AI skills

Each new skill provides a pre-populated prompt template tailored to specific tasks, alleviating the need to start from scratch. Businesses can select the desired action and customize the prompt for a perfect fit, enabling the automation of tasks such as:

  • Content analysis and processing: With Appian’s generative AI skills, businesses can automatically generate, summarize, and classify content across various formats. This capability is particularly useful for managing large volumes of customer feedback, generating reports, and creating content summaries, significantly reducing the time and effort required for manual content processing.
  • Text and data extraction: Organizations generate mountains of data and documents. Extracting this information manually can be both burdensome and error-prone. Appian’s AI skills can perform highly accurate text extraction from PDF files and scanned images and pull relevant data from both structured and unstructured data sources such as invoices, forms, and emails. This speeds up data processing and promotes higher accuracy and consistency.
  • PII extraction and redaction: Identifying and managing PII within large datasets is crucial for data governance and compliance. Appian’s AI skills can automatically identify and extract sensitive information from documents and communication channels. Additionally, Appian supports plugins that can redact this content for further privacy. This assists your compliance efforts without extensive manual intervention.
  • Document summarization: Appian’s AI skills can summarize documents to give users an overview before digging into the details. Whether it’s summarizing research papers, legal documents, or internal reports, AI can generate concise summaries, saving time and making sure that critical information is highlighted for quick review.

The following figure shows an example of a prompt-builder skill used to extract unstructured data from a bond certificate.

Create Gen AI Prompt

Each AI skill offers pre-populated prompt templates, allowing you to deploy AI without starting from scratch. Each template caters to specific business needs, making implementation straightforward and efficient. Plus, users can customize these prompts to fit their unique requirements and operational needs.
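As an illustration of how a prompt template like this might be executed against Amazon Bedrock, the following sketch fills a template with a caller-supplied field list and asks the model for JSON output. The template text, model ID, and field names are hypothetical and are not Appian’s actual templates.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder model ID and template wording.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
TEMPLATE = (
    "You are extracting fields from a bond certificate.\n"
    "Return only a JSON object with exactly these keys: {fields}.\n"
    "If a field is missing, use null.\n\n"
    "Document text:\n{document_text}"
)

def extract_fields(document_text, fields):
    """Run a customized prompt template and parse the structured result."""
    prompt = TEMPLATE.format(fields=", ".join(fields), document_text=document_text)
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.0},  # keep extraction deterministic
    )
    # Assumes the model returns only JSON, as instructed by the template.
    return json.loads(response["output"]["message"]["content"][0]["text"])

# Example: customize the field list for a specific document type.
# extract_fields(text, ["issuer", "face_value", "maturity_date", "coupon_rate"])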

Key takeaways

In this solution, Appian Cloud seamlessly integrates and customizes Amazon Bedrock and Claude LLMs behind the scenes, abstracting complexity to deliver enterprise-grade AI capabilities tailored to its cloud environment. It provides pre-built, use case-specific prompt templates for tasks like text summarization and data extraction, dynamically customized based on user inputs and business context. Using the scalability of the Amazon Bedrock infrastructure, Appian Cloud provides optimal performance and efficient handling of enterprise-scale workflows, all within a fully managed cloud service.

By addressing these complexities, Appian Cloud empowers businesses to focus solely on using AI to achieve operational excellence and business outcomes without the burdens of technical setup, integration challenges, or ongoing maintenance efforts.

Customer success stories

Appian’s AI skills have proven effective across multiple industries. Here are a few real-world examples:

  • Mortgage processing: This organization automated the extraction of over 60 data fields from inconsistent document formats, reducing the process timeline from 16 weeks to 10 weeks and achieving 98.33% accuracy. The implementation of Appian’s generative AI skills allowed the mortgage processor to streamline their workflow, significantly cutting down on processing time and improving data accuracy, which led to faster loan approvals and increased customer satisfaction.
  • Financial services: A financial service company received over 1,000 loosely structured emails about trades. Manually annotating these emails led to significant human errors. With an Appian generative AI skill, the customer revamped the entity tagging process by automatically extracting approximately 40 data fields from unstructured emails. This resulted in a four-fold reduction in extraction time and achieved over 95% accuracy, improving the user experience compared to traditional ML extraction tools. The automated process not only reduced errors but also enhanced the speed and reliability of data extraction, leading to more accurate and timely trading decisions.
  • Legal review: A legal institution had to review negotiated contracts against the original contracts to determine whether the outlined risks had been resolved. This manual process was error prone and labor intensive. By deploying a generative AI skill, they automated the extraction of changes between contracts to find the differences and whether risks had been resolved. This streamlined the attorney review process and provided insights and reasoning into the differences found. The automated solution significantly reduced the time attorneys spent on contract review, allowing them to focus on more strategic tasks and improving the overall efficiency of the legal department.

Conclusion

AWS and Appian’s collaboration marks a significant advancement in business process automation. By using the power of Amazon Bedrock and Anthropic’s Claude models, Appian empowers enterprises to optimize and automate processes for greater efficiency and effectiveness. This partnership sets a new standard for AI-driven business solutions, leading to greater growth and enhanced customer experiences. The ability to quickly deploy and customize AI skills allows businesses to stay agile and responsive in a dynamic environment.

Appian solutions are available as software as a service (SaaS) offerings in AWS Marketplace. Check out the Appian website to learn more about how to use the AI skills.


About the Authors

Sunil Bemarkar is a Senior Partner Solutions Architect at Amazon Web Services. He works with various Independent Software Vendors (ISVs) and Strategic customers across industries to accelerate their digital transformation journey and cloud adoption.

John Klacynski is a Principal Customer Solution Manager within the AWS Independent Software Vendor (ISV) team. In this role, he programmatically helps ISV customers adopt AWS technologies and services to reach their business goals more quickly.

Louis Prensky is a Senior Product Manager at Appian. He is responsible for driving product strategy and feature design for AI Skills within Appian’s Cognitive Automation Group.

Philip Kang is a Principal Solutions Consultant in Partner Technology & Innovation centers with Appian. In this role, he spearheads technical innovation with a focus on AI/ML and cloud solutions.

Read More

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

Governing the ML lifecycle at scale, Part 4: Scaling MLOps with security and governance controls

Data science teams often face challenges when transitioning models from the development environment to production. These include difficulties integrating data science teams’ models into the IT team’s production environment, the need to retrofit data science code to meet enterprise security and governance standards, gaining access to production-grade data, and maintaining repeatability and reproducibility in machine learning (ML) pipelines, which can be difficult without a proper platform infrastructure and standardized templates.

This post, part of the “Governing the ML lifecycle at scale” series (Part 1, Part 2, Part 3), explains how to set up and govern a multi-account ML platform that addresses these challenges. The platform provides self-service provisioning of secure environments for ML teams, accelerated model development with predefined templates, a centralized model registry for collaboration and reuse, and standardized model approval and deployment processes.

An enterprise might have the following roles involved in the ML lifecycles. The functions for each role can vary from company to company. In this post, we assign the functions in terms of the ML lifecycle to each role as follows:

  • Lead data scientist – Provision accounts for ML development teams, govern access to the accounts and resources, and promote a standardized model development and approval process to eliminate repeated engineering effort. Usually, there is one lead data scientist for a data science group in a business unit, such as marketing.
  • Data scientists – Perform data analysis, model development, model evaluation, and registering the models in a model registry.
  • ML engineers – Develop model deployment pipelines and control the model deployment processes.
  • Governance officer – Review the model’s performance including documentation, accuracy, bias and access, and provide final approval for models to be deployed.
  • Platform engineers – Define a standardized process for creating development accounts that conform to the company’s security, monitoring, and governance standards; create templates for model development; and manage the infrastructure and mechanisms for sharing model artifacts.

This ML platform provides several key benefits. First, it enables every step in the ML lifecycle to conform to the organization’s security, monitoring, and governance standards, reducing overall risk. Second, the platform gives data science teams the autonomy to create accounts and provision and access ML resources as needed, reducing the resource constraints that often hinder their work.

Additionally, the platform automates many of the repetitive manual steps in the ML lifecycle, allowing data scientists to focus their time and efforts on building ML models and discovering insights from the data rather than managing infrastructure. The centralized model registry also promotes collaboration across teams and enables centralized model governance, increasing visibility into models developed throughout the organization and reducing duplicated work.

Finally, the platform standardizes the process for business stakeholders to review and consume models, smoothing the collaboration between the data science and business teams. This makes sure models can be quickly tested, approved, and deployed to production to deliver value to the organization.

Overall, this holistic approach to governing the ML lifecycle at scale provides significant benefits in terms of security, agility, efficiency, and cross-functional alignment.

In the next section, we provide an overview of the multi-account ML platform and how the different roles collaborate to scale MLOps.

Solution overview

The following architecture diagram illustrates the solutions for a multi-account ML platform and how different personas collaborate within this platform.

There are five accounts illustrated in the diagram:

  • ML Shared Services Account – This is the central hub of the platform. This account manages templates for setting up new ML Dev Accounts, as well as SageMaker Projects templates for model development and deployment, in AWS Service Catalog. It also hosts a model registry to store ML models developed by data science teams, and provides a single location to approve models for deployment.
  • ML Dev Account – This is where data scientists perform their work. In this account, data scientists can create new SageMaker notebooks based on their needs, connect to data sources such as Amazon Simple Storage Service (Amazon S3) buckets, analyze data, build models, create model artifacts (for example, a container image), and more. The SageMaker projects, provisioned using the templates in the ML Shared Services Account, can speed up the model development process because they come with steps (such as connecting to an S3 bucket) preconfigured. The diagram shows one ML Dev Account, but there can be multiple ML Dev Accounts in an organization.
  • ML Test Account – This is the test environment for new ML models, where stakeholders can review and approve models before deployment to production.
  • ML Prod Account – This is the production account for new ML models. After the stakeholders approve the models in the ML Test Account, the models are automatically deployed to this production account.
  • Data Governance Account – This account hosts data governance services for data lake, central feature store, and fine-grained data access.

Key activities and actions are numbered in the preceding diagram. Some of these activities are performed by various personas, whereas others are automatically triggered by AWS services.

  1. ML engineers create the pipelines in GitHub repositories, and the platform engineer converts them into two different Service Catalog portfolios: ML Admin Portfolio and SageMaker Project Portfolio. The ML Admin Portfolio will be used by the lead data scientist to create AWS resources (for example, SageMaker domains). The SageMaker Project Portfolio has SageMaker projects that data scientists and ML engineers can use to accelerate model training and deployment.
  2. The platform engineer shares the two Service Catalog portfolios with workload accounts in the organization.
  3. The data engineer prepares and governs datasets using services such as Amazon S3, AWS Lake Formation, and Amazon DataZone for ML.
  4. The lead data scientist uses the ML Admin Portfolio to set up SageMaker domains and the SageMaker Project Portfolio to set up SageMaker projects for their teams.
  5. Data scientists subscribe to datasets, and use SageMaker notebooks to analyze data and develop models.
  6. Data scientists use the SageMaker projects to build model training pipelines. These SageMaker projects automatically register the models in the model registry.
  7. The lead data scientist approves the model locally in the ML Dev Account.
  8. This step consists of the following sub-steps:
    1. After the data scientists approve the model, an event is published to an event bus in Amazon EventBridge, which ships the event to the ML Shared Services Account.
    2. The event in EventBridge triggers an AWS Lambda function that copies model artifacts (managed by SageMaker, or Docker images) from the ML Dev Account into the ML Shared Services Account, creates a model package in the ML Shared Services Account, and registers the new model in the model registry in the ML Shared Services Account (a minimal sketch of such a function follows this list).
  9. ML engineers review and approve the new model in the ML Shared Services account for testing and deployment. This action triggers a pipeline that was set up using a SageMaker project.
  10. The approved models are first deployed to the ML Test Account. Integration tests are run and the endpoint is validated before the model is approved for production deployment.
  11. After testing, the governance officer approves the new model in CodePipeline.
  12. After the model is approved, the pipeline continues to deploy the new model into the ML Prod Account and creates a SageMaker endpoint.
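Step 8b can be pictured with a Lambda function along the following lines. This is a minimal sketch, not the function shipped with the sample code: the event field names, model package group name, and supported content types are assumptions, and it presumes the model artifacts have already been copied into the ML Shared Services Account.

import boto3

sm = boto3.client("sagemaker")

CENTRAL_MODEL_PACKAGE_GROUP = "central-ml-models"  # hypothetical group name

def handler(event, context):
    # The model approval event forwarded from the ML Dev Account is assumed to
    # carry the approved model package details under these field names.
    detail = event["detail"]
    container = detail["InferenceSpecification"]["Containers"][0]
    image_uri = container["Image"]
    model_data_url = container["ModelDataUrl"]

    # Register the promoted model in the central registry for review.
    response = sm.create_model_package(
        ModelPackageGroupName=CENTRAL_MODEL_PACKAGE_GROUP,
        ModelPackageDescription=f"Promoted from {detail.get('ModelPackageArn', 'ML Dev Account')}",
        InferenceSpecification={
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
        ModelApprovalStatus="PendingManualApproval",
    )
    return {"modelPackageArn": response["ModelPackageArn"]}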

The following sections provide details on the key components of this diagram, how to set them up, and sample code.

Set up the ML Shared Services Account

The ML Shared Services Account helps the organization standardize management of artifacts and resources across data science teams. This standardization also helps enforce controls across resources consumed by data science teams.

The ML Shared Services Account has the following features:

  • Service Catalog portfolios – This includes the following portfolios:
    • ML Admin Portfolio – This is intended to be used by the project admins of the workload accounts. It is used to create AWS resources for their teams. These resources can include SageMaker domains, Amazon Redshift clusters, and more.
    • SageMaker Projects Portfolio – This portfolio contains the SageMaker products to be used by the ML teams to accelerate their ML models’ development while complying with the organization’s best practices.
  • Central model registry – This is the centralized place for ML models developed and approved by different teams. For details on setting this up, refer to Part 2 of this series. (A minimal sketch of creating the underlying model package group follows this list.)
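Under the hood, the central model registry is a SageMaker model package group, optionally combined with a cross-account resource policy. The following sketch uses a hypothetical group name; refer to Part 2 of this series for the complete setup.

import json
import boto3

sm = boto3.client("sagemaker")

# Hypothetical group name for the central registry.
sm.create_model_package_group(
    ModelPackageGroupName="central-ml-models",
    ModelPackageGroupDescription="Models promoted from ML Dev Accounts",
)

# Cross-account access is typically granted with a resource policy, for example:
# sm.put_model_package_group_policy(
#     ModelPackageGroupName="central-ml-models",
#     ResourcePolicy=json.dumps(policy_document),  # policy allowing the workload accounts
# )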

The following diagram illustrates this architecture.

As the first step, the cloud admin sets up the ML Shared Services Account by using one of the blueprints for customizations in AWS Control Tower account vending, as described in Part 1.

In the following sections, we walk through how to set up the ML Admin Portfolio. The same steps can be used to set up the SageMaker Projects Portfolio.

Bootstrap the infrastructure for two portfolios

After the ML Shared Services Account has been set up, the ML platform admin can bootstrap the infrastructure for the ML Admin Portfolio using sample code in the GitHub repository. The code contains AWS CloudFormation templates that can be later deployed to create the SageMaker Projects Portfolio.

Complete the following steps:

  1. Clone the GitHub repo to a local directory:
    git clone https://github.com/aws-samples/data-and-ml-governance-workshop.git

  2. Change the directory to the portfolio directory:
    cd data-and-ml-governance-workshop/module-3/ml-admin-portfolio

  3. Install dependencies in a separate Python environment using your preferred Python packages manager:
    python3 -m venv env
    source env/bin/activate
    pip install -r requirements.txt

  4. Bootstrap your deployment target account using the following command:
    cdk bootstrap aws://<target account id>/<target region> --profile <target account profile>

    If you already have a role and AWS Region from the account set up, you can use the following command instead:

    cdk bootstrap

  5. Lastly, deploy the stack:
    cdk deploy --all --require-approval never

When it’s ready, you can see the MLAdminServicesCatalogPipeline pipeline in AWS CloudFormation.

Navigate to AWS CodeStar Connections on the Service Catalog page. You will see a connection named codeconnection-service-catalog. If you choose the connection, you will notice that it needs to be connected to GitHub before you can integrate it with your pipelines and start pushing code. Choose Update pending connection to complete the integration with your GitHub account.

Once that is done, you need to create empty GitHub repositories to start pushing code to. For example, you can create a repository called “ml-admin-portfolio-repo”. Every project you deploy will need a repository created in GitHub beforehand.

Trigger CodePipeline to deploy the ML Admin Portfolio

Complete the following steps to trigger the pipeline to deploy the ML Admin Portfolio. We recommend creating a separate folder for the different repositories that will be created in the platform.

  1. Get out of the cloned repository and create a parallel folder called platform-repositories:
    cd ../../.. # (as many .. as directories you have moved in)
    mkdir platform-repositories

  2. Clone and fill the empty created repository:
    cd platform-repositories
    git clone https://github.com/example-org/ml-admin-service-catalog-repo.git
    cd ml-admin-service-catalog-repo
    cp -aR ../../ml-platform-shared-services/module-3/ml-admin-portfolio/. .

  3. Push the code to the GitHub repository to create the Service Catalog portfolio:
    git add .
    git commit -m "Initial commit"
    git push -u origin main

After the push, the GitHub repository created earlier is no longer empty. The new code triggers the pipeline named cdk-service-catalog-pipeline, which builds and deploys the artifacts to Service Catalog.

It takes about 10 minutes for the pipeline to finish running. When it’s complete, you can find a portfolio named ML Admin Portfolio on the Portfolios page on the Service Catalog console.

Repeat the same steps to set up the SageMaker Projects Portfolio. Make sure you use the sample code (sagemaker-projects-portfolio) and create a new code repository (with a name such as sm-projects-service-catalog-repo).

Share the portfolios with workload accounts

You can share the portfolios with workload accounts in Service Catalog. Again, we use the ML Admin Portfolio as an example. If you prefer to script this step, a minimal SDK sketch follows the console steps below.

  1. On the Service Catalog console, choose Portfolios in the navigation pane.
  2. Choose the ML Admin Portfolio.
  3. On the Share tab, choose Share.
  4. In the Account info section, provide the following information:
    1. For Select how to share, select Organization node.
    2. Choose Organizational Unit, then enter the organizational unit (OU) ID of the workloads OU.
  5. In the Share settings section, select Principal sharing.
  6. Choose Share.
    Selecting the Principal sharing option allows you to specify the AWS Identity and Access Management (IAM) roles, users, or groups by name for which you want to grant permissions in the shared accounts.
  7. On the portfolio details page, on the Access tab, choose Grant access.
  8. For Select how to grant access, select Principal Name.
  9. In the Principal Name section, choose role/ for Type and enter the name of the role that the ML admin will assume in the workload accounts for Name.
  10. Choose Grant access.
  11. Repeat these steps to share the SageMaker Projects Portfolio with workload accounts.
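If you prefer to script the sharing instead of using the console, the following boto3 sketch performs the equivalent calls. The portfolio ID, organizational unit ID, and role name are hypothetical placeholders.

import boto3

sc = boto3.client("servicecatalog")

# Hypothetical IDs; copy the real values from the Service Catalog console and AWS Organizations.
PORTFOLIO_ID = "port-abcd1234efgh5678"
WORKLOADS_OU_ID = "ou-abcd-12345678"

# Share the portfolio with every account in the workloads OU, including principal sharing.
sc.create_portfolio_share(
    PortfolioId=PORTFOLIO_ID,
    OrganizationNode={"Type": "ORGANIZATIONAL_UNIT", "Value": WORKLOADS_OU_ID},
    SharePrincipals=True,
)

# Grant access by principal name to the role the ML admin assumes in the workload accounts.
sc.associate_principal_with_portfolio(
    PortfolioId=PORTFOLIO_ID,
    PrincipalARN="arn:aws:iam:::role/ml-admin-role",  # hypothetical role name, no account ID
    PrincipalType="IAM_PATTERN",
)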

Confirm available portfolios in workload accounts

If the sharing was successful, you should see both portfolios available on the Service Catalog console, on the Portfolios page under Imported portfolios.

Now that the service catalogs in the ML Shared Services Account have been shared with the workloads OU, the data science team can provision resources such as SageMaker domains using the templates and set up SageMaker projects to accelerate ML models’ development while complying with the organization’s best practices.

We demonstrated how to create and share portfolios with workload accounts. However, the journey doesn’t stop here. The ML engineer can continue to evolve existing products and develop new ones based on the organization’s requirements.

The following sections describe the processes involved in setting up ML Development Accounts and running ML experiments.

Set up the ML Development Account

The ML Development account setup consists of the following tasks and stakeholders:

  1. The team lead requests the cloud admin to provision the ML Development Account.
  2. The cloud admin provisions the account.
  3. The team lead uses the shared Service Catalog portfolios to provision SageMaker domains, set up IAM roles and grant access, and get access to data in Amazon S3, Amazon DataZone, AWS Lake Formation, or a central feature group, depending on which solution the organization decides to use.

Run ML experiments

Part 3 in this series described multiple ways to share data across the organization. The current architecture allows data access using the following methods:

  • Option 1: Train a model using Amazon DataZone – If the organization has Amazon DataZone in the central governance account or data hub, a data publisher can create an Amazon DataZone project to publish the data. Then the data scientist can subscribe to the Amazon DataZone published datasets from Amazon SageMaker Studio, and use the dataset to build an ML model. Refer to the sample code for details on how to use subscribed data to train an ML model.
  • Option 2: Train a model using Amazon S3 – Make sure the user has access to the dataset in the S3 bucket. Follow the sample code to run an ML experiment pipeline using data stored in an S3 bucket.
  • Option 3: Train a model using a data lake with Athena – Part 2 introduced how to set up a data lake. Follow the sample code to run an ML experiment pipeline using data stored in a data lake with Amazon Athena.
  • Option 4: Train a model using a central feature group – Part 2 introduced how to set up a central feature group. Follow the sample code to run an ML experiment pipeline using data stored in a central feature group.

You can choose which option to use depending on your setup. For options 2, 3, and 4, the SageMaker Projects Portfolio provides project templates to run ML experiment pipelines, steps including data ingestion, model training, and registering the model in the model registry.

In the following example, we use option 2 to demonstrate how to build and run an ML pipeline using a SageMaker project that was shared from the ML Shared Services Account.

  1. On the SageMaker Studio domain, under Deployments in the navigation pane, choose Projects.
  2. Choose Create project.
  3. There is a list of projects that serve various purposes. Because we want to access data stored in an S3 bucket for training the ML model, choose the project that uses data in an S3 bucket on the Organization templates tab.
  4. Follow the steps to provide the necessary information, such as Name, Tooling Account (the ML Shared Services account ID), and S3 bucket (for MLOps), and then create the project.

It takes a few minutes to create the project.

After the project is created, a SageMaker pipeline is triggered to perform the steps specified in the SageMaker project. Choose Pipelines in the navigation pane to see the pipeline. You can choose the pipeline to see the Directed Acyclic Graph (DAG) of the pipeline. When you choose a step, its details show in the right pane.

The last step of the pipeline is registering the model in the current account’s model registry. As the next step, the lead data scientist will review the models in the model registry, and decide if a model should be approved to be promoted to the ML Shared Services Account.

Approve ML models

The lead data scientist should review the trained ML models and approve the candidate model in the model registry of the development account. Approving an ML model emits a local event, and the event buses in EventBridge send the model approval event to the ML Shared Services Account, where the artifacts of the model are copied to the central model registry. A model card is created for the model if it’s a new one; otherwise, the existing model card is updated with the new version. (A minimal sketch of the approval call is shown after the diagram.)

The following architecture diagram shows the flow of model approval and model promotion.
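For reference, the approval itself corresponds to an update of the model package’s approval status. The following sketch assumes a hypothetical model package group name; in practice, the lead data scientist typically performs this step from the SageMaker Studio UI.

import boto3

sm = boto3.client("sagemaker")

# List the versions still pending review in the development account's registry.
packages = sm.list_model_packages(
    ModelPackageGroupName="central-ml-models",  # hypothetical group name
    ModelApprovalStatus="PendingManualApproval",
)
candidate_arn = packages["ModelPackageSummaryList"][0]["ModelPackageArn"]

# Approving the package emits the model state change event that EventBridge forwards.
sm.update_model_package(
    ModelPackageArn=candidate_arn,
    ModelApprovalStatus="Approved",
    ApprovalDescription="Meets accuracy and bias thresholds for promotion",
)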

Model deployment

After the previous step, the model is available in the central model registry in the ML Shared Services Account. ML engineers can now deploy the model.

If you used the sample code to bootstrap the SageMaker Projects Portfolio, you can use the Deploy real-time endpoint from ModelRegistry – Cross account, test and prod option in SageMaker Projects to set up a project whose pipeline deploys the model to the target test account and production account.

  1. On the SageMaker Studio console, choose Projects in the navigation pane.
  2. Choose Create project.
  3. On the Organization templates tab, you can view the templates that were populated earlier from Service Catalog when the domain was created.
  4. Select the template Deploy real-time endpoint from ModelRegistry – Cross account, test and prod and choose Select project template.
  5. Fill in the template:
    1. The SageMakerModelPackageGroupName is the model group name of the model promoted from the ML Dev Account in the previous step.
    2. Enter the Deployments Test Account ID for PreProdAccount, and the Deployments Prod Account ID for ProdAccount.

The pipeline for deployment is ready. The ML engineer will review the newly promoted model in the ML Shared Services Account. If the ML engineer approves the model, the deployment pipeline is triggered. You can see the pipeline on the CodePipeline console.

 

The pipeline will first deploy the model to the test account, and then pause for manual approval to deploy to the production account. The ML engineer can test the performance, and the governance officer can validate the model results in the test account. If the results are satisfactory, the governance officer can approve the model in CodePipeline to deploy it to the production account.

Conclusion

This post provided detailed steps for setting up the key components of a multi-account ML platform. This includes configuring the ML Shared Services Account, which manages the central templates, model registry, and deployment pipelines; sharing the ML Admin and SageMaker Projects Portfolios from the central Service Catalog; and setting up the individual ML Development Accounts where data scientists can build and train models.

The post also covered the process of running ML experiments using the SageMaker Projects templates, as well as the model approval and deployment workflows. Data scientists can use the standardized templates to speed up their model development, and ML engineers and stakeholders can review, test, and approve the new models before promoting them to production.

This multi-account ML platform design follows a federated model, with a centralized ML Shared Services Account providing governance and reusable components, and a set of development accounts managed by individual lines of business. This approach gives data science teams the autonomy they need to innovate, while providing enterprise-wide security, governance, and collaboration.

We encourage you to test this solution by following the AWS Multi-Account Data & ML Governance Workshop to see the platform in action and learn how to implement it in your own organization.


About the authors

Jia (Vivian) Li is a Senior Solutions Architect in AWS, with specialization in AI/ML. She currently supports customers in the financial industry. Prior to joining AWS in 2022, she had 7 years of experience helping enterprise customers use AI/ML in the cloud to drive business results. Vivian has a BS from Peking University and a PhD from the University of Southern California. In her spare time, she enjoys all the water activities, and hiking in the beautiful mountains in her home state, Colorado.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he enjoys riding his motorcycle and walking with his dogs.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Alberto Menendez is a DevOps Consultant in Professional Services at AWS. He helps accelerate customers’ journeys to the cloud and achieve their digital transformation goals. In his free time, he enjoys playing sports, especially basketball and padel, spending time with family and friends, and learning about technology.

Sovik Kumar Nath is an AI/ML and Generative AI senior solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double masters degrees from the University of South Florida, University of Fribourg, Switzerland, and a bachelors degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Viktor Malesevic is a Senior Machine Learning Engineer within AWS Professional Services, leading teams to build advanced machine learning solutions in the cloud. He’s passionate about making AI impactful, overseeing the entire process from modeling to production. In his spare time, he enjoys surfing, cycling, and traveling.

Read More

Accelerate your Amazon Q implementation: starter kits for SMBs

Accelerate your Amazon Q implementation: starter kits for SMBs

Whether you’re a small or medium-sized business (SMB) or a managed service provider at the beginning of your cloud journey, you might be wondering how to get started. Questions like “Am I following best practices?”, “Am I optimizing my cloud costs?”, and “How difficult is the learning curve?” are quite common. To help, AWS provides a concept called starter kits.

Starter kits are complete, deployable solutions that address common, repeatable business problems. They deploy the services that make up a solution according to best practices, helping you optimize costs and become familiar with these kinds of architectural patterns without a large investment in training. Most of all, starter kits save you time—time that can be better spent on your business or with your customers.

In this post, we showcase a starter kit for Amazon Q Business. If you have a repository of documents that you need to turn into a knowledge base quickly, or simply want to test out the capabilities of Amazon Q Business without a large investment of time at the console, then this solution is for you.

This deployment guide covers the steps to set up an Amazon Q solution that connects to Amazon Simple Storage Service (Amazon S3) and a web crawler data source, and integrates with AWS IAM Identity Center for authentication. An AWS CloudFormation template automates the deployment of this solution.

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Solution overview

The following diagram illustrates the solution architecture.

Solution Architecture

The workflow involves the following steps:

  1. The user authenticates using an AWS Identity and Access Management (IAM) identity user name and password before accessing the Amazon Q web application.
  2. Upon successful authentication, the user can access the Amazon Q web UI and ask a question.
  3. Amazon Q retrieves relevant information from its index, which is populated using data from the connected data sources (Amazon S3 and a web crawler).
  4. Amazon Q then generates a response using its internal large language model (LLM) and presents it to the user through the Amazon Q web UI.
  5. The user can provide feedback on the response through the Amazon Q web UI.

Prerequisites

Before deploying the solution, make sure you have the following in place:

  • AWS account – You will need an active AWS account with the necessary permissions to deploy CloudFormation stacks and create the required resources.
  • Amazon S3 bucket – Make sure you have an existing S3 bucket that will be used as the data source for Amazon Q. To create an S3 bucket, refer to Create your first S3 bucket.
  • AWS IAM Identity Center – Configure AWS IAM Identity Center in your AWS environment. You will need to provide the necessary details, such as the IAM Identity Center instance Amazon Resource Name (ARN), during the deployment process.

Deploy the solution using AWS CloudFormation

Complete the following steps to deploy the CloudFormation template:

  1. Sign in to the AWS Management Console.
  2. Choose one of the following Launch Stack options for your desired AWS Region to open the AWS CloudFormation console and create a new stack. Please note that this stack will default to us-east-1.
    Launch Stack to create solution resources
  3. For Stack name, enter a name for your application (for example, AMAZON-Q-STARTER-KIT).
  4. In the Parameters section, for IAMIdentityCenterARN, enter the ARN of your IAM Identity Center instance.
  5. For QBusinessApplicationName, enter a name for the Amazon Q Business application.
  6. For S3DataSourceBucket, enter the name of the S3 bucket you created earlier.
  7. For WebCrawlerDataSourceUrl, enter the URL of the web crawler data source.
  8. Choose Next.

Parameters section for IAMIdentityCenterARN

  9. On the Configure stack options page, leave everything as default, select I acknowledge that AWS CloudFormation might create IAM resources, and choose Next.

acknowledge AWS CloudFormation

  10. On the Review and create page, choose Submit.
  11. On the Amazon Q Business console, you will see the new application you created.
  12. Choose the new Amazon Q Business application, and in the Data sources section, select the data source s3_datasource and choose Sync now.
  13. Select the data source webpage-datasource and choose Sync now.
  14. To add groups and users to your Amazon Q application, refer to the instructions in the Amazon Q Business documentation. (If you prefer to deploy the stack programmatically, a minimal sketch follows these steps.)
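If you prefer to deploy the stack programmatically instead of using the Launch Stack button, the following boto3 sketch creates an equivalent stack. The template URL and parameter values are placeholders; substitute the template URL behind the Launch Stack button and your own Identity Center ARN, bucket name, and crawler URL.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

# Placeholder values; replace the template URL and parameter values with your own.
cfn.create_stack(
    StackName="AMAZON-Q-STARTER-KIT",
    TemplateURL="https://example-bucket.s3.amazonaws.com/amazon-q-starter-kit.yaml",
    Parameters=[
        {"ParameterKey": "IAMIdentityCenterARN", "ParameterValue": "arn:aws:sso:::instance/ssoins-EXAMPLE"},
        {"ParameterKey": "QBusinessApplicationName", "ParameterValue": "my-q-business-app"},
        {"ParameterKey": "S3DataSourceBucket", "ParameterValue": "my-q-data-source-bucket"},
        {"ParameterKey": "WebCrawlerDataSourceUrl", "ParameterValue": "https://example.com"},
    ],
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)

# Wait for the stack to finish creating before syncing the data sources.
cfn.get_waiter("stack_create_complete").wait(StackName="AMAZON-Q-STARTER-KIT")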

Test the solution

To validate the Amazon Q solution is functioning as expected, perform the following tests:

  1. Test data ingestion:
    1. Upload a test file to the S3 bucket.
    2. Verify that the file is successfully ingested and processed by Amazon Q.
    3. Check the Amazon Q web experience UI for the processed data.
  2. Test web crawler functionality:
    1. Verify that the web crawler is able to retrieve and ingest the data from the website.
    2. Make sure the data is displayed correctly in the Amazon Q web experience UI.

Clean up

To clean up, delete the CloudFormation stack and the S3 bucket you created.

Conclusion

The Amazon Q starter kit provides a streamlined solution for SMBs to use the power of generative AI and intelligent question-answering. By automating the deployment and integration with key data sources, this kit eases the complexity of setting up Amazon Q, empowering businesses to quickly unlock insights and improve productivity.

If your SMB has a repository of documents that need to be transformed into a valuable knowledge base, or you simply want to explore the capabilities of Amazon Q, we encourage you to take advantage of this starter kit. Get started today and experience the transformative benefits of enterprise-grade question-answering tailored for your business needs, and let us know what you think in the comments. To explore more generative AI use cases, refer to AI Use Case Explorer.


About the Authors

Nneoma Okoroafor is a Partner Solutions Architect focused on AI/ML and generative AI. Nneoma is passionate about providing guidance to AWS Partners on using the latest technologies and techniques to deliver innovative solutions to customers.

Joshua Amah is a Partner Solutions Architect with Amazon Web Services. He primarily serves consulting partners, providing architectural guidance and recommendations for new and existing workloads. Outside of work, he enjoys playing soccer, golf, and spending time with family and friends.

Jason Brown is a Partner Solutions Architect focused on helping AWS Distribution Partners and their Seller Partners build and grow their AWS practices. Jason is passionate about building solutions for MSPs and VARs in the small business space. Outside the office, Jason is an avid traveler and enjoys offshore fishing.

Read More