Implement Amazon SageMaker domain cross-Region disaster recovery using custom Amazon EFS instances

Amazon SageMaker is a cloud-based machine learning (ML) platform within the AWS ecosystem that offers developers a seamless and convenient way to build, train, and deploy ML models. Extensively used by data scientists and ML engineers across various industries, this robust tool provides high availability and uninterrupted access for its users. When working with SageMaker, your environment resides within a SageMaker domain, which encompasses critical components like Amazon Elastic File System (Amazon EFS) for storage, user profiles, and a diverse array of security configurations. This comprehensive setup enables collaborative efforts by allowing users to store, share, and access notebooks, Python files, and other essential artifacts.

In 2023, SageMaker announced the release of the new SageMaker Studio, which offers two new types of applications: JupyterLab and Code Editor. The old SageMaker Studio was renamed to SageMaker Studio Classic. Unlike other applications that share one single storage volume in SageMaker Studio Classic, each JupyterLab and Code Editor instance has its own Amazon Elastic Block Store (Amazon EBS) volume. For more information about this architecture, see New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Another new feature was the ability to bring your own EFS instance, which enables you to attach and detach a custom EFS instance.

A SageMaker domain exclusive to the new SageMaker Studio is composed of the following entities:

  • User profiles
  • Applications including JupyterLab, Code Editor, RStudio, Canvas, and MLflow
  • A variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations

As a precautionary measure, some customers may want to ensure continuous operation of SageMaker in the unlikely event of Regional impairment of the SageMaker service. This solution uses the built-in cross-Region replication capability of Amazon EFS as a robust disaster recovery mechanism, providing continuous and uninterrupted access to your SageMaker domain data across multiple Regions. Replicating your data and resources across multiple Regions helps safeguard against Regional outages and fortifies your defenses against natural disasters or unforeseen technical failures, thereby providing business continuity and disaster recovery capabilities. This setup is particularly crucial for mission-critical and time-sensitive workloads, so data scientists and ML engineers can continue their work without disruptive interruptions.

The solution illustrated in this post focuses on the new SageMaker Studio experience, particularly private JupyterLab and Code Editor spaces. Although the code base doesn’t include shared spaces, the solution is straightforward to extend with the same concept. In this post, we guide you through a step-by-step process to seamlessly migrate and safeguard your new SageMaker domain in Amazon SageMaker Studio from one active AWS Region to another AWS Region, including all associated user profiles and files. By using a combination of AWS services, you can implement this feature effectively, overcoming the current limitations within SageMaker.

Solution overview

In active-passive mode, the SageMaker domain infrastructure is only provisioned in the primary AWS Region. Data backup is in near real time using Amazon EFS replication. Diagram 1 illustrates this architecture.

Diagram 1:

When the primary Region is down, a new domain is launched in the secondary Region, and an AWS Step Functions workflow runs to restore data as seen in diagram 2.

Diagram 2:

In active-active mode depicted in diagram 3, the SageMaker domain infrastructure is provisioned in two AWS Regions. Data backup is in near real time using Amazon EFS replication. The data sync is completed by the Step Functions workflow, and its cadence can be on demand, scheduled, or invoked by an event.

Diagram 3:

You can find the complete code sample in the GitHub repo.


With all the benefits of upgraded SageMaker domains, we developed a fast and robust cross-Region disaster recovery solution that uses Amazon EFS to back up and recover user data stored in SageMaker Studio applications. In addition, domain user profiles and their respective custom POSIX configurations are managed by a YAML file in an AWS Cloud Development Kit (AWS CDK) code base to make sure domain entities in the secondary AWS Region are identical to those in the primary AWS Region. Because user-level custom EFS instances are only configurable through programmatic API calls, creating users on the AWS Management Console is not considered in our context.
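
To illustrate how a user profile and its POSIX configuration can be declared in the AWS CDK, the following minimal sketch uses the L1 CfnUserProfile construct. The stack class, profile name, and UID/GID values are illustrative and not taken from the sample repository.

from aws_cdk import Stack
from aws_cdk import aws_sagemaker as sagemaker
from constructs import Construct

class UserProfileStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, domain_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Hypothetical user profile with a custom POSIX configuration; in the sample
        # repository these values would come from the YAML file described above
        sagemaker.CfnUserProfile(
            self, "DataScientistProfile",
            domain_id=domain_id,
            user_profile_name="data-scientist",
            user_settings=sagemaker.CfnUserProfile.UserSettingsProperty(
                custom_posix_user_config=sagemaker.CfnUserProfile.CustomPosixUserConfigProperty(
                    uid=10000,
                    gid=1001,
                ),
            ),
        )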

Backup

Backup is performed within the primary AWS Region. There are two types of sources: an EBS space and a custom EFS instance.

For an EBS space, a lifecycle config is attached to JupyterLab or Code Editor for the purpose of backing up files. Every time the user opens the application, the lifecycle config takes a snapshot of the application’s EBS volume and stores it in the custom EFS instance using an rsync command.

A custom EFS instance is automatically replicated to a read-only replica in the secondary AWS Region.
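
If you need to set up this replication outside of the provided CDK stacks, the following minimal boto3 sketch shows the underlying EFS API call; the file system ID and Regions are placeholders.

import boto3

efs = boto3.client("efs", region_name="us-east-1")  # primary Region

# Replicate the custom EFS instance to a read-only replica in the secondary Region
efs.create_replication_configuration(
    SourceFileSystemId="fs-0123456789abcdef0",   # placeholder file system ID
    Destinations=[{"Region": "us-west-2"}],      # secondary Region
)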

Recovery

For recovery in the secondary AWS Region, a SageMaker domain with the same user profiles and spaces is deployed, and an empty custom EFS instance is created and attached to it. Then an Amazon Elastic Container Service (Amazon ECS) task runs to copy all the backup files to the empty custom EFS instance. As the last step, a lifecycle config script runs to restore the Amazon EBS snapshots before the SageMaker space is launched.
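
To trigger the recovery programmatically instead of from the Step Functions console, you could start the workflow with boto3, as in the following sketch; the state machine ARN is a placeholder for the one created by the ECSTaskStack-NewStudio stack.

import json
import boto3

sfn = boto3.client("stepfunctions", region_name="us-west-2")  # secondary Region

# Start the disaster recovery workflow; the ARN below is a placeholder
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-west-2:123456789012:stateMachine:SagemakerDrRestore",
    input=json.dumps({}),
)
print(execution["executionArn"])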

Prerequisites

Complete the following prerequisite steps:

  1. Clone the GitHub repo to your local machine by running the following command in your terminal:
    git clone git@github.com:aws-samples/sagemaker-domain-cross-region-disaster-recovery-using-custom-efs.git

  2. Navigate to the project working directory and set up the Python virtual environment:
    python3 -m venv .venv
    source .venv/bin/activate

  3. Install the required dependencies:
    pip3 install -r requirements.txt

  4. Bootstrap your AWS account and set up the AWS CDK environment in both Regions:
    cdk bootstrap aws://ACCOUNT-NUMBER/PRIMARY_REGION
    cdk bootstrap aws://ACCOUNT-NUMBER/SECONDARY_REGION

  5. Synthesize the AWS CloudFormation templates by running the following code:
    cdk synth

  6. Configure the necessary arguments in the constants.py file (see the sketch after this list):
    1. Set the primary Region in which you want to deploy the solution.
    2. Set the secondary Region in which you want to recover the primary domain.
    3. Replace the account ID variable with your AWS account ID.
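
The following is a hypothetical sketch of what constants.py might contain; the variable names are illustrative, so match them to whatever the repository actually defines.

# constants.py (illustrative values only)
PRIMARY_REGION = "us-east-1"     # Region where the primary SageMaker domain is deployed
SECONDARY_REGION = "us-west-2"   # Region used to recover the primary domain
ACCOUNT_ID = "123456789012"      # replace with your AWS account ID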

Deploy the solution

Complete the following steps to deploy the solution:

  1. Deploy the primary SageMaker domain:
    cdk deploy SagemakerDomainPrimaryStack-NewStudio

  2. Deploy the secondary SageMaker domain:
    cdk deploy SagemakerDomainSecondaryStack-NewStudio

  3. Deploy the disaster recovery Step Functions workflow:
    cdk deploy ECSTaskStack-NewStudio

  4. Launch the application with the custom EFS instance attached and add files to the application’s EBS volume and custom EFS instance.

Test the solution

Complete the following steps to test the solution:

  1. Add test files using Code Editor or JupyterLab in the primary Region.
  2. Stop and restart the application.

This invokes the lifecycle config script to take an Amazon EBS snapshot of the application.

  3. On the Step Functions console in the secondary Region, run the disaster recovery Step Functions workflow.

The following figure illustrates the workflow steps.

  4. On the SageMaker console in the secondary Region, launch the same user’s SageMaker Studio.

You will find your files backed up in either Code Editor or JupyterLab.

Clean up

To avoid incurring ongoing charges, clean up the resources you created as part of this post:

  1. Stop all Code Editor and JupyterLab applications.
  2. Delete all the AWS CDK stacks:
    cdk destroy ECSTaskStack-NewStudio
    cdk destroy SagemakerDomainSecondaryStack-NewStudio
    cdk destroy SagemakerDomainPrimaryStack-NewStudio

Conclusion

SageMaker offers a robust and highly available ML platform, enabling data scientists and ML engineers to build, train, and deploy models efficiently. For critical use cases, implementing a comprehensive disaster recovery strategy enhances the resilience of your SageMaker domain, ensuring continuous operation in the unlikely event of regional impairment. This post presents a detailed solution for migrating and safeguarding your SageMaker domain, including user profiles and files, from one active AWS Region to another passive or active AWS Region. By using a strategic combination of AWS services, such as Amazon EFS, Step Functions, and the AWS CDK, this solution overcomes the current limitations within SageMaker Studio and provides continuous access to your valuable data and resources. Whether you choose an active-passive or active-active architecture, this solution provides a robust and resilient backup and recovery mechanism, fortifying your defenses against natural disasters, technical failures, and Regional outages. With this comprehensive guide, you can confidently safeguard your mission-critical and time-sensitive workloads, maintaining business continuity and uninterrupted access to your SageMaker domain, even in the case of unforeseen circumstances.

For more information on disaster recovery on AWS, refer to the following:


About the Authors

Jinzhao Feng is a Machine Learning Engineer at AWS Professional Services. He focuses on architecting and implementing large-scale generative AI and classic ML pipeline solutions. He is specialized in FMOps, LLMOps, and distributed training.

Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.

Natasha Tchir is a Cloud Consultant at the Generative AI Innovation Center, specializing in machine learning. With a strong background in ML, she now focuses on the development of generative AI proof-of-concept solutions, driving innovation and applied research within the GenAIIC.

Katherine Feng is a Cloud Consultant at AWS Professional Services within the Data and ML team. She has extensive experience building full-stack applications for AI/ML use cases and LLM-driven solutions.

Amazon Bedrock Custom Model Import now generally available

Today, we’re pleased to announce the general availability (GA) of Amazon Bedrock Custom Model Import. This feature empowers customers to import and use their customized models alongside existing foundation models (FMs) through a single, unified API. Whether leveraging fine-tuned models like Meta Llama, Mistral Mixtral, and IBM Granite, or developing proprietary models based on popular open-source architectures, customers can now bring their custom models into Amazon Bedrock without the overhead of managing infrastructure or model lifecycle tasks.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock offers a serverless experience, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

With Amazon Bedrock Custom Model Import, customers can access their imported custom models on demand in a serverless manner, freeing them from the complexities of deploying and scaling models themselves. They’re able to accelerate generative AI application development by using native Amazon Bedrock tools and features such as Knowledge Bases, Guardrails, Agents, and more—all through a unified and consistent developer experience.

Benefits of Amazon Bedrock Custom Model Import include:

  1. Flexibility to use existing fine-tuned models: Customers can use their prior investments in model customization by importing existing customized models into Amazon Bedrock without the need to recreate or retrain them. This flexibility maximizes the value of previous efforts and accelerates application development.
  2. Integration with Amazon Bedrock Features: Imported custom models can be seamlessly integrated with the native tools and features of Amazon Bedrock, such as Knowledge Bases, Guardrails, Agents, and Model Evaluation. This unified experience enables developers to use the same tooling and workflows across both base FMs and imported custom models.
  3. Serverless: Customers can access their imported custom models in an on-demand and serverless manner. This eliminates the need to manage or scale underlying infrastructure, as Amazon Bedrock handles all those aspects. Customers can focus on developing generative AI applications without worrying about infrastructure management or scalability issues.
  4. Support for popular model architectures: Amazon Bedrock Custom Model Import supports a variety of popular model architectures, including Meta Llama 3.2, Mistral 7B, Mixtral 8x7B, and more. Customers can import custom weights in formats like Hugging Face Safetensors from Amazon SageMaker and Amazon S3. This broad compatibility allows customers to work with models that best suit their specific needs and use cases, allowing for greater flexibility and choice in model selection.
  5. Leverage the Amazon Bedrock Converse API: Amazon Bedrock Custom Model Import allows customers to use their supported fine-tuned models with the Amazon Bedrock Converse API, which simplifies and unifies access to the models.

Getting started with Custom Model Import

One of the critical requirements from our customers is the ability to customize models with their proprietary data while retaining complete ownership and control over the tuned model artifact and its deployment. Customization could be in the form of domain adaptation or instruction fine-tuning. Customers have a wide range of options for fine-tuning models efficiently and cost-effectively. However, hosting models presents its own unique set of challenges. Customers are looking for some key aspects, namely:

  • Using the existing customization investment and fine-grained control over customization.
  • Having a unified developer experience when accessing custom models or base models through Amazon Bedrock’s API.
  • Ease of deployment through a fully managed, serverless service.
  • Using pay-as-you-go inference to minimize the costs of their generative AI workloads.
  • Being backed by enterprise-grade security and privacy tooling.

The Amazon Bedrock Custom Model Import feature seeks to address these concerns. To bring your custom model into the Amazon Bedrock ecosystem, you need to run an import job. The import job can be invoked using the AWS Management Console or through APIs. In this post, we demonstrate the code for running the import model process through APIs. After the model is imported, you can invoke the model by using the model’s Amazon Resource Name (ARN).

As of this writing, supported model architectures include Meta Llama (v.2, 3, 3.1, and 3.2), Mistral 7B, Mixtral 8x7B, Flan, and IBM Granite models such as Granite 3B-Code, 8B-Code, 20B-Code, and 34B-Code.

A few points to be aware of when importing your model:

  • Models must be serialized in Safetensors format.
  • If you have a different format, you can potentially use Llama convert scripts or Mistral convert scripts to convert your model to a supported format.
  • The import process expects at least the following files: .safetensors, config.json, tokenizer_config.json, tokenizer.json, and tokenizer.model.
  • The precision for the model weights supported is FP32, FP16, and BF16.
  • For fine-tuning jobs that create adapters such as LoRA-PEFT adapters, the import process expects the adapters to be merged into the main base model weights, as described in Model merging (see the sketch after this list).
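
The following is a minimal sketch of merging a LoRA adapter trained with PEFT into the base model weights and saving the result in Safetensors format before import; the base model ID and adapter path are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Meta-Llama-3-8B-Instruct"   # placeholder base model
adapter_path = "./lora-adapter"                          # placeholder adapter directory

base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()                        # fold the LoRA weights into the base weights

merged.save_pretrained("./merged-model", safe_serialization=True)   # writes .safetensors and config.json
AutoTokenizer.from_pretrained(base_model_id).save_pretrained("./merged-model")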

Importing a model using the Amazon Bedrock console

  1. Go to the Amazon Bedrock console and choose Foundation models and then Imported models in the navigation pane to get to the Models page.
  2. Choose Import model to configure the import process.
  3. Configure the model.
    1. Enter the location of your model weights. These can be in Amazon S3 or point to a SageMaker model ARN.
    2. Enter a Job name. We recommend suffixing it with the version of the model. As of now, you need to manage the generative AI operations aspects outside of this feature.
    3. Configure your AWS Key Management Service (AWS KMS) key for encryption. By default, this defaults to a key owned and managed by AWS.
    4. Service access role: You can create a new role or use an existing role that has the necessary permissions to run the import process. The permissions must include access to Amazon S3 if you’re specifying model weights through S3.
  4. After the Import Model job is complete, you will see the model and the model ARN. Make a note of the ARN to use later.
  5. Test the model using the on-demand feature in the Text playground as you would for any base foundation model.

The import process validates that the model configuration complies with the specified architecture for that model by reading the config.json file and validates the model architecture values such as the maximum sequence length and other relevant details. It also checks that the model weights are in the Safetensors format. This validation verifies that the imported model meets the necessary requirements and is compatible with the system.

Fine-tuning a Meta Llama Model on SageMaker

Meta Llama 3.2 offers multi-modal vision and lightweight models, representing Meta’s latest advances in large language models (LLMs). These new models provide enhanced capabilities and broader applicability across various use cases. With a focus on responsible innovation and system-level safety, the Llama 3.2 models demonstrate state-of-the-art performance on a wide range of industry benchmarks and introduce features to help you build a new generation of AI experiences.

SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This gives you multiple options to discover and use hundreds of models for your use case.

In this section, we’ll show you how to fine-tune the Llama 3.2 3B Instruct model using SageMaker JumpStart. We’ll also share the supported instance types and context for the Llama 3.2 models available in SageMaker JumpStart. Although not highlighted in this post, you can also find other Llama 3.2 Model variants that can be fine-tuned using SageMaker JumpStart.

Instruction fine-tuning

The text generation model can be instruction fine-tuned on any text data, provided that the data is in the expected format. The instruction fine-tuned model can be further deployed for inference. The training data must be formatted in a JSON Lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder, but can be saved in multiple JSON Lines files. The training folder can also contain a template.json file describing the input and output formats.

Synthetic dataset

For this use case, we’ll use a synthetically generated dataset named amazon10Ksynth.jsonl in an instruction-tuning format. This dataset contains approximately 200 entries designed for training and fine-tuning LLMs in the finance domain.

The following is an example of the data format:

instruction_sample = {
    "question": "What is Amazon's plan for expanding their physical store footprint and how will that impact their overall revenue?",
    "context": "The 10-K report mentions that Amazon is continuing to expand their physical store network, including 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth.",
    "answer": "Amazon is expanding their physical store footprint, with 611 North America stores and 32 International stores as of the end of 2022. This physical store expansion is expected to contribute to increased product sales and overall revenue growth."
}
 
print(instruction_sample)

Prompt template

Next, we create a prompt template for using the data in an instruction input format for the training job (because we are instruction fine-tuning the model in this example), and for inferencing the deployed endpoint.

import json

prompt_template = {
  "prompt": "question: {question} context: {context}",
  "completion": "{answer}"
}

with open("prompt_template.json", "w") as f:
    json.dump(prompt_template, f)

After the prompt template is created, upload the prepared dataset that will be used for fine-tuning to Amazon S3.

from sagemaker.s3 import S3Uploader
import sagemaker
output_bucket = sagemaker.Session().default_bucket()
local_data_file = "amazon10Ksynth.jsonl"
train_data_location = f"s3://{output_bucket}/amazon10Ksynth_dataset"
S3Uploader.upload(local_data_file, train_data_location)
S3Uploader.upload("prompt_template.json", train_data_location)
print(f"Training data: {train_data_location}")

Fine-tuning the Meta Llama 3.2 3B model

Now, we’ll fine-tune the Llama 3.2 3B model on the financial dataset. The fine-tuning scripts are based on the scripts provided by the Llama fine-tuning repository.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# JumpStart model ID for Llama 3.2 3B (assumed; confirm against the SageMaker JumpStart model catalog)
model_id, model_version = "meta-textgeneration-llama-3-2-3b", "*"

estimator = JumpStartEstimator(
    model_id=model_id,
    model_version=model_version,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
    instance_type="ml.g5.12xlarge",  
)
 
# Set the hyperparameters for instruction tuning
estimator.set_hyperparameters(
    instruction_tuned="True", epoch="5", max_input_length="1024"
)
 
# Fit the model on the training data
estimator.fit({"training": train_data_location})

Importing a custom model from SageMaker to Amazon Bedrock

In this section, we use the Python SDK to create a model import job, get the imported model ID, and finally generate inferences. You can refer to the console screenshots in the earlier section for how to import a model using the Amazon Bedrock console.

Parameter and helper function set up

First, we’ll create a few helper functions and set up our parameters to create the import job. The import job is responsible for collecting and deploying the model from SageMaker to Amazon Bedrock. This is done by using the create_model_import_job function.

Stored safetensors need to be formatted so that the Amazon S3 location is the top-level folder. The configuration files and safetensors will be stored as shown in the following figure.

import json
import boto3
from botocore.exceptions import ClientError

bedrock = boto3.client('bedrock', region_name='us-east-1')
job_name = 'fine-tuned-model-import-demo'
imported_model_name = 'fine-tuned-llama-3-2-3b'              # name the model will have in Amazon Bedrock
role_arn = 'arn:aws:iam::{AWS_ACCOUNT}:role/{IMPORT_ROLE}'   # placeholder: a role with access to the S3 bucket
sagemaker_model_name = 'meta-textgeneration-llama-3-2-3b-2024-10-12-23-29-57-373'
model_url = {'s3DataSource':
                 {'s3Uri':
                      "s3://sagemaker-{REGION}-{AWS_ACCOUNT}/meta-textgeneration-llama-3-2-3b-2024-10-12-23-19-53-906/output/model/"
                  }
             }

# Create the import job; model_url points to the fine-tuned weights produced by SageMaker
response = bedrock.create_model_import_job(jobName=job_name, importedModelName=imported_model_name,
                                           roleArn=role_arn, modelDataSource=model_url)

Check the status and get job ARN from the response:

After a few minutes, the model will be imported, and the status of the job can be checked using get_model_import_job. The job ARN is then used to get the imported model ARN, which we will use to generate inferences.

def get_import_model_from_job(job_name):
    response = bedrock.get_model_import_job(jobIdentifier=job_name)
    return response['importedModelArn']


job_arn = response['jobArn']
import_model_arn = get_import_model_from_job(job_arn)
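
If you prefer to wait for the job in code rather than checking manually, a simple polling loop such as the following works; the status values are assumptions based on typical Amazon Bedrock job states.

import time

def wait_for_import_job(job_identifier, poll_seconds=60):
    # Poll until the import job reaches a terminal state (assumed: Completed or Failed)
    while True:
        job = bedrock.get_model_import_job(jobIdentifier=job_identifier)
        print(f"Import job status: {job['status']}")
        if job["status"] in ("Completed", "Failed"):
            return job
        time.sleep(poll_seconds)

job = wait_for_import_job(job_arn)
import_model_arn = job["importedModelArn"]   # present once the job has completed successfully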

Generating inferences using the imported custom model

The model can be invoked by using the invoke_model and Converse APIs. The following is a helper function that will be used to invoke the model and extract the generated text from the overall output.

from botocore.exceptions import ClientError

client = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_conversation_with_imported_model(native_request, model_id):
    request = json.dumps(native_request)
    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)
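
The same imported model can also be called through the Converse API. The following sketch shows one way to do it; the request shape follows the standard bedrock-runtime converse() parameters, and the prompt text is a placeholder.

def converse_with_imported_model(prompt, model_id):
    # Send a single user message to the imported model through the Converse API
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 100, "temperature": 0.01},
    )
    return response["output"]["message"]["content"][0]["text"]

# Placeholder prompt following the fine-tuned "question: ... context: ..." structure
print(converse_with_imported_model("question: What drove Amazon's revenue growth? context: <retrieved context>", import_model_arn))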

Context set up and model response

Finally, we can use the custom model. First, we format our inquiry to match the fine-tuned prompt structure. This makes sure that the responses generated closely resemble the format used in the fine-tuning phase and are more aligned to our needs. To do this, we use the template that we used to format the data for fine-tuning. The context will come from your RAG solution, such as Amazon Bedrock Knowledge Bases. For this example, we take a sample context and add it to demonstrate the concept:

input_output_demarkation_key = "\n\n### Response:\n"
question = "Tell me what was the improved inflow value of cash?"

context = "Amazon's free cash flow less principal repayments of finance leases and financing obligations improved to an inflow of $46.1 billion for the trailing twelve months, compared with an outflow of $10.1 billion for the trailing twelve months ended March 31, 2023."

# Load the prompt template created earlier (wrapped in a list so template[0] works)
with open("prompt_template.json") as f:
    template = [json.load(f)]

payload = {
    "prompt": template[0]["prompt"].format(
        question=question,                               # user query
        context=context + input_output_demarkation_key   # RAG context plus response marker
    ),
    "max_tokens": 100,
    "temperature": 0.01
}
generate_conversation_with_imported_model(payload, import_model_arn)

The output will look similar to:

After the model has been fine-tuned and imported into Amazon Bedrock, you can experiment by sending different sets of input questions and context to the model to generate a response, as shown in the following example:

question: """How did Amazon's international segment operating income change 
            in Q4 2022 compared to the prior year?"""
context: """Amazon's international segment reported an operating loss of 
            $1.1 billion in Q4 2022, an improvement from a $1.7 billion 
            operating loss in Q4 2021."""
response:

Some points to note

The examples in this post are meant to demonstrate Custom Model Import and aren’t designed to be used in production. Because the model has been trained on only 200 samples of synthetically generated data, it’s only useful for testing purposes. You would ideally have more diverse datasets and additional samples, with continuous experimentation conducted using hyperparameter tuning for your respective use case, thereby steering the model to create a more desirable output. For this post, make sure that the model temperature parameter is set to 0 and the max_tokens runtime parameter is set to a lower value, such as 100–150 tokens, so that a succinct response is generated. You can experiment with other parameters to generate a desirable outcome. See Amazon Bedrock Recipes and GitHub for more examples.

Best practices to consider:

This feature brings significant advantages for hosting your fine-tuned models efficiently. As we continue to develop this feature to meet our customers’ needs, there are a few points to be aware of:

  • Define your test suite and acceptance metrics before starting the journey. Automating this will help to save time and effort.
  • Currently, the model weights need to be all-inclusive, including the adapter weights. There are multiple methods for merging the models and we recommend experimenting to determine the right methodology. The Custom Model Import feature lets you test your model on demand.
  • When creating your import jobs, add versioning to the job name to help quickly track your models. Currently, we’re not offering model versioning, and each import is a unique job and creates a unique model.
  • The precision supported for the model weights is FP32, FP16, and BF16. Run tests to validate that these will work for your use case.
  • The maximum concurrency that you can expect for each model will be 16 per account. Higher concurrency requests will cause the service to scale and increase the number of model copies.
  • The number of model copies active at any point in time is available through Amazon CloudWatch. See Import a customized model to Amazon Bedrock for more information.
  • As of writing this post, we are releasing this feature in the US-EAST-1 and US-WEST-2 AWS Regions only. We will continue to release to other Regions. Follow Model support by AWS Region for updates.
  • The default import quota for each account is three models. If you need more for your use cases, work with your account teams to increase your account quota.
  • The default throttling limits for this feature for each account will be 100 invocations per second.
  • You can use this sample notebook to performance test models imported through this feature. The notebook is a reference only and isn’t designed to be exhaustive. We recommend that you run your own full performance testing, along with end-to-end testing including functional and evaluation testing.

Now available

Amazon Bedrock Custom Model Import is generally available today in Amazon Bedrock in the US-East-1 (N. Virginia) and US-West-2 (Oregon) AWS Regions. See the full Region list for future updates. To learn more, see the Custom Model Import product page and pricing page.

Give Custom Model Import a try in the Amazon Bedrock console today and send feedback to AWS re:Post for Amazon Bedrock or through your usual AWS Support contacts.


About the authors

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is a Sr. Partner Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partners in building strategic industry solutions on AWS.

Claudio Mazzoni is a Sr GenAI Specialist Solutions Architect at AWS working on world-class applications, guiding customers through their implementation of GenAI to reach their goals and improve their business outcomes. Outside of work Claudio enjoys spending time with family, working in his garden and cooking Uruguayan food.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Deploy a serverless web application to edit images using Amazon Bedrock

Generative AI adoption among various industries is revolutionizing different types of applications, including image editing. Image editing is used in various sectors, such as graphic designing, marketing, and social media. Users rely on specialized tools for editing images. Building a custom solution for this task can be complex. However, by using various AWS services, you can quickly deploy a serverless solution to edit images. This approach can give your teams access to image editing foundation models (FMs) using Amazon Bedrock.

Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that’s best suited for your use case. Amazon Bedrock is serverless, so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.

Amazon Titan Image Generator G1 is an AI FM available with Amazon Bedrock that allows you to generate an image from text, or upload and edit your own image. Some of the key features we focus on include inpainting and outpainting.

This post introduces a solution that simplifies the deployment of a web application for image editing using AWS serverless services. We use AWS Amplify, Amazon Cognito, Amazon API Gateway, AWS Lambda, and Amazon Bedrock with the Amazon Titan Image Generator G1 model to build an application to edit images using prompts. We cover the inner workings of the solution to help you understand the function of each service and how they are connected to give you a complete solution. At the time of writing this post, Amazon Titan Image Generator G1 comes in two versions; for this post, we use version 2.

Solution overview

The following diagram provides an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our frontend application. A combination of API Gateway and a Lambda function is used for our backend services, and Amazon Bedrock integrates with the FM model, enabling users to edit the image using prompts.

Solution Overview

Prerequisites

You must have the following in place to complete the solution in this post:

Deploy solution resources using AWS CloudFormation

When you run the AWS CloudFormation template, the following resources are deployed:

  • Amazon Cognito resources:
  • Lambda resources:
    • Function: <Stack name>-ImageEditBackend-<auto-generated>
  • AWS Identity and Access Management (IAM) resources:
    • IAM role: <Stack name>-ImageEditBackendRole-<auto-generated>
    • IAM inline policy: AmazonBedrockAccess (this policy allows Lambda to invoke Amazon Bedrock FM amazon.titan-image-generator-v2:0)
  • API Gateway resources:
    • Rest API: ImageEditingAppBackendAPI
    • Methods:
      • OPTIONS – Added header mapping for CORS
      • POST – Lambda integration
    • Authorization: Through Amazon Cognito using CognitoAuthorizer

After you deploy the CloudFormation template, copy the following from the Outputs tab to be used during the deployment of Amplify:

  • userPoolId
  • userPoolClientId
  • invokeUrl

CFN Output

Deploy the Amplify application

You have to manually deploy the Amplify application using the frontend code found on GitHub. Complete the following steps:

  1. Download the frontend code from the GitHub repo.
  2. Unzip the downloaded file and navigate to the folder.
  3. In the js folder, find the config.js file and replace the values of XYZ for userPoolId, userPoolClientId, and invokeUrl with the values you collected from the CloudFormation stack outputs. Set the region value based on the Region where you’re deploying the solution.

The following is an example config.js file:

window._config = {
    cognito: {
        userPoolId: 'XYZ', // e.g. us-west-2_uXboG5pAb
        userPoolClientId: 'XYZ', // e.g. 25ddkmj4v6hfsfvruhpfi7n4hv
        region: 'XYZ' // e.g. us-west-2
    },
    api: {
        invokeUrl: 'XYZ' // e.g. https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod
    }
};

Extract Update Config File

  4. Select all the files and compress them as shown in the following screenshot.

Make sure you zip the contents and not the top-level folder. For example, if your build output generates a folder named AWS-Amplify-Code, navigate into that folder and select all the contents, and then zip the contents.

Create New Zip File

  5. Use the new .zip file to manually deploy the application in Amplify.

After it’s deployed, you will receive a domain that you can use in later steps to access the application.

AWS Amplify Search Create App

  6. Create a test user in the Amazon Cognito user pool.

An email address is required for this user because you will need to mark the email address as verified.

Cognito Create User

  7. Return to the Amplify page and use the domain it automatically generated to access the application.

Use Amazon Cognito for user authentication

Amazon Cognito is an identity platform that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the image editing application.

Upon accessing the Image Editing Tool URL, you will be prompted to sign in with a previously created test user. For first-time sign-ins, users will be asked to update their password. After this process, the user’s credentials are validated against the records stored in the user pool. If the credentials match, Amazon Cognito will issue a JSON Web Token (JWT). In the API payload to be sent section of the page, you will notice that the Authorization field has been updated with the newly issued JWT.
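
For testing outside the browser, you can reproduce this flow with the AWS SDK: authenticate against the user pool, then pass the returned ID token in the Authorization header. The following is a hedged sketch; the client ID, endpoint URL, credentials, and request body are placeholders (the real payload shape is shown in the application's API payload to be sent panel), and the USER_PASSWORD_AUTH flow must be enabled on the app client.

import boto3
import requests

cognito = boto3.client("cognito-idp", region_name="us-west-2")

# Authenticate the test user against the Cognito user pool (placeholder values)
auth = cognito.initiate_auth(
    ClientId="25ddkmj4v6hfsfvruhpfi7n4hv",       # userPoolClientId from the CloudFormation outputs
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "test-user@example.com", "PASSWORD": "YourPassword1!"},
)
id_token = auth["AuthenticationResult"]["IdToken"]

# Call the API Gateway endpoint with the JWT in the Authorization header
response = requests.post(
    "https://rc7nyt4tql.execute-api.us-west-2.amazonaws.com/prod",   # invokeUrl from the outputs
    headers={"Authorization": id_token, "Content-Type": "application/json"},
    json={"prompt": "Make the driveway clear and empty"},            # placeholder request body
)
print(response.status_code)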

Use Lambda for backend code and Amazon Bedrock for generative AI function

The backend code is hosted on Lambda and launched by user requests routed through API Gateway. The Lambda function processes the request payload and forwards it to Amazon Bedrock. The reply from Amazon Bedrock follows the same route as the initial request.

Use API Gateway for API management

API Gateway streamlines API management, allowing developers to deploy, maintain, monitor, secure, and scale their APIs effortlessly. In our use case, API Gateway serves as the orchestrator for the application logic and provides throttling to manage the load to the backend. Without API Gateway, you would need to use the JavaScript SDK in the frontend to interact directly with the Amazon Bedrock API, bringing more work to the frontend.

Use Amplify for frontend code

Amplify offers a development environment for building secure, scalable mobile and web applications. It allows developers to focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our frontend code using the method outlined earlier in this post.

Image editing tool walkthrough

Navigate to the URL provided after you created the application in Amplify and sign in. On your first login attempt, you’ll be asked to reset your password.

App Login

As you follow the steps for this tool, you will notice the API Payload to be Sent section on the right side updating dynamically, reflecting the details mentioned in the corresponding steps that follow.

Step 1: Create a mask on your image

To create a mask on your image, choose a file (JPEG, JPG, or PNG).

After the image is loaded, the frontend converts the file into base64 and the base_image value is updated.

As you select the portion of the image you want to edit, a mask is created, and the mask value is updated with a new base64 value. You can also use the stroke size option to adjust the area you are selecting.

You now have the original image and the mask image encoded in base64. (The Amazon Titan Image Generator G1 model requires the inputs to be in base64 encoding.)

Choose File and Create Mask

Step 2: Write a prompt and set your options

Write a prompt that describes what you want to do with the image. For this example, we enter Make the driveway clear and empty. This is reflected in the prompt on the right.

You can choose from the following image editing options: inpainting and outpainting. The value for mode is updated depending on your selection.

  • Use inpainting to remove masked elements and replace them with background pixels
  • Use outpainting to extend the pixels of the masked image to the image boundaries

Choose Send to API to send the payload to API Gateway. This action invokes the Lambda function, which validates the received payload. If the payload is validated successfully, the Lambda function proceeds to invoke the Amazon Bedrock API for further processing.

The Amazon Bedrock API generates two image outputs in base64 format, which are transmitted back to the frontend application and rendered as visual images.
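
The following is a hedged sketch of the kind of call the Lambda function might make to Amazon Bedrock; the field names follow the Amazon Titan Image Generator v2 request schema, so verify them against the current model documentation before relying on them.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def edit_image(base_image_b64, mask_b64, prompt, mode="INPAINTING"):
    # Build an inpainting or outpainting request for Titan Image Generator v2
    params_key = "inPaintingParams" if mode == "INPAINTING" else "outPaintingParams"
    body = {
        "taskType": mode,                      # INPAINTING or OUTPAINTING
        params_key: {
            "image": base_image_b64,           # original image, base64 encoded
            "maskImage": mask_b64,             # mask drawn in the frontend, base64 encoded
            "text": prompt,                    # e.g. "Make the driveway clear and empty"
        },
        "imageGenerationConfig": {"numberOfImages": 2},
    }
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-image-generator-v2:0",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["images"]   # list of base64-encoded result images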

Prompt

Step 3: View and download the result

The following screenshot shows the results of our test. You can download the results or provide an updated prompt to get a new output.

Download

Testing and troubleshooting

When you initiate the Send to API action, the system performs a validation check. If required information is missing or incorrect, it will display an error notification. For instance, if you attempt to send an image to the API without providing a prompt, an error message will appear on the right side of the interface, alerting you to the missing input, as shown in the following screenshot.

App Error

Clean up

If you decide to discontinue using the Image Editing Tool, you can follow these steps to remove the Image Editing Tool, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:

  1. Delete the CloudFormation stack:
    1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
    2. Locate the stack you created during the deployment process (you assigned a name to it).
    3. Select the stack and choose Delete.
  2. Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.

Conclusion

In this post, we explored a sample solution that you can use to deploy an image editing application by using AWS serverless services and generative AI services. We used Amazon Bedrock and an Amazon Titan FM that allows you to edit images by using prompts. By adopting this solution, you gain the advantage of using AWS managed services, so you don’t have to maintain the underlying infrastructure. Get started today by deploying this sample solution.

Additional resources

To learn more about Amazon Bedrock, see the following resources:

To learn more about the Amazon Titan Image Generator G1 model, see the following resources:


About the Authors

Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.

Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.

Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.

Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.

Brilliant words, brilliant writing: Using AWS AI chips to quickly deploy Meta LLama 3-powered applications

Many organizations are building generative AI applications powered by large language models (LLMs) to boost productivity and build differentiated experiences. These LLMs are large and complex and deploying them requires powerful computing resources and results in high inference costs. For businesses and researchers with limited resources, the high inference costs of generative AI models can be a barrier to enter the market, so more efficient and cost-effective solutions are needed. Most generative AI use cases involve human interaction, which requires AI accelerators that can deliver real time response rates with low latency. At the same time, the pace of innovation in generative AI is increasing, and it’s becoming more challenging for developers and researchers to quickly evaluate and adopt new models to keep pace with the market.

One way to get started with LLMs such as Llama and Mistral is by using Amazon Bedrock. However, customers who want to deploy LLMs in their own self-managed workflows for greater control and flexibility of underlying resources can use these LLMs optimized on top of AWS Inferentia2-powered Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances. In this blog post, we introduce how to use an Amazon EC2 Inf2 instance to cost-effectively deploy multiple industry-leading LLMs on AWS Inferentia2, a purpose-built AWS AI chip, helping customers quickly test the models and expose an API interface that facilitates performance benchmarking and downstream application calls.

Model introduction

There are many popular open source LLMs to choose from, and for this blog post, we will review three different use cases based on model expertise using Meta-Llama-3-8B-Instruct, Mistral-7B-instruct-v0.2, and CodeLlama-7b-instruct-hf.

| Model name | Release company | Number of parameters | Release time | Model capabilities |
| Meta-Llama-3-8B-Instruct | Meta | 8 billion | April 2024 | Language understanding, translation, code generation, inference, chat |
| Mistral-7B-Instruct-v0.2 | Mistral AI | 7.3 billion | March 2024 | Language understanding, translation, code generation, inference, chat |
| CodeLlama-7b-Instruct-hf | Meta | 7 billion | August 2023 | Code generation, code completion, chat |

Meta-Llama-3-8B-Instruct is a popular language model released by Meta AI in April 2024. The Llama 3 model has improved pre-training, instruction comprehension, output generation, coding, inference, and math skills. The Meta AI team says that Llama 3 has the potential to be the initiator of a new wave of innovation in AI. The Llama 3 model is available in two publicly released versions, 8B and 70B. At the time of writing, Llama 3.1 instruction-tuned models are available in 8B, 70B, and 405B versions. In this blog post, we will use the Meta-Llama-3-8B-Instruct model, but the same process can be followed for Llama 3.1 models.

Mistral-7B-instruct-v0.2, released by Mistral AI in March 2024, marks a major milestone in the development of the publicly available foundation model. With its impressive performance, efficient architecture, and wide range of features, Mistral 7B v0.2 sets a new standard for user-friendly and powerful AI tools. The model excels at tasks ranging from natural language processing to coding, making it an invaluable resource for researchers, developers, and businesses. In this blog post, we will use the Mistral-7B-instruct-v0.2 model, but the same process can be followed for the Mistral-7B-instruct-v0.3 model.

CodeLlama-7b-instruct-hf is a collection of models published by Meta AI. It is an LLM that uses text prompts to generate code. Code Llama is aimed at code tasks, making developers’ workflow faster and more efficient and lowering the learning threshold for coders. Code Llama has the potential to be used as a productivity and educational tool to help programmers write more powerful and well-documented software.

Solution architecture

The solution uses a client-server architecture, and the client uses the HuggingFace Chat UI to provide a chat page that can be accessed on a PC or mobile device. Server-side model inference uses Hugging Face’s Text Generation Inference, an efficient LLM inference framework that runs in a Docker container. We pre-compiled the models using Hugging Face’s Optimum Neuron and uploaded the compilation results to the Hugging Face Hub. We also added a model switching mechanism to the HuggingFace Chat UI to control the loading of different models in the Text Generation Inference container through a scheduler.

Solution highlights

  1. All components are deployed on a single-chip Inf2 instance (inf2.xl or inf2.8xl), and users can experience the effects of multiple models on one instance.
  2. With the client-server architecture, users can flexibly replace either the client or the server side according to their actual needs. For example, the model can be deployed in Amazon SageMaker, and the frontend Chat UI can be deployed on the Node server. To facilitate the demonstration, we deployed both the front and back ends on the same Inf2 server.
  3. Using a publicly available framework, users can customize frontend pages or models according to their own needs.
  4. Using an API interface for Text Generation Inference facilitates quick access for users using the API.
  5. Deployment using AWS CloudFormation, suitable for all types of businesses and developers within the enterprise.

Main components

The following are the main components of the solution.

Hugging Face Optimum Neuron

Optimum Neuron is an interface between the HuggingFace Transformers library and the AWS Neuron SDK. It provides a set of tools for model loading, training, and inference for single- and multiple-accelerator setups across different downstream tasks. In this article, we mainly used Optimum Neuron’s export interface. To deploy a HuggingFace Transformers model on Neuron devices, the model needs to be compiled and exported to a serialized format before inference is performed. The export interface pre-compiles the model (ahead-of-time (AOT) compilation) using the Neuron compiler (neuronx-cc) and converts it into a serialized and optimized TorchScript module. This is shown in the following figure.

During the compilation process, we introduced a tensor parallelism mechanism to split the weights, data, and computations between the two NeuronCores. For more compilation parameters, see Export a model to Inferentia.
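
As an illustration, the following sketch compiles Meta-Llama-3-8B-Instruct ahead of time with Optimum Neuron's NeuronModelForCausalLM interface; the batch size, sequence length, and output directory are assumptions, and num_cores=2 splits the weights across the two NeuronCores of an Inferentia2 chip through tensor parallelism.

from optimum.neuron import NeuronModelForCausalLM

compiler_args = {"num_cores": 2, "auto_cast_type": "fp16"}   # tensor parallelism across 2 NeuronCores
input_shapes = {"batch_size": 1, "sequence_length": 4096}    # static shapes required by neuronx-cc

# export=True triggers ahead-of-time (AOT) compilation with the Neuron compiler
model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    export=True,
    **compiler_args,
    **input_shapes,
)

# Save the serialized artifacts; these can then be uploaded to the Hugging Face Hub
model.save_pretrained("llama3-8b-neuron")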

Hugging Face’s Text Generation Inference (TGI)

Text Generation Inference (TGI) is a framework written in Rust and Python for deploying and serving LLMs. TGI provides high performance text generation services for the most popular publicly available foundation LLMs. Its main features are:

  • Simple launcher that provides inference services for many LLMs
  • Supports both generate and stream interfaces
  • Token stream using server-sent events (SSE)
  • Supports AWS Inferentia, AWS Trainium, NVIDIA GPUs, and other accelerators

HuggingFace Chat UI

HuggingFace Chat UI is an open-source chat tool built with SvelteKit that can be deployed to Cloudflare, Netlify, Node, and so on. It has the following main features:

  • Page can be customized
  • Conversation records can be stored, and chat records are stored in MongoDB
  • Supports operation on PC and mobile terminals
  • The backend can connect to Text Generation Inference and supports API interfaces such as Anthropic, Amazon SageMaker, and Cohere
  • Compatible with various publicly available foundation models (Llama series, Mistral/Mixtral series, Falcon, and so on)

Thanks to the page customization capabilities of the Hugging Chat UI, we’ve added a model switching function, so users can switch between different models on the same EC2 Inf2 instance.

Solution deployment

  1. Before deploying the solution, make sure you have an inf2.xl or inf2.8xl usage quota in the us-east-1 (Virginia) or us-west-2 (Oregon) AWS Region. See the reference link for how to apply for a quota.
  2. Sign in to the AWS Management Console and switch the Region to us-east-1 (Virginia) or us-west-2 (Oregon) in the upper right corner of the console page.
  3. Enter CloudFormation in the service search box and choose Create stack.
  4. Select Choose an existing template, and then select Amazon S3 URL.
  5. If you plan to use an existing virtual private cloud (VPC), use the steps under Use an existing VPC; if you plan to create a new VPC to deploy, use the steps under Create a new VPC.
    1. Use an existing VPC.
      1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_default_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
      2. Stack name: Enter the stack name.
      3. InstanceType: select inf2.xl (lower cost) or inf2.8xl (better performance).
      4. KeyPairName (optional): if you want to sign in to the Inf2 instance, enter the KeyPairName name.
      5. VpcId: Select VPC.
      6. PublicSubnetId: Select a public subnet.
      7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
      8. Choose Next, then Next again. Choose Submit.
    2. Create a new VPC.
      1. Enter https://zz-common.s3.amazonaws.com/tmp/tgiui/20240501/launch_server_new_vpc_ubuntu22.04.yaml in the Amazon S3 URL.
      2. Stack name: Enter the stack name.
      3. InstanceType: Select inf2.xl or inf2.8xl.
      4. KeyPairName (optional): If you want to sign in to the Inf2 instance, enter the KeyPairName name.
      5. VpcId: Leave as New.
      6. PublicSubnetId: Leave as New.
      7. VolumeSize: Enter the size of the EC2 instance EBS storage volume. The minimum value is 80 GB.
      8. Choose Next, and then Next again. Then choose Submit.
  6. After creating the stack, wait for the resources to be created and started (about 15 minutes). After the stack status is displayed as CREATE_COMPLETE, choose Outputs. Choose the URL whose key corresponds to Public endpoint for the web server (close all VPN connections and firewall programs first).

User interface

After the solution is deployed, users can access the preceding URL on a PC or mobile phone. On the page, the Llama3-8B model is loaded by default. Users can switch models in the menu settings: select the model name to activate in the model list and choose Activate to switch models. Switching models requires reloading the new model into the Inferentia2 accelerator memory. This process takes about 1 minute. During this process, users can check the loading status of the new model by choosing Retrieve model status. If the status is Available, the new model has been successfully loaded.

The effects of the different models are shown in the following figure:

The following figure shows the solution in a browser on a PC:

API interface and performance testing

The solution uses a Text Generation Inference server, which supports the /generate and /generate_stream interfaces and uses port 8080 by default. You can make API calls by replacing <IP> in the following commands with the IP address of the instance deployed previously.

The /generate interface is used to return all responses to the client at once after generating all tokens on the server side.

curl <IP>:8080/generate \
    -X POST \
    -d '{"inputs": "Calculate the distance from Beijing to Shanghai"}' \
    -H 'Content-Type: application/json'

The /generate_stream interface reduces waiting time and improves the user experience by streaming tokens one by one, which is especially useful when the model output is long.

curl <IP>:8080/generate_stream \
    -X POST \
    -d '{"inputs": "Write an essay on the mental health of elementary school students with no more than 300 words. "}' \
    -H 'Content-Type: application/json'

The following sample code calls the /generate interface using the requests library in Python.

import requests

url = "http://<IP>:8080/generate"
headers = {"Content-Type": "application/json"}
data = {
    "inputs": "Calculate the distance from Beijing to Shanghai",
    "parameters": {"max_new_tokens": 200},
}

response = requests.post(url, headers=headers, json=data)
print(response.text)
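If you want to consume the /generate_stream interface from Python, the following sketch reads the server-sent events line by line. It assumes the server returns data:-prefixed JSON chunks with a token.text field, which is the typical Text Generation Inference response format; adjust the parsing if your deployment differs.

import json

import requests

url = "http://<IP>:8080/generate_stream"
headers = {"Content-Type": "application/json"}
data = {
    "inputs": "Write an essay on the mental health of elementary school students.",
    "parameters": {"max_new_tokens": 300},
}

# stream=True keeps the connection open so tokens can be printed as they arrive.
with requests.post(url, headers=headers, json=data, stream=True) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        print(event["token"]["text"], end="", flush=True)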

Summary

In this blog post, we introduced methods and examples for deploying popular LLMs on AWS AI chips, so that users can quickly experience the productivity improvements provided by LLMs. The models deployed on Inf2 instances have been validated across multiple users and scenarios, showing strong performance and wide applicability. AWS is continuously expanding its application scenarios and features to provide users with efficient and economical computing capabilities. See Inf2 Inference Performance to check the types and list of models supported on the Inferentia2 chip. Contact us to give feedback on your needs or ask questions about deploying LLMs on AWS AI chips.

About the authors

Zheng Zhang is a technical expert for Amazon Web Services machine learning products, focusing on AWS-based accelerated computing and GPU instances. He has rich experience in large-scale model training and inference acceleration in machine learning.

Bingyang Huang is a Go-To-Market Specialist of Accelerated Computing at the GCR SSO GenAI team. She has experience deploying AI accelerators in customers’ production environments. Outside of work, she enjoys watching films and exploring good food.

Tian Shi is a Senior Solutions Architect at Amazon Web Services. He has rich experience in cloud computing, data analysis, and machine learning and is currently dedicated to research and practice in the fields of data science, machine learning, and serverless. His translations include Machine Learning as a Service, DevOps Practices Based on Kubernetes, Practical Kubernetes Microservices, Prometheus Monitoring Practice, and CoreDNS Study Guide in the Cloud Native Era.

Chuan Xie is a Senior Solutions Architect at Amazon Web Services Generative AI, responsible for the design, implementation, and optimization of generative artificial intelligence solutions based on the Amazon Cloud. River has many years of production and research experience in the communications, ecommerce, internet, and other industries, and rich practical experience in data science, recommendation systems, LLM RAG, and others. He has multiple AI-related product technology invention patents.

Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 2

In Part 1 of this series, we explored best practices for creating accurate and reliable agents using Amazon Bedrock Agents. Amazon Bedrock Agents help you accelerate generative AI application development by orchestrating multistep tasks. Agents use the reasoning capability of foundation models (FMs) to create a plan that decomposes the problem into multiple steps. The model is augmented with the developer-provided instruction to create an orchestration plan and then carry out the plan. The agent can use company APIs and external knowledge through Retrieval Augmented Generation (RAG).

In this second part, we dive into the architectural considerations and development lifecycle practices that can help you build robust, scalable, and secure intelligent agents. Whether you are just starting to explore the world of conversational AI or looking to optimize your existing agent deployments, this comprehensive guide can provide valuable long-term insights and practical tips to help you achieve your goals.

Enable comprehensive logging and observability

From the outset of your agent development journey, you should implement thorough logging and observability practices. This is crucial for debugging, auditing, and troubleshooting your agents. The first step to achieve comprehensive logging is to enable Amazon Bedrock model invocation logging to capture prompts and responses securely in your account.
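As an illustration, the following sketch enables model invocation logging to Amazon CloudWatch Logs with the AWS SDK for Python (Boto3). The log group name and IAM role ARN are placeholders that you would create beforehand.

import boto3

bedrock = boto3.client("bedrock")

# The log group and role are placeholders; the role must allow Amazon Bedrock
# to write to the log group.
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)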

Amazon Bedrock Agents also provides you with traces, a detailed overview of the steps being orchestrated by the agents, the underlying prompts invoking the FM, the references being returned from the knowledge bases, and code being generated by the agent. Trace events are streamed in real time, which allows you to customize UX cues to keep the end-user informed about the progress of their request. You can log your agent’s traces and use them to track and troubleshoot your agents.
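For example, a minimal sketch of invoking an agent with tracing enabled and forwarding the trace events to your logging pipeline might look like the following; the agent ID, alias ID, and session ID are placeholders for your own agent.

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Agent, alias, and session identifiers are placeholders.
response = runtime.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="ALIAS_ID",
    sessionId="session-001",
    inputText="How many vacation days do I have left?",
    enableTrace=True,
)

completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
    elif "trace" in event:
        # Each trace event describes an orchestration step; ship these to your logs.
        print(event["trace"]["trace"])

print(completion)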

When moving agent applications to production, it’s a best practice to set up a monitoring workflow to continuously analyze your logs. You can do so by either creating a custom solution or using an open source solution such as Bedrock-ICYM.

Use infrastructure as code

Just as you would with any other software development project, you should use infrastructure as code (IaC) frameworks to facilitate iterative and reliable deployment. This lets you create repeatable and production-ready agents that can be readily reproduced, tested, and monitored. Amazon Bedrock Agents allows you to write IaC code with AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform. We also recommend that you get started using our Agent Blueprints construct. We provide blueprint templates of the most common capabilities of Amazon Bedrock Agents, which can be deployed and updated with a single AWS CDK command.

When creating agents that use action groups, you can specify your function definitions as a JSON object to the agent or provide an API schema in the OpenAPI schema format. If you already have an OpenAPI schema for your application, the best practice is to start with it. Make sure the functions have proper natural language descriptions, because your agent will use them to understand when to use each function. If you’re starting with no existing schema, the simplest way to provide tool metadata for your agent is to use simple JSON function definitions. Either way, you can use the Amazon Bedrock console to quickly create a default AWS Lambda function to get started implementing your actions or tools.
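As a hedged illustration, the following sketch creates an action group from a simple JSON function definition using the AWS SDK for Python (Boto3). The function name, parameters, and Lambda ARN are hypothetical; the natural language descriptions are what the agent relies on to decide when to call the function.

import boto3

agents = boto3.client("bedrock-agent")

# Function and parameter names are hypothetical examples for an HR assistant.
agents.create_agent_action_group(
    agentId="AGENT_ID",
    agentVersion="DRAFT",
    actionGroupName="time-off-actions",
    actionGroupExecutor={
        "lambda": "arn:aws:lambda:us-east-1:123456789012:function:time-off-handler"
    },
    functionSchema={
        "functions": [
            {
                "name": "book_time_off",
                "description": "Books time off for the employee starting on the given date.",
                "parameters": {
                    "start_date": {
                        "type": "string",
                        "description": "First day of the time off (YYYY-MM-DD)",
                        "required": True,
                    },
                    "number_of_days": {
                        "type": "integer",
                        "description": "Number of consecutive days to book",
                        "required": True,
                    },
                },
            }
        ]
    },
)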

After you start to scale the development of agents, you should consider the reusability of the agent’s components. Using IaC will allow you to have predefined guardrails using Amazon Bedrock Guardrails, knowledge bases using Amazon Bedrock Knowledge Bases, and action groups that are reused over multiple agents.

Building agents that run tasks requires function definitions and Lambda functions. Another best practice is to use generative AI to accelerate the development and maintenance of this code. You can do so directly with the invoke model functionality in Amazon Bedrock, by using Amazon Q Developer, or even by creating an AWS PartyRock application that scaffolds your Lambda function based on your action group metadata. You can directly generate the IaC required for creating your agents with function definitions and Lambda connections using generative AI. Independently of the approach selected, creating a test pipeline that validates and runs the IaC will help you optimize your agent solutions.

Use SessionState for additional agent context

You can use SessionState to provide additional context to your agent. You can pass information that is only available to the Lambda function in the action groups using SessionAttribute and information that should be available to your prompt as SessionPromptAttribute. For example, if you want to pass a user authentication token for your action to use, it’s best placed as a SessionAttribute. If you want to pass information that the large language model (LLM) needs to reason about, such as the current date and timestamp to define relative dates, it’s best placed as a SessionPromptAttribute. This lets your agent infer things like the number of days before your next payment due date or how many hours it has been since you placed your order using the reasoning capabilities of the underlying LLM model.
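The following is a minimal sketch of passing both kinds of attributes when invoking the agent; the token and identifiers are placeholders. Note that in the runtime API these attributes surface as sessionAttributes and promptSessionAttributes within sessionState.

import datetime

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# The auth token is visible only to the action group Lambda function, while the
# current date is injected into the prompt so the LLM can reason about relative dates.
response = runtime.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="ALIAS_ID",
    sessionId="session-001",
    inputText="How many days until my next payment is due?",
    sessionState={
        "sessionAttributes": {"userAuthToken": "PLACEHOLDER_TOKEN"},
        "promptSessionAttributes": {"currentDate": datetime.date.today().isoformat()},
    },
)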

Optimize model selection for cost and performance

A key part of the agent building process is to select the underlying FM for your agent (or for each sub-agent). Experiment with available FMs to select the best one for your application based on cost, latency, and accuracy requirements. Implement automated testing pipelines to collect evaluation metrics, enabling data-driven decisions on model selection. This approach allows you to use faster, cheaper models like Anthropic’s Claude 3 Haiku on Amazon Bedrock for simple agents, and more complex applications can use more advanced models like Anthropic’s Claude 3.5 Sonnet or Anthropic’s Claude 3 Opus.

Implement robust testing frameworks

Automating the evaluation of your agent, or any generative AI-powered system, can accelerate the development process and make sure you provide your customers with the best possible solution. You should evaluate on multiple dimensions, including cost, latency, and accuracy of your agents. Use frameworks like Agent Evaluation to assess agent behavior against predefined criteria. By using the Amazon Bedrock agent versioning and alias features, you can unlock A/B testing as part of your deployment stages. You should define different aspects of agent behavior, such as formal or informal HR assistant tone, that can be tested with a subset of your user group. You can then make different agent versions available for each group during initial deployments and evaluate the agent behavior for each group. Amazon Bedrock Agents has built-in versioning capabilities to help you with this key part of testing. The following figure shows how the HR agent can be updated after a testing and evaluation phase to create a new alias pointing to the selected version of the agent for the model invocation.
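As an illustration of this versioning workflow, the following sketch creates an alias that points one test group at a specific, evaluated version of the agent; the agent ID, alias name, and version number are placeholders.

import boto3

agents = boto3.client("bedrock-agent")

# Point the group B alias at version 2 of the agent after evaluation.
agents.create_agent_alias(
    agentId="AGENT_ID",
    agentAliasName="hr-assistant-group-b",
    description="Alias used for the formal-tone variant during A/B testing",
    routingConfiguration=[{"agentVersion": "2"}],
)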

Use LLMs for test case generation

You can use LLMs to generate test cases based on expected use cases for your agent. As a best practice, you should select a different LLM to generate data than the one that is powering your agent. This approach can significantly accelerate the building of comprehensive test suites, providing thorough coverage of potential scenarios. For example, you could use the following prompt to create test cases for an HR assistant agent that helps employees booking holidays:

Generate the conversation back and forth between an employee and an employee 
assistant agent. The employee is trying to reserve time off. 
The agent has access to functions for checking the available employee's time off, 
booking and updating time off, and sending notifications that a new time off booking 
has been completed. Here's a sample conversation between an employee and an employee 
assistant agent for booking time off. Your conversation should have at least 3 
interactions between the agent and the employee. The employee starts by saying hello.

Design robust confirmation and security mechanisms

Implement robust confirmation mechanisms for critical actions in your agent’s workflow. Clearly state in your instructions that the agent should ask for user confirmation before running certain functions, especially those that modify data or perform sensitive operations. This step helps move beyond proof of concept or prototype stages, verifying that your agent operates reliably in production environments. For instance, the following instruction tells your agent to confirm that a vacation request action should be run before updating the database for the user:

You are an HR agent, helping employees … [other instructions removed for brevity]

Before creating, editing or deleting a time-off request, ask for user confirmation
for your actions. Include sufficient information with that ask to be clear about
the action that will be taken. DO NOT provide the function name itself but rather focus
on the actions being executed using natural language.

You can also use the requireConfirmation field in a function schema definition, or the x-requireConfirmation field in an API schema definition, when creating a new action to enable the built-in Amazon Bedrock Agents functionality that requests user confirmation before invoking an action in an action group.
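For example, a function definition with the built-in confirmation flow enabled could look like the following sketch; the function name and parameter are hypothetical.

# Enable the built-in confirmation flow for a sensitive function; the function
# name and parameter are hypothetical.
function_schema = {
    "functions": [
        {
            "name": "delete_time_off_request",
            "description": "Deletes an existing time-off request for the employee.",
            "parameters": {
                "request_id": {
                    "type": "string",
                    "description": "Identifier of the request to delete",
                    "required": True,
                }
            },
            "requireConfirmation": "ENABLED",
        }
    ]
}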

Implement flexible authorization and encryption

You should provide customer managed keys to encrypt your agent’s resources, and confirm that your AWS Identity and Access Management (IAM) permissions follow the least privilege approach, limiting your agent to only have access to required resources and actions. When implementing action groups, take advantage of the sessionAttributes parameter of your sessionState to provide information about your user roles and permissions so that your action can implement fine-grained permissions (see the following sample code). Another best practice is to use the knowledgeBaseConfigurations parameter of the sessionState to provide extra configurations to your knowledge base, such as the user group defining the documents that a user should have access to through knowledge base metadata filtering.
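The following is a minimal sketch of the idea when invoking the agent; the user role attribute, knowledge base ID, and the department metadata key are illustrative assumptions. The action group Lambda function reads sessionAttributes to enforce fine-grained permissions, and the knowledge base filter restricts retrieval to documents the user is allowed to access.

import boto3

runtime = boto3.client("bedrock-agent-runtime")

# Role name, knowledge base ID, and metadata key/value are placeholders.
response = runtime.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="ALIAS_ID",
    sessionId="session-001",
    inputText="Summarize the latest HR policy updates.",
    sessionState={
        "sessionAttributes": {"userRole": "hr-manager"},
        "knowledgeBaseConfigurations": [
            {
                "knowledgeBaseId": "KB_ID",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "filter": {"equals": {"key": "department", "value": "hr"}}
                    }
                },
            }
        ],
    },
)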

Integrate responsible AI practices

When developing generative AI applications, you should apply responsible AI practices to create systems in an ethical, transparent, and accountable manner. Amazon Bedrock features help you develop your responsible AI practices in a scalable manner. When creating agents, you should implement Amazon Bedrock Guardrails to avoid sensitive topics, filter user input and agent output from harmful content, and redact sensitive information to protect user privacy. You can create organization-level guardrails that can be reused across multiple generative AI applications, thereby preserving consistent responsible AI practices. After you create a guardrail, you can associate it with your agent using the Amazon Bedrock Agents built-in guardrails connection (see the following sample code).
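A minimal sketch of associating an existing guardrail with an agent follows; the guardrail identifier, version, and the other agent attributes are placeholders for your own resources.

import boto3

agents = boto3.client("bedrock-agent")

# All identifiers, the role ARN, and the model ID are placeholders.
agents.update_agent(
    agentId="AGENT_ID",
    agentName="hr-assistant",
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",
    foundationModel="anthropic.claude-3-haiku-20240307-v1:0",
    guardrailConfiguration={
        "guardrailIdentifier": "GUARDRAIL_ID",
        "guardrailVersion": "1",
    },
)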

Build a reusable actions catalog and scale gradually

After the successful deployment of your first agent, you can plan to reuse common functionalities, such as action groups, knowledge bases, and guardrails, for other applications. Amazon Bedrock Agents support the creation of agents manually using the AWS Management Console, using code with the SDKs available for the agent API, or using IaC with CloudFormation templates, the AWS CDK, or Terraform templates. To reuse functionality, the best practice is to create and deploy them using IaC and reuse the components across applications. The following figure shows an example of the reusability of a utilities action group across two agents: an HR assistant and a banking assistant.

Follow a crawl-walk-run methodology when scaling agent usage

The final best practice that we would like to highlight is to follow the crawl-walk-run methodology. Start with an internal application (crawl), followed with applications made available for a smaller, controlled set of external users (walk), and finally scale your applications to all customers (run) and eventually use multi-agent collaboration. This approach helps you build reliable agents that support mission-critical business operations, while minimizing risks associated with the rollout of new technology. The following figure illustrates this process.

Conclusion

By following these architectural and development lifecycle best practices, you’ll be well-equipped to create robust, scalable, and secure agents that can effectively serve your users and integrate seamlessly with your existing systems.

For examples to get started, check out the Amazon Bedrock samples repository. To learn more about Amazon Bedrock Agents, get started with the Amazon Bedrock Workshop and the standalone Amazon Bedrock Agents Workshop, which provides a deeper dive. Additionally, check out the service introduction video from AWS re:Invent 2023.


About the Authors

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Navneet Sabbineni is a Software Development Manager at AWS Bedrock. With over 9 years of industry experience as a software developer and manager, he has worked on building and maintaining scalable distributed services for AWS, including generative AI services like Amazon Bedrock Agents and conversational AI services like Amazon Lex. Outside of work, he enjoys traveling and exploring the Pacific Northwest with his family and friends.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Train, optimize, and deploy models on edge devices using Amazon SageMaker and Qualcomm AI Hub

This post is co-written with Rodrigo Amaral, Ashwin Murthy, and Meghan Stronach from Qualcomm.

In this post, we introduce an innovative solution for end-to-end model customization and deployment at the edge using Amazon SageMaker and Qualcomm AI Hub. This seamless cloud-to-edge AI development experience enables developers to create optimized, highly performant, and custom managed machine learning solutions where you can bring your own model (BYOM) and bring your own data (BYOD) to meet varied business requirements across industries. From real-time analytics and predictive maintenance to personalized customer experiences and autonomous systems, this approach caters to diverse needs.

We demonstrate this solution by walking you through a comprehensive step-by-step guide on how to fine-tune YOLOv8, a real-time object detection model, on Amazon Web Services (AWS) using a custom dataset. The process uses a single ml.g5.2xlarge instance (providing one NVIDIA A10G Tensor Core GPU) with SageMaker for fine-tuning. After fine-tuning, we show you how to optimize the model with Qualcomm AI Hub so that it’s ready for deployment across edge devices powered by Snapdragon and Qualcomm platforms.

Business challenge

Today, many developers use AI and machine learning (ML) models to tackle a variety of business cases, from smart identification and natural language processing (NLP) to AI assistants. While open source models offer a good starting point, they often don’t meet the specific needs of the applications being developed. This is where model customization becomes essential, allowing developers to tailor models to their unique requirements and ensure optimal performance for specific use cases.

In addition, on-device AI deployment is a game-changer for developers crafting use cases that demand immediacy, privacy, and reliability. By processing data locally, edge AI minimizes latency, ensures sensitive information stays on-device, and guarantees functionality even in poor connectivity. Developers are therefore looking for an end-to-end solution where they can not only customize the model but also optimize the model to target on-device deployment. This enables them to offer responsive, secure, and robust AI applications, delivering exceptional user experiences.

How can Amazon SageMaker and Qualcomm AI Hub help?

BYOM and BYOD offer exciting opportunities for you to customize the model of your choice, use your own dataset, and deploy it on your target edge device. Through this solution, we propose using SageMaker for model fine-tuning and Qualcomm AI Hub for edge deployments, creating a comprehensive end-to-end model deployment pipeline. This opens new possibilities for model customization and deployment, enabling developers to tailor their AI solutions to specific use cases and datasets.

SageMaker is an excellent choice for model training, because it reduces the time and cost to train and tune ML models at scale without the need to manage infrastructure. You can take advantage of the highest-performing ML compute infrastructure currently available, and SageMaker can scale infrastructure from one to thousands of GPUs. Because you pay only for what you use, you can manage your training costs more effectively. SageMaker distributed training libraries can automatically split large models and training datasets across AWS GPU instances, or you can use third-party libraries, such as DeepSpeed, Horovod, Fully Sharded Data Parallel (FSDP), or Megatron. You can train foundation models (FMs) for weeks and months without disruption by automatically monitoring and repairing training clusters.

After the model is trained, you can use Qualcomm AI Hub to optimize, validate, and deploy these customized models on hosted devices with Snapdragon and Qualcomm Technologies within minutes. Qualcomm AI Hub is a developer-centric platform designed to streamline on-device AI development and deployment. AI Hub offers automatic conversion and optimization of PyTorch or ONNX models for efficient on-device deployment using TensorFlow Lite, ONNX Runtime, or Qualcomm AI Engine Direct SDK. It also has an existing library of over 100 pre-optimized models for Qualcomm and Snapdragon platforms.

Qualcomm AI Hub has served more than 800 companies and continues to expand its offerings in terms of models available, platforms supported, and more.

Using SageMaker and Qualcomm AI Hub together can create new opportunities for rapid iteration on model customization, providing access to powerful development tools and enabling a smooth workflow from cloud training to on-device deployment.

Solution architecture

The following diagram illustrates the solution architecture. Developers working in their local environment initiate the following steps:

  1. Select an open source model and a dataset for model customization from the Hugging Face repository.
  2. Pre-process the data into the format required by your model for training, then upload the processed data to Amazon Simple Storage Service (Amazon S3). Amazon S3 provides a highly scalable, durable, and secure object storage solution for your machine learning use case.
  3. Call the SageMaker control plane API using the SageMaker Python SDK for model training. In response, SageMaker provisions a resilient distributed training cluster with the requested number and type of compute instances to run the model training. SageMaker also handles orchestration and monitors the infrastructure for any faults.
  4. After the training is complete, SageMaker spins down the cluster, and you’re billed for the net training time in seconds. The final model artifact is saved to an S3 bucket.
  5. Pull the fine-tuned model artifact from Amazon S3 to the local development environment and validate the model accuracy.
  6. Use Qualcomm AI Hub to compile and profile the model, running it on cloud-hosted devices to deliver performance metrics ahead of downloading for deployment across edge devices.

Use case walk through

Imagine a leading electronics manufacturer aiming to enhance its quality control process for printed circuit boards (PCBs) by implementing an automated visual inspection system. Initially, using an open source vision model, the manufacturer collects and annotates a large dataset of PCB images, including both defective and non-defective samples.

This dataset, similar to the keremberke/pcb-defect-segmentation dataset from HuggingFace, contains annotations for common defect classes such as dry joints, incorrect installations, PCB damage, and short circuits. With SageMaker, the manufacturer trains a custom YOLOv8 model (You Only Look Once), developed by Ultralytics, to recognize these specific PCB defects. The model is then optimized for deployment at the edge using Qualcomm AI Hub, providing efficient performance on chosen platforms such as industrial cameras or handheld devices used in the production line.

This customized model significantly improves the quality control process by accurately detecting PCB defects in real-time. It reduces the need for manual inspections and minimizes the risk of defective PCBs progressing through the manufacturing process. This leads to improved product quality, increased efficiency, and substantial cost savings.

Let’s walk through this scenario with an implementation example.

Prerequisites

For this walkthrough, you should have the following:

  1. Jupyter Notebook – The example has been tested in Visual Studio Code with Jupyter Notebook using the Python 3.11.7 environment.
  2. An AWS account.
  3. Create an AWS Identity and Access Management (IAM) user with the AmazonSageMakerFullAccess policy to enable you to run SageMaker APIs. Set up your security credentials for CLI.
  4. Install AWS Command Line Interface (AWS CLI) and use aws configure to set up your IAM credentials securely.
  5. Create a role with the name sagemakerrole to be assumed by SageMaker. Add the AmazonS3FullAccess managed policy to give SageMaker access to your S3 buckets.
  6. Make sure your account has the SageMaker Training resource type limit for ml.g5.2xlarge increased to 1 using the Service Quotas console.
  7. Follow the get started instructions to install the necessary Qualcomm AI Hub library and set up your unique API token for Qualcomm AI Hub.
  8. Use the following command to clone the GitHub repository with the assets for this use case. This repository consists of a notebook that references training assets.
    $ git clone https://github.com/aws-samples/sm-qai-hub-examples.git
    $ cd sm-qai-hub-examples/yolo

The sm-qai-hub-examples/yolo directory contains all the training scripts that you might need to deploy this sample.

Next, you will run the sagemaker_qai_hub_finetuning.ipynb notebook to fine-tune the YOLOv8 model on SageMaker and deploy it on the edge using AI Hub. See the notebook for more details on each step. In the following sections, we walk you through the key components of fine-tuning the model.

Step 1: Access the model and data

  1. Begin by installing the necessary packages in your Python environment. At the top of the notebook, include the following code snippet, which uses Python’s pip package manager to install the required packages in your local runtime environment.
    %pip install -Uq sagemaker==2.232.0 ultralytics==8.2.100 datasets==2.18.0

  2. Import the necessary libraries for the project. Specifically, import the Dataset class from the Hugging Face datasets library and the YOLO class from the ultralytics library. These libraries are crucial for your work, because they provide the tools you need to access and manipulate the dataset and work with the YOLO object detection model.
    from datasets import Dataset
    
    from ultralytics import YOLO

Step 2: Pre-process and upload data to S3

To fine-tune your YOLOv8 model for detecting PCB defects, you will use the keremberke/pcb-defect-segmentation dataset from Hugging Face. This dataset includes 189 images of chip defects (train: 128 images, validation: 25 images and test: 36 images). These defects are annotated in COCO format.

YOLOv8 doesn’t recognize these classes out of the box, so you will map YOLOv8’s logits to identify these classes during model fine-tuning, as shown in the following image.

  1. Begin by downloading the dataset from Hugging Face to the local disk and converting it to the required YOLO dataset structure using the utility function CreateYoloHFDataset. This structure ensures that the YOLO API correctly loads and processes the images and labels during the training phase.
    dataset_name = "keremberke/pcb-defect-segmentation"
    dataset_labels = [
        'dry_joint', 
        'incorrect_installation', 
        'pcb_damage', 
        'short_circuit'
    ]
    
    data = CreateYoloHFDataset(
        hf_dataset_name=dataset_name, 
        labels_names=dataset_labels
    )

  2. Upload the dataset to Amazon S3. This step is crucial because the dataset stored in S3 will serve as the input data channel for the SageMaker training job. SageMaker will efficiently manage the process of distributing this data across the training cluster, allowing each node to access the necessary information for model training.
    uploaded_s3_uri = sagemaker.s3.S3Uploader.upload(
        local_path=data_path, 
        desired_s3_uri=f"s3://{s3_bucket}/qualcomm-aihub..."
    )

Alternatively, you can use your own custom dataset (non-Hugging Face) to fine-tune the YOLOv8 model, as long as the dataset complies with the YOLOv8 dataset format.

Step 3: Fine-tune your YOLOv8 model

3.1: Review the training script

You’re now prepared to fine-tune the model using the model.train method from the Ultralytics YOLO library.

We’ve prepared a script called train_yolov8.py that will perform the following tasks. Let’s quickly review the key points in this script before you launch the training job.

  1. The training script loads a YOLOv8 model from the Ultralytics library:
    model = YOLO(args.yolov8_model)

  2. Use the train method to run fine-tuning, which consumes the training data, adjusts the model’s parameters, and optimizes its ability to accurately predict object classes and locations in images.
    tuned_model = model.train(
            data=dataset_yaml,
            batch=args.batch_size,
            imgsz=args.img_size,
            epochs=args.epochs,
     
            ...

After the model is trained, the script runs inference to test the model output and saves the model artifacts to a local folder that SageMaker uploads to Amazon S3.

results = model.predict(
          data=dataset_yaml, 
          imgsz=args.img_size, 
          batch=args.batch_size
        )

model.save("<model_name>.pt")

3.2: Launch the training

You’re now ready to launch the training. You will use the SageMaker PyTorch training estimator to initiate training. The estimator simplifies the training process by automating several of the key tasks in this example:

  1. The SageMaker estimator spins up a training cluster of one ml.g5.2xlarge instance. SageMaker handles the setup and management of these compute instances, which reduces the total cost of ownership.
  2. The estimator also uses one of the pre-built containers managed by SageMaker—PyTorch, which includes an optimized compiled version of the PyTorch framework along with its required dependencies and GPU-specific libraries for accelerated computations.

The estimator.fit() method initiates the training process with the specified input data channels. Following is the code used to launch the training job along with the necessary parameters.

estimator = PyTorch(
    entry_point='train_yolov8.py',
    source_dir='scripts',
    role=role,
    instance_count=instance_count,
    instance_type=instance_type,
    image_uri=training_image_uri,
    hyperparameters=hyperparameters,
    base_job_name="yolov8-finetuning",
    output_path=f"s3://{s3_bucket}/…"
)

estimator.fit(
    {
        'training': sagemaker.inputs.TrainingInput(
            s3_data=uploaded_s3_uri,
            distribution='FullyReplicated',
            s3_data_type='S3Prefix'
        )
    }
)

You can track a SageMaker training job by monitoring its status using the AWS Management Console, AWS CLI, or AWS SDKs. To determine when the job is completed, check for the Completed status or set up Amazon CloudWatch alarms to notify you when the job transitions to the Completed state.
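For example, a small sketch that polls the job status with the AWS SDK for Python (Boto3) follows; the job name is a placeholder for the name of the training job that the estimator created.

import time

import boto3

sm = boto3.client("sagemaker")

# Placeholder; use the training job name shown in the console or returned by the estimator.
job_name = "yolov8-finetuning-2024-01-01-00-00-00-000"

# Poll until the training job leaves the InProgress state.
while True:
    status = sm.describe_training_job(TrainingJobName=job_name)["TrainingJobStatus"]
    print(f"Training job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)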

Steps 4 and 5: Save, download, and validate the trained model

The training process generates model artifacts that are saved to the S3 bucket specified in the output_path location. This example uses the download_tar_and_untar utility to download the model to a local drive.

  • Run inference on this model and visually validate how closely the ground truth and the model’s predicted bounding boxes align on test images. The following code shows how to generate an image mosaic using a custom utility function—draw_bounding_boxes—that overlays an image with ground truth and model classification along with a confidence value for class prediction.
    image_mosiacs = []
    for i, _key in enumerate(image_label_pairs):
        img_path, lbl_path = image_label_pairs[_key]["image_path"], image_label_pairs[_key]["label_path"]
        result = model([img_path], save=False)
        image_with_boxes = draw_bounding_boxes(
            yolo_result=result[0], 
            ground_truth=open(lbl_path).read().splitlines(),
            confidence_threshold=0.2
        )
        image_mosiacs.append(np.array(image_with_boxes))

From the preceding image mosaic, you can observe two distinct sets of bounding boxes: the cyan boxes indicate human annotations of defects on the PCB image, while the red boxes represent the model’s predictions of defects. Along with the predicted class, you can also see the confidence value for each prediction, which reflects the quality of the YOLOv8 model’s output.

After fine-tuning, YOLOv8 begins to accurately predict the PCB defect classes present in the custom dataset, even though it hadn’t encountered these classes during model pretraining. Additionally, the predicted bounding boxes are closely aligned with the ground truth, with confidence scores of greater than or equal to 0.5 in most cases. You can further improve the model’s performance without the need for hyperparameter guesswork by using a SageMaker hyperparameter tuning job.
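As a sketch of what that could look like, the following wraps the estimator from the previous step in a SageMaker hyperparameter tuning job. The objective metric name, its regex, and the hyperparameter ranges are assumptions; they must match what train_yolov8.py actually logs and accepts as arguments.

import sagemaker
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Metric definition and hyperparameter ranges are illustrative assumptions.
tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="mAP50",
    metric_definitions=[{"Name": "mAP50", "Regex": "mAP50: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "lr0": ContinuousParameter(1e-4, 1e-2),
        "batch_size": IntegerParameter(8, 32),
    },
    objective_type="Maximize",
    max_jobs=8,
    max_parallel_jobs=2,
)

tuner.fit({"training": sagemaker.inputs.TrainingInput(s3_data=uploaded_s3_uri)})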

Step 6: Run the model on a real device with Qualcomm AI Hub

Now that you’ve validated the fine-tuned model in PyTorch, you can run the model on a real device.

Qualcomm AI Hub enables you to do the following:

  1. Compile and optimize the PyTorch model into a format that can be run on a device
  2. Run the compiled model on a device with a Snapdragon processor hosted in AWS device farm
  3. Verify on-device model accuracy
  4. Measure on-device model latency

To run the model:

  1. Compile the model.

The first step is converting the PyTorch model into a format that can run on the device.

This example uses a Windows laptop powered by the Snapdragon X Elite processor. This device uses the ONNX model format, which you will configure during compilation.

As you get started, you can see a list of all the devices supported on Qualcomm AI Hub by running qai-hub list-devices.

See Compiling Models to learn more about compilation on Qualcomm AI Hub.

compile_job = hub.submit_compile_job(
    model=traced_model,
    input_specs={"image": (model_input.shape, "float32")},
    device=target_device,
    name=model_name,
    options="--target_runtime onnx"
)
  2. Run inference with the model on a real device.

Run the compiled model on a real cloud-hosted device with Snapdragon using the same model input you verified locally with PyTorch.

See Running Inference to learn more about on-device inference on Qualcomm AI Hub.

inference_job = hub.submit_inference_job(
    model=compile_job.get_target_model(),
    inputs={"image": [model_input.numpy()]},
    device=target_device,
    name=model_name,
)
  3. Profile the model on a real device.

Profiling measures the latency of the model when run on a device. It reports the minimum value over 100 invocations of the model to best isolate model inference time from other processes on the device.

See Profiling Models to learn more about profiling on Qualcomm AI Hub.

profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=target_device,
    name=model_name,
)
  4. Deploy the compiled model to your device.

Run the command below to download the compiled model.

The compiled model can be used in conjunction with the AI Hub sample application hosted here. This application uses the model to run object detection locally on a Windows laptop powered by Snapdragon.

compile_job.download_target_model()

Conclusion

Model customization with your own data through Amazon SageMaker—with over 250 models available on SageMaker JumpStart—is an addition to the existing features of Qualcomm AI Hub, which include BYOM and access to a growing library of over 100 pre-optimized models. Together, these features create a rich environment for developers aiming to build and deploy customized on-device AI models across Snapdragon and Qualcomm platforms.

The collaboration between Amazon SageMaker and Qualcomm AI Hub will help enhance the user experience and streamline machine learning workflows, enabling more efficient model development and deployment across any application at the edge. With this effort, Qualcomm Technologies and AWS are empowering their users to create more personalized, context-aware, and privacy-focused AI experiences.

To learn more, visit Qualcomm AI Hub and Amazon SageMaker. For queries and updates, join the Qualcomm AI Hub community on Slack.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. or its subsidiaries.


About the authors

Rodrigo Amaral currently serves as the Lead for Qualcomm AI Hub Marketing at Qualcomm Technologies, Inc. In this role, he spearheads go-to-market strategies, product marketing, and developer activities, with a focus on AI and ML for edge devices. He brings almost a decade of experience in AI, complemented by a strong background in business. Rodrigo holds a BA in Business and a Master’s degree in International Management.

Ashwin Murthy is a Machine Learning Engineer working on Qualcomm AI Hub. He works on adding new models to the public AI Hub Models collection, with a special focus on quantized models. He previously worked on machine learning at Meta and Groq.

Meghan Stronach is a PM on Qualcomm AI Hub. She works to support our external community and customers, delivering new features across Qualcomm AI Hub and enabling adoption of ML on device. Born and raised in the Toronto area, she graduated from the University of Waterloo in Management Engineering and has spent her time at companies of various sizes.

Kanwaljit Khurmi is a Principal Generative AI/ML Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.

Using Amazon Q Business with AWS HealthScribe to gain insights from patient consultations

With the advent of generative AI and machine learning, new opportunities for enhancement became available for different industries and processes. During re:Invent 2023, we launched AWS HealthScribe, a HIPAA eligible service that empowers healthcare software vendors to build their clinical applications to use speech recognition and generative AI to automatically create preliminary clinician documentation. In addition to AWS HealthScribe, we also launched Amazon Q Business, a generative AI-powered assistant that can perform functions such as answer questions, provide summaries, generate content, and securely complete tasks based on data and information that are in your enterprise systems.

AWS HealthScribe combines speech recognition and generative AI trained specifically for healthcare documentation to accelerate clinical documentation and enhance the consultation experience.

Key features of AWS HealthScribe include:

  • Rich consultation transcripts with word-level timestamps.
  • Speaker role identification (clinician or patient).
  • Transcript segmentation into relevant sections such as subjective, objective, assessment, and plan.
  • Summarized clinical notes for sections such as chief complaint, history of present illness, assessment, and plan.
  • Evidence mapping that references the original transcript for each sentence in the AI-generated notes.
  • Extraction of structured medical terms for entries such as conditions, medications, and treatments.

AWS HealthScribe provides a suite of AI-powered features to streamline clinical documentation while maintaining security and privacy. It doesn’t retain audio or output text, and users have control over data storage with encryption in transit and at rest.

With Amazon Q Business, we provide a new generative AI-powered assistant designed specifically for business and workplace use cases. It can be customized and integrated with an organization’s data, systems, and repositories. Amazon Q allows users to have conversations, help solve problems, generate content, gain insights, and take actions through its AI capabilities. Amazon Q offers user-based pricing plans tailored to how the product is used. It can adapt interactions based on individual user identities, roles, and permissions within the organization. Importantly, AWS never uses customer content from Amazon Q to train its underlying AI models, making sure that company information remains private and secure.

In this blog post, we’ll show you how AWS HealthScribe and Amazon Q Business together analyze patient consultations to provide summaries and trends from clinician conversations, simplifying documentation workflows. This automation and use of machine learning from clinician-patient interactions with AWS HealthScribe and Amazon Q can help improve patient outcomes by enhancing communication, leading to more personalized care for patients and increased efficiency for clinicians.

Benefits and use cases

Gaining insight from patient-clinician interactions alongside a chatbot can help in a variety of ways such as:

  1. Enhanced communication: In analyzing consultations, clinicians using AWS HealthScribe can more readily identify patterns and trends in large patient datasets, which can help improve communication between clinicians and patients. An example would be a clinician understanding common trends in their patient’s symptoms that they can then consider for new consultations.
  2. Personalized care: Using machine learning, clinicians can tailor their care to individual patients by analyzing the specific needs and concerns of each patient. This can lead to more personalized and effective care.
  3. Streamlined workflows: Clinicians can use machine learning to help streamline their workflows by automating tasks such as appointment scheduling and consultation summarization. This can give clinicians more time to focus on providing high-quality care to their patients. An example would be using clinician summaries together with agentic workflows to perform these tasks on a routine basis.

Architecture diagram

Architecture diagram of the workflow which includes AWS IAM Identity Center, Amazon Q Business, Amazon Simple Storage Service, and AWS HealthScribe

In the architecture diagram we present for this demo, two user workflows are shown. To kick off the process, a clinician uploads the recording of a consultation to Amazon Simple Storage Service (Amazon S3). This audio file is then ingested by AWS HealthScribe and used to analyze consultation conversations. AWS HealthScribe then outputs two files, which are also stored on Amazon S3. In the second workflow, an authenticated user logs in via AWS IAM Identity Center to an Amazon Q web front end hosted by Amazon Q Business. In this scenario, Amazon Q Business is given the output Amazon S3 bucket as the data source for use in its web app.

Prerequisites

Implementation

To start using AWS HealthScribe you must first start a transcription job that takes a source audio file and outputs summary and transcription JSON files with the analyzed conversation. You’ll then connect these output files to Amazon Q.

Creating the AWS HealthScribe job

  1. In the AWS HealthScribe console, choose Transcription jobs in the navigation pane, and then choose Create job to get started.
  2. Enter a name for the job—in this example, we use FatigueConsult—and select the S3 bucket where the audio file of the clinician-patient conversation is stored.
  3. Next, use the S3 URI search field to find and point the transcription job to the Amazon S3 bucket you want the output files to be saved to. Maintain the default options for audio settings, customization, and content removal.
  4. Create a new AWS Identity and Access Management (IAM) role for AWS HealthScribe to use for access to the S3 input and output buckets by choosing Create an IAM role. In our example, we entered HealthScribeRole as the Role name. To complete the job creation, choose Create job.
  5. The job takes a few minutes to finish. When it’s complete, the status changes from In Progress to Complete, and you can inspect the results by selecting the job name.
  6. AWS HealthScribe creates two files: a word-for-word transcript of the conversation with the suffix /transcript.json and a summary of the conversation with the suffix /summary.json. This summary uses the underlying power of generative AI to highlight key topics in the conversation, extract medical terminology, and more.
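If you prefer to create the job programmatically instead of through the console, the following sketch uses the Amazon Transcribe API that backs AWS HealthScribe via the AWS SDK for Python (Boto3). The bucket names, audio file key, and role ARN are placeholders; the role must allow AWS HealthScribe to read the audio file and write the output JSON files.

import boto3

transcribe = boto3.client("transcribe")

# Bucket names, the audio key, and the role ARN are placeholders.
transcribe.start_medical_scribe_job(
    MedicalScribeJobName="FatigueConsult",
    Media={"MediaFileUri": "s3://my-input-bucket/consultations/fatigue-consult.wav"},
    OutputBucketName="my-output-bucket",
    DataAccessRoleArn="arn:aws:iam::123456789012:role/HealthScribeRole",
    Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
)

# Check the job status; the output files appear in the output bucket when it completes.
job = transcribe.get_medical_scribe_job(MedicalScribeJobName="FatigueConsult")
print(job["MedicalScribeJob"]["MedicalScribeJobStatus"])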

In this workflow, AWS HealthScribe analyzes the patient-clinician conversation audio to:

  1. Transcribe the consultation
  2. Identify speaker roles (for example, clinician and patient)
  3. Segment the transcript (for example, small talk, visit flow management, assessment, and treatment plan)
  4. Extract medical terms (for example, medication name and medical condition name)
  5. Summarize notes for key sections of the clinical document (for example, history of present illness and treatment plan)
  6. Create evidence mapping (linking every sentence in the AI-generated note with corresponding transcript dialogues).

Connecting an AWS HealthScribe job to Amazon Q

To use Amazon Q with the summarized notes and transcripts from AWS HealthScribe, we need to first create an Amazon Q business application and set the data source as the S3 bucket where the output files were stored in the HealthScribe jobs workflow. This will allow Amazon Q to index the files and give users the ability to ask questions of the data.

  1. In the Amazon Q Business console, choose Get started, then choose Create application.
  2. Enter a name for your application and select Create and use a new service-linked role (SLR).
  3. Choose Create when you’re ready to select a data source.
  4. In the Add data source pane, select Amazon S3.
  5. To configure the S3 bucket with Amazon Q, enter a name for the data source. In our example, we use my-s3-bucket.
  6. Next, locate the S3 bucket with the JSON outputs from HealthScribe using the Browse S3 button. Select Full sync for the sync mode and select a cadence of your preference. After you complete these steps, Amazon Q Business runs a full sync of the objects in your S3 bucket and is ready for use.
  7. In the main applications dashboard, navigate to the URL under Web experience URL. This is how you access the Amazon Q web front end to interact with the assistant.

After a user signs in to the web experience, they can start asking questions directly in the chat box, as shown in the sample front end that follows.

Sample frontend workflow

With the AWS HealthScribe results integrated into Amazon Q Business, users can go to the web experience to gain insights from their patient conversations. For example, you can use Q to determine information such as trends in patient symptoms, checking which medications patients are taking and so on as shown in the following figures.

The workflow starts with a question and answer about issues patients had, as shown in the following figure. In this example, a clinician asks what the symptoms were of patients who complained of stomach pain. Q responds with common symptoms, like bloating and bowel problems, from the data it has access to. The answers generated cite the source files from Amazon S3 that led to the summary and can be inspected by choosing Sources.

In the following example, a clinician asks what medications patients with knee pain are taking. Using our sample data of various consultations for knee pain, Q tells us that patients are taking over-the-counter ibuprofen, but that it often doesn’t provide relief.

This application can also help clinicians understand common trends in their patient data, such as asking what the common symptoms are for patients with chest pain.

In the final example for this post, a clinician asks Q if there are common symptoms for patients complaining of knee and elbow pain. Q responds that both sets of patients describe their pain being exacerbated by movement, but that it cannot conclusively point to any common symptoms across both consultation types. In this case, Amazon Q is correctly using source data to prevent a hallucination from occurring.

Considerations

The UI for Amazon Q has limited customization. At the time of writing this post, the Amazon Q frontend cannot be embedded in other tools. Supported customization of the web experience includes the addition of a title and subtitle, adding a welcome message, and displaying sample prompts. For updates on web experience customizations, see Customizing an Amazon Q Business web experience. If this kind of customization is critical to your application and business needs, you can explore custom large language model chatbot designs using Amazon Bedrock or Amazon SageMaker.

AWS HealthScribe uses conversational and generative AI to transcribe patient-clinician conversations and generate clinical notes. The results produced by AWS HealthScribe are probabilistic and might not always be accurate because of various factors, including audio quality, background noise, speaker clarity, the complexity of medical terminology, and context-specific language nuances. AWS HealthScribe is designed to be used in an assistive role for clinicians and medical scribes rather than as a substitute for their clinical expertise. As such, AWS HealthScribe output should not be used to fully automate clinical documentation workflows, but rather to provide additional assistance to clinicians or medical scribes in their documentation process. Make sure that your application provides a workflow for reviewing the clinical notes produced by AWS HealthScribe and establishes the expectation that human review is needed before clinical notes are finalized.

Amazon Q Business uses machine learning models that generate predictions based on patterns in data, and generate insights and recommendations from your content. Outputs are probabilistic and should be evaluated for accuracy as appropriate for your use case, including by employing human review of the output. You and your users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of these features.

This proof-of-concept can be extrapolated to create a patient-facing application as well, with the notion that a patient can review their own conversations with physicians and be given access to their medical records and consultation notes in a way that makes it easy for them to ask questions of the trends and data for their own medical history.

At the time of writing, AWS HealthScribe is only available for US English and only in the US East (N. Virginia) Region. Amazon Q Business is only available in US East (N. Virginia) and US West (Oregon).

Clean up

To ensure that you don’t continue to accrue charges from this solution, you must complete the following clean-up steps.

AWS HealthScribe

Navigate to the AWS HealthScribe console and choose Transcription jobs. Select whichever HealthScribe jobs you want to clean up and choose Delete at the top right corner of the console page.

Amazon S3

To clean up your Amazon S3 resources, navigate to the Amazon S3 console and choose the buckets that you used or created while going through this post. To empty the buckets, follow the instructions in Emptying a bucket. After you empty a bucket, you can delete the bucket itself.

Amazon Q Business

To delete your Amazon Q Business application, follow the instructions on Managing Amazon Q Business applications.

Conclusion

In this post, we discussed how you can use AWS HealthScribe with Amazon Q Business to create a chatbot to quickly gain insights into patient clinician conversations. To learn more, reach out to your AWS account team or check out the links that follow.


About the Authors

Laura Salinas is a Startup Solution Architect supporting customers whose core business involves machine learning. She is passionate about guiding her customers on their cloud journey and finding solutions that help them innovate. Outside of work she loves boxing, watching the latest movie at the theater and playing competitive dodgeball.

Tiffany Chen is a Solutions Architect on the CSC team at AWS. She has supported AWS customers with their deployment workloads and currently works with Enterprise customers to build well-architected and cost-optimized solutions. In her spare time, she enjoys traveling, gardening, baking, and watching basketball.

Art Tuazon is a Partner Solutions Architect focused on enabling AWS Partners through technical best practices and is passionate about helping customers build on AWS. In her free time, she enjoys running and cooking.

Winnie Chen is a Solutions Architect currently on the CSC team at AWS supporting greenfield customers. She supports customers of all industries as well as sizes such as enterprise and small to medium businesses. She has helped customers migrate and build their infrastructure on AWS. In her free time, she enjoys traveling and spending time outdoors through activities like hiking, biking and rock climbing.


Use Amazon SageMaker Studio with a custom file system in Amazon EFS

Amazon SageMaker Studio is the latest web-based experience for running end-to-end machine learning (ML) workflows. SageMaker Studio offers a suite of integrated development environments (IDEs), including JupyterLab, Code Editor, and RStudio. Data scientists and ML engineers can spin up SageMaker Studio private and shared spaces, which manage the storage and resource needs of the JupyterLab and Code Editor applications, allow the applications to be stopped when not in use to save on compute costs, and let users resume work from where they stopped.

The storage resources for SageMaker Studio spaces are Amazon Elastic Block Store (Amazon EBS) volumes, which offer low-latency access to user data like notebooks, sample data, and Python/Conda virtual environments. However, in several scenarios it's convenient to use a distributed file system shared across private JupyterLab and Code Editor spaces, which you can enable by configuring an Amazon Elastic File System (Amazon EFS) file system in SageMaker Studio. Amazon EFS provides a scalable, fully managed, elastic NFS file system for AWS compute instances.

Amazon SageMaker supports automatically mounting a folder in an EFS volume for each user in a domain. Using this folder, users can share data between their own private spaces. However, users can’t share data with other users in the domain; they only have access to their own folder user-default-efs in the $HOME directory of the SageMaker Studio application.

In this post, we explore three distinct scenarios that demonstrate the versatility of integrating custom Amazon EFS with SageMaker Studio.

For further information on configuring Amazon EFS in SageMaker Studio, refer to Attaching a custom file system to a domain or user profile.

Solution overview

In the first scenario, an AWS infrastructure admin wants to set up an EFS file system that can be shared across the private spaces of a given user profile in SageMaker Studio. This means that each user within the domain will have their own private space on the EFS file system, allowing them to store and access their own data and files. The automation described in this post enables new team members joining the data science team to quickly set up their private space on the EFS file system and access the resources they need to start contributing to the ongoing project.

The following diagram illustrates this architecture.

First scenario architecture

This scenario offers the following benefits:

  • Individual data storage and analysis – Users can store their personal datasets, models, and other files in their private spaces, allowing them to work on their own projects independently. Segregation is made by their user profile.
  • Centralized data management – The administrator can manage the EFS file system centrally, maintaining data security, backup, and direct access for all users. By setting up an EFS file system with a private space, users can effortlessly track and maintain their work.
  • Cross-instance file sharing – Users can access their files from multiple SageMaker Studio spaces, because the EFS file system provides a persistent storage solution.

The second scenario is related to the creation of a single EFS directory that is shared across all the spaces of a given SageMaker Studio domain. This means that all users within the domain can access and use the same shared directory on the EFS file system, allowing for better collaboration and centralized data management (for example, to share common artifacts). This is a more generic use case, because there is no specific segregated folder for each user profile.

The following diagram illustrates this architecture.

Second scenario architecture

This scenario offers the following benefits:

  • Shared project directories – Suppose the data science team is working on a large-scale project that requires collaboration among multiple team members. By setting up a shared EFS directory at project level, the team can collaborate on the same projects by accessing and working on files in the shared directory. The data science team can, for example, use the shared EFS directory to store their Jupyter notebooks, analysis scripts, and other project-related files.
  • Simplified file management – Users don’t need to manage their own private file storage, because they can rely on the shared directory for their file-related needs.
  • Improved data governance and security – The shared EFS directory, being centrally managed by the AWS infrastructure admin, can provide improved data governance and security. The admin can implement access controls and other data management policies to maintain the integrity and security of the shared resources.

The third scenario explores the configuration of an EFS file system that can be shared across multiple SageMaker Studio domains within the same VPC. This allows users from different domains to access and work with the same set of files and data, enabling cross-domain collaboration and centralized data management.

The following diagram illustrates this architecture.

Third scenario architecture

This scenario offers the following benefits:

  • Enterprise-level data science collaboration – Imagine a large organization with multiple data science teams working on various projects across different departments or business units. By setting up a shared EFS file system accessible across the organization’s SageMaker Studio domains, these teams can collaborate on cross-functional projects, share artifacts, and use a centralized data repository for their work.
  • Shared infrastructure and resources – The EFS file system can be used as a shared resource across multiple SageMaker Studio domains, promoting efficiency and cost-effectiveness.
  • Scalable data storage – As the number of users or domains increases, the EFS file system automatically scales to accommodate the growing storage and access requirements.
  • Data governance – The shared EFS file system, being managed centrally, can be subject to stricter data governance policies, access controls, and compliance requirements. This can help the organization meet regulatory and security standards while still enabling cross-domain collaboration and data sharing.

Prerequisites

This post provides an AWS CloudFormation template to deploy the main resources for the solution. In addition, the solution expects that the AWS account in which the template is deployed already has a SageMaker Studio domain and the associated networking configuration (the VPC, subnets, and security group referenced by the template's input parameters).

Refer to Attaching a custom file system to a domain or user profile for additional prerequisites.

Configure an EFS directory shared across private spaces of a given user profile

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, creating a private file system directory for each user. We can distinguish two use cases:

  • Create new SageMaker Studio user profiles – A new team member joins a preexisting SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces
  • Use preexisting SageMaker Studio user profiles – A team member is already working on a specific SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces

The solution provided in this post focuses on the first use case. We discuss how to adapt the solution for preexisting SageMaker Studio domain user profiles later in this post.

The following diagram illustrates the high-level architecture of the solution.

AWS Architecture

In this solution, we use CloudTrail, Amazon EventBridge, and Lambda to automatically create a private EFS directory when a new SageMaker Studio user profile is created. The high-level steps to set up this architecture are as follows:

  1. Create an EventBridge rule that invokes the Lambda function when a new SageMaker user profile creation is logged in CloudTrail (a boto3 sketch of this rule follows the list).
  2. Create an EFS file system with an access point for the Lambda function and a mount target in every Availability Zone in which the SageMaker Studio domain is located.
  3. Use a Lambda function to create a private EFS directory with the required POSIX permissions for the profile. The function also updates the profile with the new file system configuration.
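
The CloudFormation template described in the next section creates this rule for you. If you want to reproduce step 1 manually, a minimal boto3 sketch might look like the following; the rule name, Lambda function ARN, Region, and account ID are hypothetical placeholders:

import json
import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Hypothetical names and ARNs -- replace with your own
rule_name = 'sagemaker-create-user-profile'
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:CreateUserEfsDirectory'

# Match the CreateUserProfile API call recorded by CloudTrail
event_pattern = {
    'source': ['aws.sagemaker'],
    'detail-type': ['AWS API Call via CloudTrail'],
    'detail': {
        'eventSource': ['sagemaker.amazonaws.com'],
        'eventName': ['CreateUserProfile']
    }
}

events.put_rule(Name=rule_name, EventPattern=json.dumps(event_pattern), State='ENABLED')
events.put_targets(Rule=rule_name, Targets=[{'Id': 'efs-dir-lambda', 'Arn': function_arn}])

# The function also needs a resource-based permission so the rule can invoke it
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='AllowEventBridgeInvoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=f'arn:aws:events:us-east-1:123456789012:rule/{rule_name}'
)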

Deploy the solution using AWS CloudFormation

To use the solution, you can deploy the infrastructure using the following CloudFormation template. This template deploys three main resources in your account: Amazon EFS resources (file system, access points, mount targets), an EventBridge rule, and a Lambda function.

Refer to Create a stack from the CloudFormation console for additional information. The input parameters for this template are as follows (a boto3 deployment sketch appears after the list):

  • SageMakerDomainId – The SageMaker Studio domain ID that will be associated with the EFS file system.
  • SageMakerStudioVpc – The VPC associated with the SageMaker Studio domain.
  • SageMakerStudioSubnetId – One or multiple subnets associated with the SageMaker Studio domain. The template deploys its resources in these subnets.
  • SageMakerStudioSecurityGroupId – The security group associated with the SageMaker Studio domain. The template configures the Lambda function with this security group.
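
For reference, the following is a hedged boto3 sketch of deploying the template with these parameters. The stack name, template file name, and parameter values are placeholders, and the IAM capabilities are an assumption based on the template creating an execution role for the Lambda function:

import boto3

cfn = boto3.client('cloudformation')

# Placeholder values -- substitute your own domain, VPC, subnet, and security group IDs
with open('sagemaker-studio-custom-efs.yaml') as f:
    template_body = f.read()

cfn.create_stack(
    StackName='sagemaker-studio-custom-efs',
    TemplateBody=template_body,
    Parameters=[
        {'ParameterKey': 'SageMakerDomainId', 'ParameterValue': 'd-xxxxxxxxxxxx'},
        {'ParameterKey': 'SageMakerStudioVpc', 'ParameterValue': 'vpc-0123456789abcdef0'},
        {'ParameterKey': 'SageMakerStudioSubnetId', 'ParameterValue': 'subnet-0123456789abcdef0'},
        {'ParameterKey': 'SageMakerStudioSecurityGroupId', 'ParameterValue': 'sg-0123456789abcdef0'},
    ],
    # The template creates IAM resources for the Lambda function, so IAM capabilities are likely required
    Capabilities=['CAPABILITY_IAM', 'CAPABILITY_NAMED_IAM']
)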

Amazon EFS resources

After you deploy the template, navigate to the Amazon EFS console and confirm that the EFS file system has been created. The file system has a mount target in every Availability Zone that your SageMaker domain connects to.

Note that each mount target uses the EC2 security group that SageMaker created in your AWS account when you first created the domain, which allows NFS traffic on port 2049. The provided template automatically retrieves this security group when it is first deployed, using a Lambda-backed custom resource.

You can also observe that the file system has an EFS access point. This access point grants root access on the file system for the Lambda function that will create the directories for the SageMaker Studio user profiles.

EventBridge rule

The second main resource is an EventBridge rule that is invoked when a new SageMaker Studio user profile is created. Its target is the Lambda function that creates the folder in the EFS file system and updates the profile that has just been created. The input to the Lambda function is the matched event, from which you can get the SageMaker Studio domain ID and the SageMaker user profile name.
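
As an illustration, the handler can read those two values from the matched event as shown in the following minimal sketch; the requestParameters field names follow CloudTrail's usual casing and should be verified against a sample event in your account:

def lambda_handler(event, context):
    # EventBridge wraps the CloudTrail record under 'detail'; the CreateUserProfile
    # request parameters carry the domain ID and the new user profile name
    request_parameters = event['detail']['requestParameters']
    domain_id = request_parameters['domainId']
    user_profile_name = request_parameters['userProfileName']
    print(f'New user profile {user_profile_name} created in domain {domain_id}')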

Lambda function

Lastly, the template creates a Lambda function that creates a directory in the EFS file system with the required POSIX permissions for the user profile and updates the user profile with the new file system configuration.

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group IDs for SageMaker apps are as follows (a sketch for overriding them on a user profile appears after the list):

  • UID – The POSIX user ID. The default is 200001. A valid range is a minimum value of 10000 and maximum value of 4000000.
  • GID – The POSIX group ID. The default is 1001. A valid range is a minimum value of 1001 and maximum value of 4000000.
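
If your use case requires a different POSIX identity, the SageMaker API exposes a CustomPosixUserConfig setting in the user settings. The following is a hedged sketch of creating a user profile with a custom UID/GID; the domain ID, profile name, and ID values are placeholders, and you should confirm that your boto3 version supports this parameter:

import boto3

sm_client = boto3.client('sagemaker')

# Placeholder identifiers -- the UID/GID must fall within the valid ranges listed above
sm_client.create_user_profile(
    DomainId='d-xxxxxxxxxxxx',
    UserProfileName='data-scientist-1',
    UserSettings={
        'CustomPosixUserConfig': {
            'Uid': 200002,
            'Gid': 1001
        }
    }
)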

The Lambda function is in the same VPC as the EFS file system, and the file system and access point created earlier are attached to it.

Lambda function configuration

Adapt the solution for preexisting SageMaker Studio domain user profiles

We can reuse the previous solution for scenarios in which the domain already has user profiles created. For that, you can create an additional Lambda function in Python that lists all the user profiles for the given SageMaker Studio domain and creates a dedicated EFS directory for each user profile.

The Lambda function should be in the same VPC as the EFS file system, with the file system and access point created earlier attached to it. You also need to add the efs_id and domain_id values as environment variables for the function.
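
A hedged boto3 sketch of applying that configuration to the function follows; the function name, access point ARN, subnet, security group, file system, and domain IDs are placeholders, and depending on the function's current state you might need to apply the VPC configuration before attaching the file system:

import boto3

lambda_client = boto3.client('lambda')

# Placeholder identifiers -- use your function name, the access point created by the template,
# the domain subnets, and the security group that allows NFS traffic on port 2049
lambda_client.update_function_configuration(
    FunctionName='CreateEfsDirsForExistingUsers',
    VpcConfig={
        'SubnetIds': ['subnet-0123456789abcdef0'],
        'SecurityGroupIds': ['sg-0123456789abcdef0']
    },
    FileSystemConfigs=[
        {
            'Arn': 'arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-0123456789abcdef0',
            'LocalMountPath': '/mnt/efs'
        }
    ],
    Environment={
        'Variables': {
            'efs_id': 'fs-0123456789abcdef0',
            'domain_id': 'd-xxxxxxxxxxxx'
        }
    }
)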

You can include the following code as part of this new Lambda function and run it manually:

import json
import subprocess
import boto3
import os

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # Get EFS and Domain ID
    file_system=os.environ['efs_id']
    domain_id=os.environ['domain_id']    
    
    
    # Get Domain user profiles
    list_user_profiles_response = sm_client.list_user_profiles(
        DomainIdEquals=domain_id
    )
    domain_users = list_user_profiles_response["UserProfiles"]
    
    # Create directories for each user
    for user in domain_users:

        user_profile_name = user["UserProfileName"]

        # Permissions
        repository=f'/mnt/efs/{user_profile_name}'
        subprocess.call(['mkdir', repository])
        subprocess.call(['chown', '200001:1001', repository])
        
        # Update SageMaker user
        response = sm_client.update_user_profile(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            UserSettings={
                'CustomFileSystemConfigs': [
                    {
                        'EFSFileSystemConfig': {
                            'FileSystemId': file_system,
                            'FileSystemPath': f'/{user_profile_name}'
                        }
                    }
                ]
            }
        )

Configure an EFS directory shared across all spaces of a given domain

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, using the same file system directory for all the users.

To achieve this, in addition to the prerequisites described earlier in this post, you need to complete the following steps.

Create the EFS file system

The file system needs to be in the same VPC as the SageMaker Studio domain. Refer to Creating EFS file systems for additional information.

Add mount targets to the EFS file system

Before SageMaker Studio can access the new EFS file system, the file system must have a mount target in each of the subnets associated with the domain. For more information about assigning mount targets to subnets, see Managing mount targets. You can get the subnets associated with the domain on the SageMaker Studio console, under Network. You need to create a mount target for each subnet.

Networking used

Additionally, for each mount target, you must add the security group that SageMaker created in your AWS account when you created the SageMaker Studio domain. The security group name has the format security-group-for-inbound-nfs-domain-id.

The following screenshot shows an example of an EFS file system with two mount targets for a SageMaker Studio domain associated with two subnets. Note the security group associated with both mount targets.

EFS file system
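
If you prefer to script the mount target setup, a minimal boto3 sketch might look like the following; the file system ID, subnet IDs, and security group ID are placeholders:

import boto3

efs = boto3.client('efs')

# Placeholders -- use your file system ID, the subnets attached to the domain,
# and the security-group-for-inbound-nfs-<domain-id> security group
file_system_id = 'fs-0123456789abcdef0'
subnet_ids = ['subnet-0123456789abcdef0', 'subnet-0fedcba9876543210']
nfs_security_group_id = 'sg-0123456789abcdef0'

for subnet_id in subnet_ids:
    efs.create_mount_target(
        FileSystemId=file_system_id,
        SubnetId=subnet_id,
        SecurityGroups=[nfs_security_group_id]
    )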

Create an EFS access point

The Lambda function accesses the EFS file system as root using this access point. See Creating access points for additional information.

EFS access point
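
A hedged boto3 sketch of creating such an access point follows; the file system ID is a placeholder, and mapping the function to the root user (UID/GID 0) is an assumption that matches the root access described earlier:

import boto3

efs = boto3.client('efs')

# Placeholder file system ID. PosixUser 0/0 gives the Lambda function root access
# so it can create directories and change their ownership
efs.create_access_point(
    FileSystemId='fs-0123456789abcdef0',
    PosixUser={'Uid': 0, 'Gid': 0},
    RootDirectory={'Path': '/'},
    Tags=[{'Key': 'Name', 'Value': 'lambda-root-access'}]
)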

Create a new Lambda function

Define a new Lambda function with the name LambdaManageEFSUsers. This function updates the default space settings of the SageMaker Studio domain, configuring the file system settings to use a specific EFS file system shared repository path. This configuration is automatically applied to all spaces within the domain.

The Lambda function is in the same VPC as the EFS file system, with the file system and access point created earlier attached to it. Additionally, you need to add efs_id and domain_id as environment variables for the function.

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

  • UID – The POSIX user ID. The default is 200001.
  • GID – The POSIX group ID. The default is 1001.

The function updates the default space settings of the SageMaker Studio domain, configuring the EFS file system to be used by all users. See the following code:

import json
import subprocess
import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # Environment variables
    file_system=os.environ['efs_id']
    domain_id=os.environ['domain_id']
    
    # EFS directory name
    repository_name='shared_repository'
    repository=f'/mnt/efs/{repository_name}'
            
    # Create the shared directory (if needed) and set ownership to the default
    # SageMaker POSIX user and group (200001:1001)
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except Exception:
        logger.info('Repository already created')
    
    # Update Sagemaker domain to enable access to the new directory
    response = sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            'CustomFileSystemConfigs': [
                {
                    'EFSFileSystemConfig': {
                        'FileSystemId': file_system,
                        'FileSystemPath': f'/{repository_name}'
                    }
                }
            ]
        }
    )
    logger.info(f"Updated Studio Domain {domain_id} and EFS {file_system}")
    return {
        'statusCode': 200,
        'body': json.dumps(f"Created dir and modified permissions for Studio Domain {domain_id}")
    }

The execution role of the Lambda function needs to have permissions to update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Configure an EFS directory shared across multiple domains under the same VPC

In this scenario, an administrator wants to provision an EFS file system for all users of multiple SageMaker Studio domains, using the same file system directory for all the users. The idea in this case is to assign the same EFS file system to all users of all domains that are within the same VPC. To test the solution, the account should ideally have at least two SageMaker Studio domains in the same VPC and subnets.

Create the EFS file system, add mount targets, and create an access point

Complete the steps in the previous section to set up your file system, mount targets, and access point.

Create a new Lambda function

Define a Lambda function called LambdaManageEFSUsers. This function is responsible for automating the configuration of SageMaker Studio domains to use a shared EFS file system within a specific VPC. This can be useful for organizations that want to provide a centralized storage solution for their ML projects across multiple SageMaker Studio domains. See the following code:

import json
import subprocess
import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    env_vpc_id = os.environ['vpc_id']

    # Shared EFS directory
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'
    domains = []

    # List all SageMaker domains in the specified VPC (pagination omitted for brevity)
    response = sm_client.list_domains()
    all_domains = response['Domains']
    for domain in all_domains:
        domain_id = domain['DomainId']
        data = sm_client.describe_domain(DomainId=domain_id)
        domain_vpc_id = data['VpcId']
        if domain_vpc_id == env_vpc_id:
            domains.append(domain_id)

    # Create the shared directory (if needed) and set ownership to the default
    # SageMaker POSIX user and group (200001:1001)
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except Exception:
        logger.info('Repository already created')

    # Update every SageMaker domain in the VPC to use the shared directory
    if len(domains) > 0:
        for domain_id in domains:
            response = sm_client.update_domain(
                DomainId=domain_id,
                DefaultUserSettings={
                    'CustomFileSystemConfigs': [
                        {
                            'EFSFileSystemConfig': {
                                'FileSystemId': file_system,
                                'FileSystemPath': f'/{repository_name}'
                            }
                        }
                    ]
                }
            )

        logger.info(f"Updated Studio for Domains {domains} and EFS {file_system}")
        return {
            'statusCode': 200,
            'body': json.dumps(f"Created dir and modified permissions for Domains {domains}")
        }

    else:
        return {
            'statusCode': 400,
            'body': json.dumps(f"No SageMaker domains found in the configured VPC {env_vpc_id}")
        }

The execution role of the Lambda function needs to have permissions to describe and update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeDomain",
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Clean up

To clean up the solution you implemented and avoid further costs, delete the CloudFormation stack you deployed in your AWS account. When you delete the stack, you also delete the EFS file system and its storage. For additional information, refer to Delete a stack from the CloudFormation console.
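
If you deployed the template programmatically, you can also delete the stack with boto3; the stack name is a placeholder:

import boto3

cfn = boto3.client('cloudformation')

# Stack name is a placeholder -- use the name you chose when deploying the template
cfn.delete_stack(StackName='sagemaker-studio-custom-efs')
cfn.get_waiter('stack_delete_complete').wait(StackName='sagemaker-studio-custom-efs')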

Conclusion

In this post, we have explored three scenarios demonstrating the versatility of integrating Amazon EFS with SageMaker Studio. These scenarios highlight how Amazon EFS can provide a scalable, secure, and collaborative data storage solution for data science teams.

The first scenario focused on configuring an EFS directory with private spaces for individual user profiles, allowing users to store and access their own data while the administrator manages the EFS file system centrally.

The second scenario showcased a shared EFS directory across all spaces within a SageMaker Studio domain, enabling better collaboration and centralized data management.

The third scenario explored an EFS file system shared across multiple SageMaker Studio domains, empowering enterprise-level data science collaboration and promoting efficient use of shared resources.

By implementing these Amazon EFS integration scenarios, organizations can unlock the full potential of their data science teams, improve data governance, and enhance the overall efficiency of their data-driven initiatives. The integration of Amazon EFS with SageMaker Studio provides a versatile platform for data science teams to thrive in the evolving landscape of ML and AI.


About the Authors

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads, to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Itziar Molina Fernandez is an AI/ML Consultant in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys exploring new places.

Matteo Amadei is a Data Scientist Consultant in the AWS Professional Services team. He uses his expertise in artificial intelligence and advanced analytics to extract valuable insights and drive meaningful business outcomes for customers. He has worked on a wide range of projects spanning NLP, computer vision, and generative AI. He also has experience with building end-to-end MLOps pipelines to productionize analytical models. In his free time, Matteo enjoys traveling and reading.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.
