Build a serverless voice-based contextual chatbot for people with disabilities using Amazon Bedrock


At Amazon and AWS, we are always finding innovative ways to build inclusive technology. With voice assistants like Amazon Alexa, we are enabling more people to ask questions and get answers on the spot without having to type. Whether you’re a person with a motor disability, juggling multiple tasks, or simply away from your computer, getting search results without typing is a valuable feature. With modern voice assistants, you can now ask your questions conversationally and get verbal answers instantly.

In this post, we discuss voice-guided applications. Specifically, we focus on chatbots. Chatbots are no longer a niche technology. They are now ubiquitous on customer service websites, providing around-the-clock automated assistance. Although AI chatbots have been around for years, recent advances in generative AI and large language models (LLMs) have enabled more natural conversations. Chatbots are proving useful across industries, handling both general and industry-specific questions. Voice-based assistants like Alexa demonstrate how we are entering an era of conversational interfaces. Typing questions already feels cumbersome to many who prefer the simplicity and ease of speaking with their devices.

We explore how to build a fully serverless, voice-based contextual chatbot tailored for individuals with disabilities. We also provide a sample chatbot application, available in the accompanying GitHub repository. We create an intelligent conversational assistant that can understand and respond to voice inputs in a contextually relevant manner. The AI assistant is powered by Amazon Bedrock. This chatbot is designed to assist users with various tasks, provide information, and offer personalized support based on their unique requirements. For our LLM, we use Anthropic Claude on Amazon Bedrock.

We demonstrate the process of integrating Anthropic Claude’s advanced natural language processing capabilities with the serverless architecture of Amazon Bedrock, enabling the deployment of a highly scalable and cost-effective solution. Additionally, we discuss techniques for enhancing the chatbot’s accessibility and usability for people with motor disabilities. The aim of this post is to provide a comprehensive understanding of how to build a voice-based, contextual chatbot that uses the latest advancements in AI and serverless computing.

We hope that this solution can help people with certain mobility disabilities. A limited level of interaction is still needed: the user must explicitly indicate when they start and stop talking. In our sample application, we address this with a dedicated Talk button that runs the transcription process while it is pressed.

For people with significant motor disabilities, the same operation can be implemented with a dedicated physical button that can be pressed by a single finger or another body part. Alternatively, a special keyword can be said to indicate the beginning of the command. This approach is used when you communicate with Alexa. The user always starts the conversation with “Alexa.”

Solution overview

The following diagram illustrates the architecture of the solution.

Architecture of serverless components of the solution

To deploy this architecture, we need managed compute that can host the web application, authentication mechanisms, and relevant permissions. We discuss this later in the post.

All the services that we use are serverless and fully managed by AWS. You don’t need to provision the compute resources. You only consume the services through their API. All the calls to the services are made directly from the client application.

The application is a simple React application that we create using the Vite build tool. We use the AWS SDK for JavaScript to call the services. The solution uses the following major services:

  • Amazon Polly is a service that turns text into lifelike speech.
  • Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications.
  • Amazon Cognito is an identity service for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.

To consume AWS services, the user needs to obtain temporary credentials from AWS Identity and Access Management (IAM). This is possible due to the Amazon Cognito identity pool, which acts as a mediator between your application user and IAM services. The identity pool holds the information about the IAM roles with all permissions necessary to run the solution.
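As an illustration of this credential flow, the following Python sketch shows how a client could exchange a user pool sign-in token for temporary IAM credentials through the identity pool. This is a minimal sketch, not the sample application's code (the app uses the AWS SDK for JavaScript); the pool IDs and token are placeholders.

```python
def user_pool_logins(region: str, user_pool_id: str, id_token: str) -> dict:
    """Build the Logins map that links a user pool sign-in to an identity pool."""
    return {f"cognito-idp.{region}.amazonaws.com/{user_pool_id}": id_token}

def get_temporary_credentials(identity_pool_id: str, logins: dict, region: str) -> dict:
    """Exchange a user pool token for temporary IAM credentials (calls AWS)."""
    import boto3  # imported lazily; this call needs network access to AWS
    client = boto3.client("cognito-identity", region_name=region)
    identity = client.get_id(IdentityPoolId=identity_pool_id, Logins=logins)
    creds = client.get_credentials_for_identity(
        IdentityId=identity["IdentityId"], Logins=logins
    )
    # Contains AccessKeyId, SecretKey, SessionToken, and Expiration
    return creds["Credentials"]
```

The returned temporary credentials are what the client then uses to sign its Amazon Polly, Amazon Transcribe, and Amazon Bedrock requests.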

Amazon Polly and Amazon Transcribe don’t require additional setup from the client aside from what we have described. However, Amazon Bedrock requires named user authentication. This means that having an Amazon Cognito identity pool is not enough—you also need to use the Amazon Cognito user pool, which allows you to define users and bind them to the Amazon Cognito identity pool. To understand better how Amazon Cognito allows external applications to invoke AWS services, refer to Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

The heavy lifting of provisioning the Amazon Cognito user pool and identity pool, including generating the sign-in UI for the React application, is done by AWS Amplify. Amplify consists of a set of tools (open source framework, visual development environment, console) and services (web application and static website hosting) to accelerate the development of mobile and web applications on AWS. We cover the steps of setting up Amplify in the next sections.

Prerequisites

Before you begin, complete the following prerequisites:

  1. Make sure you have the following installed:
  2. Create an IAM role to use in the Amazon Cognito identity pool. Follow the principle of least privilege to provide only the minimum set of permissions needed to run the application.
    • To invoke Amazon Bedrock, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Polly, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Transcribe, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
          }
        ]
      }

The full policy JSON should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
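If you script the role setup, the same policy document can be assembled programmatically. The following is a hedged Python sketch (the helper name is ours, not from the post) that builds the policy from (Sid, Action) pairs, which makes it easy to extend if your fork of the application calls additional services.

```python
import json

# The three least-privilege actions the sample application needs
ACTIONS = [
    ("VisualEditor1", "bedrock:InvokeModel"),
    ("VisualEditor2", "polly:SynthesizeSpeech"),
    ("VisualEditor3", "transcribe:StartStreamTranscriptionWebSocket"),
]

def build_policy(actions) -> str:
    """Assemble an IAM policy document from (Sid, Action) pairs."""
    statements = [
        {"Sid": sid, "Effect": "Allow", "Action": action, "Resource": "*"}
        for sid, action in actions
    ]
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)
```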
  3. Run the following command to clone the GitHub repository:
    git clone https://github.com/aws-samples/serverless-conversational-chatbot.git

  4. To use Amplify, refer to Set up Amplify CLI to complete the initial setup.
  5. To be consistent with the values that you use later in the instructions, call your AWS profile amplify when you see the following prompt.
    Creation of the AWS profile "amplify"
  6. Create the role amplifyconsole-backend-role with the AdministratorAccess-Amplify managed policy, which allows Amplify to create the necessary resources.
    IAM Role with "AdministratorAccess-Amplify" policy
  7. For this post, we use the Anthropic Claude 3 Haiku LLM. To enable the LLM in Amazon Bedrock, refer to Access Amazon Bedrock foundation models.

Deploy the solution

There are two options to deploy the solution:

  • Use Amplify to deploy the application automatically
  • Deploy the application manually

We provide the steps for both options in this section.

Deploy the application automatically using Amplify

Amplify can deploy the application automatically if it’s stored in GitHub, Bitbucket, GitLab, or AWS CodeCommit. Upload the application that you downloaded earlier to your preferred repository (from the aforementioned options). For instructions, see Getting started with deploying an app to Amplify Hosting.

You can now continue to the next section of this post to set up IAM permissions.

Deploy the application manually

If you don’t have access to one of the storage options that we mentioned, you can deploy the application manually. This can also be useful if you want to modify the application to better fit your use case.

We tested the deployment on AWS Cloud9, a cloud integrated development environment (IDE) for writing, running, and debugging code, with Ubuntu Server 22.04 and Amazon Linux 2023.

We use the Visual Studio Code IDE and run all the following commands directly in the terminal window inside the IDE, but you can also run the commands in the terminal of your choice.

  1. From the directory where you checked out the application on GitHub, run the following command:
    cd serverless-conversational-chatbot

  2. Run the following commands:
    npm i
    
    amplify init

  3. Follow the prompts as shown in the following screenshot.
    • For authentication, choose the AWS profile amplify that you created as part of the prerequisite steps.
      Initial AWS Amplify setup in React application: 1. Do you want to use an existing environment? No 2. Enter a name for the environment: sampleenv 3. Select the authentication method you want to use: AWS Profile 4. Please choose the profile you want to use: amplify
    • Two new files will appear in the project under the src folder:
      • amplifyconfiguration.json
      • aws-exports.js

      New objects created by AWS Amplify: 1. aws-exports.js 2. amplifyconfiguration.json

  4. Next, run the following command:
    amplify configure project

Then choose Project information.

Project Configuration of AWS Amplify in React Applications

  5. Enter the following information:
    Which setting do you want to configure? Project information
    
    Enter a name for the project: servrlsconvchat
    
    Choose your default editor: Visual Studio Code
    
    Choose the type of app that you're building: javascript
    
    What javascript framework are you using: react
    
    Source Directory Path: src
    
    Distribution Directory Path: dist
    
    Build Command: npm run-script build
    
    Start Command: npm run-script start

You can use an existing Amazon Cognito identity pool and user pool or create new objects.

  6. For our application, run the following command:
    amplify add auth

If you get the following message, you can ignore it:

Auth has already been added to this project. To update run amplify update auth
  7. Choose Default configuration.
    Selecting "default configuration" when adding authentication objects
  8. Accept all options proposed by the prompt.
  9. Run the following command:
    amplify add hosting

  10. Choose your hosting option.

You have two options to host the application. The application can be hosted to the Amplify console or to Amazon Simple Storage Service (Amazon S3) and then exposed through Amazon CloudFront.

Hosting with the Amplify console differs from CloudFront and Amazon S3. The Amplify console is a managed service providing continuous integration and delivery (CI/CD) and SSL certificates, prioritizing swift deployment of serverless web applications and backend APIs. In contrast, CloudFront and Amazon S3 offer greater flexibility and customization options, particularly for hosting static websites and assets with features like caching and distribution. CloudFront and Amazon S3 are preferable for intricate, high-traffic web applications with specific performance and security needs.

For this post, we use the Amplify console. To learn more about deployment with Amazon S3 and Amazon CloudFront, refer to the documentation.
Selecting the deployment option for the React application on the Amplify Console. Selected option: Hosting with Amplify Console

Now you’re ready to publish the application. There is an option to publish the application to GitHub to support CI/CD pipelines. Amplify has built-in integration with GitHub and can redeploy the application automatically when you push the changes. For simplicity, we use manual deployment.

  11. Choose Manual deployment.
    Selecting "Manual Deployment" when publishing the project
  12. Run the following command:
    amplify publish

After the application is published, you will see the following output. Note down this URL to use in a later step.
Result of the Deployment of the React Application on the Amplify Console. The URL that the user should use to enter the Amplify application

  13. Log in to the Amplify console, navigate to the servrlsconvchat application, and choose General under App settings in the navigation pane.
    Service Role attachment to the deployed application. First step. Select the deployed application. Select the “General” option
  14. Edit the app settings and enter amplifyconsole-backend-role for Service role (you created this role in the prerequisites section).
    Service Role attachment to the deployed application. Second step. Setting “amplifyconsole-backend-role” in the “Service role” field

Now you can proceed to the next section to set up IAM permissions.

Configure IAM permissions

As part of the publishing method you completed, you provisioned a new identity pool. You can view this on the Amazon Cognito console, along with a new user pool. The names will be different from those presented in this post.

As we explained earlier, you need to attach policies to this role to allow interaction with Amazon Bedrock, Amazon Polly, and Amazon Transcribe. To set up IAM permissions, complete the following steps:

  1. On the Amazon Cognito console, choose Identity pools in the navigation pane.
  2. Navigate to your identity pool.
  3. On the User access tab, choose the link for the authenticated role.
    Identifying the IAM authenticated role in the Cognito identity pool. Select the “Identity pools” option in the console. Select the “User access” tab. Choose the link under “Authenticated role”
  4. Attach the policies that you defined in the prerequisites section.
    IAM policies attached to the Cognito identity pool authenticated role. Textual data presented in the “Prerequisites” section, item 2.

Amazon Bedrock can only be used with a named user, so we create a sample user in the Amazon Cognito user pool that was provisioned as part of the application publishing process.

  5. On the user pool details page, on the Users tab, choose Create user.
    User Creation in the Cognito User Pool. Select relevant user pool in “User pools” section. Select “Users” tab. Click on “Create user” button
  6. Provide your user information.
    Sample user definition in the Cognito User Pool. Enter email address and temporary password.

You’re now ready to run the application.

Use the sample serverless application

To access the application, navigate to the URL you saved from the output at the end of the application publishing process. Sign in to the application with the user you created in the previous step. You might be asked to change the password the first time you sign in.
Application Login Page. Enter user name and password

Use the Talk button and hold it while you’re asking the question. (We use this approach for the simplicity of demonstrating the abilities of the tool. For people with motor disabilities, we propose using a dedicated button that can be operated with different body parts, or a special keyword to initiate the conversation.)

When you release the button, the application sends your voice to Amazon Transcribe and returns the transcription text. This text is used as an input for an Amazon Bedrock LLM. For this example, we use Anthropic Claude 3 Haiku, but you can modify the code and use another model.
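To make the handoff concrete, the following Python sketch shows the kind of request body the application assembles for Claude 3 on Amazon Bedrock once the transcription text is available. It follows the Anthropic Messages format that Bedrock expects; the model ID and max_tokens value are illustrative, so verify them against the Amazon Bedrock documentation for your Region.

```python
import json

# Illustrative model ID for Anthropic Claude 3 Haiku on Amazon Bedrock
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_body(history: list, transcription: str, max_tokens: int = 512) -> str:
    """Append the transcribed question to the history and serialize the request."""
    messages = history + [{"role": "user", "content": transcription}]
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": messages,
    })

# The serialized body would then be passed to the bedrock-runtime client, e.g.:
# client.invoke_model(modelId=MODEL_ID, body=build_claude_body([], question))
```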

The response from Amazon Bedrock is displayed as text and is also spoken by Amazon Polly.
Instructions on how to invoke the “Talk” operation by pressing and holding the “Talk” button

The conversation history is also stored. This means that you can ask follow-up questions, and the context of the conversation is preserved. For example, we asked, “What is the most famous tower there?” without specifying the location, and our chatbot was able to understand that the context of the question is Paris based on our previous question.
Demonstration of context preservation during conversation. Continues question-answer conversation with chatbot.

We store the conversation history inside a JavaScript variable, which means that if you refresh the page, the context will be lost. We discuss how to preserve the conversation context in a persistent way later in this post.
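A sketch of that history handling, in Python for brevity (the sample app does the equivalent in JavaScript): each exchange appends a user turn and an assistant turn, and trimming the oldest turns keeps the prompt within the model's context window. Persisting this list, for example as a DynamoDB item keyed by session ID, is what would make the context survive a page refresh.

```python
def append_turn(history: list, question: str, answer: str, max_turns: int = 20) -> list:
    """Record one question/answer exchange, keeping at most max_turns exchanges."""
    history = history + [
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ]
    # Drop the oldest turns so the prompt stays within the context window
    return history[-2 * max_turns:]
```

Because the full list is sent with every request, the model can resolve follow-up questions such as “What is the most famous tower there?” against earlier turns.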

While you press and hold the Talk button, its color changes and a microphone icon appears to indicate that the transcription process is running.

"Talk" operation indicator. The “Talk” button changes color to ochre

Clean up

To clean up your resources, run the following command from the same directory where you ran the Amplify commands:

amplify delete

Result of the "Cleanup" operation after running “amplify delete” command

This command removes the Amplify settings from the React application, Amplify resources, and all Amazon Cognito objects, including the IAM role and Amazon Cognito user pool’s user.

Conclusion

In this post, we presented how to create a fully serverless voice-based contextual chatbot using Amazon Bedrock with Anthropic Claude.

This serves as a starting point for a serverless and cost-effective solution. For example, you could extend the solution with persistent conversational memory for your chats, using a service such as Amazon DynamoDB. If you want to use a Retrieval Augmented Generation (RAG) approach, you can use Amazon Bedrock Knowledge Bases to securely connect FMs in Amazon Bedrock to your company data.

Another approach is to customize the model you use in Amazon Bedrock with your own data using fine-tuning or continued pre-training to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.

For additional resources, refer to the following:


About the Author

Michael Shapira is a Senior Solution Architect covering general topics in AWS and part of the AWS Machine Learning community. He has 16 years’ experience in Software Development. He finds it fascinating to work with cloud technologies and help others on their cloud journey.

Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Read More

Maintain access and consider alternatives for Amazon Monitron


Amazon Monitron, the Amazon Web Services (AWS) machine learning (ML) service for industrial equipment condition monitoring, will no longer be available to new customers effective October 31, 2024. Existing customers of Amazon Monitron will be able to purchase devices and use the service as normal. We will continue to sell devices until July 2025 and will honor the 5-year device warranty, including service support. AWS continues to invest in security, availability, and performance improvements for Amazon Monitron, but we do not plan to introduce new features to Amazon Monitron.

This post discusses how customers can maintain access to Amazon Monitron after it is closed to new customers and what some alternatives are to Amazon Monitron.

Maintaining access to Amazon Monitron

Customers are considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024. To maintain access to the service after October 31, 2024, customers should create a project and commission at least one sensor.

For any questions or support needed, you may contact your assigned account manager or solutions architect, or create a case through the AWS Management Console.

Ordering Amazon Monitron hardware

For existing Amazon business customers, we will allowlist your account with the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions according to individual request.

Alternatives to Amazon Monitron

For customers interested in an alternative for their condition monitoring needs, we recommend exploring the solutions provided by our AWS Partners: Tactical Edge, IndustrAI, and Factory AI.

Summary

While new customers will no longer have access to Amazon Monitron after October 31, 2024, AWS offers a range of partner solutions through the AWS Partner Network finder. Customers should explore these options to understand what works best for their specific needs.

More details can be found in the following resources at AWS Partner Network.


About the author

Stuart Gillen is a Sr. Product Manager for Monitron, at AWS. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has been focused on industrial applications specifically in reliability practices, maintenance systems, and manufacturing.

Read More

Import a question answering fine-tuned model into Amazon Bedrock as a custom model


Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.
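In its simplest form, RAG just prepends retrieved passages to the user's question before the prompt reaches the FM; managed options such as Amazon Bedrock Knowledge Bases handle the retrieval step for you. The following minimal Python sketch (our own illustration, not code from this post) shows the prompt-assembly step only:

```python
def build_rag_prompt(question: str, passages: list) -> str:
    """Prepend numbered retrieved passages to the question as grounding context."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```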

Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.

In this post, we provide a step-by-step approach to fine-tuning a Mistral model using SageMaker and importing it into Amazon Bedrock using the Custom Model Import feature. We use the OpenOrca dataset to fine-tune the Mistral model and the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.

Key Features

Some of the key features of Custom Model Import for Amazon Bedrock are:

  1. You can bring your fine-tuned models and take advantage of the fully managed serverless capabilities of Amazon Bedrock.
  2. The feature currently supports the Llama 2, Llama 3, Flan, and Mistral model architectures with precisions of FP32, FP16, and BF16, with further quantizations coming soon.
  3. To use the feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3).
  4. You can also use models created with Amazon SageMaker by referencing the Amazon SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker.
  5. Amazon Bedrock automatically scales your model as your traffic increases and, when the model is not in use, scales it down to zero, reducing your costs.

Let's dive into a use case and see how easy it is to use this feature.

Solution overview

At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

  1. Fine-tune the model using SageMaker.
  2. Import the fine-tuned model into Amazon Bedrock.
  3. Test the imported model.
  4. Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

  1. We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
  2. To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
  3. An import job orchestrates the process to import the model and make the model available from the customer account.
    1. The import job copies all the model artifacts from the user’s account into an AWS managed S3 bucket.
  4. When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
  5. We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.
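Step 2 above can also be driven from code. The following is a hedged Boto3 sketch: the job name, model name, role ARN, and S3 URI are placeholders, and you should confirm the exact create_model_import_job parameters in the current Boto3 documentation for the bedrock client before relying on this shape.

```python
def build_import_job_request(job_name: str, model_name: str,
                             role_arn: str, s3_uri: str) -> dict:
    """Assemble the request for a model import job from S3-hosted weights."""
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }

def start_import_job(request: dict, region: str) -> dict:
    """Submit the import job (calls AWS; needs credentials and permissions)."""
    import boto3  # imported lazily so the builder above stays testable offline
    bedrock = boto3.client("bedrock", region_name=region)
    return bedrock.create_model_import_job(**request)
```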

The copied model artifacts will remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in your AWS account S3 bucket doesn’t delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock along with all the copied artifacts using either the Amazon Bedrock console, Boto3 library, or APIs.

Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.

In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.

Prerequisites

We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages. With the Mixture of Experts (MoE) feature, it can achieve higher accuracy for customer support use cases.

Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found in the GitHub repo.

Fine-tune the model using QLoRA

To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.

Prepare the dataset

The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

# Load dataset from the hub
dataset = load_dataset("Open-Orca/OpenOrca")
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# save datasets to s3
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)

You transform the dataset into Mistral Default Instruct format within the SageMaker training job as instructed in the training script (run_fsdp_qlora.py):

    ################
    # Dataset
    ################
    
    train_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
        split="train",
    )
    test_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
        split="train",
    )

    ################
    # Model & Tokenizer
    ################

    # Tokenizer        
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE
    
    # template dataset
    def template_dataset(examples):
        return{"text":  tokenizer.apply_chat_template(examples["messages"], tokenize=False)}
    
    train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
    test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])

Optimize fine-tuning using QLoRA

You optimize your fine-tuning using QLoRA and with the precision provided as input into the training script as SageMaker training job parameters. QLoRA is an efficient fine-tuning approach that reduces memory usage to fine-tune a 65-billion-parameter model on a single 48 GB GPU, preserving the full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up quantization configurations, as shown in the following code:

    # Model    
    torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
    quant_storage_dtype = torch.bfloat16

    if script_args.use_qlora:
        print(f"Using QLoRA - {torch_dtype}")
        quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch_dtype,
                bnb_4bit_quant_storage=quant_storage_dtype,
            )
    else:
        quantization_config = None

You use the LoRA config based on the QLoRA paper and Sebastian Raschka's experiment, as shown in the following code. Two key takeaways from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

    ################
    # PEFT
    ################
    # LoRA config based on QLoRA paper & Sebastian Raschka experiment
    peft_config = LoraConfig(
        lora_alpha=8,
        lora_dropout=0.05,
        r=16,
        bias="none",
        target_modules="all-linear",
        task_type="CAUSAL_LM",
    )

You use SFTTrainer to fine-tune the Mistral model:

    ################
    # Training
    ################
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        eval_dataset=test_dataset,
        peft_config=peft_config,
        max_seq_length=script_args.max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,  # We template with special tokens
            "append_concat_token": False,  # No need to add additional separator token
        },
    )

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let’s look at how to merge the adapter with the base model next.

Merge the adapters

Adapters are new modules added between the layers of a pre-trained network. During QLoRA fine-tuning, gradients are back-propagated through the frozen, 4-bit quantized pre-trained language model into these low-rank adapters. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

        # load PEFT model in fp16
        model = AutoPeftModelForCausalLM.from_pretrained(
            training_args.output_dir,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16
        )
        # Merge LoRA and base model and save
        model = model.merge_and_unload()
        model.save_pretrained(
            sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
        )

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.
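For example, you can upload the merged directory to Amazon S3 file by file (no archiving) with boto3. The bucket name and prefix below are placeholders; the key-building logic is split out as a plain function:

```python
import os

def build_s3_keys(local_dir, prefix):
    """Map every file under local_dir to an S3 key beneath prefix,
    preserving the directory layout (the model stays uncompressed)."""
    pairs = []
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            pairs.append((path, f"{prefix.rstrip('/')}/{rel}"))
    return pairs

def upload_model_dir(local_dir, bucket, prefix):
    # Requires credentials for a bucket that the Amazon Bedrock service
    # role used in the import job can read from.
    import boto3
    s3 = boto3.client("s3")
    for path, key in build_s3_keys(local_dir, prefix):
        s3.upload_file(path, bucket, key)

# upload_model_dir(sagemaker_save_dir, "<your-bucket>", "mistral-merged")
```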

Import the fine-tuned model into Amazon Bedrock

Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.

Import the model using the Amazon Bedrock console

To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK

The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime
br_client = boto3.client('bedrock', region_name='<aws-region-name>')
pt_model_nm = "<bedrock-custom-model-name>"
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"
role_arn = "<<bedrock_role_with_custom_model_import_policy>>"
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                  importedModelName=pt_model_nm,
                                  roleArn=role_arn,
                                  modelDataSource=pt_model_src)
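Because the import job is asynchronous, you typically poll its status before using the model. The following sketch assumes the `br_client` and `resp` from the preceding code; the status values follow the Bedrock GetModelImportJob API, and the client is passed in so the loop is easy to exercise without a live account:

```python
import time

def wait_for_import_job(bedrock_client, job_identifier, poll_seconds=30):
    """Poll a Bedrock model import job until it reaches a terminal
    state and return that status ('Completed' or 'Failed')."""
    while True:
        job = bedrock_client.get_model_import_job(jobIdentifier=job_identifier)
        if job["status"] in ("Completed", "Failed"):
            return job["status"]
        time.sleep(poll_seconds)

# status = wait_for_import_job(br_client, resp["jobArn"])
```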

Test the imported model

Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.

Test the model on the Amazon Bedrock console

You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK

You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

import json
import boto3
from botocore.exceptions import ClientError

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<<replace with the imported bedrock model arn>>"

def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

prompt = "will there be a season 5 of shadowhunters"
formatted_prompt = f"[INST] {prompt} [/INST]</s>"
native_request = {
    "prompt": formatted_prompt,
    "max_tokens": 64,
    "top_p": 0.9,
    "temperature": 0.91,
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported using Amazon Bedrock supports temperature, top_p, and max_gen_len parameters when invoking the model for inferencing. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom Mistral fine-tuned model.
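A small guard can drop the unsupported parameters before building the request body. This is a hypothetical helper based on the parameter support described above; `max_tokens`, used in the earlier invoke example, is kept as well:

```python
# Parameters this post describes as supported for a custom imported
# Mistral model, plus max_tokens as used in the invoke example above.
SUPPORTED = {"temperature", "top_p", "max_gen_len", "max_tokens"}

def build_native_request(prompt, **params):
    """Build an Invoke Model body, silently dropping parameters the
    imported Mistral model does not accept (top_k, max_seq_len, ...)."""
    body = {"prompt": prompt}
    body.update({k: v for k, v in params.items() if k in SUPPORTED})
    return body

req = build_native_request("[INST] hello [/INST]", temperature=0.9, top_k=50)
print(req)  # top_k is dropped
```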

Evaluate the imported model

Now that you have imported and tested the model, let’s evaluate it using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. To evaluate the question answering task, we use the metrics F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words. The key metrics for question answering tasks are Exact Match, Quasi-Exact Match, and F1 over words, computed by comparing the model’s predicted answers against the ground truth answers. The FMEval library supports out-of-the-box evaluation algorithms for metrics such as accuracy, QA Accuracy, and others detailed in the FMEval documentation. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, which supports these metrics, as shown in the following code.

config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer"
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='outputs[0].text',
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config, 
                                    prompt_template="[INST]$model_input[/INST]", save=True)

You can get the consolidated metrics for the imported model as follows:

for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")

Clean up

To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.
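The same cleanup can be scripted. The sketch below wraps the Bedrock DeleteImportedModel API; the client is passed in so the call is easy to exercise without touching a live account:

```python
def delete_imported_model(bedrock_client, model_identifier):
    """Delete a custom imported model from Amazon Bedrock by name or ARN.
    The caller needs permission for the bedrock:DeleteImportedModel action."""
    bedrock_client.delete_imported_model(modelIdentifier=model_identifier)

# Hypothetical usage:
# import boto3
# delete_imported_model(boto3.client("bedrock", region_name="us-west-2"),
#                       "<bedrock-custom-model-name>")
```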

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.

Conclusion

In this post, we explained the different aspects of fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking the model using both an Amazon Bedrock playground and Boto3, and then evaluating the imported model using the FMEval library. You can use this feature to import base FMs or FMs fine-tuned either on premises, on SageMaker, or on Amazon EC2 into Amazon Bedrock and use the models without any heavy lifting in your generative AI applications. Explore the Custom Model Import feature for Amazon Bedrock to deploy FMs fine-tuned for code generation tasks in a secure and scalable manner. Visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He has 23 years of experience working with clients across the supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development and infrastructure to serverless and machine learning.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understand what it takes to build and launch scalable and secure generative AI products. Prior to Bedrock, he worked on numerous products at Amazon, ranging from devices to ads to robotics.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.

Read More

Using task-specific models from AI21 Labs on AWS

Using task-specific models from AI21 Labs on AWS

In this blog post, we will show you how to leverage AI21 Labs’ Task-Specific Models (TSMs) on AWS to enhance your business operations. You will learn the steps to subscribe to AI21 Labs in the AWS Marketplace, set up a domain in Amazon SageMaker, and utilize AI21 TSMs via SageMaker JumpStart.

AI21 Labs is a foundation model (FM) provider focused on building state-of-the-art language models. AI21 Task-Specific Models (TSMs) are built for tasks such as answering questions, summarizing and condensing lengthy texts, and so on. AI21 TSMs are available in Amazon SageMaker JumpStart.

Here are the AI21 TSMs that can be accessed and customized in SageMaker JumpStart: AI21 Contextual Answers, AI21 Summarize, AI21 Paraphrase, and AI21 Grammatical Error Correction.

AI21 FMs (Jamba-Instruct, AI21 Jurassic-2 Ultra, and AI21 Jurassic-2 Mid) are available in Amazon Bedrock and can be used for large language model (LLM) use cases. We used the AI21 TSMs available in SageMaker JumpStart for this post. SageMaker JumpStart enables you to select, compare, and evaluate the available AI21 TSMs.

AI21’s TSMs

Foundation models can solve many kinds of tasks, but not every task is unique: some commercial tasks are common across many applications. AI21 Labs’ TSMs are specialized models built to solve a particular problem. They’re built to deliver out-of-the-box value, cost effectiveness, and higher accuracy for the common tasks behind many commercial use cases. In this post, we will explore three of AI21 Labs’ TSMs and their unique capabilities.

Foundation models are built and trained on massive datasets to perform a variety of tasks. Unlike FMs, TSMs are trained to perform unique tasks.

When your use case is supported by a TSM, you quickly realize benefits such as improved refusal behavior: the model declines to provide answers unless they’re grounded in actual document content.

  • Paraphrase: This model is used to enhance content creation and communication by generating varied versions of text while maintaining a consistent tone and style. This model is ideal for creating multiple product descriptions, marketing materials, and customer support responses, improving clarity and engagement. It also simplifies complex documents, making information more accessible.
  • Summarize: This model is used to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.
  • Contextual answers: This model is used to significantly enhance information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific document contexts, making it particularly useful in customer service, legal, finance, and educational sectors. It streamlines workflows by quickly accessing relevant information from extensive databases, reducing response times and improving customer satisfaction.

Prerequisites

To follow the steps in this post, you must have the following prerequisites in place:

AWS account setup

Completing the labs in this post requires an AWS account with a SageMaker environment set up. If you don’t have an AWS account, see Complete your AWS registration for the steps to create one.

AWS Marketplace opt-in

AI21 TSMs can also be accessed through the AWS Marketplace by subscription. Using AWS Marketplace, you can subscribe to AI21 TSMs and deploy SageMaker endpoints.

To do these exercises, you must subscribe to the following offerings in the AWS Marketplace.

Service quota limits

To use some of the GPU instances required to run AI21’s task-specific models, you must have the required service quota limits. You can request a service quota limit increase in the AWS Management Console. Limits are account and resource specific.

To create a service request, search for Service Quotas in the console search bar. Select the service to go to its dashboard, then enter the name of the instance type (for example, ml.g5.48xlarge). Make sure the quota is for endpoint usage.
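The same lookup can be scripted with the Service Quotas API. The quota-name pattern below ("&lt;instance type&gt; for endpoint usage") is an assumption you should verify on the console; the matching logic is a plain function:

```python
def find_endpoint_quota(quotas, instance_type):
    """Return the endpoint-usage quota entry for instance_type from a
    list of quota dicts shaped like the list_service_quotas response."""
    target = f"{instance_type} for endpoint usage"
    return next((q for q in quotas if q.get("QuotaName") == target), None)

# Hypothetical usage against the live API:
# import boto3
# sq = boto3.client("service-quotas")
# quotas = [q for page in sq.get_paginator("list_service_quotas")
#                            .paginate(ServiceCode="sagemaker")
#           for q in page["Quotas"]]
# print(find_endpoint_quota(quotas, "ml.g5.48xlarge"))
```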

Estimated cost

The following is the estimated cost to walk through the solution in this post.

Contextual Answers notebook

  • We used an ml.g5.48xlarge instance.
    • By default, AWS accounts don’t have access to this GPU. You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 15 minutes.
  • The cost was $20.41 (billed on an hourly basis).

Summarize notebook

  • We used an ml.g4dn.12xlarge GPU.
    • You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 10 minutes.
  • The cost was $4.94 (billed on an hourly basis).

Paraphrase notebook

  • We used the ml.g4dn.12xlarge GPU.
    • You must request a service quota limit increase (see the previous section: Service Quota Limits).
  • The notebook runtime was approximately 10 minutes.
  • The cost was $4.94 (billed on an hourly basis).

Total cost: $30.29 (1 hour charge for each deployed endpoint)
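The total follows from hourly billing: each endpoint is billed for a minimum of one hour even though the notebooks run for only 10–15 minutes. A quick check of the arithmetic, using the figures above:

```python
import math

# (hourly rate in USD, runtime in hours) for each deployed endpoint,
# taken from the estimates above.
endpoints = {
    "contextual-answers (ml.g5.48xlarge)": (20.41, 15 / 60),
    "summarize (ml.g4dn.12xlarge)": (4.94, 10 / 60),
    "paraphrase (ml.g4dn.12xlarge)": (4.94, 10 / 60),
}

# Billing rounds each runtime up to a whole hour.
total = sum(rate * math.ceil(hours) for rate, hours in endpoints.values())
print(f"Total: ${total:.2f}")  # → Total: $30.29
```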

Using AI21 models on AWS

Getting started

In this section, you will access AI21 TSMs in SageMaker JumpStart. These interactive notebooks contain code to deploy TSM endpoints and provide example code blocks to run inference. The first few steps are prerequisites to deploying the sample notebooks. If you already have a SageMaker domain and user profile set up, you can skip to step 7.

  1. Use the search bar in the AWS Management Console to navigate to Amazon SageMaker, as shown in the following figure.


If you don’t already have one set up, you must create a SageMaker domain. A domain consists of an associated Amazon Elastic File System (Amazon EFS) volume, a list of authorized users, and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations.

Users within a domain can share notebook files and other artifacts with each other. For more information, see Learn about Amazon SageMaker domain entities and statuses. For today’s exercises, you will use Quick Set-Up to deploy an environment.

  1. Choose Create a SageMaker domain as shown in the following figure.
  2. Select Quick setup. After you choose Set up, domain creation will begin.
  3. After a moment, your domain will be created.
  4. Choose Add user.
  5. You can keep the default user profile values.
  6. Launch Studio by choosing the Launch button and then selecting Studio.
  7. Choose JumpStart in the navigation pane as shown in the following figure.

Here you can see the model providers for the JumpStart notebooks.

  8. Select AI21 Labs to see their available models.

Each of AI21’s models has an associated model card. A model card provides key information about the model such as its intended use cases, training, and evaluation details. For this example, you will use the Summarize, Paraphrase, and Contextual Answers TSMs.

  9. Start with Contextual Answers. Select the AI21 Contextual Answers model card.

A sample notebook is included as part of the model. Jupyter Notebooks are a popular way to interact with code and LLMs.

  10. Choose Notebooks to explore the notebook.
  11. To run the notebook’s code blocks, choose Open in JupyterLab.
  12. If you do not already have an existing space, choose Create new space and enter an appropriate name. When ready, choose Create space and open notebook.

It can take up to 5 minutes to open your notebook.
SageMaker Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application.

  13. After the notebook opens, you will be prompted to select a kernel. Ensure Python 3 is selected and choose Select.

Navigating the notebook exercises

Repeat the preceding process to import the remaining notebooks.

Each AI21 notebook demonstrates required code imports, version checks, model selection, endpoint creation, and inference examples showcasing the TSM’s unique strengths through code blocks and example prompts.

Each notebook has a cleanup step at the end to delete your deployed endpoints. It’s important to terminate any running endpoints to avoid additional costs.

Contextual Answers JumpStart Notebook

AWS customers and partners can use AI21 Labs’ Contextual Answers model to significantly enhance their information retrieval and customer support processes. This model excels at providing accurate and relevant answers based on specific context, making it useful in customer service, legal, finance, and educational sectors.

The following are code snippets from AI21’s Contextual Answers TSM through JumpStart. Notice that there is no prompt engineering required. The only input is the question and the context provided.

Input:

financial_context = """In 2020 and 2021, enormous QE — approximately $4.4 trillion, or 18%, of 2021 gross domestic product (GDP) — and enormous fiscal stimulus (which has been and always will be inflationary) — approximately $5 trillion, or 21%, of 2021 GDP — stabilized markets and allowed companies to raise enormous amounts of capital. In addition, this infusion of capital saved many small businesses and put more than $2.5 trillion in the hands of consumers and almost $1 trillion into state and local coffers. These actions led to a rapid decline in unemployment, dropping from 15% to under 4% in 20 months — the magnitude and speed of which were both unprecedented. Additionally, the economy grew 7% in 2021 despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods. Fortunately, during these two years, vaccines for COVID-19 were also rapidly developed and distributed.
In today's economy, the consumer is in excellent financial shape (on average), with leverage among the lowest on record, excellent mortgage underwriting (even though we've had home price appreciation), plentiful jobs with wage increases and more than $2 trillion in excess savings, mostly due to government stimulus. Most consumers and companies (and states) are still flush with the money generated in 2020 and 2021, with consumer spending over the last several months 12% above pre-COVID-19 levels. (But we must recognize that the account balances in lower-income households, smaller to begin with, are going down faster and that income for those households is not keeping pace with rising inflation.)
Today's economic landscape is completely different from the 2008 financial crisis when the consumer was extraordinarily overleveraged, as was the financial system as a whole — from banks and investment banks to shadow banks, hedge funds, private equity, Fannie Mae and many other entities. In addition, home price appreciation, fed by bad underwriting and leverage in the mortgage system, led to excessive speculation, which was missed by virtually everyone — eventually leading to nearly $1 trillion in actual losses.
"""
question = "Did the economy shrink after the Omicron variant arrived?"
response = client.answer.create(
    context=financial_context,
    question=question,
)

print(response.answer)

Output:

No, the economy did not shrink after the Omicron variant arrived. In fact, the economy grew 7% in 2021, despite the arrival of the Delta and Omicron variants and the global supply chain shortages, which were largely fueled by the dramatic upswing in consumer spending and the shift in that spend from services to goods.

As mentioned in our introduction, AI21’s Contextual Answers model does not provide answers to questions outside of the provided context. If the prompt includes a question unrelated to the 2020/2021 economy, you will get a response as shown in the following example.

Input:

irrelevant_question = "How did COVID-19 affect the financial crisis of 2008?"

response = client.answer.create(
    context=financial_context,
    question=irrelevant_question,
)

print(response.answer)

Output:

None

When finished, you can delete your deployed endpoint by running the final two cells of the notebook.

model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)
model.delete_model()

You can import the other notebooks by navigating to SageMaker JumpStart and repeating the same process you used to import this first notebook.

Summarize JumpStart Notebook

AWS customers and partners can use AI21 Labs’ Summarize model to condense lengthy texts into concise summaries while preserving the original meaning. This model is particularly useful for processing large documents, such as financial reports, legal documents, and technical papers, making critical information more accessible and comprehensible.

The following are highlighted code snippets from AI21’s Summarize TSM using JumpStart. Notice that the input must include the full text that the user wants to summarize.

Input:

text = """The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation "ludicrous" and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected."""

response = client.summarize.create(
    source=text,
    source_type=DocumentType.TEXT,
)

print("Original text:")
print(text)
print("================")
print("Summary:")
print(response.summary)

Output:
Original text:
The error affected a number of international flights leaving the terminal on Wednesday, with some airlines urging passengers to travel only with hand luggage.
Virgin Atlantic said all airlines flying out of the terminal had been affected.
Passengers have been warned it may be days before they are reunited with luggage.
An airport spokesperson apologised and said the fault had now been fixed.
Virgin Atlantic said it would ensure all bags were sent out as soon as possible.
It added customers should retain receipts for anything they had bought and make a claim to be reimbursed.
Passengers, who were informed by e-mail of the problem, took to social media to vent their frustrations.
One branded the situation "ludicrous" and said he was only told 12 hours before his flight.
The airport said it could not confirm what the problem was, what had caused it or how many people had been affected.
================
Summary:
A number of international flights leaving the terminal were affected by the error on Wednesday, with some airlines urging passengers to travel only with hand luggage. Passengers were warned it may be days before they are reunited with their luggage.

Paraphrase JumpStart Notebook

AWS customers and partners can use AI21 Labs’ Paraphrase TSM through JumpStart to enhance content creation and communication by generating varied versions of text.

The following are highlighted code snippets from AI21’s Paraphrase TSM using JumpStart. Notice that no extensive prompt engineering is required. The only input required is the full text that the user wants to paraphrase and a chosen style (for example, casual or formal).

Input:

text = "Throughout this page, we will explore the advantages and features of the Paraphrase model."

response = client.paraphrase.create(
    text=text,
    style="formal",
)

print(response.suggestions)

Output:
[Suggestion(text='We will examine the advantages and features of the Paraphrase model throughout this page.'), Suggestion(text='The purpose of this page is to examine the advantages and features of the Paraphrase model.'), Suggestion(text='On this page, we will discuss the advantages and features of the Paraphrase model.'), Suggestion(text='This page will provide an overview of the advantages and features of the Paraphrase model.'), Suggestion(text='In this article, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='Here we will explore the advantages and features of the Paraphrase model.'), Suggestion(text='The purpose of this page is to describe the advantages and features of the Paraphrase model.'), Suggestion(text='In this page, we will examine the advantages and features of the Paraphrase model.'), Suggestion(text='The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.'), Suggestion(text='Our goal on this page will be to explore the benefits and features of the Paraphrase model.')]

Input:

print("Original sentence:")
print(text)
print("============================")
print("Suggestions:")
print("n".join(["- " + x.text for x in response.suggestions]))

Output:

Original sentence:
Throughout this page, we will explore the advantages and features of the Paraphrase model.
============================
Suggestions:
- We will examine the advantages and features of the Paraphrase model throughout this page.
- The purpose of this page is to examine the advantages and features of the Paraphrase model.
- On this page, we will discuss the advantages and features of the Paraphrase model.
- This page will provide an overview of the advantages and features of the Paraphrase model.
- In this article, we will examine the advantages and features of the Paraphrase model.
- Here we will explore the advantages and features of the Paraphrase model.
- The purpose of this page is to describe the advantages and features of the Paraphrase model.
- In this page, we will examine the advantages and features of the Paraphrase model.
- The Paraphrase model will be reviewed on this page with an emphasis on its advantages and features.
- Our goal on this page will be to explore the benefits and features of the Paraphrase model.

Less prompt engineering

A key advantage of AI21’s task-specific models is the reduced need for complex prompt engineering compared to foundation models. Let’s consider how you might approach a summarization task using a foundation model compared to using AI21’s specialized Summarize TSM.

For a foundation model, you might need to craft an elaborate prompt template with detailed instructions:

prompt_template = "You are a highly capable summarization assistant. Concisely summarize the given text while preserving key details and overall meaning. Use clear language tailored for human readers.\n\nText: [INPUT_TEXT]\n\nSummary:"

To summarize text with this foundation model, you’d populate the template and pass the full prompt:

input_text = "Insert text to summarize here..."
prompt = prompt_template.replace("[INPUT_TEXT]", input_text)
summary = model(prompt)

In contrast, using AI21’s Summarize TSM is more straightforward:

input_text = "Insert text to summarize here..."
summary = summarize_model(input_text)

That’s it! With the Summarize TSM, you pass the input text directly to the model; there’s no need for an intricate prompt template.

Lower cost and higher accuracy

By using TSMs, you can achieve lower costs and higher accuracy. As demonstrated previously in the Contextual Answers notebook, TSMs have a higher refusal rate than most mainstream models, which can lead to higher accuracy. This characteristic of TSMs is beneficial in use cases where wrong answers are less acceptable.

Conclusion

In this post, we explored AI21 Labs’ approach to generative AI using task-specific models (TSMs). Through guided exercises, you walked through the process of setting up a SageMaker domain and importing sample JumpStart notebooks to experiment with AI21’s TSMs, including Contextual Answers, Paraphrase, and Summarize.

Throughout the exercises, you saw the potential benefits of task-specific models compared to foundation models. When asked questions outside the context of the intended use case, the AI21 TSMs refused to answer, making them less prone to hallucinating or generating nonsensical outputs beyond their intended domain—a critical factor for applications that require precision and safety. Lastly, we highlighted how task-specific models are designed from the outset to excel at specific tasks, streamlining development and reducing the need for extensive prompt engineering and fine-tuning, which can make them a more cost-effective solution.

Whether you’re a data scientist, machine learning practitioner, or someone curious about AI advancements, we hope this post has provided valuable insights into the advantages of AI21 Labs’ task-specific approach. As the field of generative AI continues to evolve rapidly, we encourage you to stay curious, experiment with various approaches, and ultimately choose the one that best aligns with your project’s unique requirements and goals. Visit AWS GitHub for other example use cases and code to experiment with in your own environment.

Additional resources


About the Authors

Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He has core competencies in data analytics, AI/ML and GenAI. Joe’s background is in data science and international development. He is passionate about leveraging data and technology for social good.

Pat Wilson is a Solutions Architect at Amazon Web Services with a focus on AI/ML workloads and security. He currently supports Federal Partners. Outside of work, Pat enjoys learning, working out, and spending time with family/friends.

Josh Famestad is a Solutions Architect at Amazon Web Services. Josh works with public sector customers to build and execute cloud-based approaches to deliver on business priorities.

Read More

Architecture to AWS CloudFormation code using Anthropic’s Claude 3 on Amazon Bedrock


The Anthropic’s Claude 3 family of models, available on Amazon Bedrock, offers multimodal capabilities that enable the processing of images and text. This capability opens up innovative avenues for image understanding, wherein Anthropic’s Claude 3 models can analyze visual information in conjunction with textual data, facilitating more comprehensive and contextual interpretations. By taking advantage of its multimodal prowess, we can ask the model questions like “What objects are in the image, and how are they relatively positioned to each other?” We can also gain an understanding of data presented in charts and graphs by asking questions related to business intelligence (BI) tasks, such as “What is the sales trend for 2023 for company A in the enterprise market?” These are just some examples of the additional richness Anthropic’s Claude 3 brings to generative artificial intelligence (AI) interactions.

Architecting specific AWS Cloud solutions involves creating diagrams that show relationships and interactions between different services. Instead of building the code manually, you can use Anthropic’s Claude 3’s image analysis capabilities to generate AWS CloudFormation templates by passing an architecture diagram as input.

In this post, we explore some ways you can use Anthropic’s Claude 3 Sonnet’s vision capabilities to accelerate the process of moving from architecture to the prototype stage of a solution.

Use cases for architecture to code

The following are relevant use cases for this solution:

  • Converting whiteboarding sessions to AWS infrastructure – To quickly prototype your designs, you can take the architecture diagrams created during whiteboarding sessions and generate the first draft of a CloudFormation template. You can also iterate over the CloudFormation template to develop a well-architected solution that meets all your requirements.
  • Fast deployment of architecture diagrams – You can generate boilerplate CloudFormation templates by using architecture diagrams you find on the web. This allows you to experiment quickly with new designs.
  • Streamlined AWS infrastructure design through collaborative diagramming – You might draw architecture diagrams on a diagramming tool during an all-hands meeting. You can then use these raw diagrams to generate boilerplate CloudFormation templates, quickly producing actionable steps while speeding up collaboration and increasing meeting value.

Solution overview

To demonstrate the solution, we use Streamlit to provide an interface for diagrams and prompts. Amazon Bedrock invokes the Anthropic’s Claude 3 Sonnet model, which provides multimodal capabilities. AWS Fargate is the compute engine for the web application. The following diagram illustrates the step-by-step process.

Architecture Overview

The workflow consists of the following steps:

  1. The user uploads an architecture image (JPEG or PNG) on the Streamlit application, invoking the Amazon Bedrock API to generate a step-by-step explanation of the architecture using the Anthropic’s Claude 3 Sonnet model.
  2. The Anthropic’s Claude 3 Sonnet model is invoked using a step-by-step explanation and few-shot learning examples to generate the initial CloudFormation code. The few-shot learning example consists of three CloudFormation templates; this helps the model understand writing practices associated with CloudFormation code.
  3. The user manually provides instructions using the chat interface to update the initial CloudFormation code.

*Steps 1 and 2 are executed once when the architecture diagram is uploaded. To trigger changes to the AWS CloudFormation code (step 3), provide update instructions from the Streamlit app.
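Steps 1 and 2 can be sketched with the Amazon Bedrock Runtime API and the Anthropic Messages format. The model ID is the public one for Anthropic’s Claude 3 Sonnet; the prompt wording and max_tokens value are illustrative, not the application’s actual prompt:

```python
import base64
import json

MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def build_vision_request(image_bytes: bytes, prompt: str,
                         media_type: str = "image/png") -> dict:
    """Anthropic Messages API body pairing an architecture image with a text prompt."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 4096,
        "temperature": 0,  # reduce variability for focused, syntactically correct code
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("utf-8")}},
                {"type": "text", "text": prompt},
            ],
        }],
    }

def explain_architecture(image_bytes: bytes) -> str:
    """Step 1: ask the model for a step-by-step explanation of the diagram."""
    import boto3  # lazy import so build_vision_request stays testable offline
    bedrock = boto3.client("bedrock-runtime")
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps(build_vision_request(
            image_bytes, "Explain this architecture diagram step by step.")),
    )
    return json.loads(response["body"].read())["content"][0]["text"]
```

Step 2 reuses the same invoke_model call with a text-only message that concatenates the generated explanation and the three few-shot CloudFormation examples.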

The CloudFormation templates generated by the web application are intended for inspiration purposes and not for production-level applications. It is the responsibility of a developer to test and verify the CloudFormation template according to security guidelines.

Few-shot prompting

To help Anthropic’s Claude 3 Sonnet understand the practices of writing CloudFormation code, we use few-shot prompting by providing three CloudFormation templates as reference examples in the prompt. Exposing Anthropic’s Claude 3 Sonnet to multiple CloudFormation templates will allow it to analyze and learn from the structure, resource definitions, parameter configurations, and other essential elements consistently implemented across your organization’s templates. This enables Anthropic’s Claude 3 Sonnet to grasp your team’s coding conventions, naming conventions, and organizational patterns when generating CloudFormation templates. The following examples used for few-shot learning can be found in the GitHub repo.

Few-shot prompting example 1


Few-shot prompting example 2


Few-shot prompting example 3


Furthermore, Anthropic’s Claude 3 Sonnet can observe how different resources and services are configured and integrated within the CloudFormation templates through few-shot prompting. It will gain insights into how to automate the deployment and management of various AWS resources, such as Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon DynamoDB, and AWS Step Functions.

Inference parameters are preset, but they can be changed from the web application if desired. We recommend experimenting with various combinations of these parameters. By default, we set the temperature to zero to reduce the variability of outputs and create focused, syntactically correct code.

Prerequisites

To access the Anthropic’s Claude 3 Sonnet foundation model (FM), you must request access through the Amazon Bedrock console. For instructions, see Manage access to Amazon Bedrock foundation models. After requesting access to Anthropic’s Claude 3 Sonnet, you can deploy the following development.yaml CloudFormation template to provision the infrastructure for the demo. For instructions on how to deploy this sample, refer to the GitHub repo. Use the following table to launch the CloudFormation template to quickly deploy the sample in either us-east-1 or us-west-2.

Region Stack
us-east-1 development.yaml
us-west-2 development.yaml

When deploying the template, you have the option to specify the Amazon Bedrock model ID you want to use for inference. This flexibility allows you to choose the model that best suits your needs. By default, the template uses the Anthropic’s Claude 3 Sonnet model, renowned for its exceptional performance. However, if you prefer to use a different model, you can seamlessly pass its Amazon Bedrock model ID as a parameter during deployment. Verify that you have requested access to the desired model beforehand and that the model possesses the necessary vision capabilities required for your specific use case.
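Deploying with a non-default model might look like the following sketch; the parameter key BedrockModelId is an assumption, so match it to the parameter actually declared in development.yaml:

```python
def build_parameters(model_id: str) -> list:
    # 'BedrockModelId' is a hypothetical parameter key; use the one
    # declared in development.yaml.
    return [{"ParameterKey": "BedrockModelId", "ParameterValue": model_id}]

def deploy_demo_stack(template_body: str, model_id: str,
                      stack_name: str = "architecture-to-code-demo") -> str:
    """Create the demo stack, overriding the default Amazon Bedrock model ID."""
    import boto3  # lazy import so build_parameters stays testable offline
    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=build_parameters(model_id),
        Capabilities=["CAPABILITY_IAM"],  # the template creates IAM resources
    )
    return response["StackId"]
```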

After you launch the CloudFormation stack, navigate to the stack’s Outputs tab on the AWS CloudFormation console and collect the Amazon CloudFront URL. Enter the URL in your browser to view the web application.

Web application screenshot

In this post, we discuss CloudFormation template generation for three different samples. You can find the sample architecture diagrams in the GitHub repo. These samples are similar to the few-shot learning examples, which is intentional. As an enhancement to this architecture, you can employ a Retrieval Augmented Generation (RAG)-based approach to retrieve relevant CloudFormation templates from a knowledge base to dynamically augment the prompt.

Due to the non-deterministic behavior of the large language model (LLM), you might not get the same response as shown in this post.

Let’s generate CloudFormation templates for the following sample architecture diagram.

Sample Architecture for CloudFormation generation

Uploading the preceding architecture diagram to the web application generates a step-by-step explanation of the diagram using Anthropic’s Claude 3 Sonnet’s vision capabilities.

Web application output screenshot

Let’s analyze the step-by-step explanation. The generated response is divided into three parts:

  1. The context explains what the architecture diagram depicts.
  2. The architecture diagram’s flow gives the order in which AWS services are invoked and their relationship with each other.
  3. We get a summary of the entire generated response.

In the following step-by-step explanation, we see a few highlighted errors.

Step-by-step explanation errors

The step-by-step explanation is augmented with few-shot learning examples to develop an initial CloudFormation template. Let’s analyze the initial CloudFormation template:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  This CloudFormation stack sets up a serverless data processing pipeline triggered by file uploads to an S3 bucket.
  It uses AWS Lambda to process the uploaded files, and Amazon SNS to send notifications upon successful processing.
  This template is not production ready and should only be used for inspiration
Parameters:
  S3BucketName:
    Type: String
    Description: Name of the S3 bucket for file uploads
    AllowedPattern: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
    MinLength: 1
    MaxLength: 63

  EmailAddress:
    Type: String
    Description: Email address to receive notifications
    AllowedPattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$

Resources:

  # S3 Bucket
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: 's3:ObjectCreated:*'
            Function: !GetAtt ProcessingLambda.Arn

  # SNS Topic
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailAddress
          Protocol: email

  # Lambda Function
  ProcessingLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: ProcessingLambda
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          import boto3

          def lambda_handler(event, context):
              s3 = boto3.client('s3')
              sns = boto3.client('sns')

              # Process the uploaded file
              for record in event['Records']:
                  bucket_name = record['s3']['bucket']['name']
                  object_key = record['s3']['object']['key']

                  # Process the file data
                  # ...

              # Send notification upon successful processing
              sns.publish(
                  TopicArn=!Ref SNSTopic,
                  Message='File processing completed successfully',
                  Subject='Data Processing Notification'
              )

              return {
                  'statusCode': 200,
                  'body': 'File processing completed successfully'
              }

  # Lambda Role
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                Resource: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName, '/*']]
        - PolicyName: SNSPublish
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'sns:Publish'
                Resource: !Ref SNSTopic

  # Lambda Permissions
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt ProcessingLambda.Arn
      Action: 'lambda:InvokeFunction'
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName]]

Outputs:

  S3BucketName:
    Description: Name of the S3 bucket for file uploads
    Value: !Ref S3Bucket
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketName'

  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ProcessingLambda.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionArn'

  SNSTopicArn:
    Description: ARN of the SNS topic for notifications
    Value: !Ref SNSTopic
    Export:
      Name: !Sub '${AWS::StackName}-SNSTopicArn'

After analyzing the CloudFormation template, we see that the Lambda code refers to an Amazon Simple Notification Service (Amazon SNS) topic using !Ref SNSTopic, which is not valid. We also want to add additional functionality to the template. First, we want to filter the Amazon S3 notification configuration to invoke Lambda only when *.csv files are uploaded. Second, we want to add metadata to the CloudFormation template. To do this, we use the chat interface to give the following update instructions to the web application:

Make the following updates:

Use environment variables for AWS Lambda to access SNS Topic ARN.

Add filter to S3 notification configuration to only invoke AWS Lambda when *.csv files are uploaded

Add metadata to CloudFormation template

Chat interface web application screenshot

The updated CloudFormation template is as follows:

AWSTemplateFormatVersion: '2010-09-09'
Description: >
  This CloudFormation stack sets up a serverless data processing pipeline triggered by file uploads to an S3 bucket.
  It uses AWS Lambda to process the uploaded files, and Amazon SNS to send notifications upon successful processing.
  This template is not production ready and should only be used for inspiration.
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'S3 Bucket Configuration'
        Parameters:
          - S3BucketName
      - Label:
          default: 'Notification Configuration'
        Parameters:
          - EmailAddress

Parameters:
  S3BucketName:
    Type: String
    Description: Name of the S3 bucket for file uploads
    AllowedPattern: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
    MinLength: 1
    MaxLength: 63

  EmailAddress:
    Type: String
    Description: Email address to receive notifications
    AllowedPattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,}$

Resources:

  # S3 Bucket
  S3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref S3BucketName
      NotificationConfiguration:
        LambdaConfigurations:
          - Event: 's3:ObjectCreated:*'
            Function: !GetAtt ProcessingLambda.Arn
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .csv

  # SNS Topic
  SNSTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Endpoint: !Ref EmailAddress
          Protocol: email

  # Lambda Function
  ProcessingLambda:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: ProcessingLambda
      Runtime: python3.9
      Handler: index.lambda_handler
      Role: !GetAtt LambdaRole.Arn
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref SNSTopic
      Code:
        ZipFile: |
          import boto3
          import os

          def lambda_handler(event, context):
              s3 = boto3.client('s3')
              sns = boto3.client('sns')
              sns_topic_arn = os.environ['SNS_TOPIC_ARN']

              # Process the uploaded file
              for record in event['Records']:
                  bucket_name = record['s3']['bucket']['name']
                  object_key = record['s3']['object']['key']

                  # Process the file data
                  # ...

              # Send notification upon successful processing
              sns.publish(
                  TopicArn=sns_topic_arn,
                  Message='File processing completed successfully',
                  Subject='Data Processing Notification'
              )

              return {
                  'statusCode': 200,
                  'body': 'File processing completed successfully'
              }

  # Lambda Role
  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: 'sts:AssumeRole'
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: S3Access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                Resource: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName, '/*']]
        - PolicyName: SNSPublish
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'sns:Publish'
                Resource: !Ref SNSTopic

  # Lambda Permissions
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt ProcessingLambda.Arn
      Action: 'lambda:InvokeFunction'
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Join ['', ['arn:aws:s3:::', !Ref S3BucketName]]

Outputs:

  S3BucketName:
    Description: Name of the S3 bucket for file uploads
    Value: !Ref S3Bucket
    Export:
      Name: !Sub '${AWS::StackName}-S3BucketName'

  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ProcessingLambda.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaFunctionArn'

  SNSTopicArn:
    Description: ARN of the SNS topic for notifications
    Value: !Ref SNSTopic
    Export:
      Name: !Sub '${AWS::StackName}-SNSTopicArn'

Additional examples

We have provided two more sample diagrams, their associated CloudFormation code generated by Anthropic’s Claude 3 Sonnet, and the prompts used to create them. You can see how diagrams in various forms, from digital to hand-drawn, or some combination, can be used. The end-to-end analysis of these samples can be found at sample 2 and sample 3 on the GitHub repo.

Best practices for architecture to code

In the demonstrated use case, you can observe how well the Anthropic’s Claude 3 Sonnet model could pull details and relationships between services from an architecture image. The following are some ways you can improve the performance of Anthropic’s Claude in this use case:

  • Implement a multimodal RAG approach to enhance the application’s ability to handle a wider variety of complex architecture diagrams, because the current implementation is limited to diagrams similar to the provided static examples.
  • Enhance the architecture diagrams by incorporating visual cues and features, such as labeling services, indicating orchestration hierarchy levels, grouping related services at the same level, enclosing services within clear boxes, and labeling arrows to represent the flow between services. These additions will aid in better understanding and interpreting the diagrams.
  • If the application generates an invalid CloudFormation template, provide the error as update instructions. This will help the model understand the mistake and make a correction.
  • Use Anthropic’s Claude 3 Opus or Anthropic’s Claude 3.5 Sonnet for greater performance on long contexts in order to support near-perfect recall.
  • With careful design and management, orchestrate agentic workflows by using Agents for Amazon Bedrock. This enables you to incorporate self-reflection, tool use, and planning within your workflow to generate more relevant CloudFormation templates.
  • Use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual interface. This can reduce development effort and accelerate workflow testing.

Clean up

To clean up the resources used in this demo, complete the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Select the deployed development.yaml stack and choose Delete.

Conclusion

With the pattern demonstrated with Anthropic’s Claude 3 Sonnet, developers can effortlessly translate their architectural visions into reality by simply sketching their desired cloud solutions. Anthropic’s Claude 3 Sonnet’s advanced image understanding capabilities will analyze these diagrams and generate boilerplate CloudFormation code, minimizing the need for initial complex coding tasks. This visually driven approach empowers developers from a variety of skill levels, fostering collaboration, rapid prototyping, and accelerated innovation.

You can investigate other patterns, such as including RAG and agentic workflows, to improve the accuracy of code generation. You can also explore customizing the LLM by fine-tuning it to write CloudFormation code with greater flexibility.

Now that you have seen Anthropic’s Claude 3 Sonnet in action, try designing your own architecture diagrams using some of the best practices to take your prototyping to the next level.

For additional resources, refer to the GitHub repo.


About the Authors

Eashan Kaushik is an Associate Solutions Architect at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focusing on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Read More

How Northpower used computer vision with AWS to automate safety inspection risk assessments


This post is co-written with Andreas Astrom from Northpower.

Northpower provides reliable and affordable electricity and fiber internet services to customers in the Northland region of New Zealand. As an electricity distributor, Northpower aims to improve access, opportunity, and prosperity for its communities by investing in infrastructure, developing new products and services, and giving back to shareholders. Additionally, Northpower is one of New Zealand’s largest infrastructure contractors, serving clients in transmission, distribution, generation, and telecommunications. With over 1,400 staff working across 14 locations, Northpower plays a crucial role in maintaining essential services for customers driven by a purpose of connecting communities and building futures for Northland.

The energy industry is at a critical turning point. There is a strong push from policymakers and the public to decarbonize the industry, while at the same time balancing energy resilience with health, safety, and environmental risk. Recent events including Tropical Cyclone Gabrielle have highlighted the susceptibility of the grid to extreme weather and emphasized the need for climate adaptation with resilient infrastructure. Electricity Distribution Businesses (EDBs) are also facing new demands with the integration of decentralized energy resources like rooftop solar as well as larger-scale renewable energy projects like solar and wind farms. These changes call for innovative solutions to ensure operational efficiency and continued resilience.

In this post, we share how Northpower has worked with their technology partner Sculpt to reduce the effort and carbon required to identify and remediate public safety risks. Specifically, we cover the computer vision and artificial intelligence (AI) techniques used to combine datasets into a list of prioritized tasks for field teams to investigate and mitigate. The resulting dashboard highlighted that 141 power pole assets required action, out of a network of 57,230 poles.

Northpower challenge

Utility poles have stay wires that anchor the pole to the ground for extra stability. These stay wires are meant to have an inline insulator to avoid the stay wire becoming live, which would create a safety risk for any person or animal in the area.

Northpower faced a significant challenge in determining how many of their 57,230 power poles have stay wires without insulators. Without reliable historical data, manual inspections of such a vast and predominantly rural network are labor-intensive and costly. Alternatives like helicopter surveys are expensive, and field technicians require access to private properties to carry out safety inspections. Moreover, the travel requirement for technicians to physically visit each pole across such a large network posed a considerable logistical challenge, emphasizing the need for a more efficient solution.

Thankfully, some asset datasets were available in digital format, and historical paper-based inspection reports, dating back 20 years, were available in scanned format. This archive, along with 765,933 varied-quality inspection photographs, some over 15 years old, presented a significant data processing challenge. Processing these images and scanned documents is not a cost- or time-efficient task for humans, and requires highly performant infrastructure that can reduce the time to value.

Solution overview

Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. In this solution, the team used Amazon SageMaker Studio to launch an object detection model available in Amazon SageMaker JumpStart using the PyTorch framework.

The following diagram illustrates the high-level workflow.

Northpower chose SageMaker for a number of reasons:

  • SageMaker Studio is a managed service with ready-to-go development environments, saving time otherwise used for setting up environments manually
  • SageMaker JumpStart took care of the setup and deployed the required ML jobs involved in the project with minimal configuration, further saving development time
  • The integrated labeling solution with Amazon SageMaker Ground Truth was suitable for large-scale image annotations and simplified the collaboration with a Northpower labeling workforce

In the following sections, we discuss the key components of the solution as illustrated in the preceding diagram.

Data preparation

SageMaker Ground Truth employed a human workforce made up of Northpower volunteers to annotate a set of 10,000 images. The workforce created a bounding box around stay wires and insulators, and the output was subsequently used to train an ML model.

Model training, validation, and storage

This component uses the following services:

  • SageMaker Studio is used to access and deploy a pre-trained object detection model and develop code on managed Jupyter notebooks. The model was then fine-tuned with training data from the data preparation stage. For a step-by-step guide to set up SageMaker Studio, refer to Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users.
  • SageMaker Studio runs custom Python code to augment the training data and transform the metadata output from SageMaker Ground Truth into a format supported by the computer vision model training job. The model is then trained using a fully managed infrastructure, validated, and published to the Amazon SageMaker Model Registry.
  • Amazon Simple Storage Service (Amazon S3) stores the model artifacts and creates a data lake to host the inference output, document analysis output, and other datasets in CSV format.

Model deployment and inference

In this step, SageMaker hosts the ML model on an endpoint used to run inferences.

A SageMaker Studio notebook was used again post-inference to run custom Python code to simplify the datasets and render bounding boxes on objects based on criteria. This step also applied a custom scoring system that was rendered onto the final image, allowing for an additional human QA step for low-confidence images.

Data analytics and visualization

This component includes the following services:

  • An AWS Glue crawler is used to understand the dataset structures stored in the data lake so that it can be queried by Amazon Athena
  • Athena allows the use of SQL to combine the inference output and asset datasets to find the highest-risk items
  • Amazon QuickSight was used as the tool for both the human QA process and for determining which assets needed a field technician to be sent for physical inspection
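The Athena join might look like the following sketch; the table and column names are hypothetical, standing in for whatever schemas the AWS Glue crawler registered:

```python
# Hypothetical table and column names; use the schemas the AWS Glue
# crawler actually created in the data lake.
HIGH_RISK_QUERY = """
SELECT a.pole_id, a.location, i.confidence
FROM inference_output i
JOIN pole_assets a ON a.pole_id = i.pole_id
WHERE i.stay_wire_detected = true
  AND i.insulator_detected = false
ORDER BY i.confidence DESC
"""

def start_query(database: str, output_s3: str) -> str:
    """Submit the query to Athena and return its execution ID."""
    import boto3  # lazy import so the query text stays testable offline
    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=HIGH_RISK_QUERY,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return response["QueryExecutionId"]
```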

Document understanding

In the final step, Amazon Textract digitizes historical paper-based asset assessments and stores the output in CSV format.
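A minimal sketch of that digitization step follows; the CSV layout is an assumption:

```python
import csv
import io

def extract_lines(document_bytes: bytes) -> list:
    """Return the text LINE blocks Amazon Textract detects in one scanned page."""
    import boto3  # lazy import so to_csv_row stays testable offline
    textract = boto3.client("textract")
    response = textract.detect_document_text(Document={"Bytes": document_bytes})
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]

def to_csv_row(pole_id: str, lines: list) -> str:
    """Serialize one assessment as a CSV row for the S3 data lake (assumed layout)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([pole_id, " ".join(lines)])
    return buffer.getvalue().strip()
```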

Results

The trained PyTorch object detection model enabled the detection of stay wires and insulators on utility poles, and a SageMaker postprocessing job calculated a risk score using an m5.24xlarge Amazon Elastic Compute Cloud (Amazon EC2) instance with 200 concurrent Python threads. This instance was also responsible for rendering the score information along with an object bounding box onto an output image, as shown in the following example.
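The scoring and rendering step might be sketched as follows; the scoring rule and box format are assumptions, because Northpower’s actual criteria aren’t published, and the rendering helper requires Pillow:

```python
def risk_score(stay_wire: bool, insulator: bool, confidence: float) -> float:
    """Illustrative rule (not Northpower's actual criteria): a detected stay
    wire with no detected insulator is the risk case, weighted by confidence."""
    if stay_wire and not insulator:
        return round(confidence, 2)
    return 0.0

def render_boxes(image_path: str, boxes: list, out_path: str) -> None:
    """Draw each detection box and its score onto the image (requires Pillow)."""
    from PIL import Image, ImageDraw  # lazy import so risk_score stays testable
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    for box in boxes:  # each box: {"x0", "y0", "x1", "y1", "label", "score"}
        draw.rectangle([box["x0"], box["y0"], box["x1"], box["y1"]],
                       outline="red", width=3)
        draw.text((box["x0"], max(0, box["y0"] - 12)),
                  f'{box["label"]} {box["score"]:.2f}', fill="red")
    image.save(out_path)
```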

Writing the confidence scores into the S3 data lake alongside the historical inspection results allowed Northpower to run analytics using Athena to understand each classification of image. The sunburst graph below is a visualization of this classification.

Northpower categorized 1,853 poles as high priority risks, 3,922 as medium priority, 36,260 as low priority, and 15,195 as the lowest priority. These were viewable in the QuickSight dashboard and used as an input for humans to review the highest risk assets first.

At the conclusion of the analysis, Northpower found that 31 poles needed stay wire insulators installed and a further 110 poles needed investigation in the field. This significantly reduced the cost and carbon usage involved in manually checking every asset.

Conclusion

Remote asset inspection remains a challenge for regional EDBs, but using computer vision and AI to uncover new value from data that was previously unused was key to Northpower’s success in this project. SageMaker JumpStart provided deployable models that could be trained for object detection use cases with minimal data science knowledge and overhead.

Discover the publicly available foundation models offered by SageMaker JumpStart and fast-track your own ML project with a step-by-step tutorial.


About the authors

Scott Patterson is a Senior Solutions Architect at AWS.

Andreas Astrom is the Head of Technology and Innovation at Northpower.

Read More

GenAI for Aerospace: Empowering the workforce with expert knowledge on Amazon Q and Amazon Bedrock

GenAI for Aerospace: Empowering the workforce with expert knowledge on Amazon Q and Amazon Bedrock

Aerospace companies face a generational workforce challenge today. With the strong post-COVID recovery, manufacturers are committing to record production rates, requiring the sharing of highly specialized domain knowledge across more workers. At the same time, maintaining the headcount and experience level of the workforce is increasingly challenging, as a generation of subject matter experts (SMEs) retires and increased fluidity characterizes the post-COVID labor market. This domain knowledge is traditionally captured in reference manuals, service bulletins, quality ticketing systems, engineering drawings, and more, but the quantity and complexity of documents is growing and takes time to learn. You simply can’t train new SMEs overnight. Without a mechanism to manage this knowledge transfer gap, productivity across all phases of the lifecycle might suffer from losing expert knowledge and repeating past mistakes.

Generative AI is a modern form of machine learning (ML) that has recently shown significant gains in reasoning, content comprehension, and human interaction. It can be a significant force multiplier to help the human workforce quickly digest, summarize, and answer complex questions from large technical document libraries, accelerating your workforce development. AWS is uniquely positioned to help you address these challenges through generative AI, with a broad and deep range of AI/ML services and over 20 years of experience in developing AI/ML technologies.

This post shows how aerospace customers can use AWS generative AI and ML-based services to address this document-based knowledge use case, using a Q&A chatbot to provide expert-level guidance to technical staff based on large libraries of technical documents. We focus on the use of two AWS services:

  • Amazon Q can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories, code, and enterprise systems.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Although Amazon Q is a great way to get started with no code for business users, Amazon Bedrock Knowledge Bases offers more flexibility at the API level for generative AI developers; we explore both these solutions in the following sections. But first, let’s revisit some basic concepts around Retrieval Augmented Generation (RAG) applications.

Generative AI constraints and RAG

Although generative AI holds great promise for automating complex tasks, our aerospace customers often express concerns about the use of the technology in such a safety- and security-sensitive industry. They ask questions such as:

  • “How do I keep my generative AI applications secure?”
  • “How do I make sure my business-critical data isn’t used to train proprietary models?”
  • “How do I know that answers are accurate and only drawn from authoritative sources?” (Avoiding the well-known problem of hallucination.)
  • “How can I trace the reasoning of my model back to source documents to build user trust?”
  • “How do I keep my generative AI applications up to date with an ever-evolving knowledge base?”

In many generative AI applications built on proprietary technical document libraries, these concerns can be addressed by using the RAG architecture. RAG helps maintain the accuracy of responses, keeps up with the rapid pace of document updates, and provides traceable reasoning while keeping your proprietary data private and secure.

This architecture combines a general-purpose large language model (LLM) with a customer-specific document database, which is accessed through a semantic search engine. Rather than fine-tuning the LLM to the specific application, the document library is loaded with the relevant reference material for that application. In RAG, these knowledge sources are often referred to as a knowledge base.

A high-level RAG architecture is shown in the following figure. The workflow includes the following steps:

  1. When the technician has a question, they enter it at the chat prompt.
  2. The technician’s question is used to search the knowledge base.
  3. The search results include a ranked list of the most relevant source documentation.
  4. Those documentation snippets are added to the original query as context, and sent to the LLM as a combined prompt.
  5. The LLM returns the answer to the question, as synthesized from the source material in the prompt.

Because RAG uses a semantic search, it can find more relevant material in the database than just a keyword match alone. For more details on the operation of RAG systems, refer to Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart.
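The five steps above can be sketched in a few lines of Python. The retriever here is a stub standing in for a real semantic search engine, and `call_llm` stands in for any LLM invocation; both names are illustrative, not part of any AWS API.

```python
def retrieve(knowledge_base, question, top_k=3):
    """Stub search: rank documents by naive word overlap with the question.
    A real RAG system would use embeddings and a vector index (semantic search)."""
    q_words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_rag_prompt(question, snippets):
    # Step 4: prepend the retrieved snippets as context for the LLM.
    context = "\n\n".join(snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(knowledge_base, question, call_llm):
    snippets = retrieve(knowledge_base, question)   # steps 2-3
    prompt = build_rag_prompt(question, snippets)   # step 4
    return call_llm(prompt)                         # step 5
```

Swapping the word-overlap stub for an embedding-based search is exactly what makes RAG find relevant material beyond keyword matches.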

RAG architecture

This architecture addresses the concerns listed earlier in a few key ways:

  • The underlying LLM doesn’t require custom training because the domain-specialized knowledge is contained in a separate knowledge base. As a result, the RAG-based system can be kept up to date, or retrained to completely new domains, simply by changing the documents in the knowledge base. This mitigates the significant cost typically associated with training custom LLMs.
  • Because of the document-based prompting, generative AI answers can be constrained to only come from trusted document sources, with direct attribution back to those sources for verification.
  • RAG-based systems can securely manage access to different knowledge bases by role-based access control. Proprietary knowledge in generative AI remains private and protected in those knowledge bases.

AWS provides customers in aerospace and other high-tech domains the tools they need to rapidly build and securely deploy generative AI solutions at scale, with world-class security. Let’s look at how you can use Amazon Q and Amazon Bedrock to build RAG-based solutions in two different use cases.

Use case 1: Create a chatbot “expert” for technicians with Amazon Q

Aerospace is a high-touch industry, and technicians are the front line of that workforce. Technician work appears at every lifecycle stage for the aircraft (and its components): engineering prototype, qualification testing, manufacture, quality inspection, maintenance, and repair. Technician work is demanding and highly specialized; it requires detailed knowledge of highly technical documentation to make sure products meet safety, functional, and cost requirements. Knowledge management is a high priority for many companies, seeking to spread domain knowledge from experts to junior employees to offset attrition, scale production capacity, and improve quality.

Our customers frequently ask us how they can use customized chatbots built on customized generative AI models to automate access to this information and help technicians make better-informed decisions and accelerate their development. The RAG architecture shown in this post is an excellent solution to this use case because it allows companies to quickly deploy domain-specialized generative AI chatbots built securely on their own proprietary documentation. Amazon Q can deploy fully managed, scalable RAG systems tailored to address a wide range of business problems. It provides immediate, relevant information and advice to help streamline tasks, accelerate decision-making, and help spark creativity and innovation at work. It can automatically connect to over 40 different data sources, including Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, Salesforce, Atlassian Confluence, Slack, and Jira Cloud.

Let’s look at an example of how you can quickly deploy a generative AI-based chatbot “expert” using Amazon Q.

  1. Sign in to the Amazon Q console.

If you haven’t used Amazon Q before, you might be greeted with a request for initial configuration.

  1. Under Connect Amazon Q to IAM Identity Center, choose Create account instance to create a custom credential set for this demo.
  2. Under Select a bundle to get started, under Amazon Q Business Lite, choose Subscribe in Q Business to create a test subscription.

If you have previously used Amazon Q in this account, you can simply reuse an existing user or subscription for this walkthrough.

Amazon Q subscription

  1. After you create your AWS IAM Identity Center and Amazon Q subscription, choose Get started on the Amazon Q landing page.

Amazon Q getting started

  1. Choose Create application.
  2. For Application name, enter a name (for example, my-tech-assistant).
  3. Under Service access, select Create and use a new service-linked role (SLR).
  4. Choose Create.

This creates the application framework.

Amazon Q create app

  1. Under Retrievers, select Use native retriever.
  2. Under Index provisioning, select Starter for a basic, low-cost retriever.
  3. Choose Next.

Amazon Q indexer / retriever

Next, we need to configure a data source. For this example, we use Amazon S3 and assume that you have already created a bucket and uploaded documents to it (for more information, see Step 1: Create your first S3 bucket). For this example, we have uploaded some public domain documents from the Federal Aviation Administration (FAA) technical library relating to software, system standards, instrument flight rating, aircraft construction and maintenance, and more.

  1. For Data sources, choose Amazon S3 to point our RAG assistant to this S3 bucket.

Amazon Q data source

  1. For Data source name, enter a name for your data source (independent of the S3 bucket name, such as my-faa-docs).
  2. Under IAM role, choose Create new service role (Recommended).
  3. Under Sync scope, choose the S3 bucket where you uploaded your documents.
  4. Under Sync run schedule, choose Run on demand (or another option, if you want your documents to be re-indexed on a set schedule).
  5. Choose Add data source.
  6. Leave the remaining settings as default and choose Next to finish adding your Amazon S3 data source.

Amazon Q S3 source

Finally, we need to create user access permissions to our chatbot.

  1. Under Add groups and users, choose Add groups and users.
  2. In the popup that appears, you can choose to either create new users or select existing ones. If you want to use an existing user, you can skip the following steps:
    • Select Add new users, then choose Next.
    • Enter the new user information, including a valid email address.

An email will be sent to that address with a link to validate that user.

  1. Now that you have a user, select Assign existing users and groups and choose Next.
  2. Choose your user, then choose Assign.

Amazon Q add user

You should now have a user assigned to your new chatbot application.

  1. Under Web experience service access, select Create and use a new service role.
  2. Choose Create application.

Amazon Q create app

You now have a new generative AI application! Before the chatbot can answer your questions, you have to run the indexer on your documents at least one time.

  1. On the Applications page, choose your application.

Amazon Q select app

  1. Select your data source and choose Sync now.

The synchronization process takes a few minutes to complete.

  1. When the sync is complete, on the Web experience settings tab, choose the link under Deployed URL.

If you haven’t logged in yet, you will be prompted to do so using the user credentials you created; use the email address as the user name.

Your chatbot is now ready to answer technical questions on the large library of documents you provided. Try it out! You’ll notice that for each answer, the chatbot provides a Sources option that indicates the authoritative reference from which it drew its answer.

Amazon Q chat

Our fully customized chatbot required no coding, no custom data schemas, and no managing of underlying infrastructure to scale! Amazon Q fully manages the infrastructure required to securely deploy your technician’s assistant at scale.

Use case 2: Use Amazon Bedrock Knowledge Bases

As we demonstrated in the previous use case, Amazon Q fully manages the end-to-end RAG workflow and allows business users to get started quickly. But what if you need more granular control of parameters related to the vector database, chunking, retrieval, and models used to generate final answers? Amazon Bedrock Knowledge Bases allows generative AI developers to build and interact with proprietary document libraries for accurate and efficient Q&A over documents. In this example, we use the same FAA documents as before, but this time we set up the RAG solution using Amazon Bedrock Knowledge Bases. We demonstrate how to do this using both APIs and the Amazon Bedrock console. The full notebook for following the API-based approach can be downloaded from the GitHub repo.

The following diagram illustrates the architecture of this solution.

Amazon Bedrock Knowledge Bases

Create your knowledge base using the API

To implement the solution using the API, complete the following steps:

  1. Create a role with the necessary policies to access data from Amazon S3 and write embeddings to Amazon OpenSearch Serverless. This role will be used by the knowledge base to retrieve relevant chunks from OpenSearch based on the input query.
# Create security, network and data access policies within OSS
encryption_policy, network_policy, access_policy = create_policies_in_oss(vector_store_name=vector_store_name,
    aoss_client=aoss_client, bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn)
  1. Create an empty OpenSearch Serverless index to store the document embeddings and metadata. OpenSearch Serverless is a fully managed option that allows you to run petabyte-scale workloads without managing clusters.
# Create the OpenSearch Serverless collection
collection = aoss_client.create_collection(name=vector_store_name, type='VECTORSEARCH')

# Create the index within the collection
response = oss_client.indices.create(index=index_name, body=json.dumps(body_json))
print('Creating index:')
pp.pprint(response)
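For reference, the `body_json` passed to `indices.create` defines a k-NN vector index. The following is a plausible minimal shape; the field names, HNSW method settings, and the 1,024-dimension figure (the Titan Text Embeddings V2 default) are assumptions here, and the repo notebook defines the exact mapping it uses.

```python
# Illustrative OpenSearch k-NN index mapping (field names and dimension are
# assumptions; see the notebook in the GitHub repo for the exact body_json).
body_json = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1024,  # Titan Text Embeddings V2 default
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            },
            "text": {"type": "text"},           # the chunk's raw text
            "text-metadata": {"type": "text"},  # source attribution metadata
        }
    },
}
```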
  1. With the OpenSearch Serverless index set up, you can now create the knowledge base and associate it with a data source containing our documents. For brevity, we haven’t included the full code; to run this example end-to-end, refer to the GitHub repo.
# Initialize OSS configuration for the Knowledge Base
opensearchServerlessConfiguration = { ... }

# Set chunking strategy for how to split documents
chunkingStrategyConfiguration = { ... }

# Configure S3 data source
s3Configuration = { ... }

# Set embedding model ARN
embeddingModelArn = f"arn:aws:bedrock:{region}::foundation-model/amazon.titan-embed-text-v2:0"

# Create the Knowledge Base
kb = create_knowledge_base_func()

# Create a data source and associate it with the KB
ds = bedrock_agent_client.create_data_source(...)

# Start ingestion job to load data into OSS
start_job_response = bedrock_agent_client.start_ingestion_job(
    knowledgeBaseId=kb['knowledgeBaseId'], dataSourceId=ds["dataSourceId"])

The ingestion job will fetch documents from the Amazon S3 data source, preprocess and chunk the text, create embeddings for each chunk, and store them in the OpenSearch Serverless index.
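Because ingestion is asynchronous, it's common to poll the job until it reaches a terminal state before querying the knowledge base. A minimal sketch follows; `get_ingestion_job` is the actual boto3 `bedrock-agent` operation, while the helper name and the polling interval are our own.

```python
import time

def wait_for_ingestion(client, kb_id, ds_id, job_id, poll_seconds=10):
    """Poll an Amazon Bedrock ingestion job until it completes or fails."""
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
        )["ingestionJob"]
        if job["status"] in ("COMPLETE", "FAILED"):
            return job["status"]
        time.sleep(poll_seconds)
```

With the `start_job_response` from the previous step, you would pass `start_job_response["ingestionJob"]["ingestionJobId"]` as `job_id`.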

  1. With the knowledge base populated, you can now query it using the RetrieveAndGenerate API and get responses generated by LLMs like Anthropic’s Claude on Amazon Bedrock:
# Helper function to query the knowledge base
def ask_bedrock_llm_with_knowledge_base(query, model_arn, kb_id):
    response = bedrock_agent_runtime_client.retrieve_and_generate(...)
    generated_text = response['output']['text']
    return generated_text

# Example query
query = "How are namespaces registered with the FAA for service providers?"

# Query knowledge base with different Claude models
for model_id in claude_model_ids:
    model_arn = f'arn:aws:bedrock:{region}::foundation-model/{model_id[1]}'
    generated_text = ask_bedrock_llm_with_knowledge_base(query, model_arn, kb_id)
    print(f"Generated using {model_id[0]}:")
    pp.pprint(generated_text)

The RetrieveAndGenerate API converts the query into an embedding, searches the knowledge base for relevant document chunks, and generates a response by providing the retrieved context to the specified language model. We asked the question “How are namespaces registered with the FAA for service providers?” Anthropic’s Claude 3 Sonnet uses the chunks retrieved from our OpenSearch vector index to answer as follows:

To register a namespace with the FAA as a service provider, you need to follow these steps:

  1. Develop the namespaces metadata according to FAA-STD-063 and submit it for registration in the FAA Data Registry (FDR).
  2. The FDR registrar will perform the namespace registration function. The specific process for developing and registering a namespace in the FDR involves:
  3. Searching the FDR for an existing namespace that matches your business domain. If none exists, work with the FDR registrar to create a new one.
  4. Create and document the new namespace according to FAA-STD-063, following the guidelines for organization, web service, or taxonomy namespaces.
  5. Register the namespace in the FDR by either filling out a registration form and submitting it to the FDR registrar, or requesting access to enter the metadata directly into the FDR.
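The `retrieve_and_generate(...)` call elided in the earlier helper takes a request of roughly the following shape, per the `RetrieveAndGenerate` API. The `build_rag_request` helper name is ours; see the GitHub notebook for the exact call.

```python
def build_rag_request(query, kb_id, model_arn):
    """Assemble the request body for bedrock-agent-runtime's
    retrieve_and_generate operation (KNOWLEDGE_BASE type)."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }

# The helper from the listing above then becomes, roughly:
# response = bedrock_agent_runtime_client.retrieve_and_generate(
#     **build_rag_request(query, kb_id, model_arn))
# generated_text = response["output"]["text"]
```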

Create your knowledge base on the Amazon Bedrock console

If you prefer, you can build the same solution in Amazon Bedrock Knowledge Bases using the Amazon Bedrock console instead of the API-based implementation shown in the previous section. Complete the following steps:

  1. Sign in to your AWS account.
  2. On the Amazon Bedrock console, choose Get started.

Amazon Bedrock getting started

As a first step, you need to set up your permissions to use the various LLMs in Amazon Bedrock.

  1. Choose Model access in the navigation pane.
  2. Choose Modify model access.

Amazon Bedrock model access

  1. Select the LLMs to enable.
  2. Choose Next, then choose Submit to complete your access request.

You should now have access to the models you requested.

Amazon Bedrock model select

Now you can set up your knowledge base.

  1. Choose Knowledge bases under Builder tools in the navigation pane.
  2. Choose Create knowledge base.

Amazon Bedrock create Knowledge Base

  1. On the Provide knowledge base details page, keep the default settings and choose Next.
  2. For Data source name, enter a name for your data source or keep the default.
  3. For S3 URI, choose the S3 bucket where you uploaded your documents.
  4. Choose Next.

Amazon Bedrock Knowledge Base details

  1. Under Embeddings model, choose the embeddings LLM to use (for this post, we choose Titan Text Embeddings).
  2. Under Vector database, select Quick create a new vector store.

This option uses OpenSearch Serverless as the vector store.

  1. Choose Next.

Amazon Bedrock embeddings

  1. Choose Create knowledge base to finish the process.

Your knowledge base is now set up! Before interacting with the chatbot, you need to index your documents. Make sure you have already loaded the desired source documents into your S3 bucket; for this walkthrough, we use the same public-domain FAA library referenced in the previous section.

  1. Under Data source, select the data source you created, then choose Sync.
  2. When the sync is complete, choose Select model in the Test knowledge base pane, and choose the model you want to try (for this post, we use Anthropic Claude 3 Sonnet, but Amazon Bedrock gives you the flexibility to experiment with many other models).

Amazon Bedrock data source

Your technician’s assistant is now set up! You can experiment with it using the chat window in the Test knowledge base pane. Experiment with different LLMs and see how they perform. Amazon Bedrock provides a simple API-based framework to experiment with different models and RAG components so you can tune them to help meet your requirements in production workloads.

Amazon Bedrock chat

Clean up

When you’re done experimenting with the assistant, complete the following steps to clean up your created resources to avoid ongoing charges to your account:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application you created, and on the Actions menu, choose Delete.
  3. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  4. Select the knowledge base you created, then choose Delete.

Conclusion

This post showed how quickly you can launch generative AI-enabled expert chatbots, trained on your proprietary document sets, to empower your workforce across specific aerospace roles with Amazon Q and Amazon Bedrock. After you have taken these basic steps, more work will be needed to solidify these solutions for production. Future editions in this “GenAI for Aerospace” series will explore follow-up topics, such as creating additional security controls and tuning performance for different content.

Generative AI is changing the way companies address some of their largest challenges. For our aerospace customers, generative AI can help with many of the scaling challenges that come from ramping production rates and growing workforce skills to match. This post showed how you can apply this technology to expert knowledge challenges in various functions of aerospace development today. The RAG architecture shown can help meet key requirements for aerospace customers: maintaining privacy of data and custom models, minimizing hallucinations, customizing models with private and authoritative reference documents, and direct attribution of answers back to those reference documents. There are many other aerospace applications where generative AI can be applied: non-conformance tracking, business forecasting, bid and proposal management, engineering design and simulation, and more. We examine some of these use cases in future posts.

AWS provides a broad range of AI/ML services to help you develop generative AI solutions for these use cases and more. This includes newly announced services like Amazon Q, which provides fast, relevant answers to pressing business questions drawn from enterprise data sources, with no coding required, and Amazon Bedrock, which provides quick API-level access to a wide range of LLMs, with knowledge base management for your proprietary document libraries and direct integration to external workflows through agents. AWS also offers competitive price-performance for AI workloads, running on purpose-built silicon—the AWS Trainium and AWS Inferentia processors—to run your generative AI services in the most cost-effective, scalable, simple-to-manage way. Get started on addressing your toughest business challenges with generative AI on AWS today!

For more information on working with generative AI and RAG on AWS, refer to Generative AI. For more details on building an aerospace technician’s assistant with AWS generative AI services, refer to Guidance for Aerospace Technician’s Assistant on AWS.


About the authors

Peter Bellows is a Principal Solutions Architect and Head of Technology for Commercial Aviation in the Worldwide Specialist Organization (WWSO) at Amazon Web Services (AWS). He leads technical development for solutions across aerospace domains, including manufacturing, engineering, operations, and security. Prior to AWS, he worked in aerospace engineering for 20+ years.

Shreyas Subramanian is a Principal Data Scientist and helps customers by using Machine Learning to solve their business challenges using the AWS platform. Shreyas has a background in large scale optimization and Machine Learning, and in use of Machine Learning and Reinforcement Learning for accelerating optimization tasks.

Priyanka Mahankali is a Senior Specialist Solutions Architect for Aerospace at AWS, bringing over 7 years of experience across the cloud and aerospace sectors. She is dedicated to streamlining the journey from innovative industry ideas to cloud-based implementations.

Read More

Scalable training platform with Amazon SageMaker HyperPod for innovation: a video generation case study

Scalable training platform with Amazon SageMaker HyperPod for innovation: a video generation case study

Video generation has become the latest frontier in AI research, following the success of text-to-image models. Luma AI’s recently launched Dream Machine represents a significant advancement in this field. This text-to-video API generates high-quality, realistic videos quickly from text and images. Trained on Amazon SageMaker HyperPod, Dream Machine excels in creating consistent characters, smooth motion, and dynamic camera movements.

To accelerate iteration and innovation in this field, sufficient computing resources and a scalable platform are essential. During the iterative research and development phase, data scientists and researchers need to run multiple experiments with different versions of algorithms and scale to larger models. Model parallel training becomes necessary when the total model footprint (model weights, gradients, and optimizer states) exceeds the memory of a single GPU. However, building large distributed training clusters is a complex and time-intensive process that requires in-depth expertise. Furthermore, as clusters scale to larger sizes (for example, more than 32 nodes), they require built-in resiliency mechanisms such as automated faulty node detection and replacement to improve cluster goodput and maintain efficient operations. These challenges underscore the importance of robust infrastructure and management systems in supporting advanced AI research and development.

Amazon SageMaker HyperPod, introduced during re:Invent 2023, is a purpose-built infrastructure designed to address the challenges of large-scale training. It removes the undifferentiated heavy lifting involved in building and optimizing machine learning (ML) infrastructure for training foundation models (FMs). SageMaker HyperPod offers a highly customizable user interface using Slurm, allowing users to select and install any required frameworks or tools. Clusters are provisioned with the instance type and count of your choice and can be retained across workloads. With these capabilities, customers are adopting SageMaker HyperPod as their innovation platform for more resilient and performant model training, enabling them to build state-of-the-art models faster.

In this post, we share an ML infrastructure architecture that uses SageMaker HyperPod to support research team innovation in video generation. We discuss the advantages and pain points addressed by SageMaker HyperPod, provide a step-by-step setup guide, and demonstrate how to run a video generation algorithm on the cluster.

Training video generation algorithms on Amazon SageMaker HyperPod: background and architecture

Video generation is an exciting and rapidly evolving field that has seen significant advancements in recent years. While generative modeling has made tremendous progress in the domain of image generation, video generation still faces several challenges that require further improvement.

Algorithms architecture complexity with diffusion model family

Diffusion models have recently made significant strides in generating high-quality images, prompting researchers to explore their potential in video generation. By leveraging the architecture and pre-trained generative capabilities of diffusion models, scientists aim to create visually impressive videos. The process extends image generation techniques to the temporal domain. Starting with noisy frames, the model iteratively refines them, removing random elements while adding meaningful details guided by text or image prompts. This approach progressively transforms abstract patterns into coherent video sequences, effectively translating diffusion models’ success in static image creation to dynamic video synthesis.

However, the compute requirements for video generation using diffusion models increase substantially compared to image generation for several reasons:

  1. Temporal dimension – Unlike image generation, video generation requires processing multiple frames simultaneously. This adds a temporal dimension to the original 2D UNet, significantly increasing the amount of data that needs to be processed in parallel.
  2. Iterative denoising process – The diffusion process involves multiple iterations of denoising for each frame. When extended to videos, this iterative process must be applied to multiple frames, multiplying the computational load.
  3. Increased parameter count – To handle the additional complexity of video data, models often require more parameters, leading to larger memory footprints and increased computational demands.
  4. Higher resolution and longer sequences – Video generation often aims for higher resolution outputs and longer sequences compared to single image generation, further amplifying the computational requirements.

Due to these factors, the operational efficiency of diffusion models for video generation is lower and significantly more compute-intensive compared to image generation. This increased computational demand underscores the need for advanced hardware solutions and optimized model architectures to make video generation more practical and accessible.
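A back-of-the-envelope calculation illustrates the first two factors above. The frame count and step count below are assumed, typical values, not figures from any specific model:

```python
def diffusion_forward_passes(num_frames, denoising_steps):
    """Each denoising step processes every frame, so the work grows with the
    product of the two (ignoring model-size and resolution effects)."""
    return num_frames * denoising_steps

# A single image vs. a short 16-frame clip, both at 50 denoising steps.
image = diffusion_forward_passes(num_frames=1, denoising_steps=50)
video = diffusion_forward_passes(num_frames=16, denoising_steps=50)
# The clip costs 16x the frame evaluations of the image before even counting
# the extra parameters of the temporal layers.
```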

Handling the increased computational requirements

The improvement in video generation quality necessitates a significant increase in the size of the models and training data. Researchers have concluded that scaling up the base model size leads to substantial enhancements in video generation performance. However, this growth comes with considerable challenges in terms of computing power and memory resources. Training larger models requires more computational power and memory space, which can limit the accessibility and practical use of these models. As the model size increases, the computational requirements grow exponentially, making it difficult to train these models on a single GPU, or even in a single-node, multi-GPU environment. Moreover, storing and manipulating the large datasets required for training also pose significant challenges in terms of infrastructure and costs. High-quality video datasets tend to be massive, requiring substantial storage capacity and efficient data management systems. Transferring and processing these datasets can be time-consuming and resource-intensive, adding to the overall computational burden.
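To see why the total model footprint (model weights, gradients, and optimizer states) quickly exceeds a single GPU, consider the common mixed-precision Adam accounting of roughly 16 bytes per parameter: 2 for fp16 weights, 2 for fp16 gradients, and 12 for the fp32 master weights plus the two Adam moments. This is a standard rule of thumb, not a figure from this post:

```python
def training_footprint_gb(num_params, bytes_per_param=16):
    """Approximate training memory for weights + gradients + optimizer states
    under mixed-precision Adam (activations not included)."""
    return num_params * bytes_per_param / 1024**3

# A 3B-parameter model already needs ~45 GB before activations, which is
# more than a single 40 GB GPU can hold -- hence model-parallel training.
footprint = training_footprint_gb(3e9)
```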

Maintaining temporal consistency and continuity

Maintaining temporal consistency and continuity becomes increasingly challenging as the length of the generated video increases. Temporal consistency refers to the continuity of visual elements, such as objects, characters, and scenes, across subsequent frames. Inconsistencies in appearance, movement, or lighting can lead to jarring visual artifacts and disrupt the overall viewing experience. To address this challenge, researchers have explored the use of multiframe inputs, which provide the model with information from multiple consecutive frames to better understand and model the relationships and dependencies across time. These techniques preserve high-resolution details in visual quality while simulating a continuous and smooth temporal motion process. However, they require more sophisticated modeling techniques and increased computational resources.

Algorithm overview

In the following sections, we illustrate how to run the Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation algorithm on Amazon SageMaker HyperPod for video generation. Animate Anyone is a method for transforming character images into animated videos controlled by desired pose sequences. The key components of the architecture include:

  1. ReferenceNet – A symmetrical UNet structure that captures spatial details of the reference image and integrates them into the denoising UNet using spatial-attention to preserve appearance consistency
  2. Pose guider – A lightweight module that efficiently integrates pose control signals into the denoising process to ensure pose controllability
  3. Temporal layer – Added to the denoising UNet to model relationships across multiple frames, preserving high-resolution details and ensuring temporal stability and continuity of the character’s motion

The model architecture is illustrated in the following image from its original research paper. The method is trained on a dataset of video clips and achieves state-of-the-art results on fashion video and human dance synthesis benchmarks, demonstrating its ability to animate arbitrary characters while maintaining appearance consistency and temporal stability. The implementation of AnimateAnyone can be found in this repository.
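The core idea behind ReferenceNet's spatial-attention is that features extracted from the reference image are concatenated with the denoising UNet's features along the token (spatial) dimension before self-attention, so the queries can attend to appearance details from the reference. The following toy NumPy sketch illustrates only this concatenation idea; the single attention head, random projections, and shapes are simplifications of our own, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention_with_reference(x, ref, rng):
    """Toy single-head self-attention where reference features are
    concatenated along the token axis, the core idea of ReferenceNet's
    spatial-attention. x: (n_tokens, d), ref: (n_ref_tokens, d)."""
    d = x.shape[1]
    # In the real model these are learned projections; random here.
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    kv_input = np.concatenate([x, ref], axis=0)  # queries come from x only
    q = x @ w_q
    k = kv_input @ w_k
    v = kv_input @ w_v
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_tokens, n_tokens + n_ref_tokens)
    return attn @ v                       # same shape as x

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))    # denoising UNet features (flattened h*w, channels)
ref = rng.standard_normal((16, 32))  # ReferenceNet features at the same resolution
out = spatial_attention_with_reference(x, ref, rng)
print(out.shape)
```

Because the reference tokens only extend the keys and values, the output keeps the UNet feature shape while mixing in appearance information from the reference image.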

To address the challenges of large-scale training infrastructure required in video generation training process, we can use the power of Amazon SageMaker HyperPod. While many customers have adopted SageMaker HyperPod for large-scale training, such as Luma’s launch of Dream Machine and Stability AI’s work on FMs for image or video generation, we believe that the capabilities of SageMaker HyperPod can also benefit lighter ML workloads, including full fine-tuning.

Amazon SageMaker HyperPod concepts and advantages

SageMaker HyperPod offers a comprehensive set of features that significantly enhance the efficiency and effectiveness of ML workflows. From purpose-built infrastructure for distributed training to customizable environments and seamless integration with tools like Slurm, SageMaker HyperPod empowers ML practitioners to focus on their core tasks while taking advantage of the power of distributed computing. With SageMaker HyperPod, you can accelerate your ML projects, handle larger datasets and models, and drive innovation in your organization. SageMaker HyperPod provides several key features and advantages in the scalable training architecture.

Purpose-built infrastructure – One of the primary advantages of SageMaker HyperPod is its purpose-built infrastructure for distributed training. It simplifies the setup and management of clusters, allowing you to easily configure the desired instance types and counts, which can be retained across workloads. As a result of this flexibility, you can adapt to various scenarios. For example, when working with a smaller backbone model like Stable Diffusion 1.5, you can run multiple experiments simultaneously on a single GPU to accelerate the iterative development process. As your dataset grows, you can seamlessly switch to data parallelism and distribute the workload across multiple GPUs, such as eight GPUs, to reduce compute time. Furthermore, when dealing with larger backbone models like Stable Diffusion XL, SageMaker HyperPod offers the flexibility to scale and use model parallelism.

Shared file system – SageMaker HyperPod supports the attachment of a shared file system, such as Amazon FSx for Lustre. This integration brings several benefits to your ML workflow. FSx for Lustre enables full bidirectional synchronization with Amazon Simple Storage Service (Amazon S3), including the synchronization of deleted files and objects. It also allows you to synchronize file systems with multiple S3 buckets or prefixes, providing a unified view across multiple datasets. In our case, this means that the installed libraries within the conda virtual environment will be synchronized across different worker nodes, even if the cluster is torn down and recreated. Additionally, input video data for training and inference results can be seamlessly synchronized with S3 buckets, enhancing the experience of validating inference results.

Customizable environment – SageMaker HyperPod offers the flexibility to customize your cluster environment using lifecycle scripts. These scripts allow you to install additional frameworks, debugging tools, and optimization libraries tailored to your specific needs. You can also split your training data and model across all nodes for parallel processing, fully using the cluster’s compute and network infrastructure. Moreover, you have full control over the execution environment, including the ability to easily install and customize virtual Python environments for each project. In our case, all the required libraries for running the training script are installed within a conda virtual environment, which is shared across all worker nodes, simplifying the process of distributed training on multi-node setups. We also installed MLflow Tracking on the controller node to monitor the training progress.

Job distribution with Slurm integration – SageMaker HyperPod seamlessly integrates with Slurm, a popular open source cluster management and job scheduling system. Slurm can be installed and set up through lifecycle scripts as part of the cluster creation process, providing a highly customizable user interface. With Slurm, you can efficiently schedule jobs across different GPU resources so you can run multiple experiments in parallel or use distributed training to train large models for improved performance. You can also customize job queues, prioritization algorithms, and job preemption policies, ensuring optimal resource use and streamlining your ML workflows. If you prefer a Kubernetes-based administration experience, SageMaker HyperPod recently introduced Amazon EKS support, so you can manage your clusters through a Kubernetes-based interface.

Enhanced productivity – To further enhance productivity, SageMaker HyperPod supports connecting to the cluster using Visual Studio Code (VS Code) through a Secure Shell (SSH) connection. You can easily browse and modify code within an integrated development environment (IDE), execute Python scripts seamlessly as if in a local environment, and launch Jupyter notebooks for quick development and debugging. The Jupyter notebook application experience within VS Code provides a familiar and intuitive interface for iterative experimentation and analysis.

Set up SageMaker HyperPod and run video generation algorithms

In this walkthrough, we use the AnimateAnyone algorithm as an illustration for video generation. AnimateAnyone is a state-of-the-art algorithm that generates high-quality videos from input images or videos. Our walkthrough guidance code is available on GitHub.

Set up the cluster

To create the SageMaker HyperPod infrastructure, follow the detailed step-by-step cluster setup guidance from the Amazon SageMaker HyperPod workshop studio.

The two things you need to prepare are a provisioning_parameters.json file required by HyperPod for setting up Slurm and a cluster-config.json file as the configuration file for creating the HyperPod cluster. Inside these configuration files, you need to specify the InstanceGroupName, InstanceType, and InstanceCount for the controller group and worker group, as well as the execution role attached to the group.
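As a rough illustration of the shape of this configuration, the following Python sketch builds a CreateCluster-style request body. The cluster name, instance group names, types and counts, role ARN, and lifecycle-script S3 path are all placeholders; refer to the workshop for the exact schema your setup requires.

```python
import json

# Hypothetical values -- replace with your own names, role ARN, and bucket.
payload = {
    "ClusterName": "videogen-cluster",
    "InstanceGroups": [
        {
            "InstanceGroupName": "controller-group",
            "InstanceType": "ml.c5.xlarge",
            "InstanceCount": 1,
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
        },
        {
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.g5.24xlarge",
            "InstanceCount": 2,
            "ExecutionRole": "arn:aws:iam::111122223333:role/HyperPodExecutionRole",
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle-scripts/",
                "OnCreate": "on_create.sh",
            },
        },
    ],
}

# With boto3 installed and credentials configured, the cluster could then be
# created with: boto3.client("sagemaker").create_cluster(**payload)
print(json.dumps(payload, indent=2)[:60])
```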

One practical step is to configure bidirectional synchronization between Amazon FSx and Amazon S3. This can be done with the Amazon S3 integration for Amazon FSx for Lustre, which establishes full bidirectional synchronization of your file systems with Amazon S3 and can also synchronize your file systems with multiple S3 buckets or prefixes.

In addition, if you prefer a local IDE such as VS Code, you can set up an SSH connection to the controller node within your IDE. In this way, the worker nodes can be used for running scripts within a conda environment and a Jupyter notebook server.

Run the AnimateAnyone algorithm

When the cluster is in service, you can connect to the controller node using SSH, and from there reach the worker nodes, where the GPU compute resources are available. You can follow the SSH Access to compute guide. We suggest installing the libraries on the worker nodes directly.

To create the conda environment, follow the instructions at Miniconda’s Quick command line install. You can then use the conda environment to install all required libraries.

source ~/miniconda3/bin/activate
conda create -y -n videogen python=3.10
conda activate videogen
pip install -r requirements.txt

To run AnimateAnyone, clone the GitHub repo and follow the instructions.

To train AnimateAnyone, launch stage 1 for training the denoising UNet and ReferenceNet, which enables the model to generate high-quality animated images under the condition of a given reference image and target pose. The denoising UNet and ReferenceNet are initialized based on the pre-trained weights from Stable Diffusion.

accelerate launch train_stage_1.py --config configs/train/stage1.yaml

In stage 2, the objective is to train the temporal layer to capture the temporal dependencies among video frames.

accelerate launch train_stage_2.py --config configs/train/stage2.yaml

Once the training script executes as expected, you can submit it as a Slurm scheduled job running on a single node. We provide a batch file to simulate the single-node training job, whether on a single GPU or a single node with multiple GPUs. If you want to know more, the documentation provides detailed instructions on running jobs on SageMaker HyperPod clusters.

sbatch submit-animateanyone-algo.sh
#!/bin/bash
#SBATCH --job-name=video-gen
#SBATCH -N 1
#SBATCH --exclusive
#SBATCH -o video-gen-stage-1.out
export OMP_NUM_THREADS=1
# Activate the conda environment
source ~/miniconda3/bin/activate
conda activate videogen
srun accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Check the job status using the following code snippet.

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
10 dev video-ge ubuntu R 0:16 1 ip-10-1-93-196

By using a small batch size and setting use_8bit_adam=True, you can achieve efficient training on a single GPU. Because each experiment then needs only one GPU, you can use a multi-GPU cluster to run multiple experiments in parallel.

The following code block is one example of running four jobs in parallel to test different hyperparameters. We provide the batch file here as well.

sbatch submit-hyperparameter-testing.sh

squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4_0 dev video-ge ubuntu R 0:08 1 ip-10-1-17-56
4_1 dev video-ge ubuntu R 0:08 1 ip-10-1-33-49
4_2 dev video-ge ubuntu R 0:08 1 ip-10-1-37-152
4_3 dev video-ge ubuntu R 0:08 1 ip-10-1-83-68
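One simple way to drive such parallel experiments is a Slurm job array, where each array index selects one hyperparameter combination. The following sketch is our own helper, not part of the AnimateAnyone repository, and the search space values are placeholders; it maps SLURM_ARRAY_TASK_ID (set by Slurm for jobs submitted with, for example, --array=0-3) to a configuration:

```python
import itertools
import os

# Hypothetical search space -- adjust to the hyperparameters you care about.
learning_rates = [1e-5, 5e-5]
batch_sizes = [1, 2]
grid = list(itertools.product(learning_rates, batch_sizes))  # 4 combinations

# Slurm sets SLURM_ARRAY_TASK_ID for each task in an array job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
lr, bs = grid[task_id]
print(f"task {task_id}: lr={lr}, batch_size={bs}")
# Each task could then launch the training script with these values, e.g.:
#   accelerate launch train_stage_1.py --config configs/train/stage1.yaml
#   (with the learning rate and batch size overridden per task)
```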

The experiments can then be compared, and you can move forward with the best configuration. In our scenario, shown in the following screenshot, we use different datasets and video preprocessing strategies to validate the stage 1 training. Then, we quickly draw conclusions about the impact on video quality with respect to stage 1 training results. For experiment tracking, besides installing MLflow on the controller node to monitor the training progress, you can also use the fully managed MLflow capability on Amazon SageMaker. This makes it easy for data scientists to use MLflow on SageMaker for model training, registration, and deployment.

Scale to multi-node GPU setup

As model sizes grow, single GPU memory quickly becomes a bottleneck. Large models easily exhaust memory with pure data parallelism, and implementing model parallelism can be challenging. DeepSpeed addresses these issues, accelerating model development and training.

ZeRO

DeepSpeed is a deep learning optimization library that aims to make distributed training easy, efficient, and effective. DeepSpeed’s ZeRO removes memory redundancies across data-parallel processes by partitioning three model states (optimizer states, gradients, and parameters) across data-parallel processes instead of replicating them. This approach significantly boosts memory efficiency compared to classic data-parallelism while maintaining computational granularity and communication efficiency.

ZeRO offers three stages of optimization:

  1. ZeRO Stage 1 – Partitions optimizer states across processes, with each process updating only its partition
  2. ZeRO Stage 2 – Additionally partitions gradients, with each process retaining only the gradients corresponding to its optimizer state portion
  3. ZeRO Stage 3 – Partitions model parameters across processes, automatically collecting and partitioning them during forward and backward passes

Each stage offers progressively higher memory efficiency at the cost of increased communication overhead. These techniques enable training of extremely large models that would otherwise be impossible. This is particularly useful when working with limited GPU memory or training very large models.
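To make the savings concrete, here is a back-of-the-envelope calculation following the ZeRO paper's accounting for mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (momentum, variance, and master weights). The function below is an illustrative approximation that ignores activations and communication buffers.

```python
def zero_memory_gb(n_params, n_gpus, stage):
    """Approximate per-GPU model-state memory (GB) for mixed-precision Adam,
    following the ZeRO paper's 2 + 2 + 12 bytes-per-parameter accounting."""
    param_b, grad_b, optim_b = 2, 2, 12
    if stage == 0:    # classic data parallelism: everything replicated
        per_param = param_b + grad_b + optim_b
    elif stage == 1:  # optimizer states partitioned
        per_param = param_b + grad_b + optim_b / n_gpus
    elif stage == 2:  # gradients also partitioned
        per_param = param_b + (grad_b + optim_b) / n_gpus
    else:             # stage 3: parameters also partitioned
        per_param = (param_b + grad_b + optim_b) / n_gpus
    return n_params * per_param / 1024**3

n = 1.3e9  # a 1.3B-parameter model
for s in range(4):
    print(f"stage {s}: {zero_memory_gb(n, n_gpus=8, stage=s):.2f} GB per GPU")
```

On 8 GPUs, the model states of a 1.3B-parameter model shrink from roughly 19 GB per GPU with plain data parallelism to under 3 GB with Stage 3, which is why these techniques matter on memory-constrained instances.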

Accelerate

Accelerate is a library that enables running the same PyTorch code across any distributed configuration with minimal code changes. It handles the complexities of distributed setups, allowing developers to focus on their models rather than infrastructure. To put it briefly, Accelerate makes training and inference at scale straightforward, efficient, and adaptable.

Accelerate allows easy integration of DeepSpeed features through a configuration file. Users can supply a custom configuration file or use provided templates. The following is an example of how to use DeepSpeed with Accelerate.
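For instance, a ZeRO Stage 2 setup for a single node with four GPUs can be described in an Accelerate configuration file like the one below. These are the fields that `accelerate config` typically generates; treat the exact values as an illustrative starting point rather than a tuned configuration.

```python
import pathlib
import tempfile

# Illustrative Accelerate config enabling DeepSpeed ZeRO Stage 2
# (fields as typically produced by `accelerate config`).
config = """\
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
mixed_precision: fp16
num_machines: 1
num_processes: 4
"""

path = pathlib.Path(tempfile.gettempdir()) / "deepspeed_stage2.yaml"
path.write_text(config)
# Launch with: accelerate launch --config_file <path> train_stage_1.py \
#              --config configs/train/stage1.yaml
print(path.read_text().splitlines()[1])  # distributed_type: DEEPSPEED
```

Raising zero_stage to 3 (and setting zero3_init_flag accordingly) trades extra communication for further memory savings, as described in the ZeRO section above.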

Single node with multiple GPUs job

To run a job on a single node with multiple GPUs, we tested this configuration on instances with four GPUs (for example, ml.g5.24xlarge). For these instances, adjust train_width: 768 and train_height: 768, and set use_8bit_adam: False in your configuration file. You'll likely notice that the model can handle much larger images for generation with these settings.

sbatch submit-deepspeed-singlenode.sh

This Slurm job will:

  1. Allocate a single node
  2. Activate the training environment
  3. Run accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Multi-node with multiple GPUs job

To run a job across multiple nodes, each with multiple GPUs, we have tested this distribution with two ml.g5.24xlarge instances.

sbatch submit-deepspeed-multinode.sh

This Slurm job will:

  1. Allocate the specified number of nodes
  2. Activate the training environment on each node
  3. Run accelerate launch --multi_gpu --num_processes <num_processes> --num_machines <num_machines> train_stage_1.py --config configs/train/stage1.yaml

When running a multi-node job, make sure that the num_processes and num_machines arguments are set correctly based on your cluster configuration.

For optimal performance, adjust the batch size and learning rate according to the number of GPUs and nodes being used. Consider using a learning rate scheduler to adapt the learning rate during training.
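A common heuristic for the learning rate adjustment is the linear scaling rule: scale the base learning rate in proportion to the effective global batch size. The sketch below is a simple illustration of that rule; the baseline values are placeholders, not tuned settings for AnimateAnyone.

```python
def scaled_lr(base_lr, base_batch, per_gpu_batch, n_gpus, grad_accum=1):
    """Linear scaling rule: the learning rate grows in proportion to the
    effective global batch size (per-GPU batch * GPUs * grad accumulation)."""
    global_batch = per_gpu_batch * n_gpus * grad_accum
    return base_lr * global_batch / base_batch

# Hypothetical baseline: lr=1e-5 tuned at a global batch size of 4.
# Moving to 8 GPUs with a per-GPU batch of 2 quadruples the global batch.
print(scaled_lr(1e-5, base_batch=4, per_gpu_batch=2, n_gpus=8))  # 4e-05
```

Pairing this with a warmup-based learning rate scheduler is a common way to keep the larger learning rate stable early in training.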

Additionally, monitor the GPU memory usage and adjust the model’s architecture or batch size if necessary to prevent out-of-memory issues.

By following these steps and configurations, you can efficiently train your models on single-node and multi-node setups with multiple GPUs, taking advantage of the power of distributed training.

Monitor cluster usage

To achieve comprehensive observability into your SageMaker HyperPod cluster resources and software components, integrate the cluster with Amazon Managed Service for Prometheus and Amazon Managed Grafana. The integration with Amazon Managed Service for Prometheus makes it possible to export metrics related to your HyperPod cluster resources, providing insights into their performance, utilization, and health. The integration with Amazon Managed Grafana makes it possible to visualize these metrics through various Grafana dashboards that offer intuitive interfaces for monitoring and analyzing the cluster's behavior. You can follow the SageMaker documentation on Monitor SageMaker HyperPod cluster resources and the Workshop Studio Observability section to bootstrap your cluster monitoring with the metric exporter services. The following screenshot shows a Grafana dashboard.

Inference and results discussion

When the fine-tuned model is ready, you have two primary deployment options: using popular image and video generation GUIs like ComfyUI or deploying an inference endpoint with Amazon SageMaker. The SageMaker option offers several advantages, including easy integration of image generation APIs with video generation endpoints to create end-to-end pipelines. As a managed service with auto scaling, SageMaker makes it possible to generate multiple videos in parallel, using either the same reference image with different reference videos or the reverse. Furthermore, you can deploy various video generation model endpoints such as MimicMotion and UniAnimate, allowing for quality comparisons by generating videos in parallel with the same reference image and video. This approach provides flexibility and scalability, and it accelerates the production process by enabling the rapid generation of a large number of videos, ultimately streamlining the process of obtaining content that meets business requirements. The SageMaker option thus offers a powerful, efficient, and scalable solution for video generation workflows. The following diagram shows a basic version of a video generation pipeline. You can modify it based on your own specific business requirements.

Recent advancements in video generation have rapidly overcome limitations of earlier models like AnimateAnyone. Two notable research papers showcase significant progress in this domain.

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance enhances shape alignment and motion guidance. It demonstrates superior ability in generating high-quality human animations that accurately capture both pose and shape variations, with improved generalization on in-the-wild datasets.

UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation makes it possible to generate longer videos, up to one minute, compared to earlier models’ limited frame outputs. It introduces a unified noise input supporting both random noise input and first frame conditioned input, enhancing long-term video generation capabilities.

Cleanup

To avoid incurring future charges, delete the resources created as part of this post:

  1. Delete the SageMaker HyperPod cluster using either the CLI or the console.
  2. After the SageMaker HyperPod cluster deletion is complete, delete the CloudFormation stack. For more details on cleanup, refer to the cleanup section in the Amazon SageMaker HyperPod workshop.
  3. To delete the endpoints created during deployment, refer to the endpoint deletion section we provided in the Jupyter notebook. Then manually delete the SageMaker notebook.

Conclusion

In this post, we explored the exciting field of video generation and showcased how SageMaker HyperPod can be used to efficiently train video generation algorithms at scale. By using the AnimateAnyone algorithm as an example, we demonstrated the step-by-step process of setting up a SageMaker HyperPod cluster, running the algorithm, scaling it to multiple GPU nodes, and monitoring GPU usage during the training process.

SageMaker HyperPod offers several key advantages that make it an ideal platform for training large-scale ML models, particularly in the domain of video generation. Its purpose-built infrastructure allows for distributed training at scale so you can manage clusters with desired instance types and counts. The ability to attach a shared file system such as Amazon FSx for Lustre provides efficient data storage and retrieval, with full bidirectional synchronization with Amazon S3. Moreover, the SageMaker HyperPod customizable environment, integration with Slurm, and seamless connectivity with Visual Studio Code enhance productivity and simplify the management of distributed training jobs.

We encourage you to use SageMaker HyperPod for your ML training workloads, especially those involved in video generation or other computationally intensive tasks. By harnessing the power of SageMaker HyperPod, you can accelerate your research and development efforts, iterate faster, and build state-of-the-art models more efficiently. Embrace the future of video generation and unlock new possibilities with SageMaker HyperPod. Start your journey today and experience the benefits of distributed training at scale.


About the author

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at AWS. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building AI-powered industrial applications in computer vision, natural language processing, and online user behavior prediction. At AWS, he shares his domain expertise and helps customers unlock business potentials and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Gordon Wang is a Senior Data Scientist at AWS. He helps customers imagine and scope the use cases that will create the greatest value for their businesses, define paths to navigate technical or business challenges. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.

Gary LO is a Solutions Architect at AWS based in Hong Kong. He is a highly passionate IT professional with over 10 years of experience in designing and implementing critical and complex solutions for distributed systems, web applications, and mobile platforms for startups and enterprise companies. Outside of the office, he enjoys cooking and sharing the latest technology trends and insights on his social media platforms with thousands of followers.

Read More

Control data access to Amazon S3 from Amazon SageMaker Studio with Amazon S3 Access Grants

Control data access to Amazon S3 from Amazon SageMaker Studio with Amazon S3 Access Grants

Amazon SageMaker Studio provides a single web-based visual interface where different personas like data scientists, machine learning (ML) engineers, and developers can build, train, debug, deploy, and monitor their ML models. These personas rely on access to data in Amazon Simple Storage Service (Amazon S3) for tasks such as extracting data for model training, logging model training metrics, and storing model artifacts after training. For example, data scientists need access to datasets stored in Amazon S3 for tasks like data exploration and model training. ML engineers require access to intermediate model artifacts stored in Amazon S3 from past training jobs.

Traditionally, access to data in Amazon S3 from SageMaker Studio for these personas is provided through roles configured in SageMaker Studio—either at the domain level or user profile level. The SageMaker Studio domain role grants permissions for the SageMaker Studio domain to interact with other AWS services, providing access to data in Amazon S3 for all users of that domain. If no specific user profile roles are created, this role will apply to all user profiles, granting uniform access privileges across the domain. However, if different users of the domain have different access restrictions, then configuring individual user roles allows for more granular control. These roles define the specific actions and access each user profile can have within the environment, providing granular permissions.

Although this approach offers a degree of flexibility, it also entails frequent updates to the policies attached to these roles whenever access requirements change, which can add maintenance overhead. This is where Amazon S3 Access Grants can significantly streamline the process. S3 Access Grants enables you to manage access to Amazon S3 data more dynamically, without the need to constantly update AWS Identity and Access Management (IAM) roles. S3 Access Grants allows data owners or permission administrators to set permissions, such as read-only, write-only, or read/write access, at various levels of Amazon S3, such as at the bucket, prefix, or object level. The permissions can be granted to IAM principals or to users and groups from their corporate directory through integration with AWS IAM Identity Center.
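In API terms, a grant pairs a grantee (an IAM principal in this post) with a permission on a registered location, optionally scoped to a prefix. The following sketch shows the shape of such a request for the S3Control CreateAccessGrant API; the account ID, location ID, and role ARN are placeholders.

```python
import json

# Hypothetical identifiers -- substitute your own account, location, and role.
request = {
    "AccountId": "111122223333",
    "AccessGrantsLocationId": "a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
    "AccessGrantsLocationConfiguration": {"S3SubPrefix": "Product/*"},
    "Grantee": {
        "GranteeType": "IAM",
        "GranteeIdentifier": "arn:aws:iam::111122223333:role/sagemaker-usera-role",
    },
    "Permission": "READWRITE",
}

# With boto3 installed and credentials configured, the grant could be
# created with: boto3.client("s3control").create_access_grant(**request)
print(json.dumps(request["Grantee"], indent=2))
```

A grantee can also be a directory user or group when the instance is integrated with IAM Identity Center, in which case GranteeType changes accordingly.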

In this post, we demonstrate how to simplify data access to Amazon S3 from SageMaker Studio using S3 Access Grants, specifically for different user personas using IAM principals.

Solution overview

Now that we’ve discussed the benefits of S3 Access Grants, let’s look at how grants can be applied with SageMaker Studio user roles and domain roles for granular access control.

Consider a scenario involving a product team with two members: User A and User B. They use an S3 bucket where the following access requirements are implemented:

  • All members of the team should have access to the folder named Product within the S3 bucket.
  • The folder named UserA should be accessible only by User A.
  • The folder named UserB should be accessible only by User B.
  • User A will be running an Amazon SageMaker Processing job that uses S3 Access Grants to get data from the S3 bucket. The processing job will access the required data from the S3 bucket using the temporary credentials provided by the access grants.

The following diagram illustrates the solution architecture and workflow.

Let’s start by creating a SageMaker Studio environment as needed for our scenario. This includes establishing a SageMaker Studio domain, setting up user profiles for User A and User B, configuring an S3 bucket with the necessary folders, and configuring S3 Access Grants.

Prerequisites

To set up the SageMaker Studio environment and configure S3 Access Grants as described in this post, you need administrative privileges for the AWS account you’ll be working with. If you don’t have administrative access, request assistance from someone who does. Throughout this post, we assume that you have the necessary permissions to create SageMaker Studio domains, create S3 buckets, and configure S3 Access Grants. If you don’t have these permissions, consult with your AWS administrator or account owner for guidance.

Deploy the solution resources using AWS CloudFormation

To provision the necessary resources and streamline the deployment process, we’ve provided an AWS CloudFormation template that automates the provisioning of required services. Deploying the CloudFormation stack in your account incurs AWS usage charges.

The CloudFormation stack creates the following resources:

  • Virtual private cloud (VPC) with private subnets, relevant route tables, a NAT gateway, an internet gateway, and security groups
  • IAM execution roles
  • S3 Access Grants instance
  • AWS Lambda function to load the Abalone dataset into Amazon S3
  • SageMaker domain
  • SageMaker Studio user profiles

Complete the following steps to deploy the stack:

  1. Choose Launch Stack to launch the CloudFormation stack.
    Launch Stack to create Agent
  2. On the Create stack page, leave the default options and choose Next.
  3. On the Specify stack details page, for Stack name, enter a name (for example, blog-sagemaker-s3-access-grants).
  4. Under Parameters, provide the following information:
    1. For PrivateSubnetCIDR, enter the IP address range in CIDR notation that should be allocated for the private subnet.
    2. For ProjectName, enter sagemaker-blog.
    3. For VpcCIDR, enter the desired IP address range in CIDR notation for the VPC being created.
  5. Choose Next.
  6. On the Configure stack options page, leave the default options and choose Next.
  7. On the Review and create page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  8. Review the template and choose Create stack.

After the successful deployment of stack, you can view the resources created on the stack’s Outputs tab on the AWS CloudFormation console.

Validate data in the S3 bucket

To validate access to the S3 bucket, we use the Abalone dataset. As part of the CloudFormation stack deployment process, a Lambda function is invoked to load the data into Amazon S3. After the Lambda function is complete, you should find the abalone.csv file in all three folders (Product, UserA, and UserB) within the S3 bucket.

Validate the SageMaker domain and associated user profiles

Complete the following steps to validate the SageMaker resources:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose Product-Domain to be directed to the domain details page.
  3. In the User profiles section, verify that the userA and userB profiles are present.
  4. Choose a user profile name to be directed to the user profile details.
  5. Validate that each user profile is associated with its corresponding IAM role: userA is associated with sagemaker-usera-role, and userB is associated with sagemaker-userb-role.

Validate S3 Access Grants setup

Complete the following steps to validate your configuration of S3 Access Grants:

  1. On the Amazon S3 console, choose Access Grants in the navigation pane.
  2. Choose View details to be directed to the details page of S3 Access Grants.
  3. On the Locations tab, confirm that the URI of the S3 bucket created is registered with the S3 Access Grants instance for the location scope.
  4. On the Grants tab, confirm the following:
    1. sagemaker-usera-role has been given read/write permissions on the S3 prefix Product/* and UserA/*
    2. sagemaker-userb-role has been given read/write permissions on the S3 prefix Product/* and UserB/*

Validate access from your SageMaker Studio environment

To validate the access grants we set up, we run a distributed data processing job on the Abalone dataset using SageMaker Processing jobs and PySpark.

To get started, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain Product-Domain to be directed to the domain details page.
  3. Choose userA under User profiles.
  4. On the User Details page, choose Launch and choose Studio.
  5. On the SageMaker Studio console, choose JupyterLab in the navigation pane.
  6. Choose Create JupyterLab space.
  7. For Name, enter usera-space.

  8. For Sharing, select Private.

  9. Choose Create space.

  10. After the space is created, choose Run space.
  11. When the status shows as Running, choose Open JupyterLab, which will redirect you to the SageMaker JupyterLab experience.
  12. On the Launcher page, choose Python 3 under Notebook.
    This will open a new Python notebook, which we use to run the PySpark script.

    Let’s validate the access grants by running a distributed job using SageMaker Processing jobs to process data, because we often need to process data before it can be used for training ML models. SageMaker Processing jobs allow you to run distributed data processing workloads while using the access grants you set up earlier.
  13. Copy the following PySpark script into a cell in your SageMaker Studio notebook.
    The %%writefile directive is used to save the script locally. The script is used to generate temporary credentials using the access grant and configures Spark to use these credentials for accessing data in Amazon S3. It performs some basic feature engineering on the Abalone dataset, including string indexing, one-hot encoding, and vector assembly, and combines them into a pipeline. It then does an 80/20 split to produce training and validation datasets as outputs, and saves these datasets in Amazon S3.
    Make sure to replace region_name with the AWS Region you’re using in the script.
    %%writefile ./preprocess.py
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
    import argparse
    import subprocess
    import sys
    
    def install_packages():
        subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3==1.35.1", "botocore>=1.35.0"])
    
    install_packages()
    import boto3
    print(f"logs: boto3 version in the processing job: {boto3.__version__}")
    import botocore
    print(f"logs: botocore version in the processing job: {botocore.__version__}")
    
    def get_temporary_credentials(account_id, bucket_name, object_key_prefix):
        region_name = '<region>'
        s3control_client = boto3.client('s3control', region_name=region_name)
        response = s3control_client.get_data_access(
            AccountId=account_id,
            Target=f's3://{bucket_name}/{object_key_prefix}/',
            Permission='READWRITE'
        )
        return response['Credentials']
    
    def configure_spark_with_s3a(credentials):
        spark = SparkSession.builder \
            .appName("PySparkApp") \
            .config("spark.hadoop.fs.s3a.access.key", credentials['AccessKeyId']) \
            .config("spark.hadoop.fs.s3a.secret.key", credentials['SecretAccessKey']) \
            .config("spark.hadoop.fs.s3a.session.token", credentials['SessionToken']) \
            .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
            .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider") \
            .getOrCreate()
        
        spark.sparkContext._jsc.hadoopConfiguration().set(
            "mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter"
        )
        return spark
    
    def csv_line(data):
        r = ",".join(str(d) for d in data[1])
        return str(data[0]) + "," + r
    
    def main():
        parser = argparse.ArgumentParser(description="app inputs and outputs")
        parser.add_argument("--account_id", type=str, help="AWS account ID")
        parser.add_argument("--s3_input_bucket", type=str, help="s3 input bucket")
        parser.add_argument("--s3_input_key_prefix", type=str, help="s3 input key prefix")
        parser.add_argument("--s3_output_bucket", type=str, help="s3 output bucket")
        parser.add_argument("--s3_output_key_prefix", type=str, help="s3 output key prefix")
        args = parser.parse_args()
    
        # Get temporary credentials for both reading and writing
        credentials = get_temporary_credentials(args.account_id, args.s3_input_bucket, args.s3_input_key_prefix)
        spark = configure_spark_with_s3a(credentials)
    
        # Defining the schema corresponding to the input data
        schema = StructType([
            StructField("sex", StringType(), True),
            StructField("length", DoubleType(), True),
            StructField("diameter", DoubleType(), True),
            StructField("height", DoubleType(), True),
            StructField("whole_weight", DoubleType(), True),
            StructField("shucked_weight", DoubleType(), True),
            StructField("viscera_weight", DoubleType(), True),
            StructField("shell_weight", DoubleType(), True),
            StructField("rings", DoubleType(), True),
        ])
    
        # Reading data directly from S3 using s3a protocol
        total_df = spark.read.csv(
            f"s3a://{args.s3_input_bucket}/{args.s3_input_key_prefix}/abalone.csv",
            header=False,
            schema=schema
        )
    
        # Transformations and data processing
        sex_indexer = StringIndexer(inputCol="sex", outputCol="indexed_sex")
        sex_encoder = OneHotEncoder(inputCol="indexed_sex", outputCol="sex_vec")
        assembler = VectorAssembler(
            inputCols=[
                "sex_vec",
                "length",
                "diameter",
                "height",
                "whole_weight",
                "shucked_weight",
                "viscera_weight",
                "shell_weight",
            ],
            outputCol="features"
        )
        pipeline = Pipeline(stages=[sex_indexer, sex_encoder, assembler])
        model = pipeline.fit(total_df)
        transformed_total_df = model.transform(total_df)
        (train_df, validation_df) = transformed_total_df.randomSplit([0.8, 0.2])
    
        # Saving transformed datasets to S3 using RDDs and s3a protocol
        train_rdd = train_df.rdd.map(lambda x: (x.rings, x.features))
        train_lines = train_rdd.map(csv_line)
        train_lines.saveAsTextFile(
            f"s3a://{args.s3_output_bucket}/{args.s3_output_key_prefix}/train"
        )
    
        validation_rdd = validation_df.rdd.map(lambda x: (x.rings, x.features))
        validation_lines = validation_rdd.map(csv_line)
        validation_lines.saveAsTextFile(
            f"s3a://{args.s3_output_bucket}/{args.s3_output_key_prefix}/validation"
        )
    
    if __name__ == "__main__":
        main()
  14. Run the cell to create the preprocess.py file locally.
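    As a quick sanity check before submitting the job, you can exercise the csv_line helper from preprocess.py on its own. The snippet below copies that helper and feeds it a (label, features) pair shaped like the (rings, features) tuples the RDD mapping produces; the numeric values are made up for illustration.

```python
# Standalone copy of the csv_line helper from preprocess.py, for illustration.
def csv_line(data):
    r = ",".join(str(d) for d in data[1])
    return str(data[0]) + "," + r

# A (label, features) pair shaped like the (rings, features) tuples
# produced by the RDD mapping in the script; the numbers are invented.
row = (9.0, [1.0, 0.0, 0.455, 0.365])
print(csv_line(row))  # 9.0,1.0,0.0,0.455,0.365
```

    Each output line is therefore the label followed by the comma-separated feature values, which is the layout the validation step later reads back.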
  15. Next, you use the PySparkProcessor class to define a Spark job and run it using SageMaker Processing. Copy the following code into a new cell in your SageMaker Studio notebook, and run the cell to invoke the SageMaker Processing job:
    from sagemaker.spark.processing import PySparkProcessor
    from time import gmtime, strftime
    import boto3
    import sagemaker
    import logging
    
    # Get region
    region = boto3.Session().region_name
    
    # Initialize Boto3 and SageMaker sessions
    boto_session = boto3.Session(region_name=region)
    sagemaker_session = sagemaker.Session(boto_session=boto_session)
    
    # Get account id
    def get_account_id():
        client = boto3.client("sts")
        return client.get_caller_identity()["Account"]
    account_id = get_account_id()
    
    bucket = sagemaker_session.default_bucket()
    role = sagemaker.get_execution_role()
    sagemaker_logger = logging.getLogger("sagemaker")
    sagemaker_logger.setLevel(logging.INFO)
    sagemaker_logger.addHandler(logging.StreamHandler())
    
    # Set up S3 bucket and paths
    timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
    prefix = "Product/sagemaker/spark-preprocess-demo/{}".format(timestamp_prefix)
    
    # Define the account ID and S3 bucket details
    input_bucket = f'blog-access-grants-{account_id}-{region}'
    input_key_prefix = 'UserA'
    output_bucket = f'blog-access-grants-{account_id}-{region}'
    output_key_prefix = 'UserA/output'
    
    # Define the Spark processor using the custom Docker image
    spark_processor = PySparkProcessor(
        framework_version="3.3",
        role=role,
        instance_count=2,
        instance_type="ml.m5.2xlarge",
        base_job_name="spark-preprocess-job",
        sagemaker_session=sagemaker_session 
    )
    
    # Run the Spark processing job
    spark_processor.run(
        submit_app="./preprocess.py",
        arguments=[
            "--account_id", account_id,
            "--s3_input_bucket", input_bucket,
            "--s3_input_key_prefix", input_key_prefix,
            "--s3_output_bucket", output_bucket,
            "--s3_output_key_prefix", output_key_prefix,
        ],
        spark_event_logs_s3_uri=f"s3://{output_bucket}/{prefix}/spark_event_logs",
        logs=False
    )

    A few things to note in the definition of the PySparkProcessor:

    • This is a multi-node job with two ml.m5.2xlarge instances (specified in the instance_count and instance_type parameters)
    • The Spark framework version is set to 3.3 using the framework_version parameter
    • The PySpark script is passed using the submit_app parameter
    • Command line arguments to the PySpark script (such as the account ID, input/output bucket names, and input/output key prefixes) are passed through the arguments parameter
    • Spark event logs will be offloaded to the Amazon S3 location specified in spark_event_logs_s3_uri and can be used to view the Spark UI while the job is in progress or after it’s complete
  16. After the job is complete, validate the output of the preprocessing job by looking at the first five rows of the output dataset using the following validation script:
    import boto3
    import pandas as pd
    import io
    
    # Initialize S3 client
    s3 = boto3.client('s3')
    
    # Get region
    region = boto3.Session().region_name
    
    # Get account id
    def get_account_id():
        client = boto3.client("sts")
        return client.get_caller_identity()["Account"]
    account_id = get_account_id()
    # Replace with your bucket name and output key prefix
    bucket_name = f'blog-access-grants-{account_id}-{region}'
    output_key_prefix = 'UserA/output/train'
    
    # Get temporary credentials for accessing S3 data using user profile role
    s3control_client = boto3.client('s3control')
    response = s3control_client.get_data_access(
        AccountId=account_id,
        Target=f's3://{bucket_name}/{output_key_prefix}',
        Permission='READ'
    )
    credentials = response['Credentials']
    
    # Create an S3 client with the temporary credentials
    s3_client = boto3.client(
        's3',
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken']
    )
    
    objects = s3_client.list_objects(Bucket=bucket_name, Prefix=output_key_prefix)
    
    # Read the first part file into a pandas DataFrame
    first_part_key = f"{output_key_prefix}/part-00000"
    obj = s3_client.get_object(Bucket=bucket_name, Key=first_part_key)
    data = obj['Body'].read().decode('utf-8')
    df = pd.read_csv(io.StringIO(data), header=None)
    
    # Print the top 5 rows
    print(f"Top 5 rows from s3://{bucket_name}/{first_part_key}")
    print(df.head())

    This script uses the access grants to obtain temporary credentials, reads the first part file (part-00000) from the output location into a pandas DataFrame, and prints the top five rows of the DataFrame.
    Because the User A role has access to the UserA folder, User A can read the contents of the file part-00000, as shown in the following screenshot.
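    In isolation, the pandas parsing step of the validation script looks like the following; the sample lines here are invented stand-ins for the real part-00000 contents.

```python
import io
import pandas as pd

# Invented sample lines in the shape the preprocessing job writes:
# the rings label first, then the feature values, with no header row.
data = "9.0,1.0,0.0,0.455,0.365\n7.0,0.0,1.0,0.35,0.265\n"
df = pd.read_csv(io.StringIO(data), header=None)
print(df.head())
```

    Because the part files carry no header row, header=None is required; otherwise pandas would consume the first data row as column names.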

    Now, let’s validate access to the UserA folder from the User B profile.

  17. Repeat the earlier steps to launch a Python notebook under the User B profile.

  18. Use the validation script to read the contents of the file part-00000, which is in the UserA folder.

If User B tries to read the file part-00000 in the UserA folder, access is denied, as shown in the following screenshot, because User B doesn’t have a grant that covers the UserA prefix.
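The denial can also be observed programmatically. The following sketch (the function name and control flow are ours, not from the post) wraps the same GetDataAccess call the validation script uses and returns False when the caller’s grants don’t cover the prefix. It only defines the function, because actually running it requires AWS credentials and the S3 Access Grants setup from this walkthrough.

```python
def check_prefix_access(account_id, bucket, prefix, region):
    """Return True if the current role can obtain READ credentials for the prefix.

    Sketch only: calling this requires AWS credentials and an
    S3 Access Grants instance with a matching grant.
    """
    import boto3
    from botocore.exceptions import ClientError

    s3control = boto3.client('s3control', region_name=region)
    try:
        s3control.get_data_access(
            AccountId=account_id,
            Target=f's3://{bucket}/{prefix}',
            Permission='READ',
        )
        return True
    except ClientError as err:
        # User B lands here for the UserA prefix: no grant covers it.
        if err.response['Error']['Code'] == 'AccessDenied':
            return False
        raise
```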

Clean up

To avoid incurring future charges, delete the CloudFormation stack. This will delete resources such as the SageMaker Studio domain, S3 Access Grants instance, and S3 bucket you created.

Conclusion

In this post, you learned how to control data access to Amazon S3 from SageMaker Studio with S3 Access Grants. S3 Access Grants provides a more flexible and scalable mechanism than IAM-based techniques for defining access patterns at scale. These grants not only support IAM principals but also allow granting access directly to users and groups from a corporate directory that is synchronized with IAM Identity Center.
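For comparison with maintaining IAM policies, creating a grant is a single API call. The sketch below uses the s3control CreateAccessGrant operation, but every identifier value is a placeholder you would supply; it only defines a function, because running it needs an existing S3 Access Grants instance and a registered location.

```python
def grant_read_on_prefix(account_id, location_id, sub_prefix, grantee_arn, region):
    """Sketch: grant READ on an S3 prefix to an IAM principal via S3 Access Grants.

    All argument values are placeholders; running this requires an
    S3 Access Grants instance and a registered location in your account.
    """
    import boto3

    s3control = boto3.client('s3control', region_name=region)
    return s3control.create_access_grant(
        AccountId=account_id,                                  # your AWS account ID
        AccessGrantsLocationId=location_id,                    # ID of a registered location
        AccessGrantsLocationConfiguration={'S3SubPrefix': sub_prefix},
        Grantee={'GranteeType': 'IAM', 'GranteeIdentifier': grantee_arn},
        Permission='READ',
    )
```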

Take the next step in optimizing your data management workflow by integrating S3 Access Grants into your AWS environment alongside SageMaker Studio, a web-based visual interface for building, training, debugging, deploying, and monitoring ML models. Take advantage of the granular access control and scalability offered by S3 Access Grants to enable efficient collaboration, secure data access, and simplified access management for your team working in the SageMaker Studio environment. For more details, refer to Managing access with S3 Access Grants and Amazon SageMaker Studio.


About the authors

Koushik Konjeti is a Senior Solutions Architect at Amazon Web Services. He has a passion for aligning architectural guidance with customer goals, ensuring solutions are tailored to their unique requirements. Outside of work, he enjoys playing cricket and tennis.

Vijay Velpula is a Data Architect with AWS Professional Services. He helps customers implement Big Data and Analytics Solutions. Outside of work, he enjoys spending time with family, traveling, hiking and biking.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey. In his spare time, he rides motorcycle and enjoys the nature with his family.
