Introducing SageMaker Core: A new object-oriented Python SDK for Amazon SageMaker

We’re excited to announce the release of SageMaker Core, a new Python SDK from Amazon SageMaker designed to offer an object-oriented approach for managing the machine learning (ML) lifecycle. This new SDK streamlines data processing, training, and inference and features resource chaining, intelligent defaults, and enhanced logging capabilities. With SageMaker Core, managing ML workloads on SageMaker becomes simpler and more efficient. The SageMaker Core SDK comes bundled as part of the SageMaker Python SDK version 2.231.0 and above.

In this post, we show how the SageMaker Core SDK simplifies the developer experience while providing APIs for seamlessly executing the various steps of a typical ML lifecycle. We also discuss the main benefits of using this SDK and share relevant resources to learn more about it.

Traditionally, developers have had two options when working with SageMaker: the AWS SDK for Python, also known as Boto3, or the SageMaker Python SDK. Although both provide comprehensive APIs for ML lifecycle management, they often rely on loosely typed constructs such as hard-coded constants and JSON dictionaries, mimicking a REST interface. For instance, to create a training job, Boto3 offers a create_training_job API, but retrieving job details requires the describe_training_job API.

While using Boto3, developers face the challenge of remembering and crafting lengthy JSON dictionaries, ensuring that all keys are accurately placed. Let’s take a closer look at the create_training_job method from Boto3:

response = client.create_training_job(
    TrainingJobName='string',
    HyperParameters={
        'string': 'string'
    },
    AlgorithmSpecification={
            .
            .
            .
    },
    RoleArn='string',
    InputDataConfig=[
        {
            .
            .
            .
        },
    ],
    OutputDataConfig={
            .
            .
            .
    },
    ResourceConfig={
            .
            .
            .    
    },
    VpcConfig={
            .
            .
            .
    },
    .
    .
    .
    .
# Not all arguments/fields are shown, for brevity.

)

If we observe carefully, arguments such as AlgorithmSpecification, InputDataConfig, OutputDataConfig, ResourceConfig, and VpcConfig require verbose JSON dictionaries. With so many string keys nested in long dictionary fields, it’s easy to introduce a typo or miss a key, and no type checking is possible; to the interpreter, everything is just a string.
The SageMaker Python SDK, in turn, requires us to create an estimator object and invoke its fit() method. Although these constructs work well, they aren’t intuitive from a developer-experience standpoint: it’s hard for developers to map the concept of an estimator to something that trains a model.
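
For reference, a minimal sketch of that estimator-based flow in the SageMaker Python SDK looks like the following; the image URI, role, and S3 paths are placeholders:

from sagemaker.estimator import Estimator

# Classic SageMaker Python SDK flow: configure an estimator, then call fit()
estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/training-output",
)
estimator.fit({"training": "s3://my-bucket/training-data"})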

Introducing SageMaker Core SDK

SageMaker Core SDK solves this problem by replacing such long dictionaries with object-oriented interfaces. Developers work with object-oriented abstractions, and SageMaker Core takes care of converting those objects to dictionaries and executing the actions on the developer’s behalf.

The following are the key features of SageMaker Core:

  • Object-oriented interface – It provides object-oriented classes for tasks such as processing, training, and deployment. Such an interface enforces strong type checking, makes the code more manageable, and promotes reusability, so developers can benefit from all the features of object-oriented programming.
  • Resource chaining – Developers can seamlessly pass SageMaker resources as objects by supplying them as arguments to different resources. For example, we can create a model object and pass that model object as an argument while setting up the endpoint. In contrast, while using Boto3, we need to supply ModelName as a string argument.
  • Abstraction of low-level details – It automatically handles resource state transitions and polling logics, freeing developers from managing these intricacies and allowing them to focus on higher value tasks.
  • Support for intelligent defaults – It supports SageMaker intelligent defaults, allowing developers to set default values for parameters such as AWS Identity and Access Management (IAM) roles and virtual private cloud (VPC) configurations. This streamlines the setup process, and the SageMaker Core API picks up the default settings automatically from the environment.
  • Auto code completion – It enhances the developer experience by offering real-time suggestions and completions in popular integrated development environments (IDEs), reducing chances of syntax errors and speeding up the coding process.
  • Full parity with SageMaker APIs, including generative AI – It provides access to the SageMaker capabilities, including generative AI, through the core SDK, so developers can seamlessly use SageMaker Core without worrying about feature parity with Boto3.
  • Comprehensive documentation and type hints – It provides robust and comprehensive documentation and type hints so developers can understand the functionalities of the APIs and objects, write code faster, and reduce errors.
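
To illustrate the object-oriented interface, the following sketch fetches an existing training job as a typed resource instead of a raw dictionary. It assumes the get class method and snake_case attributes that mirror the corresponding describe API response; check the SageMaker Core documentation for the exact names.

from sagemaker_core.resources import TrainingJob

# Fetch an existing training job as a typed resource object
job = TrainingJob.get(training_job_name="<existing-training-job-name>")

# Attributes are typed fields rather than dictionary keys (names assumed here)
print(job.training_job_status)
print(job.resource_config.instance_type)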

For this walkthrough, we use a straightforward generative AI lifecycle involving data preparation, fine-tuning, and a deployment of Meta’s Llama-3-8B LLM. We use the SageMaker Core SDK to execute all the steps.

Prerequisites

To get started with SageMaker Core, make sure Python 3.8 or greater is installed in the environment. There are two ways to get started with SageMaker Core:

  1. If you’re not using the SageMaker Python SDK, install the sagemaker-core SDK using the following code example.
    %pip install sagemaker-core

  2. If you’re already using SageMaker Python SDK, upgrade it to version 2.231.0 or later. Any version at or above 2.231.0 has SageMaker Core preinstalled. The following code example shows the command for upgrading the SageMaker Python SDK.
    %pip install --upgrade "sagemaker>=2.231.0"
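
To confirm which versions ended up in your environment, you can check the installed package metadata:

from importlib.metadata import version

print(version("sagemaker"))       # should be >= 2.231.0
print(version("sagemaker-core"))  # bundled SageMaker Core version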

Solution walkthrough

To manage your ML workloads on SageMaker using SageMaker Core, use the steps in the following sections.

Data preparation

In this phase, you prepare the training and test data for the LLM. You use the publicly available Stanford Question Answering Dataset (SQuAD). The following code creates a ProcessingJob object using the static method create, specifying the script path, instance type, and instance count. Intelligent defaults fetch the SageMaker execution role, which further simplifies the developer experience. You also don’t need to provide the input and output data locations, because those are supplied through intelligent defaults as well. For information on how to set up intelligent defaults, check out Configuring and using defaults with the SageMaker Python SDK.

from sagemaker_core.resources import ProcessingJob

# Initialize a ProcessingJob resource
processing_job = ProcessingJob.create(
    processing_job_name="llm-data-prep",
    script_path="s3://my-bucket/data-prep-script.py",
    role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    instance_type="ml.m5.xlarge",
    instance_count=1
)

# Wait for the ProcessingJob to complete
processing_job.wait()

Training

In this step, you use the pre-trained Llama-3-8B model and fine-tune it on the data prepared in the previous step. The following code snippet shows the training API. You create a TrainingJob object using the create method, specifying the training script, source directory, instance type, instance count, output path, and hyperparameters.

from sagemaker_core.resources import TrainingJob
from sagemaker_core.shapes import HyperParameters

# Initialize a TrainingJob resource
training_job = TrainingJob.create(
    training_job_name="llm-fine-tune",
    estimator_entry_point="train.py",
    source_dir="s3://my-bucket/training-code",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    output_path="s3://my-bucket/training-output",
    hyperparameters=HyperParameters(
        learning_rate=0.00001,
        batch_size=8,
        epochs=3
    ),
    role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    input_data=processing_job.output # Resource chaining
)

# Wait for the TrainingJob to complete
training_job.wait()

For hyperparameters, you create an object, instead of supplying a dictionary. Use resource chaining by passing the output of the ProcessingJob resource as the input data for the TrainingJob.

You also use the intelligent defaults to get the SageMaker execution role. Wait for the training job to finish, and it will produce a model artifact, wrapped in a tar.gz, and store it in the output_path provided in the preceding training API.
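
Once the job completes, you can read the artifact location from the resource itself. The following is a minimal sketch; it assumes the resource exposes a refresh method and snake_case attributes that mirror the describe API response.

# Refresh the resource state and read the model artifact location (attribute names assumed)
training_job.refresh()
print(training_job.model_artifacts.s3_model_artifacts)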

Model creation and deployment

Deploying a model on a SageMaker endpoint consists of three steps:

  1. Create a SageMaker model object
  2. Create the endpoint configuration
  3. Create the endpoint

SageMaker Core provides an object-oriented interface for all three steps.

  1. Create a SageMaker model object

The following code snippet shows the model creation experience in SageMaker Core.

from sagemaker_core.shapes import ContainerDefinition
from sagemaker_core.resources import Model

# Create a Model resource
model = Model.create(
    model_name="llm-model",
    primary_container=ContainerDefinition(
        image="763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124",
        environment={"HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B"}
    ),
    execution_role_arn=<<Execution Role ARN>>, # Intelligent default for execution role
    input_data=training_job.output # Resource chaining
)

Similar to the processing and training steps, you have a create method from the Model class. The container definition is now an object, specifying the large model inference (LMI) container image and the Hugging Face model ID. You can also observe resource chaining in action, where you pass the output of the TrainingJob as input data to the model.

  2. Create the endpoint configuration

Create the endpoint configuration. The following code snippet shows the experience in SageMaker Core.

from sagemaker_core.shapes import ProductionVariant
from sagemaker_core.resources import Model, EndpointConfig, Endpoint

# Create an EndpointConfig resource
endpoint_config = EndpointConfig.create(
    endpoint_config_name="llm-endpoint-config",
    production_variants=[
        ProductionVariant(
            variant_name="llm-variant",
            initial_instance_count=1,
            instance_type="ml.g5.12xlarge",
            model_name=model
        )
    ]
)

ProductionVariant is an object in itself now.

  3. Create the endpoint

Create the endpoint using the following code snippet.

endpoint = Endpoint.create(
    endpoint_name=model_name,
    endpoint_config_name=endpoint_config,  # Pass the `EndpointConfig` object created above
)

This also uses resource chaining. Instead of supplying just the endpoint_config_name string (as in Boto3), you pass the whole endpoint_config object.
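
After the endpoint is in service, you can invoke it. SageMaker Core may expose its own invocation helper; the following sketch uses the standard SageMaker Runtime client from Boto3, and the request payload shape is an assumption appropriate for LMI containers, so adjust it to your container’s expected format.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="<your-endpoint-name>",  # the endpoint_name passed to Endpoint.create above
    ContentType="application/json",
    Body=json.dumps({"inputs": "What is Amazon SageMaker?"}),
)
print(response["Body"].read().decode())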

As we have shown in these steps, SageMaker Core simplifies the development experience by providing an object-oriented interface for interacting with SageMaker resources. The use of intelligent defaults and resource chaining reduces the amount of boilerplate code and manual parameter specification, resulting in more readable and maintainable code.

Cleanup

Any endpoint created using the code in this post will incur charges. Shut down any unused endpoints by using the delete() method.
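
For example, a minimal cleanup sketch using the resources created earlier, assuming each resource class exposes a delete() method:

# Delete the endpoint and its supporting resources to stop incurring charges
endpoint.delete()
endpoint_config.delete()
model.delete()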

A note on existing SageMaker Python SDK

SageMaker Python SDK will be using the SageMaker Core as its foundation and will benefit from the object-oriented interfaces created as part of SageMaker Core. Customers can choose to use the object-oriented approach while using the SageMaker Python SDK going forward.

Benefits

The SageMaker Core SDK offers several benefits:

  • Simplified development – By abstracting low-level details and providing intelligent defaults, developers can focus on building and deploying ML models without getting slowed down by repetitive tasks. It also relieves developers of the cognitive overload of having to remember long and complex multilevel dictionaries. They can instead work in the object-oriented paradigm that most developers are comfortable with.
  • Increased productivity – Features like automatic code completion and type hints help developers write code faster and with fewer errors.
  • Enhanced readability – Dedicated resource classes and resource chaining result in more readable and maintainable code.
  • Lightweight integration with AWS Lambda – Because this SDK is lightweight (about 8 MB when unzipped), it is straightforward to build an AWS Lambda layer for SageMaker Core and use it for executing various steps in the ML lifecycle through Lambda functions.
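
As a minimal sketch of that pattern, a Lambda function packaged with a SageMaker Core layer could look up a training job directly as a typed resource; the attribute name is an assumption, so check the documentation for the exact field.

from sagemaker_core.resources import TrainingJob

def lambda_handler(event, context):
    # Look up a training job by name as a typed resource (attribute name assumed)
    job = TrainingJob.get(training_job_name=event["training_job_name"])
    return {
        "training_job_name": event["training_job_name"],
        "status": job.training_job_status,
    }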

Conclusion

SageMaker Core is a powerful addition to Amazon SageMaker, providing a streamlined and efficient development experience for ML practitioners. With its object-oriented interface, resource chaining, and intelligent defaults, SageMaker Core empowers developers to focus on building and deploying ML models without getting slowed down by complex orchestration of JSON structures. Check out the SageMaker Core documentation to get started today.


About the authors

Vikesh Pandey is a Principal GenAI/ML Specialist Solutions Architect at AWS, helping customers from financial industries design, build and scale their GenAI/ML workloads on AWS. He carries an experience of more than a decade and a half working on entire ML and software engineering stack. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.

Read More

Create a data labeling project with Amazon SageMaker Ground Truth Plus

Amazon SageMaker Ground Truth is a powerful data labeling service offered by AWS that provides a comprehensive and scalable platform for labeling various types of data, including text, images, videos, and 3D point clouds, using a diverse workforce of human annotators. In addition to traditional custom-tailored deep learning models, SageMaker Ground Truth also supports generative AI use cases, enabling the generation of high-quality training data for artificial intelligence and machine learning (AI/ML) models. SageMaker Ground Truth includes a self-serve option and an AWS managed option known as SageMaker Ground Truth Plus. In this post, we focus on getting started with SageMaker Ground Truth Plus by creating a project and sharing your data that requires labeling.

Overview of solution

First, you fill out a consultation form on the Get Started with Amazon SageMaker Ground Truth page or, if you already have an AWS account, you submit a request project form on the SageMaker Ground Truth Plus console. An AWS expert contacts you to review your specific data labeling requirements. You can share any specific requirements such as subject matter expertise, language expertise, or geographic location of labelers. If you submitted a consultation form, you submit a request project form on the SageMaker Ground Truth Plus console and it will be approved without any further discussion. If you submitted a project request, then your project status changes from Review in progress to Request approved.

Next, you create your project team, which includes the people collaborating with you on the project. Each team member receives an invitation to join your project. Now, you upload the data that requires labeling to an Amazon Simple Storage Service (Amazon S3) bucket. To add that data to your project, go to your project portal and create a batch that includes the S3 bucket URL. Every project consists of one or more batches, and each batch is made up of data objects to be labeled.

Now, the SageMaker Ground Truth Plus team takes over and sources annotators based on your specific data labeling needs, trains them on your labeling requirements, and creates a UI for them to label your data. After the labeled data passes internal quality checks, it is delivered back to an S3 bucket for you to use for training your ML models.

The following diagram illustrates the solution architecture.

Using the steps outlined in this post, you’ll be able to quickly get set up for your data labeling project. This includes requesting a new project, setting up a project team, and creating a batch with the data objects you need labeled.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.
  • The URI of the S3 bucket where your data is stored. The bucket should be in the US East (N. Virginia) AWS Region.
  • An AWS Identity and Access Management (IAM) user. If you’re the owner of your AWS account, then you have administrator access and can skip this step. If your AWS account is part of an AWS organization, then you can ask your AWS administrator to grant your IAM user the required permissions. The following identity-based policy specifies the minimum permissions required for your IAM user to perform all the steps in this post (provide the name of the S3 bucket where your data is stored):
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "groundtruthlabeling:ListProjects",
                    "groundtruthlabeling:GetIntakeFormStatus",
                    "groundtruthlabeling:CreateProject",
                    "sagemaker:ListWorkforces",
                    "sagemaker:PutLabelingPortalPolicy",
                    "sagemaker:GetLabelingPortalPolicy",
                    "iam:GetRole",
                    "iam:PassRole",
                    "iam:ListRoles",
                    "iam:CreateRole",
                    "iam:CreatePolicy",
                    "iam:AttachRolePolicy",
                    "s3:ListAllMyBuckets",
                    "cognito-idp:CreateUserPool",
                    "cognito-idp:ListUserPools",
                    "cognito-idp:ListGroups",
                    "cognito-idp:AdminAddUserToGroup",
                    "cognito-idp:AdminCreateUser",
                    "cognito-idp:CreateGroup",
                    "cognito-idp:CreateUserPoolClient",
                    "cognito-idp:CreateUserPoolDomain",
                    "cognito-idp:DescribeUserPool",
                    "cognito-idp:DescribeUserPoolClient",
                    "cognito-idp:ListUsersInGroup",
                    "cognito-idp:UpdateUserPool",
                    "cognito-idp:UpdateUserPoolClient"
                ],
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::<your-S3-bucket-name>",
                    "arn:aws:s3:::<your-S3-bucket-name>/*"
                ]
            }
        ]
    }

Request a project 

Complete the following steps to request a project:

  1. On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
  2. Choose Request project.
  3. For Business email address, enter a valid email.
  4. For Project name, enter a descriptive name with no spaces or special characters.
  5. For Task type, choose the option that best describes the type of data you need labeled.
  6. For Contains PII, only turn on if your data contains personally identifiable information (PII).
  7. For IAM role, the role you choose grants SageMaker Ground Truth Plus permissions to access your data in Amazon S3 and perform a labeling job. You can use any of the following options to specify the IAM role:
    1. Choose Create an IAM role (recommended), which provides access to the S3 buckets you specify and automatically attaches the required permissions and trust policy to the role.
    2. Enter a custom IAM role ARN.
    3. Choose an existing role.

If you don’t have permissions to create an IAM role, you may ask your AWS administrator to create the role for you. When using an existing role or a custom IAM role ARN, the IAM role should have the following permissions policy and trust policy.

The following code is the permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetBucketLocation",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<input-bucket-name>",
                "arn:aws:s3:::<input-bucket-name>/*",
                "arn:aws:s3:::<output-bucket-name>",
                "arn:aws:s3:::<output-bucket-name>/*"         
            ]
        }
    ]
}

The following code is the trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "sagemaker-ground-truth-plus.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
  8. Choose Request project.

Under Ground Truth in the navigation pane, you can choose Plus to see your project listed in the Projects section with the status Review in progress.

An AWS representative will contact you within 72 hours to review your project requirements. When this review is complete, your project status will change from Review in progress to Request approved.

Create project team

SageMaker Ground Truth uses Amazon Cognito to manage the members of your workforce and work teams. Amazon Cognito is a service that you use to create identities for your workers. Complete the following steps to create a project team:

  1. On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
  2. Choose Create project team.

The remaining steps depend on whether you create a new user group or import an existing group.

Option 1: Create a new Amazon Cognito user group

If you don’t want to import members from an existing Amazon Cognito user group in your account, or you don’t have any Amazon Cognito user groups in your account, you can use this option.

  1. When creating your project team, select Create a new Amazon Cognito user group.
  2. For Amazon Cognito user group name, enter a descriptive name with no spaces.
  3. For Email addresses, enter up to 50 addresses. Use a comma between addresses.
  4. Choose Preview invitation to see the email that is sent to the email addresses you provided.
  5. Choose Create project team.

Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The email addresses you added are included in the Members section.

Option 2: Import existing Amazon Cognito user groups

If you have an existing Amazon Cognito user group in your account from which you want to import members, you can use this option.

  1. When creating your project team, select Import existing Amazon Cognito user groups.
  2. For Select existing Amazon Cognito user groups, choose the user group from which you want to import members.
  3. Choose Create project team.

Under Ground Truth in the navigation pane, choose Plus to see your project team listed in the Project team section. The email addresses you added are included in the Members section.

Access the project portal and Create Batch

You can use the project portal to create batches containing your unlabeled input data and track the status of your previously created batches in a project. To access the project portal, make sure that you have created at least one project and at least one project team with one verified member.

  1. On the SageMaker console, under Ground Truth in the navigation pane, choose Plus.
  2. Choose Open project portal.
  3. Log in to the project portal using your project team’s user credentials created in the previous step.

A list of all your projects is displayed on the project portal.

  1. Choose a project to open its details page.
  2. In the Batches section, choose Create batch.
  3. Enter the batch name, batch description, S3 location for input datasets, and S3 location for output datasets.
  4. Choose Submit.

To create a batch successfully, make sure you meet the following criteria:

  • Your S3 bucket is in the US East (N. Virginia) Region.
  • The maximum size of each file is 2 GB.
  • The maximum number of files in a batch is 10,000.
  • The total size of a batch is less than 100 GB.
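
Before you submit a batch, a quick pre-check of your input prefix against these limits can save a round trip. The following is a minimal sketch using Boto3; the bucket name and prefix are placeholders.

import boto3

s3 = boto3.client("s3")
bucket, prefix = "<your-S3-bucket-name>", "input-data/"

total_files, total_bytes, oversized = 0, 0, []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        total_files += 1
        total_bytes += obj["Size"]
        if obj["Size"] > 2 * 1024**3:  # 2 GB per-file limit
            oversized.append(obj["Key"])

print(f"{total_files} files, {total_bytes / 1024**3:.1f} GB total")
assert total_files <= 10000, "Batch exceeds 10,000 files"
assert total_bytes < 100 * 1024**3, "Batch exceeds 100 GB"
assert not oversized, f"Files over 2 GB: {oversized}"
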
Your submitted batch is listed in the Batches section with the status Request submitted. When the data transfer is complete, the status changes to Data received.

Next, the SageMaker Ground Truth Plus team sets up data labeling workflows, which changes the batch status to In progress. Annotators label the data, and you complete your data quality check by accepting or rejecting the labeled data. Rejected objects go back to annotators to re-label. Accepted objects are delivered to an S3 bucket for you to use for training your ML models.

Conclusion

SageMaker Ground Truth Plus provides a seamless solution for building high-quality training datasets for your ML models. By using AWS managed expert labelers and automating the data labeling workflow, SageMaker Ground Truth Plus eliminates the overhead of building and managing your own labeling workforce. With its user-friendly interface and integrated tools, you can submit your data, specify labeling requirements, and monitor the progress of your projects with ease. As you receive accurately labeled data, you can confidently train your models, maintaining optimal performance and accuracy. Streamline your ML projects and focus on building innovative solutions with the power of SageMaker Ground Truth Plus.

To learn more, see Use Amazon SageMaker Ground Truth Plus to Label Data.


About the Authors

Joydeep Saha is a System Development Engineer at AWS with expertise in designing and implementing solutions to deliver business outcomes for customers. His current focus revolves around building cloud-native end-to-end data labeling solutions, empowering customers to unlock the full potential of their data and drive success through accurate and reliable machine learning models.

Ami Dani is a Senior Technical Program Manager at AWS focusing on AI/ML services. During her career, she has focused on delivering transformative software development projects for the federal government and large companies in industries as diverse as advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs, and successfully managing complex, high-impact projects.

Read More

The Path to Achieve PyTorch Performance Boost on Windows CPU

The challenge of PyTorch’s lower CPU performance on Windows compared to Linux has been a significant issue, and multiple factors contribute to this disparity. Through our investigation, we pinpointed two primary causes: the inefficiency of the Windows default malloc memory allocator and the absence of SIMD vectorization optimizations on the Windows platform. In this article, we show how PyTorch CPU performance on Windows has improved from previous releases and where it stands as of PyTorch 2.4.1.

Memory Allocation Optimization in PyTorch 2.1.2 and later

In versions prior to PyTorch 2.1.2, PyTorch relied on the operating system’s default malloc function for memory allocation. The default malloc memory allocation on the Windows platform was less efficient compared to the malloc implementation mechanism on the Linux platform, leading to increased memory allocation times and reduced performance. To address this, we have substituted the default Windows malloc with mimalloc, a more efficient memory allocator developed by Microsoft. This update, included with the release of PyTorch 2.1.2 and later, has significantly enhanced the CPU performance of PyTorch on Windows, as shown in Figure 1.1.

PyTorch CPU Performance Improvement on Windows with Memory Allocation Optimization

Figure 1.1: Relative throughput improvement achieved by upgrading from Windows PyTorch version 2.0.1 to 2.1.2 (higher is better).

The graph illustrates that with the release of PyTorch 2.1.2, there has been a notable enhancement in CPU performance on the Windows platform. The degree of improvement varies across different models, which can be attributed to the diverse mix of operations they perform and their corresponding memory access patterns. While the BERT model shows a modest performance gain, models like ResNet50 and MobileNet-v3 Large benefit from more pronounced improvements.

On a high-performance CPU, memory allocation becomes a performance bottleneck. This is also why addressing this issue has led to such significant performance improvements.

As shown in the graphs below, PyTorch CPU performance on Windows improved significantly. However, there is still a noticeable gap compared to its performance on Linux. The absence of vectorization optimizations in the Windows build of PyTorch is a key factor in the remaining gap.

Windows vs Linux Performance on PyTorch 2.0.1

Figure 1.2: Relative performance of Windows vs Linux with PyTorch version 2.0.1 (higher is better).

Windows vs Linux Performance on PyTorch 2.1.2

Figure 1.3: Relative performance of Windows vs Linux with PyTorch version 2.1.2 (higher is better).

Vectorization Optimization in PyTorch 2.4.1 and later

Prior to PyTorch 2.4.1, the Windows build of PyTorch lacked the SIMD vectorization optimizations that the Linux build leveraged for improved performance. This discrepancy was due to integration issues on Windows with the SLEEF library (SIMD Library for Evaluating Elementary Functions, a vectorized libm and DFT), which is essential for efficient trigonometric calculations. Through a collaborative effort with engineers from ARM and Qualcomm, these challenges were resolved, enabling the integration of SIMD into PyTorch for Windows. The PyTorch 2.4.1 update has thus significantly enhanced PyTorch’s CPU performance on Windows, as shown in Figure 2.1.
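
To check which SIMD instruction set your PyTorch build dispatches to, recent releases expose the detected CPU capability. The get_cpu_capability helper is available in recent 2.x builds; if it is missing in your version, torch.__config__.show() still lists the build options.

import torch

print(torch.__version__)
# Detected CPU capability, e.g. "AVX2" or "AVX512" (available in recent 2.x releases)
print(torch.backends.cpu.get_cpu_capability())
# Full build configuration, including compiler and vectorization flags
print(torch.__config__.show())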

PyTorch CPU Performance Improvement on Windows with Vectorization Optimization

Figure 2.1: Relative throughput improvement achieved by upgrading from PyTorch CPU version 2.1.2 to 2.4.1 (higher is better).

As shown in the graph below, PyTorch CPU performance on Windows has nearly caught up with its performance on Linux.

Windows vs Linux Performance on PyTorch 2.4.1

Figure 2.2: Relative performance of Windows vs Linux with PyTorch version 2.4.1 (higher is better).

CONCLUSION

From PyTorch 2.0.1 to PyTorch 2.4.1, the CPU performance gap between Windows and Linux has been continuously narrowing. We compared the ratio of CPU performance on Windows to CPU performance on Linux across different versions, and the results are shown in the following graph.

Windows vs Linux Performance on different version of PyTorch

Figure 3: Performance Ratio for Windows to Linux with different version of PyTorch (higher is better).

The graph shows that with PyTorch 2.4.1, CPU performance on Windows has nearly converged with that on Linux, and on some models it has even surpassed Linux. For example, in the case of the DistilBERT and RoBERTa models, the CPU performance ratio of Windows to Linux reached a remarkable 102%. However, certain models, including MobileNet-v3, still show a performance discrepancy. Intel engineers will continue to collaborate with Meta engineers to reduce the PyTorch CPU performance gap between Windows and Linux.

HOW TO TAKE ADVANTAGE OF THE OPTIMIZATIONS

Install PyTorch CPU 2.4.1 or later on Windows from the official repository, and you may automatically experience a performance boost with memory allocation and vectorizations.
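
To see the effect on your own machine, a simple CPU micro-benchmark like the following can be run under different PyTorch versions for comparison. This is a minimal sketch, not the workloads used for the figures above.

import time

import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).eval()
x = torch.randn(64, 1024)

with torch.no_grad():
    for _ in range(10):   # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(100):
        model(x)
    elapsed = time.perf_counter() - start

print(f"average forward latency: {elapsed / 100 * 1000:.2f} ms")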

ACKNOWLEDGMENTS

The results presented in this blog post were achieved through the collaborative effort of the Intel PyTorch team and Meta. We would like to express our sincere gratitude to Xu Han, Jiong Gong, Haozhe Zhu, Mingfei Ma, Chuanqi Wang, Guobing Chen and Eikan Wang. Their expertise and dedication have been instrumental in achieving the optimizations and performance improvements discussed here. Thanks to Jiachen Pu from the community for his participation in the issue discussion and for suggesting the use of mimalloc. We’d also like to express our gratitude to Microsoft for providing such an easily integrated and performant memory allocation library. Thanks to Pierre Blanchard and Nathan Sircombe from ARM and Alex Reinking from Qualcomm for their contributions in overcoming the compatibility issues with integrating SLEEF into PyTorch for Windows. Finally, we want to thank Jing Xu, Weizhuo Zhang and Zhaoqiong Zheng for their contributions to this blog.

Product and Performance Information

The configurations in the table are collected with svr-info. Tested by Intel on August 30, 2024.

Specification Configuration1 Configuration2
Name ThinkBook 14 G5+ IRH ThinkBook 14 G5+ IRH
Time Fri Aug 30 02:43:02 PM UTC 2024 Fri Aug 30 02:43:02 PM UTC 2024
System LENOVO LENOVO
Baseboard LENOVO LENOVO
Chassis LENOVO LENOVO
CPU Model 13th Gen Intel(R) Core(TM) i7-13700H 13th Gen Intel(R) Core(TM) i7-13700H
Microarchitecture Unknown Intel Unknown Intel
Sockets 1 1
Cores per Socket 14 14
Hyperthreading Enabled Enabled
CPUs 20 20
Intel Turbo Boost Enabled Enabled
Base Frequency 2.4GHz 2.4GHz
All-core Maximum Frequency 4.7GHz 4.7GHz
Maximum Frequency 4.8GHz 4.8GHz
NUMA Nodes 1 1
Prefetchers L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled
PPINs
Accelerators DLB, DSA, IAA, QAT DLB, DSA, IAA, QAT
Installed Memory 32GB (8x4GB LPDDR4 7400 MT/s [5200 MT/s]) 32GB (8x4GB LPDDR4 7400 MT/s [5200 MT/s])
Hugepagesize 2048kb 2048kb
Transparent Huge Pages madvise madvise
Automatic NUMA Balancing Disabled Disabled
NIC “1. Raptor Lake PCH CNVi WiFi 2. Intel Corporation” “1. Raptor Lake PCH CNVi WiFi 2. Intel Corporation”
Disk Micron MTFDKBA512TFH 500G Micron MTFDKBA512TFH 500G
BIOS LBCN22WW LBCN22WW
Microcode 0x411c 0x411c
OS Windows 11 Desktop Ubuntu 23.10
Kernel OS Build 19045.4412 6.5.0-27-generic
TDP 200 watts 200 watts
Power & Perf Policy Normal Powersave (7) Normal Powersave (7)
Frequency Governor performance performance
Frequency Driver intel_pstate intel_pstate
Max C-State 9 9

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Read More

Generalizable Autoregressive Modeling of Time Series Through Functional Narratives

Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Based on the new set of…Apple Machine Learning Research

CAMPHOR: Collaborative Agents for Multi-Input Planning and High-Order Reasoning On Device

While server-side Large Language Models (LLMs) demonstrate proficiency in tool integration and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings opportunities to improve latency and privacy but also introduces unique challenges for accuracy and memory. We introduce CAMPHOR, an innovative on-device SLM multi-agent framework designed to handle multiple user inputs and reason over personal context locally, ensuring privacy is maintained. CAMPHOR employs a hierarchical architecture where a high-order reasoning agent decomposes complex tasks and coordinates expert…Apple Machine Learning Research

Create a multimodal chatbot tailored to your unique dataset with Amazon Bedrock FMs

With recent advances in large language models (LLMs), a wide array of businesses are building new chatbot applications, either to help their external customers or to support internal teams. For many of these use cases, businesses are building Retrieval Augmented Generation (RAG) style chat-based assistants, where a powerful LLM can reference company-specific documents to answer questions relevant to a particular business or use case.

In the last few months, there has been substantial growth in the availability and capabilities of multimodal foundation models (FMs). These models are designed to understand and generate text about images, bridging the gap between visual information and natural language. Although such multimodal models are broadly useful for answering questions and interpreting imagery, they’re limited to only answering questions based on information from their own training document dataset.

In this post, we show how to create a multimodal chat assistant on Amazon Web Services (AWS) using Amazon Bedrock models, where users can submit images and questions, and text responses will be sourced from a closed set of proprietary documents. Such a multimodal assistant can be useful across industries. For example, retailers can use this system to more effectively sell their products (for example, HDMI_adaptor.jpeg, “How can I connect this adapter to my smart TV?”). Equipment manufacturers can build applications that allow them to work more effectively (for example, broken_machinery.png, “What type of piping do I need to fix this?”). This approach is broadly effective in scenarios where image inputs are important to query a proprietary text dataset. In this post, we demonstrate this concept on a synthetic dataset from a car marketplace, where a user can upload a picture of a car, ask a question, and receive responses based on the car marketplace dataset.

Solution overview

For our custom multimodal chat assistant, we start by creating a vector database of relevant text documents that will be used to answer user queries. Amazon OpenSearch Service is a powerful, highly flexible search engine that allows users to retrieve data based on a variety of lexical and semantic retrieval approaches. This post focuses on text-only documents, but for embedding more complex document types, such as those with images, see Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock and Amazon SageMaker.

After the documents are ingested in OpenSearch Service (this is a one-time setup step), we deploy the full end-to-end multimodal chat assistant using an AWS CloudFormation template. The following system architecture represents the logic flow when a user uploads an image, asks a question, and receives a text response grounded by the text dataset stored in OpenSearch.

System architecture

The logic flow for generating an answer to a text-image response pair routes as follows:

  • Steps 1 and 2 – To start, a user query and corresponding image are routed through an Amazon API Gateway connection to an AWS Lambda function, which serves as the processing and orchestrating compute for the overall process.
  • Step 3 – The Lambda function stores the query image in Amazon S3 with a specified ID. This may be useful for later chat assistant analytics.
  • Steps 4–8 – The Lambda function orchestrates a series of Amazon Bedrock calls to a multimodal model, an LLM, and a text-embedding model:
    • Query the Claude V3 Sonnet model with the query and image to produce a text description.
    • Embed a concatenation of the original question and the text description with the Amazon Titan Text Embeddings model.
    • Retrieve relevant text data from OpenSearch Service.
    • Generate a grounded response to the original question based on the retrieved documents.
  • Step 9 – The Lambda function stores the user query and answer in Amazon DynamoDB, linked to the Amazon S3 image ID.
  • Steps 10 and 11 – The grounded text response is sent back to the client.

There is also an initial setup of the OpenSearch Index, which is done using an Amazon SageMaker notebook.
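
For reference, the Amazon Bedrock calls in steps 4–8 can be sketched as follows with the Bedrock Runtime client. This is a minimal sketch, not the deployed Lambda code; the prompt text and image file are illustrative, and error handling is omitted.

import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Steps 4-5: describe the uploaded image with the multimodal Claude 3 Sonnet model
with open("jeep.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

vision_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
            {"type": "text", "text": "Describe this car in detail."},
        ],
    }],
}
vision_response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(vision_body),
)
description = json.loads(vision_response["body"].read())["content"][0]["text"]

# Step 6: embed question + image description with Amazon Titan Text Embeddings V2
question = "How much would a car like this cost?"
embed_response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps({"inputText": question + " " + description}),
)
embedding = json.loads(embed_response["body"].read())["embedding"]
# The embedding is then used for a k-NN query against the OpenSearch index (step 7)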

Prerequisites

To use the multimodal chat assistant solution, you need to have a handful of Amazon Bedrock FMs available.

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Manage model access.
  3. Activate all the Anthropic models, including Claude 3 Sonnet, as well as the Amazon Titan Text Embeddings V2 model, as shown in the following screenshot.

For this post, we recommend activating these models in the us-east-1 or us-west-2 AWS Region. These should become immediately active and available.

Bedrock model access

Simple deployment with AWS CloudFormation

To deploy the solution, we provide a simple shell script called deploy.sh, which can be used to deploy the end-to-end solution in different Regions. This script can be acquired directly from Amazon S3 using aws s3 cp s3://aws-blogs-artifacts-public/artifacts/ML-16363/deploy.sh .

Using the AWS Command Line Interface (AWS CLI), you can deploy this stack in various Regions using one of the following commands:

bash deploy.sh us-east-1

or

bash deploy.sh us-west-2

The stack may take up to 10 minutes to deploy. When the stack is complete, note the assigned physical ID of the Amazon OpenSearch Serverless collection, which you will use in further steps. It should look something like zr1b364emavn65x5lki8. Also, note the physical ID of the API Gateway connection, which should look something like zxpdjtklw2, as shown in the following screenshot.

cloudformation output

Populate the OpenSearch Service index

Although the OpenSearch Serverless collection has been instantiated, you still need to create and populate a vector index with the document dataset of car listings. To do this, you use an Amazon SageMaker notebook.

  1. On the SageMaker console, navigate to the newly created SageMaker notebook named MultimodalChatbotNotebook (as shown in the following image), which will come prepopulated with car-listings.zip and Titan-OS-Index.ipynb.
  2. After you open the Titan-OS-Index.ipynb notebook, change the host_id variable to the collection physical ID you noted earlier.
  3. Run the notebook from top to bottom to create and populate a vector index with a dataset of 10 car listings.

After you run the code to populate the index, it may still take a few minutes before the index shows up as populated on the OpenSearch Service console, as shown in the following screenshot. 

Test the Lambda function

Next, test the Lambda function created by the CloudFormation stack by submitting a test event JSON. In the following JSON, replace your bucket with the name of your bucket created to deploy the solution, for example, multimodal-chatbot-deployment-ACCOUNT_NO-REGION.

{
"bucket": "multimodal-chatbot-deployment-ACCOUNT_NO-REGION",
"key": "jeep.jpg",
"question_text": "How much would a car like this cost?"
}

You can set up this test by navigating to the Test panel for the created Lambda function and defining a new test event with the preceding JSON. Then, choose Test at the top right of the event definition.
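
Alternatively, you can invoke the function programmatically with the same payload. The following is a minimal sketch using Boto3; the function name is a placeholder for the one created by the CloudFormation stack.

import json

import boto3

lambda_client = boto3.client("lambda")

payload = {
    "bucket": "multimodal-chatbot-deployment-ACCOUNT_NO-REGION",
    "key": "jeep.jpg",
    "question_text": "How much would a car like this cost?",
}

response = lambda_client.invoke(
    FunctionName="<your-chatbot-lambda-function>",  # placeholder
    Payload=json.dumps(payload),
)
print(json.loads(response["Payload"].read()))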

If you are querying the Lambda function from another bucket than those allowlisted in the CloudFormation template, make sure to add the relevant permissions to the Lambda execution role.

The Lambda function may take between 10–20 seconds to run (mostly dependent on the size of your image). If the function performs properly, you should receive an output JSON similar to the following code block. The following screenshot shows the successful output on the console.

{
  "statusCode": 200,
  "body": "\"Based on the 2013 Jeep Grand Cherokee SRT8 listing, a heavily modified Jeep like the one described could cost around $17,000 even with significant body damage and high mileage. The powerful engine, custom touches, and off-road capabilities likely justify that asking price.\""
}

Note that if you just enabled model access, it may take a few minutes for access to propagate to the Lambda function.

Test the API

For integration into an application, we’ve connected the Lambda function to an API Gateway connection that can be pinged from various devices. We’ve included a notebook within the SageMaker notebook that allows you to query the system with a question and an image and return a response. To use the notebook, replace the API_GW variable with the physical ID of the API Gateway connection that was created using the CloudFormation stack and the REGION variable with the Region your infrastructure was deployed in. Then, making sure your image location and query are set correctly, run the notebook cell. Within 10–20 seconds, you should receive the output of your multimodal query sourced from your own text dataset. This is shown in the following screenshot.

Note that the API Gateway connection is only accessible from this specific notebook, and more comprehensive security and permission elements are required to productionize the system.

Qualitative results

A grounded multimodal chat assistant, where users can submit images with queries, can be useful in many settings. We demonstrate this application with a dataset of cars for sale. For example, a user may have a question about a car they’re looking at, so they snap a picture and submit a question, such as “How much might a car like this cost?” Rather than answering the question with generic information that the LLM was trained on (which may be out of date), responses will be grounded with your local and specific car sales dataset. In this use case, we took images from Unsplash and used a synthetically created dataset of 10 car listings to answer questions. The model and year of the 10 car listings are shown in the following screenshot.

In the following examples, each question was submitted together with an image of the corresponding car. In the answers, you can observe that not only has the vision language model (VLM) system identified the correct cars in the car listings most similar to the input image, but it has also answered the questions with specific numbers, costs, and locations that are only available from our closed car dataset, car-listings.zip.

Question: How much would a car like this cost?
Answer: The 2013 Jeep Grand Cherokee SRT8 listing is most relevant, with an asking price of $17,000 despite significant body damage from an accident. However, it retains the powerful 470 hp V8 engine and has been well-maintained with service records.

Question: What is the engine size of this car?
Answer: The car listing for the 2013 Volkswagen Beetle mentions it has a fuel-efficient 1.8L turbocharged engine. No other engine details are provided in the listings.

Question: Where in the world could I purchase a used car like this?
Answer: Based on the car listings provided, the 2021 Tesla Model 3 for sale seems most similar to the car you are interested in. It’s described as a low mileage, well-maintained Model 3 in pristine condition located in the Seattle area for $48,000.

Latency and quantitative results

Because speed and latency are important for chat assistants and because this solution consists of multiple API calls to FMs and data stores, it’s interesting to measure the speed of each step in the process. We did an internal analysis of the relative speeds of the various API calls, and the following graph visualizes the results.

From slowest to fastest, we have the call to the Claude V3 Vision FM, which takes on average 8.2 seconds. The final output generation step (LLM Gen on the graph in the screenshot) takes on average 4.9 seconds. The Amazon Titan Text Embeddings model and OpenSearch Service retrieval process are much faster, taking 0.28 and 0.27 seconds on average, respectively.

In these experiments, the average time for the full multistage multimodal chatbot is 15.8 seconds. However, the time can be as low as 11.5 seconds overall if you submit a 2.2 MB image, and it could be much lower if you use even lower-resolution images.
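
If you want to produce a similar breakdown for your own deployment, a lightweight timing wrapper around each call inside the Lambda function is sufficient. The following is a minimal sketch; the wrapped function names are illustrative.

import time
from contextlib import contextmanager

@contextmanager
def timed(step_name):
    # Print the wall-clock duration of the wrapped block
    start = time.perf_counter()
    yield
    print(f"{step_name}: {time.perf_counter() - start:.2f} s")

# Illustrative usage inside the Lambda handler:
# with timed("claude_vision"):
#     description = describe_image(image_bytes, question)
# with timed("titan_embedding"):
#     embedding = embed_text(question + " " + description)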

Clean up

To clean up the resources and avoid charges, follow these steps:

  1. Make sure all the important data from Amazon DynamoDB and Amazon S3 is saved.
  2. Manually empty and delete the two provisioned S3 buckets.
  3. Delete the deployed resource stack on the AWS CloudFormation console.

Conclusion

From applications ranging from online chat assistants to tools to help sales reps close a deal, AI assistants are a rapidly maturing technology to increase efficiency across sectors. Often these assistants aim to produce answers grounded in custom documentation and datasets that the LLM was not trained on, using RAG. A final step is the development of a multimodal chat assistant that can do so as well—answering multimodal questions based on a closed text dataset.

In this post, we demonstrated how to create a multimodal chat assistant that takes images and text as input and produces text answers grounded in your own dataset. This solution will have applications ranging from marketplaces to customer service, where there is a need for domain-specific answers sourced from custom datasets based on multimodal input queries.

We encourage you to deploy the solution for yourself, try different image and text datasets, and explore how you can orchestrate various Amazon Bedrock FMs to produce streamlined, custom, multimodal systems.


About the Authors

Emmett Goodman is an Applied Scientist at the Amazon Generative AI Innovation Center. He specializes in computer vision and language modeling, with applications in healthcare, energy, and education. Emmett holds a PhD in Chemical Engineering from Stanford University, where he also completed a postdoctoral fellowship focused on computer vision and healthcare.

Negin Sokhandan is a Principal Applied Scientist at the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is in statistical inference, computer vision, and multimodal systems.

Yanxiang Yu is an Applied Scientist at the Amazon Generative AI Innovation Center. With over 9 years of experience building AI and machine learning solutions for industrial applications, he specializes in generative AI, computer vision, and time series modeling.

Read More

Design secure generative AI application workflows with Amazon Verified Permissions and Amazon Bedrock Agents

Amazon Bedrock Agents enable generative AI applications to perform multistep tasks across various company systems and data sources. They orchestrate and analyze the tasks and break them down into the correct logical sequences using the reasoning abilities of the foundation model (FM). Agents automatically call the necessary APIs to interact with the company systems and processes to fulfill the request. Throughout this process, agents determine whether they can proceed or if additional information is needed.

Customers can build innovative generative AI applications using Amazon Bedrock Agents’ capabilities to intelligently orchestrate their application workflows. When building such workflows, it can be challenging for customers to apply fine-grained access controls to make sure that the application’s workflow operates only on the authorized data based on the application user’s entitlements. Controlling access to resources based on user context, roles, actions and resource conditions can be challenging to maintain in an application workflow because that would require hardcoding several rules in your application or building your own authorization system to externalize those rules.

Instead of building your own authorization system for fine-grained access controls in your application workflows, you can integrate Amazon Verified Permissions into the agent’s workflow to apply contextually aware fine-grained access controls. Verified Permissions is a scalable permissions management and authorization service for custom applications built by you. Verified Permissions helps developers build secure applications faster by externalizing the authorization component and centralizing policy management and administration.

In this post, we demonstrate how to design fine-grained access controls using Verified Permissions for a generative AI application that uses Amazon Bedrock Agents to answer questions about insurance claims that exist in a claims review system using textual prompts as inputs and outputs. In our insurance claims system use case, there are two types of users: claims administrators and claims adjusters. Both are capable of listing open claims, but only one is capable of reading claim detail and making changes. We also show how to restrict permissions using custom attributes such as a user’s region for filtering insurance claims. In this post, the term region doesn’t refer to an AWS Region, but rather to a business-defined region.

Solution overview

In this solution design, we assume that the customer has claims records in an Amazon DynamoDB table and would like to build a chat-based application to answer frequently asked questions about their claims. This chat assistant will be used internally by claims administrators and claims adjusters to answer their clients’ questions.

The following is a list of actions that the claims team needs to perform to answer their clients’ questions:

  • Show me a list of my open claims
  • Show me claim detail for an input claim number
  • Update the status to closed for the input claim number

The customer has the following access control requirements for their claims system:

  • A claims administrator can list claims across various geographic areas, but they can’t read individual claim records
  • A claims adjuster can list claims for their region and can read and update the records of claims assigned to them. However, a claims adjuster can’t access claims from other regions.
  • Each user is placed into a group in Amazon Cognito, where their application-level permissions are set and maintained
  • The customer would like to use Verified Permissions to externalize entity and record level authorization decisions without hard coding the application logic

To improve the performance of the chat assistant, the customer uses FMs available on Amazon Bedrock. To retrieve the necessary information from the claims table and dynamically orchestrate the requests, the customer uses Amazon Bedrock Agents together with Verified Permissions to provide fine-grained authorization for the agents’ invocation.

The application architecture for building the example chat-based Generative AI Claims application with fine-grained access controls is shown in the following diagram.

Figure 1. Architectural diagram for user flow

The application architecture flow is as follows:

  1. User accesses the Generative AI Claims web application (App).
  2. The App authenticates the user with the Amazon Cognito service and issues an ID token and an access token. The ID token contains the user’s identity and custom attributes.
  3. Using the App, the user sends a request asking to “list the open claims.” The request is sent along with the user’s ID token and access token. The App calls the Claims API Gateway API to run the claims proxy passing user requests and tokens.
  4. Claims API Gateway runs the Custom Authorizer to validate the access token.
  5. When access token validation is successful, the Claims API Gateway sends the user request to the Claims Proxy.
  6. The Claims Proxy invokes the Amazon Bedrock agent, passing the user request and ID token. The Amazon Bedrock agent is configured to use Anthropic’s Claude model and to invoke actions using the Claims Agent Helper AWS Lambda function.
  7. The Amazon Bedrock agent uses chain-of-thought prompting and builds the list of API actions to run with the help of the Claims Agent Helper.
  8. The Claims Agent Helper retrieves claim records from the Claims DB and constructs a claims list object. For this example, we provide hard-coded claim records in the Lambda function, and no DynamoDB table is included in the example solution. However, we show the component in the architecture to represent real-life use cases where the data is stored outside the Lambda function.
  9. The Claims Agent Helper retrieves the user’s metadata (that is, their name) from the ID token, builds the Verified Permissions data entities, and makes the Verified Permissions authorization request. This request contains the principal (user and role), action (that is, ListClaim), and resource (Claim). Verified Permissions evaluates the request against the Verified Permissions policies and returns an Allow or Deny decision; a minimal sketch of this request follows the list. Subsequently, the Claims Agent Helper filters the claims based on that decision. Verified Permissions has “default deny” functionality, meaning that in the absence of an explicit allow, the service defaults to an implicit deny. If there is an explicit Deny in the policies involved in the request, Verified Permissions denies the request.
  10. The Claims Amazon Bedrock Agent receives the authorized list of claims, augments the prompt and sends it to the Claude model for completion. The agent returns the completion back to the user.
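
The authorization request in step 9 maps to the Verified Permissions IsAuthorized API. The following is a minimal sketch of how the Claims Agent Helper could build that request with boto3, using the entity types that appear in the Cedar policies later in this post. The policy store ID, the claim record field names (id, owner, region), and the is_claim_action_allowed helper are illustrative assumptions, not the exact code in the sample solution.

import boto3

# Verified Permissions client used by the Claims Agent Helper
avp_client = boto3.client("verifiedpermissions")

POLICY_STORE_ID = "<POLICY_STORE_ID>"  # placeholder for your policy store ID


def is_claim_action_allowed(user_id, role, region, action_id, claim):
    # Ask Verified Permissions whether this user may perform action_id on one claim record
    response = avp_client.is_authorized(
        policyStoreId=POLICY_STORE_ID,
        principal={"entityType": "avp::claim::app::User", "entityId": user_id},
        action={"actionType": "avp::claim::app::Action", "actionId": action_id},
        resource={"entityType": "avp::claim::app::Claim", "entityId": claim["id"]},
        # The entity list carries the attributes and role membership that the Cedar policies reference
        entities={"entityList": [
            {
                "identifier": {"entityType": "avp::claim::app::User", "entityId": user_id},
                "attributes": {"custom": {"record": {"region": {"string": region}}}},
                "parents": [{"entityType": "avp::claim::app::Role", "entityId": role}],
            },
            {
                "identifier": {"entityType": "avp::claim::app::Role", "entityId": role},
            },
            {
                "identifier": {"entityType": "avp::claim::app::Claim", "entityId": claim["id"]},
                "attributes": {
                    "owner": {"entityIdentifier": {"entityType": "avp::claim::app::User",
                                                   "entityId": claim["owner"]}},
                    "region": {"string": claim["region"]},
                },
            },
        ]},
    )

    # Default deny: only an explicit ALLOW authorizes the action
    return response["decision"] == "ALLOW"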

Fine-grained access control flows

Based on the customer’s access control requirements, there are three fine-grained access control flows as depicted in the following system sequence diagrams.

Use case: Claims administrator can list claims across regions

The following diagram shows how the claims administrator can list claims across regions.

Figure 2: Claims administrator 'list claims' allow

The following diagram depicts how the claims administrator’s fine-grained access to an individual claim record is evaluated. In this diagram, notice the deny decision from Verified Permissions. This is because the principal’s role isn’t ClaimsAdjuster.

Figure 3: Claims administrator 'list claims' deny

Use case: Claims adjuster can see claims they own

The following diagram depicts how the claims adjuster’s fine-grained access to retrieve claim details is evaluated. In this diagram, notice the allow decision from Verified Permissions. This is because the principal’s role is ClaimsAdjuster and the resource owner (that is, the claim owner) matches the user principal (that is, user=alice).

Figure 4: Claims adjuster 'list claims' allow

The following diagram depicts how the claims adjuster’s fine-grained access to list open claims is evaluated. In this diagram, notice the allow decision from Verified Permissions. This is because the principal’s role is ClaimsAdjuster and the region on the resource matches the principal’s region. As a result of this region filter in the authorization policy, only open claims for the user’s region are returned. Verified Permissions acts on the principal, action, and an individual resource (that is, a claim record) for each authorization decision. Therefore, the Lambda function needs to iterate through the list of open claims and make an isAuthorized request for each claim record. If this results in a performance issue, you can use the BatchIsAuthorized API and send multiple authorization requests in one API call (a sketch of this batch approach follows the next figure).

Figure 5: Claims adjuster 'list claims' allow or deny
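
The following is a minimal sketch of that batch approach with boto3. The entity construction is the same as for single IsAuthorized calls and is assumed to be passed in; the filter_claims_batch helper and the claim record field names are illustrative, and the service caps how many requests can be included per call, so long claim lists need to be chunked.

import boto3

avp_client = boto3.client("verifiedpermissions")

def filter_claims_batch(policy_store_id, principal, entities, claims):
    # Build one ListClaim authorization request per claim record
    requests = [
        {
            "principal": principal,
            "action": {"actionType": "avp::claim::app::Action", "actionId": "ListClaim"},
            "resource": {"entityType": "avp::claim::app::Claim", "entityId": claim["id"]},
        }
        for claim in claims
    ]

    # Evaluate all requests in a single call instead of one IsAuthorized call per claim
    response = avp_client.batch_is_authorized(
        policyStoreId=policy_store_id,
        entities=entities,  # same entity list used for the single-request flow
        requests=requests,
    )

    # Assumes results come back in the same order as the requests
    return [
        claim
        for claim, result in zip(claims, response["results"])
        if result["decision"] == "ALLOW"
    ]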

Entities design considerations

When designing fine-grained data access controls, it is best practice to start with the entity-relationship diagram (ERD) for the application. For our claims application, the user operates on claim records to retrieve a list of claim records, get the details of an individual claim record, or update the status of a claim record. The following diagram is the ERD for this application modeled in Verified Permissions. With Verified Permissions, you can apply both role-based access control (RBAC) and attribute-based access control (ABAC).

Figure 6: Entity relationship diagram for the application

Here is a brief description of each entity and attributes that will be used for RBAC and ABAC against claim records.

  • Application – The application is a chat-based generative AI application using Amazon Bedrock Agents to understand the questions and retrieve the relevant claims data to assist claims administrators and claims adjusters.
  • Claim – The claim represents an insurance claim record that is stored in the DynamoDB table. The claims system stores claim records and the chatbot application allows users to retrieve and update these records.
  • User – The user interacting with the application, such as a claims administrator or claims adjuster, authenticated through Amazon Cognito.
  • Role – The role represents a user’s access within the application. Here is a list of available roles:
    • Claims administrators – Can list claims across various geographic regions, but they can’t read individual claim records
    • Claims adjusters – Can list claims for their region and read and update their claim records

The roles are managed through Amazon Cognito and Verified Permissions. Cognito maintains a record of which role a user is assigned to and includes this information in the token. Verified Permissions maintains a record of what that role is permitted to do. Fine-grained access controls exist to make sure that users have appropriate permissions for their roles, restricting access to sensitive claim data based on geographic regions and user groups.
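
For illustration, the following sketch shows how application code could read those role and group claims from the decoded ID token payload. It is an assumption-laden example: it skips signature verification (the API Gateway custom authorizer validates the token upstream in this architecture), and the claim names custom:role and custom:region follow the custom attributes described later in this post.

import base64
import json


def get_token_claims(id_token):
    # Decode the ID token payload (middle segment of the JWT) without verifying the signature;
    # the custom authorizer has already validated the token before it reaches this code.
    payload_b64 = id_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return {
        "username": claims.get("cognito:username"),
        "groups": claims.get("cognito:groups", []),
        "role": claims.get("custom:role"),
        "region": claims.get("custom:region"),
    }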

Fine-grained authorization: Policy design

The Actions diagram view lists the types of Principals you have configured in your policy store, the Actions they are eligible to perform, and the Resources they are eligible to perform actions on. The lines between entities indicate your ability to create a policy that allows a principal to take an action on a resource. The following image shows the actions diagram from Verified Permissions for our insurance claims use case. The User principal will have access to the Get, List, and Update actions. The resources are the Application and the Claim entity within the application. This diagram generates the underlying schema that governs the policy definition.

Figure 7: Policy schema from Amazon Verified Permissions

Use case: Claims administrator can list all claim records across regions

A policy is a statement that either permits or forbids a principal to take one or more actions on a resource. Each policy is evaluated independently of other policies. The Verified Permissions policy for this use case is shown in the following code example. In this policy, the principal (that is, user Bob) is assigned the role of claims administrator.

permit (
    principal in avp::claim::app::Role::"ClaimsAdministrator",
    action in [avp::claim::app::Action::"ListClaim"],
    resource
);

Use case: Claims administrator can’t access claim detail record

The Verified Permissions policy for this use case is shown in the following code example. The use of explicit “forbid” policies is a valid practice.

forbid (
    principal in avp::claim::app::Role::"ClaimsAdministrator",
    action in [avp::claim::app::Action::"GetClaim"],
    resource
);

Use case: Claims adjuster can list claims they own in their region

The Verified Permissions policy for this use case is shown in the following code example. In this policy, the principal (that is, user Alice) is assigned the role of claims adjuster and their region is passed as a custom attribute in the ID token.

permit (
    principal in avp::claim::app::Role::"ClaimsAdjuster",
    action in [avp::claim::app::Action::"ListClaim"],
    resource
) when {
    resource has owner &&
    principal == resource.owner &&
    principal has custom &&
    principal.custom has region &&
    principal.custom.region == resource.region
};

Use case: Claims adjuster can retrieve or update a claim they own

The Verified Permissions policy for this use case is shown in the following code example.

permit (
    principal in avp::claim::app::Role::"ClaimsAdjuster",
    action in [
        avp::claim::app::Action::"GetClaim",
        avp::claim::app::Action::"UpdateClaim"
    ],
    resource
) when {
    principal == resource.owner &&
    principal has custom &&
    principal.custom has region &&
    principal.custom.region == resource.region
};

Authentication design considerations

The configuration of Amazon Cognito for this use case followed the security practices included as part of the standard configuration workflow: a strong password policy, multi-factor authentication (MFA), and a client secret. When using Amazon Cognito with Verified Permissions, your application can pass user pool access or identity tokens to Verified Permissions to make the allow or deny decision. Verified Permissions evaluates the user’s request based on the policies it has stored in the policy store.
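
As an illustration of that token-based option, the following is a minimal sketch using the Verified Permissions IsAuthorizedWithToken API, which accepts the Cognito token directly so the principal and its attributes are derived from the token rather than supplied by the caller. This is an alternative to the IsAuthorized flow used elsewhere in this post; the policy store ID, token, and claim ID are placeholders, and the resource attributes referenced by the policies (owner, region) would still need to be supplied through the entities parameter, omitted here for brevity.

import boto3

avp_client = boto3.client("verifiedpermissions")

# Sketch: authorize an action by passing the Cognito ID token directly
response = avp_client.is_authorized_with_token(
    policyStoreId="<POLICY_STORE_ID>",
    identityToken="<ID_TOKEN_FROM_COGNITO>",  # principal and its attributes come from the token
    action={"actionType": "avp::claim::app::Action", "actionId": "GetClaim"},
    resource={"entityType": "avp::claim::app::Claim", "entityId": "<CLAIM_ID>"},
    # entities with the claim's owner and region attributes omitted for brevity
)

print(response["decision"])  # ALLOW or DENY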

For custom attributes, we are using region to restrict which claims a claims adjuster can see, excluding claims made in regions outside the adjuster’s own region. We are also using role as a custom attribute to provide that information in the ID token that is passed to the Amazon Bedrock agent. When the user is registered in the Cognito user pool, these custom attributes will be recorded as part of the sign-up process.
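
As one way to illustrate that registration step, the following sketch provisions a claims adjuster with the custom attributes through an administrator-driven flow in boto3. It assumes the user pool already defines custom:role and custom:region attributes and has a ClaimsAdjuster group; the pool ID, user name, email, and region value are placeholders.

import boto3

cognito_client = boto3.client("cognito-idp")

USER_POOL_ID = "<USER_POOL_ID>"  # placeholder

# Create a claims adjuster with the role and region custom attributes
cognito_client.admin_create_user(
    UserPoolId=USER_POOL_ID,
    Username="alice",
    UserAttributes=[
        {"Name": "email", "Value": "alice@example.com"},
        {"Name": "custom:role", "Value": "ClaimsAdjuster"},
        {"Name": "custom:region", "Value": "us-west"},  # business-defined region, not an AWS Region
    ],
)

# Group membership surfaces in the issued tokens (cognito:groups claim)
cognito_client.admin_add_user_to_group(
    UserPoolId=USER_POOL_ID,
    Username="alice",
    GroupName="ClaimsAdjuster",
)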

Amazon Cognito integrates with Verified Permissions through the Identity sources section in the console. The following screenshot shows that we’ve connected our Cognito user pool to the Amazon Verified Permissions policy store.

Figure 8: Amazon Verified Permissions policy stores by ID
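
If you prefer to script this connection instead of using the console, the following is a sketch of the equivalent API call. The policy store ID, user pool ARN, and app client ID are placeholders, and the principal entity type mapping is an assumption based on the schema used in this post.

import boto3

avp_client = boto3.client("verifiedpermissions")

# Connect the Cognito user pool to the policy store as an identity source
avp_client.create_identity_source(
    policyStoreId="<POLICY_STORE_ID>",
    configuration={
        "cognitoUserPoolConfiguration": {
            "userPoolArn": "arn:aws:cognito-idp:<aws-region>:<account-id>:userpool/<user-pool-id>",
            "clientIds": ["<APP_CLIENT_ID>"],
        }
    },
    # Assumed mapping of authenticated users to the User entity type from the schema
    principalEntityType="avp::claim::app::User",
)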

Fine-grained authorization: Passing ID token to the Amazon Bedrock agent

When the user is authenticated against the Cognito user pool, Cognito returns an ID token and an access token to the client application. The ID token is passed through the API Gateway API and the proxy Lambda function, and then to the agent through sessionAttributes on the invoke_agent call.

import boto3

# Runtime client used to invoke the Amazon Bedrock agent
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')

# Invoke the agent API, forwarding the user's token as a session attribute
response = bedrock_agent_runtime_client.invoke_agent(
    …    # agent ID, agent alias ID, session ID, input text, and other parameters omitted for brevity
    sessionState={
        'sessionAttributes': {
            # ID token (authorization header) received from the client application
            'authorization_header': '<AUTHORIZATION_HEADER>'
        }
    },
)

The header is then retrieved from the Lambda event in the Action Group Lambda function and Verified Permissions is used to verify the user’s access against the desired action.

# Retrieve session attributes from the event and use them to validate the action
sessAttr = event.get("sessionAttributes")
auth, reason = verifyAccess(sessAttr, action_id)

Fine-grained authorization: Integration with Amazon Bedrock Agents

The ID token issued by Cognito contains the user’s identity and custom attributes. This ID token is passed to the Amazon Bedrock agent, and the Agent Helper Lambda retrieves that token from the agent’s session attribute. Then, the Agent Helper Lambda retrieves open claim records from DynamoDB and constructs the Verified Permissions schema entities and makes the isAuthorized API call.

Because Verified Permissions resources operate at the individual record level (that is, a single claim record), you need to iterate over the claims list object and make the isAuthorized API call for the authorization decision and then create the filtered claims list. The filtered claims list is then passed back to the caller. As a result, the claims adjuster will only see claims for their region, while a claims administrator can see claims across all regions.
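
As a rough illustration of that loop, the following sketch reuses the hypothetical is_claim_action_allowed helper from the IsAuthorized example earlier in this post; open_claims stands for the claim records retrieved from the Claims DB.

def filter_authorized_claims(open_claims, user_id, role, region):
    # Keep only the claims that Verified Permissions allows this user to list
    authorized_claims = []
    for claim in open_claims:
        if is_claim_action_allowed(user_id, role, region, "ListClaim", claim):
            authorized_claims.append(claim)
    return authorized_claims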

The Amazon Bedrock agent then uses this filtered claim list to complete the user’s request to list claims. The chat application can only access the claims records that the user is authorized to view, providing the fine-grained access control integrated with the Amazon Bedrock agent workflow.

Getting started

Check out our code to get started developing your secure generative AI application using Amazon Verified Permissions. We provide an end-to-end implementation of the architecture described in this post and a demo UI that you can use to test the permissions of different users. Adapt this example to build generative AI applications for your own use case.

Conclusion

In this post, we discussed the challenges of applying fine-grained access controls to agent workflows in a generative AI application. We shared an application architecture for an example chat-based generative AI application that uses Amazon Bedrock Agents to orchestrate workflows and applies fine-grained access controls using Amazon Verified Permissions. We also discussed how to design fine-grained access permissions through persona-based access control workflows. If you are looking for a scalable and secure way to apply fine-grained permissions to your generative AI agent-based workflows, give this solution a try and leave your feedback.


About the authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his three-year old sheep-a-doodle!

Samantha Wylatowska is a Solutions Architect at AWS. With a background in DevSecOps, her passion lies in guiding organizations towards secure operational efficiency, leveraging the power of automation for a seamless cloud experience. In her free time, she’s usually learning something new through music, literature, or film.

Anil Nadiminti is a Senior Solutions Architect at AWS specializing in empowering organizations to harness cloud computing and AI for digital transformation and innovation. His expertise in architecting scalable solutions and implementing data-driven strategies enables companies to innovate and thrive in today’s rapidly evolving technological landscape.

Michael Daniels is an AI/ML Specialist at AWS. His expertise lies in building and leading AI/ML and generative AI solutions for complex and challenging business problems, which is enhanced by his PhD from the Univ. of Texas and his MSc in computer science specialization in machine learning from the Georgia Institute of Technology. He excels in applying cutting-edge cloud technologies to innovate, inspire, and transform industry-leading organizations while also effectively communicating with stakeholders at any level or scale. In his spare time, you can catch Michael skiing or snowboarding.

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Read More

MAXimum AI: RTX-Accelerated Adobe AI-Powered Features Speed Up Content Creation

MAXimum AI: RTX-Accelerated Adobe AI-Powered Features Speed Up Content Creation

At the Adobe MAX creativity conference this week, Adobe announced updates to its Adobe Creative Cloud products, including Premiere Pro and After Effects, as well as to Substance 3D products and the Adobe video ecosystem.

These apps are accelerated by NVIDIA RTX and GeForce RTX GPUs — in the cloud or running locally on RTX AI PCs and workstations.

One of the most highly anticipated features is Generative Extend in Premiere Pro (beta), which uses generative AI to seamlessly add frames to the beginning or end of a clip. Powered by the Firefly Video Model, it’s designed to be commercially safe and only trained on content Adobe has permission to use, so artists can create with confidence.

Adobe Substance 3D Collection apps offer numerous RTX-accelerated features for 3D content creation, including ray tracing, AI delighting and upscaling, and image-to-material workflows powered by Adobe Firefly.

Substance 3D Viewer, entering open beta at Adobe MAX, is designed to unlock 3D in 2D design workflows by allowing 3D files to be opened, viewed and used across design teams. This will improve interoperability with other RTX-accelerated Adobe apps like Photoshop.

Adobe Firefly integrations have also been added to Substance 3D Collection apps, including Text to Texture, Text to Pattern and Image to Texture tools in Substance 3D Sampler, as well as Generative Background in Substance 3D Stager, to further enhance 3D content creation with generative AI.

The October NVIDIA Studio Driver, designed to optimize creative apps, will be available for download tomorrow. For automatic Studio Driver notifications, as well as easy access to apps like NVIDIA Broadcast, download the NVIDIA app beta.

Video Editing Evolved

Adobe Premiere Pro has transformed video editing workflows over the last four years with features like Auto Reframe and Scene Edit Detection.

The recently launched GPU-accelerated Enhance Speech, AI Audio Category Tagging and Filler Word Detection features allow editors to use AI to intelligently cut and modify video scenes.

The Adobe Firefly Video Model — now available in limited beta at Firefly.Adobe.com — brings generative AI to video, marking the next advancement in video editing. It allows users to create and edit video clips using simple text prompts or images, helping fill in content gaps without having to reshoot, extend or reframe takes. It can also be used to create video clip prototypes as inspiration for future shots.

Topaz Labs has introduced a new video enhancement plug-in for Adobe After Effects that uses AI models to improve video quality, giving users access to enhancement and motion deblur models for sharper, clearer video. Accelerated on GeForce RTX GPUs, these models run nearly 2.5x faster on the GeForce RTX 4090 Laptop GPU compared with the MacBook Pro M3 Max.

Stay tuned for NVIDIA TensorRT enhancements and more Topaz Video AI effects coming to the After Effects plug-in soon.

3D Super Powered

The Substance 3D Collection is revolutionizing the ideation stage of 3D creation with powerful generative AI features in Substance 3D Sampler and Stager.

Sampler’s Text to Texture, Text to Pattern and Image to Texture tools, powered by Adobe Firefly, allow artists to rapidly generate reference images from simple prompts that can be used to create parametric materials.

Stager’s Generative Background feature helps designers explore backgrounds for staging 3D models, using text descriptions to generate images. Stager can then match lighting and camera perspective, allowing designers to explore more variations faster when iterating and mocking up concepts.

Substance 3D Viewer also offers a connected workflow with Photoshop, where 3D models can be placed into Photoshop projects and edits made to the model in Viewer will be automatically sent back to the Photoshop project. GeForce RTX GPU hardware acceleration and ray tracing provide smooth movement in the viewport, producing up to 80% higher frames per second on the GeForce RTX 4060 Laptop GPU compared to the MacBook M3 Pro.

There are also new Firefly-powered features in Substance 3D Viewer, like Text to 3D and 3D Model to Image, that combine text prompts and 3D objects to give artists more control when generating new scenes and variations.

The latest After Effects release features an expanded range of 3D tools that enable creators to embed 3D animations, cast ultra-realistic shadows on 2D objects and isolate effects in 3D space.

After Effects now also has an RTX GPU-powered Advanced 3D Renderer that accelerates the processing-intensive and time-consuming task of applying HDRI lighting — lowering creative barriers to entry while improving content realism. Rendering can be done 30% faster on a GeForce RTX 4090 GPU over the previous generation.

Pairing Substance 3D with After Effects’ native and fast 3D integration allows artists to significantly boost the visual quality of 3D in After Effects with precision texturing and access to more than 20,000 parametric 3D materials, IBL environment lights and 3D models.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More