Enriching real-time news streams with the Refinitiv Data Library, AWS services, and Amazon SageMaker

This post is co-authored by Marios Skevofylakas, Jason Ramchandani, and Haykaz Aramyan from Refinitiv, an LSEG Business.

Financial service providers often need to identify relevant news, analyze it, extract insights, and take actions in real time, like trading specific instruments (such as commodities, shares, funds) based on additional information or context of the news item. One such additional piece of information (which we use as an example in this post) is the sentiment of the news.

Refinitiv Data (RD) Libraries provide a comprehensive set of interfaces for uniform access to the Refinitiv Data Catalogue. The library offers multiple layers of abstraction providing different styles and programming techniques suitable for all developers, from low-latency, real-time access to batch ingestions of Refinitiv data.

In this post, we present a prototype AWS architecture that ingests our news feeds using RD Libraries and enhances them with machine learning (ML) model predictions using Amazon SageMaker, a fully managed ML service from AWS.

In an effort to design a modular architecture that could be used in a variety of use cases, like sentiment analysis, named entity recognition, and more, regardless of the ML model used for enhancement, we decided to focus on the real-time space. The reason for this decision is that real-time use cases are generally more complex and that the same architecture can also be used, with minimal adjustments, for batch inference. In our use case, we implement an architecture that ingests our real-time news feed, calculates sentiment on each news headline using ML, and re-serves the AI enhanced feed through a publisher/subscriber architecture.

Moreover, to present a comprehensive and reusable way to productionize ML models by adopting MLOps practices, we introduce the concept of infrastructure as code (IaC) during the entire MLOps lifecycle of the prototype. By using Terraform and a single entry point configurable script, we are able to instantiate the entire infrastructure, in production mode, on AWS in just a few minutes.

In this solution, we don’t address the MLOps aspect of the development, training, and deployment of the individual models. If you’re interested in learning more on this, refer to MLOps foundation roadmap for enterprises with Amazon SageMaker, which explains in detail a framework for model building, training, and deployment following best practices.

Solution overview

In this prototype, we follow a fully automated provisioning methodology in accordance with IaC best practices. IaC is the process of provisioning resources programmatically using automated scripts rather than using interactive configuration tools. Resources can be both hardware and needed software. In our case, we use Terraform to accomplish the implementation of a single configurable entry point that can automatically spin up the entire infrastructure we need, including security and access policies, as well as automated monitoring. With this single entry point that triggers a collection of Terraform scripts, one per service or resource entity, we can fully automate the lifecycle of all or parts of the components of the architecture, allowing us to implement granular control both on the DevOps as well as the MLOps side. After Terraform is correctly installed and integrated with AWS, we can replicate most operations that can be done on the AWS service dashboards.

The following diagram illustrates our solution architecture.

The architecture consists of three stages: ingestion, enrichment, and publishing. During the first stage, the real-time feeds are ingested on an Amazon Elastic Compute Cloud (Amazon EC2) instance that is created through a Refinitiv Data Library-ready AMI. The instance also connects to a data stream via Amazon Kinesis Data Streams, which triggers an AWS Lambda function.

In the second stage, the Lambda function triggered by Kinesis Data Streams sends the news headlines to a SageMaker FinBERT endpoint, which returns the calculated sentiment for the news item. This calculated sentiment is the enrichment: the Lambda function wraps the news item with it and stores the result in an Amazon DynamoDB table.

In the third stage of the architecture, a DynamoDB stream triggers a Lambda function on new item inserts. This function is integrated with an Amazon MQ server running RabbitMQ, which re-serves the AI-enhanced stream.

We chose this three-stage design, rather than having the first Lambda layer communicate directly with the Amazon MQ server or packing more functionality into the EC2 instance, to enable exploration of more complex, less tightly coupled AI architectures in the future.

Building and deploying the prototype

We present this prototype in a series of three detailed blueprints. In each blueprint, and for every service used, you will find overviews and relevant information on its technical implementation, as well as Terraform scripts that allow you to automatically start, configure, and integrate the service with the rest of the structure. At the end of each blueprint, you will find instructions on how to make sure that everything is working as expected up to that stage. The three blueprints are described later in this post.

To start the implementation of this prototype, we suggest creating a new Python environment dedicated to it and installing the necessary packages and tools separately from other environments you may have. To do so, create and activate the new environment in Anaconda using the following commands:

conda create --name rd_news_aws_terraform python=3.7
conda activate rd_news_aws_terraform

We’re now ready to install the AWS Command Line Interface (AWS CLI) toolset that will allow us to build all the necessary programmatic interactions in and between AWS services:

pip install awscli

Now that the AWS CLI is installed, we need to install Terraform. HashiCorp provides Terraform with a binary installer, which you can download and install.

After you have both tools installed, ensure that they properly work using the following commands:

terraform -help
aws --version

You’re now ready to follow the detailed blueprints on each of the three stages of the implementation.

Blueprint I: Real-time news ingestion using Amazon EC2 and Kinesis Data Streams

This blueprint represents the initial stages of the architecture that allow us to ingest the real-time news feeds. It consists of the following components:

  • Amazon EC2 preparing your instance for RD News ingestion – This section sets up an EC2 instance so that it can connect to the RD Libraries API and the real-time stream. We also show how to save an image of the created instance to ensure its reusability and scalability.
  • Real-time news ingestion from Amazon EC2 – A detailed implementation of the configurations needed to enable Amazon EC2 to connect to the RD Libraries, as well as the scripts to start the ingestion.
  • Creating and launching Amazon EC2 from the AMI – Launch a new instance from the saved AMI and transfer the ingestion files to it, all automatically using Terraform.
  • Creating a Kinesis data stream – This section provides an overview of Kinesis Data Streams and how to set up a stream on AWS.
  • Connecting and pushing data to Kinesis – Once the ingestion code is working, we need to connect it and send data to a Kinesis stream (a minimal producer sketch follows this list).
  • Testing the prototype so far – We use Amazon CloudWatch and command line tools to verify that the prototype is working up to this point and that we can continue to the next blueprint. The log of ingested data should look like the following screenshot.
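To give a flavor of the Kinesis step, here is a minimal producer sketch using boto3. The stream name, region, and record fields are hypothetical placeholders; the blueprint's actual ingestion script and payload schema may differ.

import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

def push_headline(headline_text, story_id):
    # Wrap the raw RD news headline in a small JSON record
    record = {"storyId": story_id, "headline": headline_text}
    kinesis.put_record(
        StreamName="rd-news-stream",            # hypothetical stream name
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=story_id,                  # spreads records across shards
    )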

Blueprint II: Real-time serverless AI news sentiment analysis using Kinesis Data Streams, Lambda, and SageMaker

In this second blueprint, we focus on the main part of the architecture: the Lambda function that ingests and analyzes the news item stream, attaches the AI inference to it, and stores it for further use. It includes the following components:

  • Lambda – Define a Terraform Lambda configuration allowing it to connect to a SageMaker endpoint.
  • Amazon S3 – To implement Lambda, we need to upload the appropriate code to Amazon Simple Storage Service (Amazon S3) and allow the Lambda function to ingest it in its environment. This section describes how we can use Terraform to accomplish that.
  • Implementing the Lambda function: Step 1, Handling the Kinesis event – In this section, we start building the Lambda function. Here, we build the Kinesis data stream response handler part only.
  • SageMaker – In this prototype, we use a pre-trained Hugging Face model that we store into a SageMaker endpoint. Here, we present how this can be achieved using Terraform scripts and how the appropriate integrations take place to allow SageMaker endpoints and Lambda functions work together.
    • At this point, you can instead use any other model that you have developed and deployed behind a SageMaker endpoint. Such a model could provide a different enhancement to the original news data, based on your needs. Optionally, this can be extrapolated to multiple models for multiple enhancements if such exist. Thanks to the rest of the architecture, any such models will enrich your data sources in real time.
  • Building the Lambda function: Step 2, Invoking the SageMaker endpoint – In this section, we build up our original Lambda function by adding the SageMaker block, which obtains a sentiment-enhanced news headline by invoking the SageMaker endpoint.
  • DynamoDB – Finally, when the AI inference is in the memory of the Lambda function, it re-bundles the item and sends it to a DynamoDB table for storage. Here, we discuss both the appropriate Python code needed to accomplish that, as well as the necessary Terraform scripts that enable these interactions.
  • Building the Lambda function: Step 3, Pushing enhanced data to DynamoDB – Here, we continue building up our Lambda function by adding the last part, which creates an entry in the DynamoDB table (a condensed sketch of the complete handler follows this list).
  • Testing the prototype so far – We can navigate to the DynamoDB table on the DynamoDB console to verify that our enhancements are appearing in the table.
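To show how the pieces of this blueprint fit together, the following is a condensed, hypothetical sketch of the finished Lambda handler. The endpoint name, table name, and payload fields are placeholders; the full implementation in the blueprint differs in its configuration and error handling.

import base64
import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("rd-news-enriched")       # hypothetical table name

def lambda_handler(event, context):
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded
        news_item = json.loads(base64.b64decode(record["kinesis"]["data"]))

        # Ask the FinBERT endpoint for the sentiment of the headline
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName="finbert-endpoint",     # hypothetical endpoint name
            ContentType="application/json",
            Body=json.dumps({"inputs": news_item["headline"]}),
        )
        sentiment = json.loads(response["Body"].read())

        # Store the enriched item; the DynamoDB stream drives the next stage.
        # Serialized as a string to sidestep DynamoDB's float restrictions.
        news_item["sentiment"] = json.dumps(sentiment)
        table.put_item(Item=news_item)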

Blueprint III: Real-time streaming using DynamoDB Streams, Lambda, and Amazon MQ

This third blueprint finalizes the prototype. It focuses on redistributing the newly created, AI-enhanced data item to a RabbitMQ server in Amazon MQ, allowing consumers to connect and retrieve the enhanced news items in real time. It includes the following components:

  • DynamoDB Streams – When the enhanced news item is in DynamoDB, we set up an event trigger that can then be captured by the appropriate Lambda function.
  • Writing the Lambda producer – This Lambda function captures the event and acts as a producer of the RabbitMQ stream. This new function introduces the concept of Lambda layers, as it uses Python libraries to implement the producer functionality (see the sketch after this list).
  • Amazon MQ and RabbitMQ consumers – The final step of the prototype is setting up the RabbitMQ service and implementing an example consumer that will connect to the message stream and receive the AI enhanced news items.
  • Final test of the prototype – We use an end-to-end process to verify that the prototype is fully working, from ingestion to re-serving and consuming the new AI-enhanced stream.
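As an illustration of the producer side, the following sketch publishes an enriched item to a RabbitMQ queue on Amazon MQ using the pika library (which would be packaged as a Lambda layer). The broker URL, credentials, and queue name are placeholders, and the blueprint's actual producer may be structured differently.

import json
import ssl
import pika

# Amazon MQ for RabbitMQ accepts TLS connections (amqps) on port 5671
params = pika.URLParameters("amqps://user:password@your-broker-id.mq.us-east-1.amazonaws.com:5671")
params.ssl_options = pika.SSLOptions(ssl.create_default_context())

def publish_enriched_item(item):
    connection = pika.BlockingConnection(params)
    channel = connection.channel()
    channel.queue_declare(queue="enriched-news", durable=True)  # hypothetical queue name
    channel.basic_publish(
        exchange="",                  # default exchange routes by queue name
        routing_key="enriched-news",
        body=json.dumps(item),
    )
    connection.close()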

At this stage, you can validate that everything has been working by navigating to the RabbitMQ dashboard, as shown in the following screenshot.

In the final blueprint, you also find a detailed test vector to make sure that the entire architecture is behaving as planned.

Conclusion

In this post, we shared a solution using ML on the cloud with AWS services like SageMaker (ML), Lambda (serverless), and Kinesis Data Streams (streaming) to enrich streaming news data provided by Refinitiv Data Libraries. The solution adds a sentiment score to news items in real time and scales the infrastructure using code.

The benefit of this modular architecture is that you can reuse it with your own model to perform other types of data augmentation, in a serverless, scalable, and cost-efficient way that can be applied on top of Refinitiv Data Library. This can add value for trading/investment/risk management workflows.

If you have any comments or questions, please leave them in the comments section.

About the Authors

Marios Skevofylakas comes from a financial services, investment banking and consulting technology background. He holds an engineering Ph.D. in Artificial Intelligence and an M.Sc. in Machine Vision. Throughout his career, he has participated in numerous multidisciplinary AI and DLT projects. He is currently a Developer Advocate with Refinitiv, an LSEG business, focusing on AI and Quantum applications in financial services.

Jason Ramchandani has worked at Refinitiv, an LSEG Business, for 8 years as Lead Developer Advocate helping to build their Developer Community. Previously he has worked in financial markets for over 15 years with a quant background in the equity/equity-linked space at Okasan Securities, Sakura Finance and Jefferies LLC. His alma mater is UCL.

Haykaz Aramyan comes from a finance and technology background. He holds a Ph.D. in Finance and an M.Sc. in Finance, Technology and Policy. Over 10 years of professional experience, Haykaz has worked on several multidisciplinary projects involving pension funds, VC funds, and technology startups. He is currently a Developer Advocate with Refinitiv, an LSEG Business, focusing on AI applications in financial services.

Georgios Schinas is a Senior Specialist Solutions Architect for AI/ML in the EMEA region. He is based in London and works closely with customers in UK and Ireland. Georgios helps customers design and deploy machine learning applications in production on AWS with a particular interest in MLOps practices and enabling customers to perform machine learning at scale. In his spare time, he enjoys traveling, cooking and spending time with friends and family.

Muthuvelan Swaminathan is an Enterprise Solutions Architect based out of New York. He works with enterprise customers providing architectural guidance in building resilient, cost-effective, innovative solutions that address their business needs and help them execute at scale using AWS products and services.

Mayur Udernani leads the AWS AI & ML business with commercial enterprises in the UK & Ireland. In his role, Mayur spends the majority of his time with customers and partners to help create impactful solutions that solve the most pressing needs of a customer or a wider industry, leveraging AWS Cloud, AI & ML services. Mayur lives in the London area. He has an MBA from the Indian Institute of Management and a Bachelor's in Computer Engineering from Mumbai University.

Best practices for load testing Amazon SageMaker real-time inference endpoints

Amazon SageMaker is a fully managed machine learning (ML) service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. It provides an integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you don’t have to manage servers. It also provides common ML algorithms that are optimized to run efficiently against extremely large data in a distributed environment.

SageMaker real-time inference is ideal for workloads that have real-time, interactive, low-latency requirements. With SageMaker real-time inference, you can deploy REST endpoints that are backed by a specific instance type with a certain amount of compute and memory. Deploying a SageMaker real-time endpoint is only the first step in the path to production for many customers. We want to be able to maximize the performance of the endpoint to achieve a target transactions per second (TPS) while adhering to latency requirements. A large part of performance optimization for inference is making sure you select the proper instance type and count to back an endpoint.

This post describes best practices for load testing a SageMaker endpoint to find the right configuration for the number and size of instances. This can help us understand the minimum provisioned instance count needed to meet our latency and TPS requirements. From there, we dive into how you can track and understand the metrics and performance of the SageMaker endpoint using Amazon CloudWatch metrics.

We first benchmark the performance of our model on a single instance to identify the TPS it can handle per our acceptable latency requirements. Then we extrapolate the findings to decide on the number of instances we need in order to handle our production traffic. Finally, we simulate production-level traffic and set up load tests for a real-time SageMaker endpoint to confirm our endpoint can handle the production-level load. The entire set of code for the example is available in the following GitHub repository.

Overview of solution

For this post, we deploy a pre-trained Hugging Face DistilBERT model from the Hugging Face Hub. This model can perform a number of tasks, but we send a payload specifically for sentiment analysis and text classification. With this sample payload, we strive to achieve 1000 TPS.

Deploy a real-time endpoint

This post assumes you are familiar with how to deploy a model. Refer to Create your endpoint and deploy your model to understand the internals behind hosting an endpoint. For now, we can quickly point to this model in the Hugging Face Hub and deploy a real-time endpoint with the following code snippet:

# HuggingFaceModel comes from the SageMaker Python SDK
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with SageMaker permissions (this call assumes you are running inside SageMaker)
role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'distilbert-base-uncased',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.12xlarge' # ec2 instance type
)

Let’s test our endpoint quickly with the sample payload that we want to use for load testing:


import boto3
import json

client = boto3.client('sagemaker-runtime')
content_type = "application/json"
request_body = {'inputs': "I am super happy right now."}
data = json.loads(json.dumps(request_body))
payload = json.dumps(data)
response = client.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType=content_type,
    Body=payload)
result = response['Body'].read()
result

Note that we're backing the endpoint with a single Amazon Elastic Compute Cloud (Amazon EC2) instance of type ml.m5.12xlarge, which contains 48 vCPUs and 192 GiB of memory. The number of vCPUs is a good indication of the concurrency the instance can handle. In general, it's recommended to test different instance types to make sure we have an instance whose resources are properly utilized. To see a full list of SageMaker instances and their corresponding compute power for real-time inference, refer to Amazon SageMaker Pricing.

Metrics to track

Before we can get into load testing, it’s essential to understand what metrics to track to understand the performance breakdown of your SageMaker endpoint. CloudWatch is the primary logging tool that SageMaker uses to help you understand the different metrics that describe your endpoint’s performance. You can utilize CloudWatch logs to debug your endpoint invocations; all logging and print statements you have in your inference code are captured here. For more information, refer to How Amazon CloudWatch works.

There are two different types of metrics CloudWatch covers for SageMaker: instance-level and invocation metrics.

Instance-level metrics

The first set of parameters to consider is the instance-level metrics: CPUUtilization and MemoryUtilization (for GPU-based instances, GPUUtilization). For CPUUtilization, you may see percentages above 100% at first in CloudWatch. It’s important to realize for CPUUtilization, the sum of all the CPU cores is being displayed. For example, if the instance behind your endpoint contains 4 vCPUs, this means the range of utilization is up to 400%. MemoryUtilization, on the other hand, is in the range of 0–100%.

Specifically, you can use CPUUtilization to get a deeper understanding of whether you have sufficient or even an excess amount of hardware. If you have an under-utilized instance (less than 30%), you could potentially scale down your instance type. Conversely, if you are around 80–90% utilization, it's better to pick an instance with greater compute or memory. From our tests, we suggest around 60–70% utilization of your hardware.
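If you want to pull these numbers programmatically rather than from the console, the following is a minimal sketch using boto3. The endpoint and variant names are placeholders; instance-level metrics are published under the /aws/sagemaker/Endpoints namespace.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

response = cloudwatch.get_metric_statistics(
    Namespace="/aws/sagemaker/Endpoints",    # instance-level metrics
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(minutes=30),
    EndTime=datetime.utcnow(),
    Period=60,                               # 1-minute buckets
    Statistics=["Average", "Maximum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])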

Invocation metrics

As suggested by the name, invocation metrics are where we can track the end-to-end latency of any invocations of your endpoint. You can use the invocation metrics to capture error counts and the types of errors (5xx, 4xx, and so on) that your endpoint may be experiencing. More importantly, you can understand the latency breakdown of your endpoint calls. A lot of this can be captured with the ModelLatency and OverheadLatency metrics, as illustrated in the following diagram.

Latencies

The ModelLatency metric captures the time that inference takes within the model container behind a SageMaker endpoint. Note that the model container also includes any custom inference code or scripts that you have passed for inference. This metric is reported in microseconds as an invocation metric, and generally you can graph a percentile in CloudWatch (p99, p90, and so on) to see if you're meeting your target latency. Several factors can impact model and container latency, such as the following:

  • Custom inference script – Whether you have implemented your own container or used a SageMaker-based container with custom inference handlers, it’s best practice to profile your script to catch any operations that are specifically adding a lot of time to your latency.
  • Communication protocol – Consider REST vs. gRPC connections to the model server within the model container.
  • Model framework optimizations – This is framework specific, for example with TensorFlow, there are a number of environment variables you can tune that are TF Serving specific. Make sure to check what container you’re using and if there are any framework-specific optimizations you can add within the script or as environment variables to inject in the container.

OverheadLatency is measured from the time that SageMaker receives the request until it returns a response to the client, minus the model latency. This part is largely outside of your control and falls under the time taken by SageMaker overheads.

End-to-end latency as a whole depends on a variety of factors and isn't necessarily the sum of ModelLatency plus OverheadLatency. For example, if your client is making the InvokeEndpoint API call over the internet, then from the client's perspective, the end-to-end latency would be internet latency + ModelLatency + OverheadLatency. As such, when load testing your endpoint, it's recommended to focus on the endpoint metrics (ModelLatency, OverheadLatency, and InvocationsPerInstance) to accurately benchmark the SageMaker endpoint itself. Any issues related to end-to-end latency can then be isolated separately.
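As a rough sketch, you can retrieve a latency percentile such as p99 for ModelLatency with CloudWatch's ExtendedStatistics parameter. The endpoint and variant names are placeholders, and remember that the values are reported in microseconds.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",               # invocation metrics
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},  # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    ExtendedStatistics=["p90", "p99"],
)
for point in stats["Datapoints"]:
    # Convert microseconds to milliseconds for readability
    print(point["Timestamp"], {k: v / 1000 for k, v in point["ExtendedStatistics"].items()})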

A few questions to consider for end-to-end latency:

  • Where is the client that is invoking your endpoint?
  • Are there any intermediary layers between your client and the SageMaker runtime?

Auto scaling

We don’t cover auto scaling in this post specifically, but it’s an important consideration in order to provision the correct number of instances based on the workload. Depending on your traffic patterns, you can attach an auto scaling policy to your SageMaker endpoint. There are different scaling options, such as TargetTrackingScaling, SimpleScaling, and StepScaling. This allows your endpoint to scale in and out automatically based on your traffic pattern.

A common option is target tracking, where you can specify a CloudWatch metric or custom metric that you have defined and scale out based on that. A frequent utilization of auto scaling is tracking the InvocationsPerInstance metric. After you have identified a bottleneck at a certain TPS, you can often use that as a metric to scale out to a greater number of instances to be able to handle peak loads of traffic. To get a deeper breakdown of auto scaling SageMaker endpoints, refer to Configuring autoscaling inference endpoints in Amazon SageMaker.
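The following is a hedged sketch of attaching a target tracking policy on InvocationsPerInstance with the Application Auto Scaling API. The endpoint name, variant name, capacity limits, and target value are assumptions you would replace with numbers from your own benchmark; note that the predefined metric counts invocations per instance per minute.

import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder endpoint/variant

# Register the endpoint variant as a scalable target
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=5,
)

# Scale out when invocations per instance exceed the target
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 200.0,  # assumed invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)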

Load testing

Although we utilize Locust to display how we can load test at scale, if you’re trying to right size the instance behind your endpoint, SageMaker Inference Recommender is a more efficient option. With third-party load testing tools, you have to manually deploy endpoints across different instances. With Inference Recommender, you can simply pass an array of the instance types you want to load test against, and SageMaker will spin up jobs for each of these instances.

Locust

For this example, we use Locust, an open-source load testing tool that you can implement using Python. Locust is similar to many other open-source load testing tools, but has a few specific benefits:

  • Easy to set up – As we demonstrate in this post, we'll pass a simple Python script that can easily be refactored for your specific endpoint and payload (see the minimal sketch after this list).
  • Distributed and scalable – Locust is event-based and utilizes gevent under the hood. This is very useful for testing highly concurrent workloads and simulating thousands of concurrent users. You can achieve high TPS with a single process running Locust, but it also has a distributed load generation feature that enables you to scale out to multiple processes and client machines, as we will explore in this post.
  • Locust metrics and UI – Locust also captures end-to-end latency as a metric. This can help supplement your CloudWatch metrics to paint a full picture of your tests. This is all captured in the Locust UI, where you can track concurrent users, workers, and more.
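For illustration only, a bare-bones HTTP locustfile looks like the following. The post's actual script (shown later) uses a custom Boto3 client rather than Locust's built-in HTTP client, but the amount of boilerplate is similarly small.

from locust import HttpUser, task, between

class SimpleUser(HttpUser):
    wait_time = between(0.5, 2)  # seconds each simulated user waits between tasks

    @task
    def index(self):
        # Each task issues one request against the host passed via -H/--host
        self.client.get("/")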

To further understand Locust, check out their documentation.

Amazon EC2 setup

You can set up Locust in whatever environment is compatible for you. For this post, we set up an EC2 instance and install Locust there to conduct our tests. We use a c5.18xlarge EC2 instance. The client-side compute power is also something to consider. When you run out of compute power on the client side, this is often not captured and is mistaken for a SageMaker endpoint error. It's important to place your client in a location with sufficient compute power to handle the load that you are testing at. For our EC2 instance, we use an Ubuntu Deep Learning AMI, but you can use any AMI as long as you can properly set up Locust on the machine. To understand how to launch and connect to your EC2 instance, refer to the tutorial Get started with Amazon EC2 Linux instances.

The Locust UI is accessible via port 8089. We can open this by adjusting our inbound security group rules for the EC2 Instance. We also open up port 22 so we can SSH into the EC2 instance. Consider scoping the source down to the specific IP address you are accessing the EC2 instance from.

Security Groups

After you’re connected to your EC2 instance, we set up a Python virtual environment and install the open-source Locust API via the CLI:

virtualenv venv #venv is the virtual environment name, you can change as you desire
source venv/bin/activate #activate virtual environment
pip install locust

We’re now ready to work with Locust for load testing our endpoint.

Locust testing

All Locust load tests are conducted based off of a Locust file that you provide. This Locust file defines a task for the load test; this is where we define our Boto3 invoke_endpoint API call. See the following code:

# Config is imported from botocore; this excerpt sits inside the client class in locust_script.py
from botocore.config import Config

config = Config(
    retries={
        'max_attempts': 0,  # disable Boto3 retries so every failure is surfaced to Locust
        'mode': 'standard'
    }
)

self.sagemaker_client = boto3.client('sagemaker-runtime', config=config)
self.endpoint_name = host.split('/')[-1]
self.region = region
self.content_type = content_type
self.payload = payload

In the preceding code, adjust your invoke endpoint call parameters to suit your specific model invocation. We call the InvokeEndpoint API with the following piece of code in the Locust file; this is our load test run point. The Locust file we're using is locust_script.py.

def send(self):

    request_meta = {
        "request_type": "InvokeEndpoint",
        "name": "SageMaker",
        "start_time": time.time(),
        "response_length": 0,
        "response": None,
        "context": {},
        "exception": None,
    }
    start_perf_counter = time.perf_counter()

    try:
        response = self.sagemaker_client.invoke_endpoint(
            EndpointName=self.endpoint_name,
            Body=self.payload,
            ContentType=self.content_type
        )
        response_body = response["Body"].read()

Now that we have our Locust script ready, we want to run distributed Locust tests to stress test our single instance to find out how much traffic our instance can handle.

Locust distributed mode is a little more nuanced than a single-process Locust test. In distributed mode, we have one primary and multiple workers. The primary instructs the workers on how to spawn and control the concurrent users that are sending requests. In our distributed.sh script, we see by default that 240 users will be distributed across the 60 workers. Note that the --headless flag in the Locust CLI removes the UI feature of Locust.

#replace with your endpoint name in format https://<<endpoint-name>>
export ENDPOINT_NAME=https://$1

export REGION=us-east-1
export CONTENT_TYPE=application/json
export PAYLOAD='{"inputs": "I am super happy right now."}'
export USERS=240
export WORKERS=60
export RUN_TIME=1m
export LOCUST_UI=false # Use Locust UI

.
.
.

locust -f $SCRIPT -H $ENDPOINT_NAME --master --expect-workers $WORKERS -u $USERS -t $RUN_TIME --csv results &
.
.
.

for (( c=1; c<=$WORKERS; c++ ))
do
locust -f $SCRIPT -H $ENDPOINT_NAME --worker --master-host=localhost &
done

./distributed.sh huggingface-pytorch-inference-2022-10-04-02-46-44-677 #to execute Distributed Locust test

We first run the distributed test on a single instance backing the endpoint. The idea here is we want to fully maximize a single instance to understand the instance count we need to achieve our target TPS while staying within our latency requirements. Note that if you want to access the UI, change the Locust_UI environment variable to True and take the public IP of your EC2 instance and map port 8089 to the URL.

The following screenshot shows our CloudWatch metrics.

CloudWatch Metrics

Eventually, we notice that although we initially achieve a TPS of 200, we start noticing 5xx errors in our EC2 client-side logs, as shown in the following screenshot.

We can also verify this by looking at our instance-level metrics, specifically CPUUtilization.

CloudWatch Metrics

Here we notice CPUUtilization at nearly 4,800%. Our ml.m5.12xlarge instance has 48 vCPUs (48 * 100 = 4,800). This is saturating the entire instance, which also helps explain our 5xx errors. We also see an increase in ModelLatency.

It seems our single instance is being overwhelmed and doesn't have the compute to sustain a load beyond the roughly 200 TPS that we are observing. Our target TPS is 1000, so let's try to increase our instance count to 5. This might have to be even higher in a production setting, because we were already observing errors at 200 TPS after a certain point.

Endpoint settings

We see in both the Locust UI and CloudWatch logs that we have a TPS of nearly 1000 with five instances backing the endpoint.

Locust

CloudWatch Metrics

If you start experiencing errors even with this hardware setup, make sure to monitor CPUUtilization to understand the full picture behind your endpoint hosting. It's crucial to understand your hardware utilization to see if you need to scale up or even down. Sometimes container-level problems lead to 5xx errors, but if CPUUtilization is low, it indicates that it's not your hardware but something at the container or model level that might be leading to these issues (for example, the proper environment variable for the number of workers isn't set). On the other hand, if you notice your instance is getting fully saturated, it's a sign that you need to either increase the current instance fleet or try out a larger instance with a smaller fleet.

Although we increased the instance count to 5 to handle 1,000 TPS, we can see that the ModelLatency metric is still high. This is due to the instances being saturated. In general, we suggest aiming to utilize the instance's resources between 60–70%.

Clean up

After load testing, make sure to clean up any resources you won't use, via the SageMaker console or through the delete_endpoint Boto3 API call. In addition, make sure to stop your EC2 instance, or whatever client setup you have, so you don't incur further charges there as well.
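For reference, a minimal cleanup sketch with Boto3 follows; the endpoint name is a placeholder, and deleting the endpoint config and model as well avoids leaving orphaned resources behind.

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "my-endpoint"  # placeholder

# Look up the attached endpoint config and model before deleting the endpoint
config_name = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
model_name = sm.describe_endpoint_config(EndpointConfigName=config_name)["ProductionVariants"][0]["ModelName"]

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
sm.delete_model(ModelName=model_name)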

Summary

In this post, we described how you can load test your SageMaker real-time endpoint. We also discussed what metrics you should be evaluating when load testing your endpoint to understand your performance breakdown. Make sure to check out SageMaker Inference Recommender to further understand instance right-sizing and more performance optimization techniques.


About the Authors

Marc Karp is a ML Architect with the SageMaker Service team. He focuses on helping customers design, deploy and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Ram Vegiraju is a ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Get smarter search results with the Amazon Kendra Intelligent Ranking and OpenSearch plugin

If you've had the opportunity to build a search application for unstructured data (wikis, informational websites, self-service help pages, internal documentation, and so on) using open source or commercial-off-the-shelf search engines, then you're probably familiar with the inherent accuracy challenges involved in getting relevant search results. The intended meaning of both query and document can be lost because the search is reduced to matching component keywords and terms. Consequently, while you get results that may contain the right words, they aren't always relevant to the user. You need your search engine to be smarter so it can rank documents based on matching the meaning or semantics of the content to the intention of the user's query.

Amazon Kendra provides a fully managed intelligent search service that automates document ingestion and provides highly accurate search and FAQ results based on content across many data sources. If you haven’t migrated to Amazon Kendra and would like to improve the quality of search results, you can use Amazon Kendra Intelligent Ranking for self-managed OpenSearch on your existing search solution.

We're delighted to introduce the new Amazon Kendra Intelligent Ranking for self-managed OpenSearch, and its companion plugin for the OpenSearch search engine! Now you can easily add intelligent ranking to your OpenSearch document queries, with no need to migrate, duplicate your OpenSearch indexes, or rewrite your applications. The difference between Amazon Kendra Intelligent Ranking for self-managed OpenSearch and the fully managed Amazon Kendra service is that while the former provides powerful semantic re-ranking for search results, the latter provides additional search accuracy improvements and functionality such as incremental learning, question answering, FAQ matching, and built-in connectors. For more information about the fully managed service, please visit the Amazon Kendra service page.

With Amazon Kendra Intelligent Ranking for self-managed OpenSearch, previous results like this:

Query: What is the address of the White House?

Hit1 (best): The president delivered an address to the nation from the White House today.

Hit2: The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500

become like this: 

Query: What is the address of the White House?

Hit1 (best): The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500

Hit2: The president delivered an address to the nation from the White House today.

In this post, we show you how to get started with Amazon Kendra Intelligent Ranking for self-managed OpenSearch, and we provide a few examples that demonstrate the power and value of this feature.

Components of Amazon Kendra Intelligent Ranking for self-managed OpenSearch

Prerequisites

For this tutorial, you'll need a bash terminal on Linux, Mac, or Windows Subsystem for Linux, and an AWS account. Hint: consider using an AWS Cloud9 instance or an Amazon Elastic Compute Cloud (Amazon EC2) instance.

You will:

  • Install Docker, if it’s not already installed on your system.
  • Install the latest AWS Command Line Interface (AWS CLI), if it’s not already installed.
  • Create and start OpenSearch containers, with the Amazon Kendra Intelligent Ranking plugin enabled.
  • Create test indexes, and load some sample documents.
  • Run some queries, with and without intelligent ranking, and be suitably impressed by the differences!

Install Docker

If Docker (i.e., docker and docker-compose) is not already installed in your environment, then install it. See Get Docker for directions.

Install the AWS CLI

If you don’t already have the latest version of the AWS CLI installed, then install and configure it now (see AWS CLI Getting Started). Your default AWS user credentials must have administrator access, or ask your AWS administrator to add the following policy to your user permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "kendra-ranking:*",
            "Resource": "*"
        }
    ]
}

Create and start OpenSearch using the Quickstart script

Download the search_processing_kendra_quickstart.sh script:

wget https://raw.githubusercontent.com/msfroh/search-relevance/quickstart-script/helpers/search_processing_kendra_quickstart.sh

Make it executable:

chmod +x ./search_processing_kendra_quickstart.sh

The quickstart script:

  1. Creates an Amazon Kendra Intelligent Ranking Rescore Execution Plan in your AWS account.
  2. Creates Docker containers for OpenSearch and its Dashboards.
  3. Configures OpenSearch to use the Kendra Intelligent Ranking Service.
  4. Starts the OpenSearch services.
  5. Provides helpful guidance for using the service.

Use the --help option to see the command line options:

./search_processing_kendra_quickstart.sh --help

Now, execute the script to automate the Amazon Kendra and OpenSearch setup:

./search_processing_kendra_quickstart.sh --create-execution-plan

That’s it! OpenSearch and OpenSearch Dashboard containers are now up and running.

Read the output message from the quickstart script, and make a note of the directory where you can run the handy docker-compose commands, and the cleanup_resources.sh script.

Try a test query to validate you can connect to your OpenSearch container:

curl -XGET --insecure -u 'admin:admin' 'https://localhost:9200'

Note that if you get the error curl(35):OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:9200, it means that OpenSearch is still coming up. Please wait for a couple of minutes for OpenSearch to be ready and try again.

Create test indexes and load sample documents

The script below is used to create an index and load sample documents. Save it on your computer as bulk_post.sh:

#!/bin/bash
curl -u admin:admin -XPOST https://localhost:9200/_bulk --insecure --data-binary @$1 -H 'Content-Type: application/json'

Save the data files below as tinydocs.jsonl:

{ "create" : { "_index" : "tinydocs",  "_id" : "tdoc1" } }
{"title": "WhiteHouse1", "body": "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500"}
{ "create" : { "_index" : "tinydocs",  "_id" : "tdoc2" } }
{"title": "WhiteHouse2", "body": "The president delivered an address to the nation from the White House today."}

And save the data file below as dstinfo.jsonl:

(This data is adapted from a Daylight Saving Time article.)

{ "create" : { "_index" : "dstinfo",  "_id" : "dst1" } }
{"title": "Daylight Saving Time", "body": "Daylight saving time begins on the second Sunday in March at 2 a.m., and clocks are set an hour ahead, according to the Farmers’ Almanac. It lasts for eight months and ends on the first Sunday in November, when clocks are set back an hour at 2 a.m."}
{ "create" : { "_index" : "dstinfo",  "_id" : "dst2" } }
{"title":"History of daylight saving time", "body": "Founding Father Benjamin Franklin is often deemed the brain behind daylight saving time after a letter he wrote in 1784 to a Parisian newspaper, according to the Farmers’ Almanac. But Franklin’s letter suggested people simply change their routines and schedules — not the clocks — to the sun’s cycles. Perhaps surprisingly, daylight saving time had a soft rollout in the United States in 1883 to solve issues with railroad accidents, according to the U.S. Bureau of Transportation Services. It was instituted across the United States in 1918, according to the Congressional Research Service. In 2005, Congress changed it to span from March to November instead of its original timeframe of April to October."}
{ "create" : { "_index" : "dstinfo",  "_id" : "dst3" } }
{"title": "Daylight saving time participants", "body":"The United States is one of more than 70 countries that follow some form of daylight saving time, according to World Data. States can individually decide whether or not to follow it, according to the Farmers’ Almanac. Arizona and Hawaii do not, nor do parts of northeastern British Columbia in Canada. Puerto Rico and the Virgin Islands, both U.S. territories, also don’t follow daylight saving time, according to the Congressional Research Service."}
{ "create" : { "_index" : "dstinfo",  "_id" : "dst4" } }
{"title":"Benefits of daylight saving time", "body":"Those in favor of daylight saving time, whether eight months long or permanent, also vouch that it increases tourism in places such as parks or other public attractions, according to National Geographic. The longer days can keep more people outdoors later in the day."}

Make the script executable:

chmod +x ./bulk_post.sh

Now use the bulk_post.sh script to create indexes and load the data by running the two commands below:

./bulk_post.sh tinydocs.jsonl
./bulk_post.sh dstinfo.jsonl

Run sample queries

Prepare query scripts

OpenSearch queries are defined in JSON using the OpenSearch query domain specific language (DSL). For this post, we use the Linux curl command to send queries to our local OpenSearch server using HTTPS.

To make this easy, we’ve defined two small scripts to construct our query DSL and send it to OpenSearch.

The first script creates a regular OpenSearch text match query on two document fields – title and body. See OpenSearch documentation for more on the multi-match query syntax. We’ve kept the query very simple, but you can experiment later with defining alternate types of queries.

Save the script below as query_nokendra.sh:

#!/bin/bash
curl -XGET "https://localhost:9200/$1/_search?pretty" -u 'admin:admin' --insecure -H 'Content-Type: application/json' -d'
  {
    "query": {
      "multi_match": {
        "fields": ["title", "body"],
        "query": "'"$2"'"
      }
    },
    "size": 20
  }
  '

The second script is similar to the first one, but this time we add a query extension to instruct OpenSearch to invoke the Amazon Kendra Intelligent Ranking plugin as a post-processing step to re-rank the original results using the Amazon Kendra Intelligent Ranking service.

The size property determines how many OpenSearch result documents are sent to Kendra for re-ranking. Here, we specify a maximum of 20 results for re-ranking. Two properties, title_field (optional) and body_field (required), specify the document fields used for intelligent ranking.

Save the script below as query_kendra.sh:

#!/bin/bash
curl -XGET "https://localhost:9200/$1/_search?pretty" -u 'admin:admin' --insecure -H 'Content-Type: application/json' -d'
  {
    "query": {
      "multi_match": {
        "fields": ["title", "body"],
        "query": "'"$2"'"
      }
    },
    "size": 20,
    "ext": {
      "search_configuration": {
        "result_transformer": {
          "kendra_intelligent_ranking": {
            "order": 1,
            "properties": {
              "title_field": "title",
              "body_field": "body"
            }
          }
        }
      }
    }
  }
  '

Make both scripts executable:

chmod +x ./query_*kendra.sh

Run initial queries

Start with a simple query on the tinydocs index, to reproduce the example used in the post introduction.

Use the query_nokendra.sh script to search for the address of the White House:

./query_nokendra.sh tinydocs "what is the address of White House"

You see the results shown below. Observe the order of the two results, which are ranked by the score assigned by the OpenSearch text match query. Although the top scoring result does contain the keywords address and White House, it’s clear the meaning doesn’t match the intent of the question. The keywords match, but the semantics do not.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.1619741,
    "hits" : [
      {
        "_index" : "tinydocs",
        "_id" : "tdoc2",
        "_score" : 1.1619741,
        "_source" : {
          "title" : "Whitehouse2",
          "body" : "The president delivered an address to the nation from the White House today."
        }
      },
      {
        "_index" : "tinydocs",
        "_id" : "tdoc1",
        "_score" : 1.0577903,
        "_source" : {
          "title" : "Whitehouse1",
          "body" : "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500"
        }
      }
    ]
  }
}

Now let’s run the query with Amazon Kendra Intelligent Ranking, using the query_kendra.sh script:

./query_kendra.sh tinydocs "what is the address of White House"

This time, you see the results in a different order as shown below. The Amazon Kendra Intelligent Ranking service has re-assigned the score values, and assigned a higher score to the document that more closely matches the intention of the query. From a keyword perspective, this is a poorer match because it doesn’t contain the word address; however, from a semantic perspective it’s the better response. Now you see the benefit of using the Amazon Kendra Intelligent Ranking plugin!

{
  "took" : 522,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 2,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.3798389,
    "hits" : [
      {
        "_index" : "tinydocs",
        "_id" : "tdoc1",
        "_score" : 0.3798389,
        "_source" : {
          "title" : "Whitehouse1",
          "body" : "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500"
        }
      },
      {
        "_index" : "tinydocs",
        "_id" : "tdoc2",
        "_score" : 0.25906953,
        "_source" : {
          "title" : "Whitehouse2",
          "body" : "The president delivered an address to the nation from the White House today."
        }
      }
    ]
  }
}

Run additional queries and compare search results

Try the dstinfo index now to see how the same concept works with different data and queries. While you can use the scripts query_nokendra.sh and query_kendra.sh to make queries from the command line, let's instead use the OpenSearch Dashboards Compare Search Results plugin to run queries and compare search results.

Paste the local Dashboards URL, http://localhost:5601/app/searchRelevance, into your browser to access the dashboard comparison tool. Use the default credentials: username admin, password admin.

In the search bar, enter: what is daylight saving time?

For the Query 1 and Query 2 index, select dstinfo.

Copy the DSL query below and paste it in the Query panel under Query 1. This is a keyword search query.

{
  "query": { "multi_match": { "fields": ["title", "body"], "query": "%SearchText%" } }, 
  "size": 20
}

Now copy the DSL query below and paste it in the Query panel under Query 2. This query invokes the Amazon Kendra Intelligent Ranking plugin for self-managed OpenSearch to perform semantic re-ranking of the search results.

{
  "query": { "multi_match": { "fields": ["title", "body"], "query": "%SearchText%" } },
  "size": 20,
  "ext": {
    "search_configuration": {
      "result_transformer": {
        "kendra_intelligent_ranking": {
          "order": 1,
          "properties": { "title_field": "title", "body_field": "body" }
        }
      }
    }
  }
}

Choose the Search button to run the queries and observe the search results. In Result 1, the hit ranked last is probably actually the most relevant response to this query. In Result 2, the output from Amazon Kendra Intelligent Ranking has the most relevant answer correctly ranked first.

Now that you have experienced Amazon Kendra Intelligent Ranking for self-managed OpenSearch, experiment with a few queries of your own. Use the data we have already loaded or use the bulk_post.sh script to load your own data.

Explore the Amazon Kendra ranking rescore API

As you’ve seen from this post, the Amazon Kendra Intelligent Ranking plugin for OpenSearch can be conveniently used for semantic re-ranking of your search results. However, if you use a search service that doesn’t support the Amazon Kendra Intelligent Ranking plugin for self-managed OpenSearch, then you can use the Rescore function from the Amazon Kendra Intelligent Ranking API directly.

Try this API using the search results from the example query we used above: what is the address of the White House?

First, find your Execution Plan Id by running:

aws kendra-ranking list-rescore-execution-plans

The JSON below contains the search query, and the two results that were returned by the original OpenSearch match query, with their original OpenSearch scores. Replace {kendra-execution-plan_id} with your Execution Plan Id (from above) and save it as rescore_input.json:

{
    "RescoreExecutionPlanId": "{kendra-execution-plan_id}", 
    "SearchQuery": "what is the address of White House", 
    "Documents": [
        { "Id": "tdoc1",  "Title": "Whitehouse1",  "Body": "The president delivered an address to the nation from the White House today.",  "OriginalScore": 1.4484794 },
        { "Id": "tdoc2",  "Title": "Whitehouse2",  "Body": "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500",  "OriginalScore": 1.2401118 }
    ]
}

Run the CLI command below to re-score this list of documents using the Amazon Kendra Intelligent Ranking service:

aws kendra-ranking rescore --cli-input-json "`cat rescore_input.json`"

The output of a successful execution looks like the following.

{
    "ResultItems": [
        {
            "Score": 0.39321771264076233, 
            "DocumentId": "tdoc2"
        }, 
        {
            "Score": 0.328217089176178, 
            "DocumentId": "tdoc1"
        }
    ], 
    "RescoreId": "991459b0-ca9e-4ba8-b0b3-1e8e01f2ad15"
}

As expected, the document tdoc2 (containing the text body "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500") now has the higher ranking, as it's the semantically more relevant response for the query. The ResultItems list in the output contains each input DocumentId with its new Score, ranked in descending order of Score.
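If you prefer to call the API from Python rather than the AWS CLI, a minimal boto3 sketch looks like the following, assuming the same execution plan ID and documents used in the rescore_input.json example above.

import boto3

kendra_ranking = boto3.client("kendra-ranking")

response = kendra_ranking.rescore(
    RescoreExecutionPlanId="{kendra-execution-plan_id}",  # replace with your plan ID
    SearchQuery="what is the address of White House",
    Documents=[
        {"Id": "tdoc1", "Title": "Whitehouse1",
         "Body": "The president delivered an address to the nation from the White House today.",
         "OriginalScore": 1.4484794},
        {"Id": "tdoc2", "Title": "Whitehouse2",
         "Body": "The White House is located at: 1600 Pennsylvania Avenue NW, Washington, DC 20500",
         "OriginalScore": 1.2401118},
    ],
)
for item in response["ResultItems"]:
    print(item["DocumentId"], item["Score"])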

Clean up

When you're done experimenting, shut down and remove your Docker containers and Rescore Execution Plan by running the cleanup_resources.sh script created by the Quickstart script, for example:

./opensearch-kendra-ranking-docker.xxxx/cleanup_resources.sh

Conclusion

In this post, we showed you how to use the Amazon Kendra Intelligent Ranking plugin for self-managed OpenSearch to easily add intelligent ranking to your OpenSearch document queries and dramatically improve the relevance ranking of the results, while using your existing OpenSearch search engine deployments.

You can also use the Amazon Kendra Intelligent Ranking Rescore API directly to intelligently re-score and rank results from your own applications.

Read the Amazon Kendra Intelligent Ranking for self-managed OpenSearch documentation to learn more about this feature, and start planning to apply it in your production applications.


About the Authors

Abhinav JawadekarAbhinav Jawadekar is a Principal Solutions Architect focused on Amazon Kendra in the AI/ML language services team at AWS. Abhinav works with AWS customers and partners to help them build intelligent search solutions on AWS.

Bob StrahanBob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Model hosting patterns in Amazon SageMaker, Part 1: Common design patterns for building ML applications on Amazon SageMaker

Machine learning (ML) applications are complex to deploy and often require the ability to hyper-scale, and have ultra-low latency requirements and stringent cost budgets. Use cases such as fraud detection, product recommendations, and traffic prediction are examples where milliseconds matter and are critical for business success. Strict service level agreements (SLAs) need to be met, and a typical request may require multiple steps such as preprocessing, data transformation, feature engineering, model selection logic, model aggregation, and postprocessing.

Deploying ML models at scale with optimized cost and compute efficiency can be a daunting and cumbersome task. Each model has its own merits and dependencies based on external data sources as well as the runtime environment, such as the CPU/GPU power of the underlying compute resources. An application may require multiple ML models to serve a single inference request. In certain scenarios, a request may flow across multiple models. There is no one-size-fits-all approach, and it's important for ML practitioners to look for tried-and-proven methods to address recurring ML hosting challenges. This has led to the evolution of design patterns for ML model hosting.

In this post, we explore common design patterns for building ML applications on Amazon SageMaker.

Design patterns for building ML applications

Let’s look at the following design patterns to use for hosting ML applications.

Single-model based ML applications

This is a great option when your ML use case requires a single model to serve a request. The model is deployed on a dedicated compute infrastructure with the ability to scale based on the input traffic. This option is also ideal when the client application has a low-latency (in the order of milliseconds or seconds) inference requirement.

Multi-model based ML applications

To make hosting more cost-effective, this design pattern allows you to host multiple models on the same tenant infrastructure. Multiple ML models can share the host or container resources, including caching the most-used ML models in memory, resulting in better utilization of memory and compute resources. Depending on the types of models you choose to deploy, model co-hosting may use the following methods:

  • Multi-model hosting – This option allows you to host multiple models using a shared serving container on a single endpoint. This feature is ideal when you have a large number of similar models that you can serve through a shared serving container and don't need to access all the models at the same time (see the sketch after this list).
  • Multi-container hosting – This option is ideal when you have multiple models running on different serving stacks with similar resource needs, and when individual models don’t have sufficient traffic to utilize the full capacity of the endpoint instances. Multi-container hosting allows you to deploy multiple containers that use different models or frameworks on a single endpoint. The models can be completely heterogenous, with their own independent serving stack.
  • Model ensembles – In many production use cases, there are often several upstream models feeding inputs to a given downstream model. This is where ensembles are useful. Ensemble patterns involve mixing output from one or more base models in order to reduce the generalization error of the prediction. The base models can be diverse and trained by different algorithms. Model ensembles can outperform single models because the prediction error decreases when the ensemble approach is used.

The following are common use cases of ensemble patterns and their corresponding design pattern diagrams:

  • Scatter-gather – In a scatter-gather pattern, a request for inference is routed to a number of models. An aggregator is then used to collect the responses and distill them into a single inference response. For example, an image classification use case may use three different models to perform the task. The scatter-gather pattern allows you to combine results from inferences run on the three models and pick the most probable classification (a minimal sketch of this pattern follows this list).

  • Model aggregate – In an aggregation pattern, outputs from multiple models are averaged. For classification models, multiple models’ predictions are evaluated to determine the class that received the most votes and is treated as the final output of the ensemble. For example, in a two-class classification problem to classify a set of fruits as oranges or apples, if two models vote for an orange and one model votes for an apple, then the aggregated output will be an orange. Aggregation helps combat inaccuracy in individual models and makes the output more accurate.

  • Dynamic selection – Another pattern for ensemble models is to dynamically perform model selection for the given input attributes. For example, in a given input of images of fruits, if the input contains an orange, model A will be used because it’s specialized for oranges. If the input contains an apple, model B will be used because it’s specialized for apples.

  • Serial inference ML applications – With a serial inference pattern, also known as an inference pipeline, incoming data is preprocessed before a pre-trained ML model is invoked to generate inferences. In some cases, the generated inferences need further postprocessing so that they can be easily consumed by downstream applications. An inference pipeline allows you to reuse the same preprocessing code used during model training to process the inference request data used for predictions.

  • Business logic – Productionizing ML always involves business logic. Business logic patterns cover everything that’s needed to perform an ML task that is not ML model inference. This includes, for example, loading the model from Amazon Simple Storage Service (Amazon S3), performing database lookups to validate the input, and obtaining pre-computed features from the feature store. After these business logic steps are complete, the inputs are passed through to the ML models.
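
To make the scatter-gather and aggregation ideas concrete, here is a minimal, framework-agnostic Python sketch. The model callables, payload, and class labels are illustrative placeholders; in a real deployment, each callable would wrap a call to a hosted model endpoint.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical per-model scorers; in practice each could wrap a call to a
# deployed endpoint (for example, via the SageMaker runtime client).
ModelFn = Callable[[bytes], str]


def scatter_gather(payload: bytes, models: List[ModelFn]) -> str:
    """Scatter the request to every model, then gather and aggregate."""
    # Scatter: invoke each model with the same payload.
    predictions = [model(payload) for model in models]
    # Gather/aggregate: majority vote across the individual predictions.
    votes = Counter(predictions)
    winner, _ = votes.most_common(1)[0]
    return winner


# Illustrative stand-ins for three classifiers (not real models).
model_a = lambda payload: "apple"
model_b = lambda payload: "orange"
model_c = lambda payload: "orange"

print(scatter_gather(b"image-bytes", [model_a, model_b, model_c]))  # -> "orange"
```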

ML inference options

For model deployment, it’s important to work backward from your use case. What is the frequency of the prediction? Do you expect live traffic to your application and real-time response to your clients? Do you have many models trained for different subsets of data for the same use case? Does the prediction traffic fluctuate? Is latency of inference a concern? Based on these details, all the preceding design patterns can be implemented using the following deployment options:

  • Real-time inference – Real-time inference is ideal for inference workloads where you have real-time, interactive, low-latency requirements. Real-time ML inference workloads may include a single-model based ML application, where an application requires only one ML model to serve a single request, or a multi-model based ML application, where an application requires multiple ML models to serve a single request.
  • Near-real-time (asynchronous) inference – With near-real-time inference, you can queue incoming requests. This can be utilized for running inference on inputs that are hundreds of MBs. It operates in near-real time and lets users submit the input for inference and read the output from the endpoint from an S3 bucket. It is especially handy for NLP and computer vision use cases with large payloads that require longer preprocessing times.
  • Batch inference – Batch inference can be utilized for running inference offline on a large dataset. Because it runs offline, batch inference doesn’t offer the lowest latency. Here, the inference request is processed with either a scheduled or event-based trigger of a batch inference job.
  • Serverless inference – Serverless inference is ideal for workloads that have idle periods between traffic spurts and can tolerate a few extra seconds of latency (cold start) for the first invocation after an idle period. For example, a chatbot service or an application to process forms or analyze data from documents. In this case, you might want an online inference option that is able to automatically provision and scale compute capacity based on the volume of inference requests. And during idle time, it should be able to turn off compute capacity completely so that you’re not charged. Serverless inference takes away the undifferentiated heavy lifting of selecting and managing servers by automatically launching compute resources and scaling them in and out depending on traffic.

Use fitness functions to select the right ML inference option

Deciding on the right hosting option is important because it impacts the end-user experience delivered by your applications. For this purpose, we’re borrowing the concept of fitness functions, which was coined by Neal Ford and his colleagues from AWS Partner ThoughtWorks in their work Building Evolutionary Architectures. Fitness functions provide a prescriptive assessment of various hosting options based on the customer’s objectives. Fitness functions help you obtain the necessary data to allow for the planned evolution of your architecture. They set measurable values to assess how close your solution is to achieving your set goals. Fitness functions can and should be adapted as the architecture evolves to guide a desired change process. This provides architects with a tool to guide their teams while maintaining team autonomy.

There are five main fitness functions that customers care about when it comes to selecting the right ML inference option for hosting their ML models and applications.

Fitness function Description
Cost

Deploying and maintaining an ML model and ML application on a scalable framework is a critical business process, and the costs may vary greatly depending on choices made about model hosting infrastructure, hosting option, ML frameworks, ML model characteristics, optimizations, scaling policy, and more. The workloads must utilize the hardware infrastructure optimally to ensure that the cost remains in check.

This fitness function specifically refers to the infrastructure cost, which is a part of overall total cost of ownership (TCO). The infrastructure costs are the combined costs for storage, network, and compute. It’s also critical to understand other components of TCO, including operational costs and security and compliance costs.

Operational costs are the combined costs of operating, monitoring, and maintaining the ML infrastructure. The operational costs are calculated as the number of engineers required based on each scenario and the annual salary of engineers, aggregated over a specific period.

Customers using self-managed ML solutions on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Kubernetes Service (Amazon EKS) need to build operational tooling themselves.

Customers using SageMaker incur significantly less TCO. SageMaker inference is a fully managed service and provides capabilities out of the box for deploying ML models for inference. You don’t need to provision instances, monitor instance health, manage security updates or patches, emit operational metrics, or build monitoring for your ML inference workloads. It has built-in capabilities to ensure high availability and resiliency. SageMaker supports security with end-to-end encryption at rest and in transit, including encryption of the root volume and Amazon Elastic Block Store (Amazon EBS) volume, Amazon Virtual Private Cloud (Amazon VPC) support, AWS PrivateLink, customer-managed keys, AWS Identity and Access Management (IAM) fine-grained access control, AWS CloudTrail audits, internode encryption for training, tag-based access control, network isolation, and Interactive Application Proxy.

All of these security features are provided out of the box in SageMaker, and can save businesses tens of development months of engineering effort over a 3-year period. SageMaker is a HIPAA-eligible service, and is certified under PCI, SOC, GDPR, and ISO. SageMaker also supports FIPS endpoints. For more information about TCO, refer to The total cost of ownership of Amazon SageMaker.

Inference latency – Many ML models and applications are latency critical, in which the inference latency must be within the bounds specified by a service level objective. Inference latency depends upon a multitude of factors, including model size and complexity, hardware platform, software environment, and network architecture. For example, larger and more complex models can take longer to run inference.
Throughput (transactions per second) – For model inference, optimizing throughput is crucial for performance tuning and achieving the business objective of the ML application. As we continue to advance rapidly in all aspects of ML, including low-level implementations of mathematical operations in chip design, hardware-specific libraries play a greater role in performance optimization. Various factors such as payload size, network hops, nature of hops, model graph features, operators in the model, and the CPU, GPU, and memory profile of the model hosting instances affect the throughput of the ML model.
Scaling configuration complexity – It’s crucial for the ML models or applications to run on a scalable framework that can handle the demand of varying traffic. It also allows for the maximum utilization of CPU and GPU resources and prevents over-provisioning of compute resources.
Expected traffic pattern – ML models or applications can have different traffic patterns, ranging from continuous real-time live traffic to periodic peaks of thousands of requests per second, and from infrequent, unpredictable request patterns to offline batch requests on larger datasets. Working backward from the expected traffic pattern is recommended in order to select the right hosting option for your ML model.

Deploying models with SageMaker

SageMaker is a fully managed AWS service that provides every developer and data scientist with the ability to quickly build, train, and deploy ML models at scale. With SageMaker inference, you can deploy your ML models on hosted endpoints and get inference results. SageMaker provides a wide selection of hardware and features to meet your workload requirements, allowing you to select from over 70 instance types with hardware acceleration. If you’re not sure which instance type is optimal for your workload, SageMaker can also recommend one using a feature called SageMaker Inference Recommender.

You can choose deployment options to best meet your use cases, such as real time inference, asynchronous, batch, and even serverless endpoints. In addition, SageMaker offers various deployment strategies such as canary, blue/green, shadow, and A/B testing for model deployment, along with cost-effective deployment with multi-model, multi-container endpoints, and elastic scaling. With SageMaker inference, you can view the performance metrics for your endpoints in Amazon CloudWatch, automatically scale endpoints based on traffic, and update your models in production without losing any availability.

SageMaker offers four options to deploy your model so you can start making predictions:

  • Real-time inference – This is suitable for workloads with millisecond latency requirements, payload sizes up to 6 MB, and processing times of up to 60 seconds.
  • Batch transform – This is ideal for offline predictions on large batches of data that are available up-front.
  • Asynchronous inference – This is designed for workloads that don’t have sub-second latency requirements, payload sizes up to 1 GB, and processing times of up to 15 minutes.
  • Serverless inference – With serverless inference, you can quickly deploy ML models for inference without having to configure or manage the underlying infrastructure. Additionally, you pay only for the compute capacity used to process inference requests, which is ideal for intermittent workloads.

The following diagram can help you understand the SageMaker hosting model deployment options along with the associated fitness function evaluations.

Let’s explore each of the deployment options in more detail.

Real-time inference in SageMaker

SageMaker real-time inference is recommended if you have sustained traffic and need lower and consistent latency for your requests with payload sizes up to 6 MB, and processing times of up to 60 seconds. You deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support auto scaling. Real-time inference is popular for use cases where you expect a low-latency, synchronous response with predictable traffic patterns, such as personalized recommendations for products and services or transactional fraud detection use cases.

Typically, a client application sends requests to the SageMaker HTTPS endpoint to obtain inferences from a deployed model. You can deploy multiple variants of a model to the same SageMaker HTTPS endpoint. This is useful for testing variations of a model in production. Auto scaling allows you to dynamically adjust the number of instances provisioned for a model in response to changes in your workload.
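
To make this flow concrete, the following is a minimal sketch that uses the SageMaker Python SDK to deploy a real-time endpoint and the boto3 runtime client to invoke it. The S3 path, IAM role ARN, entry point script, framework versions, instance type, and endpoint name are placeholders to replace with your own values.

```python
import json

import boto3
from sagemaker.pytorch import PyTorchModel  # other framework Model classes work similarly

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder role ARN

# Wrap a trained model artifact (placeholder S3 path) in a framework serving container.
model = PyTorchModel(
    model_data="s3://my-bucket/models/model.tar.gz",  # placeholder
    role=role,
    entry_point="inference.py",  # placeholder inference handler
    framework_version="1.13",
    py_version="py39",
)

# Deploy a provisioned real-time endpoint (fully managed, auto scaling capable).
model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    endpoint_name="my-realtime-endpoint",  # placeholder
)

# Synchronous, low-latency invocation through the SageMaker runtime.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-realtime-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": "example payload"}),
)
print(response["Body"].read())
```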

The following table provides guidance on evaluating SageMaker real-time inference based on the fitness functions.

Fitness function Description
Cost

Real-time endpoints offer synchronous responses to inference requests. Because the endpoint is always running and available to provide real-time synchronous inference responses, you pay for using the instance. Costs can quickly add up when you deploy multiple endpoints, especially if the endpoints don’t fully utilize the underlying instances. Choosing the right instance for your model helps ensure you have the most performant instance at the lowest cost for your models. Auto scaling is recommended to dynamically adjust the capacity depending on traffic to maintain steady and predictable performance at the lowest possible cost.

SageMaker extends access to Graviton2 and Graviton3-based ML instance families. AWS Graviton processors are custom built by Amazon Web Services using 64-bit Arm Neoverse cores to deliver the best price performance for your cloud workloads running on Amazon EC2. With Graviton-based instances, you have more options for optimizing the cost and performance when deploying your ML models on SageMaker.

SageMaker also supports Inf1 instances, providing high performance and cost-effective ML inference. With 1–16 AWS Inferentia chips per instance, Inf1 instances can scale in performance and deliver up to three times higher throughput and up to 50% lower cost per inference compared to the AWS GPU-based instances. To use Inf1 instances in SageMaker, you can compile your trained models using Amazon SageMaker Neo and select the Inf1 instances to deploy the compiled model on SageMaker.

You can also explore Savings Plans for SageMaker to benefit from cost savings up to 64% compared to the on-demand price.

When you create an endpoint, SageMaker attaches an EBS storage volume to each ML compute instance that hosts the endpoint. The size of the storage volume depends on the instance type. Additional cost for real-time endpoints includes cost of GB-month of provisioned storage, plus GB data processed in and GB data processed out of the endpoint instance.

Inference latency – Real-time inference is ideal when you need a persistent endpoint with millisecond latency requirements. It supports payload sizes up to 6 MB, and processing times of up to 60 seconds.
Throughput

An ideal value of inference throughput depends on factors such as the model, model input size, batch size, and endpoint instance type. As a best practice, review CloudWatch metrics for input requests and resource utilization, and select the appropriate instance type to achieve optimal throughput.

A business application can be either throughput optimized or latency optimized. For example, dynamic batching can help increase the throughput for latency-sensitive apps using real-time inference. However, there are limits to the batch size: inference latency grows as you increase the batch size to improve throughput. Therefore, real-time inference is an ideal option for latency-sensitive applications. SageMaker provides asynchronous inference and batch transform options, which are optimized to give higher throughput than real-time inference if the business application can tolerate slightly higher latency.

Scaling configuration complexity

SageMaker real-time endpoints support auto scaling out of the box. When the workload increases, auto scaling brings more instances online. When the workload decreases, auto scaling removes unnecessary instances, helping you reduce your compute cost. Without auto scaling, you need to provision for peak traffic or risk model unavailability. Unless the traffic to your model is steady throughout the day, there will be excess unused capacity. This leads to low utilization and wasted resources.

With SageMaker, you can configure different scaling options based on the expected traffic pattern. Simple scaling or target tracking scaling is ideal when you want to scale based on a specific CloudWatch metric. You can do this by choosing a specific metric and setting threshold values. The recommended metrics for this option are average CPUUtilization or SageMakerVariantInvocationsPerInstance (a minimal configuration sketch follows this table).

If you require advanced configuration, you can set a step scaling policy to dynamically adjust the number of instances to scale based on the size of the alarm breach. This helps you configure a more aggressive response when demand reaches a certain level.

You can use a scheduled scaling option when you know that the demand follows a particular schedule in the day, week, month, or year. This lets you specify a one-time schedule, a recurring schedule, or cron expressions, along with start and end times, which form the boundaries of when the auto scaling action starts and stops.

For more details, refer to Configuring autoscaling inference endpoints in Amazon SageMaker and Load test and optimize an Amazon SageMaker endpoint using automatic scaling.

Traffic pattern – Real-time inference is ideal for workloads with a continual or regular traffic pattern.
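
The scaling configuration described in the preceding table is set up through Application Auto Scaling. The following is a minimal sketch of a target tracking policy on the SageMakerVariantInvocationsPerInstance metric; the endpoint name, variant name, capacity bounds, cooldowns, and target value are assumptions to tune for your workload.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

endpoint_name = "my-realtime-endpoint"  # placeholder
variant_name = "AllTraffic"             # default production variant name
resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"

# Register the endpoint variant as a scalable target (1 to 4 instances).
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking on invocations per instance, as recommended in the table above.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # invocations per minute per instance (workload specific)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```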

Asynchronous inference in SageMaker

SageMaker asynchronous inference is a new capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1 GB), long processing times (up to 15 minutes), and near-real-time latency requirements. Example workloads for asynchronous inference include healthcare companies processing high-resolution biomedical images or videos like echocardiograms to detect anomalies. These applications receive bursts of incoming traffic at different times in the day and require near-real-time processing at low cost. Processing times for these requests can range in the order of minutes, eliminating the need to run real-time inference. Instead, input payloads can be processed asynchronously from an object store like Amazon S3 with automatic queuing and a predefined concurrency threshold. Upon processing, SageMaker places the inference response in the previously returned Amazon S3 location. You can optionally choose to receive success or error notifications via Amazon Simple Notification Service (Amazon SNS).
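
The following is a minimal sketch of deploying and invoking an asynchronous endpoint with the SageMaker Python SDK and boto3; the container image URI, S3 locations, role ARN, and endpoint name are placeholders.

```python
import boto3
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.model import Model

role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

model = Model(
    image_uri="<framework-or-byoc-image-uri>",        # placeholder
    model_data="s3://my-bucket/models/model.tar.gz",  # placeholder
    role=role,
)

# Responses are written to S3; an SNS topic for notifications is optional.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-output/",
    max_concurrent_invocations_per_instance=4,
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=async_config,
    endpoint_name="my-async-endpoint",  # placeholder
)

# The request payload is staged in S3; the call returns immediately with the
# S3 location where the inference result will be placed.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint_async(
    EndpointName="my-async-endpoint",
    InputLocation="s3://my-bucket/async-input/payload.json",
    ContentType="application/json",
)
print(response["OutputLocation"])
```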

The following table provides guidance on evaluating SageMaker asynchronous inference based on the fitness functions.

Fitness function Description
Cost – Asynchronous inference is a great choice for cost-sensitive workloads with large payloads and burst traffic. Asynchronous inference enables you to save on costs by auto scaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests. Requests that are received when there are zero instances are queued for processing after the endpoint scales up.
Inference latency – Asynchronous inference is ideal for near-real-time latency requirements. The requests are placed in a queue and processed as soon as the compute is available. This typically results in tens of milliseconds in latency.
Throughput – Asynchronous inference is ideal for non-latency sensitive use cases, because applications don’t have to compromise on throughput. Requests aren’t dropped during traffic spikes because the asynchronous inference endpoint queues up requests rather than dropping them.
Scaling configuration complexity

SageMaker supports auto scaling for asynchronous endpoints. Unlike real-time hosted endpoints, asynchronous inference endpoints support scaling down the instance count to zero by setting the minimum capacity to zero. For asynchronous endpoints, SageMaker strongly recommends that you create a target-tracking scaling policy configuration for a deployed model (variant).

For use cases that can tolerate a cold start penalty of a few minutes, you can optionally scale down the endpoint instance count to zero when there are no outstanding requests and scale back up as new requests arrive so that you only pay for the duration that the endpoints are actively processing requests.

Traffic pattern – Asynchronous endpoints queue incoming requests and process them asynchronously. They’re a good option for intermittent or infrequent traffic patterns.

Batch inference in SageMaker

SageMaker batch transform is ideal for offline predictions on large batches of data that are available up-front. The batch transform feature is a high-performance and high-throughput method for transforming data and generating inferences. It’s ideal for scenarios where you’re dealing with large batches of data, don’t need subsecond latency, or need to both preprocess and transform the training data. Customers in certain domains such as advertising and marketing or healthcare often need to make offline predictions on hyperscale datasets where high throughput is often the objective of the use case and latency isn’t a concern.

When a batch transform job starts, SageMaker initializes compute instances and distributes the inference workload between them. It releases the resources when the jobs are complete, so you pay only for what was used during the run of your job. When the job is complete, SageMaker saves the prediction results in an S3 bucket that you specify. Batch inference tasks are usually good candidates for horizontal scaling. Each worker within a cluster can operate on a different subset of data without the need to exchange information with other workers. AWS offers multiple storage and compute options that enable horizontal scaling. Example workloads for SageMaker batch transform include offline applications such as banking applications for predicting customer churn where an offline job can be scheduled to run periodically.
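
A minimal batch transform sketch with the SageMaker Python SDK follows; the container image, model artifact, S3 input/output prefixes, role ARN, and tuning values such as the payload size and concurrency are placeholders to adapt to your job.

```python
from sagemaker.model import Model

model = Model(
    image_uri="<framework-or-byoc-image-uri>",        # placeholder
    model_data="s3://my-bucket/models/model.tar.gz",  # placeholder
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
)

# Create a transformer; SageMaker provisions instances only for the duration of the job.
transformer = model.transformer(
    instance_count=2,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",
    strategy="MultiRecord",        # batch multiple records per request
    max_payload=6,                 # MB per request
    max_concurrent_transforms=2,   # ideally equal to the number of compute workers
)

# Run offline inference over the whole dataset; instances are released when done.
transformer.transform(
    data="s3://my-bucket/batch-input/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```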

The following table provides guidance on evaluating SageMaker batch transform based on the fitness functions.

Fitness function Description
Cost – SageMaker batch transform allows you to run predictions on large or small batch datasets. You are charged for the instance type you choose, based on the duration of use. SageMaker manages the provisioning of resources at the start of the job and releases them when the job is complete. There is no additional data processing cost.
Inference latency – You can use event-based or scheduled invocation. Latency could vary depending on the size of inference data, job concurrency, complexity of the model, and compute instance capacity.
Throughput

Batch transform jobs can be done on a range of datasets, from petabytes of data to very small datasets. There is no need to resize larger datasets into small chunks of data. You can speed up batch transform jobs by using optimal values for parameters such as MaxPayloadInMB, MaxConcurrentTransforms, or BatchStrategy. The ideal value for MaxConcurrentTransforms is equal to the number of compute workers in the batch transform job.

Batch processing can increase throughput and optimize your resources because it helps complete a larger number of inferences in a certain amount of time at the expense of latency. To optimize model deployment for higher throughput, the general guideline is to increase the batch size until throughput decreases.

Scaling configuration complexity – SageMaker batch transform is used for offline inference that is not latency sensitive.
Traffic pattern – For offline inference, a batch transform job is scheduled or started using an event-based trigger.

Serverless inference in SageMaker

SageMaker serverless inference allows you to deploy ML models for inference without having to configure or manage the underlying infrastructure. Based on the volume of inference requests your model receives, SageMaker serverless inference automatically provisions, scales, and turns off compute capacity. As a result, you pay for only the compute time to run your inference code and the amount of data processed, not for idle time. You can use SageMaker’s built-in algorithms and ML framework-serving containers to deploy your model to a serverless inference endpoint or choose to bring your own container. If traffic becomes predictable and stable, you can easily update from a serverless inference endpoint to a SageMaker real-time endpoint without the need to make changes to your container image. With serverless inference, you also benefit from other SageMaker features, including built-in metrics such as invocation count, faults, latency, host metrics, and errors in CloudWatch.
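
As a sketch of this option, the following deploys a model to a serverless endpoint with the SageMaker Python SDK; the container image, model artifact, role ARN, memory size, concurrency limit, and endpoint name are assumed values.

```python
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="<framework-or-byoc-image-uri>",        # placeholder
    model_data="s3://my-bucket/models/model.tar.gz",  # placeholder
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
)

# Memory size should be at least as large as the model; concurrency caps parallel invocations.
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,
    max_concurrency=5,
)

# No instance type or count: SageMaker provisions and scales compute on demand.
model.deploy(
    serverless_inference_config=serverless_config,
    endpoint_name="my-serverless-endpoint",  # placeholder
)
```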

The following table provides guidance on evaluating SageMaker serverless inference based on the fitness functions.

Fitness function Description
Cost – With a pay-as-you-run model, serverless inference is a cost-effective option if you have infrequent or intermittent traffic patterns. You pay only for the duration for which the endpoint processes the request, and therefore can save costs if the traffic pattern is intermittent.
Inference latency

Serverless endpoints offer low inference latency (in the order of milliseconds to seconds), with the ability to scale instantly from tens to thousands of inferences within seconds based on the usage patterns, making it ideal for ML applications with intermittent or unpredictable traffic.

Because serverless endpoints provision compute resources on demand, your endpoint may experience a few extra seconds of latency (cold start) for the first invocation after an idle period. The cold start time depends on your model size, how long it takes to download your model, and the startup time of your container.

Throughput – When configuring your serverless endpoint, you can specify the memory size and maximum number of concurrent invocations. SageMaker serverless inference auto-assigns compute resources proportional to the memory you select. If you choose a larger memory size, your container has access to more vCPUs. As a general rule, the memory size should be at least as large as your model size. The memory sizes you can choose are 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, and 6144 MB. Regardless of the memory size you choose, serverless endpoints have 5 GB of ephemeral disk storage available.
Scaling configuration complexity – Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. This takes away the undifferentiated heavy lifting of selecting and managing servers.
Traffic pattern – Serverless inference is ideal for workloads with infrequent or intermittent traffic patterns.

Model hosting design patterns in SageMaker

SageMaker inference endpoints use Docker containers for hosting ML models. Containers allow you to package software into standardized units that run consistently on any platform that supports Docker. This ensures portability across platforms, immutable infrastructure deployments, and easier change management and CI/CD implementations. SageMaker provides pre-built managed containers for popular frameworks such as Apache MXNet, TensorFlow, PyTorch, Sklearn, and Hugging Face. For a full list of available SageMaker container images, refer to Available Deep Learning Containers Images. If SageMaker doesn’t provide a supported container for your framework, you can build your own container (BYOC) and push a custom image with the dependencies that are necessary for your model.

To deploy a model on SageMaker, you need a container (SageMaker managed framework containers or BYOC) and a compute instance to host the container. SageMaker supports multiple advanced options for common ML model hosting design patterns where models can be hosted on a single container or co-hosted on a shared container.

A real-time ML application may use a single model or multiple models to serve a single prediction request. The following diagram shows various inference scenarios for an ML application.

Let’s explore a suitable SageMaker hosting option for each of the preceding inference scenarios. You can refer to the fitness functions to assess if it’s the right option for the given use case.

Hosting a single-model based ML application

There are several options to host single-model based ML applications using SageMaker hosting services depending on the deployment scenario.

Single-model endpoint

SageMaker single-model endpoints allow you to host one model on a container hosted on dedicated instances for low latency and high throughput. These endpoints are fully managed and support auto scaling. You can configure the single-model endpoint as a provisioned endpoint where you pass in endpoint infrastructure configuration such as the instance type and count, or a serverless endpoint where SageMaker automatically launches compute resources and scales them in and out depending on traffic, eliminating the need to choose instance types or manage scaling policies. Serverless endpoints are for applications with intermittent or unpredictable traffic.

The following diagram shows single-model endpoint inference scenarios.

The following table provides guidance on evaluating fitness functions for a provisioned single-model endpoint. For serverless endpoint fitness function evaluations, refer to the serverless endpoint section in this post.

Fitness function Description
Cost – You are charged for usage of the instance type you choose. Because the endpoint is always running and available, costs can quickly add up. Choosing the right instance for your model helps ensure you have the most performant instance at the lowest possible cost. Auto scaling is recommended to dynamically adjust the capacity depending on traffic to maintain steady and predictable performance at the lowest possible cost.
Inference latency – A single-model endpoint provides real-time, interactive, synchronous inference with millisecond latency.
Throughput – Throughput can be impacted by various factors, such as model input size, batch size, and endpoint instance type. It is recommended to review CloudWatch metrics for input requests and resource utilization, and select the appropriate instance type to achieve optimal throughput. SageMaker provides features to manage resources and optimize inference performance when deploying ML models: for example, you can compile models with SageMaker Neo, or choose Inf1 or GPU instances for your endpoint to improve throughput.
Scaling configuration complexity – Auto scaling is supported out of the box. SageMaker recommends choosing an appropriate scaling configuration by performing load tests.
Traffic pattern – A single-model endpoint is ideal for workloads with predictable traffic patterns.

Co-hosting multiple models

When you’re dealing with a large number of models, deploying each one on an individual endpoint with a dedicated container and instance can result in a significant increase in cost. Additionally, it becomes difficult to manage so many models in production, specifically when you don’t need to invoke all the models at the same time but still need them to be available at all times. Co-hosting multiple models on the same underlying compute resources makes it easy to manage ML deployments at scale and lowers your hosting costs through increased usage of the endpoint and its underlying compute resources. SageMaker supports advanced model co-hosting options such as multi-model endpoint (MME) for homogeneous models and multi-container endpoint (MCE) for heterogeneous models. Homogeneous models use the same ML framework on a shared serving container, whereas heterogeneous models allow you to deploy multiple serving containers that use different models or frameworks on a single endpoint.

The following diagram shows model co-hosting options using SageMaker.

SageMaker multi-model endpoints

SageMaker MMEs allow you to host multiple models using a shared serving container on a single endpoint. This is a scalable and cost-effective solution to deploy a large number of models that cater to the same use case, framework, or inference logic. MMEs can dynamically serve requests based on the model invoked by the caller. They also reduce deployment overhead because SageMaker manages loading models in memory and scaling them based on the traffic patterns to them. This feature is ideal when you have a large number of similar models that you can serve through a shared serving container and don’t need to access all the models at the same time. MMEs also enable time-sharing of memory resources across your models. This works best when the models are fairly similar in size and invocation latency, allowing MMEs to effectively use the instances across all models. SageMaker MMEs support hosting both CPU and GPU backed models. By using GPU backed models, you can lower your model deployment costs through increased usage of the endpoint and its underlying accelerated compute instances. For a real-world use case of MMEs, refer to How to scale machine learning inference for multi-tenant SaaS use cases.
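
A minimal sketch of deploying and invoking an MME with the SageMaker Python SDK follows; the shared serving container image, S3 prefix, role ARN, endpoint name, and model artifact key are placeholders, and every artifact under the prefix must be servable by the shared container.

```python
import boto3
import sagemaker
from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# The shared serving container used by every model behind the endpoint.
shared_container = Model(
    image_uri="<shared-serving-container-image-uri>",  # placeholder
    role=role,
)

# All model artifacts live under one S3 prefix; adding a model is as simple as
# uploading another model.tar.gz under this prefix.
mme = MultiDataModel(
    name="my-multi-model",                           # placeholder
    model_data_prefix="s3://my-bucket/mme-models/",  # placeholder
    model=shared_container,
    sagemaker_session=session,
)

mme.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    endpoint_name="my-mme-endpoint",  # placeholder
)

# TargetModel selects which artifact under the prefix serves this request;
# SageMaker lazily loads it into memory on first use.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-mme-endpoint",
    TargetModel="churn/model.tar.gz",  # placeholder key relative to the prefix
    ContentType="application/json",
    Body=b'{"inputs": [1, 2, 3]}',
)
```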

The following table provides guidance on evaluating the fitness functions for MMEs.

Fitness function Description
Cost

MMEs enable using a shared serving container to host thousands of models on a single endpoint. This reduces hosting costs significantly by improving endpoint utilization compared with using single-model endpoints. For example, if you have 10 models to deploy using an ml.c5.large instance, based on SageMaker pricing, the cost of having 10 single-model persistent endpoints is: 10 * $0.102 = $1.02 per hour.

Whereas with one MME hosting the 10 models, we achieve 10 times cost savings: 1 * $0.102 = $0.102 per hour.

Inference latency

By default, MMEs cache frequently used models in memory and on disk to provide low-latency inference. The cached models are unloaded or deleted from disk only when a container runs out of memory or disk space to accommodate a newly targeted model. MMEs allow lazy loading of models, which means models are loaded into memory when invoked for the first time. This optimizes memory utilization; however, it causes response time spikes on first load, resulting in a cold start problem. Therefore, MMEs are also well suited to scenarios that can tolerate occasional cold-start-related latency penalties that occur when invoking infrequently used models.

To meet the latency and throughput goals of ML applications, GPU instances are preferred over CPU instances (given the computational power GPUs offer). With MME support for GPU, you can deploy thousands of deep learning models behind one SageMaker endpoint. MMEs can run multiple models on a GPU core, share GPU instances behind an endpoint across multiple models, and dynamically load and unload models based on the incoming traffic. With this, you can significantly save cost and achieve the best price performance. If your use case demands significantly higher transactions per second (TPS) or latency requirements, we recommend hosting the models on dedicated endpoints.

Throughput

An ideal value of MME inference throughput depends on factors such as model, payload size, and endpoint instance type. A higher amount of instance memory enables you to have more models loaded and ready to serve inference requests. You don’t need to waste time loading the model. A higher amount of vCPUs enables you to invoke more unique models concurrently. MMEs dynamically load and unload the model to and from instance memory, which may impact I/O performance.

SageMaker MMEs with GPU work using NVIDIA Triton Inference Server, which is an open-source inference serving software that simplifies the inference serving process and provides high inference performance. SageMaker loads the model to the NVIDIA Triton container’s memory on a GPU accelerated instance and serves the inference request. The GPU core is shared by all the models in an instance. If the model is already loaded in the container memory, the subsequent requests are served faster because SageMaker doesn’t need to download and load it again.

Proper performance testing and analysis is recommended for successful production deployments. SageMaker provides CloudWatch metrics for multi-model endpoints so you can determine the endpoint usage and the cache hit rate to help optimize your endpoint.

Scaling configuration complexity – SageMaker multi-model endpoints fully support auto scaling, which manages replicas of models to ensure models scale based on traffic patterns. However, proper load testing is recommended to determine the optimal instance size for auto scaling the endpoint. Right-sizing the MME fleet is important to avoid having too many models unloading. Loading hundreds of models on a few larger instances may lead to throttling in some cases, and using more and smaller instances could be preferable. To take advantage of automated model scaling in SageMaker, make sure you have instance auto scaling set up to provision additional instance capacity. Set up your endpoint-level scaling policy with either custom parameters or invocations per minute (recommended) to add more instances to the endpoint fleet. The invocation rates used to trigger an auto scale event are based on the aggregate set of predictions across the full set of models served by the endpoint.
Traffic pattern – MMEs are ideal when you have a large number of similarly sized models that you can serve through a shared serving container and don’t need to access all the models at the same time.

SageMaker multi-container endpoints

SageMaker MCEs support deploying up to 15 containers that use different models or frameworks on a single endpoint, and invoking them independently or in sequence for low-latency inference and cost savings. The models can be completely heterogenous, with their own independent serving stack. Securely hosting multiple models from different frameworks on a single instance could save you up to 90% in cost.

The MCE invocation patterns are as follows:

  • Inference pipelines – Containers in an MCE can be invoked in a linear sequence, also known as a serial inference pipeline. They are typically used to separate preprocessing, model inference, and postprocessing into independent containers. The output from the current container is passed as input to the next. They are represented as a single pipeline model in SageMaker. An inference pipeline can be deployed as an MME, where one of the containers in the pipeline can dynamically serve requests based on the model being invoked.
  • Direct invocation – With direct invocation, a request can be sent to a specific inference container hosted on an MCE, as sketched after this list.
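
The following is a minimal boto3 sketch of an MCE in direct invocation mode, as referenced above; the container images, model artifacts, role ARN, instance type, and resource names are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# Two heterogeneous serving stacks behind one endpoint, invoked directly.
sm.create_model(
    ModelName="my-multi-container-model",
    ExecutionRoleArn=role,
    Containers=[
        {
            "ContainerHostname": "tensorflow-container",
            "Image": "<tensorflow-serving-image-uri>",        # placeholder
            "ModelDataUrl": "s3://my-bucket/tf-model.tar.gz",  # placeholder
        },
        {
            "ContainerHostname": "pytorch-container",
            "Image": "<pytorch-serving-image-uri>",            # placeholder
            "ModelDataUrl": "s3://my-bucket/pt-model.tar.gz",   # placeholder
        },
    ],
    InferenceExecutionConfig={"Mode": "Direct"},  # "Serial" would chain the containers
)

sm.create_endpoint_config(
    EndpointConfigName="my-mce-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-multi-container-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(EndpointName="my-mce-endpoint", EndpointConfigName="my-mce-config")

# Direct invocation: address a specific container by hostname.
response = runtime.invoke_endpoint(
    EndpointName="my-mce-endpoint",
    TargetContainerHostname="pytorch-container",
    ContentType="application/json",
    Body=b'{"inputs": [1, 2, 3]}',
)
```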

The following table provides guidance on evaluating the fitness functions for MCEs.

Fitness function Description
Cost – MCEs enable you to run up to 15 different ML containers on a single endpoint and invoke them independently, thereby saving costs. This option is ideal when you have multiple models running on different serving stacks with similar resource needs, and when individual models don’t have sufficient traffic to utilize the full capacity of the endpoint instances. MCEs are therefore more cost-effective than single-model endpoints. MCEs offer synchronous inference response, which means the endpoint is always available and you pay for the uptime of the instance. Cost can add up depending on the number and type of instances.
Inference latency – MCEs are ideal for running ML apps with different ML frameworks and algorithms for each model that are accessed infrequently but still require low-latency inference. The models are always available for low-latency inference and there is no cold start problem.
Throughput – MCEs are limited to up to 15 containers on a multi-container endpoint, and GPU inference is not supported due to resource contention. For multi-container endpoints using direct invocation mode, SageMaker not only provides instance-level metrics as it does with other common endpoints, but also supports per-container metrics. As a best practice, review CloudWatch metrics for input requests and resource utilization, and select the appropriate instance type to achieve optimal throughput.
Scaling configuration complexity – MCEs support auto scaling. However, in order to configure automatic scaling, it is recommended that the model in each container exhibits similar CPU utilization and latency on each inference request. This is recommended because if traffic to the multi-container endpoint shifts from a low CPU utilization model to a high CPU utilization model, but the overall call volume remains the same, the endpoint doesn’t scale out, and there may not be enough instances to handle all the requests to the high CPU utilization model.
Traffic pattern – MCEs are ideal for workloads with continual or regular traffic patterns, for hosting models across different frameworks (such as TensorFlow, PyTorch, or Sklearn) that may not have sufficient traffic to saturate the full capacity of an endpoint instance.

Hosting a multi-model based ML application

Many business applications need to use multiple ML models to serve a single prediction request to their consumers. Consider, for example, a retail company that wants to provide recommendations to its users. The ML application in this use case may want to use different custom models for recommending different categories of products. If the company wants to add personalization to the recommendations by using individual user information, the number of custom models further increases. Hosting each custom model on a distinct compute instance is not only cost prohibitive, but also leads to underutilization of the hosting resources if not all models are frequently used. SageMaker offers efficient hosting options for multi-model based ML applications.

The following diagram shows multi-model hosting options for a single endpoint using SageMaker.

Serial inference pipeline

An inference pipeline is a SageMaker model that is composed of a linear sequence of 2–15 containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and postprocessing data science tasks. The output from one container is passed as input to the next. When defining the containers for a pipeline model, you also specify the order in which the containers are run. They are represented as a single pipeline model in SageMaker. The inference pipeline can be deployed as an MME, where one of the containers in the pipeline can dynamically serve requests based on the model being invoked. You can also run a batch transform job with an inference pipeline. Inference pipelines are fully managed.
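
A minimal sketch of a two-container serial inference pipeline with the SageMaker Python SDK follows; the model artifacts, entry point scripts, framework versions, role ARN, and endpoint name are placeholders.

```python
import sagemaker
from sagemaker.pipeline import PipelineModel
from sagemaker.sklearn import SKLearnModel
from sagemaker.xgboost import XGBoostModel

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# Container 1: feature preprocessing packaged as an SKLearn model (placeholder artifacts).
preprocessor = SKLearnModel(
    model_data="s3://my-bucket/preprocessor.tar.gz",
    role=role,
    entry_point="preprocessing.py",
    framework_version="1.2-1",
)

# Container 2: the trained predictor (placeholder artifacts).
predictor_model = XGBoostModel(
    model_data="s3://my-bucket/xgb-model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.7-1",
)

# Containers run in order on the same instance; the output of one feeds the next.
pipeline_model = PipelineModel(
    name="my-inference-pipeline",
    role=role,
    models=[preprocessor, predictor_model],
    sagemaker_session=session,
)

pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-pipeline-endpoint",  # placeholder
)
```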

The following table provides guidance on evaluating the fitness functions for ML model hosting using a serial inference pipeline.

Fitness function Description
Cost – A serial inference pipeline enables you to run up to 15 different ML containers on a single endpoint, making it cost-effective to host the inference containers. There are no additional costs for using this feature. You pay only for the instances running on an endpoint. Cost can add up depending on the number and type of instances.
Inference latency – When an ML application is deployed as an inference pipeline, the data between different models doesn’t leave the container space. Feature processing and inferences run with low latency because the containers are co-located on the same EC2 instances.
Throughput – Within an inference pipeline model, SageMaker handles invocations as a sequence of HTTP requests. The first container in the pipeline handles the initial request, then the intermediate response is sent as a request to the second container, and so on, for each container in the pipeline. SageMaker returns the final response to the client. Throughput depends on factors such as the model, model input size, batch size, and endpoint instance type. As a best practice, review CloudWatch metrics for input requests and resource utilization, and select the appropriate instance type to achieve optimal throughput.
Scaling configuration complexity – Serial inference pipelines support auto scaling. However, in order to configure automatic scaling, it is recommended that the model in each container exhibits similar CPU utilization and latency on each inference request. This is recommended because if traffic to the endpoint shifts from a low CPU utilization model to a high CPU utilization model, but the overall call volume remains the same, the endpoint doesn’t scale out and there may not be enough instances to handle all the requests to the high CPU utilization model.
Traffic pattern – Serial inference pipelines are ideal for predictable traffic patterns with models that run sequentially on the same endpoint.

Deploying model ensembles (Triton DAG)

SageMaker offers integration with NVIDIA Triton Inference Server through Triton Inference Server Containers. These containers include NVIDIA Triton Inference Server, support for common ML frameworks, and useful environment variables that let you optimize performance on SageMaker. With NVIDIA Triton container images, you can easily serve ML models and benefit from the performance optimizations, dynamic batching, and multi-framework support provided by NVIDIA Triton. Triton helps maximize the utilization of GPU and CPU, further lowering the cost of inference.

In business use cases where ML applications use several models to serve a prediction request, if each model uses a different framework or is hosted on a separate instance, it may lead to increased workload and cost as well as an increase in overall latency. SageMaker NVIDIA Triton Inference Server supports deployment of models from all major frameworks, such as TensorFlow GraphDef, TensorFlow SavedModel, ONNX, PyTorch TorchScript, TensorRT, and Python/C++ model formats. A Triton model ensemble represents a pipeline of one or more models, or preprocessing and postprocessing logic, and the connection of input and output tensors between them. A single inference request to an ensemble triggers the run of the entire pipeline. Triton also has multiple built-in scheduling and batching algorithms that combine individual inference requests to improve inference throughput. These scheduling and batching decisions are transparent to the client requesting inference. The models can be run on CPUs or GPUs for maximum flexibility and to support heterogeneous computing requirements.

Hosting multiple GPU backed models on multi-model endpoints is supported through the SageMaker Triton Inference Server. The NVIDIA Triton Inference Server has been extended to implement an MME API contract, to integrate with MMEs. You can use the NVIDIA Triton Inference Server, which creates a model repository configuration for different framework backends, to deploy an MME with auto scaling. This feature allows you to scale hundreds of hyper-personalized models that are fine-tuned to cater to unique end-user experiences in AI applications. You can also use this feature to achieve the desired price performance for your inference application by using fractional GPUs. To learn more, refer to Run multiple deep learning models on GPU with Amazon SageMaker multi-model endpoints.
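
To illustrate one way this can be wired up, the following boto3 sketch deploys a Triton model repository (which can include ensemble definitions) as a GPU-backed MME; the SageMaker Triton container image URI, S3 prefix, instance type, and model package name are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# One Triton serving container in multi-model mode; model packages (including
# ensemble definitions) live under the shared S3 prefix.
sm.create_model(
    ModelName="triton-gpu-mme",
    ExecutionRoleArn=role,
    PrimaryContainer={
        "Image": "<sagemaker-triton-inference-server-image-uri>",  # placeholder
        "ModelDataUrl": "s3://my-bucket/triton-models/",           # shared prefix
        "Mode": "MultiModel",
    },
)

sm.create_endpoint_config(
    EndpointConfigName="triton-gpu-mme-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "triton-gpu-mme",
        "InstanceType": "ml.g4dn.xlarge",  # GPU instance shared by all models
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(EndpointName="triton-gpu-mme-endpoint",
                   EndpointConfigName="triton-gpu-mme-config")

# Invoke a specific (possibly ensemble) model package under the shared prefix.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="triton-gpu-mme-endpoint",
    TargetModel="ensemble_model.tar.gz",  # placeholder package name
    ContentType="application/octet-stream",
    Body=b"<binary tensor payload>",
)
```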

The following table provides guidance on evaluating the fitness functions for ML model hosting using MMEs with GPU support on Triton inference containers. For single-model endpoints and serverless endpoint fitness function evaluations, refer to the earlier sections in this post.

Fitness function Description
Cost – SageMaker MMEs with GPU support using Triton Inference Server provide a scalable and cost-effective way to deploy a large number of deep learning models behind one SageMaker endpoint. With MMEs, multiple models share the GPU instance behind an endpoint. This enables you to break the linearly increasing cost of hosting multiple models and reuse infrastructure across all the models. You pay for the uptime of the instance.
Inference latency

SageMaker with Triton Inference Server is purpose-built to maximize throughput and hardware utilization with ultra-low (single-digit milliseconds) inference latency. It has a wide range of supported ML frameworks (including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT) and infrastructure backends, including NVIDIA GPUs, CPUs, and AWS Inferentia.

With MME support for GPU using SageMaker Triton Inference Server, you can deploy thousands of deep learning models behind one SageMaker endpoint. SageMaker loads the model to the NVIDIA Triton container’s memory on a GPU accelerated instance and serves the inference request. The GPU core is shared by all the models in an instance. If the model is already loaded in the container memory, the subsequent requests are served faster because SageMaker doesn’t need to download and load it again.

Throughput

MMEs offer capabilities for running multiple deep learning or ML models on the GPU at the same time with Triton Inference Server. This allows you to easily use the NVIDIA Triton multi-framework, high-performance inference serving together with fully managed SageMaker model deployment.

Triton supports all NVIDIA GPU-, x86-, Arm® CPU-, and AWS Inferentia-based inferencing. It offers dynamic batching, concurrent runs, optimal model configuration, model ensemble, and streaming audio and video inputs to maximize throughput and utilization. Other factors such as network and payload size may play a minimal role in the overhead associated with the inference.

Scaling configuration complexity

MMEs can scale horizontally using an auto scaling policy, and provision additional GPU compute instances based on metrics such as InvocationsPerInstance and GPUUtilization to serve any traffic surge to MME endpoints.

With Triton inference server, you can easily build a custom container that includes your model with Triton and bring it to SageMaker. SageMaker Inference will handle the requests and automatically scale the container as usage increases, making model deployment with Triton on AWS easier.

Traffic pattern

MMEs are ideal for predictable traffic patterns with models run as DAGs on the same endpoint.

SageMaker takes care of traffic shaping to the MME endpoint and maintains optimal model copies on GPU instances for best price performance. It continues to route traffic to the instance where the model is loaded. If the instance resources reach capacity due to high utilization, SageMaker unloads the least-used models from the container to free up resources to load more frequently used models.

Best practices

Consider the following best practices:

  • High cohesion and low coupling between models – Host models that have high cohesion (drive a single business functionality) in the same container and encapsulate them together for ease of upgrade and manageability. At the same time, decouple those models from other models (host them in different containers) so that you can easily upgrade one model without impacting other models. Host multiple models that use different containers behind one endpoint and invoke them independently, or add model preprocessing and postprocessing logic as a serial inference pipeline.
  • Inference latency – Group the models that drive a single business functionality and host them in a single container to minimize the number of hops and therefore minimize the overall latency. There are caveats: for example, if the grouped models use multiple frameworks, you might choose to host them in multiple containers that run on the same host to reduce latency and minimize cost.
  • Logically group ML models with high cohesion – The logical group may consist of models that are homogeneous (for example, all XGBoost models) or heterogeneous (for example, a few XGBoost and a few BERT). It may consist of models that are shared across multiple business functionalities or may be specific to fulfilling only one business functionality.
    • Shared models – If the logical group consists of shared models, the ease of upgrading the models and latency will play a major role in architecting the SageMaker endpoints. For example, if latency is a priority, it’s better to place all the models in a single container behind a single SageMaker endpoint to avoid multiple hops. The downside is that if any of the models need to be upgraded, it will result in upgrading all the relevant SageMaker endpoints hosting this model.
    • Non-shared models – If the logical group consists of only business-feature-specific models and is not shared with other groups, packaging complexity and latency become the key dimensions to consider. It’s advisable to host these models in a single container behind a single SageMaker endpoint.
  • Efficient use of hardware (CPU, GPU) – Group CPU-based models together and host them on the same host so that you can efficiently use the CPU. Similarly, group GPU-based models together so that you can efficiently use and scale them. There are hybrid workloads that require both CPU and GPU on the same host. Hosting the CPU-only and GPU-only models on the same host should be driven by high cohesion and application latency requirements. Additionally, cost, ability to scale, and blast radius on impact in case of failure are the key dimensions to look into.
  • Fitness functions – Use fitness functions as a guideline for selecting an ML hosting option.

Conclusion

When it comes to ML hosting, there is no one-size-fits-all approach. ML practitioners need to choose the right design pattern to address their ML hosting challenges. Evaluating the fitness functions provides prescriptive guidance on selecting the right ML hosting option.

For more details on each of the hosting options, refer to the following posts in this series:


About the authors

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Deepali Rajale is an AI/ML Specialist Technical Account Manager at Amazon Web Services. She works with enterprise customers, providing technical guidance on implementing machine learning solutions with best practices. In her spare time, she enjoys hiking, movies, and hanging out with family and friends.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Best practices for creating Amazon Lex interaction models

Amazon Lex is an AWS service for building conversational interfaces into any application using voice and text, enabling businesses to add sophisticated, natural language chatbots across different channels. Amazon Lex uses machine learning (ML) to understand natural language (normal conversational text and speech). In this post, we go through a set of best practices for using ML to create a bot that will delight your customers by accurately understanding them. This allows your bot to have more natural conversations that don’t require the user to follow a set of strict instructions. Designing and building an intelligent conversational interface is very different than building a traditional application or website, and this post will help you develop some of the new skills required.

Let’s look at some of the terminology we use frequently in this post:

  • Utterance – The phrase the user says to your live bot.
  • Sample utterance – Some examples of what users might say. These are attached to intents and used to train the bot.
  • Intent – This represents what the user meant and should be clearly connected to a response or an action from the bot. For instance, an intent that responds to a user saying hello, or an intent that can respond and take action if a user wants to order a coffee. A bot has one or more intents that utterances can be mapped to.
  • Slot – A parameter that can capture specific types of information from the utterance (for example, the time of an appointment or the customer’s name). Slots are attached to intents.
  • Slot value – Either examples of what the slot should capture, or a specific list of values for a slot (for example, large, medium, and small as values for a slot for coffee sizes).

The following image shows how all these pieces fit together to make up your bot.

A diagram showing how an interaction with an Amazon Lex bot flows through automatic speech recognition, natural language understanding, fulfillment (including conversational user experience) and back to text to speech

Building a well-designed bot requires several different considerations. These include requirements gathering and discovery, conversational design, testing through automation and with users, and monitoring and optimizing your bot. Within the conversational design aspect, there are two main elements: the interaction model and the conversational or voice user experience (CUX/VUX). CUX and VUX encompass the personality of the bot, the types of responses, the flow of the conversation, variations for modality, and how the bot handles unexpected inputs or failures. The interaction model is the piece that can take what the user said (utterance) and map it to what they meant (intent). In this post, we only look at how to design and optimize your interaction model.

Because Amazon Lex uses machine learning, that puts the creator of the bot in the role of machine teacher. When we build a bot, we need to give it all the knowledge it needs about the types of conversations it will support. We do this both by how we configure the bot (intents and slots) and the training data we give it (sample utterances and slot values). The underlying service then enriches it with knowledge about language generally, enabling it to understand phrases beyond the exact data we have given it.

The best practices listed in the following sections can support you in building a bot that will give your customers a great user experience and work well for your use case.

Creating intents

Each intent is a concept you teach your bot to understand. For instance, it could be an intent that represents someone ordering a coffee, or someone greeting your bot. You need to make sure that you make it really clear and easy for the bot to recognize that a particular utterance should be matched to that intent.

Imagine if someone gave you a set of index cards with phrases on them, each sorted into piles, but with no other context or details. They then started to give you additional index cards with phrases and asked you to add them to the right pile, simply based on the phrases on the cards in each pile. If each pile represented a clear concept with similar phrasing, this would be easy. But if there were no clear topic in each, you would struggle to work out how to match them to a pile. You may even start to use other clues, like “these are all short sentences” or “only these have punctuation.”

Your bot uses similar techniques, but remember that although ML is smart, it’s not as smart as a human, and doesn’t have all the external knowledge and context a human has. If a human with no context of what your bot does might struggle to understand what was meant, your bot likely will too. The best practices in this section can help you create intents that will be recognizable and more likely to be matched with the desired utterance.

1. Each intent should represent a single concept

Each intent should represent one concept or idea, and not just a topic. It’s okay to have multiple intents that map to the same action or response if separating them gives each a clearer, cohesive concept. Let’s look at some dos and don’ts:

  • Don’t create generic intents that group multiple concepts together.

For example, the following intent combines phrases about a damaged product and more general complaint phrases:

DamageComplaint
I've received a damaged product
i received a damaged product
I'm really frustrated
Your company is terrible at deliveries
My product is broken
I got a damaged package
I'm going to return this order
I'll never buy from you again

The following intent is another example, which combines updating personal details with updating the mobile application:

UpdateNeeded
I need to update my address
Can I update the address you have for me
How do I update my telephone number
I can't get the update for the mobile app to work
Help me update my iphone app
How do I get the latest version of the mobile app

  • Do split up intents when they have very different meanings. For example, we can split up the UpdateNeeded intent from the previous example into two intents:

UpdatePersonalDetails
I need to update my address
Can I update the address you have for me
How do I update my telephone number

UpdateMobileApp
I can't get the update for the mobile app to work
Help me update my iphone app
How do I get the latest version of the mobile app

  • Do split up intents when they have the same action or response needed, but use very different phrasing. For example, the following two intents may have the same end result, but the first is directly telling us they need to tow their car, whereas the second is only indirectly hinting that they may need their car towed.

RoadsideAssistanceRequested
I need to tow my car
Can I get a tow truck
Can you send someone out to get my car

RoadsideAssistanceNeeded
I've had an accident
I hit an animal
My car broke down

2. Reduce overlap between intents

Let’s think about that stack of index cards again. If there were cards with the same (or very similar) phrases, it would be hard to know which stack to add a new card with that phrase onto. It’s the same in this case. We want really clear-cut sets of sample utterances in each intent. The following are a few strategies:

  • Don’t create intents with very similar phrasing that have similar meanings. Because Amazon Lex generalizes beyond the sample utterances, phrases that aren’t clearly tied to one specific intent can get mismatched, for instance a customer saying “I’d like to book an appointment” when there are two appointment intents, like the following:

BookDoctorsAppointment
I’d like to book a doctors appointment

BookBloodLabAppointment
I’d like to book a lab appointment

  • Do use slots to combine intents that are on the same topic and have similar phrasing. For example, by combining the two intents in the previous example, we can more accurately capture any requests for an appointment, and then use a slot to determine the correct type of appointment:

BookAppointment
I’d like to book a {appointmentType} appointment

  • Don’t create intents where one intent is a subset of another. For example, as your bot grows, it can be easy to start creating intents to capture more detailed information:

BookFlight
I'd like to book a flight
book me a round trip flight
i need to book flight one way

BookOneWayFlight
book me a one-way flight
I’d like to book a one way flight
i need to book flight one way please

  • Do use slots to capture different subsets of information within an intent. For example, instead of using different intents to capture the information on the type of flight, we can use a slot to capture this:

BookFlight
I'd like to book a flight
book me a {itineraryType} flight
i need to book flight {itineraryType}
I’d like to book a {itineraryType} flight

3. Have the right amount of data

In ML, training data is key. Hundreds or thousands of samples are often needed to get good results. You’ll be glad to hear that Amazon Lex doesn’t require a huge amount of data, and in fact you don’t want to have too many sample utterances in each intent, because they may start to diverge or add confusion. However, it is key that we provide enough sample utterances to create a clear pattern for the bot to learn from.

Consider the following:

  • Have at least 15 utterances per intent.
  • Add additional utterances incrementally (batches of 10–15) so you can test the performance in stages. A larger number of utterances is not necessarily better.
  • Review intents with a large number of utterances (over 100) to evaluate if you can either remove very similar utterances, or should split the intent into multiple intents.
  • Keep the number of utterances similar across intents. This allows recognition for each intent to be balanced, and avoids accidentally biasing your bot to certain intents.
  • Regularly review your intents based on learnings from your production bot, and continue to add and adjust the utterances. Designing and developing a bot is an iterative process that never stops.

4. Have diversity in your data

Amazon Lex is a conversational AI—its primary purpose is to chat with humans. Humans tend to have a large amount of variety in how they phrase things. When designing a bot, we want to make sure we’re capturing that range in our intent configuration. It’s important to re-evaluate and update your configuration and sample data on a regular basis, especially if you’re expanding or changing your user base over time. Consider the following recommendations:

  • Do have a diverse range of utterances in each intent. The following are examples of the types of diversity you should consider:
    • Utterance lengths – The following is an example of varying lengths:

BookFlight
book flight
I need to book a flight
I want to book a flight for my upcoming trip

    • Vocabulary – We need to align this with how our customers talk. You can capture this through user testing or by using the conversational logs from your bot. For example:

OrderFlowers
I want to buy flowers
Can I order flowers
I need to get flowers

    • Phrasing – We need a mix of utterances that represent the different ways our customers might phrase things. The following example shows utterances using “book” as a verb, “booking” as a noun, “flight booking” as a subject, and formal and informal language:

BookFlight
I need to book a flight
can you help with a flight booking
Flight booking is what I am looking for
please book me a flight
I'm gonna need a flight

    • Punctuation – We should include a range of common usage. We should also include non-grammatical usage if this is something a customer would use (especially when typing). See the following example:

OrderFlowers
I want to order flowers.
i wanted to get flowers!
Get me some flowers... please!!

    • Slot usage – Provide sample utterances that show both using and not using slots. Use different mixes of slots across those that include them. Make sure the slots have examples with different places they could appear in the utterance. For example:

CancelAppointment
Cancel appointment
Cancel my appointment with Dr. {DoctorLastName}
Cancel appointment on {AppointmentDate} with Dr. {DoctorLastName}
Cancel my appointment on {AppointmentDate}
Can you tell Dr. {DoctorLastName} to cancel my appointment
Please cancel my doctors appointment

  • Don’t keep adding utterances that are just small variances in phrasing. Amazon Lex is able to handle generalizing these for you. For example, you wouldn’t require each of these three variations as the differences are minor:

DamagedProductComplaint
I've received a damaged product
I received a damaged product
Received damaged product

  • Don’t add diversity to some intents but not to others. We need to be consistent with the forms of diversity we add. Remember the index cards from the beginning—when an utterance isn’t clear, the bot may start to use other clues, like sentence length or punctuation, to try to make a match. There are times you may want to use this to your advantage (for example, if you genuinely want to direct all one-word phrases to a particular intent), but it’s important you avoid doing this by accident.

Creating slots

We touched on some good practices involving slots in the previous section, but let’s look at some more specific best practices for slots.

5. Use short noun or adjective phrases for slots

Slots represent something that can be captured definitively as a parameter, like the size of the coffee you want to order, or the airport you’re flying to. Consider the following:

  • Use nouns or short adjectives for your slot values. Don’t use slots for things like carrier phrases (“how do I” or “what could I”) because this will reduce the ability of Amazon Lex to generalize your utterances. Try to keep slots for values you need to capture to fulfill your intent.
  • Keep slots generally to one or two words.

6. Prefer slots over explicit values

You can use slots to generalize the phrases you’re using, but we need to stick to the recommendations we just reviewed as well. To make our slot values as easy to identify as possible, we never use values included in the slot directly in sample utterances. Keep in mind the following tips:

  • Don’t explicitly include values that could be slots in the sample utterances. For example:

OrderFlowers
I want to buy roses
I want to buy lilies
I would love to order some orchids
I would love to order some roses

  • Do use slots to reduce repetition (a programmatic sketch of this pattern follows this list). For example:

OrderFlowers
I want to buy {flowers}
I would love to order some {flowers}

flowers
roses
lilies
orchids

  • Don’t mix slots and real values in the sample utterances. For example:

OrderFlowers
I want to buy {flowers}
I want to buy lilies
I would love to order some {flowers}

flowers
roses
lilies
orchids

  • Don’t have intents whose sample utterances contain only slots if the slot types are AlphaNumeric, Number, Date, or GRXML, are very broad custom slots, or include abbreviations. Instead, expand the sample utterances by adding conversational phrases that include the slot.
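
To make the slot-based pattern from the list above concrete, the following is a minimal, hypothetical sketch using the Lex V2 model-building API (boto3 lexv2-models). It assumes a draft bot and en_US locale already exist; the bot ID and prompt text are placeholders, and the intent is created first so the slot can be attached before the utterances that reference it are added.

import boto3

lex = boto3.client("lexv2-models")
BOT_ID, BOT_VERSION, LOCALE = "BOTID12345", "DRAFT", "en_US"   # placeholders

# 1. A custom slot type holding the values the slot should capture.
slot_type = lex.create_slot_type(
    slotTypeName="flowers",
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    valueSelectionSetting={"resolutionStrategy": "TopResolution"},
    slotTypeValues=[{"sampleValue": {"value": v}} for v in ("roses", "lilies", "orchids")],
)

# 2. The intent, created first so the slot can be attached to it.
intent = lex.create_intent(
    intentName="OrderFlowers",
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
)

# 3. The slot itself, attached to the intent.
lex.create_slot(
    slotName="flowers",
    slotTypeId=slot_type["slotTypeId"],
    intentId=intent["intentId"],
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    valueElicitationSetting={
        "slotConstraint": "Required",
        "promptSpecification": {
            "messageGroups": [{"message": {"plainTextMessage": {"value": "Which flowers would you like?"}}}],
            "maxRetries": 2,
        },
    },
)

# 4. Sample utterances that reference the slot instead of hard-coded values.
lex.update_intent(
    intentId=intent["intentId"],
    intentName="OrderFlowers",
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    sampleUtterances=[
        {"utterance": "I want to buy {flowers}"},
        {"utterance": "I would love to order some {flowers}"},
    ],
)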

7. Keep your slot values coherent

The bot has to decide whether to match a slot based only on what it can learn from the values we have entered. If there is a lot of similarity or overlap within slots in the same intent, this can cause challenges with the right slot being matched.

  • Don’t have slots with overlapping values in the same intent. Try to combine them instead. For example:

pets
cat
dog
goldfish

animals
horse
cat
dog

8. Consider how the words will be transcribed

Amazon Lex uses automated speech recognition (ASR) to transcribe speech. This means that all inputs to your Amazon Lex interaction model are processed as text, even when using a voice bot. We need to remember that a transcription may vary from how users might type the same thing. Consider the following:

  • Enter acronyms, or other words whose letters should be pronounced individually, as single letters separated by a period and a space. This will more closely match how it will be transcribed. For example:

A. T. M.
A. W. S.
P. A.

  • Review the audio and transcriptions on a regular basis, so you can adjust your sample utterances or slot types. To do this, turn on conversation logs, and enable both text and audio logs, whenever possible.

9. Use the right options available for your slots

Many different types of slots and options are available, and using the best options for each of our slots can help the recognition of those slot values. We always want to take the time to understand the options before deciding on how to design our slots:

  • Use the restrict option to limit slots to a closed set of values. You can define synonyms for each value. This could be, for instance, the menu items in your restaurant.
  • Use the expand option when you want to be able to identify more than just the sample values you provide (for example, Name).
  • Turn obfuscation on for slots that collect sensitive data to prevent the data from being logged (shown in the sketch after this list).
  • Use runtime hints to improve slot recognition when you can narrow down the potential options at runtime. Choosing one slot might narrow down the options for another; for example, a particular type of furniture may not have all color options.
  • Use spelling styles to capture uncommon words or words with variations in spellings such as names.
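
The following is a small, hypothetical boto3 (lexv2-models) sketch of two of these options: the resolution strategy that restricts a slot type to its defined values (with synonyms) versus one that expands beyond them, and obfuscation on a sensitive slot. The bot and intent IDs are placeholders.

import boto3

lex = boto3.client("lexv2-models")
BOT_ID, BOT_VERSION, LOCALE = "BOTID12345", "DRAFT", "en_US"   # placeholders

# Restrict: the slot resolves only to the listed values; synonyms map back to them.
lex.create_slot_type(
    slotTypeName="coffeeSize",
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    valueSelectionSetting={"resolutionStrategy": "TopResolution"},
    slotTypeValues=[
        {"sampleValue": {"value": "small"}, "synonyms": [{"value": "tall"}]},
        {"sampleValue": {"value": "medium"}, "synonyms": [{"value": "grande"}]},
        {"sampleValue": {"value": "large"}, "synonyms": [{"value": "venti"}]},
    ],
)

# Expand: sample values only guide recognition, so unseen values (like new names) can still be captured.
name_type = lex.create_slot_type(
    slotTypeName="customerName",
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    valueSelectionSetting={"resolutionStrategy": "OriginalValue"},
    slotTypeValues=[{"sampleValue": {"value": "Jane Doe"}}],
)

# Obfuscation: mask the captured value in conversation logs for a sensitive slot.
lex.create_slot(
    slotName="customerName",
    slotTypeId=name_type["slotTypeId"],
    intentId="INTENTID123",                      # hypothetical existing intent
    botId=BOT_ID, botVersion=BOT_VERSION, localeId=LOCALE,
    obfuscationSetting={"obfuscationSettingType": "DefaultObfuscation"},
    valueElicitationSetting={
        "slotConstraint": "Required",
        "promptSpecification": {
            "messageGroups": [{"message": {"plainTextMessage": {"value": "What name is the booking under?"}}}],
            "maxRetries": 2,
        },
    },
)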

10. Use custom vocabulary for specialist domains

In most cases, a custom vocabulary is not required, but can be helpful if your users will use specialist words not common in everyday language. In this case, adding one can be helpful in making sure that your transcriptions are accurate. Keep the following in mind:

  • Do use a custom vocabulary to add words that aren’t readily recognized by Amazon Lex in voice-based conversations. This improves the speech-to-text transcription and overall customer experience.
  • Don’t use short or common words like “on,” “it,” “to,” “yes,” or “no” in a custom vocabulary.
  • Do decide how much weight to give a word based on how often the word isn’t recognized in the transcription and how rare the word is in the input. Words that are difficult to pronounce require a higher weight. Use a representative test set to determine if a weight is appropriate. You can collect an audio test set by turning on audio logging in conversation logs.
  • Do use custom slot types for lists of catalog values or entities such as product names or mutual funds.

11. GRXML slots need a strict grammar

When migrating to Amazon Lex from a service that may already have grammars in place (such as traditional automatic speech recognition engines), it is possible to reuse GRXML grammars during the new bot design process. However, when creating a completely new Amazon Lex bot, we recommend first checking if other slot types might meet your needs before using GRXML. Consider the following:

  • Do use GRXML slots only for spoken input, and not text-based interactions.
  • Don’t add the carrier phrases for the GRXML slots in the GRXML file (grammar) itself.
  • Do put carrier phrases into the slot sample utterances, such as I live in {zipCode} or {zipCode} is my zip code.
  • Do author the grammar to only capture correct slot values. For example, to capture a five-digit US ZIP code, you should only accept values that are exactly five digits.

Summary

In this post, we walked through a set of best practices that should help you as you design and build your next bot. As you take away this information, it’s important to remember that best practices are always context dependent. These aren’t rules, but guidelines to help you build a high-performing chatbot. As you keep building and optimizing your own bots, you will find some of these are more important for your use case than others, and you might add your own additional best practices. As a bot creator, you have a lot of control over how you configure your Amazon Lex bot to get the best results for your use case, and these best practices should give you a great place to start.

We can summarize the best practices in this post as follows:

  • Keep each intent to a single clear concept with a coherent set of utterances
  • Use representative, balanced, and diverse sample utterance data
  • Use slots to make intents clearer and capture data
  • Keep each slot to a single topic with a clear set of values
  • Know and use the right type of slot for your use case

For more information on Amazon Lex, check out Getting started with Amazon Lex for documentation, tutorials, how-to videos, code samples, and SDKs.


About the Author

Gillian Armstrong is a Builder Solutions Architect. She is excited about how the Cloud is opening up opportunities for more people to use technology to solve problems, and especially excited about how cognitive technologies, like conversational AI, are allowing us to interact with computers in more human ways.

Power recommendations and search using an IMDb knowledge graph – Part 3

This three-part series demonstrates how to use graph neural networks (GNNs) and Amazon Neptune to generate movie recommendations using the IMDb and Box Office Mojo Movies/TV/OTT licensable data package, which provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries. Many AWS media and entertainment customers license IMDb data through AWS Data Exchange to improve content discovery and increase customer engagement and retention.

The following diagram illustrates the complete architecture implemented as part of this series.

In Part 1, we discussed the applications of GNNs and how to transform and prepare our IMDb data into a knowledge graph (KG). We downloaded the data from AWS Data Exchange and processed it in AWS Glue to generate KG files. The KG files were stored in Amazon Simple Storage Service (Amazon S3) and then loaded in Amazon Neptune.

In Part 2, we demonstrated how to use Amazon Neptune ML (in Amazon SageMaker) to train the KG and create KG embeddings.

In this post, we walk you through how to apply our trained KG embeddings in Amazon S3 to out-of-catalog search use cases using Amazon OpenSearch Service and AWS Lambda. You also deploy a local web app for an interactive search experience. All the resources used in this post can be created using a single AWS Cloud Development Kit (AWS CDK) command as described later in the post.

Background

Have you ever inadvertently searched for a content title that wasn’t available in a video streaming platform? If so, instead of facing a blank search results page, you often find a list of movies in the same genre or with the same cast or crew members. That’s an out-of-catalog search experience!

Out-of-catalog search (OOC) is when you enter a search query that has no direct match in a catalog. This event frequently occurs in video streaming platforms that constantly purchase a variety of content from multiple vendors and production companies for a limited time. The absence of relevancy or mapping from a streaming company’s catalog to large knowledge bases of movies and shows can result in a sub-par search experience for customers that query OOC content, thereby lowering the interaction time with the platform. This mapping can be done by manually mapping frequent OOC queries to catalog content or can be automated using machine learning (ML).

In this post, we illustrate how to handle OOC by utilizing the power of the IMDb dataset (the premier source of global entertainment metadata) and knowledge graphs.

OpenSearch Service is a fully managed service that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch. OpenSearch Service offers the latest versions of OpenSearch, support for 19 versions of Elasticsearch (1.5 to 7.10 versions), as well as visualization capabilities powered by OpenSearch Dashboards and Kibana (1.5 to 7.10 versions). OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management processing trillions of requests per month. OpenSearch Service offers kNN search, which can enhance search in use cases such as product recommendations, fraud detection, and image, video, and some specific semantic scenarios like document and query similarity. For more information about the natural language understanding-powered search functionalities of OpenSearch Service, refer to Building an NLU-powered search application with Amazon SageMaker and the Amazon OpenSearch Service KNN feature.

Solution overview

In this post, we present a solution to handle OOC situations through knowledge graph-based embedding search using the k-nearest neighbor (kNN) search capabilities of OpenSearch Service. The key AWS services used to implement this solution are OpenSearch Service, SageMaker, Lambda, and Amazon S3.

Check out Part 1 and Part 2 of this series to learn more about creating knowledge graphs and GNN embedding using Amazon Neptune ML.

Our OOC solution assumes that you have a combined KG obtained by merging a streaming company KG and IMDb KG. This can be done through simple text processing techniques that match titles along with the title type (movie, series, documentary), cast, and crew. Additionally, this joint knowledge graph has to be trained to generate knowledge graph embeddings through the pipelines mentioned in Part 1 and Part 2. The following diagram illustrates a simplified view of the combined KG.

To demonstrate the OOC search functionality with a simple example, we split the IMDb knowledge graph into customer-catalog and out-of-customer-catalog. We mark the titles that contain “Toy Story” as an out-of-customer catalog resource and the rest of the IMDb knowledge graph as customer catalog. In a scenario where the customer catalog is not enhanced or merged with external databases, a search for “toy story” would return any title that has the words “toy” or “story” in its metadata, with the OpenSearch text search. If the customer catalog was mapped to IMDb, it would be easier to glean that the query “toy story” doesn’t exist in the catalog and that the top matches in IMDb are “Toy Story,” “Toy Story 2,” “Toy Story 3,” “Toy Story 4,” and “Charlie: Toy Story” in decreasing order of relevance with text match. To get within-catalog results for each of these matches, we can generate five closest movies in customer catalog-based kNN embedding (of the joint KG) similarity through OpenSearch Service.

A typical OOC experience follows the flow illustrated in the following figure.

The following video shows the top five (number of hits) OOC results for the query “toy story” and relevant matches in the customer catalog (number of recommendations).

Here, the query is matched to the knowledge graph using text search in OpenSearch Service. We then map the embeddings of the text match to the customer catalog titles using the OpenSearch Service kNN index. Because the user query can’t be directly mapped to the knowledge graph entities, we use a two-step approach to first find title-based query similarities and then items similar to the title using knowledge graph embeddings. In the following sections, we walk through the process of setting up an OpenSearch Service cluster, creating and uploading knowledge graph indexes, and deploying the solution as a web application.
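
As a rough sketch of this two-step flow (not the exact Lambda code from the repo), the following opensearch-py snippet first runs a fuzzy text match against the title index and then a kNN query against the embedding index. The field names title and embedding, the domain endpoint, and the authentication setup (omitted here) are assumptions about the dataset and environment.

from opensearchpy import OpenSearch

# Assumed endpoint; in practice you would also configure IAM or basic auth.
client = OpenSearch(hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
                    use_ssl=True)

def out_of_catalog_search(query, num_hits=5, num_recs=5):
    # Step 1: fuzzy text match of the free-form query against IMDb titles.
    hits = client.search(index="ooc_text", body={
        "size": num_hits,
        "query": {"match": {"title": {"query": query, "fuzziness": "AUTO"}}},
    })["hits"]["hits"]

    results = []
    for hit in hits:
        # Step 2: kNN lookup of the matched title's KG embedding against catalog titles.
        # Assumes each indexed document also stores its embedding vector.
        neighbors = client.search(index="ooc_knn", body={
            "size": num_recs,
            "query": {"knn": {"embedding": {"vector": hit["_source"]["embedding"], "k": num_recs}}},
        })["hits"]["hits"]
        results.append({
            "ooc_title": hit["_source"]["title"],
            "recommendations": [n["_source"]["title"] for n in neighbors],
        })
    return results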

Prerequisites

To implement this solution, you should have an AWS account, familiarity with OpenSearch Service, SageMaker, Lambda, and AWS CloudFormation, and have completed the steps in Part 1 and Part 2 of this series.

Launch solution resources

The following architecture diagram shows the out-of-catalog workflow.

You will use the AWS Cloud Development Kit (CDK) to provision the resources required for the OOC search applications. The code to launch these resources performs the following operations:

  1. Creates a VPC for the resources.
  2. Creates an OpenSearch Service domain for the search application.
  3. Creates a Lambda function to process and load movie metadata and embeddings to OpenSearch Service indexes (**-LoadDataIntoOpenSearchLambda-**).
  4. Creates a Lambda function that takes as input the user query from a web app and returns relevant titles from OpenSearch (**-ReadFromOpenSearchLambda-**).
  5. Creates an API Gateway that adds an additional layer of security between the web app user interface and Lambda.

To get started, complete the following steps:

  1. Run the code and notebooks from Part 1 and Part 2.
  2. Navigate to the part3-out-of-catalog folder in the code repository.
  3. Launch the AWS CDK from the terminal with the command bash launch_stack.sh.
  4. Provide the two S3 file paths created in Part 2 as input:
    1. The S3 path to the movie embeddings CSV file.
    2. The S3 path to the movie node file.
  5. Wait until the script provisions all the required resources and finishes running.
  6. Copy the API Gateway URL that the AWS CDK script prints out and save it. (We use this for the Streamlit app later).

Create an OpenSearch Service domain

For illustration purposes, you create a search domain in a single Availability Zone on an r6g.large.search instance within a secure VPC and subnet. Note that the best practice would be to set up across three Availability Zones with one primary and two replica instances.
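
The AWS CDK stack provisions this domain for you; purely for illustration, an equivalent boto3 call might look like the following sketch. The engine version, volume size, and VPC subnet and security group IDs are placeholder assumptions.

import boto3

opensearch = boto3.client("opensearch")

opensearch.create_domain(
    DomainName="ooc-search-domain",
    EngineVersion="OpenSearch_1.3",          # placeholder version
    ClusterConfig={
        "InstanceType": "r6g.large.search",
        "InstanceCount": 1,                  # single AZ for illustration; use three AZs in production
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp3", "VolumeSize": 20},
    VPCOptions={
        "SubnetIds": ["subnet-0123456789abcdef0"],      # placeholder
        "SecurityGroupIds": ["sg-0123456789abcdef0"],   # placeholder
    },
)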

Create an OpenSearch Service index and upload data

You use Lambda functions (created using the AWS CDK launch stack command) to create the OpenSearch Service indexes. To start the index creation, complete the following steps:

  1. On the Lambda console, open the LoadDataIntoOpenSearchLambda Lambda function.
  2. On the Test tab, choose Test to create and ingest data into the OpenSearch Service index.

The following code for this Lambda function can be found in part3-out-of-catalog/cdk/ooc/lambdas/LoadDataIntoOpenSearchLambda/lambda_handler.py:

import os

# Helper functions such as merge_data, initialize_ops, create_index,
# ingest_data_into_ops, post_request, and post_request_emb are defined
# elsewhere in this lambda_handler.py file.
embedding_file = os.environ.get("embeddings_file")
movie_node_file = os.environ.get("movie_node_file")
print("Merging files")
merged_df = merge_data(embedding_file, movie_node_file)
print("Embeddings and metadata files merged")

print("Initializing OpenSearch client")
ops = initialize_ops()
indices = ops.indices.get_alias().keys()
print("Current indices are :", indices)

# This will take 5 minutes
print("Creating knn index")
# Create the index using knn settings. Creating OOC text is not needed
create_index('ooc_knn',ops)
print("knn index created!")

print("Uploading the data for knn index")
response = ingest_data_into_ops(merged_df, ops, ops_index='ooc_knn', post_method=post_request_emb)
print(response)
print("Upload complete for knn index")

print("Uploading the data for fuzzy word search index")
response = ingest_data_into_ops(merged_df, ops, ops_index='ooc_text', post_method=post_request)
print("Upload complete for fuzzy word search index")
# Create the response and add some extra content to support CORS
response = {
    "statusCode": 200,
    "headers": {
        "Access-Control-Allow-Origin": '*'
    },
    "isBase64Encoded": False
}

The function performs the following tasks:

  1. Loads the movie embeddings file and the IMDb KG movie node file that contains the movie metadata from the S3 file paths that were passed to the stack creation script launch_stack.sh.
  2. Merges the two input files to create a single dataframe for index creation.
  3. Initializes the OpenSearch Service client using the Boto3 Python library.
  4. Creates two indexes for text (ooc_text) and kNN embedding search (ooc_knn) and bulk uploads data from the combined dataframe through the ingest_data_into_ops function.

This data ingestion process takes 5–10 minutes and can be monitored through the Amazon CloudWatch logs on the Monitoring tab of the Lambda function.

You create two indexes to enable text-based search and kNN embedding-based search. The text search maps the free-form query the user enters to the titles of the movie. The kNN embedding search finds the k closest movies to the best text match from the KG latent space to return as outputs.
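
For reference, a minimal sketch of what creating these two indexes might look like with opensearch-py is shown below. The index.knn setting and knn_vector field type come from the OpenSearch kNN plugin, while the field names and the embedding dimension are assumptions about the dataset produced in Part 2.

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
                    use_ssl=True)  # authentication omitted for brevity

# Text index for fuzzy title search.
client.indices.create(index="ooc_text", body={
    "mappings": {"properties": {"title": {"type": "text"}}},
})

# kNN index for embedding similarity search.
client.indices.create(index="ooc_knn", body={
    "settings": {"index": {"knn": True}},
    "mappings": {"properties": {
        "title": {"type": "text"},
        "embedding": {"type": "knn_vector", "dimension": 128},  # assumed KG embedding size
    }},
})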

Deploy the solution as a local web application

Now that you have a working text search and kNN index on OpenSearch Service, you’re ready to build a ML-powered web app.

We use the streamlit Python package to create a front-end illustration for this application. The IMDb-Knowledge-Graph-Blog/part3-out-of-catalog/run_imdb_demo.py Python file in our GitHub repo has the required code to launch a local web app to explore this capability.

To run the code, complete the following steps:

  1. Install the streamlit and aws_requests_auth Python packages in your local virtual Python environment using the following commands in your terminal:

pip install streamlit
pip install aws-requests-auth

  2. Replace the placeholder for the API Gateway URL in the code as follows with the one created by the AWS CDK:

api = '<ENTER URL OF THE API GATEWAY HERE>/opensearch-lambda?q={query_text}&numMovies={num_movies}&numRecs={num_recs}'

  3. Launch the web app with the command streamlit run run_imdb_demo.py from your terminal.

This script launches a Streamlit web app that can be accessed in your web browser. The URL of the web app can be retrieved from the script output, as shown in the following screenshot.

The app accepts new search strings, number of hits, and number of recommendations. The number of hits corresponds to how many matching OOC titles we should retrieve from the external (IMDb) catalog. The number of recommendations corresponds to how many nearest neighbors we should retrieve from the customer catalog based on kNN embedding search. See the following code:

search_text=st.sidebar.text_input("Please enter search text to find movies and recommendations")
num_movies= st.sidebar.slider('Number of search hits', min_value=0, max_value=5, value=1)
recs_per_movie= st.sidebar.slider('Number of recommendations per hit', min_value=0, max_value=10, value=5)
if st.sidebar.button('Find'):
    resp= get_movies()

This input (query, number of hits and recommendations) is passed to the **-ReadFromOpenSearchLambda-** Lambda function created by the AWS CDK through the API Gateway request. This is done in the following function:

def get_movies():
    result = requests.get(api.format(query_text=search_text, num_movies=num_movies, num_recs=recs_per_movie)).json()

The results returned by the Lambda function from OpenSearch Service are passed to API Gateway and displayed in the Streamlit app.

Clean up

You can delete all the resources created by the AWS CDK through the command npx cdk destroy --app "python3 appy.py" --all in the same instance (inside the cdk folder) that was used to launch the stack (see the following screenshot).

Conclusion

In this post, we showed you how to create a solution for OOC search using text and kNN-based search using SageMaker and OpenSearch Service. You used custom knowledge graph model embeddings to find nearest neighbors in your catalog to that of IMDb titles. You can now, for example, search for “The Rings of Power,” a fantasy series developed by Amazon Prime Video, on other streaming platforms and reason about how they could have optimized the search results.

For more information about the code sample in this post, see the GitHub repo. To learn more about collaborating with the Amazon ML Solutions Lab to build similar state-of-the-art ML applications, see Amazon Machine Learning Solutions Lab. For more information on licensing IMDb datasets, visit developer.imdb.com.


About the Authors

Divya Bhargavi is a Data Scientist and Media and Entertainment Vertical Lead at the Amazon ML Solutions Lab, where she solves high-value business problems for AWS customers using machine learning. She works on image/video understanding, knowledge graph recommendation systems, and predictive advertising use cases.

Gaurav Rele is a Data Scientist at the Amazon ML Solution Lab, where he works with AWS customers across different verticals to accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

Matthew Rhodes is a Data Scientist I working in the Amazon ML Solutions Lab. He specializes in building Machine Learning pipelines that involve concepts such as Natural Language Processing and Computer Vision.

Karan Sindwani is a Data Scientist at Amazon ML Solutions Lab, where he builds and deploys deep learning models. He specializes in the area of computer vision. In his spare time, he enjoys hiking.

Soji Adeshina is an Applied Scientist at AWS where he develops graph neural network-based models for machine learning on graphs tasks with applications to fraud & abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.

Vidya Sagar Ravipati is a Manager at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

AWS positioned in the Leaders category in the 2022 IDC MarketScape for APEJ AI Life-Cycle Software Tools and Platforms Vendor Assessment

The recently published IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment positions AWS in the Leaders category. This was the first and only APEJ-specific analyst evaluation focused on AI life-cycle software from IDC. The vendors evaluated for this MarketScape offer various software tools needed to support end-to-end machine learning (ML) model development, including data preparation, model building and training, model operation, evaluation, deployment, and monitoring. The tools are typically used by data scientists and ML developers from experimentation to production deployment of AI and ML solutions.

AI life-cycle tools are essential to productize AI/ML solutions. They go well beyond AI/ML experimentation, supporting deployment anywhere, performance at scale, cost optimization, and, increasingly important, systematic model risk management: explainability, robustness, drift, privacy protection, and more. Businesses need these tools to unlock the value of enterprise data assets at greater scale and faster speed.

Vendor Requirements for the IDC MarketScape

To be considered for the MarketScape, the vendor had to provide software products for various aspects of the end-to-end ML process under independent product stock-keeping units (SKUs) or as part of a general AI software platform. The products had to be based on the company’s own IP, and the products should have generated software license revenue or consumption-based software revenue for at least 12 months in APEJ as of March 2022. The company had to be among the top 15 vendors by the reported revenues of 2020–2021 in the APEJ region, according to IDC’s AI Software Tracker. AWS met the criteria and was evaluated by IDC along with eight other vendors.

The result of IDC’s comprehensive evaluation was published October 2022 in the IDC MarketScape: Asia/Pacific (Excluding Japan) AI Life-Cycle Software Tools and Platforms 2022 Vendor Assessment. AWS is positioned in the Leaders category based on current capabilities. The AWS strategy is to make continuous investments in AI/ML services to help customers innovate with AI and ML.

AWS position

“AWS is placed in the Leaders category in this exercise, receiving higher ratings in various assessment categories—the breadth of tooling services provided, options to lower cost for performance, quality of customer service and support, and pace of product innovation, to name a few.”

– Jessie Danqing Cai, Associate Research Director, Big Data & Analytics Practice, IDC Asia/Pacific.

The visual below is part of the MarketScape and shows the AWS position evaluated by capabilities and strategies.

The IDC MarketScape vendor analysis model is designed to provide an overview of the competitive fitness of ICT suppliers in a given market. The research methodology utilizes a rigorous scoring methodology based on both qualitative and quantitative criteria that results in a single graphical illustration of each vendor’s position within a given market. The Capabilities score measures vendor product, go-to-market, and business execution in the short term. The Strategy score measures alignment of vendor strategies with customer requirements in a 3–5-year time frame. Vendor market share is represented by the size of the icons.

Amazon SageMaker evaluated as part of the MarketScape

As part of the evaluation, IDC dove deep into Amazon SageMaker capabilities. SageMaker is a fully managed service to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. Since the launch of SageMaker in 2017, over 250 capabilities and features have been released.

ML practitioners such as data scientists, data engineers, business analysts, and MLOps professionals use SageMaker to break down barriers across each step of the ML workflow through their choice of integrated development environments (IDEs) or no-code interfaces. Starting with data preparation, SageMaker makes it easy to access, label, and process large amounts of structured data (tabular data) and unstructured data (photo, video, geospatial, and audio) for ML. After data is prepared, SageMaker offers fully managed notebooks for model building and reduces training time from hours to minutes with optimized infrastructure. SageMaker makes it easy to deploy ML models to make predictions at the best price-performance for any use case through a broad selection of ML infrastructure and model deployment options. Finally, the MLOps tools in SageMaker help you scale model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden.

The MarketScape calls out three strengths for AWS:

  • Functionality and offering – SageMaker provides a broad and deep set of tools for data preparation, model training, and deployment, including AWS-built silicon: AWS Inferentia for inference workloads and AWS Trainium for training workloads. SageMaker supports model explainability and bias detection through Amazon SageMaker Clarify.
  • Service delivery – SageMaker is natively available on AWS, the second largest public cloud platform in the APEJ region (based on IDC Public Cloud Services Tracker, IaaS+PaaS, 2021 data), with regions in Japan, Australia, New Zealand, Singapore, India, Indonesia, South Korea, and Greater China. Local zones are available to serve customers in ASEAN countries: Thailand, the Philippines, and Vietnam.
  • Growth opportunities – AWS actively contributes to open-source projects such as Gluon and engages with regional developer and student communities through many events, online courses, and Amazon SageMaker Studio Lab, a no-cost SageMaker notebook environment.

SageMaker launches at re:Invent 2022

SageMaker innovation continued at AWS re:Invent 2022, with eight new capabilities. The launches included three new capabilities for ML model governance. As the number of models and users within an organization increases, it becomes harder to set least-privilege access controls and establish governance processes to document model information (for example, input datasets, training environment information, model-use description, and risk rating). After models are deployed, customers also need to monitor for bias and feature drift to ensure they perform as expected. A new role manager, model cards, and model dashboard simplify access control and enhance transparency to support ML model governance.

There were also three launches related to Amazon SageMaker Studio notebooks. SageMaker Studio notebooks gives practitioners a fully managed notebook experience, from data exploration to deployment. As teams grow in size and complexity, dozens of practitioners may need to collaboratively develop models using notebooks. AWS continues to offer the best notebook experience for users, with the launch of three new features that help you coordinate and automate notebook code.

To support model deployment, new capabilities in SageMaker help you run shadow tests to evaluate a new ML model before production release by testing its performance against the currently deployed model. Shadow testing can help you catch potential configuration errors and performance issues before they impact end-users.

Finally, SageMaker launched support for geospatial ML, allowing data scientists and ML engineers to easily build, train, and deploy ML models using geospatial data. You can access geospatial data sources, purpose-built processing operations, pre-trained ML models, and built-in visualization tools to run geospatial ML faster and at scale.

Today, tens of thousands of customers use Amazon SageMaker to train models with billions of parameters and make over 1 trillion predictions per month. To learn more about SageMaker, visit the webpage and explore how fully managed infrastructure, tools, and workflows can help you accelerate ML model development.


About the author

Kimberly Madia is a Principal Product Marketing Manager with AWS Machine Learning. Her goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. For fun outside work, Kimberly likes to cook, read, and run on the San Francisco Bay Trail.

How Thomson Reuters delivers personalized content subscription plans at scale using Amazon Personalize

This post is co-written by Hesham Fahim from Thomson Reuters.

Thomson Reuters (TR) is one of the world’s most trusted information organizations for businesses and professionals. It provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span across the financial, risk, legal, tax, accounting, and media markets.

Thomson Reuters provides market-leading products in the Tax, Legal, and News domains, which users can sign up for using a subscription licensing model. To enhance this experience for their customers, TR wanted to create a centralized recommendations platform that allowed their sales team to suggest the most relevant subscription packages, raising awareness of products that could help customers serve their markets better through tailored product selections.

Prior to building this centralized platform, TR had a legacy rules-based engine to generate renewal recommendations. The rules in this engine were predefined and written in SQL, which, aside from posing a challenge to manage, also struggled to cope with the proliferation of data from TR’s various integrated data sources. TR customer data is changing at a faster rate than the business rules can evolve to reflect changing customer needs. The key requirement for TR’s new machine learning (ML)-based personalization engine was centered around an accurate recommendation system that takes into account recent customer trends. The desired solution would be one with low operational overhead, the ability to accelerate delivering business goals, and a personalization engine that could be constantly trained with up-to-date data to deal with changing consumer habits and new products.

Personalizing the renewal recommendations based on what would be valuable products for TR’s customers was an important business challenge for the sales and marketing team. TR has a wealth of data that could be used for personalization that has been collected from customer interactions and stored within a centralized data warehouse. TR has been an early adopter of ML with Amazon SageMaker, and their maturity in the AI/ML domain meant that they had collated a significant dataset of relevant data within a data warehouse, which the team could train a personalization model with. TR has continued their AI/ML innovation and has recently developed a revamped recommendation platform using Amazon Personalize, which is a fully managed ML service that uses user interactions and items to generate recommendations for users. In this post, we explain how TR used Amazon Personalize to build a scalable, multi-tenanted recommender system that provides the best product subscription plans and associated pricing to their customers.

Solution architecture

The solution had to be designed considering TR’s core operations around understanding users through data; providing these users with personalized and relevant content from a large corpus of data was a mission-critical requirement. Having a well-designed recommendation system is key to getting quality recommendations that are customized to each user’s requirements.

The solution required collecting and preparing user behavior data, training an ML model using Amazon Personalize, generating personalized recommendations through the trained model, and driving marketing campaigns with the personalized recommendations.

TR wanted to take advantage of AWS managed services where possible to simplify operations and reduce undifferentiated heavy lifting. TR used AWS Glue DataBrew and AWS Batch jobs to perform the extract, transform, and load (ETL) jobs in the ML pipelines, and SageMaker along with Amazon Personalize to tailor the recommendations. From a training data volume and runtime perspective, the solution needed to be scalable to process millions of records within the time frame already committed to downstream consumers in TR’s business teams.

The following sections explain the components involved in the solution.

ML training pipeline

Interactions between the users and the content is collected in the form of clickstream data, which is generated as the customer clicks on the content. TR analyzes if this is part of their subscription plan or beyond their subscription plan so that they can provide additional details about the price and plan enrollment options. The user interactions data from various sources is persisted in their data warehouse.

The following diagram illustrates the ML training pipeline.
ML engine training pipeline
The pipeline starts with an AWS Batch job that extracts the data from the data warehouse and transforms the data to create interactions, users, and items datasets.

The following datasets are used to train the model:

  • Structured product data – Subscriptions, orders, product catalog, transactions, and customer details
  • Semi-structured behavior data – Users, usage, and interactions

This transformed data is stored in an Amazon Simple Storage Service (Amazon S3) bucket and imported into Amazon Personalize for ML training. Because TR wants to generate personalized recommendations for their users, they use the USER_PERSONALIZATION recipe to train ML models on their custom data, which is referred to as creating a solution version. After the solution version is created, it’s used for generating personalized recommendations for the users.
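
As a rough boto3 sketch of this step, training with the USER_PERSONALIZATION recipe and creating a solution version might look like the following; the solution name and dataset group ARN are placeholders, and the datasets are assumed to be imported already.

import boto3

personalize = boto3.client("personalize")

# Train on the datasets already imported into the dataset group.
solution = personalize.create_solution(
    name="tr-renewal-recommendations",  # hypothetical name
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/tr-renewals",
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)

# A solution version is the trained model artifact used for recommendations.
version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"],
    trainingMode="FULL",
)
print(version["solutionVersionArn"])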

The entire workflow is orchestrated using AWS Step Functions. The alerts and notifications are captured and published to Microsoft Teams using Amazon Simple Notification Service (Amazon SNS) and Amazon EventBridge.

Generating personalized recommendations pipeline: Batch inference

Customer requirements and preferences change very often, and the latest interactions captured in clickstream data serves as a key data point to understand the changing preferences of the customer. To adapt to ever-changing customer preferences, TR generates personalized recommendations on a daily basis.

The following diagram illustrates the pipeline to generate personalized recommendations.
Pipeline to generate personalized recommendations in Batch
A DataBrew job extracts the data from the TR data warehouse for the users who are eligible to receive recommendations during renewal based on their current subscription plan and recent activity. The DataBrew visual data preparation tool makes it easy for TR data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. The ability to choose from over 250 pre-built transformations within the visual data preparation tool to automate data preparation tasks, all without the need to write any code, was an important feature. The DataBrew job generates an incremental dataset for interactions and input for the batch recommendations job, and stores the output in an S3 bucket. The newly generated incremental dataset is imported into the interactions dataset. When the incremental dataset import job is successful, an Amazon Personalize batch recommendations job is triggered with the input data. Amazon Personalize generates the latest recommendations for the users provided in the input data and stores them in a recommendations S3 bucket.
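
A hedged sketch of kicking off the batch recommendations step is shown below; the S3 paths, role ARN, solution version ARN, and result count are placeholders for the values produced earlier in the pipeline.

import boto3

personalize = boto3.client("personalize")

personalize.create_batch_inference_job(
    jobName="daily-renewal-recommendations",     # hypothetical name
    solutionVersionArn="arn:aws:personalize:us-east-1:123456789012:solution/tr-renewal-recommendations/<version>",
    numResults=25,                               # recommendations per user
    jobInput={"s3DataSource": {"path": "s3://tr-recs/input/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://tr-recs/output/"}},
    roleArn="arn:aws:iam::123456789012:role/PersonalizeBatchRole",
)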

Price optimization is the last step before the newly formed recommendations are ready to use. TR runs a cost optimization job on the recommendations generated and uses SageMaker to run custom models on the recommendations as part of this final step. An AWS Glue job curates the output generated from Amazon Personalize and transforms it into the input format required by the SageMaker custom model. TR is able to take advantage of the breadth of services that AWS provides, using both Amazon Personalize and SageMaker in the recommendation platform to tailor recommendations based on the type of customer firm and end-users.

The entire workflow is decoupled and orchestrated using Step Functions, which gives the flexibility of scaling the pipeline depending on the data processing requirements. The alerts and notifications are captured using Amazon SNS and EventBridge.

Driving email campaigns

The recommendations generated along with the pricing results are used to drive email campaigns to TR’s customers. An AWS Batch job is used to curate the recommendations for each customer and enrich them with the optimized pricing information. These recommendations are ingested into TR’s campaigning systems, which drive the following email campaigns:

  • Automated subscription renewal or upgrade campaigns with new products that might interest the customer
  • Mid-contract renewal campaigns with better offers and more relevant products and legal content materials

The information from this process is also replicated to the customer portal so customers reviewing their current subscription can see the new renewal recommendations. TR has seen a higher conversion rate from email campaigns, leading to increased sales orders, since implementing the new recommendation platform.

What’s next: Real-time recommendations pipeline

Customer requirements and shopping behaviors change in real time, and adapting recommendations to the real-time changes is key to serving the right content. After seeing a great success deploying a batch recommendation system, TR is now planning to take this solution to the next level by implementing a real-time recommendations pipeline to generate recommendations using Amazon Personalize.

The following diagram illustrates the architecture to provide real-time recommendations.
Real-time recommendations pipeline
The real-time integration starts with collecting the live user engagement data and streaming it to Amazon Personalize. As the users are interacting with TR’s applications, they generate clickstream events, which are published into Amazon Kinesis Data Streams. Then the events are ingested into TR’s centralized streaming platform, which is built on top of Amazon Managed Streaming for Apache Kafka (Amazon MSK). Amazon MSK makes it easy to ingest and process streaming data in real time with fully managed Apache Kafka. In this architecture, Amazon MSK serves as a streaming platform and performs any data transformations required on the raw incoming clickstream events. Then an AWS Lambda function is triggered to filter the events to the schema compatible with the Amazon Personalize dataset and push those events to an Amazon Personalize event tracker using the PutEvents API. This allows Amazon Personalize to learn from your users’ most recent behavior and include relevant items in recommendations.
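
A minimal sketch of what that Lambda function might send, assuming an event tracker already exists; the tracking ID and event type are placeholders.

import boto3
from datetime import datetime

personalize_events = boto3.client("personalize-events")

def publish_clickstream_event(user_id, session_id, item_id):
    # Stream one interaction so Amazon Personalize can adapt to recent behavior.
    personalize_events.put_events(
        trackingId="<EVENT_TRACKER_TRACKING_ID>",  # placeholder for your event tracker ID
        userId=user_id,
        sessionId=session_id,
        eventList=[{
            "sentAt": datetime.now(),
            "eventType": "click",                  # assumed event type
            "itemId": item_id,
        }],
    )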

TR’s web applications invoke an API deployed in Amazon API Gateway to get recommendations, which triggers a Lambda function to invoke a GetRecommendations API call with Amazon Personalize. Amazon Personalize provides the latest set of personalized recommendations curated to the user behavior, which are provided back to the web applications via Lambda and API Gateway.
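
A sketch of the Lambda handler behind that API might look like the following, assuming a deployed campaign ARN is available; the environment variable, query parameter name, and result count are placeholders.

import json
import os
import boto3

personalize_runtime = boto3.client("personalize-runtime")

def lambda_handler(event, context):
    # API Gateway passes the user ID as a query string parameter (assumed name).
    user_id = event["queryStringParameters"]["userId"]
    response = personalize_runtime.get_recommendations(
        campaignArn=os.environ["CAMPAIGN_ARN"],   # placeholder environment variable
        userId=user_id,
        numResults=10,
    )
    items = [item["itemId"] for item in response["itemList"]]
    return {"statusCode": 200, "body": json.dumps({"recommendations": items})}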

With this real-time architecture, TR can serve their customers with personalized recommendations curated to their most recent behavior and serve their needs better.

Conclusion

In this post, we showed you how TR used Amazon Personalize and other AWS services to implement a recommendation engine. Amazon Personalize enabled TR to accelerate the development and deployment of high-performance models to provide recommendations to their customers. TR is able to onboard a new suite of products within weeks now, compared to months earlier. With Amazon Personalize and SageMaker, TR is able to elevate the customer experience with better content subscription plans and prices for their customers.

If you enjoyed reading this blog and would like to learn more about Amazon Personalize and how it can help your organization build recommendation systems, please see the developer guide.


About the Authors

Hesham Fahim is a Lead Machine Learning Engineer and Personalization Engine Architect at Thomson Reuters. He has worked with organizations in academia and industry ranging from large enterprises to mid-sized startups. With a focus on scalable deep learning architectures, he has experience in mobile robotics, biomedical image analysis, and recommender systems. Away from computers, he enjoys astrophotography, reading, and long-distance biking.

Srinivasa Shaik is a Solutions Architect at AWS based in Boston. He helps Enterprise customers to accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.

Vamshi Krishna Enabothala is a Sr. Applied AI Specialist Architect at AWS. He works with customers from different sectors to accelerate high-impact data, analytics, and machine learning initiatives. He is passionate about recommendation systems, NLP, and computer vision areas in AI and ML. Outside of work, Vamshi is an RC enthusiast, building RC equipment (planes, cars, and drones), and also enjoys gardening.

Simone Zucchet is a Senior Solutions Architect at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.
