September 2024 – Page 6

Speculative Streaming: Fast LLM Inference Without Auxiliary Models

Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary draft model. While effective, in application-specific settings, it often involves fine-tuning both draft and target models to achieve high acceptance rates. As the number of downstream tasks grows, these draft models add significant complexity to inference systems. We propose Speculative Streaming, a single-model speculative decoding method that fuses drafting into the target model by changing the fine-tuning objective from next token prediction to…Apple Machine Learning Research

High-Speed AI: Hitachi Rail Advances Real-Time Railway Analysis Using NVIDIA Technology

Hitachi Rail, a global transportation company powering railway systems in over 50 countries, is integrating NVIDIA AI technology to lower maintenance costs for rail operators, reduce train idling time and improve transit reliability for passengers.

The company is adopting NVIDIA IGX — an industrial-grade, enterprise-level platform that delivers high-bandwidth sensor processing, powerful AI compute, functional safety capabilities and enterprise security — into its new HMAX platform to process sensor and camera data in real time.

By removing the lag time between data collection and analysis, the HMAX platform will enable Hitachi Rail clients to more quickly detect train tracks that need repair, monitor the degradation of overhead power lines and assess the health of trains and signaling equipment.

Hitachi Rail estimates that proactive maintenance costs around 7x less than emergency repairs done after infrastructure fails unexpectedly. Its existing AI monitoring systems are already reducing service delays by up to 20% and train maintenance costs by up to 15% — and are cutting down energy consumption by decreasing fuel costs at train depots by up to 40%.

With real-time analysis using NVIDIA IGX and NVIDIA Holoscan platform for sensor processing, the company aims to further increase these savings.

“Using previous digital monitoring systems, it would take a few days to process the data and discover issues that need attention,” said Koji Agatsuma, executive director and chief technology officer of rail vehicles at Hitachi Rail. “If we can instead conduct real-time prediction using NVIDIA technology, that enables us to avoid service disruptions and significantly improve safety, reliability and operating costs.”

NVIDIA IGX Powers Real-Time AI Engine

Building on its existing collection of HMAX applications — which are currently running on data from 8,000 train cars on 2,000 trains — Hitachi Rail has used NVIDIA IGX and the NVIDIA AI Enterprise software platform to create new accelerated AI applications to help operators monitor train fleets and infrastructure. NVIDIA AI Enterprise offers tools, pretrained models and application frameworks to streamline the development and deployment of production-grade AI applications.

These applications, available soon through the HMAX platform, can be used by the company’s international customer base to process huge quantities of data streaming from sensors onboard trains, taken from existing systems or imported from third-party software already in use by the customer.

In the U.K., for example, each Hitachi Rail train has sensors that report nearly 50,000 data points as frequently as every fifth of a second. AI infrastructure that keeps pace with this data flow can send train operators timely alerts when a component of a train or rail line needs maintenance. The AI insights can also be accessed through a chatbot interface, helping operators easily identify trends and opportunities to optimize maintenance schedules and more.

“If a potential issue isn’t identified and fixed promptly, it can result in a service disruption that causes significant economic loss for our customers and impacts the passengers who rely on these transit lines,” Agatsuma said. “NVIDIA AI infrastructure has enabled us to get immediate alerts on thousands of miles of railway for the first time, which we anticipate will reduce delays and disruptions to passenger travel.”

Driving Benefits Down the Track

The opportunities go beyond monitoring trains and tracks.

By mounting cameras atop trains, Hitachi Rail can monitor power lines overhead to identify degrading electric cables and help prevent disruptive failures. Traditionally, it takes up to 10 days to process one day’s worth of video data collected by the train. With NVIDIA-accelerated sensor processing, data can be processed in real time at the edge, sending only relevant information back to operational control centers for analysis and action.

Learn more about the NVIDIA IGX platform.

Main image courtesy of Hitachi Rail.

Enhancing Just Walk Out technology with multi-modal AI

Since its launch in 2018, Just Walk Out technology by Amazon has transformed the shopping experience by allowing customers to enter a store, pick up items, and leave without standing in line to pay. You can find this checkout-free technology in over 180 third-party locations worldwide, including travel retailers, sports stadiums, entertainment venues, conference centers, theme parks, convenience stores, hospitals, and college campuses. Just Walk Out technology’s end-to-end system automatically determines which products each customer chose in the store and provides digital receipts, eliminating the need for checkout lines.

In this post, we showcase the latest generation of Just Walk Out technology by Amazon, powered by a multi-modal foundation model (FM). We designed this multi-modal FM for physical stores using a transformer-based architecture similar to that underlying many generative artificial intelligence (AI) applications. The model will help retailers generate highly accurate shopping receipts using data from multiple inputs including a network of overhead video cameras, specialized weight sensors on shelves, digital floor plans, and catalog images of products. To put it in plain terms, a multi-modal model means using data from multiple inputs.

Our research and development (R&D) investments in state-of-the-art multi-modal FMs enables the Just Walk Out system to be deployed in a wide range of shopping situations with greater accuracy and at lower cost. Similar to large language models (LLMs) that generate text, the new Just Walk Out system is designed to generate an accurate sales receipt for every shopper visiting the store.

The challenge: Tackling complicated long-tail shopping scenarios

Because of their innovative checkout-free environment, Just Walk Out stores presented us with a unique technical challenge. Retailers and shoppers as well as Amazon demand nearly 100 percent checkout accuracy, even in the most complex shopping situations. These include unusual shopping behaviors that can create a long and complicated sequence of activities requiring additional effort to analyze what happened.

Previous generations of the Just Walk Out system utilized a modular architecture; it tackled complex shopping situations by breaking down the shopper’s visit into discrete tasks, such as detecting shopper interactions, tracking items, identifying products, and counting what is selected. These individual components were then integrated into sequential pipelines to enable the overall system functionality. While this approach produced highly accurate receipts, significant engineering efforts are required to address challenges in new, previously unencountered situations and complex shopping scenarios. This limitation restricted the scalability of this approach.

The solution: Just Walk Out multi-modal AI

To meet these challenges, we introduced a new multi-modal FM that we designed specifically for retail store environments, enabling Just Walk Out technology to handle complex real-world shopping scenarios. The new multi-modal FM further enhances the Just Walk Out system’s capabilities by generalizing more effectively to new store formats, products, and customer behaviors, which is crucial for scaling up Just Walk Out technology.

The incorporation of continuous learning enables the model training to automatically adapt and learn from new challenging scenarios as they arise. This self-improving capability helps ensure the system maintains high performance, even as shopping environments continue to evolve.

Through this combination of end-to-end learning and enhanced generalization, the Just Walk Out system can tackle a wider range of dynamic and complex retail settings. Retailers can confidently deploy this technology, knowing it will provide a frictionless checkout-free experience for their customers.

The following video shows our system’s architecture in action.

Key elements of our Just Walk Out multi-modal AI model include:

Flexible data inputs –The system tracks how users interact with products and fixtures, such as shelves or fridges. It primarily relies on multi-view video feeds as inputs, using weight sensors solely to track small items. The model maintains a digital 3D representation of the store and can access catalog images to identify products, even if the shopper returns items to the shelf incorrectly.
Multi-modal AI tokens to represent shoppers’ journeys – The multi-modal data inputs are processed by the encoders, which compress them into transformer tokens, the basic unit of input for the receipt model. This allows the model to interpret hand movements, differentiate between items, and accurately count the number of items picked up or returned to the shelf with speed and precision.
Continuously updating receipts – The system uses tokens to create digital receipts for each shopper. It can differentiate between different shopper sessions and dynamically updates each receipt as they pick up or return items.

Training the Just Walk Out FM

By feeding vast amounts of multi-modal data into the Just Walk Out FM, we found it could consistently generate—or, technically, “predict”— accurate receipts for shoppers. To improve accuracy, we designed over 10 auxiliary tasks, such as detection, tracking, image segmentation, grounding (linking abstract concepts to real-world objects), and activity recognition. All of these are learned within a single model, enhancing the model’s ability to handle new, never-before-seen store formats, products, and customer behaviors. This is crucial for bringing Just Walk Out technology to new locations.

AI model training—in which curated data is fed to selected algorithms—helps the system refine itself to produce accurate results. We quickly discovered we could accelerate the training of our model by using a data flywheel that continuously mines and labels high-quality data in a self-reinforcing cycle. The system is designed to integrate these progressive improvements with minimal manual intervention. The following diagram illustrates the process.

To train an FM effectively, we invested in a robust infrastructure that can efficiently process the massive amounts of data needed to train high-capacity neural networks that mimic human decision-making. We built the infrastructure for our Just Walk Out model with the help of several Amazon Web Services (AWS) services, including Amazon Simple Storage Service (Amazon S3) for data storage and Amazon SageMaker for training.

Here are some key steps we followed in training our FM:

Selecting challenging data sources – To train our AI model for Just Walk Out technology, we focus on training data from especially difficult shopping scenarios that test the limits of our model. Although these complex cases constitute only a small fraction of shopping data, they are the most valuable for helping the model learn from its mistakes.
Leveraging auto labeling – To increase operational efficiency, we developed algorithms and models that automatically attach meaningful labels to the data. In addition to receipt prediction, our automated labeling algorithms cover the auxiliary tasks, ensuring the model gains comprehensive multi-modal understanding and reasoning capabilities.
Pre-training the model – Our FM is pre-trained on a vast collection of multi-modal data across a diverse range of tasks, which enhances the model’s ability to generalize to new store environments never encountered before.
Fine-tuning the model – Finally, we refined the model further and used quantization techniques to create a smaller, more efficient model that uses edge computing.

As the data flywheel continues to operate, it will progressively identify and incorporate more high-quality, challenging cases to test the robustness of the model. These additional difficult samples are then fed into the training set, further enhancing the model’s accuracy and applicability across new physical store environments.

Conclusion

In this post, we showed how our multi-modal, AI system represents significant new possibilities for Just Walk Out technology. With our innovative approach, we are moving away from modular AI systems that rely on human-defined subcomponents and interfaces. Instead, we’re building simpler and more scalable AI systems that can be trained end-to-end. Although we’ve just scratched the surface, multi-modal AI has raised the bar for our already highly accurate receipt system and will enable us to improve the shopping experience at more Just Walk Out technology stores around the world.

Visit About Amazon to read the official announcement about the new multi-modal AI system and learn more about the latest improvements in Just Walk Out technology.

To find where you can find Just Walk Out technology locations, visit Just Walk Out technology locations near you. Learn more about how to power your store or venue with Just Walk Out technology by Amazon on the Just Walk Out technology product page.

Visit Build and scale the next wave of AI innovation on AWS to learn more about how AWS can reinvent customer experiences with the most comprehensive set of AI and ML services.

About the Authors

Tian Lan is a Principal Scientist at AWS. He currently leads the research efforts in developing the next-generation Just Walk Out 2.0 technology, transforming it into an end-to-end learned, store domain–focused multi-modal foundation model.

Chris Broaddus is a Senior Manager at AWS. He currently manages all the research efforts for Just Walk Out technology, including the multi-modal AI model and other projects, such as deep learning for human pose estimation and Radio Frequency Identification (RFID) receipt prediction.

Generate synthetic data for evaluating RAG systems using Amazon Bedrock

Evaluating your Retrieval Augmented Generation (RAG) system to make sure it fulfils your business requirements is paramount before deploying it to production environments. However, this requires acquiring a high-quality dataset of real-world question-answer pairs, which can be a daunting task, especially in the early stages of development. This is where synthetic data generation comes into play. With Amazon Bedrock, you can generate synthetic datasets that mimic actual user queries, enabling you to evaluate your RAG system’s performance efficiently and at scale. With synthetic data, you can streamline the evaluation process and gain confidence in your system’s capabilities before unleashing it to the real world.

This post explains how to use Anthropic Claude on Amazon Bedrock to generate synthetic data for evaluating your RAG system. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Fundamentals of RAG evaluation

Before diving deep into how to evaluate a RAG application, let’s recap the basic building blocks of a naive RAG workflow, as shown in the following diagram.

The workflow consists of the following steps:

In the ingestion step, which happens asynchronously, data is split into separate chunks. An embedding model is used to generate embeddings for each of the chunks, which are stored in a vector store.
When the user asks a question to the system, an embedding is generated from the questions and the top-k most relevant chunks are retrieved from the vector store.
The RAG model augments the user input by adding the relevant retrieved data in context. This step uses prompt engineering techniques to communicate effectively with the large language model (LLM). The augmented prompt allows the LLM to generate an accurate answer to user queries.
An LLM is prompted to formulate a helpful answer based on the user’s questions and the retrieved chunks.

Amazon Bedrock Knowledge Bases offers a streamlined approach to implement RAG on AWS, providing a fully managed solution for connecting FMs to custom data sources. To implement RAG using Amazon Bedrock Knowledge Bases, you begin by specifying the location of your data, typically in Amazon Simple Storage Service (Amazon S3), and selecting an embedding model to convert the data into vector embeddings. Amazon Bedrock then creates and manages a vector store in your account, typically using Amazon OpenSearch Serverless, handling the entire RAG workflow, including embedding creation, storage, management, and updates. You can use the RetrieveAndGenerate API for a straightforward implementation, which automatically retrieves relevant information from your knowledge base and generates responses using a specified FM. For more granular control, the Retrieve API is available, allowing you to build custom workflows by processing retrieved text chunks and developing your own orchestration for text generation. Additionally, Amazon Bedrock Knowledge Bases offers customization options, such as defining chunking strategies and selecting custom vector stores like Pinecone or Redis Enterprise Cloud.

A RAG application has many moving parts, and on your way to production you’ll need to make changes to various components of your system. Without a proper automated evaluation workflow, you won’t be able to measure the effect of these changes and will be operating blindly regarding the overall performance of your application.

To evaluate such a system properly, you need to collect an evaluation dataset of typical user questions and answers.

Moreover, you need to make sure you evaluate not only the generation part of the process but also the retrieval. An LLM without relevant retrieved context can’t answer the user’s question if the information wasn’t present in the training data. This holds true even if it has exceptional generation capabilities.

As such, a typical RAG evaluation dataset consists of the following minimum components:

A list of questions users will ask the RAG system
A list of corresponding answers to evaluate the generation step
The context or a list of contexts that contain the answer for each question to evaluate the retrieval

In an ideal world, you would take real user questions as a basis for evaluation. Although this is the optimal approach because it directly resembles end-user behavior, this is not always feasible, especially in the early stages of building a RAG system. As you progress, you should aim for incorporating real user questions into your evaluation set.

To learn more about how to evaluate a RAG application, see Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.

Solution overview

We use a sample use case to illustrate the process by building an Amazon shareholder letter chatbot that allows business analysts to gain insights about the company’s strategy and performance over the past years.

For the use case, we use PDF files of Amazon’s shareholder letters as our knowledge base. These letters contain valuable information about the company’s operations, initiatives, and future plans. In a RAG implementation, the knowledge retriever might use a database that supports vector searches to dynamically look up relevant documents that serve as the knowledge source.

The following diagram illustrates the workflow to generate the synthetic dataset for our RAG system.

The workflow includes the following steps:

Load the data from your data source.
Chunk the data as you would for your RAG application.
Generate relevant questions from each document.
Generate an answer by prompting an LLM.
Extract the relevant text that answers the question.
Evolve the question according to a specific style.
Filter questions and improve the dataset either using domain experts or LLMs using critique agents.

We use a model from the Anthropic’s Claude 3 model family to extract questions and answers from our knowledge source, but you can experiment with other LLMs as well. Amazon Bedrock makes this effortless by providing standardized API access to many FMs.

For the orchestration and automation steps in this process, we use LangChain. LangChain is an open source Python library designed to build applications with LLMs. It provides a modular and flexible framework for combining LLMs with other components, such as knowledge bases, retrieval systems, and other AI tools, to create powerful and customizable applications.

The next sections walk you through the most important parts of the process. If you want to dive deeper and run it yourself, refer to the notebook on GitHub.

Load and prepare the data

First, load the shareholder letters using LangChain’s PyPDFDirectoryLoader and use the RecursiveCharacterTextSplitter to split the PDF documents into chunks. The RecursiveCharacterTextSplitter divides the text into chunks of a specified size while trying to preserve context and meaning of the content. It’s a good way to start when working with text-based documents. You don’t have to split your documents to create your evaluation dataset if your LLM supports a context window that is large enough to fit your documents, but you could potentially end up with a lower quality of generated questions due to the larger size of the task. You want to have the LLM generate multiple questions per document in this case.

from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders.pdf import PyPDFLoader, PyPDFDirectoryLoader

# Load PDF documents from directory
loader = PyPDFDirectoryLoader("./synthetic_dataset_generation/")  
documents = loader.load()
# Use recursive character splitter, works better for this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Split documents into small chunks
    chunk_size = 1500,  
    # Overlap chunks to reduce cutting sentences in half
    chunk_overlap  = 100,
    separators=["nn", "n", ".", " ", ""],
)

# Split loaded documents into chunks
docs = text_splitter.split_documents(documents)

To demonstrate the process of generating a corresponding question and answer and iteratively refining them, we use an example chunk from the loaded shareholder letters throughout this post:

“page_content=''Our AWS and Consumer businesses have had different demand trajectories during the pandemic. In thenfirst year of the pandemic, AWS revenue continued to grow at a rapid clip—30% year over year (“Y oY”) in2020 on a $35 billion annual revenue base in 2019—but slower than the 37% Y oY growth in 2019. [...] This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.nConversely, our Consumer revenue grew dramatically in 2020. In 2020, Amazon’s North America andnInternational Consumer revenue grew 39% Y oY on the very large 2019 revenue base of $245 billion; and,this extraordinary growth extended into 2021 with revenue increasing 43% Y oY in Q1 2021. These areastounding numbers. We realized the equivalent of three years’ forecasted growth in about 15 months.nAs the world opened up again starting in late Q2 2021, and more people ventured out to eat, shop, and travel,”

Generate an initial question

To facilitate prompting the LLM using Amazon Bedrock and LangChain, you first configure the inference parameters. To accurately extract more extensive contexts, set the max_tokens parameter to 4096, which corresponds to the maximum number of tokens the LLM will generate in its output. Additionally, define the temperature parameter as 0.2 because the goal is to generate responses that adhere to the specified rules while still allowing for a degree of creativity. This value differs for different use cases and can be determined by experimentation.

import boto3

from langchain_community.chat_models import BedrockChat

# set up a Bedrock-runtime client for inferencing large language models
boto3_bedrock = boto3.client('bedrock-runtime')
# Choosing claude 3 Haiku due to cost and performance efficiency
claude_3_haiku = "anthropic.claude-3-haiku-20240307-v1:0"
# Set-up langchain LLM for implementing the synthetic dataset generation logic

# for each model provider there are different parameters to define when inferencing against the model
inference_modifier = {
                        "max_tokens": 4096,
                        "temperature": 0.2
                    }
                                         
llm = BedrockChat(model_id = claude_3_haiku,
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )

You use each generated chunk to create synthetic questions that mimic those a real user might ask. By prompting the LLM to analyze a portion of the shareholder letter data, you generate relevant questions based on the information presented in the context. We use the following sample prompt to generate a single question for a specific context. For simplicity, the prompt is hardcoded to generate a single question, but you can also instruct the LLM to generate multiple questions with a single prompt.

The rules can be adapted to better guide the LLM in generating questions that reflect the types of queries your users would pose, tailoring the approach to your specific use case.

# Create a prompt template to generate a question a end-user could have about a given context
initial_question_prompt_template = PromptTemplate(
    input_variables=["context"],
    template="""
    <Instructions>
    Here is some context:
    <context>
    {context}
    </context>

    Your task is to generate 1 question that can be answered using the provided context, following these rules:

    <rules>
    1. The question should make sense to humans even when read without the given context.
    2. The question should be fully answered from the given context.
    3. The question should be framed from a part of context that contains important information. It can also be from tables, code, etc.
    4. The answer to the question should not contain any links.
    5. The question should be of moderate difficulty.
    6. The question must be reasonable and must be understood and responded by humans.
    7. Do not use phrases like 'provided context', etc. in the question.
    8. Avoid framing questions using the word "and" that can be decomposed into more than one question.
    9. The question should not contain more than 10 words, make use of abbreviations wherever possible.
    </rules>

    To generate the question, first identify the most important or relevant part of the context. Then frame a question around that part that satisfies all the rules above.

    Output only the generated question with a "?" at the end, no other text or characters.
    </Instructions>
    
    """)

The following is the generated question from our example chunk:

What is the price-performance improvement of AWS Graviton2 chip over x86 processors?

Generate answers

To use the questions for evaluation, you need to generate a reference answer for each of the questions to test against. With the following prompt template, you can generate a reference answer to the created question based on the question and the original source chunk:

# Create a prompt template that takes into consideration the the question and generates an answer
answer_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
    <Instructions>
    <Task>
    <role>You are an experienced QA Engineer for building large language model applications.</role>
    <task>It is your task to generate an answer to the following question <question>{question}</question> only based on the <context>{context}</context></task>
    The output should be only the answer generated from the context.

    <rules>
    1. Only use the given context as a source for generating the answer.
    2. Be as precise as possible with answering the question.
    3. Be concise in answering the question and only answer the question at hand rather than adding extra information.
    </rules>

    Only output the generated answer as a sentence. No extra characters.
    </Task>
    </Instructions>
    
    Assistant:""")

The following is the generated answer based on the example chunk:

“The AWS revenue grew 37% year-over-year in 2021.”

Extract relevant context

To make the dataset verifiable, we use the following prompt to extract the relevant sentences from the given context to answer the generated question. Knowing the relevant sentences, you can check whether the question and answer are correct.

# To check whether an answer was correctly formulated by the large language model you get the relevant text passages from the documents used for answering the questions.
source_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Human:
    <Instructions>
    Here is the context:
    <context>
    {context}
    </context>

    Your task is to extract the relevant sentences from the given context that can potentially help answer the following question. You are not allowed to make any changes to the sentences from the context.

    <question>
    {question}
    </question>

    Output only the relevant sentences you found, one sentence per line, without any extra characters or explanations.
    </Instructions>
    Assistant:""")

The following is the relevant source sentence extracted using the preceding prompt:

“This shift by so many companies (along with the economy recovering) helped re-accelerate AWS's revenue growth to 37% Y oY in 2021.”

Refine questions

When generating question and answer pairs from the same prompt for the whole dataset, it might appear that the questions are repetitive and similar in form, and therefore don’t mimic real end-user behavior. To prevent this, take the previously created questions and prompt the LLM to modify them according to the rules and guidance established in the prompt. By doing so, a more diverse dataset is synthetically generated. The prompt for generating questions tailored to your specific use case heavily depends on that particular use case. Therefore, your prompt must accurately reflect your end-users by setting appropriate rules or providing relevant examples. The process of refining questions can be repeated as many times as necessary.

# To generate a more versatile testing dataset you alternate the questions to see how your RAG systems performs against differently formulated of questions
question_compress_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    <role>You are an experienced linguistics expert for building testsets for large language model applications.</role>

    <task>It is your task to rewrite the following question in a more indirect and compressed form, following these rules:

    <rules>
    1. Make the question more indirect
    2. Make the question shorter
    3. Use abbreviations if possible
    </rules>

    <question>
    {question}
    </question>

    Your output should only be the rewritten question with a question mark "?" at the end. Do not provide any other explanation or text.
    </task>
    </Instructions>
    
    """)

Users of your application might not always use your solution in the same way, for instance using abbreviations when asking questions. This is why it’s crucial to develop a diverse dataset:

“AWS rev YoY growth in ’21?”

Automate dataset generation

To scale the process of the dataset generation, we iterate over all the chunks in our knowledge base; generate questions, answers, relevant sentences, and refinements for each question; and save them to a pandas data frame to prepare the full dataset.

To provide a clearer understanding of the structure of the dataset, the following table presents a sample row based on the example chunk used throughout this post.

Chunk	Our AWS and Consumer businesses have had different demand trajectories during the pandemic. In thenfirst year of the pandemic, AWS revenue continued to grow at a rapid clip—30% year over year (“Y oY”) in2020 on a $35 billion annual revenue base in 2019—but slower than the 37% Y oY growth in 2019. […] This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.nConversely, our Consumer revenue grew dramatically in 2020. In 2020, Amazon’s North America andnInternational Consumer revenue grew 39% Y oY on the very large 2019 revenue base of $245 billion; and,this extraordinary growth extended into 2021 with revenue increasing 43% Y oY in Q1 2021. These areastounding numbers. We realized the equivalent of three years’ forecasted growth in about 15 months.nAs the world opened up again starting in late Q2 2021, and more people ventured out to eat, shop, and travel,”
Question	“What was the YoY growth of AWS revenue in 2021?”
Answer	“The AWS revenue grew 37% year-over-year in 2021.”
Source Sentence	“This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.”
Evolved Question	“AWS rev YoY growth in ’21?”

On average, the generation of questions with a given context of 1,500–2,000 tokens results in an average processing time of 2.6 seconds for a set of initial question, answer, evolved question, and source sentence discovery using Anthropic Claude 3 Haiku. The generation of 1,000 sets of questions and answers costs approximately $2.80 USD using Anthropic Claude 3 Haiku. The pricing page gives a detailed overview of the cost structure. This results in a more time- and cost-efficient generation of datasets for RAG evaluation compared to the manual generation of these questions sets.

Improve your dataset using critique agents

Although using synthetic data is a good starting point, the next step should be to review and refine the dataset, filtering out or modifying questions that aren’t relevant to your specific use case. One effective approach to accomplish this is by using critique agents.

Critique agents are a technique used in natural language processing (NLP) to evaluate the quality and suitability of questions in a dataset for a particular task or application using a machine learning model. In our case, the critique agents are employed to assess whether the questions in the dataset are valid and appropriate for our RAG system.

The two main metrics evaluated by the critique agents in our example are question relevance and answer groundedness. Question relevance determines how relevant the generated question is for a potential user of our system, and groundedness assesses whether the generated answers are indeed based on the given context.

groundedness_check_prompt_template = PromptTemplate(
    input_variables=["context","question"],
    template="""
    <Instructions>
    You will be given a context and a question related to that context.

    Your task is to provide an evaluation of how well the given question can be answered using only the information provided in the context. Rate this on a scale from 1 to 5, where:

    1 = The question cannot be answered at all based on the given context
    2 = The context provides very little relevant information to answer the question
    3 = The context provides some relevant information to partially answer the question 
    4 = The context provides substantial information to answer most aspects of the question
    5 = The context provides all the information needed to fully and unambiguously answer the question

    First, read through the provided context carefully:

    <context>
    {context}
    </context>

    Then read the question:

    <question>
    {question}
    </question>

    Evaluate how well you think the question can be answered using only the context information. Provide your reasoning first in an <evaluation> section, explaining what relevant or missing information from the context led you to your evaluation score in only one sentence.

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>
    
    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>


    </Instructions>
    
    """)

relevance_check_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    You will be given a question related to Amazon Shareholder letters. Your task is to evaluate how useful this question would be for a financial and business analyst working in wallstreet.

    To evaluate the usefulness of the question, consider the following criteria:

    1. Relevance: Is the question directly relevant to your work? Questions that are too broad or unrelated to this domain should receive a lower rating.

    2. Practicality: Does the question address a practical problem or use case that analysts might encounter? Theoretical or overly academic questions may be less useful.

    3. Clarity: Is the question clear and well-defined? Ambiguous or vague questions are less useful.

    4. Depth: Does the question require a substantive answer that demonstrates understanding of financial topics? Surface-level questions may be less useful.

    5. Applicability: Would answering this question provide insights or knowledge that could be applied to real-world company evaluation tasks? Questions with limited applicability should receive a lower rating.

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>
    
    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>

    Here is the question:

    {question}
    </Instructions>
    """)

Evaluating the generated questions helps with assessing the quality of a dataset and eventually the quality of the evaluation. The generated question was rated very well:

Groundedness score: 5
“The context provides the exact information needed to answer the question[...]”

Relevance score: 5
“This question is highly relevant and useful for a financial and business analyst working on Wall Street. AWS (Amazon Web Services) is a key business segment for Amazon, and understanding its year-over-year (YoY) revenue growth is crucial for evaluating the company's overall performance and growth trajectory.[...].

Best practices for generating synthetic datasets

Although generating synthetic datasets offers numerous benefits, it’s essential to follow best practices to maintain the quality and representativeness of the generated data:

Combine with real-world data – Although synthetic datasets can mimic real-world scenarios, they might not fully capture the nuances and complexities of actual human interactions or edge cases. Combining synthetic data with real-world data can help address this limitation and create more robust datasets.
Choose the right model – Choose different LLMs for dataset creation than used for RAG generation in order to avoid self-enhancement bias.
Implement robust quality assurance – You can employ multiple quality assurance mechanisms, such as critique agents, human evaluation, and automated checks, to make sure the generated datasets meet the desired quality standards and accurately represent the target use case.
Iterate and refine – You should treat synthetic dataset generation as an iterative process. Continuously refine and improve the process based on feedback and performance metrics, adjusting parameters, prompts, and quality assurance mechanisms as needed.
Domain-specific customization – For highly specialized or niche domains, consider fine-tuning the LLM (such as with PEFT or RLHF) by injecting domain-specific knowledge to improve the quality and accuracy of the generated datasets.

Conclusion

The generation of synthetic datasets is a powerful technique that can significantly enhance the evaluation process of your RAG system, especially in the early stages of development when real-world data is scarce or difficult to obtain. By taking advantage of the capabilities of LLMs, this approach enables the creation of diverse and representative datasets that accurately mimic real human interactions, while also providing the scalability necessary to meet your evaluation needs.

Although this approach offers numerous benefits, it’s essential to acknowledge its limitations. Firstly, the quality of the synthetic dataset heavily relies on the performance and capabilities of the underlying language model, knowledge retrieval system, and quality of prompts used for generation. Being able to understand and adjust the prompts for generation is crucial in this process. Biases and limitations present in these components may be reflected in the generated dataset. Additionally, capturing the full complexity and nuances of real-world interactions can be challenging because synthetic datasets may not account for all edge cases or unexpected scenarios.

Despite these limitations, generating synthetic datasets remains a valuable tool for accelerating the development and evaluation of RAG systems. By streamlining the evaluation process and enabling iterative development cycles, this approach can contribute to the creation of better-performing AI systems.

We encourage developers, researchers, and enthusiasts to explore the techniques mentioned in this post and the accompanying GitHub repository and experiment with generating synthetic datasets for your own RAG applications. Hands-on experience with this technique can provide valuable insights and contribute to the advancement of RAG systems in various domains.

About the Authors

Johannes Langer is a Senior Solutions Architect at AWS, working with enterprise customers in Germany. Johannes is passionate about applying machine learning to solve real business problems. In his personal life, Johannes enjoys working on home improvement projects and spending time outdoors with his family.

Lukas Wenzel is a Solutions Architect at Amazon Web Services in Hamburg, Germany. He focuses on supporting software companies building SaaS architectures. In addition to that, he engages with AWS customers on building scalable and cost-efficient generative AI features and applications. In his free-time, he enjoys playing basketball and running.

David Boldt is a Solutions Architect at Amazon Web Services. He helps customers build secure and scalable solutions that meet their business needs. He is specialized in machine learning to address industry-wide challenges, using technologies to drive innovation and efficiency across various sectors.

Making traffic lights more efficient with Amazon Rekognition

State and local agencies spend approximately $1.23 billion annually to operate and maintain signalized traffic intersections. On the other end, traffic congestion at intersections costs drivers about $22 billion annually. Implementing an artificial intelligence (AI)-powered detection-based solution can significantly mitigate congestion at intersections and reduce operation and maintenance costs. In this blog post, we show you how Amazon Rekognition (an AI technology) can mitigate congestion at traffic intersections and reduce operations and maintenance costs.

State and local agencies rely on traffic signals to facilitate the safe flow of traffic involving cars, pedestrians, and other users. There are two main types of traffic lights: fixed and dynamic. Fixed traffic lights are timed lights controlled by electro-mechanical signals that switch and hold the lights based on a set period of time. Dynamic traffic lights are designed to adjust based on traffic conditions by using detectors both underneath the surface of the road and above the traffic light. However, as population continues to rise, there are more cars, bikes, and pedestrians using the streets. This increase in road users can negatively impact the efficiency of either of the two traffic systems.

Solution overview

At a high level, our solution uses Amazon Rekognition to automatically detect objects (cars, bikes, and so on) and scenes at an intersection. After detection, Amazon Rekognition creates bounding boxes around each object (such as a vehicle) and calculates the distance between each object (in this scenario, that would be the distance between vehicles detected at an intersection). Results from the calculated distances are used programmatically to stop or allow the flow of traffic, thus reducing congestion. All of this happens without human intervention.

Prerequisties

The proposed solution can be implemented in a personal AWS environment using the code that we provide. However, there are a few prerequisites that must in place. Before running the labs in this post, ensure you have the following:

An AWS account. Create one if necessary.
The appropriate AWS Identity and Access Management (IAM) permissions to access services used in the lab. If this is your first time setting up an AWS account, see the IAM documentation for information about configuring IAM.
A SageMaker Studio Notebook. Create one if necessary.

Solution architecture

The following diagram illustrates the lab’s architecture:

This solution uses the following AI and machine learning (AI/ML), serverless, and managed technologies:

Amazon SageMaker, a fully managed machine learning service that enables data scientists and developers to build, train and deploy machine learning applications.
Amazon Rekognition supports adding image and video analysis to your applications.
IAM grants authentication and authorization that allows resources in the solution to talk to each other.

To recap how the solution works

Traffic intersection video footage is uploaded to your SageMaker environment from an external device.
A Python function uses CV2 to split the video footage into image frames.
The function makes a call to Amazon Rekognition when the image frames are completed.
Amazon Rekognition analyzes each frame and creates bounding boxes around each vehicle it detects.
The function counts the bounding boxes and changes the traffic signal based on the number of cars it detects using pre-defined logic.

Solution walkthrough

Now, let’s walk through implementing the solution.

Configure SageMaker:

Choose Domains in the navigation pane, and then select your domain name.
Find and copy the SageMaker Execution Role.
Go to the IAM console and choose Roles in the navigation pane and paste the SageMaker Execution Role you copied in the preceding step.

Enable SageMaker to interact with Amazon Rekognition:

Next, enable SageMaker to interact with Amazon Rekognition using the SageMaker execution role.

In the SageMaker console, select your SageMaker execution role and choose Add permission and then choose Attach policies.
In the search bar, enter and select AmazonRekognitionFullAccess Policy. See the following figure.

With the IAM permissions configured, you can run the notebook in SageMaker with access to Amazon Rekognition for the video analysis.

Download the Rekognition Notebook and traffic intersection data to your local environment. On the Amazon Sagemaker Studio, upload the notebook and data you downloaded.

Code walkthrough:

This lab uses OpenCv and Boto3 to prepare the SageMaker environment. OpenCv is an open source library with over 250 algorithms for computer vision analysis. Boto3 is the AWS SDK for Python that helps you to integrate AWS services with applications or scripts written in Python.

First, we import OpenCv and Boto3 package. The next cell of codes builds a function for analyzing the video. We will walk through key components of the function. The function starts by creating a frame for the video to be analyzed.
The frame is written to a new video writer file with an MP4 extension. The function also loops through the file and, if the video doesn’t have a frame, the function converts it to a JPEG file. Then the code define and identify traffic lanes using bounding boxes. Amazon Rekognition image operations place bounding boxes around images detected for later analysis.
The function captures the video frame and sends it to Amazon Rekognition to analyze images in the video using the bounding boxes. The model uses bounding boxes to detect and classify captured images (cars, pedestrians, and so on) in the video. The code then detects whether a car is in the video sent to Amazon Rekognition. A bounding box is generated for each car detected in the video.
The size and position of the car is computed to accurately detect its position. After computing the size and position of the car, the model checks whether the car is in a detected lane. After determining whether there are cars in one of the detected lanes, the model counts the numbers of detected cars in the lane.
The results from detecting and computing the size, position and numbers of cars in a lane are written to a new file in the rest of the function.
Writing the outputs to a new file, a few geometry computations are done to determine the details of detected objects. For example, polygons are used to determine the size of objects.
With the function completely built, the next step is running the function and with a minimum confidence sore of 95% using a test video.
The last line of codes allow you to download the video from the directory in SageMaker to check the results and confidence level of the output.

Costs

The logic behind our cost estimates is put at $6,000 per intersection with the assumption one frame per second using four cameras with a single SageMaker notebook for each intersection. One important callout is that not every intersection is a 4-way intersection. Implementing this solution on more populated traffic areas will increase the overall flow of traffic.

Cost breakdown and details

Service	Description	First month cost	First 12 months cost
Amazon SageMaker Studio notebooks	· Instance name: ml.t3.medium · Number of data scientists: 1 · Number of Studio notebook instances per data scientist: 1 · Studio notebook hours per day: 24 · Studio notebook days per month: 30	$36	$432
Amazon Rekognition	Number of images processed with labels API calls per month: 345,600 per month	$345.60	$4,147.20
Amazon Simple Storage Service (Amazon S3) (Standard storage class)	· S3 Standard storage: 4,320 GB per month · PUT, COPY, POST, and LIST requests to S3 Standard per month: 2,592,000	$112.32	$1,347.84
Total estimate per year			$5,927.04

However, this is an estimate, and you may incur additional costs depending on customization. For additional information on costs, visit the AWS pricing page for the services covered in the solution architecture. If you have questions, reach out to the AWS team for a more technical and focused discussion.

Clean up

Delete all AWS resources created for this solution that are no longer needed to avoid future charges.

Conclusion

This post provides a solution to make traffic lights more efficient using Amazon Rekognition. The solution proposed in this post can mitigate costs, support road safety, and reduce congestion at intersections. All of these make traffic management more efficient. We strongly recommend learning more about how Amazon Rekognition can help accelerate other image recognition and video analysis tasks by visiting the Amazon Rekognition Developer Guide.

About the authors

Hao Lun Colin Chu is an innovative Solution Architect at AWS, helping partners and customers leverage cutting-edge cloud technologies to solve complex business challenges. With extensive expertise in cloud migrations, modernization, and AI/ML, Colin advises organizations on translating their needs into transformative AWS-powered solutions. Driven by a passion for using technology as a force for good, he is committed to delivering solutions that empower organizations and improve people’s lives. Outside of work, he enjoys playing drum, volleyball and board games!

Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He provides technical guidance to nonprofit organizations seeking to securely build, deploy or expand applications in the cloud. He is passionate about leveraging data and technology for social good. Joe background is in data science and international development. Outside work, Joe loves spending time with his family, friends and chatting about innovation and entrepreneurship.

NVIDIA Partners for Globally Inclusive AI in U.S. Government Initiative

NVIDIA is joining the U.S. government’s launch of the Partnership for Global Inclusivity on AI (PGIAI), providing Deep Learning Institute training, GPU credits and hardware and software grants in developing countries.

The partnership was announced today in New York at the U.N. General Assembly by U.S. Secretary of State Antony Blinken. The effort aims to harness the potential of artificial intelligence to advance sustainable development around the world.

“Artificial Intelligence is driving the next industrial revolution, offering incredible potential to contribute meaningful progress on sustainable development goals,” said Ned Finkle, VP of NVIDIA government affairs. “NVIDIA is committed to empowering communities to use AI to innovate through support for research, education and small and medium size enterprises.”

NVIDIA is joined by Amazon, Anthropic, Apple, Google, IBM, Meta, Microsoft and OpenAI in the initiative.

Members of the partnership have pledged to provide access to training, compute and other AI tools to drive sustainable development and improved quality of life in developing countries.

The PGIAI initiative recognizes that equitable AI requires understanding and respect for the diverse cultures, languages and traditions of the communities where services are provided. With that criteria, PGIAI members will focus on increasing access to AI models, APIs, compute credit and other AI tools, as well as technical training and access to local datasets.

Under this partnership, NVIDIA will provide approximately $10 million in free training to universities and developers to help support AI for local solutions and development goals.

NVIDIA’s global Inception program supports nearly 5,000 start-ups in emerging economies with technical expertise, go-to-market support, hardware and software discounts, and access to free cloud computing credits provided by NVIDIA partners.

In 2024, Inception provided access to more than $60 million worth of free cloud compute credits through partners to start-ups in emerging economies.

CTA: Learn more about the NVIDIA Inception program for startups. Learn more about the NVIDIA Deep Learning Institute.

Accelerate development of ML workflows with Amazon Q Developer in Amazon SageMaker Studio

Machine learning (ML) projects are inherently complex, involving multiple intricate steps—from data collection and preprocessing to model building, deployment, and maintenance. Data scientists face numerous challenges throughout this process, such as selecting appropriate tools, needing step-by-step instructions with code samples, and troubleshooting errors and issues. These iterative challenges can hinder progress and slow down projects. Fortunately, generative AI-powered developer assistants like Amazon Q Developer have emerged to help data scientists streamline their workflows and fast-track ML projects, allowing them to save time and focus on strategic initiatives and innovation.

Amazon Q Developer is fully integrated with Amazon SageMaker Studio, an integrated development environment (IDE) that provides a single web-based interface for managing all stages of ML development. You can use this natural language assistant from your SageMaker Studio notebook to get personalized assistance using natural language. It offers tool recommendations, step-by-step guidance, code generation, and troubleshooting support. This integration simplifies your ML workflow and helps you efficiently build, train, and deploy ML models without needing to leave SageMaker Studio to search for additional resources or documentation.

In this post, we present a real-world use case analyzing the Diabetes 130-US hospitals dataset to develop an ML model that predicts the likelihood of readmission after discharge. Throughout this exercise, you use Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle and experience firsthand how this natural language assistant can help even the most experienced data scientists or ML engineers streamline the development process and accelerate time-to-value.

Solution overview

If you’re an AWS Identity and Access Management (IAM) and AWS IAM Identity Center user, you can use your Amazon Q Developer Pro tier subscription within Amazon SageMaker. Administrators can subscribe users to the Pro Tier on the Amazon Q Developer console, enable Pro Tier in the SageMaker domain settings, and provide the Amazon Q Developer profile Amazon Resource Name (ARN). The Pro Tier offers unlimited chat and inline code suggestions. Refer to Set up Amazon Q Developer for your users for detailed instructions.

If you don’t have a Pro Tier subscription but want to try out the capability, you can access the Amazon Q Developer Free Tier by adding the relevant policies to your SageMaker service roles. Admins can navigate to the IAM console, search for the SageMaker Studio role, and add the policy outlined in Set up Amazon Q Developer for your users. The Free Tier is available for both IAM and IAM Identity Center users.

To start our ML project predicting the probability of readmission for diabetes patients, you need to download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data at 130 US hospitals and integrated delivery networks. Each row represents hospital records of patients diagnosed with diabetes, who underwent laboratory, and more.

At the time of writing, Amazon Q Developer support in SageMaker Studio is only available in JupyterLab spaces. Amazon Q Developer is not supported for shared spaces.

Amazon Q Developer chat

After you have uploaded the data to SageMaker Studio, you can start working on your ML problem of reducing readmission rates for diabetes patients. Begin by using the chat capability next to your JupyterLab notebook. You can ask questions like generating code to parse the Diabetes 130-US hospitals data, how you should formulate this ML problem, and develop a plan to build an ML model that predicts the likelihood of readmission after discharge. Amazon Q Developer uses AI to provide code recommendations, and this is non-deterministic. The results you get may be different from the ones shown in the following screenshot.

You can ask Amazon Q Developer to help you plan out the ML project. In this case, we want the assistant to show us how to train a random forest classifier using the Diabetes 130-US dataset. Enter the following prompt into the chat, and Amazon Q Developer will generate a plan. If code is generated, you can use the UI to directly insert the code into your notebook.

I have diabetic_data.csv file containing training data about whether a diabetic patient was readmitted after discharge. I want to use this data to train a random forest classifier using scikit-learn. Can you list out the steps to build this model?

You can ask Amazon Q Developer to help you generate code for specific tasks by inserting the following prompt:

Create a function that takes in a pandas DataFrame and performs one-hot encoding for the gender, race, A1Cresult, and max_glu_serum columns.

You can also ask Amazon Q Developer to explain existing code and troubleshoot for common errors. Just choose the cell with the error and enter /fix in the chat.

The following is a full list of the shortcut commands:

/help – Display this help message
/fix – Fix an error cell selected in your notebook
/clear – Clear the chat window
/export – Export chat history to a Markdown file

To get the most out of your Amazon Q Developer chat, the following best practices are recommended when crafting your prompt:

Be direct and specific – Ask precise questions. For instance, instead of a vague query about AWS services, try: “Can you provide sample code using the SageMaker Python SDK library to train an XGBoost model in SageMaker?” Specificity helps the assistant understand exactly what you need, resulting in more accurate and useful responses.
Provide contextual information – The more context you offer, the better. This allows Amazon Q Developer to tailor its responses to your specific situation. For example, don’t just ask for code to prepare data. Instead, provide the first three rows of your data to get better code suggestions with fewer changes needed.
Avoid sensitive topics – Amazon Q Developer is designed with guardrail controls. It’s best to avoid questions related to security, billing information of your account, or other sensitive subjects.

Following these guidelines can help you maximize the value of Amazon Q Developer’s AI-powered code recommendations and streamline your ML projects.

Amazon Q Developer inline code suggestions

You can also get real-time code suggestions as you type in the JupyterLab notebook, offering context-aware recommendations based on your existing code and comments to streamline the coding process. In the following example, we demonstrate how to use the inline code suggestions feature to generate code blocks for various data science tasks: from data exploration to feature engineering, training a random forest model, evaluating the model, and finally deploying the model to predict the probability of readmission for diabetes patients.

The following figure shows the list of keyboard shortcuts to interact with Amazon Q Developer.

Let’s start with data exploration.

We first import some of the necessary Python libraries, like pandas and NumPy. Add the following code into the first code cell of Jupyter Notebook, and then run the cell:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In the next code cell, add the following comment, and before running the cell, press Enter and Tab. You can watch the bottom status bar to see Amazon Q Developer working to generate code suggestions.

# read 'diabetic-readmission.csv'

You can also ask Amazon Q Developer to create a visualization:

# create a bar chart from df that shows counts of patients by 'race' and 'gender' with a title of 'patients by race and gender'

Now you can perform feature engineering to prepare the model for training.

The dataset provided has a number of categorical features, which need to be converted to numerical features, as well as missing data. In the next code cell, add the following comment, and press TAB to see how Amazon Q Developer can help:

# perform one-hot encoding for gender, race, a1c_result, and max_glu_serum columns

Lastly, you can use Amazon Q Developer to help you create a simple ML model, random forest classifier, using scikit-learn.

Amazon Q Developer in SageMaker data policy

When using Amazon Q Developer in SageMaker Studio, no customer content is used for service improvement, regardless of whether you use the Free Tier or Pro Tier. For IDE-level telemetry sharing, Amazon Q Developer may track your usage of the service, such as how many questions you ask and whether you accept or reject a recommendation. This information doesn’t contain customer content or personally identifiable information, such as your IP address. If you prefer to opt out of IDE-level telemetry, complete the following steps to opt out of sharing usage data with Amazon Q Developer:

On the Settings menu, choose Settings Editor.

Uncheck the option Share usage data with Amazon Q Developer.

Alternatively, an ML platform admin can disable this option for all users inside JupyterLab by default with the help of lifecycle configuration scripts. To learn more, see Using lifecycle configurations with JupyterLab. To disable data sharing with Amazon Q Developer by default for all users within a SageMaker Studio domain, complete the following steps:

On the SageMaker console, choose Lifecycle configurations under Admin configurations in the navigation pane.
Choose Create configuration.

For Name, enter a name.
In the Scripts section, create a lifecycle configuration script that disables the shareCodeWhispererContentWithAWS settings flag for the jupyterlab-q extension:

#!/bin/bash
mkdir -p /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/
cat<<EOL> /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/completer.jupyterlab-settings
{
"shareCodeWhispererContentWithAWS": false,   
"suggestionsWithCodeReferences": true,   
"codeWhispererTelemetry": false,
"codeWhispererLogLevel": "ERROR"
}
EOL

Attach the disable-q-data-sharing lifecycle configuration to a domain.
Optionally, you can force the lifecycle configuration to run with the Run by default

Use this lifecycle configuration when creating a JupyterLab space.

It will be selected by default if the configuration is set to Run by default.

The configuration should run almost instantaneously and disable the Share usage data with Amazon Q Developer option in your JupyterLab space on startup.

Clean up

To avoid incurring AWS charges after testing this solution, delete the SageMaker Studio domain.

Conclusion

In this post, we walked through a real-world use case and developed an ML model that predicts the likelihood of readmission after discharge for patients in the Diabetes 130-US hospitals dataset. Throughout this exercise, we used Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle, demonstrating how this developer assistant can help streamline the development process and accelerate time-to-value, even for experienced ML practitioners. You have access to Amazon Q Developer in all AWS Regions where SageMaker is generally available. Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI–powered assistant.

The assistant is available for all Amazon Q Developer Pro and Free Tier users. For pricing information, see Amazon Q Developer pricing.

About the Authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS. helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.

Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include computer vision, MLOps/LLMOps, and generative AI.

Shibin Michaelraj is a Sr. Product Manager with the Amazon SageMaker team. He is focused on building AI/ML-based products for AWS customers.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Bhadrinath Pani is a Software Development Engineer at Amazon Web Services, working on Amazon SageMaker interactive ML products, with over 12 years of experience in software development across domains like automotive, IoT, AR/VR, and computer vision. Currently, his main focus is on developing machine learning tools aimed at simplifying the experience for data scientists. In his free time, he enjoys spending time with his family and exploring the beauty of the Pacific Northwest.

Govern generative AI in the enterprise with Amazon SageMaker Canvas

With the rise of powerful foundation models (FMs) powered by services such as Amazon Bedrock and Amazon SageMaker JumpStart, enterprises want to exercise granular control over which users and groups can access and use these models. This is crucial for compliance, security, and governance.

Launched in 2021, Amazon SageMaker Canvas is a visual point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without writing any code. SageMaker Canvas provides a no-code interface to consume a broad range of FMs from both services in an off-the-shelf fashion, as well as to customize model responses using a Retrieval Augmented Generation (RAG) workflow using Amazon Kendra as a knowledge base or fine-tune using a labeled dataset. This simplifies access to generative artificial intelligence (AI) capabilities to business analysts and data scientists without the need for technical knowledge or having to write code, thereby accelerating productivity.

In this post, we analyze strategies for governing access to Amazon Bedrock and SageMaker JumpStart models from within SageMaker Canvas using AWS Identity and Access Management (IAM) policies. You’ll learn how to create granular permissions to control the invocation of ready-to-use Amazon Bedrock models and prevent the provisioning of SageMaker endpoints with specified SageMaker JumpStart models. We provide code examples tailored to common enterprise governance scenarios. By the end, you’ll understand how to lock down access to generative AI capabilities based on your organizational requirements, maintaining secure and compliant use of cutting-edge AI within the no-code SageMaker Canvas environment.

This post covers an increasingly important topic as more powerful AI models become available, making it a valuable resource for ML operators, security teams, and anyone governing AI in the enterprise.

Solution overview

The following diagram illustrates the solution architecture.

The architecture of SageMaker Canvas allows business analysts and data scientists to interact with ML models without writing any code. However, managing access to these models is crucial for maintaining security and compliance. When a user interacts with SageMaker Canvas, the operations they perform, such as invoking a model or creating an endpoint, are run by the SageMaker service role. SageMaker user profiles can either inherit the default role from the SageMaker domain or have a user-specific role.

By customizing the policies attached to this role, you can control what actions are permitted or denied, thereby governing the access to generative AI capabilities. As part of this post, we discuss which IAM policies to use for this role to control operations within SageMaker Canvas, such as invoking models or creating endpoints, based on enterprise organizational requirements. We analyze two patterns for both Amazon Bedrock models and SageMaker JumpStart models: limiting access to all models from a service or limiting access to specific models.

Govern Amazon Bedrock access to SageMaker Canvas

In order to use Amazon Bedrock models, SageMaker Canvas calls the following Amazon Bedrock APIs:

bedrock:InvokeModel – Invokes the model synchronously
bedrock:InvokeModelWithResponseStream – Invokes the model synchronously, with the response being streamed over a socket, as illustrated in the following diagram

Additionally, SageMaker Canvas can call the bedrock:FineTune API to fine-tune large language models (LLMs) with Amazon Bedrock. At the time of writing, SageMaker Canvas only allows fine-tuning of Amazon Titan models.

To use a specific LLM from Amazon Bedrock, SageMaker Canvas uses the model ID of the chosen LLM as part of the API calls. At the time of writing, SageMaker Canvas supports the following models from Amazon Bedrock, grouped by model provider:

AI21
- Jurassic-2 Mid: j2-mid-v1
- Jurassic-2 Ultra : j2-ultra-v1
Amazon
- Titan: titan-text-premier-v1:*
- Titan Large: titan-text-lite-v1
- Titan Express: titan-text-express-v1
Anthropic
- Claude 2: claude-v2
- Claude Instant: claude-instant-v1
Cohere
- Command Text: command-text-*
- Command Light: command-light-text-*
Meta
- Llama 2 13B: llama2-13b-chat-v1
- Llama 2 70B: llama2-70b-chat-v1

For the complete list of models IDs for Amazon Bedrock, see Amazon Bedrock model IDs.

Limit access to all Amazon Bedrock models

To restrict access to all Amazon Bedrock models, you can modify the SageMaker role to explicitly deny these APIs. This makes sure no user can invoke any Amazon Bedrock model through SageMaker Canvas.

The following is an example IAM policy to achieve this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "*"
        }
    ]
}

The policy uses the following parameters:

"Effect": "Deny" specifies that the following actions are denied
"Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"] specifies the Amazon Bedrock APIs that are denied
"Resource": "*" indicates that the denial applies to all Amazon Bedrock models

Limit access to specific Amazon Bedrock models

You can extend the preceding IAM policy to restrict access to specific Amazon Bedrock models by specifying the model IDs in the Resources section of the policy. This way, users can only invoke the allowed models.

The following is an example of the extended IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-1>",
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-2>"
            ]
        }
    ]
}

In this policy, the Resource array lists the specific Amazon Bedrock models that are denied. Provide the AWS Region, account, and model IDs appropriate for your environment.

Govern SageMaker JumpStart access to SageMaker Canvas

For SageMaker Canvas to be able to consume LLMs from SageMaker JumpStart, it must perform the following operations:

Select the LLM from SageMaker Canvas or from the list of JumpStart Model IDs (link below).
Create an endpoint configuration and Deploy the LLM on a real-time endpoint.
Invoke the endpoint to generate the prediction.

The following diagram illustrates this workflow.

For a list of available JumpStart model IDs, see JumpStart Available Model Table. At the time of writing, SageMaker Canvas supports the following model IDs:

huggingface-textgeneration1-mpt-7b-*
huggingface-llm-mistral-*
meta-textgeneration-llama-2-*
huggingface-llm-falcon-*
huggingface-textgeneration-dolly-v2-*
huggingface-text2text-flan-t5-*

To identify the right model from SageMaker JumpStart, SageMaker Canvas passes aws:RequestTag/sagemaker-sdk:jumpstart-model-id as part of the endpoint configuration. To learn more about other techniques to limit access to SageMaker JumpStart models using IAM permissions, refer to Manage Amazon SageMaker JumpStart foundation model access with private hubs.

Configure permissions to deploy endpoints through the UI

On the SageMaker domain configuration page on the SageMaker page of the AWS Management Console, you can configure SageMaker Canvas to be able to deploy SageMaker endpoints. This option also enables deployment of real-time endpoints for classic ML models, such as time series forecasting or classification. To enable model deployment, complete the following steps:

On the Amazon SageMaker console, navigate to your domain.
On the Domain details page, choose the App Configurations

In the Canvas section, choose Edit.

Turn on Enable direct deployment of Canvas models in the ML Ops configuration

Limit access to all SageMaker JumpStart models

To limit access to all SageMaker JumpStart models, configure the SageMaker role to block the CreateEndpointConfig and CreateEndpoint APIs on any SageMaker JumpStart Model ID. This prevents the creation of endpoints using these models. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
"Condition": {
                "Null": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id":”*”
		    }
		}
        }
    ]
}

This policy uses the following parameters:

"Effect": "Deny" specifies that the following actions are denied
"Action": ["sagemaker:CreateEndpointConfig", "sagemaker:CreateEndpoint"] specifies the SageMaker APIs that are denied
The "Null" condition operator in AWS IAM policies is used to check whether a key exists or not. It does not check the value of the key, only its presence or absence
"aws:RequestTag/sagemaker-sdk:jumpstart-model-id":”*” indicates that the denial applies to all SageMaker JumpStart models

Limit access and deployment for specific SageMaker JumpStart models

Similar to Amazon Bedrock models, you can limit access to specific SageMaker JumpStart models by specifying their model IDs in the IAM policy. To achieve this, an administrator needs to restrict users from creating endpoints with unauthorized models. For example, to deny access to Hugging Face FLAN T5 models and MPT models, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id": [
                        "huggingface-textgeneration1-mpt-7b-*",
                        "huggingface-text2text-flan-t5-*"
                    ]
                }
            }
        }
    ]
}

In this policy, the "StringLike" condition allows for pattern matching, enabling the policy to apply to multiple model IDs with similar prefixes.

Clean up

To avoid incurring future workspace instance charges, log out of SageMaker Canvas when you’re done using the application. Optionally, you can configure SageMaker Canvas to automatically shut down when idle.

Conclusion

In this post, we demonstrated how SageMaker Canvas invokes LLMs powered by Amazon Bedrock and SageMaker JumpStart, and how enterprises can govern access to these models, whether you want to limit access to specific models or to any model from either service. You can combine the IAM policies shown in this post in the same IAM role to provide complete control.

By following these guidelines, enterprises can make sure their use of generative AI models is both secure and compliant with organizational policies. This approach not only safeguards sensitive data but also empowers business analysts and data scientists to harness the full potential of AI within a controlled environment.

Now that your environment is configured according to the enterprise standard, we suggest reading the following posts to learn what SageMaker Canvas enables you to do with generative AI:

About the Authors

Davide Gallitelli is a Senior Specialist Solutions Architect GenAI/ML. He is Italian, based in Brussels, and works closely with customer all around the world on Generative AI workloads and Low-Code No-Code ML technology. He has been a developer since very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Lijan Kuniyil is a Senior Technical Account Manager at AWS. Lijan enjoys helping AWS enterprise customers build highly reliable and cost-effective systems with operational excellence. Lijan has more than 25 years of experience in developing solutions for financial and consulting companies.

Saptarshi Banerjee serves as a Senior Partner Solutions Architect at AWS, collaborating closely with AWS Partners to design and architect mission-critical solutions. With a specialization in generative AI, AI/ML, serverless architecture, and cloud-based solutions, Saptarshi is dedicated to enhancing performance, innovation, scalability, and cost-efficiency for AWS Partners within the cloud ecosystem.

Transforming home ownership with Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock: Rocket Mortgage’s journey with AWS

This post is co-written with Josh Zook and Alex Hamilton from Rocket Mortgage.

Rocket Mortgage, America’s largest retail mortgage lender, revolutionizes homeownership with Rocket Logic – Synopsis, an AI tool built on AWS. This innovation has transformed client interactions and operational efficiency through the use of Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock. Through Rocket Logic – Synopsis, Rocket achieved remarkable results: automating post call interaction wrap-up resulting in a projected 40,000 team hours saved annually, and a 10% increase in first-call resolutions saved 20,000 hours annually. In addition to Rocket Logic – Synopsis, 70% of servicing clients choose to self-serve over Gen AI powered mediums such as IVR. Rocket’s “start small, launch and learn, scale fast” approach paired with AWS enablement proved effective, deploying 30,000 servicing calls in 10 days, then scaling four times greater for operations and six times greater for banking.

This post offers insights for businesses aiming to use artificial intelligence (AI) and cloud technologies to enhance customer service and streamline operations. We share how Rocket Mortgage’s use of AWS services set a new industry standard and demonstrate how to apply these principles to transform your client interactions and processes with speed and scalability.

Opportunities for innovation

Rocket services over 2.6 million clients, with 65 million voice interactions and 147 million voice minutes inclusive of banking, operations, and servicing, and generates and processes over 10 PB of data. By focusing on three key personas—clients, client advocates, and business leaders or senior leadership—Rocket aims to create a solution that enhances experiences across the board.

At the heart of this transformation is the recognition that clients value their time, but also benefit from hyper-personalized support in ultra complex moments. With call volumes on the rise, solving this problem at scale was essential. Rocket tapped into a crucial insight: 81% of consumers prefer self-service options. This preference opens exciting possibilities for swift, efficient problem-solving. Imagine a world where answers are available at your fingertips, 24/7, without the need to wait in a queue. By implementing enhanced self-service tools, Rocket is poised to offer faster resolution times, greater client autonomy, and a more satisfying overall experience.

Client advocates, the face of the company, stand to benefit significantly from this transformation. Currently, client advocates spend about 30% of their time on administrative tasks. By streamlining processes, client advocates can focus on what they do best: providing exceptional customer service and nurturing client relationships. This shift promises more engaging work, increased job satisfaction, and opportunities for skill development. Rocket envisions their client advocates evolving into trusted advisors, handling complex inquiries that truly take advantage of their expertise and interpersonal skills.

For business leaders, this wealth of data on trends, sentiment, and performance opens up a treasure trove of opportunities. Decision-makers can now drive significant improvements across the board, employing data-driven strategies to enhance customer satisfaction, optimize operations, and boost overall business performance. Business leaders can look forward to leading more efficient teams, and senior leadership can anticipate improved client loyalty and a stronger bottom line.

Strategic requirements

To further elevate their client interactions, Rocket identified key requirements for their solution. These requirements were essential to make sure the solution could handle the demands of their extensive client base and provide actionable insights to enhance client experiences:

Sentiment analysis – Tracking client sentiment and preferences was necessary to offer personalized experiences. The solution needed to accurately gauge client emotions and preferences to tailor responses and services effectively.
Automation – Automating routine tasks, such as call summaries, was essential to free up team members for more meaningful client interactions. This automation would help reduce the manual workload, allowing the team to focus on building stronger client relationships.
AI integration – Using generative AI to analyze calls was crucial for providing actionable insights and enhancing client interactions. The AI integration needed to be robust enough to process vast amounts of data and deliver precise, meaningful results.
Data security – Protecting sensitive client information throughout the process was a non-negotiable requirement. Rocket needed to uphold the highest standards of data security, maintaining regulatory compliance, data privacy, and the integrity of client information.
Compliance and data privacy – Rocket required a solution that met strict compliance and data privacy standards. Given the sensitive nature of the information handled, the solution needed to provide complete data protection and adhere to industry regulations.
Scalability – Rocket needed a solution capable of handling millions of calls annually and scaling efficiently with growing demand. This requirement was vital to make sure the system could support their expansive and continuously increasing volume of voice interactions.

Solution overview

To meet these requirements, Rocket partnered with the AWS team to deploy the AWS Contact Center Intelligence (CCI) solution Post-Call Analytics, branded internally as Rocket Logic – Synopsis. This solution seamlessly integrates into Rocket’s existing operations, using AI technologies to transcribe and analyze client calls. By utilizing services like Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock, the solution extracts valuable insights such as sentiment, call drivers, and client preferences, enhancing client interactions and providing actionable data for continuous improvement.

At the heart of Rocket are their philosophies, known as their -ISMs, which guide their growth and innovation. One of these guiding principles is “launch and learn.”

Embracing the mantra of “think big but start small,” Rocket adopted a rapid, iterative approach to achieve a remarkable time to market of just 10 days, compared to the months it would have traditionally taken. This agile methodology allowed them to create space for exploration and innovation. The team initially focused on a few key use cases, starting simple and rapidly iterating based on feedback and results.

To accelerate development and make sure data was quickly put into the hands of the business, they utilized mechanisms such as a hackathon with targeted goals. By using existing solutions and AWS technical teams, Rocket significantly reduced the time to market, allowing for swift deployment. Additionally, they looked to industry tactics to find solutions to common problems, so their approach was both innovative and practical.

During this “launch and learn” process, Rocket anticipated and managed challenges such as scaling issues and burst volume management using Drip Hopper and serverless technologies through AWS. They also fine-tuned the Anthropic’s Claude 3 Haiku large language model (LLM) on Amazon Bedrock for call classification and data extraction.

The following diagram illustrates the solution architecture.

Post-Call Analytics provides an entire architecture around ingesting audio files in a fully automated workflow with AWS Step Functions, which is initiated when an audio file is delivered to a configured Amazon Simple Storage Service (Amazon S3) bucket. After a few minutes, a transcript is produced with Amazon Transcribe Call Analytics and saved to another S3 bucket for processing by other business intelligence (BI) tools. These transcripts are saved for further processing by BI tools, with stringent security measures making sure personally identifiable information (PII) is redacted and data is encrypted. The PII is redacted throughout, but client ID and interaction ID are used to correlate and trace across the data sets. Downstream applications use those ids to pull from client data services in the UI presentation layer.

Enhancing the analysis, Amazon Comprehend is used for sentiment analysis and entity extraction, providing deeper insights into client interactions. Generative AI is integrated to generate concise call summaries and actionable insights, significantly reducing the manual workload and allowing team members to focus on building stronger client relationships. This generative AI capability, powered by Amazon Bedrock, Anthropic’s Claude Sonnet 3, and customizable prompts, enables Rocket to deliver real-time, contextually relevant information. Data is securely stored and managed within AWS, using Amazon S3 and Amazon DynamoDB, with robust encryption and access controls provided by AWS Key Management Service (AWS KMS) and AWS Identity and Access Management (IAM) policies. This comprehensive setup enables Rocket to efficiently manage, analyze, and act on client interaction data, thereby enhancing both client experience and operational efficiency.

Achieving excellence

The implementation of Rocket Logic – Synopsis has yielded remarkable results for Rocket:

Efficiency gains – Automating call transcription and sentiment analysis is projected to save the servicing team nearly 40,000 hours annually
Enhanced client experience – Approximately 70% of servicing clients fully self-serve over Gen AI powered mediums such as IVR; allowing clients to resolve inquiries without needing team member intervention
Increased first-call resolutions – There has been a nearly 10% increase in first-call resolutions, saving approximately 20,000 team member hours annually
Proactive client solutions – The tool’s predictive capabilities have improved, allowing Rocket to proactively address client needs before they even make a call
Start small, launch and learn, scale fast – Rocket started with 30,000 servicing calls with a 10-day time to market, and then scaled four times greater for operations, followed by six times greater for banking

Roadmap

Looking ahead, Rocket plans to continue enhancing Rocket Logic – Synopsis by using the vast amount of data gathered from call transcripts. Future developments will include:

Advanced predictive analytics – Further improving the tool’s ability to anticipate client needs and offer solutions proactively
Omnichannel integration – Expanding the AI capabilities to other communication channels such as emails and chats
Client preference tracking – Refining the technology to better understand and adapt to individual client preferences, providing more personalized interactions
Enhanced personalization – Utilizing data to create even more tailored client experiences, including understanding preferences for communication channels and timing

Conclusion

The collaboration between Rocket Mortgage and AWS has revolutionized the homeownership process by integrating advanced AI solutions into client interactions. Rocket Logic – Synopsis enhances operational efficiency significantly and improves the client experience. As Rocket continues to innovate and expand its AI capabilities, they remain committed to providing personalized, efficient, and seamless homeownership experiences for their clients. The success of Rocket Logic – Synopsis demonstrates the transformative power of technology in creating more efficient, responsive, and personalized client experiences. To learn more, visit Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock.

About the authors

Josh Zook is the Chief Technology Officer of Rocket Mortgage, working alongside the teams that are shipping the products that clients and partners are using every day to make home ownership a reality. He started in Technology in 1984 by writing a program in BASIC to calculate his weight on the moon using an Apple IIe. Since then, he has been on a relentless pursuit in using technology to make life easier by solving slightly more complex problems. Josh believes the key to success is curiosity combined with the grit and grind to make ideas reality. This has led to a steady paycheck since he was 10 years old, with jobs in landscaping, sandwich artistry, sporting goods sales, satellite installation, firefighter, and bookstore aficionado… just to name a few.

Alex Hamilton is a Director of Engineering at Rocket Mortgage, spearheading the AI driven digital strategy to help everyone home. He’s been shaping the tech scene at Rocket for over 11 years, including launching one of the company’s first models to boost trading revenue and bring modern event streaming and containerization to Rocket. Alex is passionate about solving novel engineering problems and bringing magical client experiences to life. Outside of work Alex enjoys traveling, weekend brunch, and firing up the grill!

Ritesh Shah is a Senior Worldwide GenAI Specialist at AWS. He partners with customers like Rocket to drive AI adoption, resulting in millions of dollars in top and bottom line impact for these customers. Outside work, Ritesh tries to be a dad to his AWSome daughter. Connect with him on LinkedIn.

Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services, where he partners with North American FinTech companies like Rocket to drive cloud strategy and accelerate AI adoption. His expertise in AI & ML, and cloud native architecture has helped organizations unlock new revenue streams, enhance operational efficiency, and achieve substantial business transformation. By modernizing financial institutions with secure, scalable infrastructures, Sajjan enables them to stay competitive in a rapidly evolving, data-driven landscape. Outside of work, he enjoys spending time with his family and is a proud father to his daughter.

To Save Lives, and Energy, Wellcome Sanger Institute Speeds Cancer Research With NVIDIA Accelerated Computing

The Wellcome Sanger Institute, a key contributor to the international Human Genome Project, is turning to NVIDIA accelerated computing to save energy while saving lives.

With one of the world’s largest sequencing facilities, the U.K.-based institute has read more than 48 petabases — or 48 quadrillion base pairs — of DNA and RNA sequences to uncover crucial insights into health and disease.

Its Cancer, Ageing and Somatic Mutation (CASM) program sequences and analyzes tens of thousands of cancer genomes a year to study the mutational processes driving cancer formation, as well as genetic variations that determine treatment effectiveness.

To tackle such large-scale initiatives, the Sanger Institute is exploring the use of an NVIDIA DGX system with NVIDIA Parabricks, a scalable genomics analysis software suite that taps into accelerated computing to process data in just minutes.

“The Sanger Institute handles hundreds of thousands of somatic samples annually,” said Jingwei Wang, principal software developer for CASM at the Wellcome Sanger Institute. “NVIDIA accelerated computing and Parabricks will save us considerable time, cost and energy when analyzing samples, and we’re excited to explore NVIDIA’s advanced architectures, such as NVIDIA Grace and Grace Hopper, for even higher performance and efficiency.”

Reducing Runtime and Energy Consumption

The Sanger Institute develops high-throughput models of cancer samples for genome-wide functional screens and drug testing.

NVIDIA accelerated computing and software drastically reduce the institute’s analysis runtime and energy consumption per genome.

To accelerate genomic analysis with Burrows-Wheeler Aligner (BWA), a software package for mapping DNA sequences against a large reference genome, Sanger uses its proprietary CaVEMan workflow running on CPUs and is tapping into Parabricks on NVIDIA GPUs.

The institute reduced runtime 1.6x, costs 24x and energy consumption up to 42x — using one NVIDIA DGX system compared with 128 dual-socket CPU servers.

About 125 million CPU hours are consumed per 10,000 genomes sequenced by the institute annually.

This means that the Sanger Institute could, each year, save $1 million and 1,000 megawatt-hours by switching to using BWA with Parabricks on GPUs. That’s about the amount of energy needed to power an average American home for a century.

Collaborating With Industry Leaders

The Sanger Institute’s NVIDIA-accelerated sequencing lab can be considered an AI factory, where data comes in and intelligence comes out.

AI factories are next-generation data centers that host advanced, full-stack accelerated computing platforms for the most computationally intensive tasks.

As it explores crucial scientific questions to discover new cancer genes and mutational processes, the Sanger Institute is boosting operational and energy efficiency by using NVIDIA infrastructure for its AI factory.

In addition, companies and organizations building AI factories are participating in cross-industry collaborations with leaders like Schneider Electric, an energy management and automation company, to optimize data center designs for running demanding workloads in the most energy-efficient way.

The Sanger Institute is collaborating with Schneider Electric to minimize data center downtime and equip the DNA sequencing lab’s data center with uninterruptible power supplies and cooling equipment, among other technologies pivotal to reducing energy consumption.

At the NVIDIA GTC conference in March, Schneider Electric announced it’s helping organizations across industries optimize infrastructure by releasing AI data center reference designs tailored for NVIDIA accelerated computing clusters.

The reference designs — built for data processing, engineering simulation, electronic design automation, computer-aided drug design and generative AI — will focus on high-power distribution, liquid-cooling systems and other aspects of scalable, high-performance, sustainable data centers.

In an NYC Climate Week panel this week hosted by The Economist, representatives from Sanger, Schneider Electric and NVIDIA will talk about their work.

Learn more about sustainable computing and the Sanger Institute’s potentially life-saving work.

Featured image courtesy of the Wellcome Sanger Institute.

NVIDIA IGX Powers Real-Time AI Engine

Driving Benefits Down the Track

The challenge: Tackling complicated long-tail shopping scenarios

The solution: Just Walk Out multi-modal AI

Training the Just Walk Out FM

Conclusion

About the Authors

Fundamentals of RAG evaluation

Solution overview

Load and prepare the data

Generate an initial question

Generate answers

Extract relevant context

Refine questions

Automate dataset generation

Improve your dataset using critique agents

Best practices for generating synthetic datasets

Conclusion

About the Authors

Solution overview

Prerequisties

Solution architecture

Solution walkthrough

Configure SageMaker:

Enable SageMaker to interact with Amazon Rekognition:

Code walkthrough:

Costs

Cost breakdown and details

Clean up

Conclusion

About the authors

Solution overview

Amazon Q Developer chat

Amazon Q Developer inline code suggestions

Amazon Q Developer in SageMaker data policy

Clean up

Conclusion

About the Authors

Solution overview

Govern Amazon Bedrock access to SageMaker Canvas

Limit access to all Amazon Bedrock models

Limit access to specific Amazon Bedrock models

Govern SageMaker JumpStart access to SageMaker Canvas

Configure permissions to deploy endpoints through the UI

Limit access to all SageMaker JumpStart models

Limit access and deployment for specific SageMaker JumpStart models

Clean up

Conclusion

About the Authors

Opportunities for innovation

Strategic requirements

Solution overview

Achieving excellence

Roadmap

Conclusion

About the authors

Reducing Runtime and Energy Consumption

Collaborating With Industry Leaders

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.