Migrating to Amazon SageMaker: Karini AI Cut Costs by 23%


This post is co-written with Deepali Rajale from Karini AI.

Karini AI, a leading generative AI foundation platform built on AWS, empowers customers to quickly build secure, high-quality generative AI applications. Generative AI is not just a technology; it's a transformational tool that is changing how businesses use technology. Adopting it, however, presents a significant challenge for enterprises, and the difficulty varies depending on where they are in the adoption journey. Although pilot projects using generative AI can start effortlessly, most enterprises struggle to progress beyond this phase: according to Everest Research, more than 50% of projects never move past the pilot stage because of the absence of standardized, established generative AI operational practices.

Karini AI offers a robust, user-friendly GenAI foundation platform that empowers enterprises to build, manage, and deploy generative AI applications. It allows both beginners and expert practitioners to develop and deploy generative AI applications for use cases beyond simple chatbots, including agentic, multi-agent, generative BI, and batch workflows. The no-code platform is ideal for quick experimentation, building proofs of concept (PoCs), and rapid transition to production, with built-in guardrails for safety and observability for troubleshooting. The platform includes an offline and online quality evaluation framework to assess quality during experimentation and continuously monitor applications post-deployment.

Karini AI's intuitive prompt playground supports prompt authoring, comparison across models from different providers, prompt management, and prompt tuning, along with iterative testing of simple, agentic, and multi-agent prompts. For production deployment, no-code recipes enable easy assembly of the data ingestion pipeline to create a knowledge base and deployment of RAG or agentic chains. Platform owners can monitor costs and performance in real time with detailed observability, and the platform integrates seamlessly with Amazon Bedrock for LLM inference, benefiting from extensive enterprise connectors and data preprocessing techniques.

The following diagram illustrates how Karini AI delivers a comprehensive Generative AI foundational platform encompassing the entire application lifecycle. This platform delivers a holistic solution that speeds up time to market and optimizes resource utilization by providing a unified framework for development, deployment, and management.

In this post, we share how Karini AI’s migration of vector embedding models from Kubernetes to Amazon SageMaker endpoints improved concurrency by 30% and saved over 23% in infrastructure costs.

Karini AI’s Data Ingestion Pipeline for creating vector embeddings

Enriching large language models (LLMs) with new data is crucial to building practical generative AI applications. This is where Retrieval Augmented Generation (RAG) comes into play. RAG enhances LLMs’ capabilities by incorporating external data and producing state-of-the-art performance in knowledge-intensive tasks. Karini AI offers no-code solutions for creating Generative AI applications using RAG. These solutions include two primary components: a data ingestion pipeline for building a knowledge base and a system for knowledge retrieval and summarization. Together, these pipelines simplify the development process, enabling the creation of powerful AI applications with ease.

Data Ingestion Pipeline

Ingesting data from diverse sources is essential for Retrieval Augmented Generation (RAG). Karini AI's data ingestion pipeline connects to multiple data sources, including Amazon S3, Amazon Redshift, Amazon Relational Database Service (Amazon RDS), websites, and Confluence, handling both structured and unstructured data. The source data is preprocessed, chunked, and transformed into vector embeddings before being stored in a vector database for retrieval. Karini AI's platform provides flexibility by offering a range of embedding models from its model hub, simplifying the creation of vector embeddings for advanced AI applications.
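Conceptually, such an ingestion pipeline boils down to chunking source text, embedding each chunk, and storing the vectors for retrieval. The following is a minimal sketch of that flow using an Amazon Titan text embeddings model on Amazon Bedrock and an in-memory list as a stand-in vector store; the chunking parameters, model ID, and store are illustrative assumptions, not Karini AI's actual implementation.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_text(text, model_id="amazon.titan-embed-text-v1"):
    """Create a vector embedding for one chunk of text with an Amazon Bedrock embedding model."""
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def chunk_text(text, chunk_size=1000, overlap=100):
    """Naive fixed-size chunking with overlap; production pipelines use smarter splitters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Toy "vector store": in practice this would be OpenSearch, Pinecone, or another vector database
vector_store = []
for document_text in ["...source document text loaded from S3, Redshift, a website, and so on..."]:
    for chunk in chunk_text(document_text):
        vector_store.append({"text": chunk, "embedding": embed_text(chunk)})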

The following screenshot shows Karini AI's no-code data ingestion pipeline.

Karini AI’s model hub streamlines adding models by integrating with leading foundation model providers such as Amazon Bedrock and self-managed serving platforms.

Infrastructure challenges

As customers explore more complex use cases and datasets grow in size and complexity, Karini AI must scale the data ingestion process efficiently to provide high concurrency for creating vector embeddings with state-of-the-art embedding models, such as those on the MTEB leaderboard, which evolve rapidly and are often unavailable on managed platforms.

Before migrating to Amazon SageMaker, we deployed our models on self-managed Kubernetes (K8s) running on Amazon EC2 instances. Kubernetes offered significant flexibility to deploy models from Hugging Face quickly, but our engineering team soon had to manage many aspects of scaling and deployment. We faced the following challenges with our existing setup that needed to be addressed to improve efficiency and performance.

  • Keeping up with state-of-the-art (SOTA) models: We managed different deployment manifests for each model type (such as classifiers, embeddings, and autocomplete), which was time-consuming and error-prone. We also had to maintain the logic that determined memory allocation for different model types.
  • Managing dynamic concurrency was hard: A significant challenge with using models hosted on Kubernetes was achieving the highest dynamic concurrency level. We aimed to maximize endpoint performance to achieve target transactions per second (TPS) while meeting strict latency requirements.
  • Higher costs: Although Kubernetes provides robust capabilities, the dynamic nature of our data ingestion pipelines left instances under-utilized for much of the time, driving up costs.

Our search for an inference platform led us to Amazon SageMaker, a solution that efficiently manages our models for higher concurrency, meets customer SLAs, and scales down serving when not needed. The reliability of SageMaker’s performance gave us confidence in its capabilities.

Amazon SageMaker for Model Serving

Choosing Amazon SageMaker was a strategic decision for Karini AI. It balanced the need for higher concurrency with lower cost, providing a cost-effective solution for our needs. SageMaker's ability to scale and maximize concurrency while maintaining sub-second latency addresses a variety of generative AI use cases, making it a long-lasting investment for our platform.

Amazon SageMaker is a fully managed service that allows developers and data scientists to quickly build, train, and deploy machine learning (ML) models. With SageMaker, you can deploy your ML models on hosted endpoints and get real-time inference results. You can easily view the performance metrics for your endpoints in Amazon CloudWatch, automatically scale endpoints based on traffic, and update your models in production without losing any availability.
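As a hedged illustration of what hosting an embedding model on a SageMaker real-time endpoint can look like with the SageMaker Python SDK, the sketch below deploys an open source model from the Hugging Face Hub. The model ID, endpoint name, instance type, and container versions are assumptions for illustration, not Karini AI's actual configuration, and the call assumes it runs in an environment where sagemaker.get_execution_role() resolves to a valid role.

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes a SageMaker notebook or a role you supply yourself

# Host a sentence-embedding model from the Hugging Face Hub on a real-time endpoint
embedding_model = HuggingFaceModel(
    role=role,
    env={
        "HF_MODEL_ID": "BAAI/bge-large-en-v1.5",  # illustrative embedding model
        "HF_TASK": "feature-extraction",
    },
    transformers_version="4.37",  # versions are assumptions; match an available Hugging Face DLC
    pytorch_version="2.1",
    py_version="py310",
)

predictor = embedding_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",          # illustrative instance choice
    endpoint_name="embedding-endpoint",    # hypothetical endpoint name
)

# Real-time inference against the hosted endpoint
embeddings = predictor.predict({"inputs": "Karini AI ingestion pipeline test sentence"})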

The following diagram illustrates Karini AI's data ingestion pipeline architecture with an Amazon SageMaker model endpoint.

Advantages of using SageMaker hosting

Amazon SageMaker offered our Gen AI ingestion pipeline many direct and indirect benefits.

  1. Technical debt mitigation: As a fully managed service, Amazon SageMaker freed our ML engineers from the burden of managing inference infrastructure, enabling them to focus on our core platform features. This relief from technical debt is a significant advantage of using SageMaker.
  2. Meet customer SLAs: Knowledge base creation is a dynamic task that may require high concurrency during vector embedding generation and minimal load during query time. Based on customer SLAs and data volume, we can choose batch inference, real-time hosting with auto scaling (see the scaling sketch after this list), or serverless hosting. Amazon SageMaker also provides recommendations for instance types suitable for embedding models.
  3. Reduced infrastructure cost: SageMaker is a pay-as-you-go service that allows you to create batch or real-time endpoints when there is demand and tear them down when the work is complete. This approach reduced our infrastructure cost by more than 23% compared to the Kubernetes (K8s) platform.
  4. SageMaker JumpStart: SageMaker JumpStart provides access to SOTA models and optimized inference containers, making it straightforward to make new models available to our customers.
  5. Amazon Bedrock compatibility: Karini AI integrates with Amazon Bedrock for LLM (large language model) inference. The custom model import feature allows us to reuse the model weights from SageMaker model hosting in Amazon Bedrock, so we can maintain a shared code base and switch serving between Bedrock and SageMaker depending on the workload.
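One way auto scaling can be attached to a real-time endpoint is through Application Auto Scaling, as sketched below. The endpoint name, capacity limits, and target value are illustrative assumptions; the metric used here scales on invocations per instance so capacity follows ingestion traffic.

import boto3

autoscaling = boto3.client("application-autoscaling")
endpoint_name = "embedding-endpoint"  # hypothetical endpoint from the earlier deployment sketch
resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

# Register the endpoint variant as a scalable target (instance counts are illustrative)
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target-tracking policy on invocations per instance
autoscaling.put_scaling_policy(
    PolicyName="embedding-endpoint-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # target invocations per instance (assumption; tune for your SLA)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)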

Conclusion

By migrating to Amazon SageMaker, Karini AI achieved higher performance and significantly reduced model hosting costs. We can deploy custom third-party models to SageMaker and quickly make them available in Karini's model hub for data ingestion pipelines. We can optimize our infrastructure configuration for model hosting as needed, depending on model size and expected TPS. Using Amazon SageMaker for model inference enabled Karini AI to handle increasing data complexities efficiently and meet concurrency needs while optimizing costs. Moreover, Amazon SageMaker allows easy integration and swapping of new models, ensuring that our customers can continuously take advantage of the latest advancements in AI technology without compromising performance or incurring unnecessary incremental costs.

Amazon SageMaker and Karini.ai offer a powerful platform to build, train, and deploy machine learning models at scale. By leveraging these tools, you can:

  • Accelerate development: Build and train models faster with pre-built algorithms and frameworks.
  • Enhance accuracy: Benefit from advanced algorithms and techniques for improved model performance.
  • Scale effortlessly: Deploy models to production with ease and handle increasing workloads.
  • Reduce costs: Optimize resource utilization and minimize operational overhead.

Don’t miss out on this opportunity to gain a competitive edge.


About the Authors

Deepali Rajale is the founder of Karini AI, which is on a mission to democratize generative AI across enterprises. She enjoys blogging about Generative AI and coaching customers to optimize Generative AI practice. In her spare time, she enjoys traveling, seeking new experiences, and keeping up with the latest technology trends. You can find her on LinkedIn.

Ravindra Gupta is the Worldwide GTM Lead for SageMaker, with a passion for helping customers adopt SageMaker for their machine learning and generative AI workloads. Ravi is fond of learning new technologies and enjoys mentoring startups on their machine learning practice. You can find him on LinkedIn.


Harnessing the power of AI to drive equitable climate solutions: The AI for Equity Challenge


The climate crisis is one of the greatest challenges facing our world today. Its impacts are far-reaching, affecting every aspect of our lives—from public health and food security to economic stability and social justice. What’s more, the effects of climate change disproportionately burden the world’s most vulnerable populations, exacerbating existing inequities around gender, race, and socioeconomic status.

But we have the power to create change. By harnessing the transformative potential of AI, we can develop innovative solutions to tackle the intersectional challenges at the heart of the climate crisis. That’s why the International Research Centre on Artificial Intelligence (IRCAI), Zindi, and Amazon Web Services (AWS) are proud to announce the launch of the “AI for Equity Challenge: Climate Action, Gender, and Health”—a global virtual competition aimed at empowering organizations to use advanced AI and cloud technologies to drive real-world impact with a focus on benefitting vulnerable populations around the world.

Aligning with the United Nations Sustainable Development Goals (SDGs) 3, 5, and 13—focused on good health and well-being, gender equality, and climate action respectively—this challenge seeks to uncover the most promising AI-powered solutions that address the compounding issues of climate change, gender equity, and public health. By bringing together a diverse global community of innovators, we hope to accelerate the development of equitable, sustainable, and impactful applications of AI for the greater good.

“As artificial intelligence rapidly evolves, it is crucial that we harness its potential to address real-world challenges. At IRCAI, our mission is to guide the ethical development of AI technologies, ensuring they serve the greater good and are inclusive of marginalized AI communities. This challenge, in collaboration with AWS, is an opportunity to discover and support the most innovative minds that are using AI and advanced computing to create impactful solutions for the climate crisis.”

– Davor Orlic, COO at IRCAI.

The challenge will unfold in two phases, welcoming both ideators and solution builders to participate. In the first phase, organizations are invited to submit technical proposals outlining specific challenges at the intersection of climate action, gender equity, and health that they aim to address using AI and cloud technologies. A steering committee convened by IRCAI will evaluate these proposals based on criteria such as innovation, feasibility, and potential for global impact. The competition will be judged and mentored in collaboration with NAIXUS, a network of AI and sustainable development research organizations.

The top two winning proposals from the first phase will then advance to the second round, where they will serve as the foundation for two AI challenges hosted on the Zindi platform. During this phase, developers and data scientists from around the world will compete to build the most successful AI-powered solutions to tackle the real-world problems identified by the first-round winners.

AI for Equity Challenge Timeline

The winning AI solutions from the second-round challenges will belong entirely to the organizations that submitted the original winning proposals, who will also receive $15,000 in AWS credits and technical support from AWS and IRCAI to help implement their solutions. Additionally, the top teams in each of the two final Zindi challenges will receive cash prizes of $6,000, $4,000, and $2,500 for first, second, and third place, respectively.

But the true reward goes beyond the prizes. By participating in this challenge, organizations and individuals alike will have the opportunity to make a lasting impact on the lives of those most vulnerable to the effects of climate change. Through the power of AI and advanced cloud computing, we can develop groundbreaking solutions that empower women, improve public health outcomes, and drive sustainable progress on the climate action front.

Throughout the hackathon, participants will have access to a wealth of resources, including mentorship from industry experts, training materials, and AWS cloud computing resources. Amazon Sustainability Data Initiative (ASDI), a collaboration between AWS and leading scientific organizations, provides a catalog of over 200 datasets spanning climate projections, satellite imagery, air quality data, and more, enabling participants to build robust and data-driven solutions.

“Climate change is one of the greatest threats of our time, and we believe innovation is key to overcoming it. The AI for Equity Challenge invites innovators to bring forward their most visionary ideas, and we’ll support them with AWS resources — whether that’s computing power or advanced cloud technologies — to turn those ideas into reality. Our goal is to drive cloud innovation, support sustainability solutions, and make a meaningful impact on the climate crisis.”

– Dave Levy, Vice President of Worldwide Public Sector, AWS

This initiative is made possible through the support of ASDI, which provides researchers, scientists, and innovators with access to a wealth of publicly available datasets on AWS to advance their sustainability-focused work. The AI for Equity Challenge: Climate Action, Gender, and Health is open for submissions from September 23 to November 4, 2024. The two winning proposals from the first round will be announced on December 2, 2024, with the final AI challenge winners revealed on February 12, 2025.

Don’t miss your chance to be part of the solution. Visit https://zindi.africa/ai-equity-challenge to learn more and submit your proposal today. Together, we can harness the power of AI to create a more sustainable, equitable, and just world for generations to come.


This contest is hosted in collaboration with:


About the author

Joe Fontaine is the Product marketing lead for AWS AI Builder Programs. He is passionate about making machine learning more accessible to all through hands-on educational experiences. Outside of work he enjoys freeride mountain biking, aerial cinematography, and exploring the wilderness with his family.


Enhancing Just Walk Out technology with multi-modal AI


Since its launch in 2018, Just Walk Out technology by Amazon has transformed the shopping experience by allowing customers to enter a store, pick up items, and leave without standing in line to pay. You can find this checkout-free technology in over 180 third-party locations worldwide, including travel retailers, sports stadiums, entertainment venues, conference centers, theme parks, convenience stores, hospitals, and college campuses. Just Walk Out technology’s end-to-end system automatically determines which products each customer chose in the store and provides digital receipts, eliminating the need for checkout lines.

In this post, we showcase the latest generation of Just Walk Out technology by Amazon, powered by a multi-modal foundation model (FM). We designed this multi-modal FM for physical stores using a transformer-based architecture similar to that underlying many generative artificial intelligence (AI) applications. The model will help retailers generate highly accurate shopping receipts using data from multiple inputs, including a network of overhead video cameras, specialized weight sensors on shelves, digital floor plans, and catalog images of products. Put plainly, a multi-modal model is one that uses data from multiple types of inputs.

Our research and development (R&D) investments in state-of-the-art multi-modal FMs enable the Just Walk Out system to be deployed in a wide range of shopping situations with greater accuracy and at lower cost. Similar to large language models (LLMs) that generate text, the new Just Walk Out system is designed to generate an accurate sales receipt for every shopper visiting the store.

The challenge: Tackling complicated long-tail shopping scenarios

Because of their innovative checkout-free environment, Just Walk Out stores presented us with a unique technical challenge. Retailers, shoppers, and Amazon alike demand nearly 100 percent checkout accuracy, even in the most complex shopping situations. These include unusual shopping behaviors that can create long, complicated sequences of activities, requiring additional effort to analyze what happened.

Previous generations of the Just Walk Out system used a modular architecture; they tackled complex shopping situations by breaking down the shopper's visit into discrete tasks, such as detecting shopper interactions, tracking items, identifying products, and counting what is selected. These individual components were then integrated into sequential pipelines to enable the overall system functionality. Although this approach produced highly accurate receipts, significant engineering effort was required to address new, previously unseen situations and complex shopping scenarios. This limitation restricted the scalability of the approach.

The solution: Just Walk Out multi-modal AI

To meet these challenges, we introduced a new multi-modal FM that we designed specifically for retail store environments, enabling Just Walk Out technology to handle complex real-world shopping scenarios. The new multi-modal FM further enhances the Just Walk Out system’s capabilities by generalizing more effectively to new store formats, products, and customer behaviors, which is crucial for scaling up Just Walk Out technology.

The incorporation of continuous learning enables the model training to automatically adapt and learn from new challenging scenarios as they arise. This self-improving capability helps ensure the system maintains high performance, even as shopping environments continue to evolve.

Through this combination of end-to-end learning and enhanced generalization, the Just Walk Out system can tackle a wider range of dynamic and complex retail settings. Retailers can confidently deploy this technology, knowing it will provide a frictionless checkout-free experience for their customers.

The following video shows our system’s architecture in action.

Key elements of our Just Walk Out multi-modal AI model include:

  • Flexible data inputs – The system tracks how users interact with products and fixtures, such as shelves or fridges. It primarily relies on multi-view video feeds as inputs, using weight sensors solely to track small items. The model maintains a digital 3D representation of the store and can access catalog images to identify products, even if the shopper returns items to the shelf incorrectly.
  • Multi-modal AI tokens to represent shoppers’ journeys – The multi-modal data inputs are processed by the encoders, which compress them into transformer tokens, the basic unit of input for the receipt model. This allows the model to interpret hand movements, differentiate between items, and accurately count the number of items picked up or returned to the shelf with speed and precision.
  • Continuously updating receipts – The system uses tokens to create digital receipts for each shopper. It can differentiate between shopper sessions and dynamically updates each receipt as shoppers pick up or return items (a toy illustration of this bookkeeping follows this list).
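Amazon has not published the internals of the receipt model, so the following is only a toy sketch of the bookkeeping idea behind continuously updated receipts: a per-session receipt revised as item events arrive. The event format, action names, and product IDs are purely hypothetical; the real system predicts receipts end-to-end from multi-modal tokens rather than from explicit events.

from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Receipt:
    """Toy per-session receipt updated from a stream of decoded shopper actions."""
    session_id: str
    items: Counter = field(default_factory=Counter)

    def apply(self, product_id: str, action: str, quantity: int = 1):
        if action == "take":
            self.items[product_id] += quantity
        elif action == "return":
            # Never let a count go negative if a return is decoded before a take
            self.items[product_id] = max(0, self.items[product_id] - quantity)

# Hypothetical decoded events for one shopper session
receipt = Receipt("session-042")
for product_id, action in [("sparkling-water", "take"), ("trail-mix", "take"), ("trail-mix", "return")]:
    receipt.apply(product_id, action)

print(dict(receipt.items))  # {'sparkling-water': 1, 'trail-mix': 0}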

Training the Just Walk Out FM

By feeding vast amounts of multi-modal data into the Just Walk Out FM, we found it could consistently generate—or, technically, “predict”—accurate receipts for shoppers. To improve accuracy, we designed over 10 auxiliary tasks, such as detection, tracking, image segmentation, grounding (linking abstract concepts to real-world objects), and activity recognition. All of these are learned within a single model, enhancing the model’s ability to handle new, never-before-seen store formats, products, and customer behaviors. This is crucial for bringing Just Walk Out technology to new locations.

AI model training—in which curated data is fed to selected algorithms—helps the system refine itself to produce accurate results. We quickly discovered we could accelerate the training of our model by using a data flywheel that continuously mines and labels high-quality data in a self-reinforcing cycle. The system is designed to integrate these progressive improvements with minimal manual intervention. The following diagram illustrates the process.

To train an FM effectively, we invested in a robust infrastructure that can efficiently process the massive amounts of data needed to train high-capacity neural networks that mimic human decision-making. We built the infrastructure for our Just Walk Out model with the help of several Amazon Web Services (AWS) services, including Amazon Simple Storage Service (Amazon S3) for data storage and Amazon SageMaker for training.

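The exact training setup is not public, but a SageMaker training job that reads data from Amazon S3 generally looks something like the sketch below, using the SageMaker Python SDK. The entry point, source directory, instance type and count, hyperparameters, and S3 prefix are all placeholders.

import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()  # assumes a SageMaker environment or a role you supply

# Launch a (hypothetical) distributed training job that reads training data from Amazon S3
estimator = PyTorch(
    entry_point="train.py",            # placeholder training script
    source_dir="./training_code",      # placeholder code directory
    role=role,
    instance_count=4,
    instance_type="ml.p4d.24xlarge",   # illustrative GPU instance choice
    framework_version="2.1",
    py_version="py310",
    hyperparameters={"epochs": 10, "batch_size": 32},  # placeholders
)

estimator.fit({"training": "s3://my-bucket/just-walk-out-training-data/"})  # hypothetical S3 prefix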

Here are some key steps we followed in training our FM:

  • Selecting challenging data sources – To train our AI model for Just Walk Out technology, we focus on training data from especially difficult shopping scenarios that test the limits of our model. Although these complex cases constitute only a small fraction of shopping data, they are the most valuable for helping the model learn from its mistakes.
  • Leveraging auto labeling – To increase operational efficiency, we developed algorithms and models that automatically attach meaningful labels to the data. In addition to receipt prediction, our automated labeling algorithms cover the auxiliary tasks, ensuring the model gains comprehensive multi-modal understanding and reasoning capabilities.
  • Pre-training the model – Our FM is pre-trained on a vast collection of multi-modal data across a diverse range of tasks, which enhances the model’s ability to generalize to new store environments never encountered before.
  • Fine-tuning the model – Finally, we refined the model further and used quantization techniques to create a smaller, more efficient model that can run on edge devices (a hedged example of one such technique follows this list).
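Amazon has not disclosed which quantization techniques are used, so as a hedged illustration, the sketch below applies post-training dynamic quantization in PyTorch, one common way to shrink a model for edge deployment by storing linear-layer weights as 8-bit integers. The placeholder network stands in for a fine-tuned model.

import torch
import torch.nn as nn

# Placeholder model standing in for a fine-tuned network destined for edge hardware
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 64))
model.eval()

# Post-training dynamic quantization: weights of Linear layers are stored as int8
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is used exactly like the original one at inference time
example_input = torch.randn(1, 512)
with torch.no_grad():
    output = quantized_model(example_input)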

As the data flywheel continues to operate, it will progressively identify and incorporate more high-quality, challenging cases to test the robustness of the model. These additional difficult samples are then fed into the training set, further enhancing the model’s accuracy and applicability across new physical store environments.

Conclusion

In this post, we showed how our multi-modal AI system opens significant new possibilities for Just Walk Out technology. With our innovative approach, we are moving away from modular AI systems that rely on human-defined subcomponents and interfaces. Instead, we're building simpler and more scalable AI systems that can be trained end-to-end. Although we've just scratched the surface, multi-modal AI has raised the bar for our already highly accurate receipt system and will enable us to improve the shopping experience at more Just Walk Out technology stores around the world.

Visit About Amazon to read the official announcement about the new multi-modal AI system and learn more about the latest improvements in Just Walk Out technology.

To find Just Walk Out technology locations, visit Just Walk Out technology locations near you. Learn more about how to power your store or venue with Just Walk Out technology by Amazon on the Just Walk Out technology product page.

Visit Build and scale the next wave of AI innovation on AWS to learn more about how AWS can reinvent customer experiences with the most comprehensive set of AI and ML services.


About the Authors

Tian Lan is a Principal Scientist at AWS. He currently leads the research efforts in developing the next-generation Just Walk Out 2.0 technology, transforming it into an end-to-end learned, store domain–focused multi-modal foundation model.

Chris Broaddus is a Senior Manager at AWS. He currently manages all the research efforts for Just Walk Out technology, including the multi-modal AI model and other projects, such as deep learning for human pose estimation and Radio Frequency Identification (RFID) receipt prediction.


Generate synthetic data for evaluating RAG systems using Amazon Bedrock


Evaluating your Retrieval Augmented Generation (RAG) system to make sure it fulfils your business requirements is paramount before deploying it to production environments. However, this requires acquiring a high-quality dataset of real-world question-answer pairs, which can be a daunting task, especially in the early stages of development. This is where synthetic data generation comes into play. With Amazon Bedrock, you can generate synthetic datasets that mimic actual user queries, enabling you to evaluate your RAG system’s performance efficiently and at scale. With synthetic data, you can streamline the evaluation process and gain confidence in your system’s capabilities before unleashing it to the real world.

This post explains how to use Anthropic Claude on Amazon Bedrock to generate synthetic data for evaluating your RAG system. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Fundamentals of RAG evaluation

Before diving deep into how to evaluate a RAG application, let’s recap the basic building blocks of a naive RAG workflow, as shown in the following diagram.

Retrieval Augmented Generation

The workflow consists of the following steps:

  1. In the ingestion step, which happens asynchronously, data is split into separate chunks. An embedding model is used to generate embeddings for each of the chunks, which are stored in a vector store.
  2. When the user asks the system a question, an embedding is generated from the question and the top-k most relevant chunks are retrieved from the vector store.
  3. The RAG model augments the user input by adding the relevant retrieved data in context. This step uses prompt engineering techniques to communicate effectively with the large language model (LLM). The augmented prompt allows the LLM to generate an accurate answer to user queries.
  4. An LLM is prompted to formulate a helpful answer based on the user’s questions and the retrieved chunks.

Amazon Bedrock Knowledge Bases offers a streamlined approach to implement RAG on AWS, providing a fully managed solution for connecting FMs to custom data sources. To implement RAG using Amazon Bedrock Knowledge Bases, you begin by specifying the location of your data, typically in Amazon Simple Storage Service (Amazon S3), and selecting an embedding model to convert the data into vector embeddings. Amazon Bedrock then creates and manages a vector store in your account, typically using Amazon OpenSearch Serverless, handling the entire RAG workflow, including embedding creation, storage, management, and updates. You can use the RetrieveAndGenerate API for a straightforward implementation, which automatically retrieves relevant information from your knowledge base and generates responses using a specified FM. For more granular control, the Retrieve API is available, allowing you to build custom workflows by processing retrieved text chunks and developing your own orchestration for text generation. Additionally, Amazon Bedrock Knowledge Bases offers customization options, such as defining chunking strategies and selecting custom vector stores like Pinecone or Redis Enterprise Cloud.
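For reference, a RetrieveAndGenerate call through boto3 looks roughly like the following sketch. The knowledge base ID is a placeholder, and the model ARN is only an example of the format; substitute the model and Region you actually use.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What was the YoY growth of AWS revenue in 2021?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "XXXXXXXXXX",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])  # generated answer
print(response["citations"])       # retrieved chunks that ground the answer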

A RAG application has many moving parts, and on your way to production you’ll need to make changes to various components of your system. Without a proper automated evaluation workflow, you won’t be able to measure the effect of these changes and will be operating blindly regarding the overall performance of your application.

To evaluate such a system properly, you need to collect an evaluation dataset of typical user questions and answers.

Moreover, you need to make sure you evaluate not only the generation part of the process but also the retrieval. An LLM without relevant retrieved context can’t answer the user’s question if the information wasn’t present in the training data. This holds true even if it has exceptional generation capabilities.

As such, a typical RAG evaluation dataset consists of the following minimum components (a sample record follows the list):

  • A list of questions users will ask the RAG system
  • A list of corresponding answers to evaluate the generation step
  • The context or a list of contexts that contain the answer for each question to evaluate the retrieval
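Concretely, a single record in such a dataset could look like the following; the values are taken from the example developed later in this post, and the field names are just one possible convention.

# One illustrative evaluation record combining the three components above
evaluation_record = {
    "question": "What was the YoY growth of AWS revenue in 2021?",
    "reference_answer": "The AWS revenue grew 37% year-over-year in 2021.",
    "reference_contexts": [
        "This shift by so many companies (along with the economy recovering) "
        "helped re-accelerate AWS's revenue growth to 37% Y oY in 2021."
    ],
}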

In an ideal world, you would take real user questions as a basis for evaluation. Although this is the optimal approach because it directly resembles end-user behavior, this is not always feasible, especially in the early stages of building a RAG system. As you progress, you should aim for incorporating real user questions into your evaluation set.

To learn more about how to evaluate a RAG application, see Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.

Solution overview

We use a sample use case to illustrate the process by building an Amazon shareholder letter chatbot that allows business analysts to gain insights about the company’s strategy and performance over the past years.

For the use case, we use PDF files of Amazon’s shareholder letters as our knowledge base. These letters contain valuable information about the company’s operations, initiatives, and future plans. In a RAG implementation, the knowledge retriever might use a database that supports vector searches to dynamically look up relevant documents that serve as the knowledge source.

The following diagram illustrates the workflow to generate the synthetic dataset for our RAG system.

synthetic dataset generation workflow

The workflow includes the following steps:

  1. Load the data from your data source.
  2. Chunk the data as you would for your RAG application.
  3. Generate relevant questions from each document.
  4. Generate an answer by prompting an LLM.
  5. Extract the relevant text that answers the question.
  6. Evolve the question according to a specific style.
  7. Filter questions and improve the dataset either using domain experts or LLMs using critique agents.

We use a model from Anthropic's Claude 3 model family to extract questions and answers from our knowledge source, but you can experiment with other LLMs as well. Amazon Bedrock makes this effortless by providing standardized API access to many FMs.

For the orchestration and automation steps in this process, we use LangChain. LangChain is an open source Python library designed to build applications with LLMs. It provides a modular and flexible framework for combining LLMs with other components, such as knowledge bases, retrieval systems, and other AI tools, to create powerful and customizable applications.

The next sections walk you through the most important parts of the process. If you want to dive deeper and run it yourself, refer to the notebook on GitHub.

Load and prepare the data

First, load the shareholder letters using LangChain's PyPDFDirectoryLoader and use the RecursiveCharacterTextSplitter to split the PDF documents into chunks. The RecursiveCharacterTextSplitter divides the text into chunks of a specified size while trying to preserve the context and meaning of the content. It's a good starting point when working with text-based documents. You don't have to split your documents to create your evaluation dataset if your LLM supports a context window large enough to fit them, but you could end up with lower-quality generated questions because of the larger task size. If you take that route, you should instruct the LLM to generate multiple questions per document.

from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders.pdf import PyPDFLoader, PyPDFDirectoryLoader

# Load PDF documents from directory
loader = PyPDFDirectoryLoader("./synthetic_dataset_generation/")  
documents = loader.load()
# Use recursive character splitter, works better for this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Split documents into small chunks
    chunk_size = 1500,  
    # Overlap chunks to reduce cutting sentences in half
    chunk_overlap  = 100,
    separators=["\n\n", "\n", ".", " ", ""],
)

# Split loaded documents into chunks
docs = text_splitter.split_documents(documents)

To demonstrate the process of generating a corresponding question and answer and iteratively refining them, we use an example chunk from the loaded shareholder letters throughout this post:

“page_content='Our AWS and Consumer businesses have had different demand trajectories during the pandemic. In the\nfirst year of the pandemic, AWS revenue continued to grow at a rapid clip—30% year over year (“Y oY”) in\n2020 on a $35 billion annual revenue base in 2019—but slower than the 37% Y oY growth in 2019. [...] This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.\nConversely, our Consumer revenue grew dramatically in 2020. In 2020, Amazon’s North America and\nInternational Consumer revenue grew 39% Y oY on the very large 2019 revenue base of $245 billion; and,\nthis extraordinary growth extended into 2021 with revenue increasing 43% Y oY in Q1 2021. These are\nastounding numbers. We realized the equivalent of three years’ forecasted growth in about 15 months.\nAs the world opened up again starting in late Q2 2021, and more people ventured out to eat, shop, and travel,'”

Generate an initial question

To facilitate prompting the LLM using Amazon Bedrock and LangChain, you first configure the inference parameters. To accurately extract more extensive contexts, set the max_tokens parameter to 4096, which corresponds to the maximum number of tokens the LLM will generate in its output. Additionally, define the temperature parameter as 0.2 because the goal is to generate responses that adhere to the specified rules while still allowing for a degree of creativity. This value differs for different use cases and can be determined by experimentation.

import boto3

from langchain_community.chat_models import BedrockChat

# set up a Bedrock-runtime client for inferencing large language models
boto3_bedrock = boto3.client('bedrock-runtime')
# Choosing claude 3 Haiku due to cost and performance efficiency
claude_3_haiku = "anthropic.claude-3-haiku-20240307-v1:0"
# Set-up langchain LLM for implementing the synthetic dataset generation logic

# for each model provider there are different parameters to define when inferencing against the model
inference_modifier = {
                        "max_tokens": 4096,
                        "temperature": 0.2
                    }
                                         
llm = BedrockChat(model_id = claude_3_haiku,
                    client = boto3_bedrock, 
                    model_kwargs = inference_modifier 
                    )

You use each generated chunk to create synthetic questions that mimic those a real user might ask. By prompting the LLM to analyze a portion of the shareholder letter data, you generate relevant questions based on the information presented in the context. We use the following sample prompt to generate a single question for a specific context. For simplicity, the prompt is hardcoded to generate a single question, but you can also instruct the LLM to generate multiple questions with a single prompt.

The rules can be adapted to better guide the LLM in generating questions that reflect the types of queries your users would pose, tailoring the approach to your specific use case.

from langchain.prompts import PromptTemplate

# Create a prompt template to generate a question an end-user could have about a given context
initial_question_prompt_template = PromptTemplate(
    input_variables=["context"],
    template="""
    <Instructions>
    Here is some context:
    <context>
    {context}
    </context>

    Your task is to generate 1 question that can be answered using the provided context, following these rules:

    <rules>
    1. The question should make sense to humans even when read without the given context.
    2. The question should be fully answered from the given context.
    3. The question should be framed from a part of context that contains important information. It can also be from tables, code, etc.
    4. The answer to the question should not contain any links.
    5. The question should be of moderate difficulty.
    6. The question must be reasonable and must be understood and responded by humans.
    7. Do not use phrases like 'provided context', etc. in the question.
    8. Avoid framing questions using the word "and" that can be decomposed into more than one question.
    9. The question should not contain more than 10 words, make use of abbreviations wherever possible.
    </rules>

    To generate the question, first identify the most important or relevant part of the context. Then frame a question around that part that satisfies all the rules above.

    Output only the generated question with a "?" at the end, no other text or characters.
    </Instructions>
    
    """)

The following is the generated question from our example chunk:

What was the YoY growth of AWS revenue in 2021?

Generate answers

To use the questions for evaluation, you need to generate a reference answer for each of the questions to test against. With the following prompt template, you can generate a reference answer to the created question based on the question and the original source chunk:

# Create a prompt template that takes the question and context into consideration and generates an answer
answer_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
    <Instructions>
    <Task>
    <role>You are an experienced QA Engineer for building large language model applications.</role>
    <task>It is your task to generate an answer to the following question <question>{question}</question> only based on the <context>{context}</context></task>
    The output should be only the answer generated from the context.

    <rules>
    1. Only use the given context as a source for generating the answer.
    2. Be as precise as possible with answering the question.
    3. Be concise in answering the question and only answer the question at hand rather than adding extra information.
    </rules>

    Only output the generated answer as a sentence. No extra characters.
    </Task>
    </Instructions>
    
    Assistant:""")

The following is the generated answer based on the example chunk:

“The AWS revenue grew 37% year-over-year in 2021.”

Extract relevant context

To make the dataset verifiable, we use the following prompt to extract the relevant sentences from the given context to answer the generated question. Knowing the relevant sentences, you can check whether the question and answer are correct.

# To check whether an answer was correctly formulated by the large language model you get the relevant text passages from the documents used for answering the questions.
source_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Human:
    <Instructions>
    Here is the context:
    <context>
    {context}
    </context>

    Your task is to extract the relevant sentences from the given context that can potentially help answer the following question. You are not allowed to make any changes to the sentences from the context.

    <question>
    {question}
    </question>

    Output only the relevant sentences you found, one sentence per line, without any extra characters or explanations.
    </Instructions>
    Assistant:""")

The following is the relevant source sentence extracted using the preceding prompt:

“This shift by so many companies (along with the economy recovering) helped re-accelerate AWS's revenue growth to 37% Y oY in 2021.”

Refine questions

When generating question and answer pairs from the same prompt for the whole dataset, it might appear that the questions are repetitive and similar in form, and therefore don’t mimic real end-user behavior. To prevent this, take the previously created questions and prompt the LLM to modify them according to the rules and guidance established in the prompt. By doing so, a more diverse dataset is synthetically generated. The prompt for generating questions tailored to your specific use case heavily depends on that particular use case. Therefore, your prompt must accurately reflect your end-users by setting appropriate rules or providing relevant examples. The process of refining questions can be repeated as many times as necessary.

# To generate a more versatile testing dataset, vary the questions to see how your RAG system performs against differently formulated questions
question_compress_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    <role>You are an experienced linguistics expert for building testsets for large language model applications.</role>

    <task>It is your task to rewrite the following question in a more indirect and compressed form, following these rules:

    <rules>
    1. Make the question more indirect
    2. Make the question shorter
    3. Use abbreviations if possible
    </rules>

    <question>
    {question}
    </question>

    Your output should only be the rewritten question with a question mark "?" at the end. Do not provide any other explanation or text.
    </task>
    </Instructions>
    
    """)

Users of your application might not always use your solution in the same way, for instance using abbreviations when asking questions. This is why it’s crucial to develop a diverse dataset:

“AWS rev YoY growth in ’21?”

Automate dataset generation

To scale the process of the dataset generation, we iterate over all the chunks in our knowledge base; generate questions, answers, relevant sentences, and refinements for each question; and save them to a pandas data frame to prepare the full dataset.
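A simplified, hedged version of that loop might look like the following sketch; it assumes the prompt templates, the llm object, and the docs list defined earlier in this post, and the column names and output file are illustrative. The full implementation is in the accompanying notebook.

import pandas as pd

records = []
for doc in docs:
    context = doc.page_content
    question = llm.invoke(initial_question_prompt_template.format(context=context)).content.strip()
    answer = llm.invoke(answer_prompt_template.format(context=context, question=question)).content.strip()
    source = llm.invoke(source_prompt_template.format(context=context, question=question)).content.strip()
    evolved = llm.invoke(question_compress_prompt_template.format(question=question)).content.strip()
    records.append({
        "chunk": context,
        "question": question,
        "answer": answer,
        "source_sentence": source,
        "evolved_question": evolved,
    })

dataset_df = pd.DataFrame(records)
dataset_df.to_csv("synthetic_rag_eval_dataset.csv", index=False)  # illustrative output location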

To provide a clearer understanding of the structure of the dataset, the following table presents a sample row based on the example chunk used throughout this post.

Chunk “Our AWS and Consumer businesses have had different demand trajectories during the pandemic. In the\nfirst year of the pandemic, AWS revenue continued to grow at a rapid clip—30% year over year (“Y oY”) in\n2020 on a $35 billion annual revenue base in 2019—but slower than the 37% Y oY growth in 2019. […] This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.\nConversely, our Consumer revenue grew dramatically in 2020. In 2020, Amazon’s North America and\nInternational Consumer revenue grew 39% Y oY on the very large 2019 revenue base of $245 billion; and,\nthis extraordinary growth extended into 2021 with revenue increasing 43% Y oY in Q1 2021. These are\nastounding numbers. We realized the equivalent of three years’ forecasted growth in about 15 months.\nAs the world opened up again starting in late Q2 2021, and more people ventured out to eat, shop, and travel,”
Question “What was the YoY growth of AWS revenue in 2021?”
Answer “The AWS revenue grew 37% year-over-year in 2021.”
Source Sentence “This shift by so many companies (along with the economy recovering) helped re-accelerate AWS’s revenue growth to 37% Y oY in 2021.”
Evolved Question “AWS rev YoY growth in ’21?”

On average, generating questions for a given context of 1,500–2,000 tokens takes about 2.6 seconds for a set consisting of the initial question, answer, evolved question, and source sentence discovery using Anthropic Claude 3 Haiku. Generating 1,000 sets of questions and answers costs approximately $2.80 USD using Anthropic Claude 3 Haiku. The pricing page gives a detailed overview of the cost structure. This makes dataset generation for RAG evaluation more time- and cost-efficient than manually creating these question sets.

Improve your dataset using critique agents

Although using synthetic data is a good starting point, the next step should be to review and refine the dataset, filtering out or modifying questions that aren’t relevant to your specific use case. One effective approach to accomplish this is by using critique agents.

Critique agents are a technique used in natural language processing (NLP) to evaluate the quality and suitability of questions in a dataset for a particular task or application using a machine learning model. In our case, the critique agents are employed to assess whether the questions in the dataset are valid and appropriate for our RAG system.

The two main metrics evaluated by the critique agents in our example are question relevance and answer groundedness. Question relevance determines how relevant the generated question is for a potential user of our system, and groundedness assesses whether the generated answers are indeed based on the given context.

groundedness_check_prompt_template = PromptTemplate(
    input_variables=["context","question"],
    template="""
    <Instructions>
    You will be given a context and a question related to that context.

    Your task is to provide an evaluation of how well the given question can be answered using only the information provided in the context. Rate this on a scale from 1 to 5, where:

    1 = The question cannot be answered at all based on the given context
    2 = The context provides very little relevant information to answer the question
    3 = The context provides some relevant information to partially answer the question 
    4 = The context provides substantial information to answer most aspects of the question
    5 = The context provides all the information needed to fully and unambiguously answer the question

    First, read through the provided context carefully:

    <context>
    {context}
    </context>

    Then read the question:

    <question>
    {question}
    </question>

    Evaluate how well you think the question can be answered using only the context information. Provide your reasoning first in an <evaluation> section, explaining what relevant or missing information from the context led you to your evaluation score in only one sentence.

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>
    
    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>


    </Instructions>
    
    """)

relevance_check_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    You will be given a question related to Amazon Shareholder letters. Your task is to evaluate how useful this question would be for a financial and business analyst working on Wall Street.

    To evaluate the usefulness of the question, consider the following criteria:

    1. Relevance: Is the question directly relevant to your work? Questions that are too broad or unrelated to this domain should receive a lower rating.

    2. Practicality: Does the question address a practical problem or use case that analysts might encounter? Theoretical or overly academic questions may be less useful.

    3. Clarity: Is the question clear and well-defined? Ambiguous or vague questions are less useful.

    4. Depth: Does the question require a substantive answer that demonstrates understanding of financial topics? Surface-level questions may be less useful.

    5. Applicability: Would answering this question provide insights or knowledge that could be applied to real-world company evaluation tasks? Questions with limited applicability should receive a lower rating.

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>
    
    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>

    Here is the question:

    {question}
    </Instructions>
    """)

Evaluating the generated questions helps with assessing the quality of a dataset and eventually the quality of the evaluation. The generated question was rated very well:

Groundedness score: 5
“The context provides the exact information needed to answer the question[...]”
Relevance score: 5
“This question is highly relevant and useful for a financial and business analyst working on Wall Street. AWS (Amazon Web Services) is a key business segment for Amazon, and understanding its year-over-year (YoY) revenue growth is crucial for evaluating the company's overall performance and growth trajectory. [...]”
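To act on these critiques programmatically, you might parse the <rating> tags and drop questions that fall below a threshold, as in the sketch below. It assumes the llm object, the critique prompt templates, and the dataset_df data frame built earlier; the regex helper and the threshold of 4 are assumptions to adapt to your use case.

import re

def extract_rating(critique_output):
    """Pull the 1-5 score out of the <rating> tag produced by a critique agent."""
    match = re.search(r"<rating>\s*(\d)\s*</rating>", critique_output)
    return int(match.group(1)) if match else 0

MIN_RATING = 4  # assumed quality threshold

def passes_critique(context, question):
    groundedness = extract_rating(
        llm.invoke(groundedness_check_prompt_template.format(context=context, question=question)).content
    )
    relevance = extract_rating(
        llm.invoke(relevance_check_prompt_template.format(question=question)).content
    )
    return groundedness >= MIN_RATING and relevance >= MIN_RATING

# Keep only the rows of the generated dataset that satisfy both critique agents
dataset_df = dataset_df[
    dataset_df.apply(lambda row: passes_critique(row["chunk"], row["question"]), axis=1)
]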

Best practices for generating synthetic datasets

Although generating synthetic datasets offers numerous benefits, it’s essential to follow best practices to maintain the quality and representativeness of the generated data:

  • Combine with real-world data – Although synthetic datasets can mimic real-world scenarios, they might not fully capture the nuances and complexities of actual human interactions or edge cases. Combining synthetic data with real-world data can help address this limitation and create more robust datasets.
  • Choose the right model – Use a different LLM for dataset creation than the one used in your RAG application to avoid self-enhancement bias.
  • Implement robust quality assurance – You can employ multiple quality assurance mechanisms, such as critique agents, human evaluation, and automated checks, to make sure the generated datasets meet the desired quality standards and accurately represent the target use case.
  • Iterate and refine – You should treat synthetic dataset generation as an iterative process. Continuously refine and improve the process based on feedback and performance metrics, adjusting parameters, prompts, and quality assurance mechanisms as needed.
  • Domain-specific customization – For highly specialized or niche domains, consider fine-tuning the LLM (such as with PEFT or RLHF) by injecting domain-specific knowledge to improve the quality and accuracy of the generated datasets.

Conclusion

The generation of synthetic datasets is a powerful technique that can significantly enhance the evaluation process of your RAG system, especially in the early stages of development when real-world data is scarce or difficult to obtain. By taking advantage of the capabilities of LLMs, this approach enables the creation of diverse and representative datasets that accurately mimic real human interactions, while also providing the scalability necessary to meet your evaluation needs.

Although this approach offers numerous benefits, it’s essential to acknowledge its limitations. Firstly, the quality of the synthetic dataset heavily relies on the performance and capabilities of the underlying language model, knowledge retrieval system, and quality of prompts used for generation. Being able to understand and adjust the prompts for generation is crucial in this process. Biases and limitations present in these components may be reflected in the generated dataset. Additionally, capturing the full complexity and nuances of real-world interactions can be challenging because synthetic datasets may not account for all edge cases or unexpected scenarios.

Despite these limitations, generating synthetic datasets remains a valuable tool for accelerating the development and evaluation of RAG systems. By streamlining the evaluation process and enabling iterative development cycles, this approach can contribute to the creation of better-performing AI systems.

We encourage developers, researchers, and enthusiasts to explore the techniques mentioned in this post and the accompanying GitHub repository and experiment with generating synthetic datasets for your own RAG applications. Hands-on experience with this technique can provide valuable insights and contribute to the advancement of RAG systems in various domains.


About the Authors

Johannes Langer is a Senior Solutions Architect at AWS, working with enterprise customers in Germany. Johannes is passionate about applying machine learning to solve real business problems. In his personal life, Johannes enjoys working on home improvement projects and spending time outdoors with his family.

Lukas Wenzel is a Solutions Architect at Amazon Web Services in Hamburg, Germany. He focuses on supporting software companies building SaaS architectures. In addition to that, he engages with AWS customers on building scalable and cost-efficient generative AI features and applications. In his free time, he enjoys playing basketball and running.

David Boldt is a Solutions Architect at Amazon Web Services. He helps customers build secure and scalable solutions that meet their business needs. He specializes in machine learning to address industry-wide challenges, using technologies to drive innovation and efficiency across various sectors.

Read More

Making traffic lights more efficient with Amazon Rekognition

Making traffic lights more efficient with Amazon Rekognition

State and local agencies spend approximately $1.23 billion annually to operate and maintain signalized traffic intersections. On the other end, traffic congestion at intersections costs drivers about $22 billion annually. Implementing an artificial intelligence (AI)-powered detection-based solution can significantly mitigate congestion at intersections and reduce operation and maintenance costs. In this blog post, we show you how Amazon Rekognition (an AI technology) can mitigate congestion at traffic intersections and reduce operations and maintenance costs.

State and local agencies rely on traffic signals to facilitate the safe flow of traffic involving cars, pedestrians, and other users. There are two main types of traffic lights: fixed and dynamic. Fixed traffic lights are timed lights controlled by electro-mechanical signals that switch and hold the lights based on a set period of time. Dynamic traffic lights are designed to adjust based on traffic conditions by using detectors both underneath the surface of the road and above the traffic light. However, as population continues to rise, there are more cars, bikes, and pedestrians using the streets. This increase in road users can negatively impact the efficiency of either of the two traffic systems.

Solution overview

At a high level, our solution uses Amazon Rekognition to automatically detect objects (cars, bikes, and so on) and scenes at an intersection. After detection, Amazon Rekognition creates bounding boxes around each object (such as a vehicle) and calculates the distance between each object (in this scenario, that would be the distance between vehicles detected at an intersection). Results from the calculated distances are used programmatically to stop or allow the flow of traffic, thus reducing congestion. All of this happens without human intervention.

Prerequisites

The proposed solution can be implemented in a personal AWS environment using the code that we provide. However, there are a few prerequisites that must be in place. Before running the labs in this post, ensure you have the following:

  1. An AWS account. Create one if necessary.
  2. The appropriate AWS Identity and Access Management (IAM) permissions to access services used in the lab. If this is your first time setting up an AWS account, see the IAM documentation for information about configuring IAM.
  3. A SageMaker Studio Notebook. Create one if necessary.

Solution architecture

The following diagram illustrates the lab’s architecture:

This solution uses the following AI and machine learning (AI/ML), serverless, and managed technologies:

  • Amazon SageMaker, a fully managed machine learning service that enables data scientists and developers to build, train, and deploy machine learning models.
  • Amazon Rekognition adds image and video analysis to your applications.
  • IAM provides the authentication and authorization that allow resources in the solution to interact with each other.

To recap, the solution works as follows:

  1. Traffic intersection video footage is uploaded to your SageMaker environment from an external device.
  2. A Python function uses OpenCV (cv2) to split the video footage into image frames (a minimal sketch of this step follows the list).
  3. The function makes a call to Amazon Rekognition when the image frames are completed.
  4. Amazon Rekognition analyzes each frame and creates bounding boxes around each vehicle it detects.
  5. The function counts the bounding boxes and changes the traffic signal based on the number of cars it detects using pre-defined logic.
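
A minimal sketch of step 2 follows; it is an illustration rather than the notebook’s exact code, and the video file name and sampling rate are assumptions.

import cv2

def extract_frames(video_path: str = "traffic_intersection.mp4", every_n_seconds: int = 1):
    """Yield one JPEG-encoded frame per sampling interval from the uploaded footage."""
    capture = cv2.VideoCapture(video_path)
    fps = int(capture.get(cv2.CAP_PROP_FPS)) or 30  # fall back if FPS metadata is missing
    frame_index = 0
    while True:
        success, frame = capture.read()
        if not success:
            break  # end of video
        if frame_index % (fps * every_n_seconds) == 0:
            ok, jpeg = cv2.imencode(".jpg", frame)
            if ok:
                yield jpeg.tobytes()
        frame_index += 1
    capture.release()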

Solution walkthrough

Now, let’s walk through implementing the solution.

Configure SageMaker:

  1. On the Amazon SageMaker console, choose Domains in the navigation pane, and then select your domain name.
  2. Find and copy the SageMaker execution role.
  3. Go to the IAM console, choose Roles in the navigation pane, and search for the SageMaker execution role you copied in the preceding step.

Enable SageMaker to interact with Amazon Rekognition:

Next, enable SageMaker to interact with Amazon Rekognition using the SageMaker execution role.

  1. In the IAM console, select your SageMaker execution role, choose Add permissions, and then choose Attach policies.
  2. In the search bar, enter AmazonRekognitionFullAccess, select the policy, and attach it, as shown in the following figure.

With the IAM permissions configured, you can run the notebook in SageMaker with access to Amazon Rekognition for the video analysis.

Download the Rekognition notebook and traffic intersection data to your local environment. In Amazon SageMaker Studio, upload the notebook and data you downloaded.

Code walkthrough:

This lab uses OpenCV and Boto3 to prepare the SageMaker environment. OpenCV is an open source library with over 250 algorithms for computer vision analysis. Boto3 is the AWS SDK for Python that helps you integrate AWS services with applications or scripts written in Python.

  1. First, we import the OpenCV and Boto3 packages. The next code cell builds a function for analyzing the video; we will walk through its key components. The function starts by creating a frame for the video to be analyzed.
  2. The frame is written to a new video writer file with an MP4 extension. The function also loops through the file and, if the video doesn’t have a frame, converts it to a JPEG file. The code then defines and identifies traffic lanes using bounding boxes. Amazon Rekognition image operations place bounding boxes around detected objects for later analysis.
  3. The function captures the video frame and sends it to Amazon Rekognition to analyze the images in the video using the bounding boxes. The model uses bounding boxes to detect and classify captured objects (cars, pedestrians, and so on) in the video. The code then detects whether a car is in the frame sent to Amazon Rekognition, and a bounding box is generated for each car detected (a minimal sketch of this detection and counting logic follows the list).
  4. The size and position of each car is computed to accurately detect its position. After computing the size and position of the car, the model checks whether the car is in a detected lane. After determining whether there are cars in one of the detected lanes, the model counts the number of detected cars in the lane.
  5. The results from detecting and computing the size, position, and number of cars in a lane are written to a new file in the rest of the function.
  6. While writing the outputs to a new file, a few geometry computations are done to determine the details of detected objects. For example, polygons are used to determine the size of objects.
  7. With the function completely built, the next step is running it with a minimum confidence score of 95% using a test video.
  8. The last lines of code let you download the video from the directory in SageMaker to check the results and confidence level of the output.
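
The detection and counting logic in steps 3 through 5 can be sketched as follows. The label handling and signal threshold shown here are simplified assumptions, not the notebook’s exact implementation.

import boto3

rekognition = boto3.client("rekognition")

def count_cars_in_frame(jpeg_bytes: bytes, min_confidence: float = 95.0) -> int:
    """Detect vehicles in one frame and count bounding boxes above the confidence threshold."""
    response = rekognition.detect_labels(
        Image={"Bytes": jpeg_bytes},
        MinConfidence=min_confidence,
    )
    cars = 0
    for label in response["Labels"]:
        if label["Name"] == "Car":
            # Each instance carries a bounding box with relative coordinates
            cars += len(label.get("Instances", []))
    return cars

def choose_signal(car_count: int, threshold: int = 5) -> str:
    """Simplified signal logic; the threshold is an assumption, not the notebook's exact rule."""
    return "GREEN" if car_count >= threshold else "RED"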

Costs

Our cost estimate comes to roughly $6,000 per intersection per year, assuming one frame per second from four cameras and a single SageMaker notebook for each intersection. One important callout is that not every intersection is a 4-way intersection. Implementing this solution in more heavily trafficked areas can further improve the overall flow of traffic.

Cost breakdown and details

| Service | Description | First month cost | First 12 months cost |
| --- | --- | --- | --- |
| Amazon SageMaker Studio notebooks | Instance name: ml.t3.medium; number of data scientists: 1; Studio notebook instances per data scientist: 1; Studio notebook hours per day: 24; Studio notebook days per month: 30 | $36 | $432 |
| Amazon Rekognition | Images processed with label detection API calls per month: 345,600 | $345.60 | $4,147.20 |
| Amazon Simple Storage Service (Amazon S3), Standard storage class | S3 Standard storage: 4,320 GB per month; PUT, COPY, POST, and LIST requests per month: 2,592,000 | $112.32 | $1,347.84 |
| Total estimate per year | | | $5,927.04 |

However, this is an estimate, and you may incur additional costs depending on customization. For additional information on costs, visit the AWS pricing page for the services covered in the solution architecture. If you have questions, reach out to the AWS team for a more technical and focused discussion.

Clean up

Delete all AWS resources created for this solution that are no longer needed to avoid future charges.

Conclusion

This post provides a solution to make traffic lights more efficient using Amazon Rekognition. The solution proposed in this post can mitigate costs, support road safety, and reduce congestion at intersections. All of these make traffic management more efficient. We strongly recommend learning more about how Amazon Rekognition can help accelerate other image recognition and video analysis tasks by visiting the Amazon Rekognition Developer Guide.


About the authors

Hao Lun Colin Chu is an innovative Solution Architect at AWS, helping partners and customers leverage cutting-edge cloud technologies to solve complex business challenges. With extensive expertise in cloud migrations, modernization, and AI/ML, Colin advises organizations on translating their needs into transformative AWS-powered solutions. Driven by a passion for using technology as a force for good, he is committed to delivering solutions that empower organizations and improve people’s lives. Outside of work, he enjoys playing drums, volleyball, and board games!

Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He provides technical guidance to nonprofit organizations seeking to securely build, deploy, or expand applications in the cloud. He is passionate about leveraging data and technology for social good. Joe’s background is in data science and international development. Outside work, Joe loves spending time with his family and friends and chatting about innovation and entrepreneurship.

Read More

Accelerate development of ML workflows with Amazon Q Developer in Amazon SageMaker Studio

Accelerate development of ML workflows with Amazon Q Developer in Amazon SageMaker Studio

Machine learning (ML) projects are inherently complex, involving multiple intricate steps—from data collection and preprocessing to model building, deployment, and maintenance. Data scientists face numerous challenges throughout this process, such as selecting appropriate tools, needing step-by-step instructions with code samples, and troubleshooting errors and issues. These iterative challenges can hinder progress and slow down projects. Fortunately, generative AI-powered developer assistants like Amazon Q Developer have emerged to help data scientists streamline their workflows and fast-track ML projects, allowing them to save time and focus on strategic initiatives and innovation.

Amazon Q Developer is fully integrated with Amazon SageMaker Studio, an integrated development environment (IDE) that provides a single web-based interface for managing all stages of ML development. You can use this natural language assistant from your SageMaker Studio notebook to get personalized assistance using natural language. It offers tool recommendations, step-by-step guidance, code generation, and troubleshooting support. This integration simplifies your ML workflow and helps you efficiently build, train, and deploy ML models without needing to leave SageMaker Studio to search for additional resources or documentation.

In this post, we present a real-world use case analyzing the Diabetes 130-US hospitals dataset to develop an ML model that predicts the likelihood of readmission after discharge. Throughout this exercise, you use Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle and experience firsthand how this natural language assistant can help even the most experienced data scientists or ML engineers streamline the development process and accelerate time-to-value.

Solution overview

If you’re an AWS Identity and Access Management (IAM) and AWS IAM Identity Center user, you can use your Amazon Q Developer Pro tier subscription within Amazon SageMaker. Administrators can subscribe users to the Pro Tier on the Amazon Q Developer console, enable Pro Tier in the SageMaker domain settings, and provide the Amazon Q Developer profile Amazon Resource Name (ARN). The Pro Tier offers unlimited chat and inline code suggestions. Refer to Set up Amazon Q Developer for your users for detailed instructions.

If you don’t have a Pro Tier subscription but want to try out the capability, you can access the Amazon Q Developer Free Tier by adding the relevant policies to your SageMaker service roles. Admins can navigate to the IAM console, search for the SageMaker Studio role, and add the policy outlined in Set up Amazon Q Developer for your users. The Free Tier is available for both IAM and IAM Identity Center users.

To start our ML project predicting the probability of readmission for diabetes patients, you need to download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data from 130 US hospitals and integrated delivery networks. Each row represents the hospital record of a patient diagnosed with diabetes, including details such as the laboratory tests performed and medications administered during the encounter.

At the time of writing, Amazon Q Developer support in SageMaker Studio is only available in JupyterLab spaces. Amazon Q Developer is not supported for shared spaces.

Amazon Q Developer chat

After you have uploaded the data to SageMaker Studio, you can start working on your ML problem of reducing readmission rates for diabetes patients. Begin by using the chat capability next to your JupyterLab notebook. You can ask Amazon Q Developer to generate code to parse the Diabetes 130-US hospitals data, suggest how to formulate this ML problem, and develop a plan to build an ML model that predicts the likelihood of readmission after discharge. Amazon Q Developer uses AI to provide code recommendations, and this is non-deterministic. The results you get may be different from the ones shown in the following screenshot.

Amazon Q Developer SageMaker Studio integration

You can ask Amazon Q Developer to help you plan out the ML project. In this case, we want the assistant to show us how to train a random forest classifier using the Diabetes 130-US dataset. Enter the following prompt into the chat, and Amazon Q Developer will generate a plan. If code is generated, you can use the UI to directly insert the code into your notebook.

I have diabetic_data.csv file containing training data about whether a diabetic patient was readmitted after discharge. I want to use this data to train a random forest classifier using scikit-learn. Can you list out the steps to build this model?

You can ask Amazon Q Developer to help you generate code for specific tasks by inserting the following prompt:

Create a function that takes in a pandas DataFrame and performs one-hot encoding for the gender, race, A1Cresult, and max_glu_serum columns.
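
Amazon Q Developer’s response is non-deterministic, but the generated function might resemble the following sketch:

import pandas as pd

def one_hot_encode(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode selected categorical columns of the diabetes dataset."""
    columns = ["gender", "race", "A1Cresult", "max_glu_serum"]
    # dummy_na keeps missing values as their own indicator column
    return pd.get_dummies(df, columns=columns, dummy_na=True)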

You can also ask Amazon Q Developer to explain existing code and troubleshoot for common errors. Just choose the cell with the error and enter /fix in the chat.

The following is a full list of the shortcut commands:

  • /help – Display this help message
  • /fix – Fix an error cell selected in your notebook
  • /clear – Clear the chat window
  • /export – Export chat history to a Markdown file

To get the most out of your Amazon Q Developer chat, the following best practices are recommended when crafting your prompt:

  • Be direct and specific – Ask precise questions. For instance, instead of a vague query about AWS services, try: “Can you provide sample code using the SageMaker Python SDK library to train an XGBoost model in SageMaker?” Specificity helps the assistant understand exactly what you need, resulting in more accurate and useful responses.
  • Provide contextual information – The more context you offer, the better. This allows Amazon Q Developer to tailor its responses to your specific situation. For example, don’t just ask for code to prepare data. Instead, provide the first three rows of your data to get better code suggestions with fewer changes needed.
  • Avoid sensitive topics – Amazon Q Developer is designed with guardrail controls. It’s best to avoid questions related to security, billing information of your account, or other sensitive subjects.

Following these guidelines can help you maximize the value of Amazon Q Developer’s AI-powered code recommendations and streamline your ML projects.

Amazon Q Developer inline code suggestions

You can also get real-time code suggestions as you type in the JupyterLab notebook, offering context-aware recommendations based on your existing code and comments to streamline the coding process. In the following example, we demonstrate how to use the inline code suggestions feature to generate code blocks for various data science tasks: from data exploration to feature engineering, training a random forest model, evaluating the model, and finally deploying the model to predict the probability of readmission for diabetes patients.

The following figure shows the list of keyboard shortcuts to interact with Amazon Q Developer.

Let’s start with data exploration.

We first import some of the necessary Python libraries, like pandas and NumPy. Add the following code into the first code cell of Jupyter Notebook, and then run the cell:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In the next code cell, add the following comment, and before running the cell, press Enter and Tab. You can watch the bottom status bar to see Amazon Q Developer working to generate code suggestions.

# read 'diabetic-readmission.csv'

You can also ask Amazon Q Developer to create a visualization:

# create a bar chart from df that shows counts of patients by 'race' and 'gender' with a title of 'patients by race and gender' 
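
The completions Amazon Q Developer produces for these comments might resemble the following sketch; actual suggestions vary between runs, and the code assumes the pandas and Matplotlib imports from the first cell.

# read 'diabetic-readmission.csv'
df = pd.read_csv("diabetic-readmission.csv")

# create a bar chart from df that shows counts of patients by 'race' and 'gender'
counts = df.groupby(["race", "gender"]).size().unstack()
counts.plot(kind="bar", title="patients by race and gender")
plt.show()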

Now you can perform feature engineering to prepare the model for training.

The dataset provided has a number of categorical features, which need to be converted to numerical features, as well as missing data. In the next code cell, add the following comment, and press TAB to see how Amazon Q Developer can help:

# perform one-hot encoding for gender, race, a1c_result, and max_glu_serum columns 

Lastly, you can use Amazon Q Developer to help you create a simple ML model, a random forest classifier, using scikit-learn.
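
A minimal sketch of the kind of training code you might arrive at with Amazon Q Developer’s help is shown below. The target encoding and model parameters are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Assumes df has already been one-hot encoded, that 'readmitted' uses 'NO' for patients
# who were not readmitted, and that the remaining feature columns are numeric
X = df.drop(columns=["readmitted"])
y = (df["readmitted"] != "NO").astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))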

Amazon Q Developer in SageMaker data policy

When using Amazon Q Developer in SageMaker Studio, no customer content is used for service improvement, regardless of whether you use the Free Tier or Pro Tier. For IDE-level telemetry sharing, Amazon Q Developer may track your usage of the service, such as how many questions you ask and whether you accept or reject a recommendation. This information doesn’t contain customer content or personally identifiable information, such as your IP address. If you prefer to opt out of IDE-level telemetry, complete the following steps to opt out of sharing usage data with Amazon Q Developer:

  1. On the Settings menu, choose Settings Editor.

Amazon Q Developer settings editor

  2. Uncheck the option Share usage data with Amazon Q Developer.

Amazon Q Developer data usage policy

Alternatively, an ML platform admin can disable this option for all users inside JupyterLab by default with the help of lifecycle configuration scripts. To learn more, see Using lifecycle configurations with JupyterLab. To disable data sharing with Amazon Q Developer by default for all users within a SageMaker Studio domain, complete the following steps:

  1. On the SageMaker console, choose Lifecycle configurations under Admin configurations in the navigation pane.
  2. Choose Create configuration.

Amazon SageMaker lifecycle configuration

  3. For Name, enter a name.
  4. In the Scripts section, create a lifecycle configuration script that disables the shareCodeWhispererContentWithAWS settings flag for the jupyterlab-q extension:
#!/bin/bash
mkdir -p /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/
cat <<EOL > /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/completer.jupyterlab-settings
{
    "shareCodeWhispererContentWithAWS": false,
    "suggestionsWithCodeReferences": true,
    "codeWhispererTelemetry": false,
    "codeWhispererLogLevel": "ERROR"
}
EOL

Amazon SageMaker lifecycle configuration script

  5. Attach the disable-q-data-sharing lifecycle configuration to a domain.
  6. Optionally, force the lifecycle configuration to run with the Run by default option.

Attach lifecycle configuration

  7. Use this lifecycle configuration when creating a JupyterLab space.

It will be selected by default if the configuration is set to Run by default.

Lifecycle configuration script run by default Jupyter space

The configuration should run almost instantaneously and disable the Share usage data with Amazon Q Developer option in your JupyterLab space on startup.

Disable share data usage

Clean up

To avoid incurring AWS charges after testing this solution, delete the SageMaker Studio domain.

Conclusion

In this post, we walked through a real-world use case and developed an ML model that predicts the likelihood of readmission after discharge for patients in the Diabetes 130-US hospitals dataset. Throughout this exercise, we used Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle, demonstrating how this developer assistant can help streamline the development process and accelerate time-to-value, even for experienced ML practitioners. You have access to Amazon Q Developer in all AWS Regions where SageMaker is generally available. Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI–powered assistant.

The assistant is available for all Amazon Q Developer Pro and Free Tier users. For pricing information, see Amazon Q Developer pricing.


About the Authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include computer vision, MLOps/LLMOps, and generative AI.

Shibin Michaelraj is a Sr. Product Manager with the Amazon SageMaker team. He is focused on building AI/ML-based products for AWS customers.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Bhadrinath Pani is a Software Development Engineer at Amazon Web Services, working on Amazon SageMaker interactive ML products, with over 12 years of experience in software development across domains like automotive, IoT, AR/VR, and computer vision. Currently, his main focus is on developing machine learning tools aimed at simplifying the experience for data scientists. In his free time, he enjoys spending time with his family and exploring the beauty of the Pacific Northwest.

Read More

Govern generative AI in the enterprise with Amazon SageMaker Canvas

Govern generative AI in the enterprise with Amazon SageMaker Canvas

With the rise of powerful foundation models (FMs) powered by services such as Amazon Bedrock and Amazon SageMaker JumpStart, enterprises want to exercise granular control over which users and groups can access and use these models. This is crucial for compliance, security, and governance.

Launched in 2021, Amazon SageMaker Canvas is a visual point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without writing any code. SageMaker Canvas provides a no-code interface to consume a broad range of FMs from both services in an off-the-shelf fashion, as well as to customize model responses using a Retrieval Augmented Generation (RAG) workflow using Amazon Kendra as a knowledge base or fine-tune using a labeled dataset. This simplifies access to generative artificial intelligence (AI) capabilities to business analysts and data scientists without the need for technical knowledge or having to write code, thereby accelerating productivity.

In this post, we analyze strategies for governing access to Amazon Bedrock and SageMaker JumpStart models from within SageMaker Canvas using AWS Identity and Access Management (IAM) policies. You’ll learn how to create granular permissions to control the invocation of ready-to-use Amazon Bedrock models and prevent the provisioning of SageMaker endpoints with specified SageMaker JumpStart models. We provide code examples tailored to common enterprise governance scenarios. By the end, you’ll understand how to lock down access to generative AI capabilities based on your organizational requirements, maintaining secure and compliant use of cutting-edge AI within the no-code SageMaker Canvas environment.

This post covers an increasingly important topic as more powerful AI models become available, making it a valuable resource for ML operators, security teams, and anyone governing AI in the enterprise.

Solution overview

The following diagram illustrates the solution architecture.


The architecture of SageMaker Canvas allows business analysts and data scientists to interact with ML models without writing any code. However, managing access to these models is crucial for maintaining security and compliance. When a user interacts with SageMaker Canvas, the operations they perform, such as invoking a model or creating an endpoint, are run by the SageMaker service role. SageMaker user profiles can either inherit the default role from the SageMaker domain or have a user-specific role.

By customizing the policies attached to this role, you can control what actions are permitted or denied, thereby governing the access to generative AI capabilities. As part of this post, we discuss which IAM policies to use for this role to control operations within SageMaker Canvas, such as invoking models or creating endpoints, based on enterprise organizational requirements. We analyze two patterns for both Amazon Bedrock models and SageMaker JumpStart models: limiting access to all models from a service or limiting access to specific models.

Govern Amazon Bedrock access to SageMaker Canvas

In order to use Amazon Bedrock models, SageMaker Canvas calls the following Amazon Bedrock APIs:

  • bedrock:InvokeModel – Invokes the model synchronously
  • bedrock:InvokeModelWithResponseStream – Invokes the model synchronously, with the response being streamed over a socket, as illustrated in the following diagram

Additionally, SageMaker Canvas can call the bedrock:FineTune API to fine-tune large language models (LLMs) with Amazon Bedrock. At the time of writing, SageMaker Canvas only allows fine-tuning of Amazon Titan models.

To use a specific LLM from Amazon Bedrock, SageMaker Canvas uses the model ID of the chosen LLM as part of the API calls. At the time of writing, SageMaker Canvas supports the following models from Amazon Bedrock, grouped by model provider:

  • AI21
    • Jurassic-2 Mid: j2-mid-v1
    • Jurassic-2 Ultra: j2-ultra-v1
  • Amazon
    • Titan: titan-text-premier-v1:*
    • Titan Large: titan-text-lite-v1
    • Titan Express: titan-text-express-v1
  • Anthropic
    • Claude 2: claude-v2
    • Claude Instant: claude-instant-v1
  • Cohere
    • Command Text: command-text-*
    • Command Light: command-light-text-*
  • Meta
    • Llama 2 13B: llama2-13b-chat-v1
    • Llama 2 70B: llama2-70b-chat-v1

For the complete list of model IDs for Amazon Bedrock, see Amazon Bedrock model IDs.

Limit access to all Amazon Bedrock models

To restrict access to all Amazon Bedrock models, you can modify the SageMaker role to explicitly deny these APIs. This makes sure no user can invoke any Amazon Bedrock model through SageMaker Canvas.

The following is an example IAM policy to achieve this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "*"
        }
    ]
}

The policy uses the following parameters:

  • "Effect": "Deny" specifies that the following actions are denied
  • "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"] specifies the Amazon Bedrock APIs that are denied
  • "Resource": "*" indicates that the denial applies to all Amazon Bedrock models

Limit access to specific Amazon Bedrock models

You can extend the preceding IAM policy to restrict access to specific Amazon Bedrock models by specifying the model IDs in the Resources section of the policy. This way, users can only invoke the allowed models.

The following is an example of the extended IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-1>",
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-2>"
            ]
        }
    ]
}

In this policy, the Resource array lists the specific Amazon Bedrock models that are denied. Provide the AWS Region (or a wildcard) and the model IDs appropriate for your environment.
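
For instance, an administrator could attach such a deny policy to the SageMaker execution role programmatically. In the following sketch, the role name and policy name are placeholders, and the two model IDs (Anthropic Claude 2 and AI21 Jurassic-2 Mid) come from the list earlier in this post.

import json
import boto3

iam = boto3.client("iam")

deny_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-v2",
                "arn:aws:bedrock:*::foundation-model/ai21.j2-mid-v1"
            ]
        }
    ]
}

# Role and policy names are placeholders for your environment
iam.put_role_policy(
    RoleName="AmazonSageMaker-ExecutionRole-example",
    PolicyName="DenySelectedBedrockModels",
    PolicyDocument=json.dumps(deny_policy),
)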

Govern SageMaker JumpStart access to SageMaker Canvas

For SageMaker Canvas to be able to consume LLMs from SageMaker JumpStart, it must perform the following operations:

  1. Select the LLM from SageMaker Canvas or from the list of JumpStart model IDs (see the list that follows).
  2. Create an endpoint configuration and deploy the LLM on a real-time endpoint.
  3. Invoke the endpoint to generate the prediction.

The following diagram illustrates this workflow.

For a list of available JumpStart model IDs, see JumpStart Available Model Table. At the time of writing, SageMaker Canvas supports the following model IDs:

  • huggingface-textgeneration1-mpt-7b-*
  • huggingface-llm-mistral-*
  • meta-textgeneration-llama-2-*
  • huggingface-llm-falcon-*
  • huggingface-textgeneration-dolly-v2-*
  • huggingface-text2text-flan-t5-*

To identify the right model from SageMaker JumpStart, SageMaker Canvas passes aws:RequestTag/sagemaker-sdk:jumpstart-model-id as part of the endpoint configuration. To learn more about other techniques to limit access to SageMaker JumpStart models using IAM permissions, refer to Manage Amazon SageMaker JumpStart foundation model access with private hubs.
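
To see why the aws:RequestTag condition key works, the following sketch shows the shape of a tagged CreateEndpointConfig call similar to what SageMaker Canvas issues. The resource names, instance type, and model ID value are assumptions for illustration.

import boto3

sagemaker = boto3.client("sagemaker")

# The jumpstart-model-id tag is what the IAM condition keys in the following policies match on
sagemaker.create_endpoint_config(
    EndpointConfigName="canvas-falcon-endpoint-config",  # placeholder name
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "canvas-falcon-model",  # placeholder model resource
            "InstanceType": "ml.g5.2xlarge",
            "InitialInstanceCount": 1,
        }
    ],
    Tags=[
        {"Key": "sagemaker-sdk:jumpstart-model-id", "Value": "huggingface-llm-falcon-7b-instruct-bf16"}
    ],
)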

Configure permissions to deploy endpoints through the UI

On the SageMaker domain configuration page on the SageMaker page of the AWS Management Console, you can configure SageMaker Canvas to be able to deploy SageMaker endpoints. This option also enables deployment of real-time endpoints for classic ML models, such as time series forecasting or classification. To enable model deployment, complete the following steps:

  1. On the Amazon SageMaker console, navigate to your domain.
  2. On the Domain details page, choose the App Configurations tab.

  3. In the Canvas section, choose Edit.

  4. Turn on Enable direct deployment of Canvas models in the ML Ops configuration section.

Limit access to all SageMaker JumpStart models

To limit access to all SageMaker JumpStart models, configure the SageMaker role to block the CreateEndpointConfig and CreateEndpoint APIs on any SageMaker JumpStart Model ID. This prevents the creation of endpoints using these models. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id": "false"
                }
            }
        }
    ]
}

This policy uses the following parameters:

  • "Effect": "Deny" specifies that the following actions are denied
  • "Action": ["sagemaker:CreateEndpointConfig", "sagemaker:CreateEndpoint"] specifies the SageMaker APIs that are denied
  • The "Null" condition operator in AWS IAM policies is used to check whether a key exists or not. It does not check the value of the key, only its presence or absence
  • "aws:RequestTag/sagemaker-sdk:jumpstart-model-id":”*” indicates that the denial applies to all SageMaker JumpStart models

Limit access and deployment for specific SageMaker JumpStart models

Similar to Amazon Bedrock models, you can limit access to specific SageMaker JumpStart models by specifying their model IDs in the IAM policy. To achieve this, an administrator needs to restrict users from creating endpoints with unauthorized models. For example, to deny access to Hugging Face FLAN T5 models and MPT models, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id": [
                        "huggingface-textgeneration1-mpt-7b-*",
                        "huggingface-text2text-flan-t5-*"
                    ]
                }
            }
        }
    ]
}

In this policy, the "StringLike" condition allows for pattern matching, enabling the policy to apply to multiple model IDs with similar prefixes.

Clean up

To avoid incurring future workspace instance charges, log out of SageMaker Canvas when you’re done using the application. Optionally, you can configure SageMaker Canvas to automatically shut down when idle.

Conclusion

In this post, we demonstrated how SageMaker Canvas invokes LLMs powered by Amazon Bedrock and SageMaker JumpStart, and how enterprises can govern access to these models, whether you want to limit access to specific models or to any model from either service. You can combine the IAM policies shown in this post in the same IAM role to provide complete control.

By following these guidelines, enterprises can make sure their use of generative AI models is both secure and compliant with organizational policies. This approach not only safeguards sensitive data but also empowers business analysts and data scientists to harness the full potential of AI within a controlled environment.

Now that your environment is configured according to the enterprise standard, we suggest reading the following posts to learn what SageMaker Canvas enables you to do with generative AI:


About the Authors

Davide Gallitelli is a Senior Specialist Solutions Architect for GenAI/ML. He is Italian, based in Brussels, and works closely with customers all around the world on generative AI workloads and low-code no-code ML technology. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Lijan Kuniyil is a Senior Technical Account Manager at AWS. Lijan enjoys helping AWS enterprise customers build highly reliable and cost-effective systems with operational excellence. Lijan has more than 25 years of experience in developing solutions for financial and consulting companies.

Saptarshi Banerjee serves as a Senior Partner Solutions Architect at AWS, collaborating closely with AWS Partners to design and architect mission-critical solutions. With a specialization in generative AI, AI/ML, serverless architecture, and cloud-based solutions, Saptarshi is dedicated to enhancing performance, innovation, scalability, and cost-efficiency for AWS Partners within the cloud ecosystem.

Read More

Transforming home ownership with Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock: Rocket Mortgage’s journey with AWS

Transforming home ownership with Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock: Rocket Mortgage’s journey with AWS

This post is co-written with Josh Zook and Alex Hamilton from Rocket Mortgage.

Rocket Mortgage, America’s largest retail mortgage lender, revolutionizes homeownership with Rocket Logic – Synopsis, an AI tool built on AWS. This innovation has transformed client interactions and operational efficiency through the use of Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock. Through Rocket Logic – Synopsis, Rocket achieved remarkable results: automating post-call interaction wrap-up is projected to save 40,000 team hours annually, and a 10% increase in first-call resolutions saved another 20,000 hours annually. Beyond Rocket Logic – Synopsis, 70% of servicing clients choose to self-serve through generative AI-powered channels such as IVR. Rocket’s “start small, launch and learn, scale fast” approach paired with AWS enablement proved effective: the team deployed to 30,000 servicing calls in 10 days, then scaled four times greater for operations and six times greater for banking.

This post offers insights for businesses aiming to use artificial intelligence (AI) and cloud technologies to enhance customer service and streamline operations. We share how Rocket Mortgage’s use of AWS services set a new industry standard and demonstrate how to apply these principles to transform your client interactions and processes with speed and scalability.

Opportunities for innovation

Rocket services over 2.6 million clients, with 65 million voice interactions and 147 million voice minutes inclusive of banking, operations, and servicing, and generates and processes over 10 PB of data. By focusing on three key personas—clients, client advocates, and business leaders or senior leadership—Rocket aims to create a solution that enhances experiences across the board.

At the heart of this transformation is the recognition that clients value their time, but also benefit from hyper-personalized support in ultra complex moments. With call volumes on the rise, solving this problem at scale was essential. Rocket tapped into a crucial insight: 81% of consumers prefer self-service options. This preference opens exciting possibilities for swift, efficient problem-solving. Imagine a world where answers are available at your fingertips, 24/7, without the need to wait in a queue. By implementing enhanced self-service tools, Rocket is poised to offer faster resolution times, greater client autonomy, and a more satisfying overall experience.

Client advocates, the face of the company, stand to benefit significantly from this transformation. Currently, client advocates spend about 30% of their time on administrative tasks. By streamlining processes, client advocates can focus on what they do best: providing exceptional customer service and nurturing client relationships. This shift promises more engaging work, increased job satisfaction, and opportunities for skill development. Rocket envisions their client advocates evolving into trusted advisors, handling complex inquiries that truly take advantage of their expertise and interpersonal skills.

For business leaders, this wealth of data on trends, sentiment, and performance opens up a treasure trove of opportunities. Decision-makers can now drive significant improvements across the board, employing data-driven strategies to enhance customer satisfaction, optimize operations, and boost overall business performance. Business leaders can look forward to leading more efficient teams, and senior leadership can anticipate improved client loyalty and a stronger bottom line.

Strategic requirements

To further elevate their client interactions, Rocket identified key requirements for their solution. These requirements were essential to make sure the solution could handle the demands of their extensive client base and provide actionable insights to enhance client experiences:

  • Sentiment analysis – Tracking client sentiment and preferences was necessary to offer personalized experiences. The solution needed to accurately gauge client emotions and preferences to tailor responses and services effectively.
  • Automation – Automating routine tasks, such as call summaries, was essential to free up team members for more meaningful client interactions. This automation would help reduce the manual workload, allowing the team to focus on building stronger client relationships.
  • AI integration – Using generative AI to analyze calls was crucial for providing actionable insights and enhancing client interactions. The AI integration needed to be robust enough to process vast amounts of data and deliver precise, meaningful results.
  • Data security – Protecting sensitive client information throughout the process was a non-negotiable requirement. Rocket needed to uphold the highest standards of data security, maintaining regulatory compliance, data privacy, and the integrity of client information.
  • Compliance and data privacy – Rocket required a solution that met strict compliance and data privacy standards. Given the sensitive nature of the information handled, the solution needed to provide complete data protection and adhere to industry regulations.
  • Scalability – Rocket needed a solution capable of handling millions of calls annually and scaling efficiently with growing demand. This requirement was vital to make sure the system could support their expansive and continuously increasing volume of voice interactions.

Solution overview

To meet these requirements, Rocket partnered with the AWS team to deploy the AWS Contact Center Intelligence (CCI) solution Post-Call Analytics, branded internally as Rocket Logic – Synopsis. This solution seamlessly integrates into Rocket’s existing operations, using AI technologies to transcribe and analyze client calls. By utilizing services like Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock, the solution extracts valuable insights such as sentiment, call drivers, and client preferences, enhancing client interactions and providing actionable data for continuous improvement.

At the heart of Rocket are their philosophies, known as their -ISMs, which guide their growth and innovation.  One of these guiding principles is “launch and learn.”

Embracing the mantra of “think big but start small,” Rocket adopted a rapid, iterative approach to achieve a remarkable time to market of just 10 days, compared to the months it would have traditionally taken. This agile methodology allowed them to create space for exploration and innovation. The team initially focused on a few key use cases, starting simple and rapidly iterating based on feedback and results.

To accelerate development and make sure data was quickly put into the hands of the business, they utilized mechanisms such as a hackathon with targeted goals. By using existing solutions and AWS technical teams, Rocket significantly reduced the time to market, allowing for swift deployment. Additionally, they looked to industry tactics to find solutions to common problems, so their approach was both innovative and practical.

During this “launch and learn” process, Rocket anticipated and managed challenges such as scaling issues and burst volume management using Drip Hopper and serverless technologies through AWS. They also fine-tuned the Anthropic’s Claude 3 Haiku large language model (LLM) on Amazon Bedrock for call classification and data extraction.

The following diagram illustrates the solution architecture.

Post-Call Analytics provides an entire architecture around ingesting audio files in a fully automated workflow with AWS Step Functions, which is initiated when an audio file is delivered to a configured Amazon Simple Storage Service (Amazon S3) bucket. After a few minutes, Amazon Transcribe Call Analytics produces a transcript, which is saved to another S3 bucket for further processing by business intelligence (BI) tools, with stringent security measures making sure personally identifiable information (PII) is redacted and data is encrypted. The PII is redacted throughout, but the client ID and interaction ID are used to correlate and trace across the datasets. Downstream applications use those IDs to pull from client data services in the UI presentation layer.

Enhancing the analysis, Amazon Comprehend is used for sentiment analysis and entity extraction, providing deeper insights into client interactions. Generative AI is integrated to generate concise call summaries and actionable insights, significantly reducing the manual workload and allowing team members to focus on building stronger client relationships. This generative AI capability, powered by Amazon Bedrock, Anthropic’s Claude Sonnet 3, and customizable prompts, enables Rocket to deliver real-time, contextually relevant information. Data is securely stored and managed within AWS, using Amazon S3 and Amazon DynamoDB, with robust encryption and access controls provided by AWS Key Management Service (AWS KMS) and AWS Identity and Access Management (IAM) policies. This comprehensive setup enables Rocket to efficiently manage, analyze, and act on client interaction data, thereby enhancing both client experience and operational efficiency.
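
As a simplified illustration of the transcription and redaction step, the following sketch starts an Amazon Transcribe Call Analytics job with PII redaction. The bucket names, role ARN, and channel mapping are placeholders, not Rocket’s actual configuration.

import boto3

transcribe = boto3.client("transcribe")

# All names and ARNs below are placeholders for illustration only
transcribe.start_call_analytics_job(
    CallAnalyticsJobName="example-call-0001",
    Media={"MediaFileUri": "s3://example-audio-bucket/calls/call-0001.wav"},
    OutputLocation="s3://example-transcript-bucket/transcripts/",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ExampleTranscribeDataAccessRole",
    ChannelDefinitions=[
        {"ChannelId": 0, "ParticipantRole": "AGENT"},
        {"ChannelId": 1, "ParticipantRole": "CUSTOMER"},
    ],
    Settings={
        "ContentRedaction": {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",  # keep only the redacted transcript
        }
    },
)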

Achieving excellence

The implementation of Rocket Logic – Synopsis has yielded remarkable results for Rocket:

  • Efficiency gains – Automating call transcription and sentiment analysis is projected to save the servicing team nearly 40,000 hours annually
  • Enhanced client experience – Approximately 70% of servicing clients fully self-serve through generative AI-powered channels such as IVR, resolving inquiries without needing team member intervention
  • Increased first-call resolutions – There has been a nearly 10% increase in first-call resolutions, saving approximately 20,000 team member hours annually
  • Proactive client solutions – The tool’s predictive capabilities have improved, allowing Rocket to proactively address client needs before they even make a call
  • Start small, launch and learn, scale fast – Rocket started with 30,000 servicing calls with a 10-day time to market, and then scaled four times greater for operations, followed by six times greater for banking

Roadmap

Looking ahead, Rocket plans to continue enhancing Rocket Logic – Synopsis by using the vast amount of data gathered from call transcripts. Future developments will include:

  • Advanced predictive analytics – Further improving the tool’s ability to anticipate client needs and offer solutions proactively
  • Omnichannel integration – Expanding the AI capabilities to other communication channels such as emails and chats
  • Client preference tracking – Refining the technology to better understand and adapt to individual client preferences, providing more personalized interactions
  • Enhanced personalization – Utilizing data to create even more tailored client experiences, including understanding preferences for communication channels and timing

Conclusion

The collaboration between Rocket Mortgage and AWS has revolutionized the homeownership process by integrating advanced AI solutions into client interactions. Rocket Logic – Synopsis enhances operational efficiency significantly and improves the client experience. As Rocket continues to innovate and expand its AI capabilities, they remain committed to providing personalized, efficient, and seamless homeownership experiences for their clients. The success of Rocket Logic – Synopsis demonstrates the transformative power of technology in creating more efficient, responsive, and personalized client experiences. To learn more, visit Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock.


About the authors

Josh Zook is the Chief Technology Officer of Rocket Mortgage, working alongside the teams that are shipping the products that clients and partners are using every day to make home ownership a reality. He started in Technology in 1984 by writing a program in BASIC to calculate his weight on the moon using an Apple IIe. Since then, he has been on a relentless pursuit in using technology to make life easier by solving slightly more complex problems. Josh believes the key to success is curiosity combined with the grit and grind to make ideas reality. This has led to a steady paycheck since he was 10 years old, with jobs in landscaping, sandwich artistry, sporting goods sales, satellite installation, firefighter, and bookstore aficionado… just to name a few.

Alex Hamilton is a Director of Engineering at Rocket Mortgage, spearheading the AI driven digital strategy to help everyone home. He’s been shaping the tech scene at Rocket for over 11 years, including launching one of the company’s first models to boost trading revenue and bring modern event streaming and containerization to Rocket. Alex is passionate about solving novel engineering problems and bringing magical client experiences to life. Outside of work Alex enjoys traveling, weekend brunch, and firing up the grill!

Ritesh Shah is a Senior Worldwide GenAI Specialist at AWS. He partners with customers like Rocket to drive AI adoption, resulting in millions of dollars in top and bottom line impact for these customers. Outside work, Ritesh tries to be a dad to his AWSome daughter.  Connect with him on LinkedIn.

Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services, where he partners with North American FinTech companies like Rocket to drive cloud strategy and accelerate AI adoption. His expertise in AI & ML, and cloud native architecture has helped organizations unlock new revenue streams, enhance operational efficiency, and achieve substantial business transformation. By modernizing financial institutions with secure, scalable infrastructures, Sajjan enables them to stay competitive in a rapidly evolving, data-driven landscape. Outside of work, he enjoys spending time with his family and is a proud father to his daughter.

Read More

Integrate dynamic web content in your generative AI application using a web search API and Amazon Bedrock Agents

Integrate dynamic web content in your generative AI application using a web search API and Amazon Bedrock Agents

Amazon Bedrock Agents offers developers the ability to build and configure autonomous agents in their applications. These agents help users complete actions based on organizational data and user input, orchestrating interactions between foundation models (FMs), data sources, software applications, and user conversations.

Amazon Bedrock agents use the power of large language models (LLMs) to perform complex reasoning and action generation. This approach is inspired by the ReAct (reasoning and acting) paradigm, which combines reasoning traces and task-specific actions in an interleaved manner.

Amazon Bedrock agents use LLMs to break down tasks, interact dynamically with users, run actions through API calls, and augment knowledge using Amazon Bedrock Knowledge Bases. The ReAct approach enables agents to generate reasoning traces and actions while seamlessly integrating with company systems through action groups. By offering accelerated development, simplified infrastructure, enhanced capabilities through chain-of-thought (CoT) prompting, and improved accuracy, Amazon Bedrock Agents allows developers to rapidly build sophisticated AI solutions that combine the power of LLMs with custom actions and knowledge bases, all without managing underlying complexity.

Web search APIs empower developers to seamlessly integrate powerful search capabilities into their applications, providing access to vast troves of internet data with just a few lines of code. These APIs act as gateways to sophisticated search engines, allowing applications to programmatically query the web and retrieve relevant results including webpages, images, news articles, and more.

By using web search APIs, developers can enhance their applications with up-to-date information from across the internet, enabling features like content discovery, trend analysis, and intelligent recommendations. With customizable parameters for refining searches and structured response formats for parsing, web search APIs offer a flexible and efficient solution for harnessing the wealth of information available on the web.

Amazon Bedrock Agents offers a powerful solution for enhancing chatbot capabilities, and when combined with web search APIs, they address a critical customer pain point. In this post, we demonstrate how to use Amazon Bedrock Agents with a web search API to integrate dynamic web content in your generative AI application.

Benefits of integrating a web search API with Amazon Bedrock Agents

Let’s explore how this integration can revolutionize your chatbot experience:

  • Seamless in-chat web search – By incorporating web search APIs into your Amazon Bedrock agents, you can empower your chatbot to perform real-time web searches without forcing users to leave the chat interface. This keeps users engaged within your application, improving overall user experience and retention.
  • Dynamic information retrieval – Amazon Bedrock agents can use web search APIs to fetch up-to-date information on a wide range of topics. This makes sure that your chatbot provides the most current and relevant responses, enhancing its utility and user trust.
  • Contextual responses – Amazon Bedrock agent uses CoT prompting, enabling FMs to plan and run actions dynamically. Through this approach, agents can analyze user queries and determine when a web search is necessary or—if enabled—gather more information from the user to complete the task. This allows your chatbot to blend information from APIs, knowledge bases, and up-to-date web-sourced content, creating a more natural and informative conversation flow. With these capabilities, agents can provide responses that are better tailored to the user’s needs and the current context of the interaction.
  • Enhanced problem solving – By integrating web search APIs, your Amazon Bedrock agent can tackle a broader range of user inquiries. Whether it’s troubleshooting a technical issue or providing industry insights, your chatbot becomes a more versatile and valuable resource for users.
  • Minimal setup, maximum impact – Amazon Bedrock agents simplify the process of adding web search functionality to your chatbot. With just a few configuration steps, you can dramatically expand your chatbot’s knowledge base and capabilities, all while maintaining a streamlined UI.
  • Infrastructure as code – You can use AWS CloudFormation or the AWS Cloud Development Kit (AWS CDK) to deploy and manage Amazon Bedrock agents.

By addressing the customer challenge of expanding chatbot functionality without complicating the user experience, the combination of web search APIs and Amazon Bedrock agents offers a compelling solution. This integration allows businesses to create more capable, informative, and user-friendly chatbots that keep users engaged and satisfied within a single interface.

Solution overview

This solution uses Amazon Bedrock Agents with a web search capability that integrates external search APIs (SerpAPI and Tavily AI) with the agent. The architecture consists of the following key components:


  • An Amazon Bedrock agent orchestrates the interaction between the user and search APIs, handling the chat sessions and optionally long-term memory
  • An AWS Lambda function implements the logic for calling external search APIs and processing results
  • External search APIs (SerpAPI and Tavily AI) provide web search capabilities
  • Amazon Bedrock FMs generate natural language responses based on search results
  • AWS Secrets Manager securely stores API keys for external services

The solution flow is as follows:

  1. User input is received by the Amazon Bedrock agent, powered by Anthropic Claude 3 Sonnet on Amazon Bedrock.
  2. The agent determines whether a web search is necessary or asks the user clarifying questions.
  3. If required, the agent invokes one of two Lambda functions to perform a web search: SerpAPI for up-to-date events or Tavily AI for web research-heavy questions.
  4. The Lambda function retrieves the API secrets securely from Secrets Manager, calls the appropriate search API, and processes the results.
  5. The agent generates the final response based on the search results.
  6. The response is returned to the user after final output guardrails are applied.

The following figure is a visual representation of the system we are going to implement.

We demonstrate two methods to build this solution. To set up the agent on the AWS Management Console, we use the new agent builder. The following GitHub repository contains the Python AWS CDK code to deploy the same example.

Prerequisites

Make sure you have the following prerequisites:

Amazon Bedrock Agents supports models such as Amazon Titan Text and Anthropic Claude. Each model has different capabilities and pricing. For the full list of supported models, see Supported regions and models for Amazon Bedrock Agents.

For this post, we use the Anthropic Claude 3 Sonnet model.

Configure the web search APIs

Both Serper (SerpAPI) and Tavily AI provide web search APIs that can be integrated with Amazon Bedrock agents by calling their REST-based API endpoints from a Lambda function. However, they have some key differences that can influence when you would use each one:

  • SerpAPI provides access to multiple search engines, including Google, Bing, Yahoo, and others. It offers granular control over search parameters and result types (for example, organic results, featured snippets, images, and videos). SerpAPI might be better suited for tasks requiring specific search engine features or when you need results from multiple search engines.
  • Tavily AI is specifically designed for AI agents and LLMs, focusing on delivering relevant and factual results. It offers features like including answers, raw content, and images in search results. It provides customization options such as search depth (basic or advanced) and the ability to include or exclude specific domains. It’s optimized for speed and efficiency in delivering real-time results.

You would use SerpAPI if you need results from specific search engines or multiple engines, and Tavily AI when relevance and factual accuracy are crucial.

Ultimately, the choice between SerpAPI and Tavily AI depends on your specific research requirements, the level of control you need over search parameters, and whether you prioritize general search engine capabilities or AI-optimized results.

For the example in this post, we chose to use both and let the agent decide which API is the more appropriate one, depending on the question or prompt. The agent can also opt to call both if one doesn’t provide a good enough answer. Both SerpAPI and Tavily AI provide a free tier that can be used for the example in this post.

For both APIs, API keys are required and are available from Serper and Tavily.

We securely store the obtained API keys in Secrets Manager. The following examples create secrets for the API keys:

aws secretsmanager create-secret \
    --name SERPER_API_KEY \
    --description "The API secret key for Serper." \
    --secret-string "$SERPER_API_KEY"

aws secretsmanager create-secret \
    --name TAVILY_API_KEY \
    --description "The API secret key for Tavily AI." \
    --secret-string "$TAVILY_API_KEY"

When you enter commands in a shell, there is a risk of the command history being accessed or utilities having access to your command parameters. For more information, see Mitigate the risks of using the AWS CLI to store your AWS Secrets Manager secrets.

Now that the APIs are configured, you can start building the web search Amazon Bedrock agent.

In the following section, we present two methods to create your agent: through the console and using the AWS CDK. Although the console path offers a more visual approach, we strongly recommend using the AWS CDK for deploying the agent. This method not only provides a more robust deployment process, but also allows you to examine the underlying code. Let’s explore both options to help you choose the best approach for your needs.

Build a web search Amazon Bedrock agent using the console

In the first example, you build a web search agent using the Amazon Bedrock console to create and configure the agent, and then the Lambda console to configure and deploy a Lambda function.

Create a web search agent

To create a web search agent using the console, complete the following steps:

  1. On the Amazon Bedrock console, choose Agents in the navigation pane.
  2. Choose Create agent.
  3. Enter a name for the agent (such as websearch-agent) and an optional description, then choose Create.

Create Agent Dialogue

You are now in the new agent builder, where you can access and edit the configuration of an agent.

  4. For Agent resource role, leave the default Create and use a new service role.

This option automatically creates the AWS Identity and Access Management (IAM) role assumed by the agent.

  5. For the model, choose Anthropic and Claude 3 Sonnet.

Instructions for the Agent

  6. For Instructions for the Agent, provide clear and specific instructions to tell the agent what it should do. For the web search agent, enter:
You are an agent that can handle various tasks as described below:
1/ Helping users do research and finding up-to-date information. For up-to-date information, always use web search. Web search has two flavors:
a/ Google Search - this is great for looking up up-to-date information and current events
b/ Tavily AI Search - this is used to do deep research on topics your user is interested in. Not good for being used on news because it does not order search results by date.

As you can see from the instruction, we decided to name the SerpAPI option Google Search. In our tests with the Anthropic Claude 3 Sonnet model, Google Search is synonymous with web search. Because the instruction is a natural language instruction to the model, we want to stay as close as possible to the common usage of words in the language; therefore, we use Google Search instead of SerpAPI. However, this could vary from model to model. We encourage you to test new instructions when changing the model.
  7. In the Action groups section, choose Add.

Action groups are how agents can interact with external systems or APIs to get more information or perform actions.

  8. For Enter action group name, enter action-group-web-search for the action group.
  9. For Action group type, select Define with function details so you can specify functions and their parameters as JSON instead of providing an OpenAPI schema.
  10. For Action group invocation, set up what the agent does after this action group is identified by the model. Because we want to call the web search APIs, select Quick create a new Lambda function.

With this option, Amazon Bedrock creates a basic Lambda function for your agent that you can later modify on the Lambda console for the use case of calling the web search APIs. The agent will predict the function and function parameters needed to fulfill its goal and pass the parameters to the Lambda function.

Create Action group

  11. Configure the two functions of the action group: one for the SerpAPI Google search and one for the Tavily AI search.
  12. For each of the two functions, for Parameters, add search_query with a description.

This is a parameter of type String and is required by each of the functions.

  13. Choose Create to complete the creation of the action group.

Action group functions

We use the following parameter descriptions:

“The search query for the Google web search.”
“The search query for the Tavily web search.”

We encourage you to try adding a target website as an extra parameter to the action group functions. Take a look at the Lambda function code to infer the required settings.

You will be redirected to the agent builder console.

  14. Choose Save to save your agent configuration.

Configure and deploy a Lambda function

Complete the following steps to update the action group Lambda function:

  1. On the Lambda console, locate the new Lambda function with the name action-group-web-search-.
  2. Edit the provided starting code and implement the web search use case:
import http.client
import json
… 
def lambda_handler(event, _):
    action_group = event["actionGroup"]
    function = event["function"]
    parameters = event.get("parameters", [])
    search_query, target_website = extract_search_params(action_group, function, parameters)
    search_results: str = ""
    if function == "tavily-ai-search":
        search_results = tavily_ai_search(search_query, target_website)
    elif function == "google-search":
        search_results = google_search(search_query, target_website)
    # Prepare the response
    function_response_body = {"TEXT": {"body": f"Here are the top search results for the query '{search_query}': {search_results} "}}
    action_response = {
        "actionGroup": action_group,
        "function": function,
        "functionResponse": {"responseBody": function_response_body},
    }
    response = {"response": action_response, "messageVersion": event["messageVersion"]}
    return response

The code is truncated for brevity. The full code is available on GitHub.
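The helper functions referenced in the handler (extract_search_params, tavily_ai_search, and google_search) are part of the repository code. As a rough illustration only, and not the repository implementation, the parameter extraction and the Secrets Manager lookup could look like the following sketch:

import boto3

secrets_client = boto3.client("secretsmanager")

def get_api_key(secret_name: str) -> str:
    # Read an API key stored as a plain string secret in Secrets Manager
    response = secrets_client.get_secret_value(SecretId=secret_name)
    return response["SecretString"]

def extract_search_params(action_group, function, parameters):
    # The agent passes parameters as a list of {"name", "type", "value"} dictionaries
    search_query = None
    target_website = None
    for param in parameters:
        if param["name"] == "search_query":
            search_query = param["value"]
        elif param["name"] == "target_website":
            target_website = param["value"]
    return search_query, target_website

For example, get_api_key("TAVILY_API_KEY") returns the secret string created earlier, which the search functions can then include in their HTTP requests to the respective providers.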

  3. Choose Deploy.

As part of the Quick create a new Lambda function option selected earlier, the agent builder configured the function with a resource-based policy that allows the Amazon Bedrock service principal to invoke it, so you don’t need to update the IAM role used by the agent. However, the function needs permission to access the API keys saved in Secrets Manager.

  4. On the function details page, choose the Configuration tab, then choose Permissions.
  5. Choose the link for Role name to open the role on the IAM console.

Execution role

  6. Open the JSON view of the IAM policy under Policy name and choose Edit to edit the policy.

Permissions policies

  7. Add the following statement, which gives the Lambda function the required access to read the API keys from Secrets Manager. Replace <region_name> with your Region code and <account_id> with your AWS account ID.
{
  "Action": "secretsmanager:GetSecretValue",
  "Resource": [
    "arn:aws:secretsmanager:<region_name>:<account_id>:secret:SERPER_API_KEY*",
    "arn:aws:secretsmanager:<region_name>:<account_id>:secret:TAVILY_API_KEY*"
  ],
  "Effect": "Allow",
  "Sid": "GetSecretsManagerSecret"
}

Test the agent

You’re now ready to test the agent.

  1. On the Amazon Bedrock console, on the websearch-agent details page, choose Test.
  2. Choose Prepare to prepare the agent and test it with the latest changes.
  3. As test input, you can ask a question such as “What are the latest news from AWS?”

Test the agent

  4. To see the details of each step of the agent orchestration, including the reasoning steps, choose Show trace (already opened in the preceding screenshot).

This helps you understand the agent decisions and debug the agent configuration if the result isn’t as expected. We encourage you to investigate how the instructions for the agent and the tool instructions are handed to the agent by inspecting the traces of the agent.

In the next section, we walk through deploying the web search agent with the AWS CDK.

Build a web search Amazon Bedrock agent with the AWS CDK

Both AWS CloudFormation and AWS CDK support have been released for Amazon Bedrock Agents, so you can develop and deploy the preceding agent completely in code.

The AWS CDK example in this post uses Python. The following are the required steps to deploy this solution:

  1. Install the AWS CDK version 2.174.3 or later and set up your AWS CDK Python environment with Python 3.11 or later.
  2. Clone the GitHub repository and install the dependencies.
  3. Run AWS CDK bootstrapping on your AWS account.

The structure of the sample AWS CDK application repository is:

  • /app.py file – Contains the top-level definition of the AWS CDK app
  • /cdk folder – Contains the stack definition for the web search agent stack
  • /lambda folder – Contains the Lambda function runtime code that handles the calls to the Serper and Tavily AI APIs
  • /test folder – Contains a Python script to test the deployed agent

To create an Amazon Bedrock agent, the key resources required are:

  • An action group that defines the functions available to the agent
  • A Lambda function that implements these functions
  • The agent itself, which orchestrates the interactions between the FMs, functions, and user conversations

AWS CDK code to define an action group

The following Python code defines an action group as a Level 1 (L1) construct. L1 constructs, also known as AWS CloudFormation resources, are the lowest-level constructs available in the AWS CDK and offer no abstraction. Currently, the available Amazon Bedrock AWS CDK constructs are L1. With the action_group_executor parameter of AgentActionGroupProperty, you define the Lambda function containing the business logic that is carried out when the action is invoked.

action_group = bedrock.CfnAgent.AgentActionGroupProperty(
    action_group_name=f"{ACTION_GROUP_NAME}",
    description="Action that will trigger the lambda",
    action_group_executor=bedrock.CfnAgent.ActionGroupExecutorProperty(lambda_=lambda_function.function_arn),
    function_schema=bedrock.CfnAgent.FunctionSchemaProperty(
        functions=[
            bedrock.CfnAgent.FunctionProperty(
                name="tavily-ai-search",
                description="""
                    To retrieve information via the internet
                    or for topics that the LLM does not know about and
                    intense research is needed.
                """,
                parameters={
                    "search_query": bedrock.CfnAgent.ParameterDetailProperty(
                        type="string",
                        description="The search query for the Tavily web search.",
                        required=True,
                    )
                },
            ),
            bedrock.CfnAgent.FunctionProperty(
                name="google-search",
                description="For targeted news, like 'what are the latest news in Austria' or similar.",
                parameters={
                    "search_query": bedrock.CfnAgent.ParameterDetailProperty(
                        type="string",
                        description="The search query for the Google web search.",
                        required=True,
                    )
                },
            ),
        ]
    ),
)

After the Amazon Bedrock agent determines the API operation that it needs to invoke in an action group, it sends information alongside relevant metadata as an input event to the Lambda function.
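To make this concrete, the following is a simplified illustration of the kind of input event the Lambda function receives when the agent selects the google-search function. The values are placeholders, and real events carry additional fields such as agent metadata and session attributes.

# Simplified, illustrative input event for the google-search function
event = {
    "messageVersion": "1.0",
    "actionGroup": "action-group-web-search",
    "function": "google-search",
    "parameters": [
        {
            "name": "search_query",
            "type": "string",
            "value": "latest AWS news",
        }
    ],
}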

The following code shows the Lambda handler function that extracts the relevant metadata and populated fields from the request body parameters to determine which function (Serper or Tavily AI) to call. The extracted parameter is search_query, as defined in the preceding action group function. The complete Lambda Python code is available in the GitHub repository.

def lambda_handler(event, _):  # type: ignore
    action_group = event["actionGroup"]
    function = event["function"]
    parameters = event.get("parameters", [])
    search_query, target_website = extract_search_params(action_group, function, parameters)
    search_results: str = ""
    if function == "tavily-ai-search":
        search_results = tavily_ai_search(search_query, target_website)
    elif function == "google-search":
        search_results = google_search(search_query, target_website)

Lastly, with the CfnAgent AWS CDK construct, specify an agent as a resource. The auto_prepare=True parameter creates a DRAFT version of the agent that can be used for testing.

  agent_instruction = """
      You are an agent that can handle various tasks as described below:
      1/ Helping users do research and finding up to date information. For up to date information always
         use web search. Web search has two flavours:
         1a/ Google Search - this is great for looking up up to date information and current events
         1b/ Tavily AI Search - this is used to do deep research on topics your user is interested in. Not good on being used on news as it does not order search results by date.
      2/ Retrieving knowledge from the vast knowledge bases that you are connected to.
  """

  agent = bedrock.CfnAgent(
      self,
      "WebSearchAgent",
      agent_name="websearch_agent",
      foundation_model="anthropic.claude-3-sonnet-20240229-v1:0",
      action_groups=[action_group],
      auto_prepare=True,
      instruction=agent_instruction,
      agent_resource_role_arn=agent_role.role_arn,
   )
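The agent_role referenced in the CfnAgent construct is defined elsewhere in the stack. As a minimal sketch, assuming the agent only needs to invoke the chosen foundation model, such a role could be declared as follows inside the stack; the policies in the repository may be broader.

from aws_cdk import aws_iam as iam

agent_role = iam.Role(
    self,
    "WebSearchAgentRole",
    # Amazon Bedrock assumes this role when it runs the agent
    assumed_by=iam.ServicePrincipal("bedrock.amazonaws.com"),
)
agent_role.add_to_policy(
    iam.PolicyStatement(
        actions=["bedrock:InvokeModel"],
        resources=[
            f"arn:aws:bedrock:{self.region}::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
        ],
    )
)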

Deploy the AWS CDK application

Complete the following steps to deploy the agent using the AWS CDK:

  1. Clone the example AWS CDK code:
git clone https://github.com/aws-samples/websearch_agent
  2. Create a Python virtual environment, activate it, and install Python dependencies (make sure that you’re using Python 3.11 or later):
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
  3. To deploy the agent AWS CDK example, run the cdk deploy command:
cdk deploy

When the AWS CDK deployment is finished, it will output values for agent_id and agent_alias_id:

Outputs:
WebSearchAgentStack.agentaliasid = <agent_alias_id>
WebSearchAgentStack.agentid = <agent_id>
WebSearchAgentStack.agentversion = DRAFT

For example:

WebSearchAgentStack.agentaliasid = XP3JHPEDMK
WebSearchAgentStack.agentid = WFRPT9IMBO
WebSearchAgentStack.agentversion = DRAFT

Make a note of the outputs; you need them to test the agent in the next step.

Test the agent

To test the deployed agent, a Python script is available in the test/ folder. You must be authenticated with an AWS account and have the AWS_REGION environment variable set. For details, see Configure the AWS CLI.

To run the script, you need the output values from the deployment and a question to pass in using the --prompt parameter:

python invoke-agent.py --agent_id <agent_id> --agent_alias_id <agent_alias_id> --prompt "What are the latest AWS news?"

For example, with the outputs we received from the preceding cdk deploy command, you would run the following:

python invoke-agent.py --agent_id WFRPT9IMBO --agent_alias_id XP3JHPEDMK --prompt "What are the latest AWS news?"

You would receive the following response (output is truncated for brevity):

Here are some of the latest major AWS news and announcements:
At the recent AWS Summit in New York, AWS announced several new services and capabilities across areas like generative AI, machine learning, databases, and more.
Amazon Q, AWS's generative AI assistant, has been integrated with Smartsheet to provide AI-powered assistance to employees. Amazon Q Developer has also reached general availability with new features for developers.
AWS plans to launch a new Region in Mexico called the AWS Mexico (Central) Region, which will be the second AWS Region in Mexico ....
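If you prefer to call the agent directly from your own code, the following minimal sketch shows roughly what the test script does with the boto3 bedrock-agent-runtime client; the actual invoke-agent.py in the repository may differ in details such as argument parsing and trace handling.

import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")

def ask_agent(agent_id: str, agent_alias_id: str, prompt: str) -> str:
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=str(uuid.uuid4()),  # start a fresh session per invocation
        inputText=prompt,
    )
    # The completion is returned as an event stream of chunks
    answer = ""
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            answer += chunk["bytes"].decode("utf-8")
    return answer

print(ask_agent("<agent_id>", "<agent_alias_id>", "What are the latest AWS news?"))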

Clean up

To delete the resources deployed with the agent AWS CDK example, run the following command:

cdk destroy

Use the following commands to delete the API keys created in Secrets Manager:

aws secretsmanager delete-secret --secret-id SERPER_API_KEY
aws secretsmanager delete-secret --secret-id TAVILY_API_KEY

Key considerations

Let’s dive into some key considerations when integrating web search into your AI systems.

API usage and cost management

When working with external APIs, it’s crucial to make sure that your rate limits and quotas don’t become bottlenecks for your workload. Regularly check and identify limiting factors in your system and validate that it can handle the load as it scales. This might involve implementing a robust monitoring system to track API usage, setting up alerts for when you’re approaching limits, and developing strategies to gracefully handle rate-limiting scenarios.

Additionally, carefully consider the cost implications of external APIs. The amount of content returned by these services directly translates into token usage in your language models, which can significantly impact your overall costs. Analyze the trade-offs between comprehensive search results and the associated token consumption to optimize your system’s efficiency and cost-effectiveness. Consider implementing caching mechanisms for frequently requested information to reduce API calls and associated costs.
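As a simple illustration of such a caching mechanism, an in-memory cache keyed on the search query can live at module level in the Lambda function and is reused while the execution environment stays warm. A managed cache such as Amazon ElastiCache is more robust; the time-to-live value below is an assumption to tune for your use case.

import time

# Module-level cache reused across warm Lambda invocations
_search_cache = {}
CACHE_TTL_SECONDS = 300  # assumed acceptable staleness of five minutes

def cached_search(query, search_fn):
    now = time.time()
    entry = _search_cache.get(query)
    if entry and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]  # cache hit: skip the external API call
    result = search_fn(query)
    _search_cache[query] = (now, result)
    return result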

Privacy and security considerations

It’s essential to thoroughly review the pricing and privacy agreements of your chosen web search provider. The agentic systems you’re building can potentially leak sensitive information to these providers through the search queries sent. To mitigate this risk, consider implementing data sanitization techniques to remove or mask sensitive information before it reaches the search provider. This becomes especially crucial when building or enhancing secure chatbots and internally facing systems—educating your users about these privacy considerations is therefore of utmost importance.

To add an extra layer of security, you can implement guardrails, such as those provided by Amazon Bedrock Guardrails, in the Lambda functions that call the web search. This additional safeguard can help protect against inadvertent information leakage to web search providers. These guardrails could include pattern matching to detect potential personally identifiable information (PII), allow and deny lists for certain types of queries, or AI-powered content classifiers to flag potentially sensitive information.
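A lightweight version of such a safeguard is a pattern-based check that runs in the Lambda function before the query leaves your environment. The patterns below are illustrative only and far from a complete PII detector; Amazon Bedrock Guardrails provides a managed alternative.

import re

# Illustrative patterns only; a production system needs a much broader set
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email addresses
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # credit card-like digit runs
]

def sanitize_query(query: str) -> str:
    # Mask anything that looks like PII before calling the external search API
    for pattern in PII_PATTERNS:
        query = pattern.sub("[REDACTED]", query)
    return query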

Localization and contextual search

When designing your web search agent, it’s crucial to consider that end-users are accustomed to the search experience provided by standard web browsers, especially on mobile devices. These browsers often supply additional context as part of a web search, significantly enhancing the relevance of results. Key aspects of localization and contextual search include language considerations, geolocation, search history and personalization, and time and date context. For language considerations, you can implement language detection to automatically identify the user’s preferred language or provide it through the agent’s session context.

Refer to Control agent session context for details on how to provide session context in Amazon Bedrock Agents.

It’s important to support multilingual queries and results, using a model that supports your specific language needs. Geolocation is another critical factor; utilizing the user’s approximate location (with permission) can provide geographically relevant results. Search history and personalization can greatly enhance the user experience. Consider implementing a system (with user consent) to remember recent searches and use this context for result ranking. You can customize an Amazon Bedrock agent with the session state feature. Adding a user’s location attributes to the session state is a potential implementation option.
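For example, location and time zone hints can be passed through the sessionState parameter when the agent is invoked. The attribute names below are arbitrary examples; to have an effect, they need to be referenced in your agent instructions or prompt templates.

import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="<agent_id>",
    agentAliasId="<agent_alias_id>",
    sessionId=str(uuid.uuid4()),
    inputText="What is the current weather in Zurich?",
    sessionState={
        # Hypothetical attribute names surfaced to the model during orchestration
        "promptSessionAttributes": {
            "userLocation": "Zurich, Switzerland",
            "userTimezone": "Europe/Zurich",
        }
    },
)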

Additionally, allow users to set persistent preferences for result types, such as preferring videos over text articles. Time and date context is also vital; use the user’s local time zone for time-sensitive queries like “latest news on quarterly numbers of company XYZ, now,” and consider seasonal context for queries that might have different meanings depending on the time of year.

For instance, without providing such extra information, a query like “What is the current weather in Zurich?” could yield results for any Zurich globally, be it in Switzerland or various locations in the US. By incorporating these contextual elements, your search agent can distinguish that a user in Europe is likely asking about Zurich, Switzerland, whereas a user in Illinois might be interested in the weather at Lake Zurich.

To implement these features, consider creating a system that safely collects and utilizes relevant user context. However, always prioritize user privacy and provide clear opt-in mechanisms for data collection. Clearly communicate what data is being used and how it enhances the search experience. Offer users granular control over their data and the ability to opt out of personalized features. By carefully balancing these localization and contextual search elements, you can create a more intuitive and effective web search agent that provides highly relevant results while respecting user privacy.

Performance optimization and testing

Performance optimization and testing are critical aspects of building a robust web search agent. Implement comprehensive latency testing to measure response times for various query types and content lengths across different geographical regions. Conduct load testing to simulate concurrent users and identify system limits if applicable to your application. Optimize your Lambda functions for cold starts and runtime, and consider using Amazon CloudFront to reduce latency for global users. Implement error handling and resilience measures, including fallback mechanisms and retry logic. Set up Amazon CloudWatch alarms for key metrics such as API latency and error rates to enable proactive monitoring and quick response to performance issues.
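For the retry logic mentioned above, a small exponential-backoff wrapper around the external search call is often enough inside the Lambda function. The attempt count and delays below are assumptions to tune against your own latency budget and the providers' rate limits.

import time

def call_with_retries(func, *args, max_attempts=3, base_delay=0.5):
    # Retry transient failures with exponential backoff before giving up
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args)
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * (2 ** (attempt - 1)))

# Example: search_results = call_with_retries(tavily_ai_search, search_query, target_website)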

To test the solution end to end, create a dataset of questions and correct answers to check whether changes to your system improve or degrade the information retrieval capabilities of your app.

Migration strategies

For organizations considering a migration from open source frameworks like LangChain to Amazon Bedrock Agents, it’s important to approach the transition strategically. Begin by mapping your current ReAct agent’s logic to the Amazon Bedrock agents’ action groups and Lambda functions. Identify any gaps in functionality and plan for alternative solutions or custom development where necessary. Adapt your existing API calls to work with the Amazon Bedrock API and update authentication methods to use IAM roles and policies.

Develop comprehensive test suites to make sure functionalities are correctly replicated in the new environment. One significant advantage of Amazon Bedrock agents is the ability to implement a gradual rollout. By using the agent alias ID, you can quickly direct traffic between different versions of your agent, allowing for a smooth and controlled migration process. This approach enables you to test and validate your new implementation with a subset of users or queries before fully transitioning your entire system.

By carefully balancing these considerations—from API usage and costs to privacy concerns, localization, performance optimization, and migration strategies—you can create a more intelligent, efficient, and user-friendly search experience that respects individual preferences and data protection regulations. As you build and refine your web search agent with Amazon Bedrock, keep these factors in mind to provide a robust, scalable, and responsible AI system.

Expanding the solution

With this post, you’ve taken the first step towards revolutionizing your applications with Amazon Bedrock Agents and the power of agentic workflows with LLMs. You’ve not only learned how to integrate dynamic web content, but also gained insights into the intricate relationship between AI agents and external information sources.

Transitioning your existing systems to Amazon Bedrock agents is a seamless process, and with the AWS CDK, you can manage your agentic AI infrastructure as code, providing scalability, reliability, and maintainability. This approach not only streamlines your development process, but also paves the way for more sophisticated AI-driven applications that can adapt and grow with your business needs.

Expand your horizons and unlock even more capabilities:

  • Connect to an Amazon Bedrock knowledge base – Augment your agents’ knowledge by integrating them with a centralized knowledge repository, enabling your AI to draw upon a vast, curated pool of information tailored to your specific domain.
  • Embrace streaming – Use the power of streaming responses to provide an enhanced user experience and foster a more natural and interactive conversation flow, mimicking the real-time nature of human dialogue and keeping users engaged throughout the interaction.
  • Expose ReAct prompting and tool use – Parse the streaming output on your frontend to visualize the agent’s reasoning process and tool usage, providing invaluable transparency and interpretability for your users, building trust, and allowing users to understand and verify the AI’s decision-making process.
  • Utilize memory for Amazon Bedrock Agents – Amazon Bedrock agents can retain a summary of their conversations with each user and are able to provide a smooth, adaptive experience if enabled. This allows you to give extra context for tasks like web search and topics of interest, creating a more personalized and contextually aware interaction over time.
  • Give extra context – As outlined earlier, context matters. Try to implement additional user context through the session attributes that you can provide through the session state. Refer to Control agent session context for the technical implementations, and consider how this context can be used responsibly to enhance the relevance and accuracy of your agent’s responses.
  • Add agentic web research – Agents allow you to build very sophisticated workflows. Our system is not limited to a simple web search. The Lambda function can also serve as an environment to implement an agentic web research with multi-agent collaboration, enabling more comprehensive and nuanced information gathering and analysis.

What other tools would you use to complement your agent? Refer to the aws-samples GitHub repo for Amazon Bedrock Agents to see what others have built and consider how these tools might be integrated into your own unique AI solutions.

Conclusion

The future of generative AI is here, and Amazon Bedrock Agents is your gateway to unlocking its full potential. Embrace the power of agentic LLMs and experience the transformative impact they can have on your applications and user experiences. As you embark on this journey, remember that the true power of AI lies not just in its capabilities, but in how we thoughtfully and responsibly integrate it into our systems to solve real-world problems and enhance human experiences.

If you would like us to follow up with a second post tackling any points discussed here, feel free to leave a comment. Your engagement helps shape the direction of our content and makes sure we’re addressing the topics that matter most to you and the broader AI community.

In this post, you have seen the steps needed to integrate dynamic web content and harness the full potential of generative AI, but don’t stop here: explore the expansion ideas above and keep building.


About the Authors

Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Connect with Philipp on LinkedIn.

Markus Rollwagen is a Senior Solutions Architect at AWS, based in Switzerland. He enjoys deep dive technical discussions, while keeping an eye on the big picture and the customer goals. With a software engineering background, he embraces infrastructure as code and is passionate about all things security. Connect with Markus on LinkedIn.

Read More

Build a generative AI assistant to enhance employee experience using Amazon Q Business

Build a generative AI assistant to enhance employee experience using Amazon Q Business

In today’s fast-paced business environment, organizations are constantly seeking innovative ways to enhance employee experience and productivity. There are many challenges that can impact employee productivity, such as cumbersome search experiences or finding specific information across an organization’s vast knowledge bases. Additionally, with the rise of remote and hybrid work models, traditional support systems such as IT Helpdesks and HR might struggle to keep up with the increased demand for assistance. Productivity loss because of these challenges can lead to lengthy onboarding times for new employees, extended task completion times, and higher call volumes for undifferentiated IT and HR support, to name a few.

Amazon Q Business is a fully managed, generative artificial intelligence (AI) powered assistant that can address the challenges mentioned above by providing 24/7 support tailored to individual needs. It can handle a wide range of tasks such as answering questions, providing summaries, generating content, and completing tasks based on data in your organization. Additionally, Amazon Q Business offers enterprise-grade data security and privacy, with built-in guardrails that are configurable by an admin. Customers like Deriv successfully reduced new employee onboarding time by up to 45% and overall recruiting efforts by as much as 50% by making generative AI available to all of their employees in a safe way.

In this blog post, we will talk about Amazon Q Business use cases, walk through an example application, and discuss approaches for measuring productivity gains.

Use cases overview

Some key use cases for Amazon Q Business for organizations include:

  • Providing grounded responses to employees: An organization can deploy Amazon Q Business on their internal data, documents, products, and services. This allows Amazon Q Business to understand the business context and provide tailored assistance to employees on common questions, tasks, and issues.
  • Improving employee experience: By deploying Amazon Q Business across various environments like websites, apps, and chatbots, organizations can provide unified, engaging and personalized experiences. Employees will have a consistent experience wherever they choose to interact with the generative AI assistant.
  • Knowledge management: Amazon Q Business helps organizations use their institutional knowledge more effectively. It can be integrated with internal knowledge bases, manuals, best practices, and more, to provide a centralized source of information to employees.
  • Project management and issue tracking: With Amazon Q Business plugins, users can use natural language to open tickets without leaving the chat interface. Previously resolved tickets can also be used to help reduce overall ticket volumes and get employees the information they need faster to resolve an issue.

Amazon Q Business features

The Amazon Q Business-powered chatbot aims to provide comprehensive support to users with a multifaceted approach. It offers multiple data source connectors that help you connect to your data sources and create your generative AI solution with minimal configuration. Amazon Q Business supports over 40 connectors at the time of writing. It also supports plugins that enable users to take action from within the conversation. There are four native plugins offered, and a custom plugin option to integrate with any third-party application.

Using the Business User Store feature, users see chat responses generated only from the documents that they have access to within an Amazon Q Business application. You can also customize your application environment to your organizational needs by using application environment guardrails or chat controls such as global controls and topic-level controls that you can configure to manage the user chat experience.

Features like document enrichment and relevance tuning together play a key role in further customizing and enhancing your applications. The document enrichment feature helps you control both what documents and document attributes are ingested into your index and how they’re ingested. Using document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. You can then assign weights to document attributes after mapping them to index fields using the relevance tuning feature. You can use these assigned weights to fine-tune the underlying ranking of Retrieval Augmented Generation (RAG)-retrieved passages within your application environment to optimize the relevance of chat responses.

Amazon Q Business offers robust security features to protect customer data and promote responsible use of the AI assistant. It uses pre-trained machine learning models and does not use customer data to train or improve the models. The service supports encryption at rest and in transit, and administrators can configure various security controls such as restricting responses to enterprise content only, specifying blocked words or phrases, and defining special topics with customized guardrails. Additionally, Amazon Q Business uses the security capabilities of Amazon Bedrock, the underlying AWS service, to enforce safety, security, and responsible use of AI.

Sample application architecture

The following figure shows a sample application architecture.

Sample Architecture Diagram

Application architecture walkthrough

Before you begin to create an Amazon Q Business application environment, make sure that you complete the setting up tasks and review the Before you begin section. This includes tasks like setting up required AWS Identity and Access Management (IAM) roles and enabling and pre-configuring an AWS IAM Identity Center instance.

As the next step towards creating a generative AI assistant, you can create the Amazon Q Business web experience. The web experience can be created using either the AWS Management Console or the Amazon Q Business APIs.

After creating your Amazon Q Business application environment, you create and select the retriever and provision the index that will power your generative AI web experience. The retriever pulls data from the index in real time during a conversation. After you select a retriever for your Amazon Q Business application environment, you connect data sources to it.

This sample application connects to repositories like Amazon Simple Storage Service (Amazon S3) and SharePoint, and to public facing websites or internal company websites using Amazon Q Web Crawler. The application also integrates with service and project management tools such as ServiceNow and Jira and enterprise communication tools such as Slack and Microsoft Teams. The application uses built-in plugins for Jira and ServiceNow to enable users to perform specific tasks related to supported third-party services from within their web experience chat, such as creating a Jira ticket or opening an incident in ServiceNow.

After the data sources are configured, data is integrated and synchronized into container indexes that are maintained by the Amazon Q Business service. Authorized users interact with the application environment through the web experience URL after successfully authenticating. You could also use Amazon Q Business APIs to build a custom UI to implement special features such as handling feedback, using company brand colors and templates, and using a custom sign-in. It also enables conversing with Amazon Q through an interface personalized to your use case.
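To illustrate the custom UI path, the following sketch sends a message to an Amazon Q Business application with the ChatSync API using boto3. The application ID is a placeholder, and in practice the call must be made with credentials tied to an authenticated IAM Identity Center user so that document-level permissions apply.

import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="<application_id>",  # placeholder for your Amazon Q Business application
    userMessage="How do I reach IT support?",
)
# The assistant's answer; source attributions are also available in the response
print(response["systemMessage"])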

Application demo

Here are a few screenshots demonstrating an AI assistant application using Amazon Q Business. These screenshots illustrate a scenario where an employee interacts with the Amazon Q Business chatbot to get summaries, address common queries related to IT support, and open tickets or incidents using IT service management (ITSM) tools such as ServiceNow.

  1. Employee A interacts with the application to get help when wireless access was down and receives suggested actions to take:
    Screenshot showing employee interacting with the application to get help when wireless access was down
  2. Employee B interacts with the application to report an incident of wireless access down and receives a form to fill out to create a ticket:
    Screenshot showing employee interacting with the form presented by the application to create an incident in ServiceNow
    Screenshot showing the created incident in the application
    An incident is created in ServiceNow based on Employee B’s interaction:
    Screenshot of the created incident in ServiceNow
  3. A new employee in the organization interacts with the application to ask several questions about company policies and receives reliable answers:
    Screenshot showing employee interacting with the application to ask several questions about company policies
  4. A new employee in the organization asks the application how to reach IT support and receives detailed IT support contact information:
    Screenshot showing employee interacting with the application on how to reach IT support

Approaches for measuring productivity gains

There are several approaches to measure productivity gains achieved by using a generative AI assistant. Here are some common metrics and methods:

Average search time reduction: Measure the time employees spend searching for information or solutions before and after implementing the AI assistant. A reduction in average search time indicates faster access to information, which can lead to shorter task completion times and improved efficiency.

    • Units: Percentage reduction in search time or absolute time saved (for example, hours or minutes)
    • Example: 40% reduction in average search time or 1 hour saved per employee per day

Task completion time: Measure the time taken to complete specific tasks or processes with and without the AI assistant. Shorter completion times suggest productivity gains.

    • Units: Percentage reduction in task completion time or absolute time saved (for example, hours or minutes)
    • Example: 30% reduction in task completion time or 2 hours saved per task

Recurring issues: Monitor the number of tickets raised for recurring issues and issues related to tasks or processes that the AI assistant can handle. A decrease in these tickets indicates improved productivity and reduced workload for employees.

    • Units: Percentage reduction in recurring issue frequency or absolute reduction in occurrences
    • Example: 40% reduction in the frequency of recurring issue X or 50 fewer occurrences per quarter

Overall ticket volume: Track the total number of tickets or issues raised related to tasks or processes that the AI assistant can handle.

    • Units: Percentage reduction in ticket volume or absolute number of tickets reduced
    • Example: 30% reduction in relevant ticket volume or 200 fewer tickets per month

Employee onboarding duration: Evaluate the time required for new employees to become fully productive with and without the AI assistant. Shorter onboarding times can indicate that the AI assistant is providing effective support, which translates to cost savings and faster time-to-productivity.

    • Units: Percentage reduction in onboarding time or absolute time saved (for example, days or weeks)
    • Example: 20% reduction in onboarding duration or 2 weeks saved per new employee

Employee productivity metrics: Track metrics such as output per employee or output quality before and after implementing the AI assistant. Improvements in these metrics can indicate productivity gains.

    • Units: Percentage improvement in output quality or reduction in rework or corrections
    • Example: 15% improvement in output quality or 30% reduction in rework required

Cost savings: Calculate the cost savings achieved through reduced labor hours, improved efficiency, and faster turnaround times enabled by the AI assistant.

    • Units: Monetary value (for example, dollars or euros) saved
    • Example: $100,000 in cost savings due to increased productivity

Knowledge base utilization: Measure the increase in utilization or effectiveness of knowledge bases or self-service resources because of the AI assistant’s ability to surface relevant information.

    • Units: Percentage increase in knowledge base utilization
    • Example: 20% increase in knowledge base utilization

Employee satisfaction surveys: Gather feedback from employees on their perceived productivity gains, time savings, and overall satisfaction with the AI assistant. Positive feedback can lead to increased retention, better performance, and a more positive work environment.

    • Units: Employee satisfaction score or percentage of employees reporting positive impact
    • Example: 80% of employees report increased productivity and satisfaction with the AI assistant

It’s important to establish baseline measurements before introducing the AI assistant and then consistently track the relevant metrics over time. Additionally, conducting controlled experiments or pilot programs can help isolate the impact of the AI assistant from other factors affecting productivity.

Conclusion

In this blog post, we explored how you can use Amazon Q Business to build generative AI assistants that enhance employee experience and boost productivity. By seamlessly integrating with internal data sources, knowledge bases, and productivity tools, Amazon Q Business equips your workforce with instant access to information, automated tasks, and personalized support. Using its robust capabilities, including multi-source connectors, document enrichment, relevance tuning, and enterprise-grade security, you can create tailored AI solutions that streamline workflows, optimize processes, and drive tangible gains in areas like task completion times, issue resolution, onboarding efficiency, and cost savings.

Unlock the transformative potential of Amazon Q Business and future-proof your organization—contact your AWS account team today.

Read more about Amazon Q


About the Authors

Puneeth Ranjan Komaragiri is a Principal Technical Account Manager at Amazon Web Services (AWS). He is particularly passionate about Monitoring and Observability, Cloud Financial Management, and Generative Artificial Intelligence (Gen-AI) domains. In his current role, Puneeth enjoys collaborating closely with customers, leveraging his expertise to help them design and architect their cloud workloads for optimal scale and resilience.

Krishna Pramod is a Senior Solutions Architect at AWS. He works as a trusted advisor for customers, helping customers innovate and build well-architected applications in AWS cloud. Outside of work, Krishna enjoys reading, music and traveling.

Tim McLaughlin is a Senior Product Manager for Amazon Q Business at Amazon Web Services (AWS). He is passionate about helping customers adopt generative AI services to meet evolving business challenges. Outside of work, Tim enjoys spending time with his family, hiking, and watching sports.

Read More