Amazon AWS – Page 54

Unleashing Stability AI’s most advanced text-to-image models for media, marketing and advertising: Revolutionizing creative workflows

November 7, 2024

by Isha Dua Amazon AWS

To stay competitive, media, advertising, and entertainment enterprises need to stay abreast of recent dramatic technological developments. Generative AI has emerged as a game-changer, offering unprecedented opportunities for creative professionals to push boundaries and unlock new realms of possibility. At the forefront of this revolution is Stability AI’s family of cutting-edge text-to-image AI models. These models promise to transform the way we approach visual content creation, empowering large media, advertising, and entertainment organizations to tackle real-world business use cases with efficiency and creativity.

This technical post explores how these organizations can use the power of Stability AI to streamline workflows, enhance creative processes, and unleash a new era of advertising campaigning and visual storytelling.

Overview

Amazon Bedrock recently launched three new models by Stability AI: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These advanced models greatly improve performance in multisubject prompts, image quality, and typography and can be used to rapidly generate high-quality visuals for a wide range of use cases across marketing, advertising, media, entertainment, retail, and more. One of the key improvements of these models compared to Stable Diffusion XL (SDXL) (one of Stability AI’s older models) is text quality in generated images, with fewer errors in spelling and typography thanks to its innovative Diffusion Transformer architecture.

By learning the intricate relationships between visual and textual data, these models can generate highly detailed and coherent images from simple text prompts. The improved architecture combines the strengths of various deep learning techniques, including transformer encoders for text understanding, convolutional neural networks (CNNs) for efficient image processing, and attention mechanisms for capturing long-range dependencies and fine-grained details. The new family of models available on Amazon Bedrock are mentioned in the table below:

Features	Stable Image Core	SD3 Large 1.0	Stable Image Ultra 1.0
Parameters	2.6 billion	8 billion	8 billion
Input	Text	Text or Image	Text
Typography	Versatility and readability across different sizes and applications	Tailored for large-scale display	Tailored for large-scale display
Visual Aesthetics	Good rendering, not as detail oriented	Highly realistic with finer attention to detail	Photorealistic image output
Best Fit	Fast and affordable rapid concepting and ideating	Content creation in media, entertainment, retail	High-quality content at speed for media, retail

To evaluate the capabilities of these models, we tested a variety of prompts ranging from simple object descriptions to complex scene compositions. The experiments revealed that, although SDXL excelled at rendering common objects and scenes accurately, these newer models from Stability AI demonstrated improved performance on more nuanced and imaginative prompts. The new models better understand and visually express abstract concepts, stylized artistic renditions, and creative blends of disparate elements.

Stable Image Core is a newer, more affordable and faster version of SDXL. It’s based on the same diffusion architecture as SDXL. In comparison, Stable Diffusion 3 Large and Stable Image Ultra are based on the new diffusion transformer architectures, making them much better at typography.

Expanded training data of the SD3 base model—which is used for both Stable Diffusion 3 Large and Stable Image Ultra—has endowed it with stronger multimodal reasoning and world knowledge compared to SDXL. Some key improvements we observed from the prompt experimentation are the following:

Prompt adherence – These models excel at following complex and detailed prompts, particularly in surreal scenes, making sure that the generated images closely match the specified instructions. Stable Diffusion 3 Large and Stable Image Ultra work the best with natural language.
Text Rendering: Unlike SDXL, which may struggle with incorporating text into images, these newer models effectively generate and integrate text, enhancing the overall coherence of the visuals.
Complex Scene Handling: The new models demonstrate a improved ability to create intricate and detailed scenes, showcasing a better grasp of surreal elements as it understands them in your prompts.
Photorealism: The images produced by these models are more lifelike, with improved handling of textures, lighting, and shadows, making them visually striking.
Visual Aesthetics: The overall visual appeal is enhanced, making them more engaging and attractive.
Multimodal Capabilities: The new models can process various input types beyond just text, allowing for more context-aware image generation.
Scalability: The new architecture of these models supports handling larger datasets and generating higher-resolution images effectively.
Advanced Architecture: The SD3 base model (used for Stable Diffusion 3 Large and Stable Image Ultra) utilizes a new diffusion transformer combined with flow matching, which enhances its performance in generating high-quality images.

The table below showcases the comparison in image generation between the models available on Amazon Bedrock.

Image Generation Comparison – Stability AI Models

Real-world use cases for media, advertising, and entertainment

In the world of media, marketing, and entertainment, concept art and storyboarding are essential for visualizing ideas and communicating creative visions. Stability AI’s models can revolutionize this process by generating high-quality concept art and storyboard frames based on textual descriptions, enabling rapid iteration and exploration of ideas.

Ideation and iteration

Advertising agencies and marketing teams can leverage these models to generate visually stunning and attention-grabbing assets for their campaigns. From product shots to lifestyle imagery, these models can produce a wide range of visuals tailored to specific brand identities and target audiences. In film and television, these models can be a powerful tool for set design and virtual production. By generating realistic environments and backdrops based on textual descriptions, production teams can quickly visualize and iterate on set designs, reducing the need for physical mockups and saving time and resources.

Character design

Character design is a crucial aspect of storytelling in media and entertainment. These models can assist artists and designers in generating unique and compelling character concepts, enabling them to explore a wide range of visual styles and aesthetics.

Social media marketing asset generation

Social media has become a vital marketing channel for media, advertising, and entertainment organizations. Stability AI’s latest models can be leveraged to generate engaging visual content, such as memes, graphics, and promotional materials, tailored to specific social media domains and target audiences.

Stability AI’s capabilities in advertising and marketing campaigns

To showcase the power of Stability AI’s text-to-image models in creating compelling advertising and marketing assets, we walk through a demonstration using a Jupyter notebook that combines large language models (LLMs) and Stable Diffusion 3 Large for end-to-end campaign creation. We demonstrate how to produce generated images for a brand called Young Generational Shoes (YGS), evaluate brand consistency and message effectiveness, use the LLM to analyze images and suggest improvements, and refine prompts based on feedback to generate new iterations. By combining LLM-generated campaign ideas with this model’s advanced image generation capabilities, agencies can rapidly produce high-quality, tailored visual assets that resonate with their target audience. The notebook provides a practical, hands-on example of how these cutting-edge AI tools can be integrated into real-world advertising workflows, potentially saving time and resources while enhancing creative output.

The recorded version of the demo is available here:

Prerequisites

This notebook is designed to run on AWS, leveraging Amazon Bedrock for both the LLM and Stability AI model access. Make sure you have the following set up before moving forward:

An AWS account
An Amazon SageMaker domain
A SageMaker domain user profile

To access Stability AI’s Stable Image Ultra text to image model, request access through the Amazon Bedrock console. For instructions, see Manage access to Amazon Bedrock foundation models. For instructions on how to deploy this sample, refer to the GitHub repo. Use the us-west-2 Region to run this demo.

Setting up the demo

We will be using the Stable Image Ultra for the purposes of this demo. You can use one of the other available models from Stability AI on Bedrock to run through your version of the notebook.

# Amazon Bedrock Model ID used throughout this notebook
# Model IDs: https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns
MODEL_ID = "stability.stable-image-ultra-v1:0"

This following function call essentially acts as a wrapper around the Amazon Bedrock API, simplifying the process of generating images using Stability AI’s models. It handles the API call, response parsing, and image decoding, providing a straightforward way to generate images from text prompts using these advanced AI models.

def generate_image_from_text(model_id, body):
    """
    Generate an image using SD3 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info("Generating image with SD3 model %s", model_id)

    bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
    
    response = bedrock.invoke_model(modelId=model_id,body=body)
    response_body= json.loads(response["body"].read())
    image_data = base64.b64decode(response_body.get("images")[0]

    logger.info("Successfully generated image with the SD3 model %s", model_id)
    return image_data

Generating creative ad campaigns with multiple models

The demo begins by using an LLM to generate creative ad campaign ideas and follows these steps

Define your product or service and target audience
Prompt the LLM to create multiple ad campaign concepts
The LLM generates diverse ideas, considering factors such as brand identity, audience demographics, and current trends

This process allows for a wide range of creative concepts tailored to your specific marketing needs. The following is the sample prompt we used in the notebook:

You are a seasoned veteran in the advertising industry with a wealth of experience
in creating captivating and impactful campaigns. Your task is to generate five
different creative advertising concepts for our new line of shoes under the brand
"YGS". Our product range includes running shoes, soccer shoes, and training shoes.

Our target audience is the young generation, a demographic known for their energy,
trendiness, and desire to express their individuality.

Each advertising concept should seamlessly incorporate the following elements: 

1. The specific type of shoe (running, soccer, tennis, hiking or training) and 
its intended usage. 
2. A vivid description of the colors and unique features that make our
shoes stand out. 
3. A compelling scenario that vividly illustrates when and where these shoes would
be worn, capturing the essence of the active lifestyle our target audience embraces. 

Your concepts should be fresh, engaging, and resonate with the youthful spirit
of our target market. Creativity, originality, and a deep understanding of
our audience's aspirations and passions should shine through in your advertising
ideas. Remember, the goal is to craft compelling narratives that not only showcase
our product's features but also tap into the emotions and desires of the
young generation, inspiring them to embrace our brand as an extension of
their vibrant lifestyles. 

The output format should follow below Json format: 
[ { "concept": "xxx", "Description": "xxx", "Scenario": "xxx" }, 
{ "concept": "xxx", "Description": "xxx", "Scenario": "xxx" } ... ]"

Prompt engineering for visual assets

Once you have campaign concepts, the next step is to craft effective prompts for SD3 Ultra 1.0. This involves using Anthropic’s Claude Sonnet 3.5 on Amazon Bedrock to transform campaign ideas into detailed image prompts, refining these prompts to include specific visual elements, styles, and compositions, and iterating on them to make sure that they capture the essence of the campaign. This process helps create precise instructions to generate visuals that align closely with the campaign’s objectives.

 """You are an expert to use stable diffusion model to generate shoes ad posters.
 Please user below content to generate the positive and negative prompt for stable
 diffusion model:
 - "Concept": {Concept}
 - "Description": {Description}
 - "Scenario": {Scenario}
 
 Output format shoud be Json format as below:
  [
     {
        "positive_prompt": "xxx"
     }
  ]
 Please add this to the positive prompt: text 'YGS' on the Shoes as a logo."""

Generating ad posters with Stable Image Ultra

With well-crafted prompts, Stable Image Ultra can now create stunning visual assets. The process involves entering the refined prompts into the model through the Amazon Bedrock API, adjusting parameters such as image size, number of inference steps, and guidance scale for optimal results and generating multiple variations to provide a range of options for the campaign. This approach allows for the creation of diverse, high-quality visuals that can be fine-tuned to help meet specific campaign requirements. Here are some posters generated by Stable Image Ultra:

Note:

The images generated could be different because your results depend on the parameters and their values, including the following:

The cfg_scale, which determines how strictly the diffusion process adheres to the prompt text
The height and width of the image in pixels
The number of diffusion steps to run
The random noise seed (which, if provided, makes the resulting generated image deterministic)
The sampler used for the diffusion process to denoise the generation
The array of text prompts used for generation
The weight assigned to each prompt

These parameters allow for fine-tuning and customization of the image generation process, resulting in diverse outputs based on their specific configuration.

Clean up

To avoid charges, you must stop the active SageMaker notebook instances. For instructions, refer to Clean up Amazon Sagemaker notebook instance resources.

Conclusion

Stability AI’s new family of models represents a significant milestone in the field of generative AI, offering media, advertising, and entertainment organizations a powerful tool to streamline creative workflows and unlock new realms of visual expression. By using Stability AI’s capabilities, organizations can tackle real-world business use cases, from concept art and storyboarding to advertising campaigns and content creation. However, it’s essential to proceed with a responsible and ethical mindset, addressing potential biases, respecting intellectual property rights, and mitigating the risks of misuse. By embracing the capabilities of these models while navigating their limitations and ethical considerations, creative professionals can push the boundaries of what’s possible in the world of visual content creation. To get started, check out Stability AI models in Amazon Bedrock.

As the field of generative AI continues to evolve rapidly, we can expect even more exciting developments and innovations from Stability AI and other industry leaders. Stay tuned for further advancements that will shape the creative landscape and empower artists, designers, and content creators in unprecedented ways.

About the authors

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.

Build a multi-tenant generative AI environment for your enterprise on AWS

November 7, 2024

by Anastasia Tzeveleka Amazon AWS

While organizations continue to discover the powerful applications of generative AI, adoption is often slowed down by team silos and bespoke workflows. To move faster, enterprises need robust operating models and a holistic approach that simplifies the generative AI lifecycle. In the first part of the series, we showed how AI administrators can build a generative AI software as a service (SaaS) gateway to provide access to foundation models (FMs) on Amazon Bedrock to different lines of business (LOBs). In this second part, we expand the solution and show to further accelerate innovation by centralizing common Generative AI components. We also dive deeper into access patterns, governance, responsible AI, observability, and common solution designs like Retrieval Augmented Generation.

Our solution uses Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. It also uses a number of other AWS services such as Amazon API Gateway, AWS Lambda, and Amazon SageMaker.

Architecting a multi-tenant generative AI environment on AWS

A multi-tenant, generative AI solution for your enterprise needs to address the unique requirements of generative AI workloads and responsible AI governance while maintaining adherence to corporate policies, tenant and data isolation, access management, and cost control. As a result, building such a solution is often a significant undertaking for IT teams.

In this post, we discuss the key design considerations and present a reference architecture that:

Accelerates generative AI adoption through quick experimentation, unified model access, and reusability of common generative AI components
Offers tenants the flexibility to choose the optimal design and technical implementation for their use case
Implements centralized governance, guardrails, and controls
Allows for tracking and auditing model usage and cost per tenant, line of business (LOB), or FM provider

Solution overview

The proposed solution consists of two parts:

The generative AI gateway and
The tenant

The following diagram illustrates an overview of the solution.

Generative AI gateway

Shared components lie in this part. Shared components refer to the functionality and features shared by all tenants. Each component in the previous diagram can be implemented as a microservice and is multi-tenant in nature, meaning it stores details related to each tenant, uniquely represented by a tenant_id. Some components are categorized in groups based on the type of functionality they exhibit.

The standalone components are:

The HTTPS endpoint is the entry point to the gateway. Interactions with the shared services goes through this HTTPS endpoint. This is the only entry point of the solution.
The orchestrator is responsible for receiving the requests forwarded by the HTTPS endpoint and invoking relevant microservices, based on the task at hand. This in itself is a microservice, inspired the Orchestrator Saga pattern in microservices.
The generative AI playground is a UI provided to tenants where they can run their one-time experiments, chat with several FMs, and manually test capabilities such as guardrails or model evaluation for exploration purposes.

The component groups are as follows.

Core services is primarily targeted to the environment administrator. It contains services used to onboard, manage, and operate the environment, for example, to onboard and off-board tenants, users, and models, assign quotas to different tenants, and authentication and authorization microservices. It also contains observability components for cost tracking, budgeting, auditing, logging, etc.
Generative AI model components contain microservices for foundation and custom model invocation operations. These microservices abstract communication to FMs served through Amazon Bedrock, Amazon SageMaker, or a third-party model provider.
Generative AI components provide functionalities needed to build a generative AI application. Capabilities such as prompt caching, prompt chaining, agents, or hybrid search are part of these microservices.
Responsible AI components promote the safe and responsible development of AI across tenants. They include features such as guardrails, red teaming, and model evaluation.

Tenant

This part represents the tenants using the AI gateway capabilities. Each tenant has different requirements and needs and their own application stack. They can integrate their application with the generative AI gateway to embed generative AI capabilities in their application. The environment Admin has access to the generative AI gateway and interacts with the core services.

Solution walkthrough

The following sections examine each part of the solution in more depth.

HTTPS endpoint

This serves as the entry point for the generative AI gateway. Incoming requests to the gateway go through this point. There are different approaches you can follow when designing the endpoint:

REST API endpoint – You can set up a REST API endpoint using services such as API Gateway where you can apply all authentication, authorization, and throttling mechanisms. API Gateway is serverless and hence automatically scales with traffic.
WebSockets – For long-running connections, you can use WebSockets instead of a REST interface. This implementation overcomes timeout limitations in synchronous REST requests. A WebSockets implementation keeps the connection open for multiturn or long-running conversations. API Gateway also provides a WebSocket API.
Load balancer – Another option is to use a load balancer that exposes an HTTPS endpoint and routes the request to the orchestrator. You can use AWS services such as Application Load Balancer to implement this approach. The advantage of using Application Load Balancer is that it can seamlessly route the request to virtually any managed, serverless or self-hosted component and can also scale well.

Tenants and access patterns

Tenants, such as LOBs or teams, use the shared services to access APIs and integrate generative AI capabilities into their applications. They can also use the playground UI to assess the suitability of generative AI for their specific use case before diving into full-fledged application development.

Here you also have the data sources, processing pipelines, vector stores, and data governance mechanisms that allow tenants to securely discover, access, andthe data they need for their specific use case. At this point, you need to consider the use case and data isolation requirements. Some applications may need to access data with personal identifiable information (PII) while others may rely on noncritical data. You also need to consider the operational characteristics and noisy neighbor risks.

Take Retrieval Augmented Generation (RAG) as an example. Depending on the use case and data isolation requirements, tenants can have a pooled knowledge base or a siloed one and implement item-level isolation or resource level isolation for the data respectively. Tenants can select data from the data sources they have access to, choose the right chunking strategy for their application, use the shared generative AI FMs for converting the data into embeddings, and store the embeddings in their vector store.

To answer user questions in real time, tenants can implement caching mechanisms to reduce latency and costs for frequent queries. Additionally, they can implement custom logic to retrieve information about previous sessions, the state of the interaction, and information specific to the end user. To generate the final response, they can again access the models and re-ranking functionality available through the gateway.

The following diagram illustrates a potential implementation of a chat-based assistant application with this approach. The tenant application uses FMs available through the generative AI gateway and its own vector store to provide personalized, relevant responses to the end user.

Shared services

The following section describes the shared services groups.

Model components

The goal of this component group is to expose a unified API to tenants for accessing underlying models irrespective of where these are hosted. It abstracts invocation details and accelerates application development. It consists of one or more components depending on the number of FM providers and number and types of custom models used. These components are illustrated in the following diagram.

In terms of how to offer FMs to your tenants, with AWS you have several options:

Amazon Bedrock is a fully managed service that offers a choice of FMs from AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. It’s serverless so you don’t have to manage the infrastructure. You can also bring your own customized models and deploy them to Amazon Bedrock for supported architectures.
SageMaker JumpStart is a machine learning (ML) hub that provides a wide range of publicly available and proprietary FMs from providers such as AI21 Labs, Cohere, Hugging Face, Meta, and Stability AI, which you can deploy to SageMaker endpoints in your own AWS account.
SageMaker offers SageMaker endpoints for inference where you can deploy a publicly available model, such as models from HuggingFace, or your own model.
You can also deploy models on AWS compute using container services such as Amazon Elastic Kubernetes Service (Amazon EKS) or self-managed approaches.

With AWS PrivateLink, you can create a private connection between your virtual private cloud (VPC) and Amazon Bedrock and SageMaker endpoints.

Generative AI application components

This group contains components linked to the unique requirements of generative AI applications. They’re illustrated in the following figure.

Prompt catalog – Crafting effective prompts is important for guiding large language models (LLMs) to generate the desired outputs. Prompt engineering is typically an iterative process, and teams experiment with different techniques and prompt structures until they reach their target outcomes. Having a centralized prompt catalog is essential for storing, versioning, tracking, and sharing prompts. It also lets you automate your evaluation process in your pre-production environments. When a new prompt is added to the catalog, it triggers the evaluation pipeline. If it leads to better performance, your existing default prompt in the application is overridden with the new one. When you use Amazon Bedrock, Amazon Bedrock Prompt Management allows you to create and save your own prompts so you can save time by applying the same prompt to different workflows. Alternatively, you can use Amazon DynamoDB, a serverless, fully managed NoSQL database, to store your prompts.
Prompt chaining – Generative AI developers often use prompt chaining techniques to break complex tasks into subtasks before sending them to an LLM. A centralized service that exposes APIs for common prompt-chaining architectures to your tenants can accelerate development. You can use AWS Step Functions to orchestrate the chaining workflows and Amazon EventBridge to listen to task completion events and trigger the next step. Refer to Perform AI prompt-chaining with Amazon Bedrock for more details.
Agent – Tenants also often employ autonomous agents to complete complex tasks. Such agents orchestrate interactions between models, data sources, APIs, and applications. The agents component allows them to create, manage, access, and share agent implementations. On AWS, you can use the fully managed Amazon Bedrock Agents or tools of your choice such as LangChain agents or LlamaIndex agents.
Re-ranker – In the RAG design, a search in internal company data often returns multiple candidate outputs. A re-ranker, such as a Cohere Rerank 2 model, helps identify the best candidates based on predefined criteria. If your tenants prefer to use the capabilities of managed services such as Amazon OpenSearch Service or Amazon Kendra, this component isn’t needed.
Hybrid search – In RAG, you may also optionally want to implement and expose different templates for performing hybrid search that help improve the quality of the retrieved documents. This logic sits in a hybrid search component. If you use managed services such as Amazon OpenSearch Service, this component is also not required.

Responsible AI components

This group contains key components for Responsible AI, as shown in the following diagram.

Guardrails – Guardrails help you implement safeguards in addition to the FM built-in protections. They can be applied as generic defaults for users in your organization or can be specific to each use case. You can use Amazon Bedrock Guardrails to implement such safeguards based on your application requirements and responsible AI policies. With Amazon Bedrock Guardrails, you can block undesirable topics, filter harmful content, and redact or block sensitive information such as PII and custom regular expression to protect privacy. Additionally, contextual grounding checks can help detect hallucinations in model responses based on a reference source and a user query. The ApplyGuardrail API can evaluate input prompts and model responses for FMs on Amazon Bedrock, custom FMs, and third-party FMs, enabling centralized governance across your generative AI applications.
Red teaming – Red teaming helps reveal model limitations that can cause bad user experiences or enable malicious intentions. LLMs can be vulnerable to security and privacy attacks such as backdoor attacks, poisoning attacks, prompt injection, jailbreaking, PII leakage attacks, membership inference attacks or gradient leakage attacks. You can set up a test application and a red team with your own employees or automate it against a known set of vulnerabilities. For example, you can test the application with known jailbreaking datasets such as these You can use the results to tailor your Amazon Bedrock Guardrails to block undesirable topics, filter harmful content, and redact or block sensitive information.
Human in the loop – The human-in-the-loop approach is the process of collecting human inputs across the ML lifecycle to improve the accuracy and relevancy of models. Humans can perform a variety of tasks, from data generation and annotation to model review, customization, and evaluation. With SageMaker Ground Truth, you have a self-service offering and an AWS managed In the self-service offering, your data annotators, content creators, and prompt engineers (in-house, vendor-managed, or using the public crowd) can use the low-code UI to accelerate human-in-the-loop tasks. The AWS managed offering (SageMaker Ground Truth Plus) designs and customizes an end-to-end workflow and provides a skilled AWS managed team that is trained on specific tasks and meets your data quality, security, and compliance requirements. With model evaluation in Amazon Bedrock, you can set up FM evaluation jobs that use human workers to evaluate the responses from multiple models and compare them with a ground truth response. You can set up different methods including thumbs up or down, 5-point Likert scales, binary choice buttons, or ordinal ranking.
Model evaluation – Model evaluation allows you to compare model outputs and choose the model best suited for downstream generative AI applications. You can use automatic model evaluations, human-in-the-loop evaluations or both. Model evaluation in Amazon Bedrock allows you to set up automatic evaluation jobs and evaluation jobs that use human workers. You can choose existing datasets or provide your own custom prompt dataset. With Amazon SageMaker Clarify, you can evaluate FMs from Amazon SageMaker JumpStart. You can set up model evaluation for different tasks such as text generation, summarization, classification, and question and answering, across different dimensions including prompt stereotyping, toxicity, factual knowledge, semantic robustness, and accuracy. Finally, you can build your own evaluation pipelines and use tools such as fmeval.
Model monitoring – The model monitoring service allows tenants to evaluate model performance against predefined metrics. A model monitoring solution gathers request and response data, runs evaluation jobs to calculate performance metrics against preset baselines, saves the outputs, and sends an alert in case of issues.

If you use Amazon Bedrock, you can enable model invocation logging to collect input and output data and use Amazon Bedrock evaluation to run model evaluation jobs. Alternatively, you can use AWS Lambda and implement your own logic, or use open source tools such as fmeval. In SageMaker, you can enable data capture for your SageMaker real-time endpoint and use SageMaker Clarify to run the model evaluation jobs or implement your own evaluation logic. Both Amazon Bedrock and SageMaker integrate with SageMaker Ground Truth, which helps you gather ground truth data and human feedback for model responses. AWS Step Functions can help you orchestrate the end-to-end monitoring workflow.

Core services

Core services represent a collection of administrative and management components or modules. These components are designed to provide oversight, control, and governance over various aspects of the system’s operation, resource management, user and tenant administration, and model management. These are illustrated in the following diagram.

Tenant management and identity

Tenant management is a crucial aspect of multi-tenant systems, where a single instance of an application or environment serves multiple tenants or customers, each with their own isolated and secure environment. The tenant management component is responsible for managing and administering these tenants within the system.

Tenant onboarding and provisioning – This helps with creating a repeatable onboarding process for new tenants. It involves creating tenant-specific environments, allocating resources, and configuring access controls based on the tenant’s requirements.
Tenant configuration and customization – Many multi-tenant systems allow tenants to customize certain aspects of the application or environment to suit their specific needs. The tenant management component may provide interfaces or tools for tenants to configure settings, branding, workflows, or other customizable features within their isolated environments.
Tenant monitoring and reporting – This component is directly linked to the monitor and metering component and reports on tenant-specific usage, performance, and resource consumption. It can provide insights into tenant activity, identify potential issues, and facilitate capacity planning and resource allocation for each tenant.
Tenant billing and subscription management – In solutions with different pricing models or subscription plans, the tenant management component can handle billing and subscription management for each tenant based on their usage, resource consumption, or contracted service levels.

In the proposed solution, you also need an authorization flow that establishes the identity of the user making the request. With AWS IAM Identity Center, you can create or connect workforce users and centrally manage their access across their AWS accounts and applications. With Amazon Cognito, you can authenticate and authorize users from the built-in user directory, from your enterprise directory, and from other consumer identity providers. AWS Identity and Access Management (IAM) provides fine-grained access control. You can use IAM to specify who can access which FMs and resources to maintain least privilege permissions.

For example, in one common scenario with Cognito that accesses resources with API Gateway and Lambda with a user pool. In the following diagram, when your user signs in to an Amazon Cognito user pool, your application receives JSON Web Tokens (JWTs). You can use groups in a user pool to control permissions with API Gateway by mapping group membership to IAM roles. You can submit your user pool tokens with a request to API Gateway for verification by an Amazon Cognito authorizer Lambda function. For more information, see Using API Gateway with Amazon Cognito user pools.

It is recommended that you don’t use API keys for authentication or authorization to control access to your APIs. Instead, use an IAM role, a Lambda authorizer, or an Amazon Cognito user pool.

Model onboarding

A key aspect of the generative AI gateway is allowing controlled access to foundation and custom models across tenants. For FMs available through Amazon Bedrock, the model onboarding component maintains an allowlist of approved models that tenants can access. You can use a service such as Amazon DynamoDB to track allowlisted models. Similarly, for custom models deployed on Amazon SageMaker, the component tracks which tenants have access to which model versions through entries in the DynamoDB registry table.

To enforce access control, you can use AWS Lambda authorizers with Amazon API Gateway. When a tenant application calls the model invocation API, the Lambda authorizer verifies the tenant’s identity and checks if they have permission to access the requested model based on the DynamoDB registry table. If access is permitted, temporary credentials are issued, which scope down the tenant’s permissions to just the allowed model(s). This prevents tenants from accessing models they shouldn’t have access to. The authorizer logic can be customized based on an organization’s model access policies and governance requirements.

This approach supports model end of life. By managing the model from the allowlist in the DynamoDB registry table for all or selected tenants, models not included aren’t usable automatically, with no further code changes required in the solution.

Model registry

A model registry helps manage and track different versions of custom models. Services such as Amazon SageMaker Model Registry and Amazon DynamoDB help track available models, associated generated model artifacts, and lineage. A model registry offers the following:

Version control – To track different versions of the generative AI models.
Model lineage and provenance – To track the lineage and provenance of each model version, including information about the training data, hyperparameters, model architecture, and other relevant metadata that describes the model’s origin and characteristics.
Model deployment and rollback – To facilitate the deployment and usage of new model versions into production environments and the rollback to previous versions if necessary. This makes sure that models can be updated or reverted seamlessly without disrupting the system’s operation.
Model governance and compliance – To verify that model versions are properly documented, audited, and conform to relevant policies or regulations. This is particularly useful in regulated industries or environments with strict compliance requirements.

Observability

Observability is crucial for monitoring the health of your application, troubleshooting issues, usage of FMs, and optimizing performance and costs.

Logging and monitoring

Amazon CloudWatch is a powerful monitoring and observability service that allows you to collect and analyze logs from your application components, including API Gateway, Amazon Bedrock, Amazon SageMaker, and custom services. Using CloudWatch to capture tenant identity in the logs across the whole stack helps you gain insights into the performance and health of your generative AI gateway down to the tenant level and proactively identify and resolve issues before they escalate. You can also set up alarms to get notified in case of unexpected behavior. Both Amazon SageMaker and Amazon Bedrock are integrated with AWS CloudTrail.

Metering

Metering helps collect, aggregate, and analyze operational and usage data and performance metrics from different parts of the solution. In systems that offer pay-per-use or subscription-based models, metering is crucial for accurately measuring and reporting resource consumption for billing purposes across the different tenants.

In this solution, you need to track the usage of FMs to effectively manage costs and optimize resource utilization. Collecting information related to the models used, number of tokens provided as input, tokens generated as output, AWS Region used, and applying tags related to the team helps you streamline the cost allocation and billing processes. You can log structured data during interactions with the FMs and collect this usage information. The following diagram shows an implementation where the Lambda function logs per tenant information in Amazon CloudWatch and invokes Amazon Bedrock. The invocation generates an AWS CloudTrail event.

Auditing

You can use an AWS Lambda function to aggregate the data from Amazon CloudWatch and store it in S3 buckets for long-term storage and further analysis. Amazon S3 provides a highly durable, scalable, and cost-effective object storage solution, making it an ideal choice for storing large volumes of data. For implementation details, refer to part 1 of this series, Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock.

Once the data is in Amazon S3, you can use AWS analytics services such as Amazon Athena, AWS Glue Data Catalog, and Amazon QuickSight to uncover patterns in the cost and usage data, generate reports, visualize trends, and make informed decisions about resource allocation, budget forecasting, and cost optimization strategies. With AWS Glue Data Catalog, a centralized metadata repository, and Amazon Athena, an interactive query service, you can run one-time SQL queries directly on the data stored in Amazon S3. The following example describes usage and cost per model per tenant in Athena.

Scaling across the enterprise

The following are some design considerations for when you scale this solution across hundreds of LOBs and teams within an organization.

Account limits – So far, we have discussed how to deploy the gateway solution in a single AWS account. As teams rapidly onboard to the gateway and expand their usage of LLMs, this might result in various components hitting their AWS account limits and can quickly become a bottleneck. We recommend deploying the generative AI gateway to more than one AWS accounts where each AWS account corresponds to one LOB. The reasoning behind this suggestion is, generally, the LOBs in large enterprises are quite autonomous and can each have tens to hundreds of teams. In addition, they may have strict data privacy policies which restricts them from sharing the data with other LOBs. In addition to this account, each LOB may have their non-prod AWS account as well where this gateway solution is deployed for testing and integration purposes.
Production and non-production workloads – In most cases, tenant teams will want to use this gateway across their development, test, and production environments. Although it largely depends on an organization’s operating model, our recommendation is to have a dedicated development, test, and production environment for the gateway as well, so the teams can experiment freely without overloading the production gateway or polluting it with non-production data. This offers the additional benefit that you can set the limits for non-production gateways lower than those in production.
Handling RAG data components – For implementing RAG solutions, we suggest keeping all the data-related components on the tenant’s end. Every tenant will have their own data constraints, update cycle, format, terminologies, and permission groups. Assigning the responsibility of managing data sources to the gateway may hinder scalability because the gateway can’t accommodate the unique requirements of each tenant’s data sources and most likely will end up serving the lowest common denominator. Hence, we recommend having the data sources and related components managed on the tenant’s side.
Avoid reinventing the wheel – With this solution, you can build and manage your own components for model evaluation, guardrails, prompt catalogue, monitoring, and more. Services such as Amazon Bedrock provide the capabilities you need to build generative AI applications with security, privacy, and responsible AI right from the start. Our recommendation is to take a balanced approach and, wherever possible, use AWS native capabilities to reduce operational costs.
Keeping the generative AI gateway thin – Our suggestion is to keep this gateway thin in terms of storing business logic. The gateway shouldn’t add any business rules for any specific tenant and should avoid storing any kind of tenant specific data apart from operational data already discussed in the post.

Conclusion

A generative AI multi-tenant architecture helps you maintain security, governance, and cost controls while scaling the use of generative AI across multiple use cases and teams. In this post, we presented a reference multi-tenant architecture to help you accelerate generative AI adoption. We showed how to standardize common generative AI components and how to expose them as shared services. The proposed architecture also addressed key aspects of governance, security, observability, and responsible AI. Finally, we discussed key considerations when scaling this architecture to hundreds of teams.

If you want to read more about this topic, check out also the following resources:

Let us know what you think in the comments section!

About the authors

Anastasia Tzeveleka is a Senior Generative AI/ML Specialist Solutions Architect at AWS. As part of her work, she helps customers across EMEA build foundation models and create scalable generative AI and machine learning solutions using AWS services.

Hasan Poonawala is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Hasan helps design, deploy and scale Generative AI and Machine learning applications on AWS. He has over 15 years of combined work experience in machine learning, software development and data science on the cloud. In his spare time, Hasan loves to explore nature and spend time with friends and family.

Bruno Pistone is a Senior Generative AI and ML Specialist Solutions Architect for AWS based in Milan. He works with large customers helping them to deeply understand their technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise include: Machine Learning end to end, Machine Learning Industrialization, and Generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations

Vikesh Pandey is a Principal Generative AI/ML Solutions architect, specialising in financial services where he helps financial customers build and scale Generative AI/ML platforms and solution which scales to hundreds to even thousands of users. In his spare time, Vikesh likes to write on various blog forums and build legos with his kid.

Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Enhance customer support with Amazon Bedrock Agents by integrating enterprise data APIs

November 7, 2024

by Deepak Kovvuri Amazon AWS

Generative AI has transformed customer support, offering businesses the ability to respond faster, more accurately, and with greater personalization. AI agents, powered by large language models (LLMs), can analyze complex customer inquiries, access multiple data sources, and deliver relevant, detailed responses.

In this post, we guide you through integrating Amazon Bedrock Agents with enterprise data APIs to create more personalized and effective customer support experiences. Although the principles discussed are applicable across various industries, we use an automotive parts retailer as our primary example throughout this post.

By the end of this post, you’ll have a clear understanding of how to do the following:

Use Amazon Bedrock Agents to create intelligent, context-aware customer support bots
Integrate enterprise data sources, such as inventory management and catalog systems, with agents using AWS Lambda
Build customized chat interfaces using the Amazon Bedrock Agents API
Implement a solution that can instantly cross-reference product specifications with catalogs, check real-time inventory, and provide detailed information to the end-user

Solution overview

To illustrate the potential of this technology, consider an automotive parts retailer. In this industry, finding the right components can be challenging, because it often involves navigating extensive catalogs and complex compatibility requirements. An automotive retailer might use inventory management APIs to track stock levels and catalog APIs for vehicle compatibility and specifications. Access to car manuals and technical documentation helps the agent provide additional context for curated guidance, enhancing the quality of customer interactions.

The solution presented in this post takes approximately 15–30 minutes to deploy and consists of the following key components:

Amazon OpenSearch Service Serverless maintains three indexes: the inventory index, the compatible parts index, and the owner manuals index. These indexes enable efficient searching and retrieval of part data and vehicle information, providing quick and accurate results.
Amazon Bedrock Agents coordinates interactions between foundation models (FMs), knowledge bases, and user conversations. The agents also automatically call APIs to perform actions and access knowledge bases to provide additional information.
Amazon Bedrock Knowledge Bases enables you to use Retrieval Augmented Generation (RAG), a technique that enhances responses from LLMs by incorporating information from a data store. By setting up a knowledge base with your data sources, your application can query it to provide answers, either through direct quotes from the sources or through naturally generated responses based on the query results.
A web application serves as the frontend interface where users can initiate parts lookup requests.

Ingestion flow

The ingestion flow prepares and stores the necessary data for the AI agent to access. The following diagram illustrates how it works.

The workflow includes the following steps:

Documents (owner manuals) are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket.
Amazon Bedrock Knowledge Bases ingests these documents:
1. The knowledge base is configured to use the S3 bucket as a data source.
2. The data source is synchronized and the knowledge base detects new, modified, or deleted documents in the S3 bucket and updates accordingly.
3. The documents are chunked into smaller segments for more effective processing. This solution uses fixed-size chunking, where you can configure the desired chunk size by specifying the number of tokens per chunk and an overlap percentage.
Each chunk is embedded by using an embedding model such as Cohere Embed on Amazon Bedrock to create vector representations (embeddings) of the text.
The embeddings are stored in the Amazon OpenSearch Service owner manuals index. OpenSearch Service is used as the vector store for efficient similarity searching. The embeddings, along with metadata about the source documents, are indexed for quick retrieval.

User interaction flow

The following diagram illustrates the user interaction flow.

A user interacts with the Car Parts Agent through a web application interface. They can ask questions like “What wiper blades fit a 2021 Honda CR-V?” or ”Tell me about part number 76622-T0A-A01.”
The web application sends the user’s query to the Amazon Bedrock agent using the InvokeAgent API. The agent, using Anthropic’s Claude 3 Sonnet, interprets the user’s query and determines the best course of action through chain-of-thought (CoT) reasoning. At this stage, the agent employs guardrails to make sure it stays within its defined scope and capabilities. Through a runtime process that includes preprocessing and postprocessing steps, the agent categorizes the user’s input. This allows it to handle out-of-scope questions or potentially harmful inputs appropriately, without attempting to answer beyond its capabilities or knowledge base. The agent then analyzes the query to extract key information such as vehicle details, part numbers, or general automotive topics. If the query is within scope, the agent proceeds; if not, it provides a response indicating it can’t assist with that particular request.
For general inquiries, the agent consults its knowledge base in Amazon Bedrock, which includes information from various car manuals. This allows the agent to provide context and general information about car parts and systems.
For specific part inquiries, the agent consults the action groups available to the agent and invokes the correct action (API) to retrieve relevant information. This invocation happens when the agent determines that it needs to run a specific action based on the user input.
1. The Lambda function runs the database query against the appropriate OpenSearch Service indexes, searching for exact matches or using fuzzy matching for partial information. It can access the inventory index for specific part details or the compatible parts index for compatibility information.
2. The Lambda function processes the OpenSearch Service results and formats them for the Amazon Bedrock agent.
The Amazon Bedrock agent takes the formatted results and generates a human-readable response, combining database information with its general knowledge to provide comprehensive answers.

The following diagram illustrates the workflow of the agent.

This diagram illustrates the agent’s workflow from user query to response generation, integrating knowledge base and API data to provide comprehensive answers and handle follow-up questions.

Developer tools

The solution also uses the following developer tools:

AWS Powertools for Lambda – This is a suite of utilities for Lambda functions that generates OpenAPI schemas from your Lambda function code. It provides annotations for business logic, descriptions, and parameter validations, automatically producing JSON-serialized OpenAPI schemas for use with Amazon Bedrock Agents.
AWS Generative AI Constructs Library – This is an open source extension of the AWS Cloud Development Kit (AWS CDK) that offers multi-service, well-architected patterns for quickly defining generative AI solutions. It provides constructs to help developers build generative AI applications using pattern-based definitions for your infrastructure.

Prerequisites

You should have the following prerequisites:

An AWS account with the appropriate AWS Identity and Access Management (IAM) permissions to create Amazon Bedrock agents and knowledge bases, Lambda functions, and IAM roles
Access to the following models hosted on Amazon Bedrock:
- Anthropic’s Claude 3 Sonnet (model ID: anthropic.claude-3-sonnet-20240229-v1:0)
- Cohere Embed English v3 (model ID: cohere.embed-english-v3)
A local development environment with the following:
- The AWS Command Line Interface (AWS CLI) installed and configured with appropriate permissions.
- Python 3.9 or later
- Node.js v20.x or later
- The AWS CDK CLI installed

Deploy the solution

The following steps outline the process to deploying the solution using the AWS CDK. The complete source code for this solution is available in the GitHub repository.

Open your terminal and run the following commands to clone the GitHub repository to your local machine:

git clone https://github.com/aws-samples/bedrock-agent-carpart-lookup.git
cd bedrock-agent-carpart-lookup

Create and activate a Python virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows, use .venvScriptsactivate

Install the required Python packages:
```
pip install -r requirements.txt
```
Use the AWS CDK CLI to deploy the solution:
```
cdk deploy
```

During deployment, you may be prompted to approve IAM role creations and security changes. Review and approve these if you’re comfortable with the permissions. After deployment, the AWS CDK CLI will output the web application URL. Make note of this URL (as shown in following screenshot) to access and test the agent.

After you deploy the solution, you can verify the created resources on the Amazon Bedrock console. On the Agents page, you’ll notice a new agent called car-parts-agent.

Effective agent instructions are crucial for optimizing the performance of AI-powered assistants. A well-structured set of instructions should encompass several key components:

Agent role – Define the assistant’s purpose, such as serving as a Car Parts Assistant that helps users find compatible parts and automotive information
Agent actions – Outline primary tasks, such as identifying parts based on vehicle details, verifying compatibility, and providing technical specifications
Agent guidelines – Establish rules for interaction, prioritizing accuracy and safety, clearly stating uncertainties, and using actions for searches
Agent guardrails – Implement limits to make sure the agent operates safely and effectively, using relevant automotive knowledge to enhance user support

For example, the agent we deployed has been preconfigured with the following instruction:

You are an Car Parts Assistant, helping users find compatible parts and providing automotive information. Your main tasks are: Part Identification: Find specific parts based on vehicle details (make, model, year). Assist with partial information. Compatibility Checks: Verify if parts are compatible with given vehicles. Technical Info: Provide part specifications, features, and explain component functions. Use database functions for searches and compatibility checks. Supplement with automotive knowledge for comprehensive help. Your goal is to assist effectively while ensuring users make informed decisions about their vehicle parts. Always prioritize accuracy and safety. State uncertainties clearly.

□ Role

□ Actions

□ Guidelines

□ Guardrails

The agent has two main components:

Action group – An action group named CarpartsApi is created, and the actions it can perform are defined using an OpenAPI schema. Optionally, you can use Powertools for AWS Lambda to simplify the process of generating the OpenAPI schema. For more information, refer to the PowerTools documentation on Amazon Bedrock Agents. The OpenAPI schema used by this agent can be viewed on the following GitHub repo. The action group is then associated with a Lambda function containing the business logic for these actions.
Knowledge base – This repository enhances the agent’s responses using RAG in Amazon Bedrock. It contains information from car manuals and technical documentation. When associating a knowledge base with an agent, you can optionally provide a description on how the agent can use the knowledge base. For this demo, we use the following description for the knowledge base:

This knowledge base contains manuals and technical documentation about various car makes from manufacturers such as Honda, Tesla, Ford, Subaru, Kia, Toyota etc.

□ Instructions

The agent employs CoT reasoning to process user queries, analyzing input against its instructions and evaluating actions based the OpenAPI provided and knowledge base description. When required information is missing, as determined by the OpenAPI schema’s specifications, the agent formulates questions to elicit necessary data from the user. This analysis and information gathering leads to a logical sequence of steps, including API calls and knowledge base queries. The resulting observation enhances the prompt for the FM, which then determines and runs the most effective actions.

For this post, we use the AWS CDK and the AWS Generative AI Constructs Library to create the Amazon Bedrock agent. This approach enables version-controlled, reproducible infrastructure as code (IaC). Alternatively, you can create agents using the AWS CLI or AWS Management Console. For more details, see create and configure agent manually.

Test and validate the solution

After you deploy the solution, you can test the agent either on the Amazon Bedrock console or through the application URL noted earlier. In the application, you can ask questions like “What wiper blades fit a 2021 Honda CR-V?” or “Tell me about part number 76622-T0A-A01.”

To illustrate the Amazon Bedrock agent’s reasoning capabilities, let’s examine the agent’s response to a user question “What wiper blades fit a 2021 Honda CR-V?” The workflow would consist of the following steps:

The agent analyzes the query, identifying key elements: Honda (make), CR-V (model), 2021 (year), and wiper blades (part category).
It then reasons about which API to use, selecting the get_compatible_parts function as the most appropriate for vehicle-specific part queries.
The agent maps the query elements to the function’s parameters, demonstrating its understanding of how to translate natural language to API calls.
The agent calls the carPartsApi/get_compatible_parts action, which queries the OpenSearch Service index for compatible parts. When an action group using a Lambda function is invoked, Amazon Bedrock sends a Lambda input event using a general format.
Upon receiving a response, the agent interprets the data, identifying relevant wiper blade options. The agent formulates a natural language response, recommending specific wiper blades. Similarly, Amazon Bedrock expects a response from your Lambda function to match a pre-defined format.

We get the following natural language response:

Based on the compatible parts search for a 2021 Honda CR-V in the “Wipers” category, here are the recommended wiper blade options:

Wiper Blade Set (Part #76622-TLA-A11) – This is a genuine Honda wiper blade set that includes the driver and passenger side wiper blades for the CR-V.

Rear Wiper Blade (Part #76630-TLA-A01) – This is the rear wiper blade specifically for the 2021 CR-V.

You can also consider individual wiper blades like the Driver Side Wiper Blade (Part #76622-T0A-A01) or the Passenger Side Wiper Arm (Part #76630-T0A-A01) if you only need to replace one side.

This process showcases how Amazon Bedrock agents can intelligently integrate enterprise data APIs with AI-driven reasoning to provide accurate, context-aware responses to customer queries.

Clean up

To avoid future costs and clean up resources after you’re done exploring the solution, delete the resources you created by running the following command from your terminal (from the project directory):

cdk destroy

Key considerations

When implementing Amazon Bedrock Agents, consider the following factors to facilitate optimal performance and scalability:

Agent design – Follow these recommendations when designing your agent:
- Keep instructions focused and clear, with specific responsibilities for the agent
- For complex use cases, consider multiple specialized agents rather than overloading a single one
- Explore different FMs to find the best fit for your needs, considering both behavior and cost
Action management – Consider the following recommendations for action management:
- Define actions carefully, including only those that the agent should reliably perform
- Use clear, descriptive names for actions to help the agent determine their relevance
- Avoid overlapping actions to prevent confusion and conflicts during operation
Testing – Make sure your testing includes the following steps:
- Establish clear testing protocols
- Identify common use case inputs and set accuracy targets
- Define edge case inputs and agree on acceptable accuracy levels
- Determine out-of-domain inputs where the agent should not respond
- Automate tests and run them with system changes to verify consistency and reliability
Performance optimization – Consider the following performance optimizations:
- Break down complex operations into smaller actions to enhance response time and error handling
- Implement a “fail fast” principle for invalid queries, allowing more time for complex tasks
Security and compliance – Use Amazon Bedrock Guardrails to prevent the agent from generating harmful content or making unauthorized actions
Cost management – Monitor usage-based pricing for token processing and storage, facilitating efficient resource allocation and cost management

Conclusion

Integrating enterprise data APIs with Amazon Bedrock Agents offers a powerful solution for streamlining customer support, as demonstrated in the automotive parts industry. This AI-driven approach enables rapid, accurate responses to complex queries, seamlessly integrates multiple data sources, and reduces staff workload while enhancing customer experience through context-aware interactions.

The solution discussed in this post can elevate customer support across various industries. By using Amazon Bedrock agents, organizations can create more efficient, accurate, and satisfying support experiences tailored to their specific needs. To explore how AI agents can transform your own support operations, refer to Automate tasks in your application using conversational agents.

About the Authors

Deepak Kovvuri is a Senior Solutions Architect supporting Automotive and Manufacturing Customers at AWS in the US Northeast. He has over 6 years of experience in helping customers architecting a DevOps strategy for their cloud workloads. Deepak specializes in CI/CD, Systems Administration, Infrastructure as Code and Container Services. He holds an Masters in Computer Engineering from University of Illinois at Chicago.

Kingston Bosco is a Senior Solutions Architect for Global Strategic Partners at AWS. He designs and implements solutions that optimize DevOps workflows, automate cloud operations, and improve infrastructure management for customers. He holds a Master’s in Information Systems. In his free time, he enjoys hiking with his dogs and playing soccer.

Unleash the power of generative AI with Amazon Q Business: How CCoEs can scale cloud governance best practices and drive innovation

November 6, 2024

by Steven Craig Amazon AWS

This post is co-written with Steven Craig from Hearst.

To maintain their competitive edge, organizations are constantly seeking ways to accelerate cloud adoption, streamline processes, and drive innovation. However, Cloud Center of Excellence (CCoE) teams often can be perceived as bottlenecks to organizational transformation due to limited resources and overwhelming demand for their support.

In this post, we share how Hearst, one of the nation’s largest global, diversified information, services, and media companies, overcame these challenges by creating a self-service generative AI conversational assistant for business units seeking guidance from their CCoE. With Amazon Q Business, Hearst’s CCoE team built a solution to scale cloud best practices by providing employees across multiple business units self-service access to a centralized collection of documents and information. This freed up the CCoE to focus their time on high-value tasks by reducing repetitive requests from each business unit.

Readers will learn the key design decisions, benefits achieved, and lessons learned from Hearst’s innovative CCoE team. This solution can serve as a valuable reference for other organizations looking to scale their cloud governance and enable their CCoE teams to drive greater impact.

The challenge: Enabling self-service cloud governance at scale

Hearst undertook a comprehensive governance transformation for their Amazon Web Services (AWS) infrastructure. The CCoE implemented AWS Organizations across a substantial number of business units. These business units then used AWS best practice guidance from the CCoE by deploying landing zones with AWS Control Tower, managing resource configuration with AWS Config, and reporting the efficacy of controls with AWS Audit Manager. As individual business units sought guidance on adhering to the AWS recommended best practices, the CCoE created written directives and enablement materials to facilitate the scaled adoption across Hearst.

The existing CCoE model had several obstacles slowing adoption by business units:

Extreme demand – The CCoE team was becoming a bottleneck, unable to keep up with the growing demand for their expertise and guidance. The team was stretched thin, and the traditional approach of relying on human experts to address every question was impeding the pace of cloud adoption for the organization.
Limited scalability – As the volume of requests increased, the CCoE team couldn’t disseminate updated directives quickly enough. Manually reviewing each request across multiple business units wasn’t sustainable.
Inconsistent governance – Without a standardized, self-service mechanism to access the CCoE teams’ expertise and disseminate guidance on new policies, compliance practices, or governance controls, it was difficult to maintain consistency based on the CCoE best practices across each business unit.

To address these challenges, Hearst’s CCoE team recognized the need to quickly create a scalable, self-service application that could empower the business units with more access to updated CCoE best practices and patterns to follow.

Overview of solution

To enable self-service cloud governance at scale, Hearst’s CCoE team decided to use the power of generative AI with Amazon Q Business to build a conversational assistant. The following diagram shows the solution architecture:

The key steps Hearst took to implement Amazon Q Business were:

Application deployment and authentication – First, the CCoE team deployed Amazon Q Business and integrated AWS IAM Identity Center with their existing identity provider (using Okta in this case) to seamlessly manage user access and permissions between their existing identity provider and Amazon Q Business.
Data source curation and authorization – The CCoE team created several Amazon Simple Storage Service (Amazon S3) buckets to store their curated content, including cloud governance best practices, patterns, and guidance. They set up a general bucket for all users and specific buckets tailored to each business unit’s needs. User authorization for documents within the individual S3 buckets were controlled through access control lists (ACLs). You add access control information to a document in an Amazon S3 data source using a metadata file associated with the document. This made sure end users would only receive responses from documents they were authorized to view. With the Amazon Q Business S3 connector, the CCoE team was able to sync and index their data in just a few clicks.
User access management – With the data source and access controls in place, the CCoE team then set up user access on a business unit by business unit basis, considering various security, compliance, and custom requirements. As a result, the CCoE could deliver a personalized experience to each business unit.
User interface development – To provide a user-friendly experience, Hearst built a custom web interface so employees could interact with the Amazon Q Business assistant through a familiar and intuitive interface. This encouraged widespread adoption and self-service among the business units.
Rollout and continuous improvement – Finally, the CCoE team shared the web experience with the various business units, empowering employees to access the guidance and best practices they needed through natural language interactions. Going forward, the team enriched the knowledge base (S3 buckets) and implemented a feedback loop to facilitate continuous improvement of the solution.

For Hearst’s CCoE team, Amazon Q Business was the quickest way to use generative AI on AWS, with minimal risk and less upfront technical complexity.

Speed to value was an important advantage because it allowed the CCoE to get these powerful generative AI capabilities into the hands of employees as quickly as possible, unlocking new levels of scalability, efficiency, and innovation for cloud governance consistency across the organization.
This strategic decision to use a managed service at the application layer, such as Amazon Q Business, enabled the CCoE to deliver tangible value for the business units in a matter of weeks. By opting for the expedited path to using generative AI on AWS, Hearst was never bogged down in the technical complexities of developing and managing their own generative AI application.

The results: Decreased support requests and increased cloud governance consistency

By using Amazon Q Business, Hearst’s CCoE team achieved remarkable results in empowering self-service cloud governance across the organization. The initial impact was immediate—within the first month, the CCoE team saw a 70% reduction in the volume of requests for guidance and support from the various business units. This freed up the team to focus on higher-value initiatives instead of getting bogged down in repetitive, routine requests. The following month, the number of requests for CCoE support dropped by 76%, demonstrating the power of a self-service assistant with Amazon Q Business. The benefits went beyond just reduced request volume. The CCoE team also saw a significant improvement in the consistency and quality of cloud governance practices across Hearst, enhancing the organization’s overall cloud security, compliance posture, and cloud adoption.

Conclusion

Cloud governance is a critical set of rules, processes, and reports that guide organizations to follow best practices across their IT estate. For Hearst, the CCoE team sets the tone and cloud governance standards that each business unit follows. The implementation of Amazon Q Business allowed Hearst’s CCoE team to scale the governance and security that support business units depend on through a generative AI assistant. By disseminating best practices and guidance across the organization, the CCoE team freed up resources to focus on strategic initiatives, while employees gained access to a self-service application, reducing the burden on the central team. If your CCoE team is looking to scale its impact and enable your workforce, consider using the power of conversational AI through services like Amazon Q Business, which can position your team as a strategic enabler of cloud transformation.

Listen to Steven Craig share how Hearst leveraged Amazon Q Business to scale the Cloud Center of Excellence

Reading References:

Getting started with Amazon Q
The Business Value of AWS Cloud Governance Services IDC white paper.

About the Authors

Steven Craig is a Sr. Director, Cloud Center of Excellence. He oversees Cloud Economics, Cloud Enablement, and Cloud Governance for all Hearst-owned companies. Previously, as VP Product Strategy and Ops at Innova Solutions, he was instrumental in migrating applications to public cloud platforms and creating IT Operations Managed Service offerings. His leadership and technical solutions were key in achieving sequential AWS Managed Services Provider certifications. Steven has been AWS Professionally certified for over 8 years.

Oleg Chugaev is a Principal Solutions Architect and Serverless evangelist with 20+ years in IT, holding multiple AWS certifications. At AWS, he drives customers through their cloud transformation journeys by converting complex challenges into actionable roadmaps for both technical and business audiences.

Rohit Chaudhari is a Senior Customer Solutions Manager with over 15 years of diverse tech experience. His background spans customer success, product management, digital transformation coaching, engineering, and consulting. At AWS, Rohit serves as a trusted advisor for customers to work backwards from their business goals, accelerate their journey to the cloud, and implement innovative solutions.

Al Destefano is a Generative AI Specialist at AWS based in New York City. Leveraging his AI/ML domain expertise, Al develops and executes global go-to-market strategies that drive transformative results for AWS customers at scale. He specializes in helping enterprise customers harness the power of Amazon Q, a generative AI-powered assistant, to overcome complex challenges and unlock new business opportunities.

Integrate foundation models into your code with Amazon Bedrock

November 6, 2024

by Rajakumar Sampathkumar Amazon AWS

The rise of large language models (LLMs) and foundation models (FMs) has revolutionized the field of natural language processing (NLP) and artificial intelligence (AI). These powerful models, trained on vast amounts of data, can generate human-like text, answer questions, and even engage in creative writing tasks. However, training and deploying such models from scratch is a complex and resource-intensive process, often requiring specialized expertise and significant computational resources.

Enter Amazon Bedrock, a fully managed service that provides developers with seamless access to cutting-edge FMs through simple APIs. Amazon Bedrock streamlines the integration of state-of-the-art generative AI capabilities for developers, offering pre-trained models that can be customized and deployed without the need for extensive model training from scratch. Amazon maintains the flexibility for model customization while simplifying the process, making it straightforward for developers to use cutting-edge generative AI technologies in their applications. With Amazon Bedrock, you can integrate advanced NLP features, such as language understanding, text generation, and question answering, into your applications.

In this post, we explore how to integrate Amazon Bedrock FMs into your code base, enabling you to build powerful AI-driven applications with ease. We guide you through the process of setting up the environment, creating the Amazon Bedrock client, prompting and wrapping code, invoking the models, and using various models and streaming invocations. By the end of this post, you’ll have the knowledge and tools to harness the power of Amazon Bedrock FMs, accelerating your product development timelines and empowering your applications with advanced AI capabilities.

Solution overview

Amazon Bedrock provides a simple and efficient way to use powerful FMs through APIs, without the need for training custom models. For this post, we run the code in a Jupyter notebook within VS Code and use Python. The process of integrating Amazon Bedrock into your code base involves the following steps:

Set up your development environment by importing the necessary dependencies and creating an Amazon Bedrock client. This client will serve as the entry point for interacting with Amazon Bedrock FMs.
After the Amazon Bedrock client is set up, you can define prompts or code snippets that will be used to interact with the FMs. These prompts can include natural language instructions or code snippets that the model will process and generate output based on.
With the prompts defined, you can invoke the Amazon Bedrock FM by passing the prompts to the client. Amazon Bedrock supports various models, each with its own strengths and capabilities, allowing you to choose the most suitable model for your use case.
Depending on the model and the prompts provided, Amazon Bedrock will generate output, which can include natural language text, code snippets, or a combination of both. You can then process and integrate this output into your application as needed.
For certain models and use cases, Amazon Bedrock supports streaming invocations, which allow you to interact with the model in real time. This can be particularly useful for conversational AI or interactive applications where you need to exchange multiple prompts and responses with the model.

Throughout this post, we provide detailed code examples and explanations for each step, helping you seamlessly integrate Amazon Bedrock FMs into your code base. By using these powerful models, you can enhance your applications with advanced NLP capabilities, accelerate your development process, and deliver innovative solutions to your users.

Prerequisites

Before you dive into the integration process, make sure you have the following prerequisites in place:

AWS account – You’ll need an AWS account to access and use Amazon Bedrock. If you don’t have one, you can create a new account.
Development environment – Set up an integrated development environment (IDE) with your preferred coding language and tools. You can interact with Amazon Bedrock using AWS SDKs available in Python, Java, Node.js, and more.
AWS credentials – Configure your AWS credentials in your development environment to authenticate with AWS services. You can find instructions on how to do this in the AWS documentation for your chosen SDK. We walk through a Python example in this post.

With these prerequisites in place, you’re ready to start integrating Amazon Bedrock FMs into your code.

In your IDE, create a new file. For this example, we use a Jupyter notebook (Kernel: Python 3.12.0).

In the following sections, we demonstrate how to implement the solution in a Jupyter notebook.

Set up the environment

To begin, import the necessary dependencies for interacting with Amazon Bedrock. The following is an example of how you can do this in Python.

First step is to import boto3 and json:

import boto3, json

Next, create an instance of the Amazon Bedrock client. This client will serve as the entry point for interacting with the FMs. The following is a code example of how to create the client:

bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

Define prompts and code snippets

With the Amazon Bedrock client set up, define prompts and code snippets that will be used to interact with the FMs. These prompts can include natural language instructions or code snippets that the model will process and generate output based on.

In this example, we asked the model, “Hello, who are you?”.

To send the prompt to the API endpoint, you need some keyword arguments to pass in. You can get these arguments from the Amazon Bedrock console.

On the Amazon Bedrock console, choose Base models in the navigation pane.

Select Titan Text G1 – Express.

Choose the model name (Titan Text G1 – Express) and go to the API request.

Copy the API request:

{
"modelId": "amazon.titan-text-express-v1",
"contentType": "application/json",
"accept": "application/json",
"body": "{"inputText":"this is where you place your input text","textGenerationConfig":{"maxTokenCount":8192,"stopSequences":[],"temperature":0,"topP":1}}"
}

Insert this code in the Jupyter notebook with the following minor modifications:
- We post the API requests to keyword arguments (kwargs).
- The next change is on the prompt. We will replace ”this is where you place your input text” by ”Hello, who are you?”
Print the keyword arguments:

kwargs = {
 "modelId": "amazon.titan-text-express-v1",
 "contentType": "application/json",
 "accept": "application/json",
 "body": "{"inputText":"Hello, who are you?","textGenerationConfig":{"maxTokenCount":8192,"stopSequences":[],"temperature":0,"topP":1}}"
}
print(kwargs)

This should give you the following output:

{'modelId': 'amazon.titan-text-express-v1', 'contentType': 'application/json', 'accept': 'application/json', 'body': '{"inputText":"Hello, who are you?","textGenerationConfig":{"maxTokenCount":8192,"stopSequences":[],"temperature":0,"topP":1}}'}

Invoke the model

With the prompt defined, you can now invoke the Amazon Bedrock FM.

Pass the prompt to the client:

response = bedrock_runtime.invoke_model(**kwargs)
response

This will invoke the Amazon Bedrock model with the provided prompt and print the generated streaming body object response.

{'ResponseMetadata': {'RequestId': '3cfe2718-b018-4a50-94e3-59e2080c75a3', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Fri, 18 Oct 2024 11:30:14 GMT', 'content-type': 'application/json', 'content-length': '255', 'connection': 'keep-alive', 'x-amzn-requestid': '3cfe2718-b018-4a50-94e3-59e2080c75a3', 'x-amzn-bedrock-invocation-latency': '1980', 'x-amzn-bedrock-output-token-count': '37', 'x-amzn-bedrock-input-token-count': '6'}, 'RetryAttempts': 0}, 'contentType': 'application/json', 'body': <botocore.response.StreamingBody at 0x105e8e7a0>}

The preceding Amazon Bedrock runtime invoke model will work for the FM you choose to invoke.

Unpack the JSON string as follows:

response_body = json.loads(response.get('body').read())
response_body

You should get a response as follows (this is the response we got from the Titan Text G1 – Express model for the prompt we supplied).

{'inputTextTokenCount': 6, 'results': [{'tokenCount': 37, 'outputText': 'nI am Amazon Titan, a large language model built by AWS. It is designed to assist you with tasks and answer any questions you may have. How may I help you?', 'completionReason': 'FINISH'}]}

Experiment with different models

Amazon Bedrock offers various FMs, each with its own strengths and capabilities. You can specify which model you want to use by passing the model_name parameter when creating the Amazon Bedrock client.

Like the previous Titan Text G1 – Express example, get the API request from the Amazon Bedrock console. This time, we use Anthropic’s Claude on Amazon Bedrock.

{
"modelId": "anthropic.claude-v2",
"contentType": "application/json",
"accept": "*/*",
"body": "{"prompt":"\n\nHuman: Hello world\n\nAssistant:","max_tokens_to_sample":300,"temperature":0.5,"top_k":250,"top_p":1,"stop_sequences":["\n\nHuman:"],"anthropic_version":"bedrock-2023-05-31"}"
}

Anthropic’s Claude accepts the prompt in a different way (\n\nHuman:), so the API request on the Amazon Bedrock console provides the prompt in the way that Anthropic’s Claude can accept.

Edit the API request and put it in the keyword argument:

kwargs = {
  "modelId": "anthropic.claude-v2",
  "contentType": "application/json",
  "accept": "*/*",
  "body": "{"prompt":"\n\nHuman: we have received some text without any context.\nWe will need to label the text with a title so that others can quickly see what the text is about \n\nHere is the text between these <text></text> XML tags\n\n<text>\nToday I sent to the beach and saw a whale. I ate an ice-cream and swam in the sea\n</text>\n\nProvide title between <title></title> XML tags\n\nAssistant:","max_tokens_to_sample":300,"temperature":0.5,"top_k":250,"top_p":1,"stop_sequences":["\n\nHuman:"],"anthropic_version":"bedrock-2023-05-31"}"
}
print(kwargs)

You should get the following response:

{'modelId': 'anthropic.claude-v2', 'contentType': 'application/json', 'accept': '*/*', 'body': '{"prompt":"\n\nHuman: we have received some text without any context.\nWe will need to label the text with a title so that others can quickly see what the text is about \n\nHere is the text between these <text></text> XML tags\n\n<text>\nToday I sent to the beach and saw a whale. I ate an ice-cream and swam in the sea\n</text>\n\nProvide title between <title></title> XML tags\n\nAssistant:","max_tokens_to_sample":300,"temperature":0.5,"top_k":250,"top_p":1,"stop_sequences":["\n\nHuman:"],"anthropic_version":"bedrock-2023-05-31"}'}

With the prompt defined, you can now invoke the Amazon Bedrock FM by passing the prompt to the client:

response = bedrock_runtime.invoke_model(**kwargs)
response

You should get the following output:

{'ResponseMetadata': {'RequestId': '72d2b1c7-cbc8-42ed-9098-2b4eb41cd14e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Thu, 17 Oct 2024 15:07:23 GMT', 'content-type': 'application/json', 'content-length': '121', 'connection': 'keep-alive', 'x-amzn-requestid': '72d2b1c7-cbc8-42ed-9098-2b4eb41cd14e', 'x-amzn-bedrock-invocation-latency': '538', 'x-amzn-bedrock-output-token-count': '15', 'x-amzn-bedrock-input-token-count': '100'}, 'RetryAttempts': 0}, 'contentType': 'application/json', 'body': <botocore.response.StreamingBody at 0x1200b5990>}

Unpack the JSON string as follows:

response_body = json.loads(response.get('body').read())
response_body

This results in the following output on the title for the given text.

{'type': 'completion', 'completion': ' <title>A Day at the Beach</title>', 'stop_reason': 'stop_sequence', 'stop': 'nnHuman:'}

Print the completion:

completion = response_body.get('completion')
completion

Because the response is returned in the XML tags as you defined, you can consume the response and display it to the client.

' <title>A Day at the Beach</title>'

Invoke model with streaming code

For certain models and use cases, Amazon Bedrock supports streaming invocations, which allow you to interact with the model in real time. This can be particularly useful for conversational AI or interactive applications where you need to exchange multiple prompts and responses with the model. For example, if you’re asking the FM for an article or story, you might want to stream the output of the generated content.

Import the dependencies and create the Amazon Bedrock client:

import boto3, json
bedrock_runtime = boto3.client(
service_name='bedrock-runtime',
region_name='us-east-1'
)

Define the prompt as follows:

prompt = "write an article about fictional planet Foobar"

Edit the API request and put it in keyword argument as before:
We use the API request of the claude-v2 model.

kwargs = {
  "modelId": "anthropic.claude-v2",
  "contentType": "application/json",
  "accept": "*/*",
  "body": "{"prompt":"\n\nHuman: " + prompt + "\nAssistant:","max_tokens_to_sample":300,"temperature":0.5,"top_k":250,"top_p":1,"stop_sequences":["\n\nHuman:"],"anthropic_version":"bedrock-2023-05-31"}"
}

You can now invoke the Amazon Bedrock FM by passing the prompt to the client:
We use invoke_model_with_response_stream instead of invoke_model.

response = bedrock_runtime.invoke_model_with_response_stream(**kwargs)

stream = response.get('body')
if stream:
    for event in stream:
        chunk = event.get('chunk')
        if chunk:
            print(json.loads(chunk.get('bytes')).get('completion'), end="")

You get a response like the following as streaming output:

Here is a draft article about the fictional planet Foobar: Exploring the Mysteries of Planet Foobar Far off in a distant solar system lies the mysterious planet Foobar. This strange world has confounded scientists and explorers for centuries with its bizarre environments and alien lifeforms. Foobar is slightly larger than Earth and orbits a small, dim red star. From space, the planet appears rusty orange due to its sandy deserts and red rock formations. While the planet looks barren and dry at first glance, it actually contains a diverse array of ecosystems. The poles of Foobar are covered in icy tundra, home to resilient lichen-like plants and furry, six-legged mammals. Moving towards the equator, the tundra slowly gives way to rocky badlands dotted with scrubby vegetation. This arid zone contains ancient dried up riverbeds that point to a once lush environment. The heart of Foobar is dominated by expansive deserts of fine, deep red sand. These deserts experience scorching heat during the day but drop to freezing temperatures at night. Hardy cactus-like plants manage to thrive in this harsh landscape alongside tough reptilian creatures. Oases rich with palm-like trees can occasionally be found tucked away in hidden canyons. Scattered throughout Foobar are pockets of tropical jungles thriving along rivers and wetlands.

Conclusion

In this post, we showed how to integrate Amazon Bedrock FMs into your code base. With Amazon Bedrock, you can use state-of-the-art generative AI capabilities without the need for training custom models, accelerating your development process and enabling you to build powerful applications with advanced NLP features.

Whether you’re building a conversational AI assistant, a code generation tool, or another application that requires NLP capabilities, Amazon Bedrock provides a simple and efficient solution. By using the power of FMs through Amazon Bedrock APIs, you can focus on building innovative solutions and delivering value to your users, without worrying about the underlying complexities of language models.

As you continue to explore and integrate Amazon Bedrock into your projects, remember to stay up to date with the latest updates and features offered by the service. Additionally, consider exploring other AWS services and tools that can complement and enhance your AI-driven applications, such as Amazon SageMaker for machine learning model training and deployment, or Amazon Lex for building conversational interfaces.

To further explore the capabilities of Amazon Bedrock, refer to the following resources:

Share and learn with our generative AI community at community.aws.

Happy coding and building with Amazon Bedrock!

About the Authors

Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customer guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

YaduKishore Tatavarthi is a Senior Partner Solutions Architect at Amazon Web Services, supporting customers and partners worldwide. For the past 20 years, he has been helping customers build enterprise data strategies, advising them on Generative AI, cloud implementations, migrations, reference architecture creation, data modeling best practices, and data lake/warehouse architectures.

Build and deploy a UI for your generative AI applications with AWS and Python

November 6, 2024

by Lior Perez Amazon AWS

The emergence of generative AI has ushered in a new era of possibilities, enabling the creation of human-like text, images, code, and more. However, as exciting as these advancements are, data scientists often face challenges when it comes to developing UIs and to prototyping and interacting with their business users. Traditionally, building frontend and backend applications has required knowledge of web development frameworks and infrastructure management, which can be daunting for those with expertise primarily in data science and machine learning.

AWS provides a powerful set of tools and services that simplify the process of building and deploying generative AI applications, even for those with limited experience in frontend and backend development. In this post, we explore a practical solution that uses Streamlit, a Python library for building interactive data applications, and AWS services like Amazon Elastic Container Service (Amazon ECS), Amazon Cognito, and the AWS Cloud Development Kit (AWS CDK) to create a user-friendly generative AI application with authentication and deployment.

Solution overview

For this solution, you deploy a demo application that provides a clean and intuitive UI for interacting with a generative AI model, as illustrated in the following screenshot.

The UI consists of a text input area where users can enter their queries, and an output area to display the generated results.

The default interface is simple and straightforward, but you can extend and customize it to fit your specific needs. With Streamlit’s flexibility, you can add additional features, adjust the styling, and integrate other functionalities as required by your use case.

The solution we explore consists of two main components: a Python application for the UI and an AWS deployment architecture for hosting and serving the application securely.

The Python application uses the Streamlit library to provide a user-friendly interface for interacting with a generative AI model. Streamlit allows data scientists to create interactive web applications using Python, using their existing skills and knowledge. With Streamlit, you can quickly build and iterate on your application without the need for extensive frontend development experience.

The AWS deployment architecture makes sure the Python application is hosted and accessible from the internet to authenticated users. The solution uses the following key components:

Amazon ECS and AWS Fargate provide a serverless container orchestration platform for running the Python application
Amazon Cognito handles user authentication, making sure only authorized users can access the generative AI application
Application Load Balancer (ALB) and Amazon CloudFront are responsible for load balancing and content delivery, so the application is available for users worldwide
The AWS CDK allows you to define and provision AWS infrastructure resources using familiar programming languages like Python
Amazon Bedrock is a fully managed service that offers a choice of high-performing generative AI models through an API

The following diagram illustrates this architecture.

Prerequisites

As a prerequisite, you need to enable model access in Amazon Bedrock and have access to a Linux or macOS development environment. You could also use a Windows development environment, in which case you need to update the instructions in this post.

Access to Amazon Bedrock foundation models is not granted by default. Complete the following steps to enable access to Anthropic’s Claude on Amazon Bedrock, which we use as part of this post:

Sign in to the AWS Management Console.
Choose the us-east-1 AWS Region from the top right corner.
On the Amazon Bedrock console, choose Model access in the navigation pane.
Choose Manage model access.
Select the model you want access to (for this post, Anthropic’s Claude). You can also select other models for future use.
Choose Next and then Submit to confirm your selection.

For more information on how to manage model access, see Access Amazon Bedrock foundation models.

Set up your development environment

To get started with deploying the Streamlit application, you need access to a development environment with the following software installed:

Python version 3.8 or later. You can download Python from the official website or use your Linux distribution’s package manager.
The AWS Command Line Interface (AWS CLI). For installation instructions, see Installing or updating to the latest version of the AWS CLI.
The AWS CDK. For installation instructions, see Getting started with the AWS CDK.
Docker or Colima. For installation instructions, see Install Docker Engine. For macOS, we have tested the deployment with Colima container runtimes in replacement for Docker Desktop.

You also need to configure the AWS CLI. One way to do it is to get your access key through the console, and use the aws configure command in your terminal to set up your credentials.

Clone the GitHub repository

Use the terminal of your development environment to enter the commands in the following steps:

Clone the deploy-streamlit-app repository from the AWS Samples GitHub repository:

git clone https://github.com/aws-samples/deploy-streamlit-app.git

Navigate to the cloned repository:

cd deploy-streamlit-app

Create the Python virtual environment and install the AWS CDK

Complete the following steps to set up the virtual environment and the AWS CDK:

Create a new Python virtual environment (your Python version should be 3.8 or greater):

python3 -m venv .venv

Activate the virtual environment:

source .venv/bin/activate

Install the AWS CDK, which is in the required Python dependencies:

pip install -r requirements.txt

Configure the Streamlit application

Complete the following steps to configure the Streamlit application:

In the docker_app directory, locate the config_file.py file.
Open config_file.py in your editor and modify the STACK_NAME and CUSTOM_HEADER_VALUE variables:
1. The stack name enables you to deploy multiple applications in the same account. Choose a different stack name for each application. For your first application, you can leave the default value.
2. The custom header value is a security token that CloudFront uses to authenticate on the load balancer. You can choose it randomly, and it must be kept secret.

Deploy the AWS CDK template

Complete the following steps to deploy the AWS CDK template:

From your terminal, bootstrap the AWS CDK:

cdk bootstrap

Deploy the AWS CDK template, which will create the necessary AWS resources:

cdk deploy

Enter y (yes) when asked if you want to deploy the changes.

The deployment process may take 5–10 minutes. When it’s complete, note the CloudFront distribution URL and Amazon Cognito user pool ID from the output.

Create an Amazon Cognito user

Complete the following steps to create an Amazon Cognito user:

On the Amazon Cognito console, navigate to the user pool that you created as part of the AWS CDK deployment.
On the Users tab, choose Create user.

Enter a user name and password.
Choose Create user.

Access the Streamlit application

Complete the following steps to access the Streamlit application:

Open a new web browser window or tab and navigate to the CloudFront distribution URL from the AWS CDK deployment output.

If you have not noted this URL, you can open the AWS CloudFormation console and find it in the outputs of the stack.

You should now be able to access and interact with the Streamlit application, which is deployed and running on AWS using the provided AWS CDK template.

This deployment is intended as a starting point and a demo. Before using this application in a production environment, you should thoroughly review and implement appropriate security measures, such as configuring HTTPS on the load balancer and following AWS best practices for securing your resources. See the README.md file in the GitHub repository for more information.

Customize the application

The aws-samples/deploy-streamlit-app GitHub repository provides a solid foundation for building and deploying generative AI applications, but it’s also highly customizable and extensible.

Let’s explore how you can customize the Streamlit application. Because the application is written in Python, you can modify it to integrate with different generative AI models, add new features, or change the UI to better align with your application’s requirements.

For example, let’s say you want to add a button to invoke the LLM answer instead of invoking it automatically when the user enters input text. Complete the following steps to modify the docker_app/app.py file:

After the definition of the input_sent text input, add a Streamlit button:

# Insert this after the line starting with input_sent = …
submit_button = st.button("Get LLM Response")

Change the if condition to check if the button is clicked instead of checking for input_sent:

# Replace the line `if input_sent:` by the following
if submit_button:

Redeploy the application by entering the following in the terminal:

cdk deploy

The deployment should take less than 5 minutes. In the next section, we show how to test your changes locally before deploying, which will accelerate your development workflow.

When the deployment is complete, refresh the webpage in your browser.

The Streamlit application will now display a button labeled Get LLM Response. When the user chooses this button, the LLM will be invoked, and the output will be displayed on the UI.

This is just one example of how you can customize the Streamlit application to meet your specific requirements. You can modify the code further to integrate with different generative AI models, add additional features, or enhance the UI as needed.

Test your changes locally before deploying

Although deploying the application using cdk deploy allows you to test your changes in the actual AWS environment, it can be time-consuming, especially during the development and testing phase. Fortunately, you can run and test your application locally before deploying it to AWS.

To test your changes locally, follow these steps:

In your terminal, navigate to the docker_app directory, where the Streamlit application is located:

cd docker_app

If you haven’t already, install the dependencies of the Python application. These dependencies are different from the ones of the AWS CDK application that you installed previously.

pip install -r requirements.txt

Start the Streamlit server with the following command:

streamlit run app.py --server.port 8080

This will start the Streamlit application on port 8080.

You should now be able to interact with the locally running Streamlit application and test your changes without having to redeploy the application to AWS.

Remember to stop the Streamlit server (by pressing Ctrl+C in the terminal) when you’re done testing.

By testing your changes locally, you can significantly speed up the development and testing cycle, allowing you to iterate more quickly and catch issues early in the process.

Clean up

To avoid incurring additional charges, clean up the resources created during this demo:

Open the terminal in your development environment.
Make sure you’re in the root directory of the project and your virtual environment is activated:

cd ~/environment/deploy-streamlit-app
source .venv/bin/activate

Destroy the AWS CDK stack:

cdk destroy

Confirm the deletion by entering yes when prompted.

Conclusion

Building and deploying user-friendly generative AI applications no longer requires extensive knowledge of frontend and backend development frameworks. By using Streamlit and AWS services, data scientists can focus on their core expertise while still delivering secure, scalable, and accessible applications to business users.

The full code of the demo is available in the GitHub repository. It provides a valuable starting point for building and deploying generative AI applications, allowing you to quickly set up a working prototype and iterate from there. We encourage you to explore the repository and experiment with the provided solution to create your own applications.

As the adoption of generative AI continues to grow, the ability to build and deploy user-friendly applications will become increasingly important. With AWS and Python, data scientists now have the tools and resources to bridge the gap between their technical expertise and the need to showcase their models to business users through secure and accessible UIs.

About the Author

Lior Perez is a Principal Solutions Architect on the Construction team based in Toulouse, France. He enjoys supporting customers in their digital transformation journey, using big data, machine learning, and generative AI to help solve their business challenges. He is also personally passionate about robotics and IoT, and constantly looks for new ways to use technologies for innovation.

Unearth insights from audio transcripts generated by Amazon Transcribe using Amazon Bedrock

November 6, 2024

by Ana Echeverri Amazon AWS

Generative AI continues to push the boundaries of what’s possible. One area garnering significant attention is the use of generative AI to analyze audio and video transcripts, increasing our ability to extract valuable insights from content stored in audio or video files. Speech data is unique and complex, which makes it difficult to analyze and extract insights. Manually transcribing and analyzing it can be time-consuming and resource-intensive.

Existing methods for extracting insights from speech data often require tedious human transcription and review. You can use automatic voice recognition tools to convert your audio and video data to text. However, you still have to rely on manual processes for extracting specific insights and data points, or get summaries of the content. This approach is time-consuming and as organizations amass vast amounts of this content, the need for a more efficient and insightful solution becomes increasingly pressing. There is a significant opportunity to add business value given the amount of data organizations store in these formats and the valuable insights that might otherwise go undiscovered. The following are some of the new insights and capabilities that can be obtained through the use of large language models (LLM) with audio transcripts:

LLMs can analyze and understand the context of a conversation, not just the words spoken, but also the implied meaning, intent, and emotions. Previously, this would have required extensive human interpretation.
LLMs can perform advanced sentiment analysis. Previously, sentiment analysis could be captured, but LLMs can capture more emotions, such as sarcasm, ambivalence, or mixed feelings by understanding the context of the conversation.
LLMs can generate concise summarizations not just by extracting content, but by understanding the context of the conversation.
Users can now ask complex, natural language questions and receive insightful answers.
LLMs can infer personas or roles in a conversation, enabling targeted insights and actions.
LLMs can support the creation of new content based on audio assets or conversations following predetermined templates or flows.

In this post, we examine how to create business value through speech analytics with some examples focused on the following:

Automatically summarizing, categorizing, and analyzing marketing content such as podcasts, recorded interviews, or videos, and creating new marketing materials based on those assets
Automatically extracting key points, summaries, and sentiment from a recorded meeting (such as an earnings call)
Transcribing and analyzing contact center calls to improve customer experience.

The first step in getting these audio data insights involves transcribing the audio file using Amazon Transcribe. Amazon Transcribe is a machine learning (ML) based managed service that automatically converts speech to text, enabling developers to seamlessly integrate speech-to-text capabilities into their applications. It also recognizes multiple speakers, automatically redacts personally identifiable information (PII), and allows you to enhance the accuracy of a transcription by providing custom vocabularies specific to your industries or use case, or by using custom language models.

The second step involves using foundation models (FMs) with Amazon Bedrock to summarize the content, identify topics, and recognize conclusions, extracting valuable insights that can guide strategic decisions and innovations. Automatic generation of new content also adds value, increasing creativity and productivity.

Generative AI is reshaping the way we analyze audio transcripts, enabling you to unlock insights such as customer sentiment, pain points, common themes, avenues for risk mitigation, and more, that were previously obfuscated.

Use case overview

In this post, we discuss three example use cases in detail. The code artifacts are in Python. We used a Jupyter notebook to run the code snippets. You can follow along by creating and running a notebook in Amazon SageMaker Studio.

Audio summarization and insights, and automated generation of new content using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing marketing asset (a video) and create a new blog post to announce the launch of the video, create an abstract, and extract the main topics and the search engine optimization (SEO) keywords present in the post for documenting and categorizing the asset.

Transcribe audio with Amazon Transcribe

In this case, we use an AWS re:Invent 2023 technical talk as a sample. For the purpose of this notebook, we downloaded the MP4 file for the recording and stored it in an Amazon Simple Storage Service (Amazon S3) bucket.

The first step is to transcribe the audio file using Amazon Transcribe:

# Create a Amazon Transcribe transcirption job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"podcast-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript fromn"
                f"t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.

When the job is complete, you can inspect the transcription output and check the plain text transcript that was generated (the following has been trimmed for brevity):

# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

……….Once the alert comes, how do you kind of correlate these alerts, not by just text signals and text passing, but understanding the topology, the infrastructure topology that is supporting that application or that business service. It is the topology that ultimately gives the source of truth when an alert comes, right. That’s what we mean by a correlation that is assisted with topology in in in the thing that ultimately results in finding a probable root cause. And once ……….

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with a help of a Large Language Model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

Using the transcription from the technical talk, we use Amazon Bedrock to call an FM (we use Anthropic’s Claude 3 Sonnet on Amazon Bedrock in this case). You can choose from the language models available on Amazon Bedrock from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.

You can now perform additional tasks.

Extract the main topics with Amazon Bedrock

The following prompt provides instructions to ask the LLM for the main topics in the technical talk:

# Extracting the main topics with Amazon Bedrock
main_topics_prompt = """Based on the contents of <transcript></transcript>, what are the main topics being discussed? Display the topics as a list.

<transcript>
{transcript}
</transcript>
"""

user_message = {"role": "user", "content": main_topics_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
main_topics = response_body['content'][0]['text']
print(main_topics)

We have created a prompt that uses prompting best practices for Anthropic’s Claude. In this case, we pass the transcript within the <transcript></transcript> XML tags and ask for main topics discussed. Based on that, we get the following output:

Based on the contents of the transcript, the main topics being discussed are:

AI and Machine Learning in IT Operations

Challenges faced by IT Operations teams

Observable data explosion from cloud, on-premise, and edge infrastructure

The convergence of IT, SRE, and DevOps roles

Discovery to Resolution pipeline for IT Operations

Alert correlation and intelligent alerting using AI/ML

Probable root cause analysis with AI/ML

Automated remediation and runbook automation

Predictive analytics and preventive maintenance using AI/ML

Benefits of AI/ML for IT Operations (proactive operations, reduced human effort, improved service health, cost optimization)

Generate a blog post to announce the video with Amazon Bedrock

The following prompt provides instructions to ask the LLM for the generation of a new blog post announcing the launch of the preceding recording. Just like the preceding prompt, we use prompting best practices for Anthropic’s Claude:

# Generating a blog post to announce the Podcast with Amazon Bedrock
generate_blog_prompt = """The contents of <transcript></transcript> are a transcription of a technical talk video recording. Based on this information write a blogpost that summarizes the technical talk and announces that the recording of this talk is now available for public consumption. Limit the blogpost to 1500 words. Do not include any information that is not mentioned in <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": generate_blog_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
generate_blog = response_body['content'][0]['text']
print(generate_blog)

Based on that, we get the following output:

Here is a draft 1462 word blogpost summarizing the technical talk and announcing the recording is now available:

Title: Revolutionizing IT Operations with AI/ML: A Future-Ready Approach

The world of IT operations is undergoing a seismic shift, driven by the explosion of observable data, the convergence of IT roles, and the advent of hybrid multi-cloud environments. Traditional reactive approaches are no longer sufficient to keep pace with the complexity and scale of modern IT systems. This is where AI and machine learning (ML) come into play, offering a future-ready approach to revolutionize IT operations.

In a recent technical talk, Verma from HPE OpsRamp outlined the drivers behind this AI/ML imperative and demonstrated how this cutting-edge technology is fundamentally transforming IT operations. The recording of this insightful session is now available for public consumption.

The Drivers Behind the AI/ML Imperative

The first driving force behind the adoption of AI/ML in IT operations is the sheer volume of observable data that………………..

Create an abstract with Amazon Bedrock

The following prompt provides instructions to create an abstract summarizing the content of the technical talk:

# Creating an Abstract with Amazon Bedrock
abstract_prompt = """The contents of <transcript></transcript> are a transcription of a technical talk video recording. Based on this information write an abstract for this technical talk in no more than 100 words. Do not include any information that is not mentioned in <transcript></transcript>.

<transcript>
{transcript}
</transcript>
"""

user_message = {"role": "user", "content": abstract_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
abstract = response_body['content'][0]['text']
print(abstract)

Based on this, we get the following output:

Here is a 96 word abstract for the technical talk:

This talk discussed how AI and machine learning innovations are transforming IT operations for hybrid multi-cloud workloads. The speakers outlined the challenges of dealing with exploding observable data from modern infrastructure and the convergence of IT roles. They demonstrated how AI/ML techniques like intelligent alerting, alert correlation, probable root cause analysis, and automated remediation can drive more proactive, predictive operations. Key benefits showcased included reduced human effort, improved service health, and cost optimization. The talk featured a demo of the OpsRamp platform leveraging AI/ML models to streamline the discovery-to-resolution pipeline for managing edge-to-cloud environments.

Extract SEO keywords from the generated blog post with Amazon Bedrock

The following prompt provides instructions to extract the main SEO keywords from the generated blog post. Based on online research on how to extract SEO keywords from long-form text, we came up with the following list. This demonstrates how you can empower an LLM like Anthropic’s Claude to follow instructions and best practices for a particular task or domain. Also, the prompt specifies that the output should be in JSON. This is helpful in use cases where you want to programmatically get results from an LLM and therefore require consistent formatting. Based on best practices for Anthropic’s Claude, we use the Assistant message in the messages API to pre-fill the model’s response to have further control on the output format:

# Extracting SEO keywords from the generated blog post
SEO_keywords_prompt = """Extract the most relevant keywords and phrases from the given blog post text present in <blog></blog> that would be valuable for SEO (search engine optimization) based on the instructions present in <instructions></instructions> below. The ideal keywords should capture the main topics, concepts, entities, and high-value terms present in the content. Use JSON format with key "keywords" and value as an array of keywords. Skip the preamble; go straight into the JSON. 

<blog>
{textblog}
</blog>

<instructions>
1. Carefully read through the entire blog post text to understand the main topics, concepts, and ideas covered.
2. Identify important nouns, noun phrases, multi-word phrases, and relevant adjective-noun combinations that relate to the core subject matter of the post.
3. Look for words and phrases that potential searchers might use to find content like this.
4. Prioritize terms that are highly specific and relevant to the blog topic over generic words.
5. Vary the keyword length and include both head terms (shorter, more popular keywords) and long-tail terms (longer, more specific phrases).
6. Aim to extract around 10-20 of the most valuable, high-impact keywords and phrases for SEO.
</instructions>
"""

user_message =  {"role": "user", "content": SEO_keywords_prompt.format(textblog = generate_blog)}
assistant_message = {"role": "assistant", "content": '{"keywords": ['}
messages = [user_message, assistant_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
SEO_keywords = response_body['content'][0]['text']
SEO_keywords = '{"keywords": [' + SEO_keywords
SEO_keywords_json = json.loads(SEO_keywords)
print(SEO_keywords_json)

Based on this, we get the following output:

{‘keywords’: [‘AI-driven IT operations’, ‘machine learning IT operations’, ‘proactive IT operations’, ‘predictive IT operations’, ‘AI for hybrid cloud’, ‘AI for multi-cloud’, ‘FutureOps’, ‘AI-assisted IT operations’, ‘AI-powered event correlation’, ‘intelligent alerting’, ‘automated remediation workflows’, ‘predictive analytics for IT’, ‘AI anomaly detection’, ‘AI root cause analysis’, ‘AI-driven observability’, ‘AI for DevOps’, ‘AI for SRE’, ‘AI IT operations management’]}

For consistent formatting and structured output, you can also use the Converse and ConverseStream APIs in Amazon Bedrock and use the tool calling capabilities of the LLMs that offer it.

Generate a new blog post version emphasizing specific SEO keywords with Amazon Bedrock

The following prompt provides instructions to generate a new blog post emphasizing three keywords or phrases from the ones extracted in the previous step. In this case, we use the first three keywords in the returned keywords list. The instructions specified in the prompt are derived from SEO content best practices:

# Extracting SEO keywords from the generated blog post
rewrite_SEO_keywords_prompt = """Rewrite the given blog post text present in <blog></blog> to improve its ranking for 3 target keywords listed in <keywords></keywords>. Rewrite the blog post, naturally incorporating the target keywords multiple times throughout the text. Keep the rewritten content high-quality, readable, and relevant, while strategically emphasizing the keywords to improve SEO.

<blog>
{textblog}
</blog>

<keywords>
{kw1},
{kw2},
{kw3}
</keywords>
"""

user_message = {"role": "user", "content": rewrite_SEO_keywords_prompt.format(textblog = generate_blog, kw1 = SEO_keywords_json['keywords'][0], kw2 = SEO_keywords_json['keywords'][1], kw3 = SEO_keywords_json['keywords'][2])}
messages = [user_message]

body=json.dumps(
{
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1000,
"messages": messages
}
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
rewrite_SEO_blog = response_body['content'][0]['text']
print(rewrite_SEO_blog)

Based on this, we get the following output:

<blog>

Title: Unleash the Power of AI-Driven and Machine Learning IT Operations for Proactive IT Operations

In today’s rapidly evolving digital landscape, traditional IT operations approaches are struggling to keep pace. The explosion of data from cloud workloads, Kubernetes clusters, and distributed edge locations has overwhelmed conventional monitoring tools, leading to fragmented views and reactive operations that respond to issues after they occur. To stay ahead of the curve, organizations must embrace AI-driven and machine learning IT operations, paving the way for proactive IT operations that enhance efficiency and minimize downtime.

At a recent technical talk, Verma and Radha Krishna Kunni from OpsRamp (recently acquired by HPE) delved into the transformative impact of AI and machine learning on IT operations, DevOps, and SRE for hybrid multi-cloud environments. They highlighted the key challenges ops teams face today and introduced the innovative “FutureOps” approach, which leverages AI and machine learning to revolutionize IT operations.

The full video recording of this insightful technical talk is now available [link], providing a comprehensive understanding of…………

Summarize content discussed in a recorded meeting using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing recording from a meeting (we use a recording from an AWS earnings call) to summarize the content discussed, extract the key points, and provide details on the sentiment of the meeting. For additional information on this use case, see Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Amazon Bedrock Knowledge Bases or Amazon Q Business.

Transcribe audio with Amazon Transcribe

In this use case, we use an Amazon 2024 Q1 earnings call as a sample. For the purpose of this notebook, we downloaded the WAV file for the recording and stored in an S3 bucket.

The first step is to transcribe the audio file using Amazon Transcribe:

# Create a Amazon Transcribe transcription job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"meeting-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 10
    }
)
# Check whether the transcribe job is complete

max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript fromn"
                f"t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.

When the job is complete, you can inspect the transcription output and check for the plain text transcript that was generated:

import json
# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

Thank you for standing by. Good day, everyone and welcome to the amazon.com first quarter, 2024 financial results teleconference. At this time, all participants are in a listen only mode. After the presentation, we will conduct a question and answer session. Today’s call is being recorded and for opening remarks, I’ll be turning the call over to the Vice President of Investor………

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with a help of a Large Language Model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

Using the transcription from the earnings call recording, we use Amazon Bedrock to call an FM (we use Anthropic’s Claude 3 Sonnet in this case). You can choose from other FMs available on Amazon Bedrock.

You can now perform additional tasks.

Identify the financial ratios highlighted during this earnings call

The following prompt provides instructions to identify financial ratios highlighted during the earnings call and their implications:

# Identify the financial ratios highlighted during this earnings call 
financial_ratios_prompt = """Based on the contents of <transcript></transcript>,identify the financial ratios highlighted during this earnings call and their implications <transcript></transcript> . 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": financial_ratios_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
financial_ratios = response_body['content'][0]['text']
print(financial_ratios)

Based on this, we get the following output:

Based on the earnings call transcript, the following financial ratios and their implications were highlighted:

Operating Income Margin:

Amazon reported its highest ever quarterly operating income of $15.3 billion, which was $3.3 billion above the high end of their guidance range. This was driven by strong operational performance across all three reportable segments (North America, International, and AWS) and better-than-expected operating leverage, including lower cost to serve.

North America segment operating income was $5 billion with an operating margin of 5.8%, up 460 basis points year-over-year, driven by improvements in cost to serve, including benefits from regionalization efforts, more consolidated customer shipments, and improved leverage.

International segment operating income was $903 million with an operating margin of 2.8%, up 710 basis points year-over-year, primarily driven by cost efficiencies through network design enhancements and improved volume leverage in established countries, as well as progress in emerging countries.

AWS operating income was $9.4 billion, an increase of $4.3 billion year-over-year, with improved leverage from managing infrastructure and fixed costs while growing at a healthy rate.

Implication: The higher operating income margins across all segments indicate Amazon’s focus on driving efficiencies and improving profitability while continuing to invest in growth opportunities.

Revenue Growth:

Worldwide revenue was $143.3 billion, up 13% year-over-year (excluding the impact of foreign exchange).

AWS revenue grew 17.2% year-over-year, accelerating from 13.2% in Q4 2023, driven by strong demand for both generative AI and non-generative AI workloads.

Advertising revenue grew 24% year-over-year (excluding the impact of foreign exchange), primarily driven by sponsored products and improvements in relevancy and measurement capabilities.

Implication: The strong revenue growth, particularly in AWS and advertising, highlights Amazon’s diversified revenue streams and the growth opportunities in cloud computing and digital advertising.

Capital Expenditures (Capex):

Amazon anticipates a meaningful increase in overall capital expenditures in 2024, primarily driven by higher infrastructure Capex for growth in AWS, including generative AI investments.

In Q1 2024, Capex was $14 billion, expected to be the lowest quarter of the year.

Implication: The increase in Capex signals Amazon’s confidence in the strong demand for AWS and their commitment to investing in emerging technologies like generative AI to drive future growth.

Overall, the financial ratios and commentary indicate Amazon’s focus on improving profitability, driving operational efficiencies, and investing in growth opportunities, particularly in AWS and generative AI, while maintaining a diversified revenue stream and managing costs effectively.

Identify the speakers from the earnings call with Amazon Bedrock

The following prompt provides instructions to identify the speakers in the meeting from the transcription:

# Identify the speakers from the earnings call 
speakers_prompt = """Based on the contents of <transcript></transcript>,identify the speakers on this earnings call <transcript></transcript> . 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": speakers_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
speakers = response_body['content'][0]['text']
print(speakers)

Based on this, we get the following output:

Based on the transcript, the key speakers on this Amazon earnings call appear to be:

Andy Jassy – CEO of Amazon

Brian Olsavsky – CFO of Amazon

Dave Fildes – Vice President of Investor Relations at Amazon

The call begins with opening remarks from Dave Fildes, followed by prepared statements from Andy Jassy and Brian Olsavsky. They then take questions from analysts, with Andy and Brian providing the responses.

Obtain the challenges or negative areas discussed on the earnings call with Amazon Bedrock

The following prompt provides instructions to obtain the challenges or negative areas discussed from the transcription:

# Obtain the challenges or negative areas discussed on earnings call

challenges_prompt = """Based on the contents of <transcript></transcript>,Obtaining the challenges or negative areas discussed on earnings <transcript></transcript> . 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": challenges_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
challenges = response_body['content'][0]['text']
print(challenges)

Based on this, we get the following output:

Based on the transcript, some of the key challenges or negative areas discussed include:

Foreign exchange headwinds: Amazon faced an unfavorable impact from global currencies weakening against the U.S. dollar in Q1, leading to a $700 million or 50 basis point headwind to revenue compared to guidance.

Increasing capital expenditures: Amazon expects to meaningfully increase its capital expenditures year-over-year in 2024, primarily driven by higher infrastructure spending for AWS growth, including investments in generative AI capabilities.

Consumer spending concerns: Amazon mentioned keeping an eye on consumer spending trends, specifically in Europe, where it appears weaker relative to the U.S.

International segment profitability: While the international segment’s profitability improved, with an operating margin of 2.8%, Amazon acknowledged the need to continue working on cost efficiencies and profitability, particularly in emerging countries.

Cost optimization challenges: Although Amazon believes the majority of cost optimization efforts are behind them, there is still a need to continually streamline processes, optimize inventory placement, and invest in automation to further reduce the cost to serve.

Overall, the challenges centered around foreign exchange impacts, increasing capital intensity for AWS and generative AI investments, consumer demand uncertainties, and ongoing efforts to improve operational efficiencies and international profitability.

Get insights from a call center call between an agent and a customer using Amazon Transcribe and Amazon Bedrock

Through this use case, we demonstrate how to take an existing call recording from a contact center and summarize the content discussed, extract the main topic, key phrases, call reason, customer satisfaction, overall call sentiment, and sentiment about the products and services discussed. For additional details about this use case, see Live call analytics and agent assist for your contact center with Amazon language AI services and Post call analytics for your contact center with Amazon language AI services.

Transcribe audio with Amazon Transcribe

The first step is to transcribe the audio file using Amazon Transcribe. In this case, we use a sample from the Amazon Transcribe Post Call Analytics Solution GitHub repository. For the purpose of this notebook, we downloaded the WAV file and stored it in an S3 bucket.

# Create a Amazon Transcribe transcirption job by specifying the audio/video file's S3 location
import boto3
import time
import random
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"call_center-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp3',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,
        'MaxSpeakerLabels': 3
    }
)
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript fromn"
                f"t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete.

When the job’s complete, you can inspect the transcription output and check for the plain text transcript that was generated:

import json
# Get the Transcribe Output JSON file
s3 = boto3.client('s3')
output_bucket = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[1]
output_file_key = job['TranscriptionJob']['Transcript']['TranscriptFileUri'].split('https://')[1].split('/',2)[2]
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

# Let's see what we have inside the job output JSON
print(transcription_output['results']['transcripts'][0]['transcript'])

Thank you for calling Big Jim’s Auto. This is Travis. How can I help you today? Hello, my name is Violet King and I bought a car not too long ago and a light is coming on um a light on the dashboard. And so I was wondering what I should do about that. Ok. It may depend on what kind of light we’re looking at here today, ma’am. Uh Could I get your first and last name spelled out for me so I can just get some information pulled up? Yes. My name is Violet Vviolet. My last name is King King. Ok, I got the call last week. Ok. Uh And what kind of car are we examining today? It’s, it’s a Ford Fusion. It’s 2017, 2017 Ford Fusion. OK. And for verification, ma’am. Do you happen to know the purchase date of the car? Yes, it, it, it was last Tuesday, August 10th. You say the 10th? Ok. And can you describe to me what kind of light we’re looking at? Y yes, it’s uh I, I think it’s a, an oil, an oil light, an oil light? Ok. Ok. And uh just for clarity on my end, ma’am. Um, uh, is this the first call you’ve made regarding this? Yes. Ok. And, uh, this might be kind of a silly question because I know you just got the car. But sometimes they make me ask a silly question about how many miles has the car been driven since you bought it? Oh. I’m not sure. Should I check? Uh, no, that’s ok. I’ll just, I’ll just put in that. We don’t know at this time. It’s ok. Um, so, um, uh, under the warranty we offer, um, we, the, we don’t handle in house oil changes. Um, we basically, when, when someone buys a car from us, the warranty, we have, it, it covers some stuff like, um, weather damage. Um, and, uh, if the engine light comes on, we take a look at that, but the oil change is something that we just don’t have, uh, here at the dealership that’s a little bit outsourced out and they’re pretty backed up right now because a lot of people have been, uh, staying in due to the recent pandemic and now everyone’s just starting to get out and a whole bunch of places are just completely bogged down. So we have a place that we typically outsource to, um, and they’re, they’re pretty reasonable. They’re about, I wanna say somewhere between 25 and $35 to do an oil change. So it’s really not that bad, but they’re a little bit backed up right now from what I’ve heard, I would recommend giving them a call as soon as you can before they close. Ok. What is their number? Uh, give me a second. Let me just rustle through the desk here. See if I can find their information. Uh, ok. Ok. Yes, I’m all right with them one moment, please. Sure. Ok. All their number is 888 333 2222. Ok, and they can fix my car. Yeah, they should be able to handle the oil change. I’m sorry, that’s not something that we cover under the warranty that uh, we have, um, but they should be able to get you settled and, uh, sorted. Ok. Ok. Thank you. Cool. No problem. Have a good one. Thank you. Yup. Bye.

After you have validated the existence of the text, you can use Amazon Bedrock to analyze the output:

# Now let's use this transcript to extract insights with a help of a Large Language Model on Amazon Bedrock
# First let's initialize the bedrock runtime client to invoke the model. 
bedrock_runtime = boto3.client('bedrock-runtime')
# Selecting Claude 3 Sonnet
model_id = 'anthropic.claude-3-sonnet-20240229-v1:0'

Using the transcription from the recorded call center call, we use Amazon Bedrock to call an FM (we use Anthropic’s Claude 3 Sonnet in this case), but you can choose from the other language models available on Amazon Bedrock.

You can now perform additional tasks.

Summarize the call between agent and client with Amazon Bedrock

The following prompt provides instructions to summarize the call discussion from the transcription:

#Summarize the call between agent and customer
summarize_prompt = """Based on the contents of <transcript></transcript>,summarize the call between agent and customer with focus on resolution <transcript></transcript> . 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": summarize_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
summarization = response_body['content'][0]['text']
print(summarization)

We get the following output:

Based on the contents of the transcript, here is a summary of the call between the agent (Travis) and the customer (Violet King) with a focus on resolution:

Violet King called about a light on the dashboard of her recently purchased 2017 Ford Fusion from Big Jim’s Auto. The light appeared to be an oil light. Travis explained that while their warranty covers certain issues like weather damage and engine lights, it does not cover oil changes. He recommended calling an outsourced oil change service that Big Jim’s Auto typically uses, which charges between $25-35 for an oil change.

Travis provided the phone number for the oil change service (888-333-2222) and mentioned that they are currently backed up due to the recent pandemic. He advised Violet to call them as soon as possible before they close to get her oil changed and resolve the issue with the oil light on her dashboard.

The resolution was for Violet to contact the recommended third-party oil change service to have her car’s oil changed, which should address the oil light issue she was experiencing with her newly purchased vehicle.

Extract the main topics with Amazon Bedrock

The following prompt provides instructions to extract the main topics discussed in the conversation from the transcription:

# Extracting the main topics from the conversation
maintopic_prompt = """The contents of <transcript></transcript> are a transcription of a conversation between agent and client. Based on the information, extract the main topics from the conversation  <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": maintopic_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
main_topic = response_body['content'][0]['text']
print(main_topic)

We get the following output:

Based on the conversation transcript, the main topics appear to be:

Dashboard warning light (specifically an oil light) on a recently purchased 2017 Ford Fusion.

Determining if the issue is covered under the warranty provided by the dealership (Big Jim’s Auto).

Recommendation to contact an external auto service provider (phone number provided) for an oil change service, as the dealership does not handle oil changes in-house.

Confirming that the external auto service provider can likely resolve the oil light issue by performing an oil change.

Extract the key phrases with Amazon Bedrock

The following prompt provides instructions to extract the key phrases discussed in the conversation from the transcription:

# Extracting the key phrases with Amazon Bedrock
keyphrase_prompt = """The contents of <transcript></transcript> are a transcription of a conversation between agent and client. Based on the information, extract the key phrases discussed in the conversation <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": keyphrase_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
keyphrases = response_body['content'][0]['text']
print(keyphrases)

We get the following output:

Based on the conversation transcript, here are the key phrases discussed:

Oil light

2017 Ford Fusion

Purchase date: August 10th

First call regarding the issue

Car mileage unknown

Warranty does not cover oil changes

Outsourced oil change service recommended

Oil change service contact number: 888-333-2222

Oil change service cost: $25 – $35

Service is backed up due to the pandemic

Extract the reason why the client called the call center with Amazon Bedrock

The following prompt provides instructions to extract the reason for this client call to the call center from the transcription:

# Extracting the reason why client called the call center
reason_prompt = """The content of <transcript></transcript> is  transcription of a conversation between agent and client. Based on the information, extract the reason why client called the call center <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": reason_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
reason = response_body['content'][0]['text']
print(reason)

We get the following output:

Based on the transcription, the client called the call center because a light (specifically an oil light) was coming on the dashboard of their recently purchased 2017 Ford Fusion car. The client was seeking guidance on what to do about the oil light being on.

Extract the level of customer satisfaction with Amazon Bedrock

The following prompt provides instructions to extract the level of customer satisfaction experienced by the client from the transcription:

# Extracting the level of Customer Satisfaction 
satisfaction_prompt = """The content of <transcript></transcript> is  transcription of a conversation between agent and client. Based on the information, extract the level of customer satisfaction <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": satisfaction_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
csat= response_body['content'][0]['text']
print(csat)

We get the following output:

Based on the transcript, the level of customer satisfaction seems to be moderate.

Evidence:

The agent provided clear explanations regarding the issue with the oil light and why oil changes are not covered under their warranty.

The agent offered a recommendation for an external service provider that could perform the oil change, along with their contact information.

The customer acknowledged the information provided by the agent, indicating some level of satisfaction with the response.

However, there are no explicit statements from the customer expressing high satisfaction or dissatisfaction. The interaction remains polite and resolves the customer’s initial query, but there is no strong indication of exceptional satisfaction or disappointment.

Obtain the overall customer sentiment with Amazon Bedrock

The following prompt provides instructions to obtain the overall customer sentiment from the transcription:

# Extracting the overall customer sentiment.
sentiment_prompt = """The content of <transcript></transcript> is transcription of a conversation between agent and client. Based on the information, what is the overall customer sentiment of the conversation <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": sentiment_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
sentiment = response_body['content'][0]['text']
print(sentiment)

We get the following output:

Based on the conversation transcript, the overall customer sentiment seems neutral to slightly positive. Although the customer, Violet King, was initially concerned about a warning light on her recently purchased car’s dashboard, the agent (Travis) explained the situation clearly and provided a recommendation for getting an oil change from a third-party service provider. The customer acknowledged and accepted the suggestion without expressing significant frustration or dissatisfaction. The conversation ended on a polite note with the customer thanking the agent.

Obtain sentiment about the products or services discussed during the call with Amazon Bedrock

The following prompt provides instructions to obtain sentiment about products and services discussed during the call from the transcription:

## Obtaining sentiment about the products or services discussed.
sentiment_product_prompt = """The content of <transcript></transcript> is transcription of a conversation between agent and client. Based on the information, what is the overall sentiment about the products discussed in the conversation <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": sentiment_product_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
sentiment_product = response_body['content'][0]['text']
print(sentiment_product)

We get the following output:

Based on the conversation transcript, the overall sentiment about the products (the 2017 Ford Fusion car) discussed seems neutral to slightly positive. There are no major complaints or strong negative sentiments expressed about the car itself. The issue raised is regarding an oil light coming on, which seems to be a routine maintenance matter that needs to be addressed by getting an oil change. The agent helpfully provides the customer with information on where to get an oil change done, although mentioning that the warranty does not cover routine oil changes at the dealership. Overall, there is no indication of major dissatisfaction with the car itself from the conversation.

Create an email to send to the client with the call summary

The following prompt provides instructions to create an email to send to the client after the call is completed, summarizing the call:

## Creating an email to send to the client with the call summary
email_prompt = """The content of <transcript></transcript> is transcription of a conversation between agent and client. Based on the information, create an email to send to the client after the call is completed summarizing the call. 
User formal but friendly english style. <transcript></transcript>. 

<transcript>
{transcript}
</transcript>
"""

user_message =  {"role": "user", "content": email_prompt.format(transcript = transcription_output['results']['transcripts'][0]['transcript'])}

messages = [user_message]

body=json.dumps(
    {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": messages
    }
)
bedrock_response = bedrock_runtime.invoke_model(body=body, modelId=model_id)
response_body = json.loads(bedrock_response.get('body').read())
email_conversation = response_body['content'][0]['text']
print(email_conversation)

We get the following output:

Dear Ms. Violet King,

I hope this email finds you well. I’m writing to summarize our conversation regarding the issue with your 2017 Ford Fusion that you purchased last Tuesday, August 10th.

During our call, you mentioned that an oil light was illuminated on the dashboard of your vehicle. As discussed, our dealership’s warranty does not cover routine maintenance services like oil changes. However, we have a trusted partner that we typically recommend for such services.

The recommended auto service provider is reachable at 888-333-2222. They are experienced in handling oil changes and should be able to assist you with your vehicle’s needs. Please note that they have been experiencing a high volume of requests due to the recent pandemic, so it’s advisable to call them as soon as possible to schedule an appointment.

Conclusion

Using generative AI through Amazon Bedrock to analyze audio transcripts generated by Amazon Transcribe unlocks valuable insights that would otherwise remain hidden within the audio data. By combining the powerful speech-to-text capabilities of Amazon Transcribe with the natural language understanding and generation capabilities of LLMs like those available through Amazon Bedrock, businesses can more efficiently extract key information, generate summaries, identify topics and sentiments, and create new content from their audio and video assets. This approach not only saves time and resources compared to manual transcription and analysis, but also opens up new opportunities for using existing content in innovative ways.

Whether it’s repurposing marketing materials, quickly capturing key points from meetings, or improving customer experience through call center analytics, the combination of Amazon Transcribe and large language models (LLMs) on Amazon Bedrock provides a powerful solution for unlocking the full potential of audio data.As these use cases have demonstrated, this technology can be applied across various domains, from content creation and SEO optimization to business intelligence and customer service. By staying at the forefront of these advancements, organizations can gain a competitive edge by effectively harnessing the wealth of information contained within their audio and video repositories, driving insights, and making more informed decisions.

About the Authors

Ana Maria Echeverri is an AI/ML Worldwide Service Specialist at AWS, focused on driving adoption of generative AI speech analytics use cases. She has worked in the data and AI industry for over 30 years, with over 10 years focused on helping organizations grow their AI maturity and capabilities for successful execution of AI strategies.

Vishesh Jha is a Senior Solutions Architect at AWS. His area of interest lies in generative AI, and he has helped customers and partners get started with NLP using AWS services such as Amazon Bedrock, Amazon Transcribe, and Amazon SageMaker. He is an avid soccer fan, and in his free time enjoys watching and playing the sport. He also loves cooking, gaming, and traveling with his family.

Bala Krishna Jakka is a Technical Account manager at AWS, with a passion for contact center and generative AI technologies. With extensive expertise in helping organizations use cutting-edge solutions, he thrives on staying ahead of the curve in this rapidly evolving field. When not immersed in the realms of AI and customer experience, he finds joy in the game of cricket, showcasing his skills on the pitch. A devoted family man, he cherishes the moments spent with his loved ones, creating lasting memories and finding balance amidst the demands of his professional pursuits

Best practices and lessons for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock

November 1, 2024

by Yanyan Zhang Amazon AWS

Fine-tuning is a powerful approach in natural language processing (NLP) and generative AI, allowing businesses to tailor pre-trained large language models (LLMs) for specific tasks. This process involves updating the model’s weights to improve its performance on targeted applications. By fine-tuning, the LLM can adapt its knowledge base to specific data and tasks, resulting in enhanced task-specific capabilities. To achieve optimal results, having a clean, high-quality dataset is of paramount importance. A well-curated dataset forms the foundation for successful fine-tuning. Additionally, careful adjustment of hyperparameters such as learning rate multiplier and batch size plays a crucial role in optimizing the model’s adaptation to the target task.

The capabilities in Amazon Bedrock for fine-tuning LLMs offer substantial benefits for enterprises. This feature enables companies to optimize models like Anthropic’s Claude 3 Haiku on Amazon Bedrock for custom use cases, potentially achieving performance levels comparable to or even surpassing more advanced models such as Anthropic’s Claude 3 Opus or Anthropic’s Claude 3.5 Sonnet. The result is a significant improvement in task-specific performance, while potentially reducing costs and latency. This approach offers a versatile solution to satisfy your goals for performance and response time, allowing businesses to balance capability, domain knowledge, and efficiency in your AI-powered applications.

In this post, we explore the best practices and lessons learned for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock. We discuss the important components of fine-tuning, including use case definition, data preparation, model customization, and performance evaluation. This post dives deep into key aspects such as hyperparameter optimization, data cleaning techniques, and the effectiveness of fine-tuning compared to base models. We also provide insights on how to achieve optimal results for different dataset sizes and use cases, backed by experimental data and performance metrics.

As part of this post, we first introduce general best practices for fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock, and then present specific examples with the TAT- QA dataset (Tabular And Textual dataset for Question Answering).

Recommended use cases for fine-tuning

The use cases that are the most well-suited for fine-tuning Anthropic’s Claude 3 Haiku include the following:

Classification – For example, when you have 10,000 labeled examples and want Anthropic’s Claude 3 Haiku to do well at this task.
Structured outputs – For example, when you have 10,000 labeled examples specific to your use case and need Anthropic’s Claude 3 Haiku to accurately identify them.
Tools and APIs – For example, when you need to teach Anthropic’s Claude 3 Haiku how to use your APIs well.
Particular tone or language – For example, when you need Anthropic’s Claude 3 Haiku to respond with a particular tone or language specific to your brand.

Fine-tuning Anthropic’s Claude 3 Haiku has demonstrated superior performance compared to few-shot prompt engineering on base Anthropic’s Claude 3 Haiku, Anthropic’s Claude 3 Sonnet, and Anthropic’s Claude 3.5 Sonnet across various tasks. These tasks include summarization, classification, information retrieval, open-book Q&A, and custom language generation such as SQL. However, achieving optimal performance with fine-tuning requires effort and adherence to best practices.

To better illustrate the effectiveness of fine-tuning compared to other approaches, the following table provides a comprehensive overview of various problem types, examples, and their likelihood of success when using fine-tuning versus prompting with Retrieval Augmented Generation (RAG). This comparison can help you understand when and how to apply these different techniques effectively.

Problem	Examples	Likelihood of Success with Fine-tuning	Likelihood of Success with Prompting + RAG
Make the model follow a specific format or tone	Instruct the model to use a specific JSON schema or talk like the organization’s customer service reps	Very High	High
Teach the model a new skill	Teach the model how to call APIs, fill out proprietary documents, or classify customer support tickets	High	Medium
Teach the model a new skill, and hope it learns similar skills	Teach the model to summarize contract documents, in order to learn how to write better contract documents	Low	Medium
Teach the model new knowledge, and expect it to use that knowledge for general tasks	Teach the model the organizations’ acronyms or more music facts	Low	Medium

Prerequisites

Before diving into the best practices and optimizing fine-tuning LLMs on Amazon Bedrock, familiarize yourself with the general process and how-to outlined in Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality. The post provides essential background information and context for the fine-tuning process, including step-by-step guidance on fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock both through the Amazon Bedrock console and Amazon Bedrock API.

LLM fine-tuning lifecycle

The process of fine-tuning an LLM like Anthropic’s Claude 3 Haiku on Amazon Bedrock typically follows these key stages:

Use case definition – Clearly define the specific task or knowledge domain for fine-tuning
Data preparation – Gather and clean high-quality datasets relevant to the use case
Data formatting – Structure the data following best practices, including semantic blocks and system prompts where appropriate
Model customization – Configure the fine-tuning job on Amazon Bedrock, setting parameters like learning rate and batch size, enabling features like early stopping to prevent overfitting
Training and monitoring – Run the training job and monitor the status of training job
Performance evaluation – Assess the fine-tuned model’s performance against relevant metrics, comparing it to base models
Iteration and deployment – Based on the result, refine the process if needed, then deploy the model for production

Throughout this journey, depending on the business case, you may choose to combine fine-tuning with techniques like prompt engineering for optimal results. The process is inherently iterative, allowing for continuous improvement as new data or requirements emerge.

Use case and dataset

The TAT-QA dataset is related to a use case for question answering on a hybrid of tabular and textual content in finance where tabular data is organized in table formats such as HTML, JSON, Markdown, and LaTeX. We focus on the task of answering questions about the table. The evaluation metric is the F1 score that measures the word-to-word matching of the extracted content between the generated output and the ground truth answer. The TAT-QA dataset has been divided into train (28,832 rows), dev (3,632 rows), and test (3,572 rows).

The following screenshot provides a snapshot of the TAT-QA data, which comprises a table with tabular and textual financial data. Following this financial data table, a detailed question-answer set is presented to demonstrate the complexity and depth of analysis possible with the TAT-QA dataset. This comprehensive table is from the paper TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, and it includes several key components:

Reasoning types – Each question is categorized by the type of reasoning required
Questions – A variety of questions that test different aspects of understanding and interpreting the financial data
Answers – The correct responses to each question, showcasing the precision required in financial analysis
Scale – Where applicable, the unit of measurement for the answer
Derivation – For some questions, the calculation or logic used to arrive at the answer is provided

The following screenshot shows a formatted version of the data as JSONL and is passed to Anthropic’s Claude 3 Haiku for fine-tuning training data. The preceding table has been structured in JSONL format with system, user role (which contains the data and the question), and assistant role (which has answers). The table is enclosed within the XML tag <table><table>, helping Anthropic’s Claude 3 Haiku parse the prompt with the data from the table. For the model fine-tuning and performance evaluation, we randomly selected 10,000 examples from the TAT-QA dataset to fine-tune the model, and randomly picked 3,572 records from the remainder of the dataset as testing data.

Best practices for data cleaning and data validation

When fine-tuning the Anthropic’s Claude 3 Haiku model, the quality of training data is paramount and serves as the primary determinant of the output quality, surpassing the importance of any other step in the fine-tuning process. Our experiments have consistently shown that high-quality datasets, even if smaller in size, yield better results than a larger but less refined one. This “quality over quantity” approach should guide the entire data preparation process. Data cleaning and validation are essential steps in maintaining the quality of the training set. The following are two effective methods:

Human evaluation – This method involves subject matter experts (SMEs) manually reviewing each data point for quality and relevance. Though time-consuming, it provides unparalleled insight into the nuances of the specific tasks.
LLM as a judge – For large datasets, using Anthropic’s Claude models as a judge can be more efficient. For example, you can use Anthropic’s Claude 3.5 Sonnet as a judge to decide whether each provided training record meets the high quality requirement. The following is an example prompt template:

{'prompt': {
'system': "You are a reliable and impartial expert judge in question/answering data assessment. ",
'messages': [
{'role': 'user', 'content': [{'type': 'text', 'text': 'Your task is to take a question, an answer, and a context which may include multiple documents, and provide a judgment on whether the answer to the question is correct or not. This decision should be based either on the provided context or your general knowledge and memory. If the answer contradicts the information in context, it's incorrect. A correct answer is ideally derived from the given context. If no context is given, a correct answer should be factually true and directly and unambiguously address the question.nnProvide a short step-by-step reasoning with a maximum of 4 sentences within the <reason></reason> xml tags and provide a single correct or incorrect response within the <judgement></judgement> xml tags.n <context>n...n</context>n<question>n...n</question>n<answer>n...n</answer>n'}]}]}}

The following is a sample output from Anthropic’s Claude 3.5 Sonnet:

{'id': 'job_id',
'type': 'message',
'role': 'assistant',
'model': 'claude-3-5-sonnet-20240620',
'content': [{'type': 'text',
'text': '<reason>n1. I'll check the table for information... </reason>nn<judgement>correct</judgement>'}],
'stop_reason': 'end_turn',
'stop_sequence': None,
'usage': {'input_tokens': 923, 'output_tokens': 90}}

This LLM-as-a-judge approach is effective for large datasets, allowing for efficient and consistent quality assessment across a wide range of examples. It can help identify and filter out low-quality or irrelevant data points, making sure only the most suitable examples are used for fine-tuning.

The format of your training data is equally important. Although it’s optional, it’s highly recommended to include a system prompt that clearly defines the model’s role and tasks. In addition, including rationales within XML tags can provide valuable context for the model and facilitate extraction of key information. Prompt optimization is one of the key factors in improving model performance. Following established guidelines, such as those provided by Anthropic, can significantly enhance results. This might include structuring prompts with semantic blocks within XML tags, both in training samples and at inference time.

By adhering to these best practices in data cleaning, validation, and formatting, you can create a high-quality dataset that forms the foundation for successful fine-tuning. In the world of model training, quality outweighs quantity, and a well-prepared dataset is key to unlocking the full potential of fine-tuning Anthropic’s Claude 3 Haiku.

Best practices for performing model customization training jobs

When fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock, it’s crucial to optimize your training parameters to achieve the best possible performance. Our experiments have revealed several key insights that can guide you in effectively setting up your customization training jobs.

One of the most critical aspects of fine-tuning is selecting the right hyperparameters, particularly learning rate multiplier and batch size (see the appendix in this post for definitions). Our experiment results have shown that these two factors can significantly impact the model’s performance, with improvements ranging from 2–10% across different tasks. For the learning rate multiplier, the value ranges between 0.1–2.0, with a default value of 1.0. We suggest starting with the default value and potentially adjusting this value based on your evaluation result. Batch size is another important parameter, and its optimal value can vary depending on your dataset size. Based on our hyperparameter tuning experiments across different use cases, the API allows a range of 4–256, with a default of 32. However, we’ve observed that dynamically adjusting the batch size based on your dataset size can lead to better results:

For datasets with 1,000 or more examples, aim for a batch size between 32–64
For datasets between 500–1,000 examples, a batch size between 16–32 is generally suitable
For smaller datasets with fewer than 500 examples, consider a batch size between 4–16

The following chart illustrates how model performance improves as the size of the training dataset increases, as well as the change of optimal parameters, using the TAT-QA dataset. Each data point is annotated with the optimal learning rate multiplier (LRM), batch size (BS), and number of epochs (Epoch) used to achieve the best performance with the dataset size. We can observe that larger datasets tend to benefit from higher learning rates and batch sizes, whereas smaller datasets require more training epochs. The red dashed line is the baseline Anthropic’s Claude 3 Haiku performance without fine-tuning efforts.

By following these guidelines, you can configure an Anthropic’s Claude 3 Haiku fine-tuning job with a higher chance of success. However, remember that these are general recommendations and the optimal settings may vary depending on your specific use case and dataset characteristics.

In scenarios with large amounts of data (1,000–10,000 examples), the learning rate tends to have a more significant impact on performance. Conversely, for smaller datasets (32–100 examples), the batch size becomes the dominant factor.

Performance evaluations

The fine-tuned Anthropic’s Claude 3 Haiku model demonstrated substantial performance improvements over base models when evaluated on the financial Q&A task, highlighting the effectiveness of the fine-tuning process on specialized data. Based on the evaluation results, we found the following:

Fine-tuned Anthropic’s Claude 3 Haiku performed better than Anthropic’s Claude 3 Haiku, Anthropic’s Claude 3 Sonnet, and Anthropic’s Claude 3.5 Sonnet for TAT-QA dataset across the target use case of question answering on financial text and tabular content.
For the performance evaluation metric F1 score (see the appendix for definition), fine-tuned Anthropic’s Claude 3 Haiku achieved a score of 91.2%, which is a 24.60% improvement over the Anthropic’s Claude 3 Haiku base model’s score of 73.2%. Fine-tuned Anthropic’s Claude 3 Haiku also achieved a 19.6% improvement over the Anthropic’s Claude 3 Sonnet base model’s performance, which obtained an F1 score of 76.3%. Fine-tuned Anthropic’s Claude 3 Haiku even achieved better performance over the Anthropic’s Claude 3.5 Sonnet base model.

The following table provides a detailed comparison of the performance metrics for the fine-tuned Claude 3 Haiku model against various base models, illustrating the significant improvements achieved through fine-tuning.

.	.	.	.	.	Fine-Tuned Model Performance	Base Model Performance			Improvement: Fine-Tuned Anthropic’s Claude 3 Haiku vs. Base Models
Target Use Case	Task Type	Fine-Tuning Data Size	Test Data Size	Eval Metric	Anthropic’s Claude 3 Haiku	Anthropic’s Claude 3 Haiku (Base Model)	Anthropic’s Claude 3 Sonnet	Anthropic’s Claude 3.5 Sonnet	vs. Anthropic’s Claude 3 Haiku Base	vs. Anthropic’s Claude 3 Sonnet Base	vs. Anthropic’s Claude 3.5 Sonnet Base
TAT-QA	Q&A on financial text and tabular content	10,000	3,572	F1 score	91.2%	73.2%	76.3%	83.0%	24.6%	19.6%	9.9%

Few-shot examples improve performance not only on the base model, but also on fine-tuned models, especially when the fine-tuning data is small.

Fine-tuning also demonstrated significant benefits in reducing token usage. On the TAT-QA HTML test set (893 examples), the fine-tuned Anthropic’s Claude 3 Haiku model reduced the average output token count by 35% compared to the base model, as shown in the following table.

Model	Average Output Token	% Reduced	Median	% Reduced	Standard Deviation	Minimum Token	Maximum Token
Anthropic’s Claude 3 Haiku Base	34	–	28	–	27	13	245
Anthropic’s Claude 3 Haiku Fine-Tuned	22	35%	17	39%	14	13	179

We use the following figures to illustrate the token count distribution for both the base Anthropic’s Claude 3 Haiku and fine-tuned Anthropic’s Claude 3 Haiku models. The left graph shows the distribution for the base model, and the right graph displays the distribution for the fine-tuned model. These histograms demonstrate a shift towards more concise output in the fine-tuned model, with a notable reduction in the frequency of longer token sequences.

To further illustrate this improvement, consider the following example from the test set:

Question: "How did the company adopt Topic 606?"
Ground truth answer: "the modified retrospective method"
Base Anthropic’s Claude 3 Haiku response: "The company adopted the provisions of Topic 606 in fiscal 2019 utilizing the modified retrospective method"
Fine-tuned Anthropic’s Claude 3 Haiku response: "the modified retrospective method"

As evident from this example, the fine-tuned model produces a more concise and precise answer, matching the ground truth exactly, whereas the base model includes additional, unnecessary information. This reduction in token usage, combined with improved accuracy, can lead to enhanced efficiency and reduced costs in production deployments.

Conclusion

Fine-tuning Anthropic’s Claude 3 Haiku on Amazon Bedrock offers significant performance improvements for specialized tasks. Our experiments demonstrate that careful attention to data quality, hyperparameter optimization, and best practices in the fine-tuning process can yield substantial gains over base models. Key takeaways include the following:

The importance of high-quality, task-specific datasets, even if smaller in size
Optimal hyperparameter settings vary based on dataset size and task complexity
Fine-tuned models consistently outperform base models across various metrics
The process is iterative, allowing for continuous improvement as new data or requirements emerge

Although fine-tuning provides impressive results, combining it with other techniques like prompt engineering may lead to even better outcomes. As LLM technology continues to evolve, mastering fine-tuning techniques will be crucial for organizations looking to use these powerful models for specific use cases and tasks.

Now you’re ready to fine-tune Anthropic’s Claude 3 Haiku on Amazon Bedrock for your use case. We look forward to seeing what you build when you put this new technology to work for your business.

Appendix

We used the following hyperparameters as part of our fine-tuning:

Learning rate multiplier – Learning rate multiplier is one of the most critical hyperparameters in LLM fine-tuning. It influences the learning rate at which model parameters are updated after each batch.
Batch size – Batch size is the number of training examples processed in one iteration. It directly impacts GPU memory consumption and training dynamics.
Epoch – One epoch means the model has seen every example in the dataset one time. The number of epochs is a crucial hyperparameter that affects model performance and training efficiency.

For our evaluation, we used the F1 score, which is an evaluation metric to assess the performance of LLMs and traditional ML models.

To compute the F1 score for LLM evaluation, we need to define precision and recall at the token level. Precision measures the proportion of generated tokens that match the reference tokens, and recall measures the proportion of reference tokens that are captured by the generated tokens. The F1 score ranges from 0–100, with 100 being the best possible score and 0 being the lowest. However, interpretation can vary depending on the specific task and requirements.

We calculate these metrics as follows:

Precision = (Number of matching tokens in generated text) / (Total number of tokens in generated text)
Recall = (Number of matching tokens in generated text) / (Total number of tokens in reference text)
F1 = (2 * (Precision * Recall) / (Precision + Recall)) * 100

For example, let’s say the LLM generates the sentence “The cat sits on the mat in the sun” and the reference sentence is “The cat sits on the soft mat under the warm sun.” The precision would be 6/9 (6 matching tokens out of 9 generated tokens), and the recall would be 6/11 (6 matching tokens out of 11 reference tokens).

Precision = 6/9 ≈ 0.667
Recall = 6/11 ≈ 0.545
F1 score = (2 * (0.667 * 0.545) / (0.667 + 0.545)) * 100 ≈ 59.90

About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Sovik Kumar Nath is an AI/ML and Generative AI Senior Solutions Architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double master’s degrees from the University of South Florida and University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, and adventures.

Jennifer Zhu is a Senior Applied Scientist at AWS Bedrock, where she helps building and scaling generative AI applications with foundation models. Jennifer holds a PhD degree from Cornell University, and a master degree from University of San Francisco. Outside of work, she enjoys reading books and watching tennis games.

Fang Liu is a principal machine learning engineer at Amazon Web Services, where he has extensive experience in building AI/ML products using cutting-edge technologies. He has worked on notable projects such as Amazon Transcribe and Amazon Bedrock. Fang Liu holds a master’s degree in computer science from Tsinghua University.

Yanjun Qi is a Senior Applied Science Manager at the Amazon Bedrock Science. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.

Track, allocate, and manage your generative AI cost and usage with Amazon Bedrock

November 1, 2024

by Kyle Blocksom Amazon AWS

As enterprises increasingly embrace generative AI , they face challenges in managing the associated costs. With demand for generative AI applications surging across projects and multiple lines of business, accurately allocating and tracking spend becomes more complex. Organizations need to prioritize their generative AI spending based on business impact and criticality while maintaining cost transparency across customer and user segments. This visibility is essential for setting accurate pricing for generative AI offerings, implementing chargebacks, and establishing usage-based billing models.

Without a scalable approach to controlling costs, organizations risk unbudgeted usage and cost overruns. Manual spend monitoring and periodic usage limit adjustments are inefficient and prone to human error, leading to potential overspending. Although tagging is supported on a variety of Amazon Bedrock resources—including provisioned models, custom models, agents and agent aliases, model evaluations, prompts, prompt flows, knowledge bases, batch inference jobs, custom model jobs, and model duplication jobs—there was previously no capability for tagging on-demand foundation models. This limitation has added complexity to cost management for generative AI initiatives.

To address these challenges, Amazon Bedrock has launched a capability that organization can use to tag on-demand models and monitor associated costs. Organizations can now label all Amazon Bedrock models with AWS cost allocation tags, aligning usage to specific organizational taxonomies such as cost centers, business units, and applications. To manage their generative AI spend judiciously, organizations can use services like AWS Budgets to set tag-based budgets and alarms to monitor usage, and receive alerts for anomalies or predefined thresholds. This scalable, programmatic approach eliminates inefficient manual processes, reduces the risk of excess spending, and ensures that critical applications receive priority. Enhanced visibility and control over AI-related expenses enables organizations to maximize their generative AI investments and foster innovation.

Introducing Amazon Bedrock application inference profiles

Amazon Bedrock recently introduced cross-region inference, enabling automatic routing of inference requests across AWS Regions. This feature uses system-defined inference profiles (predefined by Amazon Bedrock), which configure different model Amazon Resource Names (ARNs) from various Regions and unify them under a single model identifier (both model ID and ARN). While this enhances flexibility in model usage, it doesn’t support attaching custom tags for tracking, managing, and controlling costs across workloads and tenants.

To bridge this gap, Amazon Bedrock now introduces application inference profiles, a new capability that allows organizations to apply custom cost allocation tags to track, manage, and control their Amazon Bedrock on-demand model costs and usage. This capability enables organizations to create custom inference profiles for Bedrock base foundation models, adding metadata specific to tenants, thereby streamlining resource allocation and cost monitoring across varied AI applications.

Creating application inference profiles

Application inference profiles allow users to define customized settings for inference requests and resource management. These profiles can be created in two ways:

Single model ARN configuration: Directly create an application inference profile using a single on-demand base model ARN, allowing quick setup with a chosen model.
Copy from system-defined inference profile: Copy an existing system-defined inference profile to create an application inference profile, which will inherit configurations such as cross-Region inference capabilities for enhanced scalability and resilience.

The application inference profile ARN has the following format, where the inference profile ID component is a unique 12-digit alphanumeric string generated by Amazon Bedrock upon profile creation.

arn:aws:bedrock:<region>:<account_id>:application-inference-profile/<inference_profile_id>

System-defined compared to application inference profiles

The primary distinction between system-defined and application inference profiles lies in their type attribute and resource specifications within the ARN namespace:

System-defined inference profiles: These have a type attribute of SYSTEM_DEFINED and utilize the inference-profile resource type. They’re designed to support cross-Region and multi-model capabilities but are managed centrally by AWS.

{
 …
"inferenceProfileArn": "arn:aws:bedrock:us-east-1:<Account ID>:inference-profile/us-1.anthropic.claude-3-sonnet-20240229-v1:0",
"inferenceProfileId": "us-1.anthropic.claude-3-sonnet-20240229-v1:0",
"inferenceProfileName": "US-1 Anthropic Claude 3 Sonnet",
"status": "ACTIVE",
"type": "SYSTEM_DEFINED",
…
}

Application inference profiles: These profiles have a type attribute of APPLICATION and use the application-inference-profile resource type. They’re user-defined, providing granular control and flexibility over model configurations and allowing organizations to tailor policies with attribute-based access control (ABAC) using AWS Identity and Access Management (IAM). This enables more precise IAM policy authoring to manage Amazon Bedrock access more securely and efficiently.
```
{
…
"inferenceProfileArn": "arn:aws:bedrock:us-east-1:<Account ID>:application-inference-profile/<Auto generated ID>",
"inferenceProfileId": <Auto generated ID>,
"inferenceProfileName": <User defined name>,
"status": "ACTIVE",
"type": "APPLICATION"
…
}
```

These differences are important when integrating with Amazon API Gateway or other API clients to help ensure correct model invocation, resource allocation, and workload prioritization. Organizations can apply customized policies based on profile type, enhancing control and security for distributed AI workloads. Both models are shown in the following figure.

Establishing application inference profiles for cost management

Imagine an insurance provider embarking on a journey to enhance customer experience through generative AI. The company identifies opportunities to automate claims processing, provide personalized policy recommendations, and improve risk assessment for clients across various regions. However, to realize this vision, the organization must adopt a robust framework for effectively managing their generative AI workloads.

The journey begins with the insurance provider creating application inference profiles that are tailored to their diverse business units. By assigning AWS cost allocation tags, the organization can effectively monitor and track their Bedrock spend patterns. For example, the claims processing team established an application inference profile with tags such as dept:claims, team:automation, and app:claims_chatbot. This tagging structure categorizes costs and allows assessment of usage against budgets.

Users can manage and use application inference profiles using Bedrock APIs or the boto3 SDK:

CreateInferenceProfile: Initiates a new inference profile, allowing users to configure the parameters for the profile.
GetInferenceProfile: Retrieves the details of a specific inference profile, including its configuration and current status.
ListInferenceProfiles: Lists all available inference profiles within the user’s account, providing an overview of the profiles that have been created.
TagResource: Allows users to attach tags to specific Bedrock resources, including application inference profiles, for better organization and cost tracking.
ListTagsForResource: Fetches the tags associated with a specific Bedrock resource, helping users understand how their resources are categorized.
UntagResource: Removes specified tags from a resource, allowing for management of resource organization.
Invoke models with application inference profiles:

- Converse API: Invokes the model using a specified inference profile for conversational interactions.
- ConverseStream API: Similar to the Converse API but supports streaming responses for real-time interactions.
- InvokeModel API: Invokes the model with a specified inference profile for general use cases.
- InvokeModelWithResponseStream API: Invokes the model and streams the response, useful for handling large data outputs or long-running processes.

Note that application inference profile APIs cannot be accessed through the AWS Management Console.

Invoke model with application inference profile using Converse API

The following example demonstrates how to create an application inference profile and then invoke the Converse API to engage in a conversation using that profile –

def create_inference_profile(profile_name, model_arn, tags):
    """Create Inference Profile using base model ARN"""
    response = bedrock.create_inference_profile(
        inferenceProfileName=profile_name,
        description="test",
        modelSource={'copyFrom': model_arn},
        tags=tags
    )
    print("CreateInferenceProfile Response:", response['ResponseMetadata']['HTTPStatusCode']),
    print(f"{response}n")
    return response

# Create Inference Profile
print("Testing CreateInferenceProfile...")
tags = [{'key': 'dept', 'value': 'claims'}]
base_model_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
claims_dept_claude_3_sonnet_profile = create_inference_profile("claims_dept_claude_3_sonnet_profile", base_model_arn, tags)

# Extracting the ARN and retrieving Application Inference Profile ID
claims_dept_claude_3_sonnet_profile_arn = claims_dept_claude_3_sonnet_profile['inferenceProfileArn']

def converse(model_id, messages):
    """Use the Converse API to engage in a conversation with the specified model"""
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=messages,
        inferenceConfig={
            'maxTokens': 300,  # Specify max tokens if needed
        }
    )
    
    status_code = response.get('ResponseMetadata', {}).get('HTTPStatusCode')
    print("Converse Response:", status_code)
    parsed_response = parse_converse_response(response)
    print(parsed_response)
    return response

# Example of Converse API with Application Inference Profile
print("nTesting Converse...")
prompt = "nnHuman: Tell me about Amazon Bedrock.nnAssistant:"
messages = [{"role": "user", "content": [{"text": prompt}]}]
response = converse(claims_dept_claude_3_sonnet_profile_arn, messages)

Tagging, resource management, and cost management with application inference profiles

Tagging within application inference profiles allows organizations to allocate costs with specific generative AI initiatives, ensuring precise expense tracking. Application inference profiles enable organizations to apply cost allocation tags at creation and support additional tagging through the existing TagResource and UnTagResource APIs, which allow metadata association with various AWS resources. Custom tags such as project_id, cost_center, model_version, and environment help categorize resources, improving cost transparency and allowing teams to monitor spend and usage against budgets.

Visualize cost and usage with application inference profiles and cost allocation tags

Leveraging cost allocation tags with tools like AWS Budgets, AWS Cost Anomaly Detection, AWS Cost Explorer, AWS Cost and Usage Reports (CUR), and Amazon CloudWatch provides organizations insights into spending trends, helping detect and address cost spikes early to stay within budget.

With AWS Budgets, organization can set tag-based thresholds and receive alerts as spending approach budget limits, offering a proactive approach to maintaining control over AI resource costs and quickly addressing any unexpected surges. For example, a $10,000 per month budget could be applied on a specific chatbot application for the Support Team in the Sales Department by applying the following tags to the application inference profile: dept:sales, team:support, and app:chat_app. AWS Cost Anomaly Detection can also monitor tagged resources for unusual spending patterns, making it easier to operationalize cost allocation tags by automatically identifying and flagging irregular costs.

The following AWS Budgets console screenshot illustrates an exceeded budget threshold:

For deeper analysis, AWS Cost Explorer and CUR enable organizations to analyze tagged resources daily, weekly, and monthly, supporting informed decisions on resource allocation and cost optimization. By visualizing cost and usage based on metadata attributes, such as tag key/value and ARN, organizations gain an actionable, granular view of their spending.

The following AWS Cost Explorer console screenshot illustrates a cost and usage graph filtered by tag key and value:

The following AWS Cost Explorer console screenshot illustrates a cost and usage graph filtered by Bedrock application inference profile ARN:

Organizations can also use Amazon CloudWatch to monitor runtime metrics for Bedrock applications, providing additional insights into performance and cost management. Metrics can be graphed by application inference profile, and teams can set alarms based on thresholds for tagged resources. Notifications and automated responses triggered by these alarms enable real-time management of cost and resource usage, preventing budget overruns and maintaining financial stability for generate AI workloads.

The following Amazon CloudWatch console screenshot highlights Bedrock runtime metrics filtered by Bedrock application inference profile ARN:

The following Amazon CloudWatch console screenshot highlights an invocation limit alarm filtered by Bedrock application inference profile ARN:

Through the combined use of tagging, budgeting, anomaly detection, and detailed cost analysis, organizations can effectively manage their AI investments. By leveraging these AWS tools, teams can maintain a clear view of spending patterns, enabling more informed decision-making and maximizing the value of their generative AI initiatives while ensuring critical applications remain within budget.

Retrieving application inference profile ARN based on the tags for Model invocation

Organizations often use a generative AI gateway or large language model proxy when calling Amazon Bedrock APIs, including model inference calls. With the introduction of application inference profiles, organizations need to retrieve the inference profile ARN to invoke model inference for on-demand foundation models. There are two primary approaches to obtain the appropriate inference profile ARN.

Static configuration approach: This method involves maintaining a static configuration file in the AWS Systems Manager Parameter Store or AWS Secrets Manager that maps tenant/workload keys to their corresponding application inference profile ARNs. While this approach offers simplicity in implementation, it has significant limitations. As the number of inference profiles scales from tens to hundreds or even thousands, managing and updating this configuration file becomes increasingly cumbersome. The static nature of this method requires manual updates whenever changes occur, which can lead to inconsistencies and increased maintenance overhead, especially in large-scale deployments where organizations need to dynamically retrieve the correct inference profile based on tags.
Dynamic retrieval using the Resource Groups API: The second, more robust approach leverages the AWS Resource Groups GetResources API to dynamically retrieve application inference profile ARNs based on resource and tag filters. This method allows for flexible querying using various tag keys such as tenant ID, project ID, department ID, workload ID, model ID, and region. The primary advantage of this approach is its scalability and dynamic nature, enabling real-time retrieval of application inference profile ARNs based on current tag configurations.

However, there are considerations to keep in mind. The GetResources API has throttling limits, necessitating the implementation of a caching mechanism. Organizations should maintain a cache with a Time-To-Live (TTL) based on the API’s output to optimize performance and reduce API calls. Additionally, implementing thread safety is crucial to help ensure that organizations always read the most up-to-date inference profile ARNs when the cache is being refreshed based on the TTL.

As illustrated in the following diagram, this dynamic approach involves a client making a request to the Resource Groups service with specific resource type and tag filters. The service returns the corresponding application inference profile ARN, which is then cached for a set period. The client can then use this ARN to invoke the Bedrock model through the InvokeModel or Converse API.

By adopting this dynamic retrieval method, organizations can create a more flexible and scalable system for managing application inference profiles, allowing for more straightforward adaptation to changing requirements and growth in the number of profiles.

The architecture in the preceding figure illustrates two methods for dynamically retrieving inference profile ARNs based on tags. Let’s describe both approaches with their pros and cons:

Bedrock client maintaining the cache with TTL: This method involves the client directly querying the AWS ResourceGroups service using the GetResources API based on resource type and tag filters. The client caches the retrieved keys in a client-maintained cache with a TTL. The client is responsible for refreshing the cache by calling the GetResources API in the thread safe way.
Lambda-based Method: This approach uses AWS Lambda as an intermediary between the calling client and the ResourceGroups API. This method employs Lambda Extensions core with an in-memory cache, potentially reducing the number of API calls to ResourceGroups. It also interacts with Parameter Store, which can be used for configuration management or storing cached data persistently.

Both methods use similar filtering criteria (resource-type-filter and tag-filters) to query the ResourceGroup API, allowing for precise retrieval of inference profile ARNs based on attributes such as tenant, model, and Region. The choice between these methods depends on factors such as the expected request volume, desired latency, cost considerations, and the need for additional processing or security measures. The Lambda-based approach offers more flexibility and optimization potential, while the direct API method is simpler to implement and maintain.

Overview of Amazon Bedrock resources tagging capabilities

The tagging capabilities of Amazon Bedrock have evolved significantly, providing a comprehensive framework for resource management across multi-account AWS Control Tower setups. This evolution enables organizations to manage resources across development, staging, and production environments, helping organizations track, manage, and allocate costs for their AI/ML workloads.

At its core, the Amazon Bedrock resource tagging system spans multiple operational components. Organizations can effectively tag their batch inference jobs, agents, custom model jobs, knowledge bases, prompts, and prompt flows. This foundational level of tagging supports granular control over operational resources, enabling precise tracking and management of different workload components. The model management aspect of Amazon Bedrock introduces another layer of tagging capabilities, encompassing both custom and base models, and distinguishes between provisioned and on-demand models, each with its own tagging requirements and capabilities.

With the introduction of application inference profiles, organizations can now manage and track their on-demand Bedrock base foundation models. Because teams can create application inference profiles derived from system-defined inference profiles, they can configure more precise resource tracking and cost allocation at the application level. This capability is particularly valuable for organizations that are running multiple AI applications across different environments, because it provides clear visibility into resource usage and costs at a granular level.

The following diagram visualizes the multi-account structure and demonstrates how these tagging capabilities can be implemented across different AWS accounts.

Conclusion

In this post we introduced the latest feature from Amazon Bedrock, application inference profiles. We explored how it operates and discussed key considerations. The code sample for this feature is available in this GitHub repository. This new capability enables organizations to tag, allocate, and track on-demand model inference workloads and spending across their operations. Organizations can label all Amazon Bedrock models using tags and monitoring usage according to their specific organizational taxonomy—such as tenants, workloads, cost centers, business units, teams, and applications. This feature is now generally available in all AWS Regions where Amazon Bedrock is offered.

About the authors

Kyle T. Blocksom is a Sr. Solutions Architect with AWS based in Southern California. Kyle’s passion is to bring people together and leverage technology to deliver solutions that customers love. Outside of work, he enjoys surfing, eating, wrestling with his dog, and spoiling his niece and nephew.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing, and Artificial Intelligence. He focuses on Deep learning including NLP and Computer Vision domains. He helps customers achieve high performance model inference on SageMaker.

Advance environmental sustainability in clinical trials using AWS

November 1, 2024

by Sidharth Rampally Amazon AWS

Traditionally, clinical trials not only place a significant burden on patients and participants due to the costs associated with transportation, lodging, meals, and dependent care, but also have an environmental impact. With the advancement of available technologies, decentralized clinical trials have become a widely popular topic of discussion and offer a more sustainable approach. Decentralized clinical trials reduce the need to travel to study sites by lowering the financial burden on all parties involved, thereby accelerating patient recruitment and reducing dropout rates. Decentralized clinical trials use technologies such as wearable devices, patient apps, smartphones, and telemedicine to accelerate recruitment, reduce dropout, and minimize the carbon footprint of clinical research. AWS can play a key role in enabling fast implementation of these decentralized clinical trials.

In this post, we discuss how to use AWS to support a decentralized clinical trial across the four main pillars of a decentralized clinical trial (virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management). By exploring these AWS powered alternatives, we aim to demonstrate how organizations can drive progress towards more environmentally friendly clinical research practices.

The challenge and impact of sustainability on clinical trials

With the rise of greenhouse gas emissions globally, finding ways to become more sustainable is quickly becoming a challenge across all industries. At the same time, global health awareness and investments in clinical research have increased as a result of motivations by major events like the COVID-19 pandemic. For instance, in 2021, we saw a significant increase in awareness of clinical research studies seeking volunteers, which was reported at 63% compared to 54% in 2019 by Applied Clinical Trials. This suggests that the COVID-19 pandemic brought increased attention to clinical trials among the public and magnified the importance of including diverse populations in clinical research.

These clinical research trials study new tests and treatments while evaluating their effects on human health outcomes. People often volunteer to take part in clinical trials to test medical interventions, including drugs, biological products, surgical procedures, radiological procedures, devices, behavioral treatments, and preventive care. The rise of clinical trials presents a major sustainability challenge—they are often not sustainable and can contribute substantially to greenhouse gas emissions due to how they are being implemented. The main sources of these are usually associated with the intensive energy use associated with research premises and air travel.

This post discusses an alternative to clinical trials—by decentralizing clinical trials, we can reduce the major greenhouse gas emissions caused by human activities present in clinical trials today.

The CRASH trial case study

We can further examine the impact of carbon emissions associated with clinical trials through the carbon audit of the CRASH trial case lead by medical research journal, BMJ. The CRASH trial was a clinical trial conducted from 1999–2004 and recruited patients from 49 countries in the span of 5 years. In the study, the effect of intravenous corticosteroids (a drug produced by Pfizer) on death within 14 days in 10,008 adults with clinically significant head injuries was examined. BMJ conducted an audit on the total emissions of greenhouse gases that were produced by the trials and calculated that roughly 126 metric tons (carbon dioxide equivalent) was emitted during a 1-year period. Over a 5-year period, it would mean that the entire trial would be responsible for about 630 metric tons of carbon dioxide equivalent.

Much of these greenhouse gas emissions can be attributed to travel (such as air travel, hotel, meetings), distribution associated for drugs and documents, and electricity used in coordination centers. According to the EPA, the average passenger vehicle emits about 4.6 metric tons of carbon dioxide per year. In comparison, 630 tons of carbon dioxide would be equivalent to the annual emissions of around 137 passenger vehicles. Similarly, the average US household generates about 20 metric tons of carbon dioxide per year from energy use. 630 tons of carbon dioxide would also be equal to the annual emissions of around 31 average US homes. 630 tons of carbon dioxide already represents a very substantial amount of greenhouse gas for one clinical trial. According to sources from government databases and research institutions, there are around 300,000–600,000 clinical trials conducted globally each year, amplifying this impact by several hundred thousand times.

Clinical trials vs. decentralized clinical trials

Decentralized clinical trials present opportunities to address the sustainability challenges associated with traditional clinical trial models. As a byproduct of decentralized trials, there are also improvements in the patient experience by reducing their burden, making the process more convenient and sustainable.

Today, clinical trials can contribute significantly to greenhouse gas emissions, primarily through energy use in research facilities and air travel. In contrast to the energy-intensive nature of centralized trial sites, the distributed nature of decentralized clinical trials offers a more practical and cost-effective approach to implementing renewable energy solutions.

For centralized clinical trials, many are conducted in energy-intensive healthcare facilities. Traditional trial sites, such as hospitals and dedicated research centers, can have high energy demands for equipment, lighting, and climate control. These facilities often rely on regional or national power grids for their energy needs. Integrating renewable energy solutions in these facilities can also be costly and challenging, because it can involve significant investments into new equipment, renewable energy projects, and more.

In decentralized clinical trials, the reduction in infrastructure and onsite resources will allow for a lower energy demand overall. This, in turn, will result in benefits such as simplified trial designs, reduced bureaucracy, and less human travel required for video conferencing. Furthermore, the additional appointments required for clinical trials might create additional time and financial burdens for participants. Decentralized clinical trials can reduce the burden on patients for in-person visits and increase patient retention and long-term follow-up.

Core pillars on how AWS can power sustainable decentralized clinical trials

AWS customers have developed proven solutions that power sustainable decentralized clinical trials. SourceFuse is an AWS partner that has developed a mobile app and web interface that enables patients to participate in decentralized clinical trials remotely from their homes, eliminating the environmental impact of travel and paper-based data collection. The platform’s cloud-centered architecture, built on AWS services, supports the scalable and sustainable operation of these remote clinical trials.

In this post, we provide sustainability-oriented guidance focused on four key areas: virtual trials, personalized patient engagement, patient-centric trial design, and centralized data management. The following figure showcases the AWS services that can help in these four areas.

Personalized remote patient engagement

The average dropout rate for clinical trials is 30%, so providing an omnichannel experience for subjects to interact with trial facilitators is imperative. Because decentralized clinical trials provide flexibility for patients to participate at home, the experience for patients to collect and report data should be seamless. One solution is to use voice applications to enable patient data reporting, using Amazon Alexa and Amazon Connect. For example, a patient can report symptoms to their Amazon Echo device, invoking an automated patient outreach scheduler using Amazon Connect.

Trial facilitators can also use Amazon Pinpoint to connect with customers through multiple channels. They can use Amazon Pinpoint to send medication reminders, automate surveys, or push other communications without the need for paper mail delivery.

Virtual trials

Decentralized clinical trials reduce emissions compared to regular clinical trials by eliminating the need for travel and physical infrastructure. Instead, a core component of decentralized clinical trials is a secure, scalable data infrastructure with strong data analytics capabilities. Amazon Redshift is a fully managed cloud data warehouse that trial scientists can use to perform analytics.

Clinical Research Organizations (CROs) and life sciences organizations can also use AWS for mobile device and wearable data capture. Patients, in the comfort of their own home, can collect data passively through wearables, activity trackers, and other smart devices. This data is streamed to AWS IoT Core, which can write data to Amazon Data Firehose in real time. This data can then be sent to services like Amazon Simple Storage Service (Amazon S3) and AWS Glue for data processing and insight extraction.

Patient-centric trial design

A key characteristic of decentralized clinical trials is patient-centric protocol design, which prioritizes the patients’ needs throughout the entire clinical trial process. This involves patient-reported outcomes and often implement flexible participation, which can complicate protocol development and necessitate more extensive regulatory documentation. This can add days or even weeks to the lifespan of a trial, leading to avoidable costs. Amazon SageMaker enables trial developers to build and train machine learning (ML) models that reduce the likelihood of protocol amendments and inconsistencies. Models can also be built to determine the appropriate sample size and recruitment timelines.

With SageMaker, you can optimize your ML environment for sustainability. Amazon SageMaker Debugger provides profiler capabilities to detect under-utilization of system resources, which helps right-size your environment and avoid unnecessary carbon emissions. Organizations can further reduce emissions by choosing deployment regions near renewable energy projects. Currently, there are 22 AWS data center regions where 100% of the electricity consumed is matched by renewable energy sources. Additionally, you can use Amazon Q, a generative AI-powered assistant, to surface and generate potential amendments to avoid expensive costs associated with protocol revisions.

Centralized data management

CROs and bio-pharmaceutical companies are striving to achieve end-to-end data linearity for all clinical trials within an organization. They want to see traceability across the board, while achieving data harmonization for regulatory clinical trial guardrails. The pipeline approach to data management in clinical trials has led to siloed, disconnected data across an organization, because separate storage is used for each trial. Decentralized clinical trials, however, often employ a singular data lake for all of an organization’s clinical trials.

With a centralized data lake, organizations can avoid the duplication of data across separate trial databases. This leads to savings in storage costs and computing resources, as well as a reduction in the environmental impact of maintaining multiple data silos. To build a data management platform, the process could begin with ingesting and normalizing clinical trial data using AWS HealthLake. HealthLake is designed to ingest data from various sources, such as electronic health records, medical imaging, and laboratory results, and automatically transform the data into the industry-standard FHIR format. This clinical voice application solution built entirely on AWS showcases the advantages of having a centralized location for clinical data, such as avoiding data drift and redundant storage.

With the normalized data now available in HealthLake, the next step would be to orchestrate the various data processing and analysis workflows using AWS Step Functions. You can use Step Functions to coordinate the integration of the HealthLake data into a centralized data lake, as well as invoke subsequent processing and analysis tasks. This could involve using serverless computing with AWS Lambda to perform event-driven data transformation, quality checks, and enrichment activities. By combining the power powerful data normalization capabilities of HealthLake and the orchestration features of Step Functions, the platform can provide a robust, scalable, and streamlined approach to managing decentralized clinical trial data within the organization.

Conclusion

In this post, we discussed the critical importance of sustainability in clinical trials. We provided an overview of the key distinctions between traditional centralized clinical trials and decentralized clinical trials. Importantly, we explored how AWS technologies can enable the development of more sustainable clinical trials, addressing the four main pillars that underpin a successful decentralized trial approach.

To learn more about how AWS can power sustainable clinical trials for your organization, reach out to your AWS Account representatives. For more information about optimizing your workloads for sustainability, see Optimizing Deep Learning Workloads for Sustainability on AWS.

References

[1] https://www.appliedclinicaltrialsonline.com/view/awareness-of-clinical-research-increases-among-underrepresented-groups

[2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1839193/

[3] https://pubmed.ncbi.nlm.nih.gov/15474134/

[4] ClinicalTrials.gov and https://www.iqvia.com/insights/the-iqvia-institute/reports/the-global-use-of-medicines-2022

[5] https://aws.amazon.com/startups/learn/next-generation-data-management-for-clinical-trials-research-built-on-aws?lang=en-US#overview

[6] https://pubmed.ncbi.nlm.nih.gov/39148198/

About the Authors

Sid Rampally is a Customer Solutions Manager at AWS driving GenAI acceleration for Life Sciences customers. He writes about topics relevant to his customers, focusing on data engineering and machine learning. In his spare time, Sid enjoys walking his dog in Central Park and playing hockey.

Nina Chen is a Customer Solutions Manager at AWS specializing in leading software companies to leverage the power of the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, driving product innovation, and deliver exceptional customer experiences.