Orchestrate generative AI workflows with Amazon Bedrock and AWS Step Functions

Orchestrate generative AI workflows with Amazon Bedrock and AWS Step Functions

Companies across all industries are harnessing the power of generative AI to address various use cases. Cloud providers have recognized the need to offer model inference through an API call, significantly streamlining the implementation of AI within applications. Although a single API call can address simple use cases, more complex ones may necessitate the use of multiple calls and integrations with other services.

This post discusses how to use AWS Step Functions to efficiently coordinate multi-step generative AI workflows, such as parallelizing API calls to Amazon Bedrock to quickly gather answers to lists of submitted questions. We also touch on the usage of Retrieval Augmented Generation (RAG) to optimize outputs and provide an extra layer of precision, as well as other possible integrations through Step Functions.

Introduction to Amazon Bedrock and Step Functions

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don’t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

AWS Step Functions is a fully managed service that makes it easier to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function helps you scale more easily and change applications more quickly. Step Functions is a reliable way to coordinate components and step through the functions of your application. Step Functions provides a graphical console to arrange and visualize the components of your application as a series of steps. This makes it easier to build and run multi-step applications. Step Functions automatically triggers and tracks each step and retries when there are errors, so your application executes in order and as expected. Step Functions logs the state of each step, so when things do go wrong, you can diagnose and debug problems more quickly. You can change and add steps without even writing code, so you can more easily evolve your application and innovate faster.

Orchestrating parallel tasks using the map functionality

Arrays are fundamental data structures in programming, consisting of ordered collections of elements. In the context of Step Functions, arrays play a crucial role in enabling parallel processing and efficient task orchestration. The map functionality in Step Functions uses arrays to execute multiple tasks concurrently, significantly improving performance and scalability for workflows that involve repetitive operations. Step Functions provides two different mapping strategies for iterating through arrays: inline mapping and distributed mapping, each with its own advantages and use cases.

Inline mapping

The inline map functionality allows you to perform parallel processing of array elements within a single Step Functions state machine execution. This approach is suitable when you have a relatively small number of items to process and when the processing of each item is independent of the others.
Here’s how it works:

  1. You define a Map state in your Step Functions state machine.
  2. Step Functions iterates over the array and runs the specified tasks for each element concurrently.
  3. The results of each iteration are collected and made available for subsequent steps in the state machine.

Inline mapping is efficient for lightweight tasks and helps avoid launching multiple Step Functions executions, which can be more costly and resource intensive. But there are limitations. When using inline mapping, only JSON payloads can be accepted as input, your workflow’s execution history can’t exceed 25,000 entries, and you can’t run more than 40 concurrent map iterations.

Distributed mapping

The distributed map functionality is designed for scenarios where many items need to be processed or when the processing of each item is resource intensive or time-consuming. Instead of handling all items within a single execution, Step Functions launches a separate execution for each item in the array, letting you concurrently process large-scale data sources stored in Amazon Simple Storage Service (Amazon S3), such as a single JSON or CSV file containing large amounts of data, or even a large set of Amazon S3 objects. This approach offers the following advantages:

  • Scalability – By distributing the processing across multiple executions, you can scale more efficiently and take advantage of the built-in parallelism in Step Functions
  • Fault isolation – If one execution fails, it doesn’t affect the others, providing better fault tolerance and reliability
  • Resource management – Each execution can be allocated its own resources, helping prevent resource contention and providing consistent performance

However, distributed mapping can incur additional costs due to the overhead of launching multiple Step Functions executions.

Choosing a mapping approach

In summary, inline mapping is suitable for lightweight tasks with a relatively small number of items, whereas distributed mapping is better suited for resource-intensive tasks or large datasets that require better scalability and fault isolation. The choice between the two mapping strategies depends on the specific requirements of your application, such as the number of items, the complexity of processing, and the desired level of parallelism and fault tolerance.

Another important consideration when building generative AI applications using Amazon Bedrock and Step Functions Map states together would be the Amazon Bedrock runtime quotas. Generally, these model quotas allow for hundreds or even thousands of requests per minute. However, you may run into issues trying to run a large map on models with low requests processed per minute quotas, such as image generation models. In that scenario, you can include a retrier in the error handling of your Map state.

Solution overview

In the following sections, we get hands-on to see how this solution works. Amazon Bedrock has a variety of model choices to address specific needs of individual use cases. For the purposes of this exercise, we use Amazon Bedrock to run inference on Anthropic’s Claude 3.5 Haiku model to receive answers to an array of questions because it’s a performant, fast, and cost-effective option.

Our goal is to create an express state machine in Step Functions using the inline Map state to parse through the JSON array of questions sent by an API call from an application. For each question, Step Functions will scale out horizontally, creating a simultaneous call to Amazon Bedrock. After all the answers come back, Step Functions will concatenate them into a single response, which our original calling application can then use for further processing or displaying to end-users.

The payload we send consists of an array of nine Request for Proposal (RFP) questions, as well as a company description:

{
  "questions": [
    "Can you describe your technical capabilities and infrastructure?",
    "What security measures do you have in place to protect data and privacy?",
    "Can you provide case studies or examples of similar projects you have handled?",
    "How do you handle project management, and what tools do you use?",
    "What are your support and maintenance services like?",
    "What is your pricing model?",
    "Can you provide references from other clients?",
    "How do you ensure the scalability of your solution?",
    "What is your approach to data backup and recovery?"
  ],
  "description": "Our company, AnyCompany Tech, boasts a robust technical infrastructure that allows us to handle complex projects with ease. Our strength lies in our dynamic team of experts and our cutting-edge technology, which, when combined, can deliver solutions of any scale. We've worked with clients across the globe, for instance, our project with Example Corp involved a sophisticated upgrade of their system. In terms of security, we prioritize data privacy and have put in place stringent measures to ensure that all data is stored securely. We're quite proud of our project with AnyCompany Networks, where we overhauled their security systems to bolster their data protection capabilities. We use a range of project management tools, including Product-1 and Product-2, which allows us to customize our approach to each client's needs. Our pricing model varies depending on the project, but we always aim to provide cost-effective solutions. We've had numerous positive feedback from our clients, with Example Corp and AnyCompany Networks among those who have expressed satisfaction with our services. We're more than happy to provide further references upon request. Software updates and upgrades are a critical part of our service. We have a dedicated team that ensures all systems are up-to-date and running smoothly. Furthermore, our solutions are designed to be scalable, ensuring that they can grow alongside your business. Lastly, in terms of data backup and recovery, we have a comprehensive plan in place, which includes regular data backups and a robust recovery strategy. We understand the importance of data in today's world and we're committed to ensuring its safety and accessibility at all times."
}

You can use the step-by-step guide in this post or use the prebuilt AWS CloudFormation template in the us-west-2 Region to provision the necessary AWS resources. AWS CloudFormation gives developers and businesses a straightforward way to create a collection of related AWS and third-party resources, and provision and manage them in an orderly and predictable fashion.

Prerequisites

You need the following prerequisites to follow along with this solution implementation:

Create a State Machine and add a Map state

In the AWS console in the us-west-2 Region, launch into Step Functions, and select Get started and Create your own to open a blank canvas in Step Functions Workflow Studio.

Edit the state machine by adding an inline Map state with items sourced from a JSON payload.

Next, tell the Map state where the array of questions is located by selecting Provide a path to items array and pointing it to the questions array using JSONPath syntax. Selecting Modify items with ItemSelector allows you to structure the payload, which is then sent to each of the child workflow executions. Here, we map the description through with no change and use $$.Map.Item.Value to map the question from the array at the index of the map iteration.

Invoke an Amazon Bedrock model

Next, add a Bedrock: InvokeModel action task as the next state within the Map state.

Now you can structure your Amazon Bedrock API calls through Workflow Studio. Because we’re using Anthropic’s Claude 3.5 Haiku model on Amazon Bedrock, we select the corresponding model ID for Bedrock model identifier and edit the provided sample with instructions to incorporate the incoming payload. Depending on which model you select, the payload may have a different structure and prompt syntax.

Build the payload

The prompt you build uses the Amazon State Language intrinsic function States.Format in order to do string interpolation, substituting {} for the variables declared after the string. We must also include .$ after our text key to reference a node in this state’s JSON input.

When building out this prompt, you should be very prescriptive in asking the model to do the following:

  • Answer the questions thoroughly using the following description
  • Not repeat the question
  • Only respond with the answer to the question

We set the max_tokens to 800 to allow for longer responses from Amazon Bedrock. Additionally, you can include other inference parameters such as temperature, top_p, top_k, and stop_sequences. Tuning these parameters can help limit the length or influence the randomness or diversity of the model’s response. For the sake of this example, we keep all other optional parameters as default.

{
  "anthropic_version": "bedrock-2023-05-31",
  "max_tokens": 800,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text.$": "States.Format('Answer following question thoroughly, using the following description. Do not repeat the question. Only respond with the answer to the question. Question: {} Description: {}', $.questions.question, $.description)"
        }
      ]
    }
  ]
}

Form the response

To provide a cleaner response back to our calling application, we want to use some options to transform the output of the Amazon Bedrock Task state. First, use ResultSelector to filter the response coming back from the service to pull out the text completion, then add the original input back to the output using ResultPath and finish by filtering the final output using OutputPath. That way you don’t have to see the description being mapped unnecessarily for each array item.

To simulate the state machine being called by an API, choose Execute in Workflow Studio. Using the preceding input, the Step Functions output should look like the following code, although it may vary slightly due to the diversity and randomness of FMs:

{
  "output": [
    {
      "question": "Can you describe your technical capabilities and infrastructure?",
      "response": {
        "completion": "AnyCompany Tech offers a comprehensive technical infrastructure powered by a dynamic expert team and cutting-edge technology capable of handling complex projects at any scale. The company prioritizes data security through stringent protection measures, uses advanced project management tools like Product-1 and Product-2, and offers customizable solutions. They provide global client support with a focus on scalability, software updates, and robust data backup and recovery strategies. Their approach emphasizes cost-effectiveness, client satisfaction, and adaptable technological solutions that can grow with a client's business needs. The team maintains a dedicated focus on keeping systems updated, secure, and responsive to evolving technological requirements."
      }
    },
    {
      "question": "What security measures do you have in place to protect data and privacy?",
      "response": {
        "completion": "At AnyCompany Tech, we have implemented stringent security measures to protect data and privacy. These include a comprehensive data backup and recovery strategy with regular backups, robust security systems that have been successfully deployed for clients like AnyCompany Networks, and dedicated teams ensuring systems are up-to-date. We prioritize data privacy and use advanced technical infrastructure and cutting-edge technology to secure client information. Our approach involves customizable security solutions, scalable systems, and a commitment to protecting data accessibility and safety at all times."
      }
    },
    {
      "question": "Can you provide case studies or examples of similar projects you have handled?",
      "response": {
        "completion": "Based on the description provided, we have successfully completed projects for Example Corp, involving a sophisticated system upgrade, and AnyCompany Networks, where we overhauled their security systems. These case studies demonstrate our ability to handle complex technical infrastructure projects across different industries. Our team has worked with global clients, delivering solutions that prioritize security, data privacy, and scalability. While specific technical details aren't elaborated, our track record shows proven expertise in system improvements and security enhancements. We have received positive client feedback from both Example Corp and AnyCompany Networks, and we are prepared to provide additional references upon request."
      }
    },
    {
      "question": "How do you handle project management, and what tools do you use?",
      "response": {
        "completion": "At AnyCompany Tech, we utilize Product-1 and Product-2 as our primary project management tools, allowing us to create customized approaches for each client's unique needs. Our project management strategy is built on our dynamic team of experts and cutting-edge technology, which enables us to handle complex projects efficiently. We prioritize adaptability, scalability, and client-specific solutions, ensuring that our project management approach can accommodate projects of any scale. Our tools and processes are designed to provide transparent tracking, seamless communication, and effective resource allocation, which allows us to deliver high-quality results consistently. Additionally, we have dedicated teams for software updates, system maintenance, and data backup/recovery to support our robust project management infrastructure."
      }
    },
    {
      "question": "What are your support and maintenance services like?",
      "response": {
        "completion": "Our support and maintenance services are comprehensive and tailored to each client's unique needs. We provide a dedicated team that ensures all systems are up-to-date and running smoothly, with regular software updates and upgrades. Our solutions are designed to be scalable, allowing them to grow alongside your business. We have a robust data backup and recovery strategy that includes regular backups and a comprehensive recovery plan to ensure data safety and accessibility. Our technical infrastructure, combined with our expert team, allows us to deliver high-quality, secure, and efficient maintenance services. We use advanced project management tools like Product-1 and Product-2 to customize our approach, and we prioritize data privacy with stringent security measures. Our goal is to provide cost-effective solutions that meet and exceed client expectations, with a track record of successful projects and satisfied clients like Example Corp and AnyCompany Networks."
      }
    },
    {
      "question": "What is your pricing model?",
      "response": {
        "completion": "Our pricing model is flexible and tailored to each specific project. We aim to provide cost-effective solutions that are customized to meet the unique needs of our clients. The exact pricing depends on the scope, complexity, and requirements of the project, ensuring that we deliver value while addressing the specific challenges of each client's technological needs."
      }
    },
    {
      "question": "Can you provide references from other clients?",
      "response": {
        "completion": "Yes, we can provide references from clients like Example Corp and AnyCompany Networks, who have previously worked with us and expressed satisfaction with our services. We are more than happy to connect you with these references upon request, allowing you to gain direct insights into our capabilities, project performance, and customer satisfaction."
      }
    },
    {
      "question": "How do you ensure the scalability of your solution?",
      "response": {
        "completion": "We ensure scalability through a multi-faceted approach. Our solutions are intentionally designed to be flexible and adaptable, capable of growing alongside our clients' businesses. We leverage cutting-edge technology and our dynamic team of experts to create infrastructure that can handle increasingly complex projects. Our robust technical infrastructure allows us to seamlessly expand systems, and we use advanced project management tools like Product-1 and Product-2 to customize solutions that can scale effectively. Additionally, we have a dedicated team focused on software updates and upgrades, which helps maintain system performance and enables smooth expansion. Our comprehensive approach to scalability is further supported by our data backup and recovery strategies, ensuring system reliability and continuity as businesses grow. By prioritizing adaptability and having the technical expertise to implement scalable solutions, we can confidently meet the evolving needs of our clients across various industries."
      }
    },
    {
      "question": "What is your approach to data backup and recovery?",
      "response": {
        "completion": "Our data backup and recovery approach encompasses regular, comprehensive data backups and a robust recovery strategy. We understand the critical importance of data protection and have implemented a systematic process to ensure data safety and accessibility at all times. Our strategy involves creating multiple backup copies, storing them in secure, redundant locations, and maintaining a well-defined recovery protocol that allows for quick restoration of data in the event of any potential loss or system failure. This approach is designed to minimize downtime and protect our clients' valuable information, reflecting our commitment to maintaining the highest standards of data security and reliability."
      }
    }
  ],
  "outputDetails": {
    "truncated": false
  }
}

Clean up resources

To delete this solution, navigate to the State machines page on the Step Functions console, select your state machine, choose Delete, and enter delete to confirm. It will be marked for deletion and will be deleted when all executions are stopped.

RAG and other possible integrations

RAG is a strategy that enhances the output of a large language model (LLM) by allowing it to reference an authoritative external knowledge base, generating more accurate or secure responses. This powerful tool can extend the capabilities of LLMs to specific domains or an organization’s internal knowledge base without needing to retrain or even fine-tune the model.

A straightforward way to integrate RAG into the preceding RFP example is by adding a Bedrock Runtime Agents: Retrieve action task to your Map state before invoking the model. This enables queries to Amazon Bedrock Knowledge Bases, which supports various vector storage databases, including the Amazon OpenSearch Serverless vector engine, Pinecone, Redis Enterprise Cloud, and soon Amazon Aurora and MongoDB. Using Knowledge Bases to ingest and vectorize example RFPs and documents stored in Amazon S3 eliminates the need to include a description with the question array. Also, because a vector store can accommodate a broader range of information than a single prompt is able to, RAG can greatly enhance the specificity of the responses.

In addition to Amazon Bedrock Knowledge Bases, there are other options to integrate for RAG depending on your existing tech stack, such as directly with an Amazon Kendra Task state or with a vector database of your choosing through third-party APIs using HTTP Task states.

Step Functions offers composability, allowing you to seamlessly integrate over 9,000 AWS API actions from more than 200 services directly into your workflows. These optimized service integrations simplify the use of common services like AWS Lambda, Amazon Elastic Container Service (Amazon ECS), AWS Glue, and Amazon EMR, offering features such as IAM policy generation and the Run A Job (.sync) pattern, which automatically waits for the completion of asynchronous jobs. Another common pattern seen in generative AI applications is chaining models together to accomplish secondary tasks, like language translation after a primary summarization task is completed. This can be accomplished by adding another Bedrock: InvokeModel action task just as we did earlier.

Conclusion

In this post, we demonstrated the power and flexibility of Step Functions for orchestrating parallel calls to Amazon Bedrock. We explored two mapping strategies—inline and distributed—for processing small and large datasets, respectively. Additionally, we delved into a practical use case of answering a list of RFP questions, demonstrating how Step Functions can efficiently scale out and manage multiple Amazon Bedrock calls.

We introduced the concept of RAG as a strategy for enhancing the output of an LLM by referencing an external knowledge base and demonstrated multiple ways to incorporate RAG into Step Functions state machines. We also highlighted the integration capabilities of Step Functions, particularly the ability to invoke over 9,000 AWS API actions from more than 200 services directly from your workflow.

As next steps, explore the possibilities of application patterns offered by the GenAI Quick Start PoCs GitHub repo as well as various Step Functions integrations through sample project templates within Workflow Studio. Also, consider integrating RAG into your workflows to use your organization’s internal knowledge base or specific domain expertise.


About the Author

Dimitri Restaino is a Brooklyn-based AWS Solutions Architect specialized in designing innovative and efficient solutions for healthcare companies, with a focus on the potential applications of AI, blockchain and other promising industry disruptors. Off the clock, he can be found spending time in nature or setting fastest laps in his racing sim.

Read More

Build generative AI applications on Amazon Bedrock with the AWS SDK for Python (Boto3)

Build generative AI applications on Amazon Bedrock with the AWS SDK for Python (Boto3)

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Because Amazon Bedrock is serverless, you don’t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

In this post, we demonstrate how to use Amazon Bedrock with the AWS SDK for Python (Boto3) to programmatically incorporate FMs.

Solution overview

The solution uses an AWS SDK for Python script with features that invoke Anthropic’s Claude 3 Sonnet on Amazon Bedrock. By using this FM, it generates an output using a prompt as input. The following diagram illustrates the solution architecture.

Prerequisites

Before you invoke the Amazon Bedrock API, make sure you have the following:

Deploy the solution

After you complete the prerequisites, you can start using Amazon Bedrock. Begin by scripting with the following steps:

  1. Import the required libraries:
import boto3
import json
  1. Set up the Boto3 client to use the Amazon Bedrock runtime and specify the AWS Region:
# Set up the Amazon Bedrock client
bedrock_client = boto3.client(
    	service_name="bedrock-runtime",
    region_name="us-east-1"
)
  1. Define the model to invoke using its model ID. In this example, we use Anthropic’s Claude 3 Sonnet on Amazon Bedrock:
# Define the model ID
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
  1. Assign a prompt, which is your message that will be used to interact with the FM at invocation:
# Prepare the input prompt.
prompt = "Hello, how are you?"

Prompt engineering techniques can improve FM performance and enhance results.

Before invoking the Amazon Bedrock model, we need to define a payload, which acts as a set of instructions and information guiding the model’s generation process. This payload structure varies depending on the chosen model. In this example, we use Anthropic’s Claude 3 Sonnet on Amazon Bedrock. Think of this payload as the blueprint for the model, and provide it with the necessary context and parameters to generate the desired text based on your specific prompt. Let’s break down the key elements within this payload:

  • anthropic_version – This specifies the exact Amazon Bedrock version you’re using.
  • max_tokens – This sets a limit on the total number of tokens the model can generate in its response. Tokens are the smallest meaningful unit of text (word, punctuation, subword) processed and generated by large language models (LLMs).
  • temperature – This parameter controls the level of randomness in the generated text. Higher values lead to more creative and potentially unexpected outputs, and lower values promote more conservative and consistent results.
  • top_k – This defines the number of most probable candidate words considered at each step during the generation process.
  • top_p – This influences the sampling probability distribution for selecting the next word. Higher values favor frequent words, whereas lower values allow for more diverse and potentially surprising choices.
  • messages – This is an array containing individual messages for the model to process.
  • role – This defines the sender’s role within the message (the user for the prompt you provide).
  • content – This array holds the actual prompt text itself, represented as a “text” type object.
  1. Define the payload as follows:
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "temperature": 0.9,
    "top_k": 250,
    "top_p": 1,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                }
            ]
        }
    ]
}
  1. You have set the parameters and the FM you want to interact with. Now you send a request to Amazon Bedrock by providing the FM to interact with and the payload that you defined:
# Invoke the Amazon Bedrock model
response = bedrock_client.invoke_model(
    modelId=model_id,
    body=json.dumps(payload)
)
  1. After the request is processed, you can display the result of the generated text from Amazon Bedrock:
# Process the response
result = json.loads(response["body"].read())
generated_text = "".join([output["text"] for output in result["content"]])
print(f"Response: {generated_text}")

Let’s look at our complete script:

import boto3
import json

# Set up the Amazon Bedrock client
bedrock_client = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1"
)

# Define the model ID
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

# Prepare the input prompt
prompt = "Hello, how are you?"

# Create the request payload
payload = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "temperature": 0.9,
    "top_k": 250,
    "top_p": 1,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                }
            ]
        }
    ]
}

# Invoke the Amazon Bedrock model
response = bedrock_client.invoke_model(
    modelId=model_id,
    body=json.dumps(payload)
)

# Process the response
result = json.loads(response["body"].read())
generated_text = "".join([output["text"] for output in result["content"]])
print(f"Response: {generated_text}")

 

Invoking the model with the prompt “Hello, how are you?” will yield the result shown in the following screenshot.

Clean up

When you’re done using Amazon Bedrock, clean up temporary resources like IAM users and Amazon CloudWatch logs to avoid unnecessary charges. Cost considerations depend on usage frequency, chosen model pricing, and resource utilization while the script runs. See Amazon Bedrock Pricing for pricing details and cost-optimization strategies like selecting appropriate models, optimizing prompts, and monitoring usage.

Conclusion

In this post, we demonstrated how to programmatically interact with Amazon Bedrock FMs using Boto3. We explored invoking a specific FM and processing the generated text, showcasing the potential for developers to use these models in their applications for a variety of use cases, such as:

  • Text generation – Generate creative content like poems, scripts, musical pieces, or even different programming languages
  • Code completion – Enhance developer productivity by suggesting relevant code snippets based on existing code or prompts
  • Data summarization – Extract key insights and generate concise summaries from large datasets
  • Conversational AI – Develop chatbots and virtual assistants that can engage in natural language conversations

Stay curious and explore how generative AI can revolutionize various industries. Explore the different models and APIs and run comparisons of how each model provides different outputs. Find the model that will fit your use case and use this script as a base to create agents and integrations in your solution.


About the Author

Merlin Naidoo is a Senior Technical Account Manager at AWS with over 15 years of experience in digital transformation and innovative technical solutions. His passion is connecting with people from all backgrounds and leveraging technology to create meaningful opportunities that empower everyone. When he’s not immersed in the world of tech, you can find him taking part in active sports.

Read More

Improve factual consistency with LLM Debates

Improve factual consistency with LLM Debates

In this post, we demonstrate the potential of large language model (LLM) debates using a supervised dataset with ground truth. In this LLM debate, we have two debater LLMs, each one taking one side of an argument and defending it based on the previous arguments for N(=3) rounds. The arguments are saved for a judge LLM to review. After N(=3) rounds, the same judge LLM with no access to original dataset but only with the LLM arguments decides which side is correct.

One challenging use case that can be addressed using this technique is scaling up the ground truth curation/alignment process for unsupervised and raw datasets. We can start with human annotation for labelling ground truth, but it can be expensive, slow, hard to scale, and may not reach consensus. We can also use this LLM debate generated synthetic ground truth data to build and pre-train larger and more powerful LLMs.

This post and the subsequent code implementation were inspired by one of the International Conference on Machine Learning (ICML) 2024 best papers on LLM debates Debating with More Persuasive LLMs Leads to More Truthful Answers. It uses a different dataset, TofuEval.

Note that the question asked to the judge LLM for every technique is always the same: `Which one of these summaries is the most factually consistent one?” The answer is binary. Either Summary A or summary B is correct. For each of these techniques, the same judge LLM is used to give the final answer.

The LLM debating technique can be more factually consistent (truthful) over existing methods like LLM consultancy and standalone LLM inferencing with self-consistency. To demonstrate this, we compare each of the four techniques mentioned below in this post:

  1. Naive Judge: This standalone LLM has no access to the transcript, but only the question and two summaries. It is used to measure the baseline performance on pre-trained LLM knowledge.
  2. Expert Judge: This LLM has access to the transcript along with the question and two summaries.
  3. LLM Consultancy: The standalone LLM defends one side of the summary choice for N(=3) rounds, expanding in more depth why it thinks it is correct in selecting the summary choice. After 3 rounds, a judge LLM with no access to transcript but only the LLM defense notes decides which summary choice is correct.
  4. LLM Debates: 2 LLMs each take one side of the argument and defends it based on the previous arguments for 3 rounds. After 3 rounds, a judge LLM with no access to the transcript but only with the LLM arguments decides which summary choice is correct.

As an overall solution, we use Amazon Sagemaker and Amazon Bedrock to invoke the different types of LLMs for each technique.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can quickly experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don’t have to manage the infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

Use-case overview

The overall task of each of the four techniques is to choose which one of the two summaries is most appropriate for a given transcript. There is a total of 10 transcripts and each transcript has 2 summaries – one correct and the other incorrect. Refer to the dataset section of this post for the generation details. The incorrect summaries have various classes of errors like Nuanced Meaning Shift, Extrinsic Information and Reasoning errors.

In this post, we navigate the LLM debating technique with persuasive LLMs having two expert debater LLMs (Anthropic Claude 3 Sonnet and Mixtral 8X7B) and one judge LLM (Mistral 7B v2 to measure, compare, and contrast its performance against other techniques like self-consistency (with naive and expert judges) and LLM consultancy.

The choice of judge and all other candidate LLMs can be varied from very small to large LLMs (based on model parameters) based on the nature of the use case, task complexity, dataset, and cost incurred. In this post, we have used at least 7B or greater parameter LLMs to demonstrate the overall efficacy of each technique as well as keeping cost in mind. It is possible to choose smaller LLMs depending on the task complexity; For example, if complex common-sense reasoning is not involved, we can choose Claude Haiku over Sonnet. Depending on the use-case, task complexity, dataset, and budget constraints, LLMs can be switched out to observe the performance changes (if any). The model cards for each LLM also serve as a good starting point to understand at which ML tasks each LLM excels. We recommend that these experiments along with choosing LLMs are tried out over diverse smaller subsets of the original dataset before scaling up.

To demonstrate the measurement and improvement of factual consistency (veracity) with explainability, we conduct a series of experiments with each of the four techniques to choose the best summary for each transcript. In each experiment with a different technique, we measure the factual consistency of the summaries generated from the transcripts and improve upon the decision to choose the correct one via methods like LLM consultancy and LLM debates.

The following question is repeated for all 3 rounds:

"Which one of these summaries is the most factually consistent one?"

Dataset

The dataset for this post is manually distilled from the Amazon Science evaluation benchmark dataset called TofuEval. For this post, 10 meeting transcripts have been curated from the MediaSum repository inside the TofuEval dataset. Details on the exact dataset can be found in the GitHub repository.

MediaSum is a large-scale media interview dataset containing 463.6K transcripts with abstractive summaries, collected from interview transcripts and overview / topic descriptions from NPR and CNN.

We use the following AWS services:

In the following sections, we demonstrate how to use the GitHub repository to run all of the techniques in this post.

Setup Prerequisites

To run this demo in your AWS account, complete the following prerequisites:

  1. Create an AWS account if you don’t already have one.
  2. Clone the GitHub repository and follow the steps explained in the README.
  3. Set up a SageMaker notebook using an AWS CloudFormation template, available in the GitHub repository. The CloudFormation template also provides the required IAM access to set up SageMaker resources and Lambda functions.
  4. Acquire access to models hosted on Amazon Bedrock. Choose Manage model access in the navigation pane on the Amazon Bedrock console and choose from the list of available options. We are invoking Anthropic Claude 3 Sonnet, Mistral 7B, and Mixtral 8X7B using Amazon Bedrock for this post.

Solution overview

In this section, we will deep-dive into each of the four techniques being compared against each other.

  1. Naive Judge
  2. Expert Judge
  3. LLM Consultancy
  4. LLM Debates

Details of prompt used for each technique can be found here

Commonalities across all four techniques

  1. Each question is repeated for 3 rounds. This is to introduce LLM self-consistency. The majority answer is deemed correct.
  2. We flip the side of the argument the LLM takes for each round. This accounts for errors due to position bias (choosing an answer due to its order/position) and verbosity bias (one answer longer than the other).

Part 1: Standalone LLMs

In , we use a standalone LLM Mistral 7B to find out which of the two summaries is more factually consistent. There are 2 techniques: naïve judge and expert judge.

Technique 1: (Naive judge)

This standalone LLM chooses on one of the two summaries as the more factually consistent answer. It is used to measure the baseline performance on this dataset for a pretrained LLM like Mistral 7B. The visualization of the naive judge technique is as follows:

In this image, we depict the flow of Naive LLM judge. First the naive judge LLM has NO access to transcripts just the question and two summaries to choose from as the more factually consistent. Next the naive judge makes a random guess which of the two summaries are more factually consistent for 3 rounds. Majority answer is chosen based on self-consistency technique.

Naive Judge LLM

Prompt template for Naïve Judge

standalone_naive_mistral = """
[INST]
User: You are a super-intelligent AI tasked to answer a reading comprehension question with two possible answers. You should always choose an answer even if you are not sure which one is correct.
Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

1. Your task is to choose the answer you think is most likely to be correct.
2. Think step by step through your reasoning and output your reasoning.
3. Output the answer text you chose as correct.
4. Then, on a new line, give your final answer formatted as ”Final Answer: A|B”.
[/INST]
"""
CNN-1041291Which one of these summaries is the most factually consistent one? Answer A:GM lost $10.6B in 2005, continues losing market share. Answer B:GM lost $10B in 2005, continues losing market share.Reasoning: Both answers state that General Motors (GM) lost money in 2005 and continued losing market share. The only difference between the two answers is the exact amount of money GM lost in 2005. The difference is $0.6B. Given that the difference is relatively small compared to the overall amount lost, and considering that rounding errors or slight discrepancies in reporting financial figures are common, I would consider both answers factually consistent. However, since the task asks for the most factually consistent answer, I would choose the answer that is closer to the reported figure. Answer text I chose as correct: Answer A: GM lost $10.6B in 2005, continues losing market share. Final Answer: A. Reasoning explanation: Although both answers are factually consistent, I chose answer A because it is closer to the reported figure of GM's losses in 2005. The difference between the two answers is $0.6B, which is a relatively small amount compared to the overall loss. However, it is still important to choose the answer that is as close to the reported figure as possible to maintain factual consistency.

Naive Judge LLM Response

For each question, we ask the LLM number_of_rounds=3 times to follow a self-consistency paradigm.

for round_number in range(number_of_rounds):
        logger.info(f"START OF Naive Judge Round #{round_number + 1} for debate_id {debate_id} >>>>>> n")
        judge_response = invoke_mistral_standalone_naive(
            debate_id = debate_id,
            question = question,
            answer_a = answer_a,
            answer_b = answer_b
        )

        naive_judge_per_round.append(extract_final_answer(judge_response, flipped=False))

Technique 2: (Expert judge)

Mistral 7B now becomes an expert judge with access to the transcripts and chooses which of the two summaries is the more factually consistent one. The visualization of the expert judge technique is as follows:

In this image, we depict the flow of LLM Expert Judge. First the expert Judge LLM has access to transcripts along with the question and two summaries to choose from as more factually consistent. Next the expert judge uses the transcript contents to decide which of the two summaries are more factually consistent for 3 rounds. Majority answer is chosen based on self-consistency technique

Expert Judge LLM Response

Prompt template for expert judge:

standalone_expert_mistral = """
[INST]
Human: You are a super-intelligent AI and you will be shown a story, followed by a question and two possible answers.

Here is the complete story, use it to answer the question below:

{complete_interview}


Here is the question and the two answers: Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

1. Your task is to choose the answer you think is most likely to be correct. 
2. Think step by step about which answer is likely to be correct. Output your thoughts.
3. Output the answer text you chose as correct.
4. Finally output your final answer formatted as ”Final Answer: A|B”.
Assistant:
[/INST]
"""
Debate IDRound #Task QuestionJudge Response 0CNN-1041291Which one of these summaries is the most factually consistent one? Answer A:GM lost $10.6B in 2005, continues losing market share. Answer B:GM lost $10B in 2005, continues losing market share.1. My task is to choose the answer that is most factually consistent with the information provided in the story. 2. I have carefully read the story and both answers. The story states that General Motors reported a loss of $10.6 billion in 2005. Therefore, the correct answer is: 3. Answer A: GM lost $10.6B in 2005, continues losing market share. 4. Final Answer: A.

Expert Judge LLM response

For each question, we ask the LLM number_of_rounds=3 times to follow a self-consistency paradigm.


for round_number in range(number_of_rounds):

        logger.info(f"Expert Judge Round #{round_number + 1} >>>>>> n")
        judge_response = invoke_mistral_standalone_expert(
            debate_id = debate_id,
            question = question,
            answer_a = answer_a,
            answer_b = answer_b,
            complete_interview = complete_interview_transcript
        )
        expert_judge_per_round.append(extract_final_answer(judge_response, flipped=False))
        logger.info(f">>>>>>> judge_response Round #{round_number + 1}>>>>> ::  {judge_response}")

Technique 3: (LLM consultancy)

In , we use Anthropic Claude 3 Sonnet as an LLM consultant for both sides of the answers separately. In other words, in the first experiment the LLM consultant defends answer A for N(=3) and in the second experiment defends answer B for the N(=3) rounds. We take the average accuracy of both the experiments as final factual consistency accuracy. (Refer to the evaluation metrics section for accuracy definition) This continues for N(=3 in this notebook) rounds. We flip the argument sides for the consultant LLM and take the average of the experiments results as final accuracy. Refer to the Evaluation section to see how we calculate this accuracy.

The visualization of the LLM consultancy technique is as follows:

In this image, we depict the flow of LLM Consulancy. First a consultant LLMs is assigned a side to defend.They persuade the judge why their choice of summary is correct based on transcript contents. Next each consultation from the LLM is saved to a file and the consultant picks up the entire rationale history before posting their next thought. Finally, Once all 3 rounds of consultancy are over, the Judge LLM reads all the content and decides whether to agree or disagree with the consultant.

LLM Consultancy judge

Prompt template for LLM consultancy

consultancy_mistral_judge_template = """
[INST]
User: You are an expert judge in a consultancy game where a consultant argues for the correct or incorrect answer to a reading comprehension question based on a short story. Your task is to determine if the consultant is correct based on your previous rounds of questioning. However, you haven’t read the story, making your task challenging. To guide you, the consultant can include direct quotes from the story. Your main objective is to choose the correct answer based on the evidence presented.
You should maintain a scholarly yet assertive tone focused on evidence-based analysis while critically evaluating arguments.
Here is some important information regarding the quote system which is applied after the debaters have finished their arguments:
• Verified quotes are direct string matches from the story and are enclosed in  tags.
• Unverified quotes are not direct string matches from the story, so they should not be trusted and are enclosed in tags.
• Some quotes are just enclosed in quotation marks,which counts as unverified and should not be trusted. Here is the question and the two answers:
Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Here’s the current transcript of the consultancy:

{current_consultancy_transcript}

Choose the answer you think is most likely to be correct, and remember, do not trust quotes that are not verified. Think about your answer in  tags by carrying out these steps:
1. Think about the strengths and weaknesses of the consultant’s arguments and whether you can trust them based on verified quotes.
2. Think step by step about which answer is likely to be correct. Output your thoughts.
3. Output the answer text you chose as correct.
4. Then, on a new line, format your final answer as "Final Answer: A|B".
[/INST]
"""
Debate IDRound #Task QuestionJudge Response 0CNN-1041291Which one of these summaries is the most factually consistent one? Answer A:GM lost $10.6B in 2005, continues losing market share. Answer B:GM lost $10B in 2005, continues losing market share.<thinking> The consultant's arguments focus on the financial losses of General Motors in 2005 and the company's continuing market share decline. The consultant provides a direct quote from the story stating that General Motors reported a $10.6 billion loss in 2005. The consultant argues that rounding down this figure to $10 billion is a reasonable approximation and more accurate than stating the precise figure in the competing summary. The consultant also emphasizes the importance of including the context that General Motors is continuing to lose market share. The competing summary only mentions that General Motors lost $10.6 billion in 2005 without mentioning the continuing market share decline. Based on the evidence presented, the consultant's summary seems more factually consistent as it includes the context of General Motors' continuing market share decline and provides a reasonable approximation of the reported loss. Final Answer: A. GM lost $10B in 2005, continues losing market share.

LLM Consultancy Judge response

For each question, we ask the LLM number_of_rounds=3 times to follow a self-consistency paradigm.

for round_number in range(number_of_rounds):
        logger.info(f"START Consultancy with Claude Debate_id {debate_id} Round #{round_number + 1} >>>>>> n")

        consultancy_response = invoke_claude_v3(debate_id = debate_id, 
                         round_number = round_number + 1,
                         question=question,
                         summary_defending = claude_defending_summary, 
                         summary_opposing = claude_opposing_summary, 
                         complete_interview = complete_interview_transcript,
                         debate=False
                         )

Technique 4: (LLM Debate)

In , we use Anthropic Claude 3 Sonnet as the first debater and Mixtral 8X7B as the second debater with Mistral 7b as the judge. We let each debater argue their side for N(=3) rounds. Each round of debate is saved in a file. For the next round, each debater continues to defend their side based on the previous round’s argument. Once N(=3) rounds are over, the judge LLM uses only these arguments to decide which side is better. Now we flip Anthropic Claude 3 Sonnet (LLM-1) and Mixtral 8X7B (LLM-2) argument sides in both of the experiments and take the average of the experiment results as final accuracy. Refer to the Evaluation section to see how we calculate this accuracy.

The visualization of the LLM debate technique is as follows:

In this image, we depict the flow of LLM Debate. First Debater LLMs like Claude and Mixtral argue their side based on transcript contents. Next each argument is saved to a file and the next debater picks up the entire argument history before posting their next argument. Finally, once all 3 rounds of arguments are over, the Judge LLM reads all the arguments and decides which summary is the most factually consistent answer.

LLM Debate Judge

Prompt template for judge LLM

mistral_debate_judge_template = """
[INST]
User: You are an expert judge in a debate where the participants are arguing over the correct answer to a reading comprehension question based on a short story. Your task is to determine which debater is correct. However, you haven’t read the story, making your task challenging. To guide you, debaters can include direct quotes from the story. Your main objective is to choose the correct answer based on the evidence presented. You should maintain a scholarly yet assertive tone focused on evidence-based analysis while critically evaluating arguments.
Here is some important information regarding the quote system which is applied after the debaters have finished their arguments:
• Verified quotes are direct string matches from the story and are enclosed in quote tags.
• Unverified quotes are not direct string matches from the story, so they should not be trusted and are enclosed in quote tags.
• Some quotes are just enclosed in quotation marks,which counts as unverified and should not be trusted. Here is the question and the two answers:
Question: "{question}"
Answer A: {answer_a}
Answer B: {answer_b}
{debater_a} is arguing for answer A, and {debater_b} is arguing for answer B.
Here’s the current transcript of the debate:

{current_debate_transcript}

Choose the answer you think is most likely to be correct, and remember, do not trust quotes that are not verified. Think about your answer in  tags by carrying out these steps:
1. Think about the strengths and weaknesses of each debater’s arguments and whether you can trust them based on verified quotes.
2. Think step by step about which answer is likely to be correct. Output your thoughts.
3. Output the answer you chose as correct.
4. Then, on a new line, format your final answer as ”Final Answer: A or B”.
[/INST]
"""
Debate IDRound #Task QuestionJudge Response 0CNN-1041291Which one of these summaries is the most factually consistent one? Answer A:GM lost $10.6B in 2005, continues losing market share. Answer B:GM lost $10B in 2005, continues losing market share.<thinking> Both debaters make valid points, but the precision of the numbers in question is a significant difference. Debater expert_debater_mixtral_8_7B correctly quotes the exact loss reported by the story, while debater expert_debater_sonnet_v3 approximates the loss with $10B. However, the story's emphasis seems to be on the dramatic decline and massive losses of GM in 2005, rather than the exact figure. Both summaries accurately reflect GM's continuing market share loss. Given this context, I believe debater expert_debater_sonnet_v3's approximation is more factually consistent with the story's key details, as it captures the essence of GM's financial struggles in 2005. </thinking> Final Answer: A Answer A: GM lost $10B in 2005, continues losing market share.

LLM Debate Judge Response

For each question, we ask the LLM number_of_rounds=3 times to follow a self-consistency paradigm.

for round_number in range(number_of_rounds):
        print(f"=========== START OF 2 model DEBATE debate_id {debate_id} Round #1..{round_number + 1} ======= n")
        logger.info(f"START Debate with Claude Debate_id {debate_id} Round #{round_number + 1} >>>>>> n") 
        claude_debate_response = invoke_claude_v3(debate_id = debate_id,
                         question=question,
                         round_number = round_number + 1,
                         summary_defending = claude_defending_summary, 
                         summary_opposing = claude_opposing_summary, 
                         complete_interview = complete_interview_transcript,
                         debate=True
                         )

        logger.info(f" >>>>> claude_debate_response Round #{round_number + 1} >>>>> {claude_debate_response}")
        logger.info(f"END Debate with Claude Round #{round_number + 1} >>>>>> n")

        mixtral_debate_response = invoke_mistral(debate_id = debate_id,
                     question=question,
                     round_number = round_number + 1,
                     summary_defending = mixtral_defending_summary, 
                     summary_opposing = mixtral_opposing_summary, 
                     complete_interview = complete_interview_transcript, 
                     )

        logger.info(f" >>>>> mixtral_debate_response Round #{round_number + 1} >>>>> {mixtral_debate_response}")
        logger.info(f"END Debate with Mixtral Round #{round_number + 1} >>>>>> n")

Evaluation Metrics

Factual Consistency Accuracy (for all techniques):

For each question in every technique, the judge chooses whether summary A or B is True. As mentioned above, we also flip the position of summary A and B and repeat the same question to the same LLM. At the end of a run, we define the factual consistency accuracy as the number of times the judge chose the same answer regardless of its position being flipped (to account for position bias, verbosity bias, or random guess).

factual_consistency_accuracy = find_number_of_matching_elements(judge_regular_answers, judge_flipped_answers)/total_data_points

Finally, we compare the accuracy of each technique against each other.

Win rate per LLM (this metric only applies to LLM debates):

For the LLM debate, we can calculate the win rate of the LLM debaters to evaluate which of the LLMs got most of the answers right as adjudicated by the judge LLM. With this win rate of expert models, we empirically understand which LLM as a debater is more successful than the other. This metric may be used to choose one LLM over the other given a particular use case and dataset.

claude_avg_win_rate, mixtral_avg_win_rate = get_win_rate_per_model(debate_judge_regular_answers, debate_judge_flipped_answers)

Details about the win rate per model can be found in the GitHub repository here.

Cost considerations

The following are important cost considerations:

Conclusion

In this post, we demonstrated how LLM debate is a technique that can improve factual consistency. While it can be expensive to use three LLMs (two debaters and one judge), a potential direction could be scaling up the ground truth curation/alignment process for unsupervised/raw datasets for fine-tuning existing LLMs and building new LLMs.

From the examples in each of the techniques, we see the interpretability and rationale used by the LLMs in getting to the final answer. The naïve judge technique establishes a lower threshold of performance whereas the LLM debate technique is the most verbose providing a detailed explanation of how it got to the final answer. The expert judge technique outperforms the naïve judge and the LLM consultancy technique does better than the expert judge as shown in the figure below.

For many repeated runs across this small subset of TofuEval dataset, we observe the LLM debating technique out-performing the other techniques mentioned in this post. One entire end-to-end run snapshot of performance is as follows:

bar graph, x = Experiment Type, y = Accuracy. Values are Naive Judge = 0.1, Expert Judge=0.4, LLM Consultancy=0.5, LLM Debate=0.7

Compare accuracies across all four techniques

Depending on the use case and dataset volume, while we can start with human annotation, it can quickly become expensive, slow, and disagreement amongst human annotators can add layers of complexity. A scalable oversight direction could be this LLM debating technique to align on the ground truth options via this debating and critique mechanism thereby establishing factual consistency. However, before scaling up this technique for your use case, it is necessary to compare the LLM debate performance against human annotation over a diverse subset of the domain-specific dataset.

Readers are highly encouraged to switch LLMs that are apt for their use case with this debating technique. LLM debates need to be calibrated and aligned with human preference for the task and dataset. You can use Amazon SageMaker Ground Truth for labeling jobs to record human preferences with their own private skilled work teams or use Amazon SageMaker Ground Truth Plus for a fully managed experience for this human alignment task.

To learn more about customizing models with Amazon Bedrock, see Customize your model to improve its performance for your use case.

Acknowledgements

The author thanks all the reviewers for their valuable feedback.


About the Author

Image of Author

Shayan Ray is an Applied Scientist at Amazon Web Services. His area of research is all things natural language (like NLP, NLU, and NLG). His work has been focused on conversational AI, task-oriented dialogue systems and LLM-based agents. His research publications are on natural language processing, personalization, and reinforcement learning.

Read More

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

This post is part of an ongoing series about governing the machine learning (ML) lifecycle at scale. To view this series from the beginning, start with Part 1. This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product. It enables different business units within an organization to create, share, and govern their own data assets, promoting self-service analytics and reducing the time required to convert data experiments into production-ready applications. The data mesh architecture aims to increase the return on investments in data teams, processes, and technology, ultimately driving business value through innovative analytics and ML projects across the enterprise.

Organizations spanning various industries are progressively utilizing data and ML to drive innovation, enhance decision-making processes, and gain a competitive advantage. However, as data volumes and complexity continue to grow, effective data governance becomes a critical challenge. Organizations must make sure their data assets are properly managed, secured, and compliant with regulatory requirements, while also enabling seamless access and collaboration among various teams and stakeholders.

This post explores the role of Amazon DataZone, a comprehensive data management and governance service, in addressing these challenges at scale. We dive into a real-world use case from the financial services industry, where effective marketing campaigns are crucial for acquiring and retaining customers, as well as cross-selling products. By taking advantage of the data governance capabilities of Amazon DataZone, financial institutions like banks can securely access and use their comprehensive customer datasets to design and implement targeted marketing campaigns tailored to individual customer needs and preferences.

We explore the following key aspects:

  • Traditional challenges in data management and governance across multiple systems and accounts
  • The benefits of Amazon DataZone in simplifying data governance and enabling seamless data sharing
  • A detailed use case on using governed customer data for effective marketing campaigns in the banking and financial services industry
  • The reference architecture for a multi-account ML platform, highlighting the role of Amazon DataZone in the data management and governance layer
  • Step-by-step guidance on setting up Amazon DataZone in a multi-account environment, including account setup, blueprint enablement, user management, and project configuration for data publishers and subscribers

By the end of this post, you will have a comprehensive understanding of how Amazon DataZone can empower organizations to establish centralized data governance, enforce consistent policies, and facilitate secure data sharing across teams and accounts, ultimately unlocking the full potential of your data assets while maintaining compliance and security.

Challenges in data management

Traditionally, managing and governing data across multiple systems involved tedious manual processes, custom scripts, and disconnected tools. This approach was not only time-consuming but also prone to errors and difficult to scale. Organizations often struggled with the following challenges:

  • Discovering data assets scattered everywhere
  • Enforcing consistent data policies and access controls
  • Understanding data lineage and dependencies
  • A lack of centralized data governance, leading to data silos, compliance issues, and inefficient data utilization

Amazon DataZone solves these problems by providing a comprehensive solution for data management and governance:

  • You can automatically discover and catalog data assets across multiple AWS accounts and virtual private clouds (VPCs)
  • It allows you to define and enforce consistent governance policies, track data lineage, and securely share data with fine-grained access controls—all from a single platform
  • Amazon DataZone integrates with AWS Identity and Access Management (IAM) for secure access management, making sure only authorized users and applications can access data assets based on their roles and permissions
  • With Amazon DataZone, organizations gain better visibility, control, and governance over their data, enabling informed decision-making, better compliance with regulations, and unlocking the full potential of their data

Use case

In the competitive banking and financial services industry, effective marketing campaigns are crucial for acquiring new customers, retaining existing ones, and cross-selling products. With the data governance capabilities of Amazon DataZone, banks can securely access and use their own comprehensive customer datasets to design and implement targeted marketing campaigns for financial products, such as certificates of deposit, investment portfolios, and loan offerings. In this post, we discuss how banks can establish a centralized data catalog, enabling data publishers to share customer datasets and marketing teams to subscribe to relevant data using Amazon DataZone.

The following diagram gives a high-level illustration of the use case.

The diagram shows several accounts and personas as part of the overall infrastructure. In the given use case of using Amazon DataZone for effective marketing campaigns in the banking and financial services industry, the different accounts serve the following functions:

  • Management account – This account manages organization-level functions, such as defining the organizational structure, provisioning new accounts, managing identities and access (identity management), implementing security and governance best practices, and orchestrating the creation of the landing zone (a secure and compliant environment for workloads). For example, in the bank marketing use case, the management account would be responsible for setting up the organizational structure for the bank’s data and analytics teams, provisioning separate accounts for data governance, data lakes, and data science teams, and maintaining compliance with relevant financial regulations.
  • Data governance account – This account hosts the central data governance services provided by Amazon DataZone. It serves as the hub for defining and enforcing data governance policies, data cataloging, data lineage tracking, and managing data access controls across the organization. For instance, for our use case, the data governance account would be used to define and enforce policies around customer data privacy, data quality rules for customer datasets, and access controls for sharing customer data with the marketing team.
  • Data lake account (producer) – There can be one or more data lake accounts within the organization. We discuss this in more detail later in this post.
  • Data science team account (consumer) – There can be one or more data science team accounts or data consumer accounts within the organization. We provide additional information later in this post.

By separating these accounts and their responsibilities, the organization can maintain a clear separation of duties, enforce appropriate access controls, and make sure data governance policies are consistently applied across the entire data lifecycle. The data governance account, acting as the central hub, enables seamless data sharing and collaboration between the data producers (data lake accounts) and data consumers (data science team accounts), while meeting data privacy, security, and compliance requirements.

Solution overview

The following diagram illustrates the ML platform reference architecture using various AWS services. The functional architecture with different capabilities is implemented using a number of AWS services, including AWS Organizations, Amazon SageMaker, AWS DevOps services, and a data lake. For more information about the architecture in detail, refer to Part 1 of this series. In this post, we focus on the highlighted Amazon DataZone section.

solution__architecture

The data management services function is organized through the data lake accounts (producers) and data science team accounts (consumers).

The data lake accounts are responsible for storing and managing the enterprise’s raw, curated, and aggregated datasets. Data engineers and data publishers work within these accounts to ingest, process, and publish data assets that can be consumed by other teams, such as the marketing team or data science teams. In the bank marketing use case, the data lake accounts would store and manage the bank’s customer data, including raw data from various sources, curated datasets with customer profiles, and aggregated datasets for marketing segmentation.

As producers, data engineers in these accounts are responsible for creating, transforming, and managing data assets that will be cataloged and governed by Amazon DataZone. They make sure data is produced consistently and reliably, adhering to the organization’s data governance rules and standards set up in the data governance account. Data engineers contribute to the data lineage process by providing the necessary information and metadata about the data transformations they perform.

Amazon DataZone plays a crucial role in maintaining data lineage information, enabling traceability and impact analysis of data transformations across the organization. It handles the actual maintenance and management of data lineage information, using the metadata provided by data engineers to build and maintain the data lineage.

The data science team accounts are used by data analysts, data scientists, or marketing teams to access and consume the published data assets from the data lake accounts. Within these accounts, they can perform analyses, build models, or design targeted marketing campaigns by using the governed and curated datasets made available through the data sharing and access control mechanisms of Amazon Data Zone. For example, in the bank marketing use case, the data science team accounts would be used by the bank’s marketing teams to access and analyze customer datasets, build predictive models for targeted marketing campaigns, and design personalized financial product offerings based on the shared customer data.

Using Amazon DataZone in a multi-account ML platform

You can find practical, step-by-step instructions for implementing this setup in module 2 of this AWS Multi-Account Data & ML Governance Workshop.  This workshop provides detailed guidance on setting up Amazon DataZone in the central governance account.

Conclusion

Effective governance is crucial for organizations to unlock their data’s potential while maintaining compliance and security. Amazon DataZone provides a comprehensive solution for data management and governance at scale, automating complex tasks like data cataloging, policy enforcement, lineage tracking, and secure data sharing.

As demonstrated in the financial services use case, Amazon DataZone empowers organizations to establish a centralized data catalog, enforce consistent governance policies, and facilitate secure data sharing between data producers and consumers. Financial institutions can use Amazon DataZone to gain a competitive edge by designing and implementing effective, tailored marketing campaigns while adhering to data privacy and compliance regulations.

The multi-account ML platform architecture, combined with Amazon DataZone and other AWS services, provides a scalable and secure foundation for governing data and ML workflows effectively. By following the outlined steps, you can streamline the setup and management of Amazon DataZone, enabling seamless collaboration between stakeholders involved in the data and ML lifecycle.

As data generation and utilization continue to grow, robust data governance solutions become paramount. Amazon DataZone offers a powerful approach to data management and governance, empowering organizations to unlock their data’s true value while maintaining the highest standards of security, compliance, and data privacy.


About the Authors

Ajit Mungale is a Senior Solutions Architect at Amazon Web Services with specialization in AI/ML/Generative AI, IoT and .Net technologies. At AWS, he helps customers build, migrate, and create new cost effective cloud solutions. He possesses extensive experience in developing distributed applications and has worked with multiple cloud platforms. With his deep technical knowledge and business understanding, Ajit guides organizations in leveraging the full capabilities of the cloud.

Ram Vittal is a Principal Generative AI Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides motorcycle and walks with his sheep-a-doodle!

Read More

Amazon Bedrock Flows is now generally available with enhanced safety and traceability

Amazon Bedrock Flows is now generally available with enhanced safety and traceability

Today, we are excited to announce the general availability of Amazon Bedrock Flows (previously known as Prompt Flows). With Bedrock Flows, you can quickly build and execute complex generative AI workflows without writing code. Key benefits include:

  1. Simplified generative AI workflow development with an intuitive visual interface.
  2. Seamless integration of latest foundation models (FMs), Prompts, Agents, Knowledge Bases, Guardrails, and other AWS services.
  3. Flexibility to define the workflow based on your business logic.
  4. Reduced time and effort in testing and deploying AI workflows with SDK APIs and serverless infrastructure.

Bedrock Flows makes it easier for developers and businesses to harness the power of generative AI, enabling you to create more sophisticated and efficient AI-driven solutions for your customers.

Thomson Reuters transforms the way professionals work by delivering innovative tech and GenAI powered by trusted expertise and industry-leading insights.

“The mandate of the Thomson Reuters Enterprise AI Platform is to enable our subject-matter experts, engineers, and AI researchers to co-create Gen-AI capabilities that bring cutting-edge, trusted technology in the hands of our customers and shape the way professionals work. Amazon Bedrock Flows will enable us to create complex, flexible, multi-prompt workflows which we can easily evaluate, compare and version. We can also quickly integrate flows with our applications using the SDK APIs for serverless flow execution — without wasting time in deployment and infrastructure management. We are excited about the potential productivity gain and acceleration for generative-AI application development with Bedrock Flows.”

– Laura Skylaki, VP of Artificial Intelligence, Business Intelligence and Data Platforms at Thomson Reuters.

Dentsu Creative is a global creative agency network designed to create meaningful connection between brands and consumers.

“We have successfully leveraged Amazon Bedrock Flows to transform customer experiences. Using Bedrock Flows, we accelerated the process of reshaping books into an easy-to-read format for readers with learning disabilities. Bedrock Flows also enabled us to easily connect customer service solutions with foundation models like Claude Haiku to address common inquiries, saving hours and allowing customer support teams to focus on more complex requests. By empowering non-technical users to understand how AI and business logic are applied with the intuitive visual interface, Bedrock Flows has driven transparency and visibility for generative AI solutions in our organization. Whether reaching new audiences or scaling customer requests, Dentsu continues to innovate with cutting-edge generative AI technology powered by Amazon Bedrock Flows.”

– Thiago Winkler, Executive Director of Operations for Dentsu Creative Brazil

New capabilities in Amazon Bedrock Flows

Organizations leveraging generative AI need robust safety controls and clear visibility into their AI workflows. Today, we’re announcing two new capabilities in Amazon Bedrock Flows that help customers build more secure and traceable AI applications:

  1. Enhanced safety: Ability to filter out harmful content and unwanted topics for Prompt and Knowledge base nodes powered by Amazon Bedrock Guardrails. Guardrails are now supported in two types of nodes:
    • Prompt node: Define and enforce controls over your FM interactions.
    • Knowledge base node: Apply guardrails to responses generated from your knowledge base.
  1. Enhanced traceability: Ability to quickly validate and debug workflows with traceability of input & output and inline validations. Gain comprehensive visibility into your workflow execution and quickly pinpoint errors through:
    • Detailed traceability support for input and output nodes.
    • Complete execution path information showing input, output, execution time, and errors for each node.
    • Inline validation status of nodes in the visual builder.

Consider ACME Corp, a fictional ecommerce company building a customer service chatbot using Amazon Bedrock Flows. They face several challenges in their implementation:

  • Their chatbot sometimes generates responses containing sensitive customer information.
  • They struggle to maintain consistent response quality and tone across different customer interactions.
  • They spend a lot of time and effort in troubleshooting issues in their application.
  • They have no way to ensure that responses comply with company policies and regulatory requirements.
  • They lack visibility into performance bottlenecks affecting customer experience.

Let’s explore how the new capabilities in Amazon Bedrock Flows address these challenges and enable Acme Corp to build a more secure, efficient, and transparent customer service solution.

Prerequisites

Before implementing the new capabilities, make sure that you have the following:

  1. An AWS account
  2. In Amazon Bedrock:
    • Create and test your base prompts for customer service interactions in Prompt Management.
    • Set up your knowledge base with relevant customer service documentation, FAQs, and product information.
    • Configure any auxiliary AWS services needed for your customer service workflow (for example, Amazon DynamoDB for order history).
  1. In Amazon Bedrock Guardrails:
    • Create a guardrail configuration for customer service interactions (for example, CustomerServiceGuardrail-001) with:
      • Content filters for inappropriate language and harmful content
      • Personally identifiable information (PII) detection and masking rules for customer data
      • Custom word filters for company-specific terms
    • Contextual grounding checks to ensure accurate information
    • Test and validate your guardrail configuration.
    • Publish a working version of your guardrail.
  1. Required IAM permissions:

After these components are in place, you can proceed with implementing the new capabilities in your customer service workflow.

Enabling enhanced safety in Flows

For Acme Corp’s customer service chatbot, implementing guardrails helps ensure safe, compliant, and consistent customer interactions.

Here’s how to enable guardrails in both Prompt node and Knowledge base node:

  1. In the AWS Management Console for Amazon Bedrock, open the Prompt node or Knowledge base node in your customer service flow where you want to add guardrails. Create a new flow if required.
  2. In the node configuration panel, locate the Guardrail section.

  3. Select an existing guardrail from the dropdown menu. For example, CustomerServiceGuardrail-001.
  4. In this instance, CustomerServiceGuardrail-001 is configured to:
    1. Mask customer PII data (name and email)
    2. Block inappropriate language and harmful content
    3. Have responses align with company policy
    4. Maintain professional tone in responses
  1. Choose the appropriate version of your guardrail. For example, Working draft.
  2. Enter your prompt message for customer service scenarios. For example, Respond to customer queries.
  3. Connect your Prompt node to the flow’s input and output nodes.
  4. Test your Flows with the implemented guardrails by entering a prompt in the Test Flow. For example, Hi, my name is John Smith, email – john.smith@email.com. How do I get started with setting up an ACME Corp account?
  5. In the Test flow shown on the right pane of the interface, you can see how the model response handles sensitive information. For example:
    1. Original response: “Dear Mr. John Smith…”
    2. Guardrail response: “Dear Mr. {NAME}…”

Enhanced traceability with Flows Trace View

The new Flows Tracing capability now provides detailed visibility into the execution of the flows, enhancing debugging capability with Trace view and inline validations. This comprehensive monitoring solution helps developers monitor, debug, and optimize their AI workflows more effectively.

Key benefits of enhanced traceability include:

  • Complete execution path with visibility through Trace view
  • Detailed input/output tracing for each node
  • Errors, warnings, and execution timing for every node
  • Quick identification of bottlenecks and issues
  • Faster root-cause analysis for errors

For Acme Corp’s customer service team, the new Flows Tracing capability provides crucial insights into their chatbot’s performance and behaviours. This helps them:

  • Monitor response times for customer interactions
  • Identify patterns in customer queries that cause delays
  • Debug issues in the conversation flow
  • Optimize the customer experience

To use the Trace view:

  1. In the Amazon Bedrock console, open your flow and test it with sample query.
  2. After running your flow, choose Show trace to analyze the interaction.
  3. Review the Flow Trace window showing:
    1. Response times for each step of the customer interaction
    2. How customer inputs are processed
    3. Where guardrails are applied
    4. Performance bottlenecks

  4. Analyze execution details, including:
    1. Customer query processing steps
    2. Response generation and validation
    3. Time taken by each step
    4. Error details and cause analysis

Inline validation status

The Flows visual builder and SDK now include intuitive node validation capabilities:

Visual Builder:

  • Green background indicates a valid node configuration.
  • Red background indicates an invalid node configuration that needs attention.
  • Yellow background indicates a node configuration with warnings.

These validation capabilities help developers quickly identify and resolve potential issues in their flows by giving real-time validation feedback during both visual and programmatic development.

Conclusion

The integration of Bedrock Guardrails and enhanced traceability in Bedrock Flows represent a significant advancement in generative AI development. These capabilities enable developers to create more secure, transparent, and efficient AI-powered solutions, addressing critical challenges in the rapidly evolving field of AI application development.

Bedrock Flows with the new capabilities are now generally available in all regions that Amazon Bedrock is available except for GovCloud. Starting February 1st 2025, you will also be charged for Bedrock Flows usage based on the number of node transitions required to operate your workflows at $0.035 per 1000 node transitions. We invite you to explore these new capabilities and experience firsthand how they can improve your generative AI development process. To get started, open the Amazon Bedrock console and begin building flows with enhanced safety and visibility with Flows today. To learn more, see the AWS user guide for Guardrails integration and Traceability. For pricing information, visit the Amazon Bedrock pricing page.

We’re excited to see the innovative applications you’ll build with these new capabilities. As always, we welcome your feedback through AWS re:Post for Amazon Bedrock or your usual AWS contacts. Join the generative AI builder community at community.aws to share your experiences and learn from others.


About the Authors

Amit Lulla is a Principal Solutions Architect at AWS, where he architects enterprise-scale generative AI and machine learning solutions for software companies. With over 15 years in software development and architecture, he’s passionate about turning complex AI challenges into bespoke solutions that deliver real business value. When he’s not architecting cutting-edge systems or mentoring fellow architects, you’ll find Amit on the squash court, practicing yoga, or planning his next travel adventure. He also maintains a daily meditation practice, which he credits for keeping him centered in the fast-paced world of AI innovation.

Huong Nguyen is a Principal Product Manager at AWS. She is leading the Amazon Bedrock Flows, with 18 years of experience building customer-centric and data-driven products. She is passionate about democratizing responsible machine learning and generative AI to enable customer experience and business innovation. Outside of work, she enjoys spending time with family and friends, listening to audiobooks, traveling, and gardening.

Read More

Implement secure API access to your Amazon Q Business applications with IAM federation user access management

Implement secure API access to your Amazon Q Business applications with IAM federation user access management

Amazon Q Business is a conversational assistant powered by generative AI that enhances workforce productivity by answering questions and completing tasks based on information in your enterprise systems, which each user is authorized to access. AWS recommends using AWS IAM Identity Center when you have a large number of users in order to achieve a seamless user access management experience for multiple Amazon Q Business applications across many AWS accounts in AWS Organizations. When you want to use Amazon Q Business to build enterprise generative AI applications and have yet to adopt organization-wide use of IAM Identity Center, you can build private and secure enterprise generative AI applications with Amazon Q Business using IAM federation. This allows you to directly manage user access to Amazon Q Business applications from your enterprise identity provider (IdP), such as Okta or PingFederate.

Amazon Q Business provides a rich set of APIs to perform administrative tasks and to build an AI assistant with customized user experience for your enterprise. In this post, we show how to use Amazon Q Business APIs when using AWS Identity and Access Management (IAM) federation for user access management. We use illustrative scripts from the AWS samples open source repository to do the following:

  • As an Amazon Q Business administrator, use APIs to automate creation of Amazon Q Business applications using IAM federation for user access management
  • As an application builder, build and deploy custom applications to get AWS Sig V4 credentials with identity information on behalf of a user authenticated with the IdP
  • As an application developer, use the credentials obtained to enable the user to chat with your Amazon Q Business application and get responses only from that enterprise content which the user is authorized to access

To make this post consistent and self-sufficient, some content included overlaps with the post Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

Solution overview

Amazon Q Business IAM Federation requires federating the user identities provisioned in your enterprise IdP (such as Okta or Ping Identity) account using federation with IAM. This involves a setup described in the following steps:

  1. Create a SAML or OIDC application integration in your IdP account. This step is performed by the IAM or security administrator in your organization.
  2. Create a corresponding SAML IAM identity provider or OIDC IAM identity provider in IAM. The IAM identity provider is used by the Amazon Q Business application to validate and trust federated identities of users authenticated by the enterprise IdP and associate a unique identity with each user. This way, a user is uniquely identified across Amazon Q Business applications sharing the same SAML IAM identity provider or OIDC IAM identity provider. This step is performed by an AWS administrator or by an Amazon Q Business administrator, provided they have the IAM permissions to do so.
  3. Create an Amazon Q Business application using the SAML or OIDC IAM identity provider. This step is performed by an Amazon Q Business administrator. The sample scripts create-iam-saml-qbiz-app.py and create-iam-oidc-qbiz-app.py illustrate how the administrators can automate Steps 2 and 3 using AWS APIs.
  4. Users in your organization can use the Amazon Q Business web experience, a built-in application, to authenticate with your IdP and chat with the AI assistant. However, to address unique requirements of your organization, your developers can build a custom application or integrate a preexisting enterprise portal with the Amazon Q Business application using the Amazon Q Business APIs, for the users to authenticate with your IdP, and chat with the AI assistant. The sample scripts samlapp.py and oidcapp.py in conjunction with simple_aq.py illustrate how to acquire AWS Sig V4 credentials that include the user identities of your authenticated users, and then you can use these credentials to invoke Amazon Q Business conversation APIs and implement chat functionality.

Architecture

The following diagram shows a high-level architecture and authentication workflow. The enterprise IdP, such as Okta or Ping Identity, is used as the access manager for an authenticated user to interact with an Amazon Q Business application using an Amazon Q web experience or a custom application using an API.

The user authentication workflow consists of the following steps:

  1. The client application makes an authentication request to the IdP on behalf of the user.
  2. The IdP responds with identity or access tokens in OIDC mode, or a SAML assertion in SAML 2.0 mode. Amazon Q Business IAM Federation requires the enterprise IdP application integration to provide a special principal tag email attribute with its value set to the email address of the authenticated user. If user attributes such as role or location (city, state, country) are present in the SAML or OIDC assertions, Amazon Q Business will extract these attributes for personalization. These attributes are included in the identity token claims in OIDC mode, and SAML assertions in the SAML 2.0 mode. The email attribute ties the authenticated human user with the identity token, and is later enforced using session tags in AWS Security Token Service (AWS STS).
  3. The client application makes an AssumeRoleWithWebIdentity (OIDC mode) or AssumeRoleWithSAML (SAML mode) API call to AWS STS to acquire AWS Sig V4 credentials. Email and other attributes are extracted and enforced by the Amazon Q Business application using session tags in AWS STS. The AWS Sig V4 credentials include information about the federated user. The sample scripts samlapp.py and oidcapp.py illustrate this step.
  4. AWS STS returns AWS Sig V4 credentials, which include user identity information.
  5. The client application uses the credentials obtained in the previous step to make Amazon Q Business API calls on behalf of the authenticated user. The Amazon Q Business application knows the user identity based on the credential used to make the API calls, shows only the specific user’s conversation history, and enforces document access control lists (ACLs). The application retrieves only those documents from the index that the user is authorized to access and are relevant to the user’s query, to be included as context when the query is sent to the underlying large language model (LLM). The application generates a response based only on enterprise content that the user is authorized to access. The sample script simple_aq.py illustrates this step.

Working with groups when using Amazon Q Business IAM Federation

It is not possible to get the groups defined in the enterprise IdP in the IAM federation workflow. If you’re using ACLs in your data sources with groups federated from the enterprise IdP, you can use the Amazon Q PutGroup API to define the federated groups in the Amazon Q Business user store. This way, the Amazon Q Business application can validate a user’s membership to the federated group and enforce the ACLs accordingly. This limitation doesn’t apply to configurations where groups used in ACLs are defined locally within the data sources. For more information, refer to Group mapping.

This is illustrated here using a group core-team, defined in Okta as shown in the following screenshot.

If document ACLs in the data sources are defined for the group core-team, based on the group defined in IdP, and the group core-team isn’t defined locally in the data sources, then you will first need to define the group in the Amazon Q Business user store using the PutGroup API. The AWS Command Line Interface (AWS CLI) command put-group (see the following code) demonstrates the use of this API. This API needs to be invoked by an AWS administrator or Amazon Q Business administrator persona. The Amazon Q Business user store must be updated to reflect group membership changes in your IdP. You might want to build an automation that updates the group membership in Amazon Q Business as group membership changes in your IdP directory.

aws qbusiness put-group 
--application-id XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX 
--index-id XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX 
--group-name core-team —type INDEX 
--group-members "memberUsers=[{userId=mary_major@example.com,type=INDEX},{userId=mateo_jackson@example.com,type=INDEX},{userId=demo1@example.com,type=INDEX}]"

Prerequisites

To implement the sample use case described in this post, you need an Okta account. This post covers workflows for both OIDC and SAML 2.0, so you can follow either one or both workflows based on your business needs. You need to create application integrations for OIDC or SAML mode, and then configure the respective IAM identity providers in your AWS account, which will be required to create and configure your Amazon Q Business applications.

You will need a command line environment installed with the AWS CLI and the AWS SDK for Python (Boto3).

Command line environment for AWS administrator persona

Open a command line window installed with the AWS CLI and SDK for Python and have AWS credentials for the AWS administrator persona. Clone the GitHub repo with the sample scripts in a new working directory and change directory to iam-federation-samples. It will look like the following screenshot.

The files contained in this directory are:

  • README.md – The description of each file in the directory.
  • requirements.txt – The file containing a list of Python modules that are required to be installed.
  • create-iam-oidc-qbiz-app.py – You will assume the persona of an AWS administrator, and use this script to set up Amazon Q Business applications by federating user identities provisioned in your enterprise IdP (Okta for the example in this post), using OIDC application integration.
  • oidc-qbiz-app-env.sh – The shell script to set environment variables required by create-iam-oidc-qbiz-app.py.
  • create-iam-saml-qbiz-app.py – You will assume the persona of an AWS administrator, and use this script to set up Amazon Q Business applications by federating user identities provisioned in your enterprise IdP (Okta), using SAML application integration.
  • saml-qbiz-app-env.sh – The shell script to set environment variables required by create-iam-saml-qbiz-app.py.
  • oidcapp.py – You will assume the persona of a custom application developer and deploy this script to integrate with your IdP’s OIDC application integration. The end-users will use the deployment to authenticate with the IdP, and the deployed script will provide AWS Sig V4 credentials using the user’s identity information for the authenticated user.
  • oidcapp-env.py – The shell script to set environment variables required by oidcapp.py.
  • samlapp.py – You will assume the persona of a custom application developer and deploy this script to integrate with your IdP’s SAML application integration. The end-users will use the deployment to authenticate with the IdP, and the deployed script will provide AWS Sig V4 credentials using the user’s identity information for the authenticated user. In many use cases, the application could acquire the credentials and communicate with the Amazon Q Business application by invoking conversation APIs. The sample code has split this functionality into two Python scripts, because it can be useful during a proof of concept or testing to get the credentials, and then uses them to run a number of scripts using the same credentials.
  • samlapp-env.sh – The shell script to set environment variables required by samlapp.py.
  • simple_aq.py – You will assume the persona of the end-user who has acquired AWS Sig V4 credentials using identity information, set up a command line window with those credentials, and run this script in that window, to make queries to the Amazon Q Business application using the ChatSync API.

A typical custom application will combine the functionality in oidcapp.py with the functionality in simple_aq.py or combine the functionality in samlapp.py with the functionality in simple_aq.py. In subsequent sections, you will use the command line environment for the AWS administrator persona to run create-iam-oidc-qbiz-app.py and create-iam-saml-qbiz-app.py.

Command line environment for Amazon Q Business developer persona

An Amazon Q Business developer persona who develops and deploys a custom application that accesses Amazon Q Business applications using APIs will also require the use of a command line environment with the SDK for Python. There is no need for the command line environment to start with AWS Sig V4 credentials. These will be obtained by the custom application using IAM federation on behalf of a user who authenticates with the IdP (Okta).

You can use the following steps to prepare the command line environment for the Amazon Q Business developer persona:

  1. Clone the GitHub repo with the sample scripts in a new working directory, and change directory to iam-federation-samples. The list of files is described in the previous section.
  2. As a best practice, using a Python virtual environment is recommended. Use the command python -m venv qbiz-env to create a new Python virtual environment.
  3. Run the command . ./qbiz-env/bin/activate to activate the virtual environment you just created.
  4. Run the command pip install -r requirements.txt to install the required libraries.

In subsequent sections, you will use the command line environment for the Amazon Q Business developer persona to deploy the custom application illustrated by oidcapp.py and samlapp.py.

Create an Amazon Q Business application with an OIDC IAM identity provider

To set up an Amazon Q Business application with an OIDC IAM identity identifier, you first configure the Okta application integration with OIDC. Then you use create-iam-oidc-qbiz-app.py, which automates the following:

  1. Creating an IAM identity provider for that OIDC app integration for the Amazon Q Business application.
  2. Creating IAM policies and roles needed to deploy the web experience of the Amazon Q Business application.
  3. Deploying the web experience for the Amazon Q Business application.

After that, you will update the Okta application integration with the web experience URIs of the newly created Amazon Q Business application.

Create an Okta application integration with OIDC

Complete the following steps to create your Okta application integration with OIDC. These steps are usually performed by the IdP administrator in your organization.

  1. On the administration console of your Okta account, choose Applications, then choose Applications in the navigation pane.
  2. Choose Create App Integration.
  3. For Sign-in method, select OIDC.
  4. For Application type, select Web Application.
  5. Choose Next.

  1. Give your app integration a name.
  2. Select Authorization Code and Refresh Token for Grant Type.
  3. For Sign-in redirect URIs, provide a placeholder value, such as https://example.com/authorization-code/callback.

You update this later with the web experience URI of the Amazon Q Business application you create.

  1. On the Assignments tab, assign access to the appropriate users within your organization to your Amazon Q Business application.
  2. Choose Save to save the application integration.

Your integration will look similar to the following screenshots.

  1. Note the values for Client ID and Client secret to use in subsequent steps, because the illustrative scripts use the client secret to authenticate back to the IdP. When implementing your scripts, you can choose the authentication method suitable for your use case.
  2. In the navigation pane, choose Security and then API.
  3. Under API, on the Authorization Servers tab, note the Issuer URI for your authorization server, then choose your authorization server.

It’s best practice to avoid using the default authorization server.

  1. On the Settings tab, note the Metadata URI. You will need to use it in subsequent steps.

  1. On the Claims tab, choose Add Claim.
  2. For Name, enter https://aws.amazon.com/tags.
  3. On the Claims tab, choose Add Claim.
  4. For Name, enter https://aws.amazon.com/tags.
  5. For Include in token type, select ID Token.
  6. For Value, enter {"principal_tags": {"Email": {user.email}}}.
  7. Choose Create.

You can add more attributes to enable Amazon Q Business response personalization. For more information, refer to Create and configure an Okta application.

The claim will look similar to the following screenshot.

  1. On the Access Policies tab, verify that there is at least one policy that enables access to the application integration you created. If required, create a new policy to enable access to your application integration.

Store the OIDC client secret in AWS Secrets Manager

For this post, we store the OIDC client secret in AWS Secrets Manager. Complete the following steps:

  1. In a new tab of your web browser, open the Secrets Manager console. Make sure that you are in the AWS Region where you want to create your Amazon Q Business application.
  2. Choose Store a new secret.
  3. For Choose secret type, select Other type of secret.
  4. For Key/value pairs, enter client_secret as the key and enter the client secret you copied from the Okta application integration as the value.
  5. Choose Next.

  1. For Configure secret, enter a secret name starting with the prefix QBusiness-
  2. For Configure rotation, unless you want to make changes, accept the defaults, and choose Next.
  3. For Review, review the secret you just stored, and choose Store.
  4. On the Secrets page of the Secrets Manager console, choose the secret you just created.
  5. Note the values for Secret name and Secret ARN.

Create the OIDC IAM identity provider, required IAM roles, and Amazon Q Business application

These steps are usually performed by an AWS administrator or an Amazon Q Business administrator with permissions to create IAM identity providers and IAM roles.

  1. In the command line window for the AWS administrator persona, edit oidc-qbiz-app-env.sh and replace the placeholders with the information from your AWS account and IdP application integration from the previous steps (as seen in the following code). Then run the shell script in your command line window to set the environment variables using the command source ./oidc-qbiz-app-env.sh.

Take this opportunity to read the code in create-iam-oidc-qbiz-app.py and understand how it makes API calls to create the OIDC IAM identity provider and the Amazon Q Business application, creates the retriever and index for the Amazon Q Business application and the IAM roles required for the Amazon Q Business web experience, creates the Amazon Q Business web experience, and enables auto subscription to the Amazon Q Business application.

export AWS_ACCOUNT_ID="<REPLACE-WITH-YOUR-AWS-ACCOUNT-ID>"
export AWS_DEFAULT_REGION="<REPLACE-WITH-YOUR-AWS-REGION>"
export AWS_SECRET_ID="<REPLACE-WITH-YOUR-SECRETS-MANAGER-SECRET-STORING-IDP-CLIENT-SECRET>"
export AWS_SECRET_ENCRYPTION_KEY="<REPLACE-WITH-YOUR-SECRET-ENCRYPTION-KEY>"
export IDP_CLIENT_ID="<REPLACE-WITH-YOUR-IDP-APP-INTEGRATION-CLIENT-ID>"
export IDP_ISSUER_URL="<REPLACE-WITH-YOUR-IDP-ISSUER-URL>"

  1. Run python ./create-iam-oidc-qbiz-app.py from your command line.

You should see output similar to the following. Capture the output; you will need this information in subsequent steps.

OpenID Connect Provider ARN: arn:aws:iam::XXXXXXXXXXXX:oidc-provider/XXXXXXXXXX.okta.com/XXXXXXXXXXXX
QBusiness Application ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
QBusiness Index ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
QBusiness Retriever ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Web experience policy: arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-XXXXXXXXXX-XXXX-web-experience-policy
Web experience role: arn:aws:iam::XXXXXXXXXXXX:role/qbiz-XXXXXXXXXX-XXXX-web-experience-role
Attached arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-XXXXXXXXXX-XXXX-web-experience-policy to role qbiz-XXXXXXXXXX-XXXX-web-experience-role
Secrets manager policy: arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-XXXXXXXXXX-XXXX-secrets-manager-policy
Secrets manager role: arn:aws:iam::XXXXXXXXXXXX:role/qbiz-XXXXXXXXXX-XXXX-secrets-manager-role
Attached arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-XXXXXXXXXX-XXXX-secrets-manager-policy to role qbiz-XXXXXXXXXX-XXXX-secrets-manager-role
Created web experience: arn:aws:qbusiness:XXXX:XXXXXXXXXXXX:application/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/web-experience/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
{
"ResponseMetadata": {
"RequestId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"x-amzn-requestid": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"strict-transport-security": "max-age=47304000; includeSubDomains",
"cache-control": "no-store, no-cache, no-cache",
"date": "Sun, 15 Sep 2024 23:32:11 GMT",
"content-type": "application/json",
"content-length": "881",
"connection": "keep-alive"
},
"RetryAttempts": 0
},
"applicationId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"webExperienceId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"webExperienceArn": "arn:aws:qbusiness:XXXX:XXXXXXXXXXXX:application/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/web-experience/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"defaultEndpoint": "https://xxxxxxxx.chat.qbusiness.xxxx.on.aws/",
"status": "ACTIVE",
"createdAt": "2024-09-15 16:32:11.455000-07:00",
"updatedAt": "2024-09-15 16:32:11.455000-07:00",
"title": "qbiz-XXXXXXXX-XXXX-web-experience",
"samplePromptsControlMode": "DISABLED",
"roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/qbiz-XXXXXXXXXXXX-XXXX-web-experience-role",
"identityProviderConfiguration": {
"openIDConnectConfiguration": {
"secretsArn": "arn:aws:secretsmanager:XXXX:XXXXXXXXXXXXX:secret:XXXXXXXXXXXX",
"secretsRole": "arn:aws:iam::XXXXXXXXXXXX:role/qbiz-XXXXXXXXXXXX-XXXX-secrets-manager-role"
}
},
"error": {}
}
QBusiness auto subscription enabled for Q_BUSINESS

  1. Note the default endpoint for the Amazon Q Business web experience.
  2. You need to replace the placeholder of https://example.com with the default endpoint by editing the General Settings of your IdP application integration, in the LOGIN section, under Sign-in redirect URIs, while verifying that the rest of the URI, such as /authorization-code/callback, stays as is.
  3. Add one more sign-in redirect URI of http://localhost:8000/auth/oidc/callback that we will use in subsequent steps.
  4. Change the value for Sign-out URI to http://localhost:8000/login/oidc.
  5. Choose Save.

Typically, the AWS administrator or Amazon Q Business administrator needs to request the IdP administrator to perform this step.

  1. On the Amazon Q Business console, choose the application you created to review the Application Details
  2. In the Web experience settings section, you can choose the link for Deployed URL to open the web experience in a new tab of your browser.
  3. On the Okta login page, use the credentials of a user assigned to the IdP application integration to log in to the web experience.

Deploy a custom application for authenticated users to obtain credentials to be used to make API calls to an Amazon Q Business application with an OIDC IAM identity provider

These steps are usually performed by a custom application developer:

  1. In the command line window for the Amazon Q Business developer persona, edit oidcapp-env.sh and replace the placeholders with the information from your AWS account and IdP application integration from the previous steps. Then run the shell script in your command line window to set the environment variables using the command source ./oidcapp-env.sh.

Take this opportunity to read the code in oidcapp.py and understand how it makes API calls to authenticate a user with the IdP, obtain the identity token for the authenticated user, and make the assume_role_with_web_identity API call to get AWS Sig V4 credentials, which include the identity information of the authenticated user.

export OIDC_CLIENT_ID="<REPLACE-WITH-YOUR-IDP-OIDC-CLIENT-ID>"
export OIDC_CLIENT_SECRET="<REPLACE-WITH-YOUR-IDP-OIDC-CLIENT-SECRET>"
export OIDC_DISCOVERY_URL="<REPLACE-WITH-YOUR-IDP-OIDC-DISCOVERY-URL>"

export OIDC_REDIRECT_URI="http://localhost:8000/auth/oidc/callback"
export LOGOUT_REDIRECT_URI="http://localhost:8000/login/oidc"
export OIDC_ROLE_ARN="<REPLACE-WITH-YOUR-WEB-EXPERIENCE-ROLE>"

  1. Run python oidcapp.py in the command line window for the Amazon Q Business persona, where you have activated the virtual Python environment you created earlier. It will start a local HTTP server on port 8000.

Now you are assuming the persona of the end-user using the deployed custom application.

  1. Open a new incognito window on your web browser.

This will make sure that your sessions with the AWS Management Console and Okta continue in the regular windows of your browser.

  1. Browse to http://localhost:8000/.

  1. Choose Login with OIDC.
  2. Log in using the credentials of a user assigned to the OIDC application integration.

On a successful login, you will see a page similar to the following screenshot. The AWS AssumeRoleWithWebIdentity Response section has the AWS Sig V4 credentials including the identity information of the authenticated user.

  1. Copy the three lines starting with export.

Later, you will enter these into a command line window with the AWS CLI and SDK for Python installed, and then run the script simple_aq.py from that window, which will use the API to interact with your Amazon Q Business application.

Create an Amazon Q Business application with a SAML IAM identity provider

To set up an Amazon Q Business application with a SAML IAM identity provider, you first configure the Okta application integration with SAML. Then you use create-iam-saml-qbiz-app.py, which automates the following:

  1. Creating an IAM identity provider for that SAML app integration for the Amazon Q Business application.
  2. Creating IAM policies and roles needed to deploy the web experience of the Amazon Q Business application.
  3. Deploying the web experience for the Amazon Q Business application.

After that, you will update the Okta application integration with the web experience URIs of the newly created Amazon Q Business application.

Create an Okta application integration with SAML 2.0

Complete the following steps to create your Okta application integration with SAML 2.0:

  1. On the administration console of your Okta account, choose Applications, then Applications in the navigation pane.
  2. Choose Create App Integration.
  3. For Sign-in method, select SAML 2.0.
  4. Choose Next.

  1. On the General Settings page, enter an app name and choose Next.

This will open the Create SAML Integration page. In the following steps, the URL https://signin.aws.amazon.com/saml is the AWS sign-in service endpoint for the us-east-1 Region. AWS recommends using the Regional sign-in service endpoint specific to the Region where you will create your Amazon Q Business application.

  1. For Single sign-on URL, enter a placeholder URL, such as https://example.com/saml, and deselect Use this for Recipient URL and Destination URL.
  2. For Recipient URL, enter https://signin.aws.amazon.com/saml.
  3. For Destination URL, enter the placeholder https://example.com/saml.
  4. For Audience URL (SP Entity ID), enter https://signin.aws.amazon.com/saml.
  5. For Name ID format, choose Persistent.
  6. Choose Next and then Finish.

The placeholder values of https://example.com will need to be updated with the deployment URL of the Amazon Q Business web experience, which you create in subsequent steps.

  1. On the Sign On tab of the app integration you just created, choose View SAML setup instructions.

  1. On the How to Configure SAML 2.0 for <YOUR-APPLICATION-INTEGRATION-NAME> Application page, copy the values of Identity Provider Single Sign-On URL, Identity Provider Issuer, and the IdP metadata and keep them in a temporary text file on your computer. You will need to use them in subsequent steps.

Create the SAML IAM identity provider, required IAM roles, and Amazon Q Business application

Complete the following steps:

  1. In the command line window for the AWS administrator persona, edit saml-qbiz-app-env.sh and replace the placeholders with the information from your AWS account and IdP application integration in the previous steps. Then run the shell script in your command line window to set the environment variables using the command source ./saml-qbiz-app-env.sh.

Take this opportunity to read the code in create-iam-saml-qbiz-app.py and understand how it makes API calls to create the SAML IAM identity provider, the Amazon Q Business application with its retriever and index, the IAM roles required for the web experience, and the web experience itself, and then enable auto subscription for the Amazon Q Business application.

read -r -d '' SAML_METADATA_DOCUMENT <<METADATA_EOF
<REPLACE-WITH-SAML-METADATA-DOCUMENT-FROM-YOUR-IDP>
METADATA_EOF

export SAML_METADATA_DOCUMENT
export IDP_SSO_URL="<REPLACE-WITH-YOUR-IDP-SSO-URL>"
export CUSTOM_ACS_URL="<REPLACE-WITH-YOUR-CUSTOM-APPLICATION-HOSTING-URL e.g. http://localhost:8000/saml>"

export AWS_ACCOUNT_ID="<REPLACE-WITH-YOUR-AWS-ACCOUNT-ID>"
export AWS_DEFAULT_REGION="<REPLACE-WITH-YOUR-AWS-REGION>"
export AWS_SECRET_ENCRYPTION_KEY="<REPLACE-WITH-YOUR-SECRETS-MANAGER-SECRET-STORING-IDP-CLIENT-SECRET>"
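
The following condensed sketch shows the general shape of those API calls, assuming the environment variables above are set. It omits the IAM role and policy creation, error handling, and the auto subscription step, and the display names are placeholders rather than the values the actual script uses.

import os
import boto3

iam = boto3.client("iam")
qbusiness = boto3.client("qbusiness", region_name=os.environ["AWS_DEFAULT_REGION"])

# Create the SAML IAM identity provider from the IdP metadata document
saml_provider_arn = iam.create_saml_provider(
    Name="qbiz-saml-id-provider",
    SAMLMetadataDocument=os.environ["SAML_METADATA_DOCUMENT"],
)["SAMLProviderArn"]

# Create the Amazon Q Business application with a native index and retriever
app_id = qbusiness.create_application(displayName="qbiz-saml-app")["applicationId"]
index_id = qbusiness.create_index(applicationId=app_id, displayName="qbiz-saml-index")["indexId"]
qbusiness.create_retriever(
    applicationId=app_id,
    type="NATIVE_INDEX",
    displayName="qbiz-saml-retriever",
    configuration={"nativeIndexConfiguration": {"indexId": index_id}},
)

# Deploy the web experience, pointing SAML authentication at the IdP SSO URL
qbusiness.create_web_experience(
    applicationId=app_id,
    identityProviderConfiguration={
        "samlConfiguration": {"authenticationUrl": os.environ["IDP_SSO_URL"]}
    },
)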

  2. Run python ./create-iam-saml-qbiz-app.py from your command line.

You should see output similar to the following:

SAML Provider ARN: arn:aws:iam::XXXXXXXXXXXX:saml-provider/qbiz-saml-XXXX-id-provider
QBusiness Application ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
QBusiness Index ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
QBusiness Retriever ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Web experience policy: arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-saml-XXXX-web-experience-policy
Web experience role: arn:aws:iam::XXXXXXXXXXXX:role/qbiz-saml-XXXX-web-experience-role
Attached arn:aws:iam::XXXXXXXXXXXX:policy/qbiz-saml-XXXX-web-experience-policy to role qbiz-saml-XXXX-web-experience-role
Created web experience: arn:aws:qbusiness:us-east-1:XXXXXXXXXXXX:application/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/web-experience/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
{
"ResponseMetadata": {
"RequestId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"x-amzn-requestid": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"strict-transport-security": "max-age=47304000; includeSubDomains",
"cache-control": "no-store, no-cache, no-cache",
"date": "Tue, 17 Sep 2024 02:48:49 GMT",
"content-type": "application/json",
"content-length": "771",
"connection": "keep-alive"
},
"RetryAttempts": 0
},
"applicationId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"webExperienceId": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"webExperienceArn": "arn:aws:qbusiness:us-east-1:XXXXXXXXXXXX:application/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX/web-experience/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
"defaultEndpoint": "https://XXXXXXXX.chat.qbusiness.XXXXXXXX.on.aws/",
"status": "ACTIVE",
"createdAt": "2024-09-16 19:48:48.903000-07:00",
"updatedAt": "2024-09-16 19:48:48.903000-07:00",
"title": "qbiz-saml-XXXX-web-experience",
"samplePromptsControlMode": "DISABLED",
"roleArn": "arn:aws:iam::XXXXXXXXXXXX:role/qbiz-saml-XXXX-web-experience-role",
"identityProviderConfiguration": {
"samlConfiguration": {
"authenticationUrl": "https://XXXXXXX.okta.com/app/XXXXXXX/XXXXXXX/sso/saml"
}
},
"error": {}
}
QBusiness auto subscription enabled for Q_BUSINESS

Before you can use the web experience to interact with the Amazon Q Business application you just created, you need to update the Okta application integration with the redirect URL of the web experience.

  1. On the Okta administration console, open the Okta application integration you created earlier.
  2. On the General tab, choose Edit next to SAML Settings.
  3. For Single sign-on URL and Destination URL, replace the placeholder https://example.com/ with the value for the default endpoint URL of your web experience. Make sure the /saml suffix isn’t deleted.

  1. Choose Show Advanced Settings.
  2. For Other Requestable SSO URLs, choose Add Another.
  3. Set URL to http://localhost:8000/saml and Index to 0.

  1. On the Edit SAML Integration page, in the Attribute Statements (optional) section, add attribute statements as listed in the following table.

Although the console marks this section optional, this step is required: the Amazon Q Business application uses these attributes to determine the identity of the user, so be sure to confirm their correctness.

| Name | Name format | Value |
| --- | --- | --- |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email | Unspecified | user.email |
| https://aws.amazon.com/SAML/Attributes/Role | Unspecified | <Web experience IAM role ARN>,<identity-provider-arn> |
| https://aws.amazon.com/SAML/Attributes/RoleSessionName | Unspecified | user.email |

For the value of the https://aws.amazon.com/SAML/Attributes/Role attribute, you need to concatenate the web experience IAM role ARN and IAM identity provider ARN you copied earlier with a comma between them, without spaces or other characters. You can add more attributes to enable Amazon Q Business response personalization. For more information, refer to Create and configure an Okta application.
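
For example, using the documentation placeholder account ID 111122223333 and hypothetical resource names, the concatenated value would look like the following:

arn:aws:iam::111122223333:role/qbiz-saml-XXXX-web-experience-role,arn:aws:iam::111122223333:saml-provider/qbiz-saml-XXXX-id-provider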

  1. Choose Next and Finish.
  2. On the Assignments tab, assign users who can access the app integration you just created.
  3. On the Amazon Q Business console, choose the application you created to review the Application Details.
  4. In the Web experience settings section, you can choose the link for Deployed URL to open the web experience in a new tab of your browser.
  5. On the Okta login page, use the credentials of a user assigned to the IdP application integration to log in to the web experience.

Deploy a custom application for authenticated users to obtain credentials to be used to make API calls to an Amazon Q Business application with a SAML IAM identity provider

These steps are usually performed by a custom application developer.

  1. In the command line window for the Amazon Q Business developer persona, edit samlapp-env.sh and replace the placeholders with the information from your AWS account and IdP application integration from the previous steps (as shown in the following code). Then run the shell script in your command line window to set the environment variables using the command source ./samlapp-env.sh.

Take this opportunity to read the code in samlapp.py and understand how it makes API calls to authenticate a user with the IdP, obtain an identity token for the authenticated user, and make the assume_role_with_saml API call to get AWS Sig V4 credentials, which include the identity information of the authenticated user.

export IDP_SSO_URL="<REPLACE-WITH-YOUR-IDP-SSO-URL>"
export IDP_ISSUER="<REPLACE-WITH-YOUR-IDP-ISSUER-URL>"
export CUSTOM_ACS_URL="http://localhost:8000/saml"  # Your AssertionConsumerService URL
export WEB_EXPERIENCE_ROLE_ARN="<REPLACE-WITH-YOUR-WEB-EXPERIENCE-ROLE-ARN>"
export IAM_IDENTITY_PROVIDER_ARN="<REPLACE-WITH-YOUR-IAM-IDENTITY-PROVIDER-ARN>"
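
The following is a minimal, hypothetical sketch of the token exchange at the heart of samlapp.py, assuming the environment variables above are set. The function name is illustrative, and the base64-encoded SAML assertion is the value the IdP posts to the local ACS URL.

import os
import boto3

def get_sigv4_credentials_from_saml(saml_assertion_b64: str) -> dict:
    """Exchange the base64-encoded SAML assertion for temporary AWS Sig V4 credentials."""
    sts = boto3.client("sts")
    response = sts.assume_role_with_saml(
        RoleArn=os.environ["WEB_EXPERIENCE_ROLE_ARN"],
        PrincipalArn=os.environ["IAM_IDENTITY_PROVIDER_ARN"],
        SAMLAssertion=saml_assertion_b64,
    )
    return response["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken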

  2. Run python samlapp.py in the command line window for the Amazon Q Business developer persona, where you have activated the virtual Python environment you created earlier. This starts a local HTTP server on port 8000.

Now you are assuming the persona of the end-user using the deployed custom application.

  1. Open a new incognito window on your web browser.

This will make sure that your sessions with the AWS Management Console and Okta continue in the regular windows of your browser.

  1. Browse to http://localhost:8000/.

  1. Choose Login with SAML.
  2. On the login page for Okta, log in using the credentials of a user assigned to the SAML application integration.

On a successful login, you will see a page similar to the following screenshot. The AWS AssumeRoleWithSAML Response section has the AWS Sig V4 credentials including the identity information of the authenticated user.

  1. Copy the three lines starting with export.

Later, you will enter these into a command line window with the AWS CLI and SDK for Python installed, and then run the script simple_aq.py from that window, which will use an API to interact with your Amazon Q Business application.

User interaction with the custom application implementing conversational APIs to interface with the Amazon Q Business application

Whether you created the Amazon Q Business application using an OIDC IAM identity provider or SAML 2.0 IAM identity provider, you will first need to index some content. You can use Amazon Q Business data source connectors to connect with your enterprise content repositories and index that content along with the access control information to your Amazon Q Business application. For illustration purposes, we use the Employee AI assistant use case from the earlier post Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation. Refer to the Set up the data source section in the post to understand the details of how the Confluence data source is configured to index Confluence spaces to the Amazon Q Business application.

You can use the following steps to interact with the Amazon Q Business application you created earlier using the simple_aq.py script. Here you are assuming the persona of the end-user.

  1. Use a command line window installed with the AWS CLI and SDK for Python.
  2. Use oidcapp.py or samlapp.py deployed by the developer persona to authenticate as a user and obtain AWS Sig V4 credentials containing identity information.
  3. Edit the script simple_aq.py and replace the placeholders with the details of your Amazon Q Business application and the queries you want to issue.

Take this opportunity to read and understand how the chat_sync API is used with the underlying Amazon Q Business application. The chat_sync API call doesn’t have an explicit parameter for the user ID. The identity information for the authenticated user is included in the underlying AWS Sig V4 credential and is passed on to the Amazon Q Business application by IAM federation.

AWS_REGION='<REPLACE-WITH-YOUR-AWS-REGION>' #Replace with the AWS region where your Amazon Q Business application is created
QBUSINESS_APPLICATION_ID='<REPLACE-WITH-YOUR-AMAZON-Q-BUSINESS-APPLICATION-ID>' #Replace with the application id of your Amazon Q Business application
queries = [ #Replace with queries appropriate to the content you indexed.
"REPLACE-WITH-QUERY1", #For illustration we will use 'what is the checklist of new team member onboarding activities?'
"REPLACE-WITH-QUERY2"  #For illustration we will use 'who are the project team members?'
]
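
For reference, the following is a minimal sketch of how simple_aq.py might use this configuration to call the chat_sync API. The actual script may be structured differently; the response field printed here is the assistant's answer text.

import boto3

# Continuing from the configuration above; the Sig V4 credentials exported from the
# custom application (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN)
# must be set in the shell before running the script.
qbusiness = boto3.client('qbusiness', region_name=AWS_REGION)

for query in queries:
    # No user ID parameter: the user's identity travels in the Sig V4 credentials
    response = qbusiness.chat_sync(
        applicationId=QBUSINESS_APPLICATION_ID,
        userMessage=query,
    )
    print(f"Query: {query}\nResponse: {response['systemMessage']}\n")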

  4. Run the script using the command python simple_aq.py.

The following screenshot illustrates running the script using the Sig V4 credentials of user Mary Major. When you run simple_aq.py with a particular user’s credentials for the first time, you will see the error An error occurred (AccessDeniedException) when calling the ChatSync operation: Exception occurred for requestId: 4ad66cea-c3b2-47c6-ac08-f621e0ded2c1 with message: User does not have a subscription for the given application. This is expected, and the user is automatically subscribed to the Amazon Q Business application on this call. Run simple_aq.py again with the same credentials, and you will get the expected response.

The following screenshot illustrates another run using the Sig V4 credentials of user Mateo Jackson with the same queries.

Observe the difference in the outputs when run using the credentials of two different users. When using the credentials of user Mary Major, the query responses are about the ACME project, whereas when using the credentials of user Mateo Jackson, the query responses are about the AnyOrgApp project. This is due to the differences in their authorization to access project information: Mary Major has access to the Confluence space for the ACME project but not to the space for the AnyOrgApp project, whereas Mateo Jackson has access to the Confluence space for the AnyOrgApp project but not to the space for the ACME project.

Clean up

If you created a new Amazon Q Business application to try out the integration with IAM federation, and don’t plan to use it further, you can unsubscribe, remove automatically subscribed users from the application, and delete it so that your AWS account doesn’t accumulate costs. Also delete the Secrets Manager secret you created to store the IdP application integration client secret.

Although they don’t accumulate costs, as a best practice remove the IAM identity providers, IAM roles, and policies that were created by create-iam-oidc-qbiz-app.py and create-iam-saml-qbiz-app.py.

Conclusion

For enterprise generative AI assistants such as the one shown in this post to be successful, they must respect access control as well as assure the privacy and confidentiality of every employee. Amazon Q Business helps achieve this by integrating with IAM Identity Center or with IAM federation to provide a solution that authenticates each user and validates the user identity at each step to enforce access control along with privacy and confidentiality.

In this post, we showed how to use APIs to build and deploy custom applications to create and interact with Amazon Q Business applications using IAM federation with OIDC or SAML IAM identity providers. We also saw how different personas, namely IdP administrators, AWS or Amazon Q Business administrators, custom application developers, and end-users use Amazon Q Business APIs through the lifecycle of an Amazon Q Business application and the custom applications to create and build Amazon Q Business applications.

To learn about how to use APIs to work with Amazon Q Business applications using IAM Identity Center for user access management, refer to Configure Amazon Q Business with AWS IAM Identity Center trusted identity propagation.


About the authors

Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.

Zia Seymour is a Generative AI Specialist Solutions Architect focused on Amazon Q. Zia works with AWS customers to understand their Generative AI needs and help them in their Generative AI journey on AWS.

Enhance speech synthesis and video generation models with RLHF using audio and video segmentation in Amazon SageMaker

As generative AI models advance in creating multimedia content, the difference between good and great output often lies in the details that only human feedback can capture. Audio and video segmentation provides a structured way to gather this detailed feedback, allowing models to learn through reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). Annotators can precisely mark and evaluate specific moments in audio or video content, helping models understand what makes content feel authentic to human viewers and listeners.

Take, for instance, text-to-video generation, where models need to learn not just what to generate but how to maintain consistency and natural flow across time. When creating a scene of a person performing a sequence of actions, factors like the timing of movements, visual consistency, and smoothness of transitions contribute to the quality. Through precise segmentation and annotation, human annotators can provide detailed feedback on each of these aspects, helping models learn what makes a generated video sequence feel natural rather than artificial. Similarly, in text-to-speech applications, understanding the subtle nuances of human speech—from the length of pauses between phrases to changes in emotional tone—requires detailed human feedback at a segment level. This granular input helps models learn how to produce speech that sounds natural, with appropriate pacing and emotional consistency. As large language models (LLMs) increasingly integrate more multimedia capabilities, human feedback becomes even more critical in training them to generate rich, multi-modal content that aligns with human quality standards.

The path to creating effective AI models for audio and video generation presents several distinct challenges. Annotators need to identify precise moments where generated content matches or deviates from natural human expectations. For speech generation, this means marking exact points where intonation changes, where pauses feel unnatural, or where emotional tone shifts unexpectedly. In video generation, annotators must pinpoint frames where motion becomes jerky, where object consistency breaks, or where lighting changes appear artificial. Traditional annotation tools, with basic playback and marking capabilities, often fall short in capturing these nuanced details.

Amazon SageMaker Ground Truth enables RLHF by allowing teams to integrate detailed human feedback directly into model training. Through custom human annotation workflows, organizations can equip annotators with tools for high-precision segmentation. This setup enables the model to learn from human-labeled data, refining its ability to produce content that aligns with natural human expectations.

In this post, we show you how to implement an audio and video segmentation solution in the accompanying GitHub repository using SageMaker Ground Truth. We guide you through deploying the necessary infrastructure using AWS CloudFormation, creating an internal labeling workforce, and setting up your first labeling job. We demonstrate how to use Wavesurfer.js for precise audio visualization and segmentation, configure both segment-level and full-content annotations, and build the interface for your specific needs. We cover both console-based and programmatic approaches to creating labeling jobs, and provide guidance on extending the solution with your own annotation needs. By the end of this post, you will have a fully functional audio/video segmentation workflow that you can adapt for various use cases, from training speech synthesis models to improving video generation capabilities.

Feature overview

The integration of Wavesurfer.js in our UI provides a detailed waveform visualization where annotators can instantly see patterns in speech, silence, and audio intensity. For instance, when working on speech synthesis, annotators can visually identify unnatural gaps between words or abrupt changes in volume that might make generated speech sound robotic. The ability to zoom into these waveform patterns means they can work with millisecond precision—marking exactly where a pause is too long or where an emotional transition happens too abruptly.

In this snapshot of audio segmentation, we are capturing a customer-representative conversation, annotating speaker segments, emotions, and transcribing the dialogue. The UI allows for playback speed adjustment and zoom functionality for precise audio analysis.

The multi-track feature lets annotators create separate tracks for evaluating different aspects of the content. In a text-to-speech task, one track might focus on pronunciation accuracy, another on emotional consistency, and a third on natural pacing. For video generation tasks, annotators can mark segments where motion flows naturally, where object consistency is maintained, and where scene transitions work well. They can adjust playback speed to catch subtle details and use the visual timeline to set precise start and end points for each marked segment.

In this snapshot of video segmentation, we’re annotating a scene with dogs, tracking individual animals, their colors, emotions, and gaits. The UI also enables overall video quality assessment, scene change detection, and object presence classification.

Annotation process

Annotators begin by choosing Add New Track and selecting appropriate categories and tags for their annotation task. After you create the track, you can choose Begin Recording at the point where you want to start a segment. As the content plays, you can monitor the audio waveform or video frames until you reach the desired end point, then choose Stop Recording. The newly created segment appears in the right pane, where you can add classifications, transcriptions, or other relevant labels. This process can be repeated for as many segments as needed, with the ability to adjust segment boundaries, delete incorrect segments, or create new tracks for different annotation purposes.

Importance of high-quality data and reducing labeling errors

High-quality data is essential for training generative AI models that can produce natural, human-like audio and video content. The performance of these models depends directly on the accuracy and detail of human feedback, which stems from the precision and completeness of the annotation process. For audio and video content, this means capturing not just what sounds or looks unnatural, but exactly when and how these issues occur.

Our purpose-built UI in SageMaker Ground Truth addresses common challenges in audio and video annotation that often lead to inconsistent or imprecise feedback. When annotators work with long audio or video files, they need to mark precise moments where generated content deviates from natural human expectations. For example, in speech generation, an unnatural pause might last only a fraction of a second, but its impact on perceived quality is significant. The tool’s zoom functionality allows annotators to expand these brief moments across their screen, making it possible to mark the exact start and end points of these subtle issues. This precision helps models learn the fine details that separate natural from artificial-sounding speech.

Solution overview

This audio/video segmentation solution combines several AWS services to create a robust annotation workflow. At its core, Amazon Simple Storage Service (Amazon S3) serves as the secure storage for input files, manifest files, annotation outputs, and the web UI components. SageMaker Ground Truth provides annotators with a web portal to access their labeling jobs and manages the overall annotation workflow. The following diagram illustrates the solution architecture.

The UI template, which includes our specialized audio/video segmentation interface built with Wavesurfer.js, requires specific JavaScript and CSS files. These files are hosted through Amazon CloudFront distribution, providing reliable and efficient delivery to annotators’ browsers. By using CloudFront with an origin access identity and appropriate bucket policies, we allow the UI components to be served to annotators. This setup follows AWS best practices for least-privilege access, making sure CloudFront can only access the specific UI files needed for the annotation interface.

Pre-annotation and post-annotation AWS Lambda functions are optional components that can enhance the workflow. The pre-annotation Lambda function can process the input manifest file before data is presented to annotators, enabling any necessary formatting or modifications. Similarly, the post-annotation Lambda function can transform the annotation outputs into specific formats required for model training. These functions provide flexibility to adapt the workflow to specific needs without requiring changes to the core annotation process.
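
As an illustration, a pre-annotation function could look like the following minimal sketch. The field names mirror the manifest used later in this post, and the handler follows the Ground Truth pre-annotation contract of receiving a dataObject and returning a taskInput; treat it as a starting point rather than the solution's implementation.

def lambda_handler(event, context):
    """Pre-annotation Lambda: shape each manifest line before it reaches the annotator UI."""
    data_object = event["dataObject"]  # one line from the input manifest

    # Pass through the fields the UI template expects; add or rename keys as needed
    task_input = {
        "source": data_object.get("source", ""),
        "call-id": data_object.get("call-id", "unknown"),
        "transcription": data_object.get("transcription", ""),
    }
    return {"taskInput": task_input, "isHumanAnnotationRequired": "true"}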

The solution uses AWS Identity and Access Management (IAM) roles to manage permissions:

  • A SageMaker Ground Truth IAM role enables access to Amazon S3 for reading input files and writing annotation outputs
  • If used, Lambda function roles provide the necessary permissions for preprocessing and postprocessing tasks

Let’s walk through the process of setting up your annotation workflow. We start with a simple scenario: you have an audio file stored in Amazon S3, along with some metadata like a call ID and its transcription. By the end of this walkthrough, you will have a fully functional annotation system where your team can segment and classify this audio content.

Prerequisites

For this walkthrough, make sure you have the following:

Create your internal workforce

Before we dive into the technical setup, let’s create a private workforce in SageMaker Ground Truth. This allows you to test the annotation workflow with your internal team before scaling to a larger operation.

  1. On the SageMaker console, choose Labeling workforces.
  2. Choose Private for the workforce type and create a new private team.
  3. Add team members using their email addresses—they will receive instructions to set up their accounts.

Deploy the infrastructure

Although this post demonstrates using a CloudFormation template for quick deployment, you can also set up the components manually. The assets (JavaScript and CSS files) are available in our GitHub repository. Complete the following steps for manual deployment:

  1. Download these assets directly from the GitHub repository.
  2. Host them in your own S3 bucket.
  3. Set up your own CloudFront distribution to serve these files.
  4. Configure the necessary permissions and CORS settings.

This manual approach gives you more control over infrastructure setup and might be preferred if you have existing CloudFront distributions or a need to customize security controls and assets.

The rest of this post will focus on the CloudFormation deployment approach, but the labeling job configuration steps remain the same regardless of how you choose to host the UI assets.

To deploy the solution, launch the provided CloudFormation template in your AWS account.

This CloudFormation template creates and configures the following AWS resources:

  • S3 bucket for UI components:
    • Stores the UI JavaScript and CSS files
    • Configured with CORS settings required for SageMaker Ground Truth
    • Accessible only through CloudFront, not directly public
    • Permissions are set using a bucket policy that grants read access only to the CloudFront Origin Access Identity (OAI)
  • CloudFront distribution:
    • Provides secure and efficient delivery of UI components
    • Uses an OAI to securely access the S3 bucket
    • Is configured with appropriate cache settings for optimal performance
    • Access logging is enabled, with logs being stored in a dedicated S3 bucket
  • S3 bucket for CloudFront logs:
    • Stores access logs generated by CloudFront
    • Is configured with the required bucket policies and ACLs to allow CloudFront to write logs
    • Object ownership is set to ObjectWriter to enable ACL usage for CloudFront logging
    • Lifecycle configuration is set to automatically delete logs older than 90 days to manage storage
  • Lambda function:
    • Downloads UI files from our GitHub repository
    • Stores them in the S3 bucket for UI components
    • Runs only during initial setup and uses least privilege permissions
    • Permissions include Amazon CloudWatch Logs for monitoring and specific S3 actions (read/write) limited to the created bucket

After the CloudFormation stack deployment is complete, you can find the CloudFront URLs for accessing the JavaScript and CSS files on the AWS CloudFormation console. You need these CloudFront URLs to update your UI template before creating the labeling job. Note these values—you will use them when creating the labeling job.
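
If you prefer to retrieve these values programmatically, a short sketch such as the following can read them from the stack outputs. The stack name and output keys are assumptions, so check your deployment for the actual values.

import boto3

cloudformation = boto3.client("cloudformation")

# Stack name is an assumption; replace it with the name you used when deploying
stack = cloudformation.describe_stacks(StackName="audio-video-segmentation-ui")["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack["Outputs"]}
print(outputs)  # includes the CloudFront URLs for the JavaScript and CSS files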

Prepare your input manifest

Before you create the labeling job, you need to prepare an input manifest file that tells SageMaker Ground Truth what data to present to annotators. The manifest structure is flexible and can be customized based on your needs. For this post, we use a simple structure:

{
  "source": "s3://YOUR-BUCKET/audio/sample1.mp3",
  "call-id": "call-123",
  "transcription": "Customer: I'm really happy with your smart home security system. However, I have a feature request that would make it better.\nRepresentative: We're always eager to hear from our customers. What feature would you like to see added?"
}

You can adapt this structure to include additional metadata that your annotation workflow requires. For example, you might want to add speaker information, timestamps, or other contextual data. The key is making sure your UI template is designed to process and display these attributes appropriately.
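
For example, a manifest line extended with hypothetical speaker and timestamp fields could be generated with a short script like the following. The extra keys are illustrative, and your UI template must be updated to display them.

import json

record = {
    "source": "s3://YOUR-BUCKET/audio/sample1.mp3",
    "call-id": "call-123",
    "transcription": "Customer: ... Representative: ...",
    # Hypothetical extra metadata for the annotation UI
    "speakers": ["Customer", "Representative"],
    "recorded-at": "2024-11-04T18:30:00Z",
}

with open("manifest.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")  # one JSON object per line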

Create your labeling job

With the infrastructure deployed, let’s create the labeling job in SageMaker Ground Truth. For full instructions, refer to Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda.

  1. On the SageMaker console, choose Create labeling job.
  2. Give your job a name.
  3. Specify your input data location in Amazon S3.
  4. Specify an output bucket where annotations will be stored.
  5. For the task type, select Custom labeling task.
  6. In the UI template field, locate the placeholder values for the JavaScript and CSS files and update as follows:
    1. Replace audiovideo-wavesufer.js with your CloudFront JavaScript URL from the CloudFormation stack outputs.
    2. Replace audiovideo-stylesheet.css with your CloudFront CSS URL from the CloudFormation stack outputs.
<!-- Custom Javascript and Stylesheet -->
<script src="audiovideo-wavesufer.js"></script>
<link rel="stylesheet" href="audiovideo-stylesheet.css">
  7. Before you launch the job, use the Preview feature to verify your interface.

You should see the Wavesurfer.js interface load correctly with all controls working properly. This preview step is crucial—it confirms that your CloudFront URLs are correctly specified and the interface is properly configured.

Programmatic setup

Alternatively, you can create your labeling job programmatically using the CreateLabelingJob API. This is particularly useful for automation or when you need to create multiple jobs. See the following code:

import boto3

sagemaker = boto3.client("sagemaker")

response = sagemaker.create_labeling_job(
    LabelingJobName="audio-segmentation-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://your-bucket-name/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://your-bucket-name/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/SagemakerExecutionRole",
    # Optionally add PreHumanTaskLambdaArn or AnnotationConsolidationConfig
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:012345678910:workteam/private-crowd/work-team-name",
        "TaskDescription": "Segment and classify audio/video content using the custom annotation interface.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": "Audio/Video Segmentation and Annotation",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://your-bucket-name/path-to-ui-template"
        }
    }
)

The API approach offers the same functionality as the SageMaker console, but allows for automation and integration with existing workflows. Whether you choose the SageMaker console or API approach, the result is the same: a fully configured labeling job ready for your annotation team.

Understanding the output

After your annotators complete their work, SageMaker Ground Truth will generate an output manifest in your specified S3 bucket. This manifest contains rich information at two levels:

  • Segment-level classifications – Details about each marked segment, including start and end times and assigned categories
  • Full-content classifications – Overall ratings and classifications for the entire file

Let’s look at a sample output to understand its structure:

{
  "answers": [
    {
      "acceptanceTime": "2024-11-04T18:33:38.658Z",
      "answerContent": {
        "annotations": {
          "categories": {
            "language": [
              "English",
              "Hindi",
              "Spanish",
              "French",
              "German",
              "Dutch"
            ],
            "speaker": [
              "Customer",
              "Representative"
            ]
          },
          "startTimestamp": 1730745219028,
          "startUTCTime": "Mon, 04 Nov 2024 18:33:39 GMT",
          "streams": {
            "language": [
              {
                "id": "English",
                "start": 0,
                "end": 334.808635,
                "text": "Sample text in English",
                "emotion": "happy"
              },
              {
                "id": "Spanish",
                "start": 334.808635,
                "end": 550.348471,
                "text": "Texto de ejemplo en español",
                "emotion": "neutral"
              }
            ]
          },
          "endTimestamp": 1730745269602,
          "endUTCTime": "Mon, 04 Nov 2024 18:34:29 GMT",
          "elapsedTime": 50574
        },
        "backgroundNoise": {
          "ambient": false,
          "music": true,
          "traffic": false
        },
        "emotiontag": "Neutral",
        "environmentalSounds": {
          "birdsChirping": false,
          "doorbell": true,
          "footsteps": false
        },
        "rate": {
          "1": false,
          "2": false,
          "3": false,
          "4": false,
          "5": true
        },
        "textTranslationFinal": "sample text for transcription"
      }
    }
  ]
} 

This two-level annotation structure provides valuable training data for your AI models, capturing both fine-grained details and overall content assessment.

Customizing the solution

Our audio/video segmentation solution is designed to be highly customizable. Let’s walk through how you can adapt the interface to match your specific annotation requirements.

Customize segment-level annotations

The segment-level annotations are controlled in the report() function of the JavaScript code. The following code snippet shows how you can modify the annotation options for each segment:

ranges.forEach(function (r) {
   // ... existing code ...
   
   // Example: Adding a custom dropdown for speaker identification
   var speakerDropdown = $('<select>').attr({
       name: 'speaker',
       class: 'custom-dropdown-width'
   });
   var speakerOptions = ['Speaker A', 'Speaker B', 'Multiple Speakers', 'Background Noise'];
   speakerOptions.forEach(function(option) {
       speakerDropdown.append($('<option>').val(option).text(option));
   });
   
   // Example: Adding a checkbox for quality issues
   var qualityCheck = $('<input>').attr({
       type: 'checkbox',
       name: 'quality_issue'
   });
   var qualityLabel = $('<label>').text('Contains Quality Issues');

   tr.append($('<TD>').append(speakerDropdown));
   tr.append($('<TD>').append(qualityCheck).append(qualityLabel));
   
   // Add event listeners for your new fields
   speakerDropdown.on('change', function() {
       r.speaker = $(this).val();
       updateTrackListData(r);
   });
   
   qualityCheck.on('change', function() {
       r.hasQualityIssues = $(this).is(':checked');
       updateTrackListData(r);
   });
});

You can remove existing fields or add new ones based on your needs. Make sure you’re updating the data model (updateTrackListData function) to handle your custom fields.

Modify full-content classifications

For classifications that apply to the entire audio/video file, you can modify the HTML template. The following code is an example of adding custom classification options:

<div class="row">
    <div class="col-6">
        <p><strong>Audio Quality Assessment:</strong></p>
        <label class="radio">
            <input type="radio" name="audioQuality" value="excellent" style="width: 20px;">
            Excellent
        </label>
        <label class="radio">
            <input type="radio" name="audioQuality" value="good" style="width: 20px;">
            Good
        </label>
        <label class="radio">
            <input type="radio" name="audioQuality" value="poor" style="width: 20px;">
            Poor
        </label>
    </div>
    <div class="col-6">
        <p><strong>Content Type:</strong></p>
        <label class="checkbox">
            <input type="checkbox" name="contentType" value="interview" style="width: 20px;">
            Interview
        </label>
        <label class="checkbox">
            <input type="checkbox" name="contentType" value="presentation" style="width: 20px;">
            Presentation
        </label>
    </div>
</div>

The classifications you add here will be included in your output manifest, allowing you to capture both segment-level and full-content annotations.

Extending Wavesurfer.js functionality

Our solution uses Wavesurfer.js, an open source audio visualization library. Although we’ve implemented core functionality for segmentation and annotation, you can extend this further using Wavesurfer.js’s rich feature set. For example, you might want to:

  • Add spectrogram visualization
  • Implement additional playback controls
  • Enhance zoom functionality
  • Add timeline markers

For these customizations, we recommend consulting the Wavesurfer.js documentation. When implementing additional Wavesurfer.js features, remember to test thoroughly in the SageMaker Ground Truth preview to review compatibility with the labeling workflow.

Wavesurfer.js is distributed under the BSD-3-Clause license. Although we’ve tested the integration thoroughly, modifications you make to the Wavesurfer.js implementation should be tested in your environment. The Wavesurfer.js community provides excellent documentation and support for implementing additional features.

Clean up

To clean up the resources created during this tutorial, follow these steps (a scripted sketch follows the list):

  1. Stop the SageMaker Ground Truth labeling job if it’s still running and you no longer need it. This will halt ongoing labeling tasks and stop additional charges from accruing.
  2. Empty the S3 buckets by deleting all objects within them. S3 buckets must be emptied before they can be deleted, so removing all stored files facilitates a smooth cleanup process.
  3. Delete the CloudFormation stack to remove all the AWS resources provisioned by the template. This action will automatically delete associated services like the S3 buckets, CloudFront distribution, Lambda function, and related IAM roles.
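
The following is a scripted sketch of steps 2 and 3 using boto3; the stack and bucket names are placeholders, so replace them with the values from your deployment.

import boto3

s3 = boto3.resource("s3")
cloudformation = boto3.client("cloudformation")

# Empty the buckets created by the stack (names are placeholders)
for bucket_name in ["YOUR-UI-ASSETS-BUCKET", "YOUR-CLOUDFRONT-LOGS-BUCKET"]:
    s3.Bucket(bucket_name).objects.all().delete()

# Delete the CloudFormation stack and wait for completion
cloudformation.delete_stack(StackName="YOUR-STACK-NAME")
cloudformation.get_waiter("stack_delete_complete").wait(StackName="YOUR-STACK-NAME")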

Conclusion

In this post, we walked through implementing an audio and video segmentation solution using SageMaker Ground Truth. We saw how to deploy the necessary infrastructure, configure the annotation interface, and create labeling jobs both through the SageMaker console and programmatically. The solution’s ability to capture precise segment-level annotations along with overall content classifications makes it particularly valuable for generating high-quality training data for generative AI models, whether you’re working on speech synthesis, video generation, or other multimedia AI applications. As you develop your AI models for audio and video generation, remember that the quality of human feedback directly impacts your model’s performance—whether you’re training models to generate more natural-sounding speech, create coherent video sequences, or understand complex audio patterns.

We encourage you to visit our GitHub repository to explore the solution further and adapt it to your specific needs. You can enhance your annotation workflows by customizing the interface, adding new classification categories, or implementing additional Wavesurfer.js features. To learn more about creating custom labeling workflows in SageMaker Ground Truth, visit Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda and Custom labeling workflows.

If you’re looking for a turnkey data labeling solution, consider Amazon SageMaker Ground Truth Plus, which provides access to an expert workforce trained in various machine learning tasks. With SageMaker Ground Truth Plus, you can quickly receive high-quality annotations without the need to build and manage your own labeling workflows, reducing costs by up to 40% and accelerating the delivery of labeled data at scale.

Start building your annotation workflow today and contribute to the next generation of AI models that push the boundaries of what’s possible in audio and video generation.


About the authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Vineet Agarwal is a Senior Manager of Customer Delivery in the Amazon Bedrock team responsible for Human in the Loop services. He has been at AWS for over 2 years, managing go-to-market activities and business and technical operations. Prior to AWS, he worked in the SaaS, fintech, and telecommunications industries in services leadership roles. He has an MBA from the Indian School of Business and a B. Tech in Electronics and Communications Engineering from the National Institute of Technology, Calicut (India). In his free time, Vineet loves playing racquetball and enjoying outdoor activities with his family.

Using responsible AI principles with Amazon Bedrock Batch Inference

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

The recent announcement of batch inference in Amazon Bedrock enables organizations to process large volumes of data efficiently at 50% less cost compared to On-Demand pricing. It’s especially useful when the use case is not latency sensitive and you don’t need real-time inference. However, as we embrace these powerful capabilities, we must also address a critical challenge: implementing responsible AI practices in batch processing scenarios.

In this post, we explore a practical, cost-effective approach for incorporating responsible AI guardrails into Amazon Bedrock Batch Inference workflows. Although we use a call center’s transcript summarization as our primary example, the methods we discuss are broadly applicable to a variety of batch inference use cases where ethical considerations and data protection are a top priority.

Our approach combines two key elements:

  • Ethical prompting – We demonstrate how to embed responsible AI principles directly into the prompts used for batch inference, preparing for ethical outputs from the start
  • Postprocessing guardrails – We show how to apply additional safeguards to the batch inference output, making sure that the remaining sensitive information is properly handled

This two-step process offers several advantages:

  • Cost-effectiveness – By applying heavy-duty guardrails to only the typically shorter output text, we minimize processing costs without compromising on ethics
  • Flexibility – The technique can be adapted to various use cases beyond transcript summarization, making it valuable across industries
  • Quality assurance – By incorporating ethical considerations at both the input and output stages, we maintain high standards of responsible AI throughout the process

Throughout this post, we address several key challenges in responsible AI implementation for batch inference. These include safeguarding sensitive information, providing accuracy and relevance of AI-generated content, mitigating biases, maintaining transparency, and adhering to data protection regulations. By tackling these challenges, we aim to provide a comprehensive approach to ethical AI use in batch processing.

To illustrate these concepts, we provide practical step-by-step guidance on implementing this technique.

Solution overview

This solution uses Amazon Bedrock for batch inference to summarize call center transcripts, coupled with the following two-step approach to maintain responsible AI practices. The method is designed to be cost-effective and flexible while maintaining high ethical standards.

  • Ethical data preparation and batch inference:
    • Use ethical prompting to prepare data for batch processing
    • Store the prepared JSONL file in an Amazon Simple Storage Service (Amazon S3) bucket
    • Use Amazon Bedrock batch inference for efficient and cost-effective call center transcript summarization
  • Postprocessing with Amazon Bedrock Guardrails:
    • After the completion of initial summarization, apply Amazon Bedrock Guardrails to detect and redact sensitive information, filter inappropriate content, and maintain compliance with responsible AI policies
    • By applying guardrails to the shorter output text, you optimize for both cost and ethical compliance

This two-step approach combines the efficiency of batch processing with robust ethical safeguards, providing a comprehensive solution for responsible AI implementation in scenarios involving sensitive data at scale.

In the following sections, we walk you through the key components of implementing responsible AI practices in batch inference workflows using Amazon Bedrock, with a focus on ethical prompting techniques and guardrails.

Prerequisites

To implement the proposed solution, make sure you have satisfied the following requirements:

Ethical prompting techniques

When setting up your batch inference job, it’s crucial to incorporate ethical guidelines into your prompts. The following is a concise example of how you might structure your prompt:

prompt = f"""
Summarize the following customer service transcript:

{transcript}

Instructions:
1. Focus on the main issue, steps taken, and resolution.
2. Maintain a professional and empathetic tone.
3. Do not include any personally identifiable information (PII) in the summary.
4. Use gender-neutral language even if gender is explicitly mentioned.
5. Reflect the emotional context accurately without exaggeration.
6. Highlight actionable insights for improving customer service.
7. If any part is unclear or ambiguous, indicate this in the summary.
8. Replace specific identifiers with generic terms like 'the customer' or '{{MASKED}}'.
"""

This prompt sets the stage for ethical summarization by explicitly instructing the model to protect privacy, minimize bias, and focus on relevant information.

Set up a batch inference job

For detailed instructions on how to set up and run a batch inference job using Amazon Bedrock, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock. It provides detailed instructions for the following steps:

  • Preparing your data in the required JSONL format
  • Understanding the quotas and limitations for batch inference jobs
  • Starting a batch inference job using either the Amazon Bedrock console or API
  • Collecting and analyzing the output from your batch job

By following the instructions in our previous post and incorporating the ethical prompt provided in the preceding section, you’ll be well-equipped to set up batch inference jobs.
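
As a reference point, the following minimal sketch shows one way to wrap the ethical prompt into the JSONL records that a batch inference job consumes. The modelInput body shown follows the Anthropic Claude Messages format and should be adjusted for the model and inference parameters you actually use.

import json

def build_batch_record(record_id: str, transcript: str) -> str:
    """Build one JSONL line for an Amazon Bedrock batch inference job."""
    prompt = f"Summarize the following customer service transcript:\n\n{transcript}\n\n..."  # ethical prompt from the previous section
    model_input = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    return json.dumps({"recordId": record_id, "modelInput": model_input})

with open("batch_input.jsonl", "w") as f:
    for i, transcript in enumerate(["<transcript 1>", "<transcript 2>"]):
        f.write(build_batch_record(f"rec-{i}", transcript) + "\n")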

Amazon Bedrock Guardrails

After the batch inference job has run successfully, apply Amazon Bedrock Guardrails as a postprocessing step. This provides an additional layer of protection against potential ethical violations or sensitive information disclosure. The following is a simple implementation, but you can update this based on your data volume and SLA requirements:

import boto3, os, json, time

# Initialize Bedrock client and set guardrail details
bedrock_runtime = boto3.client('bedrock-runtime')
guardrail_id = "<Your Guardrail ID>"
guardrail_version = "<Your Guardrail Version>"

# S3 bucket and file details i.e. output of batch inference job
bucket_name = '<S3 bucket with batch inference output>'
prefix = "<prefix>"
filename = '<filename>'

# Set up AWS session and S3 client
session = boto3.Session(
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    region_name=os.environ.get('AWS_REGION')
)
s3 = session.client('s3')

# Read and process batch inference output from S3
output_data = []
try:
    object_key = f"{prefix}{filename}"
    json_data = s3.get_object(Bucket=bucket_name, Key=object_key)['Body'].read().decode('utf-8')
    
    for line in json_data.splitlines():
        data = json.loads(line)
        output_entry = {
            'request_id': data['recordId'],
            'output_text': data['modelOutput']['content'][0]['text']
        }
        output_data.append(output_entry)
except Exception as e:
    print(f"Error reading JSON file from S3: {e}")

# Function to apply guardrails and mask PII data
def mask_pii_data(batch_output: str):
    try:
        pii_data = [{"text": {"text": batch_output}}]
        response = bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source='OUTPUT',
            content=pii_data
        )
        return response['outputs'][0]['text'] if response['action'] == 'GUARDRAIL_INTERVENED' else batch_output
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Set up rate limiting: # 20 requests per minute, 3 seconds interval
rpm = 20
interval = 3

# Apply guardrails to each record
masked_data = []
for record in output_data:
    iteration_start = time.time()
    
    record['masked_data'] = mask_pii_data(record['output_text'])
    masked_data.append(record)
    
    # Implement rate limiting
    time.sleep(max(0, interval - (time.time() - iteration_start)))

Key points about this implementation:

  • We use the apply_guardrail method from the Amazon Bedrock runtime to process each output
  • The guardrail is applied to the ‘OUTPUT’ source, focusing on postprocessing
  • We handle rate limiting by introducing a delay between API calls so that we stay within the ApplyGuardrail quota of 20 requests per minute
  • The function mask_pii_data applies the guardrail and returns the processed text if the guardrail intervened
  • We store the masked version for comparison and analysis
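
If you want to keep the guardrail-processed results for downstream analysis, a small follow-on sketch like the following writes them back to Amazon S3; the output key is an assumption.

import json
import boto3

# masked_data is the list produced by the loop above; bucket_name, prefix, and
# filename are the same variables used when reading the batch inference output.
s3 = boto3.client('s3')
masked_jsonl = "\n".join(json.dumps(record) for record in masked_data)
s3.put_object(
    Bucket=bucket_name,
    Key=f"{prefix}masked/{filename}",  # output location is an assumption
    Body=masked_jsonl.encode('utf-8'),
)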

This approach allows you to benefit from the efficiency of batch processing while still maintaining strict control over the AI’s outputs and protecting sensitive information. By addressing ethical considerations at both the input (prompting) and output (guardrails) stages, you’ll have a comprehensive approach to responsible AI in batch inference workflows.

Although this example focuses on call center transcript summarization, you can adapt the principles and methods discussed in this post to various batch inference scenarios across different industries, always prioritizing ethical AI practices and data protection.

Ethical considerations for responsible AI

Although the prompt in the previous section provides a basic framework, there are many ethical considerations you can incorporate depending on your specific use case. The following is a more comprehensive list of ethical guidelines:

  • Privacy protection – Avoid including any personally identifiable information in the summary. This protects customer privacy and aligns with data protection regulations, making sure that sensitive personal data is not exposed or misused.
  • Factual accuracy – Focus on facts explicitly stated in the transcript, avoiding speculation. This makes sure that the summary remains factual and reliable, providing an accurate representation of the interaction without introducing unfounded assumptions.
  • Bias mitigation – Be mindful of potential biases related to gender, ethnicity, location, accent, or perceived socioeconomic status. This helps prevent discrimination and maintains fair treatment for your customers, promoting equality and inclusivity in AI-generated summaries.
  • Cultural sensitivity – Summarize cultural references or idioms neutrally, without interpretation. This respects cultural diversity and minimizes misinterpretation, making sure that cultural nuances are acknowledged without imposing subjective judgments.
  • Gender neutrality – Use gender-neutral language unless gender is explicitly mentioned. This promotes gender equality and minimizes stereotyping, creating summaries that are inclusive and respectful of all gender identities.
  • Location neutrality – Include location only if relevant to the customer’s issue. This minimizes regional stereotyping and focuses on the actual issue rather than unnecessary generalizations based on geographic information.
  • Accent awareness – If accent or language proficiency is relevant, mention it factually without judgment. This acknowledges linguistic diversity without discrimination, respecting the varied ways in which people communicate.
  • Socioeconomic neutrality – Focus on the issue and resolution, regardless of the product or service tier discussed. This promotes fair treatment regardless of a customer’s economic background, promoting equal consideration of customers’ concerns.
  • Emotional context – Use neutral language to describe emotions accurately. This provides insight into customer sentiment without escalating emotions, allowing for a balanced representation of the interaction’s emotional tone.
  • Empathy reflection – Note instances of the agent demonstrating empathy. This highlights positive customer service practices, encouraging the recognition and replication of compassionate interactions.
  • Accessibility awareness – Include information about any accessibility needs or accommodations factually. This promotes inclusivity and highlights efforts to accommodate diverse needs, fostering a more accessible and equitable customer service environment.
  • Ethical behavior flagging – Identify potentially unethical behavior without repeating problematic content. This helps identify issues for review while minimizing the propagation of inappropriate content, maintaining ethical standards in the summarization process.
  • Transparency – Indicate unclear or ambiguous information in the summary. This promotes transparency and helps identify areas where further clarification might be needed, making sure that limitations in understanding are clearly communicated.
  • Continuous improvement – Highlight actionable insights for improving customer service. This turns the summarization process into a tool for ongoing enhancement of service quality, contributing to the overall improvement of customer experiences.

When implementing ethical AI practices in your batch inference workflows, consider which of these guidelines are most relevant to your specific use case. You may need to add, remove, or modify instructions based on your industry, target audience, and specific ethical considerations. Remember to regularly review and update your ethical guidelines as new challenges and considerations emerge in the field of AI ethics.
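
As a hedged illustration of how such guidelines can be managed per use case (the full prompt shown earlier in this post is the authoritative version), you can keep the selected guidelines in a list and append them to the summarization instructions, which makes them straightforward to add, remove, or modify:
# Illustrative sketch only; the complete prompt appears earlier in this post
ETHICAL_GUIDELINES = [
    'Do not include any personally identifiable information in the summary.',
    'State only facts explicitly present in the transcript; do not speculate.',
    'Use gender-neutral language unless gender is explicitly mentioned.',
    'Describe emotions in neutral, accurate language.',
]

def build_summary_prompt(transcript):
    """Compose a summarization prompt that embeds the selected ethical guidelines."""
    guidelines = '\n'.join(f'- {g}' for g in ETHICAL_GUIDELINES)
    return (
        'Summarize the following call center transcript.\n\n'
        f'Follow these guidelines:\n{guidelines}\n\n'
        f'Transcript:\n{transcript}'
    )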

Clean up

To delete the guardrail you created, follow the steps in Delete a guardrail.

Conclusion

Implementing responsible AI practices, regardless of the specific feature or method, requires a thoughtful balance of privacy protection, cost-effectiveness, and ethical considerations. In our exploration of batch inference with Amazon Bedrock, we’ve demonstrated how these principles can be applied to create a system that not only efficiently processes large volumes of data, but does so in a manner that respects privacy, avoids bias, and provides actionable insights.

We encourage you to adopt this approach in your own generative AI implementations. Start by incorporating ethical guidelines into your prompts and applying guardrails to your outputs. Responsible AI is an ongoing commitment—continuously monitor, gather feedback, and adapt your approach to align with the highest standards of ethical AI use. By prioritizing ethics alongside technological advancement, we can create AI systems that not only meet business needs, but also contribute positively to society.


About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Revolutionizing knowledge management: VW’s AI prototype journey with AWS

Today, we’re excited to share the journey of Volkswagen (VW), an innovator in the automotive industry and Europe’s largest car maker, to enhance knowledge management by using generative AI, Amazon Bedrock, and Amazon Kendra to devise a solution based on Retrieval Augmented Generation (RAG) that makes internal information more easily accessible to its users. This solution efficiently handles documents that include both text and images, significantly enhancing VW’s knowledge management capabilities within their production domain.

The challenge

VW engaged the AWS Industries Prototyping & Customer Engineering Team (AWSI-PACE) to explore ways to improve knowledge management in the production domain by building a prototype that uses advanced features of Amazon Bedrock, specifically Anthropic’s Claude 3 models, to extract and analyze information from private documents, such as PDFs containing text and images. The main technical challenge was to efficiently retrieve and process data in a multimodal setup so the prototype could provide comprehensive and accurate information from private Chemical Compliance documents.

PACE, a multi-disciplinary rapid prototyping team, focuses on delivering feature-complete initial products that enable business evaluation, determining feasibility, business value, and path to production. Using the PACE-Way (an Amazon-based development approach), the team developed a time-boxed prototype over a maximum of 6 weeks that included a full-stack solution with frontend and UX, backed by specialist expertise such as data science, tailored to VW’s needs.

The choice of Anthropic’s Claude 3 models within Amazon Bedrock was driven by Claude’s advanced vision capabilities, which enable it to understand and analyze images alongside text. This multimodal capability is crucial for applications that need to extract insights from complex documents containing both textual content and images, making Claude 3 ideal for querying private PDF documents that include both text and images.

The integrated approach and ease of use of Amazon Bedrock in deploying large language models (LLMs), along with built-in features that facilitate seamless integration with other AWS services like Amazon Kendra, made it the preferred choice. Using Claude 3’s vision capabilities, the solution can process image-rich PDF documents: Claude analyzes each image contained within these documents to extract text and understand the contextual details embedded in these visual elements. The extracted text and context from the images are then added to Amazon Kendra, enhancing the searchability and accessibility of information within the system. This integration ensures that users can perform detailed and accurate searches across the indexed content, using the full depth of information extracted by Claude 3.

Architecture overview

Because of the need to provide access to proprietary information, it was decided early that the prototype would use RAG. The RAG approach, now an established way to enhance LLMs with private knowledge, is implemented using a blend of AWS services that streamline the processing, searching, and querying of documents while at the same time meeting non-functional requirements around efficiency, scalability, and reliability. The architecture is centered on a native AWS serverless backend, which ensures minimal maintenance and high availability together with fast development.


Core components of the RAG system

  1. Amazon Simple Storage Service (Amazon S3): Amazon S3 serves as the primary storage for source data. It’s also used for hosting static website components, ensuring high durability and availability.
  2. Amazon Kendra: Amazon Kendra provides semantic search capabilities for ranking documents and passages; it also handles the overhead of text extraction, embeddings, and vector datastore management.
  3. Amazon Bedrock: This component is critical for processing and inference. It uses machine learning models to analyze and interpret the text and image data extracted from documents, integrating these insights to generate context-aware responses to queries.
  4. Amazon CloudFront: Distributes the web application globally to reduce latency, offering users fast and reliable access to the RAG system’s interface.
  5. AWS Lambda: Provides the serverless compute environment for running backend operations without provisioning or managing servers, which scales automatically with the application’s demands.
  6. Amazon DynamoDB: Used for storing metadata and other information needed for quick retrieval during search operations. This fast, flexible NoSQL database accommodates high-performance needs.
  7. AWS AppSync: Manages real-time data synchronization and communication between the users’ interfaces and the serverless backend, enhancing the interactive experience.
  8. Amazon Cognito: Manages user authentication and authorization, providing secure and scalable user access control. It supports integration with various identity providers to facilitate easy and secure user sign-in and registration processes.
  9. Amazon API Gateway: Acts as the entry point for all RESTful API requests to the backend services, offering features such as throttling, monitoring, and API version management.
  10. AWS Step Functions: Orchestrates the various AWS services involved in the RAG system, ensuring coordinated execution of the workflow (a simplified state machine sketch follows this list).
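
To make the orchestration concrete, the following is a simplified, hypothetical Amazon States Language definition expressed as a Python dictionary: a task that extracts images from the uploaded PDF, followed by a Map state that fans out over the extracted images and invokes the text-extraction function for each. The function names are placeholders, and the prototype’s actual state machine may differ.
import json

# Hypothetical, simplified definition; the prototype's actual state machine may differ
state_machine_definition = {
    'StartAt': 'ExtractImages',
    'States': {
        'ExtractImages': {
            'Type': 'Task',
            'Resource': 'arn:aws:states:::lambda:invoke',
            'Parameters': {
                'FunctionName': 'extract-images-from-pdf',  # placeholder Lambda name
                'Payload.$': '$'
            },
            'OutputPath': '$.Payload',
            'Next': 'ProcessImages'
        },
        'ProcessImages': {
            'Type': 'Map',
            'ItemsPath': '$.image_filenames',
            'MaxConcurrency': 5,
            'ItemProcessor': {
                'StartAt': 'ExtractTextFromImage',
                'States': {
                    'ExtractTextFromImage': {
                        'Type': 'Task',
                        'Resource': 'arn:aws:states:::lambda:invoke',
                        'Parameters': {
                            'FunctionName': 'extract-text-from-image',  # placeholder Lambda name
                            'Payload.$': '$'
                        },
                        'End': True
                    }
                }
            },
            'End': True
        }
    }
}

print(json.dumps(state_machine_definition, indent=2))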

Solution walkthrough

The process flow handles complex documents efficiently from the moment a user uploads a PDF. These documents are often large and contain numerous images. This workflow integrates AWS services to extract, process, and make content available for querying. This section details the steps involved in processing uploaded documents and ensuring that extracted data is searchable and contextually relevant to user queries (shown in the following figure).


Initiation and initial processing:

  1. User access: A user accesses the web interface through CloudFront, which allows users to upload PDFs as shown in Image A in Results. These PDFs are stored in Amazon S3.
  2. Text extraction: Using the Amazon Kendra S3 connector, the solution indexes the S3 bucket repository of documents that the user uploaded in Step 1. Amazon Kendra supports popular document formats such as PDF, HTML, Word, and PowerPoint, and an index can contain multiple formats. Amazon Kendra extracts the content inside the documents to make them searchable, parsing them into fields or attributes that are optimized for search.
  3. Step function activation: When an object is created in S3, such as a user uploading a file in Step 1, the solution launches a Step Functions workflow that orchestrates the document processing for adding image context to the Amazon Kendra index, as sketched below.
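
One possible wiring for this trigger, shown here as an assumption rather than the prototype’s exact mechanism, is an S3 event notification that invokes a small Lambda function, which starts the state machine with the bucket and key of the uploaded PDF:
import json
import os
import boto3

# Hypothetical trigger; the prototype may use a different mechanism (for example, EventBridge)
sfn = boto3.client('stepfunctions')
STATE_MACHINE_ARN = os.environ['STATE_MACHINE_ARN']  # assumed environment variable

def lambda_handler(event, context):
    # S3 event notifications deliver one record per created object
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        pdf_key = record['s3']['object']['key']
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({'bucket_name': bucket_name, 'pdf_key': pdf_key})
        )
    return {'statusCode': 200}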

Image extraction and analysis:

  1. Extract images: While Kendra indexes the text from the uploaded file, the step function extracts the images from the document. Extracting the images from the uploaded file allows the solution to process the images using Amazon Bedrock to extract text and contextual information. The code snippet that follows provides a sample of the code used to extract the images from the PDF file and save them back to S3.
import json
import fitz  # PyMuPDF
import os
import boto3

# Initialize the S3 client
s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = event['bucket_name']
    pdf_key = event['pdf_key']
    
    # Define the local paths
    local_pdf_path = '/tmp/' + os.path.basename(pdf_key)
    local_image_dir = '/tmp/images'
    
    # Ensure the image directory exists
    if not os.path.exists(local_image_dir):
        os.makedirs(local_image_dir)
    
    # Download the PDF from S3
    s3.download_file(bucket_name, pdf_key, local_pdf_path)
    
    # Open the PDF file using PyMuPDF
    pdf_file = fitz.open(local_pdf_path)
    pdf_name = os.path.splitext(os.path.basename(local_pdf_path))[0]  # Extract PDF base name for labeling
    
    total_images_extracted = 0  # Counter for all images extracted from this PDF
    image_filenames = []  # List to store the filenames of extracted images
    
    # Iterate through each page of the PDF
    for current_page_index in range(len(pdf_file)):
        # Extract images from the current page
        for img_index, img in enumerate(pdf_file.get_page_images(current_page_index)):
            xref = img[0]
            image = fitz.Pixmap(pdf_file, xref)
            
            # Construct image filename with a global counter
            image_filename = f"{pdf_name}_image_{total_images_extracted}.png"
            image_path = os.path.join(local_image_dir, image_filename)
            total_images_extracted += 1
            
            # Save the image appropriately
            if image.n < 5:  # GRAY or RGB
                image.save(image_path)
            else:  # CMYK, requiring conversion to RGB
                new_image = fitz.Pixmap(fitz.csRGB, image)
                new_image.save(image_path)
                new_image = None
            
            image = None
            
            # Upload the image back to S3
            s3.upload_file(image_path, bucket_name, f'images/{image_filename}')
            
            # Add the image filename to the list
            image_filenames.append(image_filename)
    
    # Return the response with the list of image filenames and total images extracted
    return {
        'statusCode': 200,
        'image_filenames': image_filenames,
        'total_images_extracted': total_images_extracted
    }
    1. Lambda function code:
      1. Initialization: The function initializes the S3 client.
      2. Event extraction: Extracts the bucket name and PDF key from the incoming event payload.
      3. Local path set up: Defines local paths for storing the PDF and extracted images.
      4. Directory creation: Ensures the directory for images exists.
      5. PDF download: Downloads the PDF file from S3.
      6. Image extraction: Opens the PDF and iterates through its pages to extract images.
      7. Image processing: Saves the images locally and uploads them back to S3.
      8. Filename collection: Collects the filenames of the uploaded images.
      9. Return statement: Returns the list of image filenames and the total number of images extracted.
  2. Text extraction from images: The image files processed in the previous step are then sent to Amazon Bedrock, where advanced models extract textual content and contextual details from the images. The step function uses a map state to iterate over the list of images, processing each one individually. Claude 3 offers image-to-text vision capabilities that can process images and return text outputs. It excels at analyzing and understanding charts, graphs, technical diagrams, reports, and other visual assets. Claude 3 Sonnet achieves comparable performance to other best-in-class models with image processing capabilities while maintaining a significant speed advantage. The following is a sample snippet that extracts the contextual information from each image in the map state.
import json
import base64
import boto3
from botocore.exceptions import ClientError

# Initialize the boto3 client for BedrockRuntime and S3
s3 = boto3.client('s3', region_name='us-west-2')
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')

def lambda_handler(event, context):
    source_bucket = event['bucket_name']
    destination_bucket = event['destination_bucket']
    image_filename = event['image_filename']
    
    try:
        # Get the image from S3
        image_file = s3.get_object(Bucket=source_bucket, Key=image_filename)
        contents = image_file['Body'].read()

        # Encode the image to base64
        encoded_string = base64.b64encode(contents).decode('utf-8')

        # Prepare the payload for Bedrock
        payload = {
            "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
            "contentType": "application/json",
            "accept": "application/json",
            "body": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.7,
                "top_p": 0.999,
                "top_k": 250,
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "image",
                                "source": {
                                    "type": "base64",
                                    "media_type": "image/png",
                                    "data": encoded_string
                                }
                            },
                            {
                                "type": "text",
                                "text": "Extract all text."
                            }
                        ]
                    }
                ]
            }
        }

        # Call Bedrock to extract text from the image
        body_bytes = json.dumps(payload['body']).encode('utf-8')
        response = bedrock_runtime.invoke_model(
            body=body_bytes,
            contentType=payload['contentType'],
            accept=payload['accept'],
            modelId=payload['modelId']
        )

        response = json.loads(response['body'].read().decode('utf-8'))
        response_content = response['content'][0]
        response_text = response_content['text']

        # Save the extracted text to S3
        text_file_key = image_filename.replace('.png', '.txt')
        s3.put_object(Bucket=destination_bucket, Key=text_file_key, Body=str(response_text))

        return {
            'statusCode': 200,
            'text_file_key': text_file_key,
            'message': f"Processed and saved text for {image_filename}"
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'error': str(e),
            'message': f"An error occurred processing {image_filename}"
        }
    1. Lambda function code:
      1. Initialization: The script initializes the boto3 clients for BedrockRuntime and S3 services to interact with AWS resources.
      2. Lambda handler: The main function (lambda_handler) is invoked when the Lambda function is run. It receives the event and context parameters.
      3. Retrieve image: The image file is retrieved from the specified S3 bucket using the get_object method.
      4. Base64 encoding: The image is read and encoded to a base64 string, which is required for sending the image data to Bedrock.
      5. Payload preparation: A payload is constructed with the base64 encoded image and a request to extract text.
      6. Invoke Amazon Bedrock: The Amazon Bedrock model is invoked using the prepared payload to extract text from the image.
      7. Process response: The response from Amazon Bedrock is parsed to extract the textual content.
      8. Save text to S3: The extracted text is saved back to the specified S3 bucket with a filename derived from the original image filename.
      9. Return statement: The function returns a success message and the key of the saved text file. If an error occurs, it returns an error message.

Data storage and indexing:

  1. Save to S3: The extracted text from the images is saved back to S3 as text files.
  2. Indexing by Amazon Kendra: After the text files are saved in S3, Amazon Kendra indexes them, making the image-derived context searchable and available for retrieval in the RAG system (a sync sketch follows this list).
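
If the Amazon Kendra S3 connector is configured for on-demand synchronization (an assumption; connectors can also sync on a schedule), a sync job can be started after the text files are written so the new image context is picked up. A minimal sketch, with placeholder index and data source IDs:
import boto3

kendra = boto3.client('kendra')

# Placeholder IDs; replace with your Amazon Kendra index and S3 data source IDs
INDEX_ID = 'your-kendra-index-id'
DATA_SOURCE_ID = 'your-s3-data-source-id'

# Start a sync so newly written text files (extracted image context) are indexed
response = kendra.start_data_source_sync_job(
    Id=DATA_SOURCE_ID,
    IndexId=INDEX_ID
)
print('Sync job started:', response['ExecutionId'])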

User query with semantic search and inference

The semantic search and inference process of our solution plays a critical role in providing users with accurate and contextually relevant information based on their queries.

Semantic search focuses on understanding the intent and contextual meaning behind a user’s query instead of relying solely on keyword matching. Amazon Kendra, an advanced enterprise search service, uses semantic search to deliver more accurate and relevant results. By using natural language processing (NLP) and machine learning algorithms, Amazon Kendra can interpret the nuances of a query, ensuring that the retrieved documents and data align closely with the user’s actual intent.


User query handling:

  1. User interaction: Users submit their queries through a user-friendly interface.

Semantic search with Amazon Kendra:

  1. Context retrieval: Upon receiving a query, Amazon Kendra performs a semantic search to identify the most relevant documents and data. The advanced NLP capabilities of Amazon Kendra allow it to understand the intent and contextual nuances of the query.
  2. Provision of relevant context: Amazon Kendra provides a list of documents that are ranked based on their relevance to the user’s query. This ensures that the response is not only based on keyword matches but also on the semantic relevance of the content. Note that Amazon Kendra also uses the text extracted from images, which was processed with Amazon Bedrock, to enhance the search results.
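
A minimal sketch of this retrieval step, assuming a placeholder index ID: the Amazon Kendra Retrieve API returns ranked passages that can then be passed to Amazon Bedrock as context.
import boto3

kendra = boto3.client('kendra')
INDEX_ID = 'your-kendra-index-id'  # placeholder

def retrieve_context(query, top_k=5):
    """Return the top-ranked passages for a user query as a single context string."""
    response = kendra.retrieve(IndexId=INDEX_ID, QueryText=query, PageSize=top_k)
    passages = [item['Content'] for item in response['ResultItems']]
    return '\n\n'.join(passages)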

Inference with Amazon Bedrock:

  1. Contextual analysis and inference: The relevant documents and data retrieved by Amazon Kendra are then passed to Amazon Bedrock. The inference models available in Amazon Bedrock consider both the context provided by Kendra and the specific details of the user query. This dual consideration allows Amazon Bedrock to formulate responses that are not only accurate but also finely tuned to the specifics of the query. The following are the snippets for generating prompts that help Bedrock provide accurate and contextually relevant responses:
from langchain.prompts import PromptTemplate  # assumed import; these snippets use LangChain-style prompt templates

def get_qa_prompt(self):
    template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}"""
    return PromptTemplate(template=template, input_variables=["context", "question"])

def get_prompt(self):
    template = """The following is a friendly conversation between a human and an AI. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{chat_history}

Question: {input}"""
    # Build the prompt directly; the template expects "input" and "chat_history"
    return PromptTemplate(template=template, input_variables=["input", "chat_history"])

def get_condense_question_prompt(self):
    template = """<conv>
{chat_history}
</conv>

<followup>
{question}
</followup>

Given the conversation inside the tags <conv></conv>, rephrase the follow up question you find inside <followup></followup> to be a standalone question, in the same language as the follow up question.
"""
    return PromptTemplate(input_variables=["chat_history", "question"], template=template)
    1. QA prompt explanation:
      1. This prompt uses the context provided by Amazon Kendra (the most relevant documents and passages returned by the semantic search for the user’s query) to answer the question accurately.
      2. It instructs the AI to answer only from the given context and to admit when it doesn’t know the answer rather than make one up.
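
To make the flow concrete, the following hedged sketch fills the QA prompt with the retrieved context and sends it to Claude 3 Sonnet through the Amazon Bedrock Messages API; the prototype’s actual chain may differ.
import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')
MODEL_ID = 'anthropic.claude-3-sonnet-20240229-v1:0'

def answer_question(context, question):
    """Fill the QA prompt with Kendra-provided context and invoke Claude 3 for the answer."""
    prompt = (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know, "
        "don't try to make up an answer.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    body = {
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': 1024,
        'messages': [
            {'role': 'user', 'content': [{'type': 'text', 'text': prompt}]}
        ]
    }
    response = bedrock_runtime.invoke_model(
        body=json.dumps(body),
        modelId=MODEL_ID
    )
    payload = json.loads(response['body'].read())
    return payload['content'][0]['text']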

Response delivery:

  1. Delivery to user: The response is then delivered back to the user, completing the cycle of query and response.

Results

Our evaluation of the system revealed significant multilingual capabilities, enhancing user interaction with documents in multiple languages:

  • Multilingual support: The model showed strong performance across different languages. Despite the documents being primarily in German, the system handled queries in English effectively. It translated the extracted text from the PDFs or images from German to English, providing responses in English. This feature was crucial for English-speaking users.
  • Seamless language transition: The system also supports transitions between languages. Users could ask questions in German and receive responses in German, maintaining context and accuracy. This dual-language functionality significantly enhanced efficiency, catering to documents containing both German and English.
  • Enhanced user experience: This multilingual capability broadened the system’s accessibility and ensured users could receive information in their preferred language, making interactions more intuitive.

Image A shows a user querying their private data, and the solution answering the query correctly. The answer isn’t derived from the text extracted from the files, but from an image embedded in the uploaded file.


Image B shows the specific image from which Amazon Bedrock extracted the text and added it to the index, enabling the system to provide the correct answer.


Image C also shows a scenario where, without the image context, the question cannot be answered.


Following the successful prototype development, Stefan Krawinkel from VW shared his thoughts:

“We are thrilled by the AWS team’s joy of innovation and the constant questioning of solutions for the requirements we brought to the prototype. The solutions developed give us a good overview of what is possible with generative AI, and what limits still exist today. We are confident that we will continue to push existing boundaries together with AWS to be able to offer attractive products to our customers.”

This testimonial highlights how the collaborative effort addressed the complex challenges and underscores the ongoing potential for innovation in future projects.

Additional thanks to Fabrizio Avantaggiato, Verena Koutsovagelis, and Jon Reed for their work on this prototype.


About the Authors

Rui Costa specializes in Software Engineering and currently holds the position of Principal Solutions Developer within the AWS Industries Prototyping and Customer Engineering (PACE) Team based out of Jersey City, New Jersey.

Mahendra Bairagi is a Generative AI specialist who currently holds the position of Principal Solutions Architect – Generative AI within the AWS Industries and Customer Engineering (PACE) team. Throughout his more than 9 years at AWS, Mahendra has held a variety of pivotal roles, including Principal AI/ML Specialist, IoT Specialist, Principal Product Manager, and head of the Sports Innovations Lab. In these capacities, he has consistently led innovative solutions, driving significant advancements for both customers and partners.
