Large language model inference over confidential data using AWS Nitro Enclaves

This post is co-written with Justin Miles, Liv d’Aliberti, and Joe Kovba from Leidos. 

Leidos is a Fortune 500 science and technology solutions leader working to address some of the world’s toughest challenges in the defense, intelligence, homeland security, civil, and healthcare markets. In this post, we discuss how Leidos worked with AWS to develop an approach to privacy-preserving large language model (LLM) inference using AWS Nitro Enclaves.

LLMs are designed to understand and generate human-like language, and are used in many industries, including government, healthcare, financial, and intellectual property. LLMs have broad applicability, including chatbots, content generation, language translation, sentiment analysis, question answering systems, search engines, and code generation. Introducing LLM-based inference into a system also has the potential to introduce privacy threats, including model exfiltration, data privacy violations, and unintended LLM-based service manipulation. Technical architectures need to be implemented in order to make sure that LLMs don’t expose sensitive information during inference.

This post discusses how Nitro Enclaves can help protect LLM model deployments, specifically those that use personally identifiable information (PII) or protected health information (PHI). This post is for educational purposes only and should not be used in production environments without additional controls.

Overview of LLMs and Nitro Enclaves

A potential use case is an LLM-based sensitive query chatbot designed to provide a question and answering service over data that contains PII and PHI. Most current LLM chatbot solutions explicitly inform users that they should not include PII or PHI when inputting questions due to security concerns. To mitigate these concerns and protect customer data, service owners rely primarily on user protections such as the following:

  • Redaction – The process of identifying and obscuring sensitive information like PII in documents, texts, or other forms of content. This can be applied to input data before it is sent to a model, or performed by an LLM that is trained to redact sensitive information from its responses automatically.
  • Multi-factor authentication – A security process that requires users to provide multiple authentication methods to verify their identity to gain access to the LLM.
  • Transport Layer Security (TLS) – A cryptographic protocol that provides secure communication that enhances data privacy in transit between users and the LLM service.

Although these practices enhance the security posture of the service, they are not sufficient to safeguard all sensitive user information, including sensitive information that can persist without the user’s knowledge.

In our example use case, an LLM service is designed to answer employee healthcare benefit questions or provide a personal retirement plan. Let’s analyze the following sample architecture and identify data privacy risk areas.

Figure 1 – Data Privacy Risk Areas Diagram

The potential risk areas are as follows:

  1. Privileged users have access to the instance that houses the server. Unintentional or unauthorized changes to the service could result in sensitive data being exposed in unintended ways.
  2. Users must trust the service will not expose or retain sensitive information in application logs.
  3. Changes to application packages can cause changes to the service, resulting in the exposure of sensitive data.
  4. Privileged users with access to the instance have unrestricted access to the LLM used by the service. Changes may cause incorrect or inaccurate information being returned to users.

Nitro Enclaves provides additional isolation to your Amazon Elastic Compute Cloud (Amazon EC2) instance, safeguarding data in use from unauthorized access, including admin-level users. In the preceding architecture, it’s possible for an unintentional change to result in sensitive data persisting in plaintext and accidentally being revealed to a user who may not need to access that data. With Nitro Enclaves, you create an isolated environment from your EC2 instance, permitting you to allocate CPU and memory resources to the enclave. This enclave is a highly restrictive virtual machine. By running code that handles sensitive data within the enclave, none of the parent instance’s processes can view enclave data.

Nitro Enclaves offers the following benefits:

  • Memory and CPU Isolation – It relies on the Nitro Hypervisor to isolate the CPU and memory of the enclave from users, applications, and libraries on the parent instance. This feature helps isolate the enclave and your software, and significantly reduces the surface area for unintended events.
  • Separate virtual machine – Enclaves are separated virtual machines attached to an EC2 instance to further protect and securely process highly sensitive data.
  • No interactive access – Enclaves provide only secure local socket connectivity with their parent instance. They have no persistent storage, interactive access, or external networking.
  • Cryptographic attestation – Nitro Enclaves offers cryptographic attestation, a process used to prove the identity of an enclave and verify that only authorized code is running in your enclave.
  • AWS integration – Nitro Enclaves is integrated with AWS Key Management Service (AWS KMS), allowing you to decrypt files that have been encrypted using AWS KMS inside the enclave. AWS Certificate Manager (ACM) for Nitro Enclaves allows you to use public and private SSL/TLS certificates with your web applications and servers running on EC2 instances with Nitro Enclaves.

You can use these features provided by Nitro Enclaves to help mitigate risks associated with PII and PHI data. We recommend including Nitro Enclaves in an LLM service when handling sensitive user data.

Solution overview

Let’s examine the architecture of the example service, now including Nitro Enclaves. By incorporating Nitro Enclaves, as shown in the following figure, the LLM becomes a more secure chatbot for handling PHI or PII data.

llm-using-aws-nitro-enclaves-diagram

Figure 2 – Solution Overview Diagram

User data, including PII, PHI, and questions, remains encrypted throughout the request-response process when the application is hosted within an enclave. The steps carried out during the inference are as follows:

  1. The chatbot app generates temporary AWS credentials and asks the user to input a question. The question, which may contain PII or PHI, is then encrypted via AWS KMS. The encrypted user input is combined with the temporary credentials to create the encrypted request.
  2. The encrypted data is sent to an HTTP server hosted by Flask as a POST request. Before accepting sensitive data, this endpoint should be configured for HTTPS.
  3. The client app receives the POST request and forwards it through a secure local channel (for example, vsock) to the server app running inside Nitro Enclaves.
  4. The Nitro Enclaves server app uses the temporary credentials to decrypt the request, queries the LLM, and generates the response. The model-specific settings are stored within the enclaves and are protected with cryptographic attestation.
  5. The server app uses the same temporary credentials to encrypt the response.
  6. The encrypted response is returned back to the chatbot app through the client app as a response from the POST request.
  7. The chatbot app decrypts the response using its KMS key and displays the plaintext to the user.

Prerequisites

Before we get started, you need the following prerequisites to deploy the solution:

Configure an EC2 instance

Complete the following steps to configure an EC2 instance:

  1. Launch an r5.8xlarge EC2 instance using the amzn2-ami-kernel-5.10-hvm-2.0.20230628.0-x86_64-gp2 AMI with Nitro Enclaves enabled.
  2. Install the Nitro Enclaves CLI to build and run Nitro Enclaves applications:
    • sudo amazon-linux-extras install aws-nitro-enclaves-cli -y
    • sudo yum install aws-nitro-enclaves-cli-devel -y
  3. Verify the installation of the Nitro Enclaves CLI:
    • nitro-cli --version
    • The version used in this post is 1.2.2
  4. Install Git and Docker to build Docker images and download the application from GitHub. Add your instance user to the Docker group (<USER> is your IAM instance user):
    • sudo yum install git -y
    • sudo usermod -aG ne <USER>
    • sudo usermod -aG docker <USER>
    • sudo systemctl start docker && sudo systemctl enable docker
  5. Start and enable the Nitro Enclaves allocator and vsock proxy services:
    • sudo systemctl start nitro-enclaves-allocator.service && sudo systemctl enable nitro-enclaves-allocator.service
    • sudo systemctl start nitro-enclaves-vsock-proxy.service && sudo systemctl enable nitro-enclaves-vsock-proxy.service

Nitro Enclaves uses a local socket connection called vsock to create a secure channel between the parent instance and the enclave.
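
To illustrate this channel, the following is a minimal sketch (not the repo’s actual client app; the CID and port are assumptions) of how a parent-side process can talk to the enclave using Python’s AF_VSOCK socket support:

import json
import socket

ENCLAVE_CID = 16     # assumption: matches the --enclave-cid passed to nitro-cli run-enclave
ENCLAVE_PORT = 5000  # assumption: the port the in-enclave server listens on

def send_to_enclave(payload: dict) -> dict:
    """Forward a JSON payload to the enclave over vsock and return its reply."""
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as sock:
        sock.connect((ENCLAVE_CID, ENCLAVE_PORT))
        sock.sendall(json.dumps(payload).encode())
        # Sketch only: a production client would loop until the full response arrives
        return json.loads(sock.recv(65536).decode())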

After all the services are started and enabled, restart the instance to verify that all of the user groups and services are running correctly:

sudo shutdown -r now

Configure the Nitro Enclaves allocator service

A Nitro enclave is an isolated environment that runs on a designated portion of the parent instance’s CPU and memory. With the Nitro Enclaves allocator service, you can indicate how many CPUs and how much memory will be taken from the parent instance to run the enclave.

Modify the enclave’s reserved resources using a text editor (for our solution, we allocate 8 CPUs and 70,000 MiB of memory to provide enough resources):

vi /etc/nitro_enclaves/allocator.yaml

Figure 3 – AWS Nitro Enclaves Allocator Service Configuration
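
The configuration file itself is short; after editing, it should look similar to the following (values match the 8 CPUs and 70,000 MiB used in this post):

---
# /etc/nitro_enclaves/allocator.yaml
# Memory (in MiB) reserved for enclaves
memory_mib: 70000
# Number of vCPUs reserved for enclaves
cpu_count: 8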

Clone the project

After you configure the EC2 instance, you can download the code to run the sensitive chatbot with an LLM inside of Nitro Enclaves.

You need to update the server.py file with the appropriate KMS key ID that you created in the beginning to encrypt the LLM response.

  1. Clone the GitHub project:
    • cd ~/ && git clone https://<THE_REPO.git>
  2. Navigate to the project folder to build the enclave_base Docker image that contains the Nitro Enclaves Software Development Kit (SDK) for cryptographic attestation documents from the Nitro Hypervisor (this step can take up to 15 minutes):
    • cd /nitro_llm/enclave_base
    • docker build ./ -t "enclave_base"

Save the LLM in the EC2 Instance

We are using the open-source Bloom 560m LLM for natural language processing to generate responses. This model is not fine-tuned for PII and PHI, but it demonstrates how an LLM can live inside of an enclave. The model also needs to be saved on the parent instance so that it can be copied into the enclave via the Dockerfile.

  1. Navigate to the project:
    • cd /nitro_llm
  2. Install the necessary requirements to save the model locally:
    • pip3 install -r requirements.txt
  3. Run the save_model.py app to save the model within the /nitro_llm/enclave/bloom directory:
    • python3 save_model.py
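
The save_model.py script ships with the repo; conceptually, it downloads the Bloom 560m weights and tokenizer from Hugging Face and writes them to the directory that the Dockerfile copies into the enclave image. A minimal sketch of that idea (not the repo’s exact code):

from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigscience/bloom-560m"
SAVE_DIR = "enclave/bloom"  # directory copied into the enclave image by the Dockerfile

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

tokenizer.save_pretrained(SAVE_DIR)
model.save_pretrained(SAVE_DIR)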

Build and run the Nitro Enclaves image

To run Nitro Enclaves, you need to create an enclave image file (EIF) from a Docker image of your application. The Dockerfile located in the enclave directory contains the files, code, and LLM that will run inside of the enclave.

Building and running the enclave will take multiple minutes to complete.

  1. Navigate to the root of the project:
    • cd /nitro_llm
  2. Build the enclave image file as enclave.eif:
    • nitro-cli build-enclave --docker-uri enclave:latest --output-file enclave.eif

Figure 4 – AWS Nitro Enclaves Build Result

When the enclave is built, a series of unique hashes and platform configuration registers (PCRs) will be created. The PCRs are cryptographic measurements used to prove the identity of the hardware and application. These PCRs will be required for cryptographic attestation and used during the KMS key policy update step.

  3. Run the enclave with the resources from the allocator.service (adding the --attach-console argument at the end will run the enclave in debug mode):
    • nitro-cli run-enclave --cpu-count 8 --memory 70000 --enclave-cid 16 --eif-path enclave.eif

You need to allocate enclave memory of at least four times the EIF file size; for example, a 16 GB EIF needs at least 64 GB (roughly 65,536 MiB) of enclave memory. This can be modified in the allocator service configuration from the previous steps.

  4. Verify the enclave is running with the following command:
    • nitro-cli describe-enclaves

Figure 5 – AWS Nitro Enclave Describe Command

Update the KMS key policy

Complete the following steps to update your KMS key policy:

  1. On the AWS KMS console, choose Customer managed keys in the navigation pane.
  2. Search for the key that you generated as a prerequisite.
  3. Choose Edit on the key policy.
  4. Update the key policy with the following information:
    • Your account ID
    • Your IAM user name
    • The updated Cloud9 environment instance role
    • Actions kms:Encrypt and kms:Decrypt
    • Enclave PCRs (for example, PCR0, PCR1, PCR2) to your key policy with a condition statement

See the following key policy code:

{
   "Version":"2012-10-17",
   "Id":"key-default-1",
   "Statement":[
      {
         "Sid":"Enable User permissions",
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam:::user/"
         },
         "Action":[
            "kms:CreateAlias",
            "kms:CreateKey",
            "kms:DeleteAlias",
            "kms:Describe*",
            "kms:GenerateRandom",
            "kms:Get*",
            "kms:List*",
            "kms:TagResource",
            "kms:UntagResource",
            "iam:ListGroups",
            "iam:ListRoles",
            "iam:ListUsers"
         ],
         "Resource":"*"
      },
      {
         "Sid":"Enable Enclave permissions",
         "Effect":"Allow",
         "Principal":{
            "AWS":"arn:aws:iam:::role/"
         },
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt"
         ],
         "Resource":"*",
         "Condition":{
            "StringEqualsIgnoreCase":{
               "kms:RecipientAttestation:PCR0":"",
               "kms:RecipientAttestation:PCR1":"",
               "kms:RecipientAttestation:PCR2":""
            }
         }
      }
   ]
}

Save the chatbot app

To mimic a sensitive query chatbot application that lives outside of the AWS account, you need to save the chatbot.py app and run it inside the Cloud9 environment. Your Cloud9 environment will use its instance role for temporary credentials to keep its permissions separate from those of the EC2 instance running the enclave. Complete the following steps:

  1. On the Cloud9 console, open the environment you created.
  2. Copy the following code into a new file named chatbot.py in the main directory.
  3. Install the required modules:
    • pip install boto3
    • pip install requests
  4. On the Amazon EC2 console, note the IP associated with your Nitro Enclaves instance.
  5. Update the url variable in chatbot.py to http://<ec2instanceIP>:5001.
"""
Modules for a basic chatbot like application and AWS communications
"""
import base64
import requests
import boto3
 
def get_identity_document():
    """
    Get identity document for current EC2 Host
    """
    identity_doc = requests.get(
        "http://169.254.169.254/latest/dynamic/instance-identity/document", timeout=30)
    return identity_doc
 
def get_region(identity):
    """
    Get region of current instance identity
    """
    region = identity.json()["region"]
    return region
 
def get_account(identity):
    """
    Get account of current instance identity
    """
    account = identity.json()["accountId"]
    return account
 
def set_identity():
    """
    Set region and account for KMS
    """
    identity = get_identity_document()
    region = get_region(identity)
    account = get_account(identity)
    return region, account
 
def prepare_server_request(ciphertext):
    """
    Get the AWS credential from EC2 instance metadata
    """
    instance_prof = requests.get(
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/", timeout=30)
    instance_profile_name = instance_prof.text
 
    instance_prof_json = requests.get(
        f"http://169.254.169.254/latest/meta-data/iam/security-credentials/{instance_profile_name}",
        timeout=30)
    response = instance_prof_json.json()
 
    credential = {
        'access_key_id': response['AccessKeyId'],
        'secret_access_key': response['SecretAccessKey'],
        'token': response['Token'],
        'region': REGION,
        'ciphertext': ciphertext
    }
    return credential
 
def get_user_input():
    """
    Start chatbot to collect user input
    """
    print("Chatbot: Hello! How can I assist you?")
    user_input = input('Your Question: ')
    return user_input.lower()
 
def encrypt_string(user_input, alias, kms):
    """
    Encrypt user input using AWS KMS
    """
    file_contents = user_input
    encrypted_file = kms.encrypt(KeyId=f'alias/{alias}', Plaintext=file_contents)
    encrypted_file_contents = encrypted_file[u'CiphertextBlob']
    encrypted_file_contents_base64 = base64.b64encode(encrypted_file_contents)
    return encrypted_file_contents_base64.decode()
 
def decrypt_data(encrypted_data, kms):
    """
    Decrypt the LLM response using AWS KMS
    """
    try:
        ciphertext_blob = base64.b64decode(encrypted_data)
        response = kms.decrypt(CiphertextBlob=ciphertext_blob)
        decrypted_data = response['Plaintext'].decode()
        return decrypted_data
    except Exception as e_decrypt:  # catch decryption or decoding errors
        print("Decryption failed:", e_decrypt)
        return None
 
REGION, ACCOUNT = set_identity()
  
def main():
    """
    Main function to encrypt/decrypt data and send/receive with parent instance
    """
    kms = boto3.client('kms', region_name=REGION)
    alias = "ncsnitro"
    user_input = get_user_input()
    encrypted_input = encrypt_string(user_input, alias, kms)
    server_request = prepare_server_request(encrypted_input)
    url = 'http://<EC2 Instance Private IP>:5001'
    x = requests.post(url, json = server_request)
    response_body = x.json()
    llm_response = decrypt_data(response_body["EncryptedData"], kms)
    print(llm_response)
 
if __name__ == '__main__':
    main()
  6. Run the chatbot application:
    • python3 chatbot.py

When it’s running, the terminal prompts for user input and follows the architecture diagram from earlier to generate a secure response.

Run the private question and answer chatbot

Now that Nitro Enclaves is up and running on the EC2 instance, you can more securely ask your chatbot PHI and PII questions. Let’s look at an example.

Within the Cloud9 environment, we ask our chatbot a question and provide our user name.

Figure 6 – Asking the Chat Bot a Question

AWS KMS encrypts the question, which looks like the following screenshot.

Figure 7 – Encrypted Question

The encrypted question is then sent to the enclave and posed to the secured LLM. The question and response from the LLM will look like the following screenshot (the result and encrypted response are visible inside the enclave only in debug mode).

Figure 8 – Response from LLM

The result is then encrypted using AWS KMS and returned to the Cloud9 environment to be decrypted.

Figure 9 – Final Decrypted Response

Clean up

Complete the following steps to clean up your resources:

  1. Stop the EC2 instance created to house your enclave.
  2. Delete the Cloud9 environment.
  3. Delete the KMS key.
  4. Remove the EC2 instance role and IAM user permissions.

Conclusion

In this post, we showcased how to use Nitro Enclaves to deploy an LLM question and answering service that more securely sends and receives PII and PHI. The service runs on Amazon EC2, and the enclave is integrated with AWS KMS so that access to the KMS key is restricted: only the Nitro Enclave and the end user can use the key to decrypt the question.

If you’re planning to scale this architecture to support larger workloads, make sure the model selection process matches your model requirements with EC2 resources. Additionally, you must consider the maximum request size and what impact that will have on the HTTP server and inference time against the model. Many of these parameters are customizable through the model and HTTP server settings.

The best way to determine the specific settings and requirements for your workload is through testing with a fine-tuned LLM. Although this post only included natural language processing of sensitive data, you can modify this architecture to support alternate LLMs supporting audio, computer vision, or multi-modalities. The same security principles highlighted here can be applied to data in any format. The resources used to build this post are available on the GitHub repo.

Share how you are going to adapt this solution for your environment in the comments section.


About the Authors

Justin Miles is a cloud engineer within the Leidos Digital Modernization Sector under the Office of Technology. In his spare time, he enjoys golfing and traveling.

Liv d’Aliberti is a researcher within the Leidos AI/ML Accelerator under the Office of Technology. Their research focuses on privacy-preserving machine learning.

Chris Renzo is a Sr. Solution Architect within the AWS Defense and Aerospace organization. Outside of work, he enjoys a balance of warm weather and traveling.

Joe Kovba is a Vice President within the Leidos Digital Modernization Sector. In his free time, he enjoys refereeing football games and playing softball.

Read More

How VistaPrint delivers personalized product recommendations with Amazon Personalize

VistaPrint, a Cimpress business, is the design and marketing partner to millions of small businesses around the world. For more than two decades, VistaPrint has empowered small businesses to quickly and effectively create the marketing products – from promotional materials and signage to print advertising and more – to get the job done, regardless of whether they operate in-store or online.

To support small businesses on their brand-building journey, VistaPrint provides customers with personalized product recommendations, both in real time on vistaprint.com and through marketing emails. These product recommendations improve their customers’ experience by making it more efficient to find the products they need, while increasing VistaPrint’s conversion rates. Since implementing Amazon Personalize, VistaPrint increased their conversion rate by 10 percent and reduced their total cost of ownership by 30 percent.

In this post, we show you how VistaPrint uses a combination of Amazon Personalize, Twilio Segment, and auxiliary AWS services and partner solutions to better understand their customers’ needs and provide personalized product recommendations.

Prior solution and challenges

Prior to their current solution, VistaPrint had an internally developed product recommendation system hosted on-premises. The first challenge with their prior solution was that the solution couldn’t scale automatically when demand increased. The second challenge was that changes to the in-house developed system were time-consuming, because a high degree of machine learning and ecommerce domain specialization was required to make modifications.

These challenges led to the decision to create a new cloud-native system that can scale with increased demand and consists of serverless and software as a service (SaaS) components that externalize much of the domain-specific functionality to allow for easier operations and faster time-to-market for changes.

The new VistaPrint personalized product recommendation system

Architecture diagram showing Vistaprint's personalized product recommendation system.

Figure 1

As seen in Figure 1, the steps in how VistaPrint provides personalized product recommendations with their new cloud-native architecture are:

  1. Aggregate historical data in a data warehouse. Data from upstream systems including customer data platforms (CDPs) like Twilio Segment, order management, product catalog, and user management systems are collected in a data warehouse, which in VistaPrint’s case is Snowflake.
  2. Transform the data to create Amazon Personalize training data. Amazon Personalize uses data about users, items, and interactions, and this data is ingested from Amazon Simple Storage Service (Amazon S3) in CSV format. In VistaPrint’s case, they use Databricks to perform the required data transformations before landing the data in Amazon S3.
  3. Import bulk historical data to train Amazon Personalize models. After bulk historical data is ingested into an Amazon Personalize dataset, one or more solutions are trained using this data. In VistaPrint’s case, they use the User-Personalization and Similar-Items model recipes.
    • With User-Personalization, Amazon Personalize predicts the items that a user will interact with based on previous interactions across all users.
    • With Similar-Items, Amazon Personalize generates recommendations for items that are similar to an item you specify.

    To maintain the relevance of the personalization models, steps 2 and 3 are repeated on a regular basis to keep the training data up to date.

  4. Stream ecommerce website events to a CDP. A CDP is used to capture events from an ecommerce website, for example when a user views a product or adds a product to their shopping cart. A CDP can also perform identity resolution, which helps to identify the user regardless of whether they’re accessing a platform from a mobile or a web client. VistaPrint uses Twilio Segment as their CDP.
  5. Generate real-time product recommendations as a customer navigates the ecommerce website. As a customer navigates an ecommerce website and these events are captured by a CDP, they are also forwarded to Amazon Personalize. Amazon Personalize in turn generates recommendations for additional products that a customer may be interested in. These recommendations are placed back into the ecommerce website experience in real-time.
    • AWS Lambda is used to send data from Segment to Amazon Personalize using Segment’s Amazon Lambda Destination. VistaPrint uses the Segment Amazon Lambda Destination to perform additional data transformations and to get flexibility to integrate with additional observability tooling not shown, but other AWS customers can consider Segment’s Amazon Personalize Destination which is suitable for simpler integrations.
    • VistaPrint created a personalization service that sits in front of Amazon Personalize. This service provides additional functionality on top of Amazon Personalize APIs, including the ability to cache recent recommendations in Amazon DynamoDB, and integration with VistaPrint’s authentication and authorization systems.
    • VistaPrint created a placement and offer engine (POE), which allows data scientists and marketers to collaborate. Placement templates are used to create customized placements by allowing a marketer to select an Amazon Personalize model, the visual style of the placement, and extra features like whether to display a customer’s logo as it would appear on the final manufactured product. Figure 2 shows an example of one of these placements, called More with your design, as seen on vistaprint.com.
  6. Generate product recommendations as part of email marketing campaigns. In addition to providing real-time product recommendations on their website, VistaPrint uses personalized product recommendations in email marketing campaigns. The same POE system is used to design and place product recommendations into email templates.
Screenshot showing personalized product recommendations within the shopping cart page of vistaprint.com. The personalized product recommendations also show a notional logo as it would appear on the customized manufactured products.

Figure 2
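
To illustrate the kind of request the personalization service makes behind the scenes (a simplified sketch, not VistaPrint’s actual service code; the campaign ARN and user ID are placeholders), a real-time recommendation call through the Amazon Personalize runtime looks like this:

import boto3

personalize_runtime = boto3.client("personalize-runtime")

response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:111122223333:campaign/example-campaign",  # placeholder
    userId="example-user-id",  # placeholder; resolved by the CDP identity graph
    numResults=5,
)

# Each item contains an itemId and, for some recipes, a relevance score
for item in response["itemList"]:
    print(item["itemId"], item.get("score"))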

Business Impact

Since implementing its new personalized product recommendation system, VistaPrint has realized a 10 percent increase in conversions originating from personalized recommendations. Amazon Personalize also reduced VistaPrint’s total cost of ownership by 30 percent compared to the previous on-premises solution.

Conclusion

VistaPrint’s cloud-native personalized product recommendation system helps the company deliver a more efficient and helpful experience to their customers, while increasing the company’s conversion rates.

Amazon Personalize is at the center of VistaPrint’s personalized product recommendation system, providing a fully managed, machine learning powered solution.

A customer data platform like Twilio Segment allows companies like VistaPrint to build a connected, 360 degree view of their customers by aggregating data from all of their customer touchpoints across multiple business domains. This cohesive view of the customer leads to more accurate and personalized product recommendations when paired with Amazon Personalize.

Next Steps

The VistaPrint personalized product recommendation system is one product within a larger data mesh of products. Read more about Vista’s data mesh strategy in the previous post, How Vista built a data mesh enabled by solutions available in AWS Marketplace.

Also read more on the other topics in this post:


About the Authors

Ethan Fahy is an Enterprise Senior Solutions Architect at AWS based in Boston, MA. Ethan has a background in geophysics and enjoys building large-scale, cloud-native architectures to support scientific workloads.

Mouloud Lounaci leads the Engineering team for Marketing Optimization at Vista. He is a Machine Learning enthusiast with around 10 years of experience in building AI-powered software products to solve complex customer problems. Whenever he gets a chance, Mouloud jumps on a plane to discover cultures, food, and landscapes from around the world.

Emeline Escolivet is the Engineering Manager for the Recommendations team at Vista. With 10+ years of experience as a Software Engineer, she enjoys turning complex business issues into reliable software solutions. In her free time, she likes to describe herself as a hiker, dancer and food lover.

Vibhusheet Tripathi is a Senior Data Engineer in the Recommendations Team at Vista. When not experimenting with machine learning systems, Vibhu likes to read, play sports and listen to music.

Read More

Automate the process to change image backgrounds using Amazon Bedrock and AWS Step Functions

Many customers, including those in creative advertising, media and entertainment, ecommerce, and fashion, often need to change the background in a large number of images. Typically, this involves manually editing each image with photo software. This can take a lot of effort, especially for large batches of images. However, Amazon Bedrock and AWS Step Functions make it straightforward to automate this process at scale.

Amazon Bedrock offers the generative AI foundation model Amazon Titan Image Generator G1, which can automatically change the background of an image using a technique called outpainting. Step Functions allows you to create an automated workflow that seamlessly connects with Amazon Bedrock and other AWS services. Together, Amazon Bedrock and Step Functions streamline the entire process of automatically changing backgrounds across multiple images.

This post introduces a solution that simplifies the process of changing backgrounds in multiple images. By harnessing the capabilities of generative AI with Amazon Bedrock and the Titan Image Generator G1 model, combined with Step Functions, this solution efficiently generates images with the desired background. This post provides insight into the inner workings of the solution and helps you understand the design choices made, so you can adapt them to build your own custom solution.

See the GitHub repository for detailed instructions on deploying this solution.

Solution overview

Let’s look at how the solution works at a high level before diving deeper into specific elements and the AWS services used. The following diagram provides a simplified view of the solution architecture and highlights the key elements.

Solution Architecture

The workflow consists of the following steps:

  1. A user uploads multiple images into an Amazon Simple Storage Service (Amazon S3) bucket via a Streamlit web application.
  2. The Streamlit web application calls an Amazon API Gateway REST API endpoint integrated with the Amazon Rekognition DetectLabels API, which detects labels for each image.
  3. Upon submission, the Streamlit web application updates an Amazon DynamoDB table with image details.
  4. The DynamoDB update triggers an AWS Lambda function, which starts a Step Functions workflow.
  5. The Step Functions workflow runs the following steps for each image:
    5.1 Constructs a request payload for the Amazon Bedrock InvokeModel API.
    5.2 Invokes the Amazon Bedrock InvokeModel API action.
    5.3 Parses an image from the response and saves it to an S3 location.
    5.4 Updates the image status in a DynamoDB table.
  6. The Step Functions workflow invokes a Lambda function to generate a status report.
  7. The workflow sends an email using Amazon Simple Notification Service (Amazon SNS).

As shown in the following screenshot, the Streamlit web application allows you to upload images and enter text prompts to specify desired backgrounds, negative prompts, and outpainting mode for image generation. You can also view and remove unwanted labels associated with each uploaded image that you don’t want to keep in the final generated images.

Streamlit Web Application

In this example, the prompt for the background is “London city background.” The automation process generates new images based on the original uploaded images with London as the background.

Generated Images

Streamlit web application and images uploads

A Streamlit web application serves as the frontend for this solution. To protect the application from unauthorized access, it integrates with an Amazon Cognito user pool. API Gateway uses an Amazon Cognito authorizer to authenticate requests. The web application completes the following steps:

  1. For each selected image, it retrieves labels via Amazon Rekognition using an API Gateway REST API endpoint.
  2. Upon submission, the application uploads images to an S3 bucket.
  3. The application updates a DynamoDB table with relevant parameters, image names, and associated labels for each image using another API Gateway REST API endpoint.

Image processing workflow

When the DynamoDB table is updated, DynamoDB Streams triggers a Lambda function to start a new Step Functions workflow. The following is a sample request for the workflow:

{
  "Id": "621fa85a-38bb-4d98-a656-93bbbcf5477f",
  "S3Bucket": "<Image Bucket>",
  "InputS3Prefix": "image-files/<year>/<month>/<day>/<timestamp>",
  "OutputS3Prefix": "generated-image-files/<year>/<month>/<day>/<timestamp>",
  "StatusS3Prefix": "status-report-files/<year>/<month>/<day>/<timestamp>",
  "Prompt": "london city background",
  "NegativePrompt": "low quality, low resolution",
  "Mode": "PRECISE",
  "Images": [
    {
      "ImageName": "bus.png",
      "Labels": "Bus, Person"
    },
    {
      "ImageName": "cop.png",
      "Labels": "Person, Adult, Male, Man, Helmet, Jacket"
    },
    {
      "ImageName": "iguana-2.png",
      "Labels": "Lizard”
    },
    {
      "ImageName": "dog.png",
      "Labels": "Dog"
    }
  ]
}
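
The Lambda function that DynamoDB Streams invokes assembles an input document like the one above and starts the execution. The following is a minimal sketch of such a trigger (assumed code, not the solution’s exact implementation; the state machine ARN is read from an assumed environment variable):

import json
import os

import boto3
from boto3.dynamodb.types import TypeDeserializer

sfn = boto3.client("stepfunctions")
deserializer = TypeDeserializer()
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]  # assumed environment variable

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        # Convert DynamoDB-typed attributes into a plain dict for the workflow input
        image = record["dynamodb"]["NewImage"]
        workflow_input = {key: deserializer.deserialize(value) for key, value in image.items()}
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(workflow_input, default=str),
        )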

The Step Functions workflow subsequently performs the following three steps:

  1. Replace the background for all images.
  2. Generate a status report.
  3. Send an email via Amazon SNS.

The following screenshot illustrates the Step Functions workflow.

AWS Step Functions Workflow

Let’s look at each step in more detail.

Replace background for all images

Step Functions uses a Distributed Map to process each image in parallel child workflows. The Distributed Map allows high-concurrency processing. Each child workflow has its own separate run history from that of the parent workflow.

Step Functions uses an InvokeModel optimized API action for Amazon Bedrock. The API accepts requests and responses that are up to 25 MB. However, Step Functions has a 256 KB limit on state payload input and output. To support larger images, the solution uses an S3 bucket where the InvokeModel API reads data from and writes the result to. The following is the configuration for the InvokeModel API for Amazon Bedrock integration:

{
    "ModelId": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1",
    "ContentType": "application/json",
    "Input": {
        "S3Uri": "s3://<Image Bucket>/image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    },
    "Output": {
        "S3Uri": "s3://<Image Bucket>/generated-image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    }
}

The Input S3Uri parameter specifies the source location to retrieve the input data. The Output S3Uri parameter specifies the destination to write the API response.

A Lambda function saves the request payload as a JSON file in the specified Input S3Uri location. The InvokeModel API uses this input payload to generate images with the specified background:

{
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "text": "london city background",
        "negativeText": "low quality, low resolution",        
        "image": "<base64-encoded string>",                         
        "maskPrompt": "Bus",                      
        "maskImage": "base64-encoded string",                             
        "outPaintingMode": "DEFAULT | PRECISE"                 
    },                                                 
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "quality": "premium",
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0
    }
}

The Titan Image Generator G1 model supports the following parameters for image generation:

  • taskType – Specifies the outpainting task used to replace the background of the image.
  • text – A text prompt to define the background.
  • negativeText – A text prompt to define what not to include in the image.
  • maskPrompt – A text prompt that defines the mask. It corresponds to labels that you want to retain in the final generated images.
  • maskImage – The JPEG or PNG image encoded in base64.
  • outPaintingMode – Specifies whether to allow modification of the pixels inside the mask or not. DEFAULT allows modification of the image inside the mask in order to keep it consistent with the reconstructed background. PRECISE prevents modification of the image inside the mask.
  • numberOfImages – The number of images to generate.
  • quality – The quality of the generated images: standard or premium.
  • cfgScale – Specifies how strongly the generated image should adhere to the prompt.
  • height – The height of the image in pixels.
  • width – The width of the image in pixels.

The Amazon Bedrock InvokeModel API generates a response with an encoded image in the Output S3Uri location. Another Lambda function parses the image from the response, decodes it from base64, and saves the image file in the following location: s3://<Image Bucket>/generated-image-file/<year>/<month>/<day>/<timestamp>/.
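
A minimal sketch of that parsing step (assumed helper and key names; the solution’s actual Lambda function may differ):

import base64
import json

import boto3

s3 = boto3.client("s3")

def save_generated_image(bucket: str, response_key: str, output_key: str) -> None:
    """Read the InvokeModel response from S3, decode the image, and store it as PNG."""
    body = s3.get_object(Bucket=bucket, Key=response_key)["Body"].read()
    response = json.loads(body)
    # The Titan Image Generator response carries base64-encoded images in an "images" list
    image_bytes = base64.b64decode(response["images"][0])
    s3.put_object(Bucket=bucket, Key=output_key, Body=image_bytes, ContentType="image/png")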

Finally, a child workflow updates a DynamoDB table with image generation status, marking it as either Succeeded or Failed, and including details such as ImageName, Cause, Error, and Status.

Generate a status report

After the image generation process, a Lambda function retrieves the status details from DynamoDB. It dynamically compiles these details into a comprehensive status report in JSON format and saves it as a JSON file in the following location: s3://<Image Bucket>/status-report-files/<year>/<month>/<day>/<timestamp>/. The ITOps team can integrate this report with their existing notification system to track whether image processing completed successfully. For business users, you can extend this further to generate the report in CSV format.

Send an email via Amazon SNS

Step Functions invokes an Amazon SNS API action to send an email. The email contains details including the S3 location for the status report and final images files. The following is the sample notification email.

Notification Email

Conclusion

In this post, we provided an overview of a sample solution demonstrating the automation of changing image backgrounds at scale using Amazon Bedrock and Step Functions. We also explained each element of the solution in detail. By using the Step Functions optimized integration with Amazon Bedrock, Distributed Map, and the Titan Image Generator G1 model, the solution efficiently replaces the backgrounds of images in parallel, enhancing productivity and scalability.

To deploy the solution, refer to the instructions in the GitHub repository.

Resources

To learn more about Amazon Bedrock, see the following resources:

To learn more about the Titan Image Generator G1 model, see the following resources:

To learn more about using Amazon Bedrock with Step Functions, see the following resources:


About the Author

Chetan Makvana is a Senior Solutions Architect with Amazon Web Services. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture and implementing strategies to drive adoption of AWS services. He is a technology enthusiast and a builder with a core area of interest on generative AI, serverless, and DevOps. Outside of work, he enjoys watching shows, traveling, and music. 

Read More

Efficiently fine-tune the ESM-2 protein language model with Amazon SageMaker

In this post, we demonstrate how to efficiently fine-tune a state-of-the-art protein language model (pLM) to predict protein subcellular localization using Amazon SageMaker.

Proteins are the molecular machines of the body, responsible for everything from moving your muscles to responding to infections. Despite this variety, all proteins are made of repeating chains of molecules called amino acids. The human genome encodes 20 standard amino acids, each with a slightly different chemical structure. These can be represented by letters of the alphabet, which then allows us to analyze and explore proteins as a text string. The enormous possible number of protein sequences and structures is what gives proteins their wide variety of uses.

The structure of an amino acid chain

Proteins also play a key role in drug development, as potential targets but also as therapeutics. As shown in the following table, many of the top-selling drugs in 2022 were either proteins (especially antibodies) or other molecules like mRNA translated into proteins in the body. Because of this, many life science researchers need to answer questions about proteins faster, cheaper, and more accurately.

Name Manufacturer 2022 Global Sales ($ billions USD) Indications
Comirnaty Pfizer/BioNTech $40.8 COVID-19
Spikevax Moderna $21.8 COVID-19
Humira AbbVie $21.6 Arthritis, Crohn’s disease, and others
Keytruda Merck $21.0 Various cancers

Data source: Urquhart, L. Top companies and drugs by sales in 2022. Nature Reviews Drug Discovery 22, 260–260 (2023).

Because we can represent proteins as sequences of characters, we can analyze them using techniques originally developed for written language. This includes large language models (LLMs) pretrained on huge datasets, which can then be adapted for specific tasks, like text summarization or chatbots. Similarly, pLMs are pre-trained on large protein sequence databases using unlabeled, self-supervised learning. We can adapt them to predict things like the 3D structure of a protein or how it may interact with other molecules. Researchers have even used pLMs to design novel proteins from scratch. These tools don’t replace human scientific expertise, but they have the potential to speed up pre-clinical development and trial design.

One challenge with these models is their size. Both LLMs and pLMs have grown by orders of magnitude in the past few years, as illustrated in the following figure. This means that it can take a long time to train them to sufficient accuracy. It also means that you need to use hardware, especially GPUs, with large amounts of memory to store the model parameters.

Protein language models, like other large language models, have steadily increased in size for several years

Long training times, plus large instances, equals high cost, which can put this work out of reach for many researchers. For example, in 2023, a research team described training a 100 billion-parameter pLM on 768 A100 GPUs for 164 days! Fortunately, in many cases we can save time and resources by adapting an existing pLM to our specific task. This technique is called fine-tuning, and also allows us to borrow advanced tools from other types of language modeling.

Solution overview

The specific problem we address in this post is subcellular localization: Given a protein sequence, can we build a model that can predict if it lives on the outside (cell membrane) or inside of a cell? This is an important piece of information that can help us understand the function and whether it would make a good drug target.

We start by downloading a public dataset using Amazon SageMaker Studio. Then we use SageMaker to fine-tune the ESM-2 protein language model using an efficient training method. Finally, we deploy the model as a real-time inference endpoint and use it to test some known proteins. The following diagram illustrates this workflow.

AWS architecture for fine tuning ESM

In the following sections, we go through the steps to prepare your training data, create a training script, and run a SageMaker training job. All of the code featured in this post is available on GitHub.

Prepare the training data

We use part of the DeepLoc-2 dataset, which contains several thousand SwissProt proteins with experimentally determined locations. We filter for high-quality sequences between 100–512 amino acids:

import pandas as pd

df = pd.read_csv(
    "https://services.healthtech.dtu.dk/services/DeepLoc-2.0/data/Swissprot_Train_Validation_dataset.csv"
).drop(["Unnamed: 0", "Partition"], axis=1)
df["Membrane"] = df["Membrane"].astype("int32")

# filter for sequences between 100 and 512 amino acids
df = df[df["Sequence"].apply(lambda x: len(x)).between(100, 512)]

# Remove unnecessary features
df = df[["Sequence", "Kingdom", "Membrane"]]

Next, we tokenize the sequences and split them into training and evaluation sets:

import os

from datasets import Dataset
from transformers import AutoTokenizer

dataset = Dataset.from_pandas(df).train_test_split(test_size=0.2, shuffle=True)
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")

def preprocess_data(examples, max_length=512):
    text = examples["Sequence"]
    encoding = tokenizer(text, truncation=True, max_length=max_length)
    encoding["labels"] = examples["Membrane"]
    return encoding

encoded_dataset = dataset.map(
    preprocess_data,
    batched=True,
    num_proc=os.cpu_count(),
    remove_columns=dataset["train"].column_names,
)

encoded_dataset.set_format("torch")

Finally, we upload the processed training and evaluation data to Amazon Simple Storage Service (Amazon S3):

train_s3_uri = S3_PATH + "/data/train"
test_s3_uri = S3_PATH + "/data/test"

encoded_dataset["train"].save_to_disk(train_s3_uri)
encoded_dataset["test"].save_to_disk(test_s3_uri)

Create a training script

SageMaker script mode allows you to run your custom training code in optimized machine learning (ML) framework containers managed by AWS. For this example, we adapt an existing script for text classification from Hugging Face. This allows us to try several methods for improving the efficiency of our training job.

Method 1: Weighted training class

Like many biological datasets, the DeepLoc data is unevenly distributed, meaning there isn’t an equal number of membrane and non-membrane proteins. We could resample our data and discard records from the majority class. However, this would reduce the total training data and potentially hurt our accuracy. Instead, we calculate the class weights during the training job and use them to adjust the loss.

In our training script, we subclass the Trainer class from transformers with a WeightedTrainer class that takes class weights into account when calculating cross-entropy loss. This helps prevent bias in our model:

class WeightedTrainer(Trainer):
    def __init__(self, class_weights, *args, **kwargs):
        self.class_weights = class_weights
        super().__init__(*args, **kwargs)

    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        loss_fct = torch.nn.CrossEntropyLoss(
            weight=torch.tensor(self.class_weights, device=model.device)
        )
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
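
The class weights themselves can be derived from the label distribution before constructing the WeightedTrainer. One common approach (a sketch under that assumption, not necessarily the script’s exact method) is inverse-frequency weighting:

import numpy as np

labels = np.array(encoded_dataset["train"]["labels"])
class_counts = np.bincount(labels)

# "Balanced" weighting: rarer classes receive proportionally larger weights
class_weights = (len(labels) / (len(class_counts) * class_counts)).tolist()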

Method 2: Gradient accumulation

Gradient accumulation is a training technique that allows models to simulate training on larger batch sizes. Typically, the batch size (the number of samples used to calculate the gradient in one training step) is limited by the GPU memory capacity. With gradient accumulation, the model calculates gradients on smaller batches first. Then, instead of updating the model weights right away, the gradients get accumulated over multiple small batches. When the accumulated gradients equal the target larger batch size, the optimization step is performed to update the model. This lets models train with effectively bigger batches without exceeding the GPU memory limit.

However, extra computation is needed for the smaller batch forward and backward passes. Increased batch sizes via gradient accumulation can slow down training, especially if too many accumulation steps are used. The aim is to maximize GPU usage but avoid excessive slowdowns from too many extra gradient computation steps.
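
As a concrete illustration (using the same hyperparameter values configured later in this post):

# Effective batch size when using gradient accumulation (per device)
per_device_train_batch_size = 8
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 32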

Method 3: Gradient checkpointing

Gradient checkpointing is a technique that reduces the memory needed during training while keeping the computational time reasonable. Large neural networks take up a lot of memory because they have to store all the intermediate values from the forward pass in order to calculate the gradients during the backward pass. This can cause memory issues. One solution is to not store these intermediate values, but then they have to be recalculated during the backward pass, which takes a lot of time.

Gradient checkpointing provides a balanced approach. It saves only some of the intermediate values, called checkpoints, and recalculates the others as needed. Therefore, it uses less memory than storing everything, but also less computation than recalculating everything. By strategically selecting which activations to checkpoint, gradient checkpointing enables large neural networks to be trained with manageable memory usage and computation time. This important technique makes it feasible to train very large models that would otherwise run into memory limitations.

In our training script, we turn on gradient accumulation and gradient checkpointing by adding the necessary parameters to the TrainingArguments object:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",  # required by TrainingArguments; the full script sets this appropriately
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)

Method 4: Low-Rank Adaptation of LLMs

Large language models like ESM-2 can contain billions of parameters that are expensive to train and run. Researchers developed a training method called Low-Rank Adaptation (LoRA) to make fine-tuning these huge models more efficient.

The key idea behind LoRA is that when fine-tuning a model for a specific task, you don’t need to update all the original parameters. Instead, LoRA adds new smaller matrices to the model that transform the inputs and outputs. Only these smaller matrices are updated during fine-tuning, which is much faster and uses less memory. The original model parameters stay frozen.

After fine-tuning with LoRA, you can merge the small adapted matrices back into the original model. Or you can keep them separate if you want to quickly fine-tune the model for other tasks without forgetting previous ones. Overall, LoRA allows LLMs to be efficiently adapted to new tasks at a fraction of the usual cost.

In our training script, we configure LoRA using the PEFT library from Hugging Face:

from peft import get_peft_model, LoraConfig, TaskType
import torch
from transformers import EsmForSequenceClassification

model = EsmForSequenceClassification.from_pretrained(
    "facebook/esm2_t33_650M_UR50D",
    torch_dtype=torch.bfloat16,
    num_labels=2,
)

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    bias="none",
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "query",
        "key",
        "value",
        "EsmSelfOutput.dense",
        "EsmIntermediate.dense",
        "EsmOutput.dense",
        "EsmContactPredictionHead.regression",
        "EsmClassificationHead.dense",
        "EsmClassificationHead.out_proj",
    ]
)

model = get_peft_model(model, peft_config)
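
As an optional check (not part of the original snippet), PEFT can report how small the trainable parameter count becomes after wrapping the model:

# Prints the number and percentage of trainable parameters (typically well under 1% of the total)
model.print_trainable_parameters()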

Submit a SageMaker training job

After you have defined your training script, you can configure and submit a SageMaker training job. First, specify the hyperparameters:

hyperparameters = {
    "model_id": "facebook/esm2_t33_650M_UR50D",
    "epochs": 1,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,
    "use_gradient_checkpointing": True,
    "lora": True,
}

Next, define what metrics to capture from the training logs:

metric_definitions = [
    {"Name": "epoch", "Regex": "'epoch': ([0-9.]*)"},
    {
        "Name": "max_gpu_mem",
        "Regex": "Max GPU memory use during training: ([0-9.e-]*) MB",
    },
    {"Name": "train_loss", "Regex": "'loss': ([0-9.e-]*)"},
    {
        "Name": "train_samples_per_second",
        "Regex": "'train_samples_per_second': ([0-9.e-]*)",
    },
    {"Name": "eval_loss", "Regex": "'eval_loss': ([0-9.e-]*)"},
    {"Name": "eval_accuracy", "Regex": "'eval_accuracy': ([0-9.e-]*)"},
]

Finally, define a Hugging Face estimator and submit it for training on an ml.g5.2xlarge instance type. This is a cost-effective instance type that is widely available in many AWS Regions:

from sagemaker.experiments.run import Run
from sagemaker.huggingface import HuggingFace
from sagemaker.inputs import TrainingInput

hf_estimator = HuggingFace(
    base_job_name="esm-2-membrane-ft",
    entry_point="lora-train.py",
    source_dir="scripts",
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    output_path=f"{S3_PATH}/output",
    role=sagemaker_execution_role,
    hyperparameters=hyperparameters,
    metric_definitions=metric_definitions,
    checkpoint_local_path="/opt/ml/checkpoints",
    sagemaker_session=sagemaker_session,
    keep_alive_period_in_seconds=3600,
    tags=[{"Key": "project", "Value": "esm-fine-tuning"}],
)

with Run(
    experiment_name=EXPERIMENT_NAME,
    sagemaker_session=sagemaker_session,
) as run:
    hf_estimator.fit(
        {
            "train": TrainingInput(s3_data=train_s3_uri),
            "test": TrainingInput(s3_data=test_s3_uri),
        }
    )

The following table compares the different training methods we discussed and their effect on the runtime, accuracy, and GPU memory requirements of our job.

Configuration Billable Time (min) Evaluation Accuracy Max GPU Memory Usage (GB)
Base Model 28 0.91 22.6
Base + GA 21 0.90 17.8
Base + GC 29 0.91 10.2
Base + LoRA 23 0.90 18.6

All of the methods produced models with high evaluation accuracy. Using LoRA and gradient accumulation decreased the runtime (and cost) by 18% and 25%, respectively. Using gradient checkpointing decreased the maximum GPU memory usage by 55%. Depending on your constraints (cost, time, hardware), one of these approaches may make more sense than another.

Each of these methods performs well by itself, but what happens when we use them in combination? The following table summarizes the results.

Configuration Billable Time (min) Evaluation Accuracy Max GPU Memory Usage (GB)
All methods 12 0.80 3.3

In this case, we see a 12% reduction in accuracy. However, we’ve reduced the runtime by 57% and GPU memory use by 85%! This is a massive decrease that allows us to train on a wide range of cost-effective instance types.

Clean up

If you’re following along in your own AWS account, delete any real-time inference endpoints and data you created to avoid further charges.

predictor.delete_endpoint()

bucket = boto_session.resource("s3").Bucket(S3_BUCKET)
bucket.objects.filter(Prefix=S3_PREFIX).delete()

Conclusion

In this post, we demonstrated how to efficiently fine-tune protein language models like ESM-2 for a scientifically relevant task. For more information about using the Transformers and PEFT libraries to train pLMs, check out the posts Deep Learning With Proteins and ESMBind (ESMB): Low Rank Adaptation of ESM-2 for Protein Binding Site Prediction on the Hugging Face blog. You can also find more examples of using machine learning to predict protein properties in the Awesome Protein Analysis on AWS GitHub repository.


About the Author

Brian Loyal is a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences team at Amazon Web Services. He has more than 17 years’ experience in biotechnology and machine learning, and is passionate about helping customers solve genomic and proteomic challenges. In his spare time, he enjoys cooking and eating with his friends and family.

Read More

Alida gains deeper understanding of customer feedback with Amazon Bedrock


This post is co-written with Sherwin Chu from Alida.

Alida helps the world’s biggest brands create highly engaged research communities to gather feedback that fuels better customer experiences and product innovation.

Alida’s customers receive tens of thousands of engaged responses for a single survey, so the Alida team opted to leverage machine learning (ML) to serve their customers at scale. However, when employing traditional natural language processing (NLP) models, they found that these solutions struggled to fully understand the nuanced feedback found in open-ended survey responses. The models often captured only surface-level topics and sentiment, and missed crucial context that would allow for more accurate and meaningful insights.

In this post, we learn how Anthropic’s Claude Instant model on Amazon Bedrock enabled the Alida team to quickly build a scalable service that more accurately determines the topic and sentiment within complex survey responses. The new service achieved a 4-6 times improvement in topic assertion by tightly clustering on several dozen key topics vs. hundreds of noisy NLP keywords.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies, such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Using Amazon Bedrock allowed Alida to bring their service to market faster than if they had used other machine learning (ML) providers or vendors.

The challenge

Surveys with a combination of multiple-choice and open-ended questions allow market researchers to get a more holistic view by capturing both quantitative and qualitative data points.

Multiple-choice questions are easy to analyze at scale, but lack nuance and depth. Set response options may also lead to biasing or priming participant responses.

Open-ended survey questions allow responders to provide context and unanticipated feedback. These qualitative data points deepen researchers’ understanding beyond what multiple-choice questions can capture alone. The challenge with the free-form text is that it can lead to complex and nuanced answers that are difficult for traditional NLP to fully understand. For example:

“I recently experienced some of life’s hardships and was really down and disappointed. When I went in, the staff were always very kind to me. It’s helped me get through some tough times!”

Traditional NLP methods will identify topics such as “hardships,” “disappointed,” “kind staff,” and “get through tough times,” but they can’t distinguish between the responder’s overall current negative life experiences and the specific positive store experiences.

Alida’s existing solution automatically processes large volumes of open-ended responses, but they wanted their customers to gain better contextual comprehension and high-level topic inference.

Amazon Bedrock

Prior to the introduction of LLMs, the way forward for Alida to improve upon their existing single-model solution was to work closely with industry experts and develop, train, and refine new models specifically for each of the industry verticals that Alida’s customers operated in. This was both a time- and cost-intensive endeavor.

One of the breakthroughs that make LLMs so powerful is the use of attention mechanisms. LLMs use self-attention mechanisms that analyze the relationships between words in a given prompt. This allows LLMs to better handle the topic and sentiment in the earlier example and presents an exciting new technology that can be used to address the challenge.

With Amazon Bedrock, teams and individuals can immediately start using foundation models without having to worry about provisioning infrastructure or setting up and configuring ML frameworks. You can get started with the following steps:

  1. Verify that your user or role has permission to create or modify Amazon Bedrock resources. For details, see Identity-based policy examples for Amazon Bedrock.
  2. Log in to the Amazon Bedrock console.
  3. On the Model access page, review the EULA and enable the FMs you’d like in your account.
  4. Start interacting with the FMs through the Amazon Bedrock console playgrounds or programmatically via the API.

Alida’s executive leadership team was eager to be an early adopter of Amazon Bedrock because they recognized its ability to help their teams bring new generative AI-powered solutions to market faster.

Vincy William, the Senior Director of Engineering at Alida who leads the team responsible for building the topic and sentiment analysis service, says,

“LLMs provide a big leap in qualitative analysis and do things (at a scale that is) humanly not possible to do. Amazon Bedrock is a game changer, it allows us to leverage LLMs without the complexity.”

The engineering team experienced the immediate ease of getting started with Amazon Bedrock. They could select from various foundation models and start focusing on prompt engineering instead of spending time on right-sizing, provisioning, deploying, and configuring resources to run the models.

Solution overview

Sherwin Chu, Alida’s Chief Architect, shared Alida’s microservices architecture approach. Alida built the topic and sentiment classification as a service with survey response analysis as its first application. With this approach, common LLM implementation challenges such as the complexity of managing prompts, token limits, request constraints, and retries are abstracted away, and the solution allows for consuming applications to have a simple and stable API to work with. This abstraction layer approach also enables the service owners to continually improve internal implementation details and minimize API-breaking changes. Finally, the service approach allows for a single point to implement any data governance and security policies that evolve as AI governance matures in the organization.

The following diagram illustrates the solution architecture and flow.

Alida microservice architecture

Alida evaluated LLMs from various providers, and found Anthropic’s Claude Instant to be the right balance between cost and performance. Working closely with the prompt engineering team, Chu advocated implementing a prompt chaining strategy as opposed to a single monolithic prompt approach.

Prompt chaining enables you to do the following:

  • Break down your objective into smaller, logical steps
  • Build a prompt for each step
  • Provide the prompts sequentially to the LLM

This creates additional points of inspection, which has the following benefits:

  • It’s straightforward to systematically evaluate changes you make to the input prompt
  • You can implement more detailed tracking and monitoring of the accuracy and performance at each step

Key considerations with this strategy include the increase in the number of requests made to the LLM and the resulting increase in the overall time it takes to complete the objective. For Alida’s use case, they chose to batch a collection of open-ended responses in a single prompt to the LLM to offset these effects.
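
As an illustration only (the helper function, prompts, topic list, and model parameters here are placeholders rather than Alida's production implementation), a chained, batched flow with Claude Instant on Amazon Bedrock might look like the following sketch using Boto3:

import json
import boto3

# Assumes model access for Claude Instant is enabled and an AWS Region is configured
bedrock_runtime = boto3.client("bedrock-runtime")

def invoke_claude(prompt, max_tokens=500):
    # Thin helper around the InvokeModel API for Anthropic Claude Instant
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": 0,
    })
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1", body=body
    )
    return json.loads(response["body"].read())["completion"]

# Batch several open-ended responses into one prompt to offset the extra round trips
survey_responses = [
    "I almost exclusively order my drinks through the app bc of convenience...",
    "The app works pretty good the only complaint I have is that I can't add...",
]
batch = "\n".join(f"{i}. {text}" for i, text in enumerate(survey_responses, start=1))

# Step 1 of the chain: assign each response to one of a fixed set of topics
topics = invoke_claude(
    "Assign each numbered survey response below to exactly one topic from this list: "
    f"Mobile Ordering Convenience, Mobile Order Fulfillment Speed, Rewards.\n{batch}"
)

# Step 2 of the chain: feed step 1's output back in to label sentiment per response
sentiment = invoke_claude(
    "For each response and its assigned topic below, label the sentiment as "
    f"positive, negative, or neutral.\n{topics}"
)
print(sentiment)

Because each step is a separate call, you can log and evaluate the intermediate output of step 1 before it feeds step 2, which is the inspection benefit described earlier.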

NLP vs. LLM

Alida’s existing NLP solution relies on clustering algorithms and statistical classification to analyze open-ended survey responses. When applied to sample feedback for a coffee shop’s mobile app, it extracted topics based on word patterns but lacked true comprehension. The following table includes some examples comparing NLP responses vs. LLM responses.

Survey Response Existing Traditional NLP (Topics) Amazon Bedrock with Claude Instant (Topic, Sentiment)
I almost exclusively order my drinks through the app bc of convenience and it’s less embarrassing to order super customized drinks lol. And I love earning rewards! [‘app bc convenience’, ‘drink’, ‘reward’] Mobile Ordering Convenience positive
The app works pretty good the only complaint I have is that I can’t add Any number of money that I want to my gift card. Why does it specifically have to be $10 to refill?! [‘complaint’, ‘app’, ‘gift card’, ‘number money’] Mobile Order Fulfillment Speed negative

The example results show that the existing solution was able to extract relevant keywords, but wasn’t able to achieve a more generalized topic group assignment.

In contrast, using Amazon Bedrock and Anthropic Claude Instant, the LLM with in-context training is able to assign the responses to pre-defined topics and assign sentiment.

In addition to delivering better answers for Alida’s customers, for this particular use case, pursuing a solution using an LLM over traditional NLP methods saved a vast amount of time and effort in training and maintaining a suitable model. The following table compares training a traditional NLP model vs. in-context training of an LLM.

. Data Requirement Training Process Model Adaptability
Training a traditional NLP model Thousands of human-labeled examples Combination of automated and manual feature engineering; iterative train-and-evaluate cycles Slower turnaround due to the need to retrain the model
In-context training of an LLM Several examples Trained on the fly within the prompt; limited by context window size Faster iterations by modifying the prompt; limited retention due to context window size

Conclusion

Alida’s use of Anthropic’s Claude Instant model on Amazon Bedrock demonstrates the powerful capabilities of LLMs for analyzing open-ended survey responses. Alida was able to build a superior service that was 4-6 times more precise at topic analysis when compared to their NLP-powered service. Additionally, using in-context prompt engineering for LLMs significantly reduced development time, because they didn’t need to curate thousands of human-labeled data points to train a traditional NLP model. This ultimately allows Alida to give their customers richer insights sooner!

If you’re ready to start building your own foundation model innovation with Amazon Bedrock, check out Set up Amazon Bedrock. If you’re interested in reading about other intriguing Amazon Bedrock applications, see the Amazon Bedrock-specific section of the AWS Machine Learning Blog.


About the authors

Kinman Lam is an ISV/DNB Solution Architect for AWS. He has 17 years of experience in building and growing technology companies in the smartphone, geolocation, IoT, and open source software space. At AWS, he uses his experience to help companies build robust infrastructure to meet the increasing demands of growing businesses, launch new products and services, enter new markets, and delight their customers.

Sherwin Chu is the Chief Architect at Alida, helping product teams with architectural direction, technology choice, and complex problem-solving. He is an experienced software engineer, architect, and leader with over 20 years in the SaaS space for various industries. He has built and managed numerous B2B and B2C systems on AWS and GCP.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML and generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, AWS’ flagship generative AI offering for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Read More

Unlocking Innovation: AWS and Anthropic push the boundaries of generative AI together


Amazon Bedrock is the best place to build and scale generative AI applications with large language models (LLMs) and other foundation models (FMs). It enables customers to leverage a variety of high-performing FMs, such as the Claude family of models by Anthropic, to build custom generative AI applications. Looking back to 2021, when Anthropic first started building on AWS, no one could have envisioned how transformative the Claude family of models would be. We have been making state-of-the-art generative AI models accessible and usable for businesses of all sizes through Amazon Bedrock. In just a few short months since Amazon Bedrock became generally available on September 28, 2023, more than 10K customers have been using it to deliver generative AI applications, and many of them are using Claude. Customers such as ADP, Broadridge, Cloudera, Dana-Farber Cancer Institute, Genesys, Genomics England, GoDaddy, Intuit, M1 Finance, Perplexity AI, Proto Hologram, Rocket Companies and more are using Anthropic’s Claude models on Amazon Bedrock to drive innovation in generative AI and to build transformative customer experiences. And today, we are announcing an exciting milestone with the next generation of Claude coming to Amazon Bedrock: Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku.

Introducing Anthropic’s Claude 3 models

Anthropic is unveiling its next generation of Claude with three advanced models optimized for different use cases. Haiku is the fastest and most cost-effective model on the market. It is a fast, compact model for near-instant responsiveness. For the vast majority of workloads, Sonnet is 2x faster than Claude 2 and Claude 2.1 with higher levels of intelligence. It excels at intelligent tasks demanding rapid responses, like knowledge retrieval or sales automation. And it strikes the ideal balance between intelligence and speed – qualities especially critical for enterprise use cases. Opus is the most advanced, capable, state-of-the-art FM with deep reasoning, advanced math, and coding abilities, with top-level performance on highly complex tasks. It can navigate open-ended prompts and novel scenarios with remarkable fluency, including task automation, hypothesis generation, and analysis of charts, graphs, and forecasts. Of the three, Sonnet is the first available on Amazon Bedrock, starting today. Current evaluations from Anthropic suggest that the Claude 3 model family outperforms comparable models in math word problem solving (MATH) and multilingual math (MGSM) benchmarks, critical benchmarks used today for LLMs.

  1. Vision capabilities – Claude 3 models have been trained to understand structured and unstructured data across different formats, not just language, but also images, charts, diagrams, and more. This lets businesses build generative AI applications integrating diverse multimedia sources and solving truly cross-domain problems. For instance, pharmaceutical companies can query drug research papers alongside protein structure diagrams to accelerate discovery. Media organizations can generate image captions or video scripts automatically.
  2. Best-in-class benchmarks – Claude 3 exceeds existing models on standardized evaluations such as math problems, programming exercises, and scientific reasoning. Customers can optimize domain specific experimental procedures in manufacturing, or audit financial reports based on contextual data, in an automated way and with high accuracy using AI-driven responses.

    Specifically, Opus outperforms its peers on most of the common evaluation benchmarks for AI systems, including undergraduate level expert knowledge (MMLU), graduate level expert reasoning (GPQA), basic mathematics (GSM8K), and more. It exhibits high levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.

  3. Reduced hallucination – Businesses require predictive, controllable outputs from AI systems directing automated processes or customer interactions. Claude 3 models mitigate hallucination through constitutional AI techniques that provide transparency into the model’s reasoning, as well as improve accuracy. Claude 3 Opus shows an estimated 2x gain in accuracy over Claude 2.1 on difficult open-ended questions, reducing the likelihood of faulty responses. As enterprise customers rely on Claude across industries like healthcare, finance, and legal research, reducing hallucinations is essential for safety and performance. The Claude 3 family sets a new standard for reliable generative AI output.

Benefits of Anthropic Claude 3 FMs on Amazon Bedrock

Through Amazon Bedrock, customers will get easy access to build with Anthropic’s newest models. This includes not only natural language models but also their expanded range of multimodal AI models capable of advanced reasoning across text, images, charts, and more. Our collaboration has already helped customers accelerate generative AI adoption and delivered business value to them. Here are a few ways customers have been using Anthropic’s Claude models on Amazon Bedrock:

“We are developing a generative AI solution on AWS to help customers plan epic trips and create life-changing experiences with personalized travel itineraries. By building with Claude on Amazon Bedrock, we reduced itinerary generation costs by nearly 80 percent when we quickly created a scalable, secure AI platform that can organize our book content in minutes to deliver cohesive, highly accurate travel recommendations. Now we can repackage and personalize our content in various ways on our digital platforms, based on customer preference, all while highlighting trusted local voices–just like Lonely Planet has done for 50 years.”

— Chris Whyde, Senior VP of Engineering and Data Science, Lonely Planet

“We are working with AWS and Anthropic to host our custom, fine-tuned Anthropic Claude model on Amazon Bedrock to support our strategy of rapidly delivering generative AI solutions at scale and with cutting-edge encryption, data privacy, and safe AI technology embedded in everything we do. Our new Lexis+ AI platform technology features conversational search, insightful summarization, and intelligent legal drafting capabilities, which enable lawyers to increase their efficiency, effectiveness, and productivity.”

— Jeff Reihl, Executive VP and CTO, LexisNexis Legal & Professional

“At Broadridge, we have been working to automate the understanding of regulatory reporting requirements to create greater transparency and increase efficiency for our customers operating in domestic and global financial markets. With use of Claude on Amazon Bedrock, we’re thrilled to get even higher accuracy in our experiments with processing and summarizing capabilities. With Amazon Bedrock, we have choice in our use of LLMs, and we value the performance and integration capabilities it offers.”

— Saumin Patel, VP Engineering generative AI, Broadridge

The Claude 3 model family caters to various needs, allowing customers to choose the model best suited for their specific use case, which is key to developing a successful prototype and later production systems that can deliver real impact—whether for a new product, feature or process that boosts the bottom line. Keeping customer needs top of mind, Anthropic and AWS are delivering where it matters most to organizations of all sizes:

  1. Improved performance – Claude 3 models are significantly faster for real-time interactions thanks to optimizations across hardware and software.
  2. Increased accuracy and reliability – Through massive scaling as well as new self-supervision techniques, expected gains of 2x in accuracy for complex questions over long contexts mean AI that’s even more helpful, safe, and honest.
  3. Simpler and secure customization – Customization capabilities, like retrieval-augmented generation (RAG), simplify training models on proprietary data and building applications backed by diverse data sources, so customers get AI tuned for their unique needs. In addition, proprietary data is never exposed to the public internet, never leaves the AWS network, is securely transferred through VPC, and is encrypted in transit and at rest.

And AWS and Anthropic are continuously reaffirming our commitment to advancing generative AI in a responsible manner. By constantly improving model capabilities and committing to frameworks like Constitutional AI or the White House voluntary commitments on AI, we can accelerate the safe, ethical development and deployment of this transformative technology.

The future of generative AI

Looking ahead, customers will build entirely new categories of generative AI-powered applications and experiences with the latest generation of models. We’ve only begun to tap generative AI’s potential to automate complex processes, augment human expertise, and reshape digital experiences. We expect to see unprecedented levels of innovation as customers choose Anthropic’s models augmented with multimodal skills leveraging all the tools they need to build and scale generative AI applications on Amazon Bedrock. Imagine sophisticated conversational assistants providing fast and highly contextual responses. Picture personalized recommendation engines that seamlessly blend in relevant images, diagrams, and associated knowledge to intuitively guide decisions. Envision scientific research turbocharged by generative AI able to read experiments, synthesize hypotheses, and even propose novel areas for exploration. There are so many possibilities that will be realized by taking full advantage of all generative AI has to offer through Amazon Bedrock. Our collaboration ensures enterprises and innovators worldwide will have the tools to reach the next frontier of generative AI-powered innovation responsibly, and for the benefit of all.

Conclusion

It’s still early days for generative AI, but strong collaboration and a focus on innovation are ushering in a new era of generative AI on AWS. We can’t wait to see what customers build next.

Resources

Check out the following resources to learn more about this announcement:


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Read More

Knowledge Bases for Amazon Bedrock now supports hybrid search


At AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for fully managed Retrieval Augmented Generation (RAG).

In a previous post, we described how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you and shared details about some of the recent feature launches.

For RAG-based applications, the accuracy of the generated response from large language models (LLMs) is dependent on the context provided to the model. Context is retrieved from the vector database based on the user query. Semantic search is widely used because it is able to understand more human-like questions—a user’s query is not always directly related to the exact keywords in the content that answers it. Semantic search helps provide answers based on the meaning of the text. However, it has limitations in capturing all the relevant keywords. Its performance relies on the quality of the word embeddings used to represent the meaning of the text. To overcome such limitations, combining semantic search with keyword search (hybrid) will give better results.

In this post, we discuss the new feature of hybrid search, which you can select as a query option alongside semantic search.

Hybrid search overview

Hybrid search takes advantage of the strengths of multiple search algorithms, integrating their unique capabilities to enhance the relevance of returned search results. For RAG-based applications, semantic search capabilities are commonly combined with traditional keyword-based search to improve the relevance of search results. It enables searching over both the content of documents and their underlying meaning. For example, consider the following query:

What is the cost of the book "<book_name>" on <website_name>?

In this query for a book name and website name, a keyword search will give better results, because we want the cost of the specific book. However, the term “cost” might have synonyms such as “price,” so it will be better to use semantic search, which understands the meaning of the text. Hybrid search brings the best of both approaches: precision of semantic search and coverage of keywords. It works great for RAG-based applications where the retriever has to handle a wide variety of natural language queries. The keywords help cover specific entities in the query such as product name, color, and price, while semantics better understands the meaning and intent within the query. For example, if you want to build a chatbot for an ecommerce website to handle customer queries such as the return policy or details of the product, using hybrid search will be most suitable.
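
To build intuition for how two ranked result sets can be merged, the following sketch shows reciprocal rank fusion, one common fusion technique; it is purely illustrative and is not a description of how Knowledge Bases for Amazon Bedrock implements hybrid search internally:

def reciprocal_rank_fusion(keyword_ranked_ids, semantic_ranked_ids, k=60):
    # Each argument is a list of document IDs ordered from most to least relevant.
    # Documents that rank well in either list (or both) accumulate a higher score.
    scores = {}
    for ranked_ids in (keyword_ranked_ids, semantic_ranked_ids):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk found by both retrievers ("doc1") rises above chunks found by only one
print(reciprocal_rank_fusion(["doc3", "doc1"], ["doc1", "doc2"]))  # ['doc1', 'doc3', 'doc2']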

Use cases for hybrid search

The following are some common use cases for hybrid search:

  • Open domain question answering – This involves answering questions on a wide variety of topics. This requires searching over large collections of documents with diverse content, such as website data, which can include various topics such as sustainability, leadership, financial results, and more. Semantic search alone can’t generalize well for this task, because it lacks the capacity for lexical matching of unseen entities, which is important for handling out-of-domain examples. Therefore, combining keyword-based search with semantic search can help narrow down the scope and provide better results for open domain question answering.
  • Contextual-based chatbots – Conversations can rapidly change direction and cover unpredictable topics. Hybrid search can better handle such open-ended dialogs.
  • Personalized search – Web-scale search over heterogeneous content benefits from a hybrid approach. Semantic search handles popular head queries, while keywords cover rare long-tail queries.

Although hybrid search offers wider coverage by combining two approaches, semantic search has precision advantages when the domain is narrow and semantics are well-defined, or when there is little room for misinterpretation, like factoid question answering systems.

Benefits of hybrid search

Both keyword and semantic search will return a separate set of results along with their relevancy scores, which are then combined to return the most relevant results. Knowledge Bases for Amazon Bedrock currently supports four vector stores: Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition, Pinecone, and Redis Enterprise Cloud. As of this writing, the hybrid search feature is available for OpenSearch Serverless, with support for other vector stores coming soon.

The following are some of the benefits of using hybrid search:

  • Improved accuracy – The accuracy of the generated response from the FM is directly dependent on the relevancy of retrieved results. Based on your data, it can be challenging to improve the accuracy of your application only using semantic search. The key benefit of using hybrid search is to get improved quality of retrieved results, which in turn helps the FM generate more accurate answers.
  • Expanded search capabilities – Keyword search casts a wider net and finds documents that may be relevant but might not contain semantic structure throughout the document. It allows you to search on keywords as well as the semantic meaning of the text, thereby expanding the search capabilities.

In the following sections, we demonstrate how to use hybrid search with Knowledge Bases for Amazon Bedrock.

Use hybrid search and semantic search options via SDK

When you call the Retrieve API, Knowledge Bases for Amazon Bedrock selects the right search strategy to give you the most relevant results. You have the option to override it and use either hybrid or semantic search in the API.

Retrieve API

The Retrieve API is designed to fetch relevant search results by providing the user query, knowledge base ID, and number of results that you want the API to return. This API converts user queries into embeddings, searches the knowledge base using either hybrid search or semantic (vector) search, and returns the relevant results, giving you more control to build custom workflows on top of the search results. For example, you can add postprocessing logic to the retrieved results or add your own prompt and connect with any FM provided by Amazon Bedrock for generating answers.
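
For example, a minimal sketch of such a custom workflow could take the retrievalResults returned by the Retrieve API, concatenate the text chunks into a context block, and send them with your own prompt to a model such as Claude Instant (the prompt wording and parameters here are illustrative, not a recommended template):

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def answer_from_retrieval(question, retrieval_results):
    # retrieval_results is the "retrievalResults" list returned by the Retrieve API
    context = "\n\n".join(result["content"]["text"] for result in retrieval_results)
    prompt = (
        "\n\nHuman: Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\n\nAssistant:"
    )
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0})
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-instant-v1", body=body
    )
    return json.loads(response["body"].read())["completion"]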

To show you an example of switching between hybrid and semantic (vector) search options, we have created a knowledge base using the Amazon 10K document for 2023. For more details on creating a knowledge base, refer to Build a contextual chatbot application using Knowledge Bases for Amazon Bedrock.

To demonstrate the value of hybrid search, we use the following query:

As of December 31st 2023, what is the leased square footage for physical stores in North America?

The answer for the preceding query involves a few keywords, such as the date, physical stores, and North America. The correct response is 22,871 thousand square feet. Let’s observe the difference in the search results for both hybrid and semantic search.

The following code shows how to use hybrid or semantic (vector) search using the Retrieve API with Boto3:

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

def retrieve(query, kbId, numberOfResults=5):
    return bedrock_agent_runtime.retrieve(
        retrievalQuery= {
            'text': query
        },
        knowledgeBaseId=kbId,
        retrievalConfiguration= {
            'vectorSearchConfiguration': {
                'numberOfResults': numberOfResults,
                'overrideSearchType': "HYBRID", # optional; accepts "HYBRID" or "SEMANTIC"
            }
        }
    )
response = retrieve("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["retrievalResults"]

The overrideSearchType option in retrievalConfiguration offers the choice to use either HYBRID or SEMANTIC. By default, the service selects the right strategy for you to give you the most relevant results; if you want to override that default, set the value to HYBRID or SEMANTIC. The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the relevancy scores of the retrievals. The scores help determine which chunks best match the query.

The following are the results for the preceding query using hybrid search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "... Description of Use Leased Square Footage (1).... Physical stores (2) 22,871  ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions): December 31, 2021 2022 2023 North America $ 83,640 $ 90,076 $ 93,632 International 21,718 23,347 24,357 AWS 43,245 60,324 72,701 Corporate 1.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "..amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023. 54 Table of Contents Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well as server and networking equipment, aircraft, and vehicles. Gross assets acquired under finance leases, ..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  }
]

The following are the results for semantic search (with some of the output redacted for brevity):

[
  {
    "content": {
      "text": "Property and equipment, net by segment is as follows (in millions):    December 31,    2021 2022 2023   North America $ 83,640 $ 90,076 $ 93,632  International 21,718 23,347 24,357  AWS 43,245 60,324 72,701.."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.6389407
  },
  {
    "content": {
      "text": "Depreciation and amortization expense on property and equipment was $22.9 billion, $24.9 billion, and $30.2 billion which includes amortization of property and equipment acquired under finance leases of $9.9 billion, $6.1 billion, and $5.9 billion for 2021, 2022, and 2023.   54        Table of Contents   Note 4 — LEASES We have entered into non-cancellable operating and finance leases for fulfillment network, data center, office, and physical store facilities as well a..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61908984
  },
  {
    "content": {
      "text": "Incentives that we receive from property and equipment   vendors are recorded as a reduction to our costs. Property includes buildings and land that we own, along with property we have acquired under build-to-suit lease arrangements when we have control over the building during the construction period and finance lease arrangements..."
    },
    "location": {
      "type": "S3",
      "s3Location": {
        "uri": "s3://<bucket_name>/amazon-10k-2023.pdf"
      }
    },
    "score": 0.61353767
  }
]

As you can see in the results, hybrid search was able to retrieve the search result with the leased square footage for physical stores in North America mentioned in the user query. The main reason was that hybrid search could match keywords such as the date, physical stores, and North America in the query, whereas semantic search did not. Therefore, when the search results are augmented with the user query and the prompt, the FM can’t provide the correct response in the case of semantic search.

Now let’s look at the RetrieveAndGenerate API with hybrid search to understand the final response generated by the FM.

RetrieveAndGenerate API

The RetrieveAndGenerate API queries a knowledge base and generates a response based on the retrieved results. You specify the knowledge base ID as well as the FM to generate a response from the results. Amazon Bedrock converts the queries into embeddings, queries the knowledge base based on the search type, and then augments the FM prompt with the search results as context information and returns the FM-generated response.

Let’s use the query “As of December 31st 2023, what is the leased square footage for physical stores in North America?” and ask the RetrieveAndGenerate API to generate the response using our query:

def retrieveAndGenerate(input, kbId):
    return bedrock_agent_runtime.retrieve_and_generate(
        input={
            'text': input
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kbId,
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-instant-v1',
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'overrideSearchType': 'HYBRID', # optional; accepts 'HYBRID' or 'SEMANTIC'
                    }
                }
            }
        }
    )
response = retrieveAndGenerate("As of December 31st 2023, what is the leased square footage for physical stores in North America?", "<knowledge base id>")["output"]["text"]

The following are the results using hybrid search:

22,871 thousand leased square feet

The following are the results using semantic search:

The search results do not contain any information about the leased square footage for physical stores in North America for 2023.

The actual answer for the query is 22,871 thousand leased square feet, which hybrid search generated. The retrieved search results for hybrid search included the information about the leased square footage for physical stores in North America, whereas semantic search wasn’t able to fetch the right information from the vector store based on embeddings alone. Therefore, the FM couldn’t provide the correct response because it didn’t have the correct and most relevant search results.

However, for more generic questions that don’t involve entities such as physical stores or North America, both hybrid and semantic search give similar results.

The following are sample responses from a few queries demonstrating cases when both hybrid and semantic search yield similar results.

Question Semantic Search: RAG API Hybrid Search: RAG API
How does Amazon serve the developers and enterprises? We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services We serve developers and enterprises of all sizes, including start-ups, government agencies, and academic institutions, through AWS, which offers a broad set of on-demand technology services, including compute, storage, database, analytics, and machine learning, and other services.
Who are the Executive Officers and Directors for Amazon as of January 24, 2024? The executive officers of Amazon as of 2024 include Andrew R. Jassy as President and Chief Executive Officer, Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, Adam N. Selipsky as CEO Amazon Web Services, and David A. Zapolsky as Senior Vice President, Global Public Policy and General Counsel. As of 2024, Jeffrey P. Bezos serves as Executive Chair of Amazon.com. Andrew R. Jassy serves as President and Chief Executive Officer. Other executive officers include Douglas J. Herrington as CEO Worldwide Amazon Stores, Brian T. Olsavsky as Senior Vice President and Chief Financial Officer, Shelley L. Reynolds as Vice President, Worldwide Controller, and Principal Accounting Officer, and Adam N. Selipsky as CEO Amazon Web Services. David A. Zapolsky serves as Senior Vice President, Global Public Policy and General Counsel

Use hybrid search and semantic search options via the Amazon Bedrock console

To use hybrid and semantic search options on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
  2. Choose the knowledge base you created.
  3. Choose Test knowledge base.
  4. Choose the configurations icon.
  5. For Search type, select Hybrid search (semantic & text).

By default, you can choose an FM to get a generated response for your query. If you want to see only the retrieved results, you can toggle Generate response off to get only retrieved results.

Conclusion

In this post, we covered the new query feature in Knowledge Bases for Amazon Bedrock, which enables hybrid search. We learned how to configure the hybrid search option in the SDK and the Amazon Bedrock console. This helps overcome some of the limitations of relying solely on semantic search, especially for searching over large collections of documents with diverse content. The use of hybrid search depends on the document type and the use case that you are trying to implement.

For additional resources, refer to the following:

References

Improving Retrieval Performance in RAG Pipelines with Hybrid Search


About the Authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and give prescriptive guidance to achieve their objective with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling and hiking.

Read More

Expedite your Genesys Cloud Amazon Lex bot design with the Amazon Lex automated chatbot designer


The rise of artificial intelligence (AI) has created opportunities to improve the customer experience in the contact center space. Machine learning (ML) technologies continually improve and power the contact center customer experience by providing solutions for capabilities like self-service bots, live call analytics, and post-call analytics. Self-service bots integrated with your call center can help you achieve decreased wait times, intelligent routing, decreased time to resolution through self-service functions or data collection, and improved net promoter scores (NPS). Some examples include a customer calling to check on the status of an order and receiving an update from a bot, or a customer needing to submit a renewal for a license and the chatbot collecting the necessary information, which it hands over to an agent for processing.

With Amazon Lex bots, you can use conversational AI capabilities to enable these capabilities within your call center. Amazon Lex uses automatic speech recognition (ASR) and natural language understanding (NLU) to understand the customer’s needs and assist them on their journey.

Genesys Cloud (an omni-channel orchestration and customer relationship platform) provides a contact center platform in a public cloud model that enables quick and simple integration of AWS Contact Center Intelligence (AWS CCI) to transform the modern contact center from a cost center into a profit center. As part of AWS CCI, Genesys Cloud integrates with Amazon Lex, which enables self-service, intelligent routing, and data collection capabilities.

When exploring AWS CCI capabilities with Amazon Lex and Genesys Cloud, you may be unsure of where to start on your bot design journey. To assist those who may be starting with a blank canvas, Amazon Lex provides the Amazon Lex automated chatbot designer. The automated chatbot designer uses ML to provide an initial bot design, based on your existing call transcripts, that you can then refine to launch conversational experiences faster. With the automated chatbot designer, Amazon Lex customers and partners have a straightforward and intuitive way of designing chatbots and can reduce bot design time from weeks to hours. However, the automated chatbot designer requires transcripts to be in a certain format that is not aligned to Genesys Cloud transcript exports.

In this post, we show how you can implement an architecture using Amazon EventBridge, Amazon Simple Storage Service (Amazon S3), and AWS Lambda to automatically collect, transform, and load your Genesys call transcripts in the required format for the Amazon Lex automated chatbot designer. You can then run the automated chatbot designer on your transcripts, be given recommendations for bot design, and streamline your bot design journey.

Solution overview

The following diagram illustrates the solution architecture.

The solution workflow consists of the following steps:

  1. Genesys Cloud sends iterative transcripts events to your EventBridge event bus.
  2. Lambda receives the iterative transcripts from EventBridge, determines when a conversation is complete, invokes the Transcript API within Genesys Cloud, and drops the full transcript in an S3 bucket.
  3. When a new full transcript is uploaded to Amazon S3, Lambda converts the Genesys Cloud formatted transcript into the required format for the Amazon Lex automated chatbot designer and copies it to an S3 bucket (a skeleton of this function follows this list).
  4. The Amazon Lex automated chatbot designer uses ML to build an initial bot design based on the provided Genesys Cloud transcripts.
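
For reference, a skeleton of the transformation function in step 3 could look like the following. The environment variable name is hypothetical and the mapping itself is left as a stub, because the exact target schema is described in Prepare transcripts; the AWS SAM application you deploy later in this post contains the working implementation.

import json
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
TARGET_BUCKET = os.environ["TRANSFORMED_BUCKET"]  # hypothetical environment variable name

def handler(event, context):
    # Assumes an S3 event notification fires when a full Genesys transcript lands
    # in the raw transcripts bucket
    record = event["Records"][0]["s3"]
    source_bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    genesys_transcript = json.loads(
        s3.get_object(Bucket=source_bucket, Key=key)["Body"].read()
    )

    # Map Genesys speaker turns into the transcript format that the Amazon Lex
    # automated chatbot designer expects (see Prepare transcripts for the schema)
    lex_transcript = transform_to_lex_format(genesys_transcript)

    s3.put_object(
        Bucket=TARGET_BUCKET,
        Key=key,
        Body=json.dumps(lex_transcript).encode("utf-8"),
    )

def transform_to_lex_format(transcript):
    # Stub only; the deployed application's Lambda code performs the real mapping
    raise NotImplementedError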

Prerequisites

Before you deploy the solution, you must complete the following prerequisites:

  1. Set up your Genesys Cloud CX account and make sure that you are able to log in. For more information on setting up your account, refer to the Genesys documentation.
  2. Make sure that the right permissions are set for enabling and publishing transcripts from Genesys. For more information on setting up the required permissions, refer to Roles and permissions overview.
  3. If PCI and PII encryption is required for transcription, make sure it is set up in Genesys. For more information, refer to Are interaction transcripts encrypted when stored in the cloud.
  4. Set up an AWS account with the appropriate permissions.

Deploy the Genesys EventBridge integration

To enable the EventBridge integration with Genesys Cloud, complete the following steps:

  1. Log in to the Genesys Cloud environment.
  2. Choose Admin, Integrations, Add Integrations, and Amazon EventBridge Source.
  3. On the Configuration tab, provide the following information:
    1. For AWS Account ID, enter your AWS account ID.
    2. For AWS Account Region, enter the Region where you want EventBridge to be set up.
    3. For Event Source Suffix, enter a suffix (for example, genesys-eb-poc-demo).
  4. Save your configuration.
  5. On the EventBridge console, choose Integration in the navigation pane, then choose Partner event sources.

There should be an event source listed with a name like aws.partner/genesys.com/…/genesys-eb-poc-demo.

  1. Select the partner event source and choose Associate with event bus.

The status changes from Pending to Active. This sets up the EventBridge configuration for Genesys.

Next, you set up OAuth2 credentials in Genesys Cloud for authorizing the API call to get the final transcript.

  1. Navigate to the Genesys Cloud instance.
  2. Choose Admin, Integrations, and OAuth.
  3. Choose Add Client.
  4. On the Client Details tab, provide the following information:
    1. For App Name, enter a name (for example, TranscriptInvoke-creds).
    2. For Grant Types, select Client Credentials.

Make sure you’re using the right role that has access to invoke the Transcript API.

  1. Choose Save.

This generates new values for Client ID and Client Secret. Copy these values to use in the next section, where you configure the template for the solution.

Deploy the solution

After you have set up the Genesys EventBridge integration, you can deploy an AWS Serverless Application Model (AWS SAM) template, which deploys the remainder of the architecture. To deploy the solution in your account, complete the following steps:

  1. Install AWS SAM if not installed already. For instructions, refer to Installing the AWS SAM CLI.
  2. Download the GitHub repo and unzip to your directory.
  3. Navigate to the genesys-to-lex-automated-chatbot-designer folder and run the following commands:
    sam build --use-container
    sam deploy --guided

The first command builds the source of your application. The second command packages and deploys your application to AWS, with a series of prompts:

  • Stack Name – Enter the name of the stack to deploy to AWS CloudFormation. This should be unique to your account and Region; a good starting point is something matching your project name.
  • AWS Region – Enter the Region you want to deploy your app to. Make sure it is deployed in the same Region as the EventBridge event bus.
  • Parameter GenesysBusname – Enter the bus name created when you configured the Genesys integration. The pattern of the bus name should look like aws.partner/genesys.com/*.
  • Parameter ClientId – Enter the client ID you copied earlier.
  • Parameter ClientSecret – Enter the client secret you copied earlier.
  • Parameter FileNamePrefix – Change the default file name prefix for the target transcript file in the raw S3 bucket or keep the default.
  • Parameter GenCloudEnv – Enter the cloud environment for the specific Genesys organization. Genesys is available in more than 15 Regions worldwide as of this writing, so this value is mandatory and should point to the environment where your organization is created in Genesys (for example, usw2.pure.cloud).
  • Confirm changes before deploy – If set to yes, any change sets will be shown to you before deployment for manual review. If set to no, the AWS SAM CLI will automatically deploy application changes.
  • Allow SAM CLI IAM role creation – Many AWS SAM templates, including this example, create AWS Identity and Access Management (IAM) roles required for the Lambda functions included to access AWS services. By default, these are scoped down to the minimum required permissions. To deploy a CloudFormation stack that creates or modifies IAM roles, you must provide the CAPABILITY_IAM value for capabilities. If permission isn’t provided through this prompt, to deploy this example, you must explicitly pass --capabilities CAPABILITY_IAM to the sam deploy command.
  • Save arguments to samconfig.toml – If set to yes, your choices will be saved to a configuration file inside the project, so that in the future you can rerun sam deploy without parameters to deploy changes to your application.

After you deploy your AWS SAM application in your account, you can test that Genesys transcripts are being sent to your account and being transformed into the required format for the Amazon Lex automated chatbot designer.

Make a test call to validate the solution

After you have set up the Genesys EventBridge integration and deployed the preceding AWS SAM template, you can make test calls and validate that files are ending up in the S3 bucket for transformed files. At a high level, you need to perform the following steps:

  1. Make a test call to your Genesys instance to create a transcript.
  2. Wait a few minutes and check the TransformedTranscript bucket for the output.

Run the automated chatbot designer

After you have a few days’ worth of transcripts saved in Amazon S3, you can run the automated chatbot designer through the Amazon Lex console using the steps in this section. For more information about the minimum and maximum number of turns for the service, refer to Prepare transcripts.

  1. On the Amazon Lex V2 console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with transcripts as the creation method.
  4. Give the bot a name (for this example, InsuranceBot) and provide an optional description.
  5. Select Create a role with basic Amazon Lex permissions and use this as your runtime role.
  6. After you fill out the other fields, choose Next to proceed to the language configuration.
  7. Choose the language and voice for your interaction.
  8. Specify the Amazon S3 location of the transcripts that the solution has converted for you.
  9. Add additional local paths if you have a specific folder structure within your S3 bucket.
  10. Apply a filter (date range) for your input transcripts.
  11. Choose Done.

You can use the status bar on the Amazon Lex console to track the analysis. Within a few hours, the automated chatbot designer surfaces a chatbot design that includes user intents, sample phrases associated with those intents, and a list of all the information required to fulfill them. The amount of time it takes to complete training depends on several factors, including the volume of transcripts and the complexity of the conversations. Typically, 600 lines of transcript are analyzed every minute.

  1. Choose Review to view the intents and slot types discovered by the automated chatbot designer.

The Intents tab lists all the intents along with sample phrases and slots, and the Slot types tab provides a list of all the slot types along with slot type values.

  1. Choose any of the intents to review the sample utterances and slots. For example, in the following screenshot, we choose ChangePassword to view the utterances.
  2. Choose the Associated transcripts tab to review the conversations used to identify the intents.
  3. After you review the results, select the intents and slot types relevant to your use case and choose Add.

This adds the selected intents and slot types to the bot. You can now iterate on this design by making changes such as adding prompts, merging intents or slot types, and renaming slots.

You have now used the Amazon Lex automated chatbot designer to identify common intents, utterances mapped to those intents, and information that the chatbot needs to collect to fulfill certain business functions.

Clean up

When you’re finished, clean up your resources by using the following command within the AWS SAM CLI:

sam delete

Conclusion

This post showed you how to use the Genesys Cloud CX and EventBridge integration to send your Genesys CX transcripts to your AWS account, transform them, and use them with the Amazon Lex automated chatbot designer to create sample bots, intents, utterances, and slots. This architecture can help first-time AWS CCI users and current AWS CCI users onboard more chatbots using the Genesys CX and Amazon Lex integration, or in continuous improvement opportunities where you may want to compare your current intent design to the design produced by the Amazon Lex automated chatbot designer. For more information about other AWS CCI capabilities, see Contact Center Intelligence.


About the Authors

Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), helping Enterprise customers across the Midwest US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. In his free time, he enjoys spending quality time with his family exploring new places and overanalyzing his sports team’s performance.

Anand Bose is a Senior Solutions Architect at Amazon Web Services, supporting ISV partners who build business applications on AWS. He is passionate about creating differentiated solutions that unlock customers for cloud adoption. Anand lives in Dallas, Texas and enjoys travelling.

Teri Ferris is responsible for architecting great customer experiences alongside business partners, leveraging Genesys technology solutions that enable Experience Orchestration for contact centers. In her role she advises on solution architecture, integrations, IVR, routing, reporting analytics, self-service, AI, outbound, mobile capabilities, omnichannel, social channels, digital, unified communications (UCaaS), and analytics and how they can streamline the customer experience. Before Genesys, she held senior leadership roles at Human Resources, Payroll, and Learning Management companies, including overseeing the Contact Center.

Read More