Transform one-on-one customer interactions: Build speech-capable order processing agents with AWS and generative AI

In today’s landscape of one-on-one customer interactions for placing orders, the prevailing practice continues to rely on human attendants, even in settings like drive-thru coffee shops and fast-food establishments. This traditional approach poses several challenges: it heavily depends on manual processes, struggles to efficiently scale with increasing customer demands, introduces the potential for human errors, and operates within specific hours of availability. Additionally, in competitive markets, businesses adhering solely to manual processes might find it challenging to deliver efficient and competitive service. Despite technological advancements, the human-centric model remains deeply ingrained in order processing, leading to these limitations.

The prospect of utilizing technology for one-on-one order processing assistance has been available for some time. However, existing solutions often fall into two categories: rule-based systems that demand substantial time and effort for setup and upkeep, or rigid systems that lack the flexibility required for human-like interactions with customers. As a result, businesses and organizations face challenges in implementing such solutions swiftly and efficiently. Fortunately, with the advent of generative AI and large language models (LLMs), it's now possible to create automated systems that handle natural language efficiently, with an accelerated onboarding timeline.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. In addition to Amazon Bedrock, you can use other AWS services like Amazon SageMaker JumpStart and Amazon Lex to create fully automated and easily adaptable generative AI order processing agents.

In this post, we show you how to build a speech-capable order processing agent using Amazon Lex, Amazon Bedrock, and AWS Lambda.

Solution overview

The following diagram illustrates our solution architecture.

The workflow consists of the following steps:

  1. A customer places the order using Amazon Lex.
  2. The Amazon Lex bot interprets the customer’s intents and triggers a DialogCodeHook.
  3. A Lambda function pulls the appropriate prompt template from the Lambda layer and formats model prompts by adding the customer input in the associated prompt template.
  4. The RequestValidation prompt verifies the order against the menu items; if the customer wants to order something that isn't on the menu, it lets them know via Amazon Lex and provides recommendations. The prompt also performs a preliminary validation for order completeness.
  5. The ObjectCreator prompt converts the natural language requests into a data structure (JSON format).
  6. The customer validator Lambda function verifies the required attributes for the order and confirms if all necessary information is present to process the order.
  7. A customer Lambda function takes the data structure as an input for processing the order and passes the order total back to the orchestrating Lambda function.
  8. The orchestrating Lambda function calls the Amazon Bedrock LLM endpoint to generate a final order summary including the order total from the customer database system (for example, Amazon DynamoDB).
  9. The order summary is communicated back to the customer via Amazon Lex. After the customer confirms the order, the order will be processed.
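For reference, the orchestration Lambda function shown later in this post reads only a few fields from the Amazon Lex V2 input event. The following trimmed example (with placeholder values) illustrates the shape of that event, limited to the fields this solution uses:

# Trimmed, illustrative Amazon Lex V2 input event (placeholder values),
# limited to the fields the orchestration Lambda function reads
sample_event = {
    "inputTranscript": "can I get a medium latte please",
    "sessionState": {
        "sessionAttributes": {},
        "intent": {
            "name": "ValidateIntent",
            "confirmationState": "None"
        }
    }
}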

Prerequisites

This post assumes that you have an active AWS account and familiarity with the following concepts and services:

Also, in order to access Amazon Bedrock from the Lambda functions, you need to make sure the Lambda runtime has the following libraries:

  • boto3>=1.28.57
  • awscli>=1.29.57
  • botocore>=1.31.57

This can be done with a Lambda layer or by packaging your function as a container image that includes the required libraries.

These libraries are also required when calling the Amazon Bedrock API from Amazon SageMaker Studio. You can install them by running a cell with the following code:

%pip install --no-build-isolation --force-reinstall \
"boto3>=1.28.57" \
"awscli>=1.29.57" \
"botocore>=1.31.57"

Finally, you create the following policy and later attach it to any role accessing Amazon Bedrock:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": "bedrock:*",
            "Resource": "*"
        }
    ]
}
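To quickly confirm that the libraries and permissions are in place, you can list the foundation models available to your role (this assumes the preceding policy is attached and that model access has been granted in the Amazon Bedrock console):

import boto3

# List the foundation models your role can see in Amazon Bedrock
bedrock = boto3.client(service_name='bedrock')  # control plane client
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(model['modelId'])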

Create a DynamoDB table

In our specific scenario, we’ve created a DynamoDB table as our customer database system, but you could also use Amazon Relational Database Service (Amazon RDS). Complete the following steps to provision your DynamoDB table (or customize the settings as needed for your use case):

  1. On the DynamoDB console, choose Tables in the navigation pane.
  2. Choose Create table.
  3. For Table name, enter a name (for example, ItemDetails).
  4. For Partition key, enter a key (for this post, we use Item).
  5. For Sort key, enter a key (for this post, we use Size).
  6. Choose Create table.

Now you can load the data into the DynamoDB table. For this post, we use a CSV file. You can load the data to the DynamoDB table using Python code in a SageMaker notebook.

First, we need to set up a profile named dev.

  1. Open a new terminal in SageMaker Studio and run the following command:
aws configure --profile dev

This command will prompt you to enter your AWS access key ID, secret access key, default AWS Region, and output format.

  2. Return to the SageMaker notebook and write Python code to set up a connection to DynamoDB using the Boto3 library. This code snippet creates a session using a specific AWS profile named dev and then creates a DynamoDB client using that session. The following is the code sample to load the data:
%pip install boto3
import boto3
import csv

# Create a session using a profile named 'dev'
session = boto3.Session(profile_name='dev')

# Create a DynamoDB resource using the session
dynamodb = session.resource('dynamodb')

# Specify your DynamoDB table name
table_name = 'your_table_name'
table = dynamodb.Table(table_name)

# Specify the path to your CSV file
csv_file_path = 'path/to/your/file.csv'

# Read CSV file and put items into DynamoDB
with open(csv_file_path, 'r', encoding='utf-8-sig') as csvfile:
    csvreader = csv.reader(csvfile)
    
    # Skip the header row
    next(csvreader, None)

    for row in csvreader:
        # Extract values from the CSV row
        item = {
            'Item': row[0],  # Adjust the index based on your CSV structure
            'Size': row[1],
            'Price': row[2]
        }
        
        # Put item into DynamoDB
        response = table.put_item(Item=item)
        
        print(f"Item added: {response}")
print(f"CSV data has been loaded into the DynamoDB table: {table_name}")

Alternatively, you can use NoSQL Workbench or other tools to quickly load the data to your DynamoDB table.
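To spot-check the load, you can read a single item back (the key values below are examples; use values that exist in your CSV):

import boto3

session = boto3.Session(profile_name='dev')
table = session.resource('dynamodb').Table('your_table_name')

# Read one item back to confirm the load worked (example keys)
response = table.get_item(Key={'Item': 'latte', 'Size': 'medium'})
print(response.get('Item', 'Item not found'))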

The following is a screenshot after the sample data is inserted into the table.

Create templates in a SageMaker notebook using the Amazon Bedrock invocation API

To create our prompt template for this use case, we use Amazon Bedrock. You can access Amazon Bedrock from the AWS Management Console and via API invocations. In our case, we access Amazon Bedrock via API from the convenience of a SageMaker Studio notebook to create not only our prompt template, but our complete API invocation code that we can later use on our Lambda function.

  1. On the SageMaker console, access an existing SageMaker Studio domain or create a new one to access Amazon Bedrock from a SageMaker notebook.
  2. After you create the SageMaker domain and user, choose the user, then choose Launch and Studio. This opens a JupyterLab environment.
  3. When the JupyterLab environment is ready, open a new notebook and begin importing the necessary libraries.

There are many FMs available via the Amazon Bedrock Python SDK. In this case, we use Claude V2, a powerful foundation model developed by Anthropic.

The order processing agent needs a few different templates. This can change depending on the use case, but we have designed a general workflow that can apply to multiple settings. For this use case, the Amazon Bedrock LLM template will accomplish the following:

  • Validate the customer intent
  • Validate the request
  • Create the order data structure
  • Pass a summary of the order to the customer
  4. To invoke the model, create a bedrock-runtime object from Boto3:

import boto3
import json

# Model API request parameters
modelId = 'anthropic.claude-v2' # change this to use a different version from the model provider
accept = 'application/json'
contentType = 'application/json'

bedrock = boto3.client(service_name='bedrock-runtime')

Let’s start by working on the intent validator prompt template. This is an iterative process, but thanks to Anthropic’s prompt engineering guide, you can quickly create a prompt that can accomplish the task.

  5. Create the first prompt template, along with a utility function that will help prepare the body for the API invocations (a sketch of such a helper follows these steps).

The following is the code for prompt_template_intent_validator.txt:

"{"prompt": "Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to identify the intent that the human wants to accomplish and respond appropriately. The valid intents are: Greeting,Place Order, Complain, Speak to Someone. Always put your response to the Human within the Response tags. Also add an XML tag to your output identifying the human intent.\nHere are some examples:\n<example><Conversation> H: hi there.\n\nA: Hi, how can I help you today?\n\nH: Yes. I would like a medium mocha please</Conversation>\n\nA:<intent>Place Order</intent><Response>\nGot it.</Response></example>\n<example><Conversation> H: hello\n\nA: Hi, how can I help you today?\n\nH: my coffee does not taste well can you please re-make it?</Conversation>\n\nA:<intent>Complain</intent><Response>\nOh, I am sorry to hear that. Let me get someone to help you.</Response></example>\n<example><Conversation> H: hi\n\nA: Hi, how can I help you today?\n\nH: I would like to speak to someone else please</Conversation>\n\nA:<intent>Speak to Someone</intent><Response>\nSure, let me get someone to help you.</Response></example>\n<example><Conversation> H: howdy\n\nA: Hi, how can I help you today?\n\nH:can I get a large americano with sugar and 2 mochas with no whipped cream</Conversation>\n\nA:<intent>Place Order</intent><Response>\nSure thing! Please give me a moment.</Response></example>\n<example><Conversation> H: hi\n\n</Conversation>\n\nA:<intent>Greeting</intent><Response>\nHi there, how can I help you today?</Response></example>\n</instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation></request>\n\nAssistant:\n", "max_tokens_to_sample": 250, "temperature": 1, "top_k": 250, "top_p": 0.75, "stop_sequences": ["\n\nHuman:", "\n\nhuman:", "\n\nCustomer:", "\n\ncustomer:"]}"


  6. Save this template into a text file so you can upload it to Amazon S3 and call it from the Lambda function when needed. Save the templates as JSON-serialized strings in text files.
  7. Repeat the same steps with the other templates.
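The following is a minimal sketch of such a utility (the function name and example conversation are illustrative): it loads a saved template, fills the REPLACEME placeholder with the conversation, and invokes Claude on Amazon Bedrock:

import json
import boto3

bedrock = boto3.client(service_name='bedrock-runtime')

def build_and_invoke(template_path, conversation):
    """Load a prompt template, fill the REPLACEME placeholder, and invoke Claude on Amazon Bedrock."""
    with open(template_path, 'r') as f:
        raw = f.read()
    body = json.loads(raw)
    if isinstance(body, str):  # templates are saved as JSON-serialized strings
        body = json.loads(body)
    body['prompt'] = body['prompt'].replace('REPLACEME', conversation)
    response = bedrock.invoke_model(
        body=json.dumps(body),
        modelId='anthropic.claude-v2',
        accept='application/json',
        contentType='application/json',
    )
    return json.loads(response['body'].read())['completion']

# Example call with a short conversation
print(build_and_invoke('prompt_template_intent_validator.txt',
                       'H: hi there.\n\nA: Hi, how can I help you today?\n\nH: I would like a medium mocha please'))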

The following are the other templates, along with the results when calling Amazon Bedrock with some of them.

The following is the code for prompt_template_request_validator.txt:

"{"prompt": "Human: I will give you some instructions to complete my request.\n<instructions>Given the context do the following steps: 1. verify that the items in the input are valid. If customer provided an invalid item, recommend replacing it with a valid one. 2. verify that the customer has provided all the information marked as required. If the customer missed a required information, ask the customer for that information. 3. When the order is complete, provide a summary of the order and ask for confirmation always using this phrase: 'is this correct?' 4. If the customer confirms the order, Do not ask for confirmation again, just say the phrase inside the brackets [Great, Give me a moment while I try to process your order]</instructions>\n<context>\nThe VALID MENU ITEMS are: [latte, frappe, mocha, espresso, cappuccino, romano, americano].\nThe VALID OPTIONS are: [splenda, stevia, raw sugar, honey, whipped cream, sugar, oat milk, soy milk, regular milk, skimmed milk, whole milk, 2 percent milk, almond milk].\nThe required information is: size. Size can be: small, medium, large.\nHere are some examples: <example>H: I would like a medium latte with 1 Splenda and a small romano with no sugar please.\n\nA: <Validation>:\nThe Human is ordering a medium latte with one splenda. Latte is a valid menu item and splenda is a valid option. The Human is also ordering a small romano with no sugar. Romano is a valid menu item.</Validation>\n<Response>\nOk, I got: \n\t-Medium Latte with 1 Splenda and.\n\t-Small Romano with no Sugar.\nIs this correct?</Response>\n\nH: yep.\n\nA:\n<Response>\nGreat, Give me a moment while I try to process your order</example>\n\n<example>H: I would like a cappuccino and a mocha please.\n\nA: <Validation>:\nThe Human is ordering a cappuccino and a mocha. Both are valid menu items. The Human did not provide the size for the cappuccino. The human did not provide the size for the mocha. I will ask the Human for the required missing information.</Validation>\n<Response>\nSure thing, but can you please let me know the size for the Cappuccino and the size for the Mocha? We have Small, Medium, or Large.</Response></example>\n\n<example>H: I would like a small cappuccino and a large lemonade please.\n\nA: <Validation>:\nThe Human is ordering a small cappuccino and a large lemonade. Cappuccino is a valid menu item. Lemonade is not a valid menu item. I will suggest the Human a replacement from our valid menu items.</Validation>\n<Response>\nSorry, we don't have Lemonades, would you like to order something else instead? Perhaps a Frappe or a Latte?</Response></example>\n\n<example>H: Can I get a medium frappuccino with sugar please?\n\nA: <Validation>:\n The Human is ordering a Frappuccino. Frappuccino is not a valid menu item. I will suggest a replacement from the valid menu items in my context.</Validation>\n<Response>\nI am so sorry, but Frappuccino is not in our menu, do you want a frappe or a cappuccino instead? perhaps something else?</Response></example>\n\n<example>H: I want two large americanos and a small latte please.\n\nA: <Validation>:\n The Human is ordering 2 Large Americanos, and a Small Latte. Americano is a valid menu item. 
Latte is a valid menu item.</Validation>\n<Response>\nOk, I got: \n\t-2 Large Americanos and.\n\t-Small Latte.\nIs this correct?</Response>\n\nH: looks correct, yes.\n\nA:\n<Response>\nGreat, Give me a moment while I try to process your order.</Response></example>\n\n</Context>\n\nPlease complete this request according to the instructions and examples provided above:<request>REPLACEME</request>\n\nAssistant:\n", "max_tokens_to_sample": 250, "temperature": 0.3, "top_k": 250, "top_p": 0.75, "stop_sequences": ["\n\nHuman:", "\n\nhuman:", "\n\nCustomer:", "\n\ncustomer:"]}"

The following is our response from Amazon Bedrock using this template.

The following is the code for prompt_template_object_creator.txt:

"{"prompt": "Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to create a json object in Response with the appropriate attributes.\nHere are some examples:\n<example><Conversation> H: I want a latte.\n\nA:\nCan I have the size?\n\nH: Medium.\n\nA: So, a medium latte.\nIs this Correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\"1\":{\"item\":\"latte\",\"size\":\"medium\",\"addOns\":[]}}</Response></example>\n<example><Conversation> H: I want a large frappe and 2 small americanos with sugar.\n\nA: Okay, let me confirm:\n\n1 large frappe\n\n2 small americanos with sugar\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\"1\":{\"item\":\"frappe\",\"size\":\"large\",\"addOns\":[]},\"2\":{\"item\":\"americano\",\"size\":\"small\",\"addOns\":[\"sugar\"]},\"3\":{\"item\":\"americano\",\"size\":\"small\",\"addOns\":[\"sugar\"]}}</Response>\n</example>\n<example><Conversation> H: I want a medium americano.\n\nA: Okay, let me confirm:\n\n1 medium americano\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\"1\":{\"item\":\"americano\",\"size\":\"medium\",\"addOns\":[]}}</Response></example>\n<example><Conversation> H: I want a large latte with oatmilk.\n\nA: Okay, let me confirm:\n\nLarge latte with oatmilk\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\"1\":{\"item\":\"latte\",\"size\":\"large\",\"addOns\":[\"oatmilk\"]}}</Response></example>\n<example><Conversation> H: I want a small mocha with no whipped cream please.\n\nA: Okay, let me confirm:\n\nSmall mocha with no whipped cream\n\nIs this correct?\n\nH: Yes.</Conversation>\n\nA:<Response>{\"1\":{\"item\":\"mocha\",\"size\":\"small\",\"addOns\":[\"no whipped cream\"]}}</Response>\n\n</example></instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation></request>\n\nAssistant:\n", "max_tokens_to_sample": 250, "temperature": 0.3, "top_k": 250, "top_p": 0.75, "stop_sequences": ["\n\nHuman:", "\n\nhuman:", "\n\nCustomer:", "\n\ncustomer:"]}"


The following is the code for prompt_template_order_summary.txt:

"{"prompt": "Human: I will give you some instructions to complete my request.\n<instructions>Given the Conversation between Human and Assistant, you need to create a summary of the order with bullet points and include the order total.\nHere are some examples:\n<example><Conversation> H: I want a large frappe and 2 small americanos with sugar.\n\nA: Okay, let me confirm:\n\n1 large frappe\n\n2 small americanos with sugar\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>10.50</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\n1 large frappe\n\n2 small americanos with sugar.\nYour Order total is $10.50</Response></example>\n<example><Conversation> H: I want a medium americano.\n\nA: Okay, let me confirm:\n\n1 medium americano\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>3.50</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\n1 medium americano.\nYour Order total is $3.50</Response></example>\n<example><Conversation> H: I want a large latte with oat milk.\n\nA: Okay, let me confirm:\n\nLarge latte with oat milk\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>6.75</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\nLarge latte with oat milk.\nYour Order total is $6.75</Response></example>\n<example><Conversation> H: I want a small mocha with no whipped cream please.\n\nA: Okay, let me confirm:\n\nSmall mocha with no whipped cream\n\nIs this correct?\n\nH: Yes.</Conversation>\n\n<OrderTotal>4.25</OrderTotal>\n\nA:<Response>\nHere is a summary of your order along with the total:\n\nSmall mocha with no whipped cream.\nYour Order total is $6.75</Response>\n\n</example>\n</instructions>\n\nPlease complete this request according to the instructions and examples provided above:<request><Conversation>REPLACEME</Conversation>\n\n<OrderTotal>REPLACETOTAL</OrderTotal></request>\n\nAssistant:\n", "max_tokens_to_sample": 250, "temperature": 0.3, "top_k": 250, "top_p": 0.75, "stop_sequences": ["\n\nHuman:", "\n\nhuman:", "\n\nCustomer:", "\n\ncustomer:", "[Conversation]"]}"


As you can see, we have used our prompt templates to validate menu items, identify missing required information, create a data structure, and summarize the order. The foundation models available on Amazon Bedrock are very powerful, so you could accomplish even more tasks via these templates.

You have completed engineering the prompts and saved the templates to text files. You can now begin creating the Amazon Lex bot and the associated Lambda functions.

Create a Lambda layer with the prompt templates

Complete the following steps to create your Lambda layer:

  1. In SageMaker Studio, create a new folder (for example, prompt_templates_layer) with a subfolder named python.
  2. Copy your prompt files to the python folder.

  3. Add the zip utility to your JupyterLab environment by running the following command:
!conda install -y -c conda-forge zip

  4. Now, run the following command to create the ZIP file for uploading to the Lambda layer:
!zip -r prompt_templates_layer.zip prompt_templates_layer/.

  5. After you create the ZIP file, download it. On the Lambda console, create a new layer by uploading the file directly or by uploading it to Amazon S3 first.
  6. Then attach this new layer to the orchestration Lambda function.

Now your prompt template files are stored locally in your Lambda runtime environment (Lambda layers are extracted under /opt). This will speed up the process during your bot runs.
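A quick way to confirm the templates are visible at runtime (the exact folder name depends on how you structured the ZIP file) is to list that directory from your function:

import os

# List everything the function can see under /opt, where Lambda extracts layers
for root, dirs, files in os.walk('/opt'):
    for name in files:
        print(os.path.join(root, name))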

Create a Lambda layer with the required libraries

Complete the following steps to create your Lambda layer with the required libraries:

  1. In an AWS Cloud9 environment, create a folder with a subfolder named python.
  2. Open a terminal inside the python folder.
  3. Run the following commands from the terminal:
pip install "boto3>=1.28.57" -t .
pip install "awscli>=1.29.57" -t .
pip install "botocore>=1.31.57" -t .
  4. Run cd .. to position yourself inside your new folder, where you also have the python subfolder.
  5. Run the following command:
zip -r lambda-layer.zip python/
  6. After you create the ZIP file, download it. On the Lambda console, create a new layer by uploading the file directly or by uploading it to Amazon S3 first.
  7. Then attach this new layer to the orchestration Lambda function.

Create the bot in Amazon Lex v2

For this use case, we build an Amazon Lex bot that can provide an input/output interface for the architecture in order to call Amazon Bedrock using voice or text from any interface. Because the LLM will handle the conversation piece of this order processing agent, and Lambda will orchestrate the workflow, you can create a bot with three intents and no slots.

  1. On the Amazon Lex console, create a new bot with the method Create a blank bot.

Now you can add an intent with any appropriate initial utterance for the end-users to start the conversation with the bot. We use simple greetings and add an initial bot response so end-users can provide their requests. When creating the bot, make sure to use a Lambda code hook with the intents; this will trigger a Lambda function that will orchestrate the workflow between the customer, Amazon Lex, and the LLM.

  2. Add your first intent, which triggers the workflow and uses the intent validation prompt template to call Amazon Bedrock and identify what the customer is trying to accomplish. Add a few simple utterances for end-users to start the conversation.

You don't need to use any slots or initial responses in any of the bot intents. In fact, you don't need to add utterances to the second or third intents. That is because the LLM will guide Lambda throughout the process.

  3. Add a confirmation prompt. You can customize this message in the Lambda function later.
  4. Under Code hooks, select Use a Lambda function for initialization and validation.
  5. Create a second intent with no utterance and no initial response. This is the PlaceOrder intent.

When the LLM identifies that the customer is trying to place an order, the Lambda function will trigger this intent and validate the customer request against the menu, and make sure that no required information is missing. Remember that all of this is on the prompt templates, so you can adapt this workflow for any use case by changing the prompt templates.

  6. Don't add any slots, but add a confirmation prompt and a decline response.
  7. Select Use a Lambda function for initialization and validation.
  8. Create a third intent named ProcessOrder with no sample utterances and no slots.
  9. Add an initial response, a confirmation prompt, and a decline response.

After the LLM has validated the customer request, the Lambda function triggers the third and last intent to process the order. Here, Lambda will use the object creator template to generate the order JSON data structure to query the DynamoDB table, and then use the order summary template to summarize the whole order along with the total so Amazon Lex can pass it to the customer.

  10. Select Use a Lambda function for initialization and validation. This can use any Lambda function to process the order after the customer has given the final confirmation.

  11. After you create all three intents, go to the Visual builder for the ValidateIntent, add a go-to intent step, and connect the output of the positive confirmation to that step.
  12. After you add the go-to intent, edit it and choose the PlaceOrder intent as the intent name.

  13. Similarly, go to the Visual builder for the PlaceOrder intent and connect the output of the positive confirmation to the ProcessOrder go-to intent. No editing is required for the ProcessOrder intent.
  14. You now need to create the Lambda function that orchestrates Amazon Lex and calls the DynamoDB table, as detailed in the following section.

Create a Lambda function to orchestrate the Amazon Lex bot

You can now build the Lambda function that orchestrates the Amazon Lex bot and workflow. Complete the following steps:

  1. Create a Lambda function with the standard execution policy and let Lambda create a role for you.
  2. In the code window of your function, add a few utility functions that will help: format the prompts by adding the Lex context to the template, call the Amazon Bedrock LLM API, extract the desired text from the responses, and more. See the following code:
import json
import re
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

bedrock = boto3.client(service_name='bedrock-runtime')
def CreatingCustomPromptFromLambdaLayer(object_key,replace_items):
   
    folder_path = '/opt/order_processing_agent_prompt_templates/python/'
    try:
        file_path = folder_path + object_key
        with open(file_path, "r") as file1:
            raw_template = file1.read()
            # Modify the template with the custom input prompt
            #template['inputs'][0].insert(1, {"role": "user", "content": '### Input:\n' + user_request})
            for key,value in replace_items.items():
                value = json.dumps(json.dumps(value).replace('"','')).replace('"','')
                raw_template = raw_template.replace(key,value)
            modified_prompt = raw_template

            return modified_prompt
    except Exception as e:
        return {
            'statusCode': 500,
            'body': f'An error occurred: {str(e)}'
        }
def CreatingCustomPrompt(object_key,replace_items):
    logger.debug('replace_items is: {}'.format(replace_items))
    #retrieve user request from intent_request
    #we first prompt the model with current order
    
    bucket_name = 'your-bucket-name'
    
    #object_key = 'prompt_template_order_processing.txt'
    try:
        s3 = boto3.client('s3')
        # Retrieve the existing template from S3
        response = s3.get_object(Bucket=bucket_name, Key=object_key)
        raw_template = response['Body'].read().decode('utf-8')
        raw_template = json.loads(raw_template)
        logger.debug('raw template is {}'.format(raw_template))
        #template_json = json.loads(raw_template)
        #logger.debug('template_json is {}'.format(template_json))
        #template = json.dumps(template_json)
        #logger.debug('template is {}'.format(template))

        # Modify the template with the custom input prompt
        #template['inputs'][0].insert(1, {"role": "user", "content": '### Input:\n' + user_request})
        for key,value in replace_items.items():
            raw_template = raw_template.replace(key,value)
            logger.debug("Replacing: {} nwith: {}".format(key,value))
        modified_prompt = json.dumps(raw_template)
        logger.debug("Modified template: {}".format(modified_prompt))
        logger.debug("Modified template type is: {}".format(print(type(modified_prompt))))
        
        #modified_template_json = json.loads(modified_prompt)
        #logger.debug("Modified template json: {}".format(modified_template_json))
        
        return modified_prompt
    except Exception as e:
        return {
            'statusCode': 500,
            'body': f'An error occurred: {str(e)}'
        }
    
def validate_intent(intent_request):
    logger.debug('starting validate_intent: {}'.format(intent_request))
    #retrieve user request from intent_request
    user_request = 'Human: ' + intent_request['inputTranscript'].lower()
    #getting current context variable
    current_session_attributes =  intent_request['sessionState']['sessionAttributes']
    if len(current_session_attributes) > 0:
        full_context = current_session_attributes['fullContext'] + '\n\n' + user_request
        dialog_context = current_session_attributes['dialogContext'] + '\n\n' + user_request
    else:
        full_context = user_request
        dialog_context = user_request
    #Preparing validation prompt by adding context to prompt template
    object_key = 'prompt_template_intent_validator.txt'
    #replace_items = {"REPLACEME":full_context}
    #replace_items = {"REPLACEME":dialog_context}
    replace_items = {"REPLACEME":dialog_context}
    #validation_prompt = CreatingCustomPrompt(object_key,replace_items)
    validation_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)

    #Prompting model for request validation
    intent_validation_completion = prompt_bedrock(validation_prompt)
    intent_validation_completion = re.sub(r'["]','',intent_validation_completion)

    #extracting response from response completion and removing some special characters
    validation_response = extract_response(intent_validation_completion)
    validation_intent = extract_intent(intent_validation_completion)
    
    

    #business logic depending on intents
    if validation_intent == 'Place Order':
        return validate_request(intent_request)
    elif validation_intent in ['Complain','Speak to Someone']:
        ##adding session attributes to keep current context
        full_context = full_context + '\n\n' + intent_validation_completion
        dialog_context = dialog_context + '\n\nAssistant: ' + validation_response
        intent_request['sessionState']['sessionAttributes']['fullContext'] = full_context
        intent_request['sessionState']['sessionAttributes']['dialogContext'] = dialog_context
        intent_request['sessionState']['sessionAttributes']['customerIntent'] = validation_intent
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close',validation_response)
    if validation_intent == 'Greeting':
        ##adding session attributes to keep current context
        full_context = full_context + '\n\n' + intent_validation_completion
        dialog_context = dialog_context + '\n\nAssistant: ' + validation_response
        intent_request['sessionState']['sessionAttributes']['fullContext'] = full_context
        intent_request['sessionState']['sessionAttributes']['dialogContext'] = dialog_context
        intent_request['sessionState']['sessionAttributes']['customerIntent'] = validation_intent
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'InProgress','ConfirmIntent',validation_response)

def validate_request(intent_request):
    logger.debug('starting validate_request: {}'.format(intent_request))
    #retrieve user request from intent_request
    user_request = 'Human: ' + intent_request['inputTranscript'].lower()
    #getting current context variable
    current_session_attributes =  intent_request['sessionState']['sessionAttributes']
    if len(current_session_attributes) > 0:
        full_context = current_session_attributes['fullContext'] + '\n\n' + user_request
        dialog_context = current_session_attributes['dialogContext'] + '\n\n' + user_request
    else:
        full_context = user_request
        dialog_context = user_request
   
    #Preparing validation prompt by adding context to prompt template
    object_key = 'prompt_template_request_validator.txt'
    replace_items = {"REPLACEME":dialog_context}
    #validation_prompt = CreatingCustomPrompt(object_key,replace_items)
    validation_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)

    #Prompting model for request validation
    request_validation_completion = prompt_bedrock(validation_prompt)
    request_validation_completion = re.sub(r'["]','',request_validation_completion)

    #extracting response from response completion and removing some special characters
    validation_response = extract_response(request_validation_completion)

    ##adding session attributes to keep current context
    full_context = full_context + '\n\n' + request_validation_completion
    dialog_context = dialog_context + '\n\nAssistant: ' + validation_response
    intent_request['sessionState']['sessionAttributes']['fullContext'] = full_context
    intent_request['sessionState']['sessionAttributes']['dialogContext'] = dialog_context
    
    return close(intent_request['sessionState']['sessionAttributes'],'PlaceOrder','InProgress','ConfirmIntent',validation_response)
    
def process_order(intent_request):
    logger.debug('starting process_order: {}'.format(intent_request))

     #retrieve user request from intent_request
    user_request = 'Human: ' + intent_request['inputTranscript'].lower()
    #getting current context variable
    current_session_attributes =  intent_request['sessionState']['sessionAttributes']
    if len(current_session_attributes) > 0:
        full_context = current_session_attributes['fullContext'] + '\n\n' + user_request
        dialog_context = current_session_attributes['dialogContext'] + '\n\n' + user_request
    else:
        full_context = user_request
        dialog_context = user_request
    #   Preparing object creator prompt by adding context to prompt template
    object_key = 'prompt_template_object_creator.txt'
    replace_items = {"REPLACEME":dialog_context}
    #object_creator_prompt = CreatingCustomPrompt(object_key,replace_items)
    object_creator_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)
    #Prompting model for object creation
    object_creation_completion = prompt_bedrock(object_creator_prompt)
    #extracting response from response completion
    object_creation_response = extract_response(object_creation_completion)
    inputParams = json.loads(object_creation_response)
    inputParams = json.dumps(json.dumps(inputParams))
    logger.debug('inputParams is: {}'.format(inputParams))
    client = boto3.client('lambda')
    response = client.invoke(FunctionName = 'arn:aws:lambda:us-east-1:<AccountNumber>:function:aws-blog-order-validator',InvocationType = 'RequestResponse',Payload = inputParams)
    responseFromChild = json.load(response['Payload'])
    validationResult = responseFromChild['statusCode']
    if validationResult == 205:
        order_validation_error = responseFromChild['validator_response']
        return close(intent_request['sessionState']['sessionAttributes'],'PlaceOrder','InProgress','ConfirmIntent',order_validation_error)
    #invokes Order Processing lambda to query DynamoDB table and returns order total
    response = client.invoke(FunctionName = 'arn:aws:lambda:us-east-1: <AccountNumber>:function:aws-blog-order-processing',InvocationType = 'RequestResponse',Payload = inputParams)
    responseFromChild = json.load(response['Payload'])
    orderTotal = responseFromChild['body']
    ###Prompting the model to summarize the order along with order total
    object_key = 'prompt_template_order_summary.txt'
    replace_items = {"REPLACEME":dialog_context,"REPLACETOTAL":orderTotal}
    #order_summary_prompt = CreatingCustomPrompt(object_key,replace_items)
    order_summary_prompt = CreatingCustomPromptFromLambdaLayer(object_key,replace_items)
    order_summary_completion = prompt_bedrock(order_summary_prompt)
    #extracting response from response completion
    order_summary_response = extract_response(order_summary_completion)  
    order_summary_response = order_summary_response + '. Shall I finalize processing your order?'
    ##adding session attributes to keep current context
    full_context = full_context + '\n\n' + order_summary_completion
    dialog_context = dialog_context + '\n\nAssistant: ' + order_summary_response
    intent_request['sessionState']['sessionAttributes']['fullContext'] = full_context
    intent_request['sessionState']['sessionAttributes']['dialogContext'] = dialog_context
    return close(intent_request['sessionState']['sessionAttributes'],'ProcessOrder','InProgress','ConfirmIntent',order_summary_response)
    

""" --- Main handler and Workflow functions --- """

def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    logger.debug('event is: {}'.format(event))

    return dispatch(event)

def dispatch(intent_request):
    """
    Called when the user specifies an intent for this bot. If intent is not valid then returns error name
    """
    logger.debug('intent_request is: {}'.format(intent_request))
    intent_name = intent_request['sessionState']['intent']['name']
    confirmation_state = intent_request['sessionState']['intent']['confirmationState']
    # Dispatch to your bot's intent handlers
    if intent_name == 'ValidateIntent' and confirmation_state == 'None':
        return validate_intent(intent_request)
    if intent_name == 'PlaceOrder' and confirmation_state == 'None':
        return validate_request(intent_request)
    elif intent_name == 'PlaceOrder' and confirmation_state == 'Confirmed':
        return process_order(intent_request)
    elif intent_name == 'PlaceOrder' and confirmation_state == 'Denied':
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close','Got it. Let me know if I can help you with something else.')
    elif intent_name == 'PlaceOrder' and confirmation_state not in ['Denied','Confirmed','None']:
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close','Sorry. I am having trouble completing the request. Let me get someone to help you.')
    elif intent_name == 'ProcessOrder' and confirmation_state == 'None':
        return validate_request(intent_request)
    elif intent_name == 'ProcessOrder' and confirmation_state == 'Confirmed':
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close','Perfect! Your order has been processed. Please proceed to payment.')
    elif intent_name == 'ProcessOrder' and confirmation_state == 'Denied':
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close','Got it. Let me know if I can help you with something else.')
    elif intent_name == 'ProcessOrder' and confirmation_state not in ['Denied','Confirmed','None']:
        return close(intent_request['sessionState']['sessionAttributes'],intent_request['sessionState']['intent']['name'],'Fulfilled','Close','Sorry. I am having trouble completing the request. Let me get someone to help you.')
    raise Exception('Intent with name ' + intent_name + ' not supported')
    
def prompt_bedrock(formatted_template):
    logger.debug('prompt bedrock input is: {}'.format(formatted_template))
    # The templates are stored as JSON-serialized strings, so parse a second time if needed
    body = json.loads(formatted_template)
    if isinstance(body, str):
        body = json.loads(body)

    modelId = 'anthropic.claude-v2' # change this to use a different version from the model provider
    accept = 'application/json'
    contentType = 'application/json'

    response = bedrock.invoke_model(body=json.dumps(body), modelId=modelId, accept=accept, contentType=contentType)
    response_body = json.loads(response.get('body').read())
    response_completion = response_body.get('completion')
    logger.debug('response is: {}'.format(response_completion))

    #print_ww(response_body.get('completion'))
    #print(response_body.get('results')[0].get('outputText'))
    return response_completion

#function to extract text between the <Response> and </Response> tags within model completion
def extract_response(response_completion):
    
    if '<Response>' in response_completion:
        customer_response = response_completion.replace('<Response>','||').replace('</Response>','').split('||')[1]
        
        logger.debug('modified response is: {}'.format(response_completion))

        return customer_response
    else:
        
        logger.debug('modified response is: {}'.format(response_completion))

        return response_completion
        
#function to extract text between the <intent> and </intent> tags within model completion
def extract_intent(response_completion):
    if '<intent>' in response_completion:
        customer_intent = response_completion.replace('<intent>','||').replace('</intent>','||').split('||')[1]
        return customer_intent
    else:
        # No intent tag found in the completion
        return ''
        
def close(session_attributes, intent, fulfillment_state, action_type, message):
    # This function prepares the response in the appropriate format for Lex V2

    response = {
        "sessionState": {
            "sessionAttributes": session_attributes,
            "dialogAction": {
                "type": action_type
            },
            "intent": {
                "name": intent,
                "state": fulfillment_state
            }
        },
        "messages": [
            {
                "contentType": "PlainText",
                "content": message
            }
        ]
    }
    return response
  3. Attach the Lambda layer with the required libraries that you created earlier to this function.
  4. Additionally, attach the layer with the prompt templates that you created.
  5. In the Lambda execution role, attach the policy to access Amazon Bedrock, which was created earlier.

The Lambda execution role should have the following permissions.
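At a minimum, this includes the Amazon Bedrock policy created earlier, permission to invoke the two assisting Lambda functions, and the standard Amazon CloudWatch Logs permissions. The following is an illustrative policy; adjust the Region, account number, and function names to match your environment:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "bedrock:*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": [
                "arn:aws:lambda:us-east-1:<AccountNumber>:function:aws-blog-order-validator",
                "arn:aws:lambda:us-east-1:<AccountNumber>:function:aws-blog-order-processing"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}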

Attach the Orchestration Lambda function to the Amazon Lex bot

  1. After you create the function in the previous section, return to the Amazon Lex console and navigate to your bot.
  2. Under Languages in the navigation pane, choose English.
  3. For Source, choose your orchestration Lambda function.
  4. For Lambda function version or alias, choose $LATEST.
  5. Choose Save.

Create assisting Lambda functions

Complete the following steps to create additional Lambda functions:

  1. Create a Lambda function to query the DynamoDB table that you created earlier:
import json
import boto3
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
# Initialize the DynamoDB client
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('your-table-name')

def calculate_grand_total(input_data):
    # Initialize the total price
    total_price = 0
    
    try:
        # Loop through each item in the input JSON
        for item_id, item_data in input_data.items():
            item_name = item_data['item'].lower()  # Convert item name to lowercase
            item_size = item_data['size'].lower()  # Convert item size to lowercase
            
            # Query the DynamoDB table for the item based on Item and Size
            response = table.get_item(
                Key={'Item': item_name,
                    'Size': item_size}
            )
            
            # Check if the item was found in the table
            if 'Item' in response:
                item = response['Item']
                price = float(item['Price'])
                total_price += price  # Add the item's price to the total
    
        return total_price
    except Exception as e:
        raise Exception('An error occurred: {}'.format(str(e)))

def lambda_handler(event, context):
    try:
       
        # Parse the input JSON from the Lambda event
        input_json = json.loads(event)

        # Calculate the grand total
        grand_total = calculate_grand_total(input_json)
    
        # Return the grand total in the response
        return {'statusCode': 200,'body': json.dumps(grand_total)}
    except Exception as e:
        return {
            'statusCode': 500,
            'body': json.dumps('An error occurred: {}'.format(str(e)))
        }
  2. Navigate to the Configuration tab in the Lambda function and choose Permissions.
  3. Attach a resource-based policy statement allowing the order processing Lambda function to invoke this function.

  4. Navigate to the IAM execution role for this Lambda function and add a policy to access the DynamoDB table.

  5. Create another Lambda function to validate if all required attributes were passed from the customer. In the following example, we validate if the size attribute is captured for an order:
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def lambda_handler(event, context):
    # Define customer orders from the input event
    customer_orders = json.loads(event)

    # Initialize a list to collect error messages
    order_errors = {}
    missing_size = []
    error_messages = []
    # Iterate through each order in customer_orders
    for order_id, order in customer_orders.items():
        if "size" not in order or order["size"] == "":
            missing_size.append(order['item'])
            order_errors['size'] = missing_size
    if order_errors:
        items_missing_size = order_errors['size']
        error_message = f"could you please provide the size for the following items: {', '.join(items_missing_size)}?"
        error_messages.append(error_message)

    # Prepare the response message
    if error_messages:
        response_message = "\n".join(error_messages)
        return {
            'statusCode': 205,
            'validator_response': response_message
        }
    else:
        response_message = "Order is validated successfully"
        return {
            'statusCode': 200,
            'validator_response': response_message
        }
  6. Navigate to the Configuration tab in the Lambda function and choose Permissions.
  7. Attach a resource-based policy statement allowing the order processing Lambda function to invoke this function.
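As a quick sanity check, you can call the validator handler locally with a payload in the same shape that the object creator template produces (the order below is illustrative):

import json

# Reuses lambda_handler from the validator function above; the size is intentionally missing
incomplete_order = json.dumps({"1": {"item": "latte", "addOns": []}})
print(lambda_handler(incomplete_order, None))
# Expected: statusCode 205 with a validator_response asking for the size of the latte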

Test the solution

Now we can test the solution with example orders that customers place via Amazon Lex.
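You can test from the Amazon Lex console, or programmatically with the Lex V2 runtime API. The bot ID, alias ID, and session ID below are placeholders for your own values:

import boto3

# Send a text utterance to the bot and print its reply (IDs are placeholders)
lex = boto3.client('lexv2-runtime')
response = lex.recognize_text(
    botId='YOUR_BOT_ID',
    botAliasId='YOUR_BOT_ALIAS_ID',
    localeId='en_US',
    sessionId='test-session-1',
    text='can I get a medium latte please',
)
for message in response.get('messages', []):
    print(message['content'])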

For our first example, the customer asked for a frappuccino, which is not on the menu. The model validates the order with the help of the request validator template and suggests recommendations based on the menu. After the customer confirms their order, they are notified of the order total and order summary. The order will be processed based on the customer's final confirmation.

In our next example, the customer orders a large cappuccino and then modifies the size from large to medium. The model captures all necessary changes and asks the customer to confirm the order. The model presents the order total and order summary, and processes the order based on the customer's final confirmation.

For our final example, the customer placed an order for multiple items and the size is missing for a couple of items. The model and Lambda function will verify if all required attributes are present to process the order and then ask the customer to provide the missing information. After the customer provides the missing information (in this case, the size of the coffee), they’re shown the order total and order summary. The order will be processed based on the customer’s final confirmation.

LLM limitations

LLM outputs are stochastic by nature, which means that the results from our LLM can vary in format or even take the form of untruthful content (hallucinations). Therefore, developers need to rely on robust error handling logic throughout their code in order to handle these scenarios and avoid a degraded end-user experience.
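For example, the object creator output is expected to be JSON, but that is not guaranteed. The following is a minimal sketch (the function name is illustrative) of wrapping that parsing step with a fallback, instead of calling json.loads directly as process_order does:

import json
import logging

logger = logging.getLogger()

def safe_parse_order(object_creation_response):
    """Parse the LLM's JSON order structure, falling back gracefully if the output is malformed."""
    try:
        return json.loads(object_creation_response)
    except (json.JSONDecodeError, TypeError) as e:
        logger.error('Could not parse order structure from LLM output: %s', e)
        # Signal the caller to re-prompt the model or hand the conversation to a human
        return None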

Clean up

If you no longer need this solution, you can delete the following resources:

  • Lambda functions
  • Amazon Lex bot
  • DynamoDB table
  • S3 bucket

Additionally, shut down the SageMaker Studio instance if the application is no longer required.

Cost assessment

For pricing information for the main services used by this solution, see the following:

Note that you can use Claude v2 on demand, without the need for provisioned throughput, so overall costs remain at a minimum. To further reduce costs, you can configure the DynamoDB table with on-demand capacity mode.

Conclusion

This post demonstrated how to build a speech-enabled AI order processing agent using Amazon Lex, Amazon Bedrock, and other AWS services. We showed how prompt engineering with a powerful generative AI model like Claude can enable robust natural language understanding and conversation flows for order processing without the need for extensive training data.

The solution architecture uses serverless components like Lambda, Amazon S3, and DynamoDB to enable a flexible and scalable implementation. Storing the prompt templates in Amazon S3 allows you to customize the solution for different use cases.

Next steps could include expanding the agent’s capabilities to handle a wider range of customer requests and edge cases. The prompt templates provide a way to iteratively improve the agent’s skills. Additional customizations could involve integrating the order data with backend systems like inventory, CRM, or POS. Lastly, the agent could be made available across various customer touchpoints like mobile apps, drive-thru, kiosks, and more using the multi-channel capabilities of Amazon Lex.

To learn more, refer to the following related resources:


About the Authors

Moumita Dutta is a Partner Solution Architect at Amazon Web Services. In her role, she collaborates closely with partners to develop scalable and reusable assets that streamline cloud deployments and enhance operational efficiency. She is a member of AI/ML community and a Generative AI expert at AWS. In her leisure, she enjoys gardening and cycling.

Fernando Lammoglia is a Partner Solutions Architect at Amazon Web Services, working closely with AWS partners in spearheading the development and adoption of cutting-edge AI solutions across business units. A strategic leader with expertise in cloud architecture, generative AI, machine learning, and data analytics. He specializes in executing go-to-market strategies and delivering impactful AI solutions aligned with organizational goals. On his free time he loves to spend time with his family and travel to other countries.

Mitul Patel is a Senior Solution Architect at Amazon Web Services. In his role as a cloud technology enabler, he works with customers to understand their goals and challenges, and provides prescriptive guidance to achieve their objective with AWS offerings. He is a member of AI/ML community and a Generative AI ambassador at AWS. In his free time, he enjoys hiking and playing soccer.

HEAL: A framework for health equity assessment of machine learning performance

Health equity is a major societal concern worldwide with disparities having many causes. These sources include limitations in access to healthcare, differences in clinical treatment, and even fundamental differences in the diagnostic technology. In dermatology for example, skin cancer outcomes are worse for populations such as minorities, those with lower socioeconomic status, or individuals with limited healthcare access. While there is great promise in recent advances in machine learning (ML) and artificial intelligence (AI) to help improve healthcare, this transition from research to bedside must be accompanied by a careful understanding of whether and how they impact health equity.

Health equity is defined by public health organizations as fairness of opportunity for everyone to be as healthy as possible. Importantly, equity may be different from equality. For example, people with greater barriers to improving their health may require more or different effort to experience this fair opportunity. Similarly, equity is not fairness as defined in the AI for healthcare literature. Whereas AI fairness often strives for equal performance of the AI technology across different patient populations, this does not center the goal of prioritizing performance with respect to pre-existing health disparities.

Health equity considerations. An intervention (e.g., an ML-based tool, indicated in dark blue) promotes health equity if it helps reduce existing disparities in health outcomes (indicated in lighter blue).

In “Health Equity Assessment of machine Learning performance (HEAL): a framework and dermatology AI model case study”, published in The Lancet eClinicalMedicine, we propose a methodology to quantitatively assess whether ML-based health technologies perform equitably. In other words, does the ML model perform well for those with the worst health outcomes for the condition(s) the model is meant to address? This goal anchors on the principle that health equity should prioritize and measure model performance with respect to disparate health outcomes, which may be due to a number of factors that include structural inequities (e.g., demographic, social, cultural, political, economic, environmental and geographic).

The health equity framework (HEAL)

The HEAL framework proposes a 4-step process to estimate the likelihood that an ML-based health technology performs equitably:

  1. Identify factors associated with health inequities and define tool performance metrics,
  2. Identify and quantify pre-existing health disparities,
  3. Measure the performance of the tool for each subpopulation,
  4. Measure the likelihood that the tool prioritizes performance with respect to health disparities.

The final step's output is termed the HEAL metric, which quantifies how anticorrelated the ML model's performance is with health disparities. In other words, does the model perform better for populations that have the worst health outcomes?

This 4-step process is designed to inform improvements for making ML model performance more equitable, and is meant to be iterative and re-evaluated on a regular basis. For example, the availability of health outcomes data in step (2) can inform the choice of demographic factors and brackets in step (1), and the framework can be applied again with new datasets, models and populations.

Framework for Health Equity Assessment of machine Learning performance (HEAL). Our guiding principle is to avoid exacerbating health inequities, and these steps help us identify disparities and assess for inequitable model performance to move towards better outcomes for all.

With this work, we take a step towards encouraging explicit assessment of the health equity considerations of AI technologies, and encourage prioritization of efforts during model development to reduce health inequities for subpopulations exposed to structural inequities that can precipitate disparate outcomes. We should note that the present framework does not model causal relationships and, therefore, cannot quantify the actual impact a new technology will have on reducing health outcome disparities. However, the HEAL metric may help identify opportunities for improvement, where the current performance is not prioritized with respect to pre-existing health disparities.

Case study on a dermatology model

As an illustrative case study, we applied the framework to a dermatology model, which utilizes a convolutional neural network similar to that described in prior work. This example dermatology model was trained to classify 288 skin conditions using a development dataset of 29k cases. The input to the model consists of three photos of a skin concern along with demographic information and a brief structured medical history. The output consists of a ranked list of possible matching skin conditions.

Using the HEAL framework, we evaluated this model by assessing whether it prioritized performance with respect to pre-existing health outcomes. The model was designed to predict possible dermatologic conditions (from a list of hundreds) based on photos of a skin concern and patient metadata. Evaluation of the model is done using a top-3 agreement metric, which quantifies how often the top 3 output conditions match the most likely condition as suggested by a dermatologist panel. The HEAL metric is computed via the anticorrelation of this top-3 agreement with health outcome rankings.
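To make the computation concrete, the following is a minimal sketch (not the paper’s exact estimator) of the core idea: rank-correlating per-subgroup model performance with per-subgroup health burden. The subgroup names and numbers are illustrative only.

from scipy.stats import spearmanr

# Hypothetical per-subgroup values (illustrative, not from the study)
health_burden = {"group_a": 410.0, "group_b": 890.0, "group_c": 620.0}  # e.g., YLLs per 100,000; higher means worse outcomes
top3_agreement = {"group_a": 0.78, "group_b": 0.84, "group_c": 0.80}    # model performance per subgroup

groups = sorted(health_burden)
burden = [health_burden[g] for g in groups]
performance = [top3_agreement[g] for g in groups]

# With burden expressed as "higher is worse", prioritizing performance toward the
# worst-off subgroups corresponds to a positive rank correlation here (which is an
# anticorrelation with health, as described above).
rho, _ = spearmanr(burden, performance)
print(f"Rank correlation between health burden and performance: {rho:.2f}")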

We used a dataset of 5,420 teledermatology cases, enriched for diversity in age, sex and race/ethnicity, to retrospectively evaluate the model’s HEAL metric. The dataset consisted of “store-and-forward” cases from patients aged 20 years or older from primary care providers in the USA and skin cancer clinics in Australia. Based on a review of the literature, we decided to explore race/ethnicity, sex and age as potential factors of inequity, and used sampling techniques to ensure that our evaluation dataset had sufficient representation of all race/ethnicity, sex and age groups. To quantify pre-existing health outcomes for each subgroup, we relied on measurements from public databases endorsed by the World Health Organization, such as Years of Life Lost (YLLs) and Disability-Adjusted Life Years (DALYs; years of life lost plus years lived with disability).

HEAL metric for all dermatologic conditions across race/ethnicity subpopulations, including health outcomes (YLLs per 100,000), model performance (top-3 agreement), and rankings for health outcomes and tool performance.
(* Higher is better; measures the likelihood the model performs equitably with respect to the axes in this table.)
HEAL metric for all dermatologic conditions across sexes, including health outcomes (DALYs per 100,000), model performance (top-3 agreement), and rankings for health outcomes and tool performance. (* As above.)

Our analysis estimated that the model was 80.5% likely to perform equitably across race/ethnicity subgroups and 92.1% likely to perform equitably across sexes.

However, while the model was likely to perform equitably across age groups for cancer conditions specifically, we discovered that it had room for improvement across age groups for non-cancer conditions. For example, those 70+ have the poorest health outcomes related to non-cancer skin conditions, yet the model didn’t prioritize performance for this subgroup.

HEAL metrics for all cancer and non-cancer dermatologic conditions across age groups, including health outcomes (DALYs per 100,000), model performance (top-3 agreement), and rankings for health outcomes and tool performance. (* As above.)

Putting things in context

For holistic evaluation, the HEAL metric cannot be employed in isolation. Instead this metric should be contextualized alongside many other factors ranging from computational efficiency and data privacy to ethical values, and aspects that may influence the results (e.g., selection bias or differences in representativeness of the evaluation data across demographic groups).

As an adversarial example, the HEAL metric can be artificially improved by deliberately reducing model performance for the most advantaged subpopulation until performance for that subpopulation is worse than all others. For illustrative purposes, given subpopulations A and B where A has worse health outcomes than B, consider the choice between two models: Model 1 (M1) performs 5% better for subpopulation A than for subpopulation B. Model 2 (M2) performs 5% worse on subpopulation A than B. The HEAL metric would be higher for M1 because it prioritizes performance on a subpopulation with worse outcomes. However, M1 may have absolute performances of just 75% and 70% for subpopulations A and B respectively, while M2 has absolute performances of 75% and 80% for subpopulations A and B respectively. Choosing M1 over M2 would lead to worse overall performance for all subpopulations because some subpopulations are worse-off while no subpopulation is better-off.

Accordingly, the HEAL metric should be used alongside a Pareto condition (discussed further in the paper), which restricts model changes so that outcomes for each subpopulation are either unchanged or improved compared to the status quo, and performance does not worsen for any subpopulation.
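As a minimal illustration of this Pareto condition (with a hypothetical status quo added to the M1/M2 example above), a candidate model change can be screened with a check like the following.

def satisfies_pareto(current: dict, candidate: dict) -> bool:
    # Accept the candidate only if no subpopulation performs worse than under the current model
    return all(candidate[g] >= current[g] for g in current)

status_quo = {"A": 0.72, "B": 0.78}   # hypothetical baseline performance per subpopulation
m1 = {"A": 0.75, "B": 0.70}           # better for A, but B is worse off
m2 = {"A": 0.75, "B": 0.80}           # every subpopulation unchanged or improved

print(satisfies_pareto(status_quo, m1))  # False: subpopulation B regresses
print(satisfies_pareto(status_quo, m2))  # True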

The HEAL framework, in its current form, assesses the likelihood that an ML-based model prioritizes performance for subpopulations with respect to pre-existing health disparities for specific subpopulations. This differs from the goal of understanding whether ML will reduce disparities in outcomes across subpopulations in reality. Specifically, modeling improvements in outcomes requires a causal understanding of steps in the care journey that happen both before and after use of any given model. Future research is needed to address this gap.

Conclusion

The HEAL framework enables a quantitative assessment of the likelihood that health AI technologies prioritize performance with respect to health disparities. The case study demonstrates how to apply the framework in the dermatological domain, indicating a high likelihood that model performance is prioritized with respect to health disparities across sex and race/ethnicity, but also revealing the potential for improvements for non-cancer conditions across age. The case study also illustrates limitations in the ability to apply all recommended aspects of the framework (e.g., mapping societal context, availability of data), thus highlighting the complexity of health equity considerations of ML-based tools.

This work is a proposed approach to address a grand challenge for AI and health equity, and may provide a useful evaluation framework not only during model development, but during pre-implementation and real-world monitoring stages, e.g., in the form of health equity dashboards. We hold that the strength of the HEAL framework is in its future application to various AI tools and use cases and its refinement in the process. Finally, we acknowledge that a successful approach towards understanding the impact of AI technologies on health equity needs to be more than a set of metrics. It will require a set of goals agreed upon by a community that represents those who will be most impacted by a model.

Acknowledgements

The research described here is joint work across many teams at Google. We are grateful to all our co-authors: Terry Spitz, Malcolm Pyles, Heather Cole-Lewis, Ellery Wulczyn, Stephen R. Pfohl, Donald Martin, Jr., Ronnachai Jaroensri, Geoff Keeling, Yuan Liu, Stephanie Farquhar, Qinghan Xue, Jenna Lester, Cían Hughes, Patricia Strachan, Fraser Tan, Peggy Bui, Craig H. Mermel, Lily H. Peng, Yossi Matias, Greg S. Corrado, Dale R. Webster, Sunny Virmani, Christopher Semturs, Yun Liu, and Po-Hsuan Cameron Chen. We also thank Lauren Winer, Sami Lachgar, Ting-An Lin, Aaron Loh, Morgan Du, Jenny Rizk, Renee Wong, Ashley Carrick, Preeti Singh, Annisah Um’rani, Jessica Schrouff, Alexander Brown, and Anna Iurchenko for their support of this project.

Read More

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

Federated learning on AWS using FedML, Amazon EKS, and Amazon SageMaker

This post is co-written with Chaoyang He, Al Nevarez and Salman Avestimehr from FedML.

Many organizations are implementing machine learning (ML) to enhance their business decision-making through automation and the use of large distributed datasets. With increased access to data, ML has the potential to provide unparalleled business insights and opportunities. However, the sharing of raw, non-sanitized sensitive information across different locations poses significant security and privacy risks, especially in regulated industries such as healthcare.

To address this issue, federated learning (FL) is a decentralized and collaborative ML training technique that offers data privacy while maintaining accuracy and fidelity. Unlike traditional ML training, FL training occurs within an isolated client location using an independent secure session. The client only shares its output model parameters with a centralized server, known as the training coordinator or aggregation server, and not the actual data used to train the model. This approach alleviates many data privacy concerns while enabling effective collaboration on model training.

Although FL is a step towards achieving better data privacy and security, it’s not a guaranteed solution. Insecure networks lacking access control and encryption can still expose sensitive information to attackers. Additionally, locally trained information can expose private data if reconstructed through an inference attack. To mitigate these risks, the FL model uses personalized training algorithms and effective masking and parameterization before sharing information with the training coordinator. Strong network controls at local and centralized locations can further reduce inference and exfiltration risks.

In this post, we share an FL approach using FedML, Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker to improve patient outcomes while addressing data privacy and security concerns.

The need for federated learning in healthcare

Healthcare relies heavily on distributed data sources to make accurate predictions and assessments about patient care. Limiting the available data sources to protect privacy negatively affects result accuracy and, ultimately, the quality of patient care. Therefore, ML creates challenges for AWS customers who need to ensure privacy and security across distributed entities without compromising patient outcomes.

Healthcare organizations must navigate strict compliance regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) in the United States, while implementing FL solutions. Ensuring data privacy, security, and compliance becomes even more critical in healthcare, requiring robust encryption, access controls, auditing mechanisms, and secure communication protocols. Additionally, healthcare datasets often contain complex and heterogeneous data types, making data standardization and interoperability a challenge in FL settings.

Use case overview

The use case outlined in this post involves heart disease data held by different organizations, on which an ML model runs classification algorithms to predict heart disease in a patient. Because this data spans organizations, we use federated learning to collate the findings.

The Heart Disease dataset from the University of California Irvine’s Machine Learning Repository is a widely used dataset for cardiovascular research and predictive modeling. It consists of 303 samples, each representing a patient, and contains a combination of clinical and demographic attributes, as well as the presence or absence of heart disease.

This multivariate dataset has 76 attributes of patient information, of which 14 are most commonly used for developing and evaluating ML algorithms to predict the presence of heart disease.
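For readers who want to experiment, the following is a quick sketch of loading the commonly used 14-attribute subset with pandas. The local file path is an assumption, and the column names follow the dataset’s documentation for the processed Cleveland subset.

import pandas as pd

COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",  # "num" is the diagnosis target
]

# "processed.cleveland.data" is a placeholder path to a local copy of the UCI file
df = pd.read_csv("processed.cleveland.data", names=COLUMNS, na_values="?")

# Binarize the target: 0 = no heart disease, 1 = presence of heart disease
df["target"] = (df["num"] > 0).astype(int)
print(df.shape)                      # roughly (303, 15) for the Cleveland subset
print(df["target"].value_counts())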

FedML framework

There is a wide selection of FL frameworks, but we decided to use the FedML framework for this use case because it is open source and supports several FL paradigms. FedML provides a popular open source library, MLOps platform, and application ecosystem for FL. These facilitate the development and deployment of FL solutions. It provides a comprehensive suite of tools, libraries, and algorithms that enable researchers and practitioners to implement and experiment with FL algorithms in a distributed environment. FedML addresses the challenges of data privacy, communication, and model aggregation in FL, offering a user-friendly interface and customizable components. With its focus on collaboration and knowledge sharing, FedML aims to accelerate the adoption of FL and drive innovation in this emerging field. The FedML framework is model agnostic, including recently added support for large language models (LLMs). For more information, refer to Releasing FedLLM: Build Your Own Large Language Models on Proprietary Data using the FedML Platform.

FedML Octopus

System hierarchy and heterogeneity are key challenges in real-life FL use cases, where different data silos may have different infrastructure with CPUs and GPUs. In such scenarios, you can use FedML Octopus.

FedML Octopus is the industrial-grade platform of cross-silo FL for cross-organization and cross-account training. Coupled with FedML MLOps, it enables developers or organizations to conduct open collaboration from anywhere at any scale in a secure manner. FedML Octopus runs a distributed training paradigm inside each data silo and uses synchronous or asynchronous training.

FedML MLOps

FedML MLOps enables local development of code that can later be deployed anywhere using FedML frameworks. Before initiating training, you must create a FedML account, as well as create and upload the server and client packages in FedML Octopus. For more details, refer to the steps in the FedML documentation and Introducing FedML Octopus: scaling federated learning into production with simplified MLOps.

Solution overview

We deploy FedML into multiple EKS clusters integrated with SageMaker for experiment tracking. We use Amazon EKS Blueprints for Terraform to deploy the required infrastructure. EKS Blueprints helps compose complete EKS clusters that are fully bootstrapped with the operational software that is needed to deploy and operate workloads. With EKS Blueprints, the configuration for the desired state of EKS environment, such as the control plane, worker nodes, and Kubernetes add-ons, is described as an infrastructure as code (IaC) blueprint. After a blueprint is configured, it can be used to create consistent environments across multiple AWS accounts and Regions using continuous deployment automation.

The content shared in this post reflects real-life situations and experiences, but deployments in other environments may vary. Although we use a single AWS account with separate VPCs, individual circumstances and configurations may differ. Therefore, treat the information provided as a general guide that may require adaptation based on specific requirements and local conditions.

The following diagram illustrates our solution architecture.

In addition to the tracking provided by FedML MLOps for each training run, we use Amazon SageMaker Experiments to track the performance of each client model and the centralized (aggregator) model.

SageMaker Experiments is a capability of SageMaker that lets you create, manage, analyze, and compare your ML experiments. By recording experiment details, parameters, and results, researchers can accurately reproduce and validate their work. It allows for effective comparison and analysis of different approaches, leading to informed decision-making. Additionally, tracking experiments facilitates iterative improvement by providing insights into the progression of models and enabling researchers to learn from previous iterations, ultimately accelerating the development of more effective solutions.

We send the following to SageMaker Experiments for each run:

  • Model evaluation metrics – Training loss and Area Under the Curve (AUC)
  • Hyperparameters – Epoch, learning rate, batch size, optimizer, and weight decay

Prerequisites

To follow along with this post, you should have the following prerequisites:

Deploy the solution

To begin, clone the repository hosting the sample code locally:

git clone git@ssh.gitlab.aws.dev:west-ml-sa/fl_fedml.ai.git

Then deploy the use case infrastructure using the following commands:

terraform init
terraform apply

The Terraform template may take 20–30 minutes to fully deploy. After it’s deployed, follow the steps in the next sections to run the FL application.

Create an MLOps deployment package

As described in the FedML documentation, we need to create the client and server packages, which the MLOps platform distributes to the server and clients to begin training.

To create these packages, run the following script found in the root directory:

. ./build_mlops_pkg.sh

This creates the respective packages in the following directory under the project’s root:

mlops/dist-packages

Upload the packages to the FedML MLOps platform

Complete the following steps to upload the packages:

  1. On the FedML UI, choose My Applications in the navigation pane.
  2. Choose New Application.
  3. Upload the client and server packages from your workstation.
  4. You can also adjust the hyperparameters or create new ones.

Trigger federated training

To run federated training, complete the following steps:

  1. On the FedML UI, choose Project List in the navigation pane.
  2. Choose Create a new project.
  3. Enter a group name and a project name, then choose OK.
  4. Choose the newly created project and choose Create new run to trigger a training run.
  5. Select the edge client devices and the central aggregator server for this training run.
  6. Choose the application that you created in the previous steps.
  7. Update any of the hyperparameters or use the default settings.
  8. Choose Start to start training.
  9. Choose the Training Status tab and wait for the training run to complete. You can also explore the other available tabs.
  10. When training is complete, choose the System tab to see the training time durations on your edge servers and aggregation events.

View results and experiment details

When the training is complete, you can view the results using FedML and SageMaker.

On the FedML UI, on the Models tab, you can see the aggregator and client models. You can also download these models from the website.

You can also log in to Amazon SageMaker Studio and choose Experiments in the navigation pane.

The following screenshot shows the logged experiments.

Experiment tracking code

In this section, we explore the code that integrates SageMaker experiment tracking with the FL framework training.

In an editor of your choice, open the following folder to see the edits to the code to inject SageMaker experiment tracking code as a part of the training:

cd fl_fedml.ai/

For tracking the training, we create a SageMaker experiment with parameters and metrics logged using the log_parameters and log_metric commands, as outlined in the following code samples.

An entry in the config/fedml_config.yaml file declares the experiment prefix, which is referenced in the code to create unique experiment names: sm_experiment_name: "fed-heart-disease". You can update this to any value of your choice.

For example, see the following code for the heart_disease_trainer.py, which is used by each client to train the model on their own dataset:

# Add this code before the for loop on epochs
# We are passing the experiment prefix & client-rank from the config
# to the function to create a unique name
# (unique_name_from_base comes from the SageMaker Python SDK; the import typically sits at the top of the file)
from sagemaker.utils import unique_name_from_base

experiment_name = unique_name_from_base(args.sm_experiment_name + "-client-" + str(args.rank))
print(f"Sagemaker Experiment Name: {experiment_name}")

For each client run, the experiment details are tracked using the following code in heart_disease_trainer.py:

# create an experiment and start a new run
# (Run and Session come from the SageMaker Python SDK; the imports typically sit at the top of the file)
from sagemaker.session import Session
from sagemaker.experiments.run import Run

with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=Session()) as run:
    run.log_parameters(
        {
            "Train Data Size": str(len(train_data.dataset)),
            "device": "cpu",
            "center": args.rank,
            "learning-rate": args.lr,
            "batch-size": args.batch_size,
            "client-optimizer": args.client_optimizer,
            "weight-decay": args.weight_decay,
        }
    )
    run.log_metric(name="Validation:AUC", value=epoch_auc)
    run.log_metric(name="Training:Loss", value=epoch_loss)

Similarly, you can use the code in heart_disease_aggregator.py to run a test on local data after updating the model weights. The details are logged after each communication round with the clients.

# create an experiment and start a new run
with Run(experiment_name=experiment_name, run_name=run_name, sagemaker_session=Session()) as run:
    run.log_parameters(
        {
            "Train Data Size": str(len(test_data_local_dict[i])),
            "device": "cpu",
            "round": i,
            "learning-rate": args.lr,
            "batch-size": args.batch_size,
            "client-optimizer": args.client_optimizer,
            "weight-decay": args.weight_decay,
        }
    )
    run.log_metric(name="Test:AUC", value=test_auc_metrics)
    run.log_metric(name="Test:Loss", value=test_loss_metrics)

Clean up

When you’re done with the solution, make sure to clean up the resources you used to ensure efficient resource utilization and cost management, and to avoid unnecessary expenses and resource wastage. Actively tidying up the environment, such as deleting unused instances, stopping unnecessary services, and removing temporary data, contributes to a clean and organized infrastructure. You can use the following commands to clean up your resources:

terraform destroy -target=module.m_fedml_edge_server.module.eks_blueprints_kubernetes_addons -auto-approve
terraform destroy -target=module.m_fedml_edge_client_1.module.eks_blueprints_kubernetes_addons -auto-approve
terraform destroy -target=module.m_fedml_edge_client_2.module.eks_blueprints_kubernetes_addons -auto-approve

terraform destroy -target=module.m_fedml_edge_client_1.module.eks -auto-approve
terraform destroy -target=module.m_fedml_edge_client_2.module.eks -auto-approve
terraform destroy -target=module.m_fedml_edge_server.module.eks -auto-approve

terraform destroy

Summary

By using Amazon EKS as the infrastructure and FedML as the framework for FL, we are able to provide a scalable and managed environment for training and deploying shared models while respecting data privacy. With the decentralized nature of FL, organizations can collaborate securely, unlock the potential of distributed data, and improve ML models without compromising data privacy.

As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.


About the Authors

Randy DeFauwRandy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Arnab Sinha is a Senior Solutions Architect for AWS, acting as Field CTO to help organizations design and build scalable solutions supporting business outcomes across data center migrations, digital transformation and application modernization, big data, and machine learning. He has supported customers across a variety of industries, including energy, retail, manufacturing, healthcare, and life sciences. Arnab holds all AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Arnab was a technology leader and previously held architect and engineering leadership roles.

Prachi Kulkarni is a Senior Solutions Architect at AWS. Her specialization is machine learning, and she is actively working on designing solutions using various AWS ML, big data, and analytics offerings. Prachi has experience in multiple domains, including healthcare, benefits, retail, and education, and has worked in a range of positions in product engineering and architecture, management, and customer success.

Tamer Sherif is a Principal Solutions Architect at AWS, with a diverse background in the technology and enterprise consulting services realm, spanning over 17 years as a Solutions Architect. With a focus on infrastructure, Tamer’s expertise covers a broad spectrum of industry verticals, including commercial, healthcare, automotive, public sector, manufacturing, oil and gas, media services, and more. His proficiency extends to various domains, such as cloud architecture, edge computing, networking, storage, virtualization, business productivity, and technical leadership.

Hans Nesbitt is a Senior Solutions Architect at AWS based out of Southern California. He works with customers across the western US to craft highly scalable, flexible, and resilient cloud architectures. In his spare time, he enjoys spending time with his family, cooking, and playing guitar.

Chaoyang He is Co-founder and CTO of FedML, Inc., a startup building a community for open and collaborative AI from anywhere at any scale. His research focuses on distributed and federated machine learning algorithms, systems, and applications. He received his PhD in Computer Science from the University of Southern California.

Al Nevarez is Director of Product Management at FedML. Before FedML, he was a group product manager at Google, and a senior manager of data science at LinkedIn. He has several data product-related patents, and he studied engineering at Stanford University.

Salman Avestimehr is Co-founder and CEO of FedML. He has been a Dean’s Professor at USC, Director of the USC-Amazon Center on Trustworthy AI, and an Amazon Scholar in Alexa AI. He is an expert on federated and decentralized machine learning, information theory, security, and privacy. He is a Fellow of IEEE and received his PhD in EECS from UC Berkeley.

Samir Lad is an accomplished enterprise technologist with AWS who works closely with customers’ C-level executives. As a former C-suite executive who has driven transformations across multiple Fortune 100 companies, Samir shares his invaluable experiences to help his clients succeed in their own transformation journey.

Stephen Kraemer is a Board and CxO advisor and former executive at AWS. Stephen advocates culture and leadership as the foundations of success. He holds that security and innovation are the drivers of cloud transformation, enabling highly competitive, data-driven organizations.

Read More

Enable data sharing through federated learning: A policy approach for chief digital officers

Enable data sharing through federated learning: A policy approach for chief digital officers

This is a guest blog post written by Nitin Kumar, a Lead Data Scientist at T and T Consulting Services, Inc.

In this post, we discuss the value and potential impact of federated learning in the healthcare field. This approach can help heart stroke patients, doctors, and researchers with faster diagnosis, enriched decision-making, and more informed, inclusive research work on stroke-related health issues, using a cloud-native approach with AWS services for lightweight lift and straightforward adoption.

Diagnosis challenges with heart strokes

Statistics from the Centers for Disease Control and Prevention (CDC) show that each year in the US, more than 795,000 people suffer a stroke, and about 25% of these strokes occur in people who have had a previous stroke. Stroke is the number five cause of death according to the American Stroke Association and a leading cause of disability in the US. Therefore, prompt diagnosis and treatment are crucial to reduce brain damage and other complications in acute stroke patients.

CTs and MRIs are the gold standard in imaging technologies for classifying different sub-types of strokes and are crucial during preliminary assessment of patients, determining the root cause, and treatment. One critical challenge here, especially in the case of acute stroke, is the time of imaging diagnosis, which on average ranges from 30 minutes up to an hour and can be much longer depending on emergency department crowding.

Doctors and medical staff need quick and accurate image diagnosis to evaluate a patient’s condition and propose treatment options. In Dr. Werner Vogels’s own words at AWS re:Invent 2023, “every second that a person has a stroke counts.” Stroke victims can lose around 1.9 billion neurons every second they are not being treated.

Medical data restrictions

You can use machine learning (ML) to assist doctors and researchers in diagnosis tasks, thereby speeding up the process. However, the datasets needed to build the ML models and give reliable results are sitting in silos across different healthcare systems and organizations. This isolated legacy data has the potential for massive impact if cumulated. So why hasn’t it been used yet?

There are multiple challenges when working with medical domain datasets and building ML solutions, including patient privacy, security of personal data, and certain bureaucratic and policy restrictions. Additionally, research institutions have been tightening their data sharing practices. These obstacles also prevent international research teams from working together on diverse and rich datasets, which could save lives and prevent disabilities that can result from heart strokes, among other benefits.

Policies and regulations like the General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA), and California Consumer Privacy Act (CCPA) put guardrails on sharing data from the medical domain, especially patient data. Additionally, the datasets at individual institutes, organizations, and hospitals are often too small, are unbalanced, or have biased distributions, leading to model generalization constraints.

Federated learning: An introduction

Federated learning (FL) is a decentralized form of ML. In this approach, the ML model is shared between organizations for training on proprietary data subsets, unlike traditional centralized ML training, where the model generally trains on aggregated datasets. The data stays protected behind the organization’s firewalls or VPC, while the model and its metadata are shared.

In the training phase, a global FL model is disseminated and synchronized between unit organizations for training on individual datasets, and a local trained model is returned. The final global model is available to use to make predictions for everyone among the participants, and can also be used as a base for further training to build local custom models for participating organizations. It can further be extended to benefit other institutes. This approach can significantly reduce the cybersecurity requirements for data in transit by removing the need for data to transit outside of the organization’s boundaries at all.
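As a rough illustration of the synchronization step described above, the following sketch shows federated averaging (FedAvg), one common aggregation scheme; the client weights and sample counts are made up, and real frameworks such as FedML handle communication, scheduling, and security for you.

import numpy as np

def fedavg(client_weights: list, client_sizes: list) -> dict:
    # Weighted average of client parameters, weighted by local dataset size
    total = sum(client_sizes)
    aggregated = {}
    for name in client_weights[0]:
        aggregated[name] = sum(
            (size / total) * weights[name]
            for weights, size in zip(client_weights, client_sizes)
        )
    return aggregated

# Two hypothetical clients, each contributing one layer of parameters
client_a = {"layer1": np.array([0.2, 0.4])}
client_b = {"layer1": np.array([0.6, 0.8])}

global_model = fedavg([client_a, client_b], client_sizes=[100, 300])
print(global_model["layer1"])  # closer to client_b, which holds more samples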

The following diagram illustrates an example architecture.

In the following sections, we discuss how federated learning can help.

Federated learning to save the day (and save lives)

For good artificial intelligence (AI), you need good data.

Legacy systems, which are frequently found in the federal domain, pose significant data processing challenges before you can derive any intelligence or merge them with newer datasets. This is an obstacle in providing valuable intelligence to leaders. It can lead to inaccurate decision-making, because legacy data often represents a much larger and more valuable share of the available information than the newer, smaller datasets. You want to resolve this bottleneck effectively, without months or even years of manual consolidation and integration efforts (including cumbersome mapping processes) for the legacy and newer datasets sitting across hospitals and institutes. The legacy data is quite valuable because it holds important contextual information needed for accurate decision-making and well-informed model training, leading to reliable AI in the real world. The long time span of the data reveals long-term variations and patterns that would otherwise go undetected and lead to biased and ill-informed predictions.

Breaking down these data silos to unite the untapped potential of the scattered data can save and transform many lives. It can also accelerate the research related to secondary health issues arising from heart strokes. This solution can help you share insights from data isolated between institutes due to policy and other reasons, whether you are a hospital, a research institute, or other health data-focused organizations. It can enable informed decisions on research direction and diagnosis. Additionally, it results in a centralized repository of intelligence via a secure, private, and global knowledge base.

Federated learning has many benefits in general and specifically for medical data settings.

Security and Privacy features:

  • Keeps sensitive data away from the internet and still uses it for ML, and harnesses its intelligence with differential privacy
  • Enables you to build, train, and deploy unbiased and robust models across not just machines but also networks, without any data security hazards
  • Overcomes the hurdles with multiple vendors managing the data
  • Eliminates the need for cross-site data sharing and global governance
  • Preserves privacy with differential privacy and offers secure multi-party computation with local training

Performance Improvements:

  • Addresses the small sample size problem in the medical imaging space and costly labeling processes
  • Balances the distribution of the data
  • Enables you to incorporate most traditional ML and deep learning (DL) methods
  • Uses pooled image sets to help improve statistical power, overcoming the sample size limitation of individual institutions

Resilience Benefits:

  • If any one party decides to leave, it won’t hinder the training
  • A new hospital or institute can join at any time; it’s not reliant on any specific dataset with any node organization
  • There is no need for extensive data engineering pipelines for the legacy data scattered across widespread geographical locations

These features can help bring the walls down between institutions hosting isolated datasets on similar domains. The solution can become a force multiplier by harnessing the unified powers of distributed datasets and improving efficiency by radically transforming the scalability aspect without the heavy infrastructure lift. This approach helps ML reach its full potential, becoming proficient at the clinical level and not just research.

Federated learning has comparable performance to regular ML, as shown in the following experiment by NVIDIA Clara (on the Medical Model ARchive (MMAR) using the BraTS2018 dataset). Here, FL achieved segmentation performance comparable to training with centralized data: over 80% with approximately 600 epochs while training a multi-modal, multi-class brain tumor segmentation task.

Federated learning has been tested recently in a few medical sub-fields for use cases including patient similarity learning, patient representation learning, phenotyping, and predictive modeling.

Application blueprint: Federated learning makes it possible and straightforward

To get started with FL, you can choose from many high-quality datasets. For example, datasets with brain images include ABIDE (Autism Brain Imaging Data Exchange initiative), ADNI (Alzheimer’s Disease Neuroimaging Initiative), RSNA (Radiological Society of North America) Brain CT, BraTS (Multimodal Brain Tumor Image Segmentation Benchmark) updated regularly for the Brain Tumor Segmentation Challenge under UPenn (University of Pennsylvania), UK BioBank (covered in the following NIH paper), and IXI. Similarly for heart images, you can choose from several publicly available options, including ACDC (Automatic Cardiac Diagnosis Challenge), which is a cardiac MRI assessment dataset with full annotation mentioned by the National Library of Medicine in the following paper, and M&M (Multi-Center, Multi-Vendor, and Multi-Disease) Cardiac Segmentation Challenge mentioned in the following IEEE paper.

The following images show a probabilistic lesion overlap map for the primary lesions from the ATLAS R1.1 dataset. (Strokes are one of the most common causes of brain lesions according to Cleveland Clinic.)

For Electronic Health Records (EHR) data, a few datasets are available that follow the Fast Healthcare Interoperability Resources (FHIR) standard. This standard helps you build straightforward pilots by removing certain challenges with heterogeneous, non-normalized datasets, allowing for seamless and secure exchange, sharing, and integration of datasets. FHIR enables maximum interoperability. Dataset examples include MIMIC-IV (Medical Information Mart for Intensive Care). Other good-quality datasets that aren’t currently FHIR but can be easily converted include Centers for Medicare & Medicaid Services (CMS) Public Use Files (PUF) and eICU Collaborative Research Database from MIT (Massachusetts Institute of Technology). There are also other resources becoming available that offer FHIR-based datasets.

The lifecycle for implementing FL can include the following steps: task initialization, selection, configuration, model training, client/server communication, scheduling and optimization, versioning, testing, deployment, and termination. There are many time-intensive steps that go into preparing medical imaging data for traditional ML, as described in the following paper. Domain knowledge might be needed in some scenarios to preprocess raw patient data, especially due to its sensitive and private nature. These can be consolidated and sometimes eliminated for FL, saving crucial time for training and providing faster results.

Implementation

FL tools and libraries have grown with widespread support, making it straightforward to use FL without a heavy overhead lift. There are a lot of good resources and framework options available to get started. You can refer to the following extensive list of the most popular frameworks and tools in the FL domain, including PySyft, FedML, Flower, OpenFL, FATE, TensorFlow Federated, and NVFlare. It provides a beginner’s list of projects to get started quickly and build upon.

You can implement a cloud-native approach with Amazon SageMaker that seamlessly works with AWS VPC peering, keeping each node’s training in a private subnet in their respective VPC and enabling communication via private IPv4 addresses. Furthermore, model hosting on Amazon SageMaker JumpStart can help by exposing the endpoint API without sharing model weights.

It also removes potential compute challenges with on-premises hardware by using Amazon Elastic Compute Cloud (Amazon EC2) resources. You can implement the FL client and servers on AWS with SageMaker notebooks and Amazon Simple Storage Service (Amazon S3), maintain regulated access to the data and model with AWS Identity and Access Management (IAM) roles, and use AWS Security Token Service (AWS STS) for client-side security. You can also build your own custom system for FL using Amazon EC2.
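The following is a hedged sketch of the client-side security pattern mentioned above: a client assumes a scoped IAM role through AWS STS and uses the temporary credentials to read only its own Amazon S3 prefix. The role ARN, bucket, and object key are placeholders.

import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/fl-client-1-role",  # placeholder role ARN
    RoleSessionName="fl-client-1-training",
)["Credentials"]

# Temporary, scoped credentials for this client only
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# Download this client's local training data from its own prefix (placeholder names)
s3.download_file("my-fl-bucket", "client-1/heart_disease_train.csv", "/tmp/train.csv")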

For a detailed overview of implementing FL with the Flower framework on SageMaker, and a discussion of its difference from distributed training, refer to Machine learning with decentralized training data using federated learning on Amazon SageMaker.

The following figures illustrate the architecture of transfer learning in FL.

Addressing FL data challenges

Federated learning comes with its own data challenges, including privacy and security, but they are straightforward to address. First, you need to address the data heterogeneity problem with medical imaging data arising from data being stored across different sites and participating organizations, known as a domain shift problem (also referred to as client shift in an FL system), as highlighted by Guan and Liu in the following paper. This can lead to a difference in convergence of the global model.

Other components for consideration include ensuring data quality and uniformity at the source, incorporating expert knowledge into the learning process to inspire confidence in the system among medical professionals, and achieving model precision. For more information about some of the potential challenges you may face during implementation, refer to the following paper.

AWS helps you resolve these challenges with features like the flexible compute of Amazon EC2 and pre-built Docker images in SageMaker for straightforward deployment. You can resolve client-side problems like unbalanced data and computation resources for each node organization. You can address server-side learning problems like poisoning attacks from malicious parties with Amazon Virtual Private Cloud (Amazon VPC), security groups, and other security standards, preventing client corruption and implementing AWS anomaly detection services.

AWS also helps in addressing real-world implementation challenges, which can include integration challenges, compatibility issues with current or legacy hospital systems, and user adoption hurdles, by offering flexible, easy-to-use, and effortless lift tech solutions.

With AWS services, you can enable large-scale FL-based research and clinical implementation and deployment, which can consist of various sites across the world.

Recent policies on interoperability highlight the need for federated learning

Many laws recently passed by the government include a focus on data interoperability, bolstering the need for cross-organizational interoperability of data for intelligence. This can be fulfilled by using FL alongside frameworks like TEFCA (Trusted Exchange Framework and Common Agreement) and the expanded USCDI (United States Core Data for Interoperability).

The proposed idea also contributes towards the CDC’s capture and distribution initiative CDC Moving Forward. The following quote from the GovCIO article Data Sharing and AI Top Federal Health Agency Priorities in 2024 also echoes a similar theme: “These capabilities can also support the public in an equitable way, meeting patients where they are and unlocking critical access to these services. Much of this work comes down to the data.”

This can help medical institutes and agencies around the country (and across the globe) with data silos. They can benefit from seamless and secure integration and data interoperability, making medical data usable for impactful ML-based predictions and pattern recognition. You can start with images, but the approach is applicable to all EHR as well. The goal is to find the best approach for data stakeholders, with a cloud-native pipeline to normalize and standardize the data or directly use it for FL.

Let’s explore an example use case. Heart stroke imaging data and scans are scattered around the country and the world, sitting in isolated silos in institutes, universities, and hospitals, and separated by bureaucratic, geographical, and political boundaries. There is no single aggregated source and no easy way for medical professionals (non-programmers) to extract insights from it. At the same time, it’s not feasible to train ML and DL models on this data, which could help medical professionals make faster, more accurate decisions in critical times when heart scans can take hours to come in while the patient’s life could be hanging in the balance.

Other known use cases include POTS (Purchasing Online Tracking System) at NIH (National Institutes of Health) and cybersecurity for scattered and tiered intelligence solution needs at COMCOMs/MAJCOMs locations around the globe.

Conclusion

Federated learning holds great promise for legacy healthcare data analytics and intelligence. It’s straightforward to implement a cloud-native solution with AWS services, and FL is especially helpful for medical organizations with legacy data and technical challenges. FL can have a potential impact on the entire treatment cycle, and now even more so with the focus on data interoperability from large federal organizations and government leaders.

This solution can help you avoid reinventing the wheel and use the latest technology to take a leap from legacy systems and be at the forefront in this ever-evolving world of AI. You can also become a leader for best practices and an efficient approach to data interoperability within and across agencies and institutes in the health domain and beyond. If you are an institute or agency with data silos scattered around the country, you can benefit from this seamless and secure integration.

The content and opinions in this post are those of the third-party author, and AWS is not responsible for the content or accuracy of this post. It is each customer’s responsibility to determine whether they are subject to HIPAA, and if so, how best to comply with HIPAA and its implementing regulations. Before using AWS in connection with protected health information, customers must enter into an AWS Business Associate Addendum (BAA) and follow its configuration requirements.


About the Author

Nitin Kumar (MS, CMU) is a Lead Data Scientist at T and T Consulting Services, Inc. He has extensive experience with R&D prototyping, health informatics, public sector data, and data interoperability. He applies his knowledge of cutting-edge research methods to the federal sector to deliver innovative technical papers, POCs, and MVPs. He has worked with multiple federal agencies to advance their data and AI goals. Nitin’s other focus areas include natural language processing (NLP), data pipelines, and generative AI.

Read More

Construction of Paired Knowledge Graph – Text Datasets Informed by Cyclic Evaluation

Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate source KG or text is a proxy for the…Apple Machine Learning Research

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

The journey of PGA TOUR’s generative AI virtual assistant, from concept to development to prototype

This is a guest post co-written with Scott Gutterman from the PGA TOUR.

Generative artificial intelligence (generative AI) has enabled new possibilities for building intelligent systems. Recent improvements in generative AI-based large language models (LLMs) have enabled their use in a variety of applications surrounding information retrieval. Given the data sources, LLMs provided tools that allowed us to build a Q&A chatbot in weeks, rather than the years it may have taken previously, and likely with worse performance. We formulated a Retrieval Augmented Generation (RAG) solution that would allow the PGA TOUR to create a prototype for a future fan engagement platform, making its data accessible to fans in an interactive, conversational format.

Using structured data to answer questions requires a way to effectively extract data that’s relevant to a user’s query. We formulated a text-to-SQL approach whereby a user’s natural language query is converted to a SQL statement using an LLM. The SQL is run by Amazon Athena to return the relevant data. This data is then provided to an LLM, which is asked to answer the user’s query given the data.
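The following is a simplified sketch of that flow, using the Amazon Bedrock runtime to generate SQL and Amazon Athena to run it. The model ID is Anthropic’s Claude v2, but the prompt wording, database name, and output location are placeholders, and the real pipeline adds schema-aware prompting and error correction (discussed later).

import json
import time
import boto3

bedrock = boto3.client("bedrock-runtime")
athena = boto3.client("athena")

def generate_sql(question: str, table_schema: str) -> str:
    # Ask the LLM to translate the natural language question into Athena SQL
    prompt = (f"\n\nHuman: Given this table schema:\n{table_schema}\n"
              f"Write a SQL query for Amazon Athena that answers: {question}\n\nAssistant:")
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 500})
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"].strip()

def run_athena_query(sql: str) -> list:
    # Run the generated SQL and poll until it finishes
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "pga_tour_demo"},                  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},    # placeholder bucket
    )
    query_id = execution["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {state}: {sql}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]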

Using text data requires an index that can be used to search and provide relevant context to an LLM to answer a user query. To enable quick information retrieval, we use Amazon Kendra as the index for these documents. When users ask questions, our virtual assistant rapidly searches through the Amazon Kendra index to find relevant information. Amazon Kendra uses natural language processing (NLP) to understand user queries and find the most relevant documents. The relevant information is then provided to the LLM for final response generation. Our final solution is a combination of these text-to-SQL and text-RAG approaches.
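A brief sketch of the retrieval step is shown below, assuming a pre-built Amazon Kendra index (the index ID is a placeholder). The retrieved passages are packed into a grounded prompt; the answer-generation call mirrors the Bedrock invocation shown earlier and is omitted here.

import boto3

kendra = boto3.client("kendra")

def retrieve_context(question: str, index_id: str = "REPLACE-WITH-KENDRA-INDEX-ID") -> str:
    # The Retrieve API returns relevant passages rather than full documents
    response = kendra.retrieve(IndexId=index_id, QueryText=question, PageSize=5)
    passages = [item["Content"] for item in response["ResultItems"]]
    return "\n\n".join(passages)

question = "Tell me about the history of the Shriners Children's Open"
context = retrieve_context(question)
prompt = ("Answer the question using only the context below. "
          "If the answer is not in the context, say you don't know.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}")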

In this post, we highlight how the AWS Generative AI Innovation Center collaborated with AWS Professional Services and the PGA TOUR to develop a prototype virtual assistant using Amazon Bedrock that could enable fans to extract information about any event, player, hole, or shot-level details in a seamless interactive manner. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Development: Getting the data ready

As with any data-driven project, performance will only ever be as good as the data. We processed the data to enable the LLM to be able to effectively query and retrieve relevant data.

For the tabular competition data, we focused on a subset of data relevant to the greatest number of user queries and labeled the columns intuitively, so that they would be easier for LLMs to understand. We also created some auxiliary columns to help the LLM understand concepts it might otherwise struggle with. For example, if a golfer shoots one shot less than par (such as making it in the hole in 3 shots on a par 4 or in 4 shots on a par 5), it is commonly called a birdie. If a user asks, “How many birdies did player X make last year?”, just having the score and par in the table is not sufficient. As a result, we added columns to indicate common golf terms, such as bogey, birdie, and eagle. In addition, we linked the Competition data with a separate video collection by joining on a video_id column, which allows our app to pull the video associated with a particular shot in the Competition data. We also enabled joining text data to the tabular data, for example adding biographies for each player as a text column. The following figure shows the step-by-step procedure of how a query is processed for the text-to-SQL pipeline. The numbers indicate the sequence of steps used to answer a query.
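To make the auxiliary-column idea concrete, here is an illustrative pandas sketch; the column names and scoring table are assumptions for this example, not the TOUR’s actual schema.

import pandas as pd

shots = pd.DataFrame({
    "player": ["Player X", "Player X", "Player Y"],
    "hole_par": [4, 5, 3],
    "strokes": [3, 7, 1],
})

def score_name(strokes: int, par: int) -> str:
    # Map strokes relative to par onto common golf terms
    if strokes == 1:
        return "hole-in-one"
    names = {-3: "albatross", -2: "eagle", -1: "birdie", 0: "par", 1: "bogey", 2: "double bogey"}
    return names.get(strokes - par, f"{strokes - par:+d} to par")

shots["score_name"] = shots.apply(lambda r: score_name(r["strokes"], r["hole_par"]), axis=1)
print(shots)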

In the following figure, we demonstrate our end-to-end pipeline. We use AWS Lambda as our orchestration function, responsible for interacting with various data sources, LLMs, and error correction based on the user query. Steps 1-8 are similar to what is shown in the preceding figure. There are slight changes for the unstructured data, which we discuss next.

Text data requires unique processing steps that chunk (or segment) long documents into parts digestible by the LLM, while maintaining topic coherence. We experimented with several approaches and settled on a page-level chunking scheme that aligned well with the format of the Media Guides. We used Amazon Kendra, which is a managed service that takes care of indexing documents, without requiring specification of embeddings, while providing an easy API for retrieval. The following figure illustrates this architecture.

The unified, scalable pipeline we developed allows the PGA TOUR to scale to their full history of data, some of which goes back to the 1800s. It enables future applications that can take live, on-course context to create rich real-time experiences.

Development: Evaluating LLMs and developing generative AI applications

We carefully tested and evaluated the first- and third-party LLMs available in Amazon Bedrock to choose the model best suited for our pipeline and use case. We selected Anthropic’s Claude v2 and Claude Instant on Amazon Bedrock. For our final structured and unstructured data pipelines, we observed that Anthropic’s Claude v2 on Amazon Bedrock generated better overall results.

Prompting is a critical aspect of getting LLMs to output text as desired. We spent considerable time experimenting with different prompts for each of the tasks. For example, for the text-to-SQL pipeline we had several fallback prompts, with increasing specificity and gradually simplified table schemas. If a SQL query was invalid and resulted in an error from Athena, we developed an error correction prompt that would pass the error and incorrect SQL to the LLM and ask it to fix it. The final prompt in the text-to-SQL pipeline asks the LLM to take the Athena output, which can be provided in Markdown or CSV format, and provide an answer to the user. For the unstructured text, we developed general prompts to use the context retrieved from Amazon Kendra to answer the user question. The prompt included instructions to use only the information retrieved from Amazon Kendra and not rely on data from the LLM pre-training.
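The following is a hedged sketch of that error-correction loop, reusing the illustrative generate_sql and run_athena_query helpers from earlier and assuming run_athena_query raises when a query fails; the correction prompt wording is an assumption.

def answer_with_retries(question: str, table_schema: str, max_attempts: int = 3) -> list:
    sql = generate_sql(question, table_schema)
    for _ in range(max_attempts):
        try:
            return run_athena_query(sql)
        except RuntimeError as error:
            # Feed the Athena error and the failing SQL back to the LLM for a fix
            correction = (f"The following SQL failed with error: {error}\n"
                          f"SQL: {sql}\n"
                          f"Return only a corrected Amazon Athena SQL query.")
            sql = generate_sql(correction, table_schema)
    raise RuntimeError("Could not produce a valid SQL query after retries")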

Latency is often a concern with generative AI applications, and that is also the case here. It is especially a concern for text-to-SQL, which requires an initial SQL generation LLM invocation, followed by a response generation LLM invocation. If we’re using a large LLM, such as Anthropic’s Claude V2, this effectively doubles the latency of a single LLM invocation. We experimented with several configurations of large and smaller LLMs to evaluate run time as well as correctness. The following table shows an example for one question that demonstrates the latency, as well as the generated responses, with Anthropic’s Claude V2 and Claude Instant on Amazon Bedrock.

Prototype

In our application, we used a Lambda function to orchestrate the prompt choices and communication between Amazon Athena, Amazon Kendra, and Amazon Bedrock. The virtual assistant features a conversational interface, built with React and the Cloudscape Design System, that uses text and videos to engage with users.

Our virtual assistant can field a wide range of questions about PGA TOUR events, players, statistics, history, and more. For example, when asked about Tony Finau’s longest drives at the Shriners Children’s Open (one of the Signature events of the PGA TOUR), the assistant can query structured data to pull his shot level data. It can extract the exact distance in yards, the round number and the hole during which he achieved this record using the text-to-SQL pipeline. The assistant then interprets this numerical data to generate a final response. The following table contains some example responses.

Question: What was the longest drive hit by Tony Finau in the Shriners Children’s Open?
Response: The longest drive hit by Tony Finau in the Shriners Childrens Open was 382 yards, which he hit during the first round on hole number 4 in 2018.

Question: What was the longest drive hit by Collin Morikawa in the Shriners Children’s Open?
Response: Collin Morikawa’s longest drive at the Shriners Childrens Open was 334 yards. This occurred in 2019 during the first round on hole number 15.

Question: Did anyone make an ace at the 2022 Shriners Children’s Open?
Response: Yes, Adam Hadwin made a hole-in-one on hole 14 during round 3 of the 2022 Shriners Children’s Open.

The following explainer video highlights a few examples of interacting with the virtual assistant.

In initial testing, our PGA TOUR virtual assistant has shown great promise in improving fan experiences. By blending AI technologies like text-to-SQL, semantic search, and natural language generation, the assistant delivers informative, engaging responses. Fans are empowered to effortlessly access data and narratives that were previously hard to find.

What does the future hold?

As we continue development, we will expand the range of questions our virtual assistant can handle. This will require extensive testing, through collaboration between AWS and the PGA TOUR. Over time, we aim to evolve the assistant into a personalized, omni-channel experience accessible across web, mobile, and voice interfaces.

The establishment of a cloud-based generative AI assistant lets the PGA TOUR present its vast data source to multiple internal and external stakeholders. As the sports generative AI landscape evolves, it enables the creation of new content. For example, you can use AI and machine learning (ML) to surface content fans want to see as they’re watching an event, or as production teams are looking for shots from previous tournaments that match a current event. For example, if Max Homa is getting ready to take his final shot at the PGA TOUR Championship from a spot 20 feet from the pin, the PGA TOUR can use AI and ML to identify and present clips, with AI-generated commentary, of him attempting a similar shot five times previously. This kind of access and data allows a production team to immediately add value to the broadcast or allow a fan to customize the type of data that they want to see.

“The PGA TOUR is the industry leader in using cutting-edge technology to improve the fan experience. AI is at the forefront of our technology stack, where it is enabling us to create a more engaging and interactive environment for fans. This is the beginning of our generative AI journey in collaboration with the AWS Generative AI Innovation Center for a transformational end-to-end customer experience. We are working to leverage Amazon Bedrock and our proprietary data to create an interactive experience for PGA TOUR fans to find information of interest about an event, player, stats, or other content in an interactive fashion.”
– Scott Gutterman, SVP of Broadcast and Digital Properties at PGA TOUR.

Conclusion

The project we discussed in this post exemplifies how structured and unstructured data sources can be fused using AI to create next-generation virtual assistants. For sports organizations, this technology enables more immersive fan engagement and unlocks internal efficiencies. The data intelligence we surface helps PGA TOUR stakeholders like players, coaches, officials, partners, and media make informed decisions faster. Beyond sports, our methodology can be replicated across any industry. The same principles apply to building assistants that engage customers, employees, students, patients, and other end-users. With thoughtful design and testing, virtually any organization can benefit from an AI system that contextualizes their structured databases, documents, images, videos, and other content.

If you’re interested in implementing similar functionality, consider using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock as an alternative, fully AWS-managed solution. This approach provides intelligent automation and data search capabilities through customizable agents, which can make user interactions with your applications more natural, efficient, and effective.
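
As a rough illustration of that route, the sketch below calls a pre-configured agent through the Amazon Bedrock bedrock-agent-runtime API and assembles its streamed reply. The agent ID and alias ID are placeholders you would create in the Amazon Bedrock console, and the sample question is illustrative.

```python
import uuid

import boto3

# Placeholders -- create the agent and its knowledge base in the Amazon Bedrock console first
AGENT_ID = "<agent-id>"
AGENT_ALIAS_ID = "<agent-alias-id>"

agent_runtime = boto3.client("bedrock-agent-runtime")


def ask_agent(question: str) -> str:
    """Send a question to an Amazon Bedrock agent and assemble the streamed reply."""
    response = agent_runtime.invoke_agent(
        agentId=AGENT_ID,
        agentAliasId=AGENT_ALIAS_ID,
        sessionId=str(uuid.uuid4()),
        inputText=question,
    )
    chunks = []
    for event in response["completion"]:
        if "chunk" in event:
            chunks.append(event["chunk"]["bytes"].decode("utf-8"))
    return "".join(chunks)


print(ask_agent("Did anyone make an ace at the 2022 Shriners Children's Open?"))
```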


About the authors

Scott Gutterman is the SVP of Digital Operations for the PGA TOUR. He is responsible for the TOUR’s overall digital operations and product development, and is driving its GenAI strategy.

Ahsan Ali is an Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers from different domains to solve their urgent and expensive problems using Generative AI.

Tahin Syed is an Applied Scientist with the Amazon Generative AI Innovation Center, where he works with customers to help realize business outcomes with generative AI solutions. Outside of work, he enjoys trying new food, traveling, and teaching taekwondo.

Grace Lang is an Associate Data & ML engineer with AWS Professional Services. Driven by a passion for overcoming tough challenges, Grace helps customers achieve their goals by developing machine learning powered solutions.

Jae Lee is a Senior Engagement Manager in ProServe’s M&E vertical. She leads and delivers complex engagements, demonstrates strong problem-solving skills, manages stakeholder expectations, and curates executive-level presentations. She enjoys working on projects focused on sports, generative AI, and customer experience.

Karn Chahar is a Security Consultant with the shared delivery team at AWS. He is a technology enthusiast who enjoys working with customers to solve their security challenges and to improve their security posture in the cloud.

Mike Amjadi is a Data & ML Engineer with AWS ProServe focused on enabling customers to maximize value from data. He specializes in designing, building, and optimizing data pipelines following well-architected principles. Mike is passionate about using technology to solve problems and is committed to delivering the best results for our customers.

Vrushali Sawant is a Front End Engineer with AWS ProServe. She is highly skilled in creating responsive websites. She loves working with customers, understanding their requirements, and providing them with scalable, easy-to-adopt UI/UX solutions.

Neelam Patel is a Customer Solutions Manager at AWS, leading key Generative AI and cloud modernization initiatives. Neelam works with key executives and technology owners to address their cloud transformation challenges and helps customers maximize the benefits of cloud adoption. She has an MBA from Warwick Business School, UK, and a Bachelor’s in Computer Engineering from India.

Dr. Murali Baktha is the Global Golf Solution Architect at AWS, spearheading pivotal initiatives involving Generative AI, data analytics, and cutting-edge cloud technologies. Murali works with key executives and technology owners to understand customers’ business challenges and designs solutions to address those challenges. He has an MBA in Finance from UConn and a doctorate from Iowa State University.

Mehdi Noor is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation. He focuses on scalable, measurable, and impactful uses of advanced AI technologies and on streamlining the path to production.
