Automate mortgage document fraud detection using an ML model and business-defined rules with Amazon Fraud Detector: Part 3

In the first post of this three-part series, we presented a solution that demonstrates how you can automate detecting document tampering and fraud at scale using AWS AI and machine learning (ML) services for a mortgage underwriting use case.

In the second post, we discussed an approach to develop a deep learning-based computer vision model to detect and highlight forged images in mortgage underwriting.

In this post, we present a solution to automate mortgage document fraud detection using an ML model and business-defined rules with Amazon Fraud Detector.

Solution overview

We use Amazon Fraud Detector, a fully managed fraud detection service, to automate the detection of fraudulent activities. With the objective of improving fraud prediction accuracy by proactively identifying document fraud, and thereby improving underwriting accuracy, Amazon Fraud Detector helps you build customized fraud detection models using a historical dataset, configure customized decision logic using the built-in rules engine, and orchestrate risk decision workflows with a few clicks.

The following diagram represents each stage in a mortgage document fraud detection pipeline.

Conceptual Architecture

This post covers the third component of the mortgage document fraud detection pipeline. The steps to deploy this component are as follows:

  1. Upload historical data to Amazon Simple Storage Service (Amazon S3).
  2. Select your options and train the model.
  3. Create the model.
  4. Review model performance.
  5. Deploy the model.
  6. Create a detector.
  7. Add rules to interpret model scores.
  8. Deploy the API to make predictions.

Prerequisites

The following are prerequisite steps for this solution:

  1. Sign up for an AWS account.
  2. Set up permissions that allow your AWS account to access Amazon Fraud Detector.
  3. Collect the historical fraud data to be used to train the fraud detector model, with the following requirements:
    1. Data must be in CSV format and have headers.
    2. Two headers are required: EVENT_TIMESTAMP and EVENT_LABEL.
    3. Data must reside in Amazon S3 in an AWS Region supported by the service.
    4. It’s highly recommended to run a data profile before you train (use an automated data profiler for Amazon Fraud Detector).
    5. It’s recommended to use at least 3–6 months of data.
    6. It takes time for fraud to mature; data that is 1–3 months old is recommended (not too recent).
    7. Some NULLs and missing values are acceptable (but too many and the variable is ignored, as discussed in Missing or incorrect variable type).

Upload historical data to Amazon S3

After you have the custom historical data files to train a fraud detector model, create an S3 bucket and upload the data to the bucket.
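If you prefer to script this step, the following is a minimal boto3 sketch; the bucket name, Region, and file name are placeholders.

import boto3

s3 = boto3.client("s3")

bucket_name = "your-fraud-detector-bucket"   # placeholder bucket name
local_file = "mortgage_fraud_history.csv"    # placeholder local CSV file

# Create the bucket (omit CreateBucketConfiguration when using us-east-1)
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={"LocationConstraint": "us-east-2"},
)

# Upload the historical event data
s3.upload_file(local_file, bucket_name, "training/mortgage_fraud_history.csv")
print(f"s3://{bucket_name}/training/mortgage_fraud_history.csv")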

Select options and train the model

The next step towards building and training a fraud detector model is to define the business activity (event) to evaluate for the fraud. Defining an event involves setting the variables in your dataset, an entity initiating the event, and the labels that classify the event.

Complete the following steps to define a docfraud event to detect document fraud, which is initiated by the entity applicant_mortgage, representing a new mortgage application:

  1. On the Amazon Fraud Detector console, choose Events in the navigation pane.
  2. Choose Create.
  3. Under Event type details, enter docfraud as the event type name and, optionally, enter a description of the event.
  4. Choose Create entity.
  5. On the Create entity page, enter applicant_mortgage as the entity type name and, optionally, enter a description of the entity type.
  6. Choose Create entity.
  7. Under Event variables, for Choose how to define this event’s variables, choose Select variables from a training dataset.
  8. For IAM role, choose Create IAM role.
  9. On the Create IAM role page, enter the name of the S3 bucket with your example data and choose Create role.
  10. For Data location, enter the path to your historical data. This is the S3 URI path that you saved after uploading the historical data. The path is similar to s3://your-bucket-name/example-dataset-filename.csv.
  11. Choose Upload.

Variables represent data elements that you want to use in a fraud prediction. These variables can be taken from the event dataset that you prepared for training your model, from your Amazon Fraud Detector model’s risk score outputs, or from Amazon SageMaker models. For more information about variables taken from the event dataset, see Get event dataset requirements using the Data models explorer.

  1. Under Labels – optional, for Labels, choose Create new labels.
  2. On the Create label page, enter fraud as the name. This label corresponds to the value that represents the fraudulent mortgage application in the example dataset.
  3. Choose Create label.
  4. Create a second label called legit. This label corresponds to the value that represents the legitimate mortgage application in the example dataset.
  5. Choose Create event type.
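If you prefer to script these console steps, the following is a minimal sketch using the boto3 frauddetector client; the example variable (email_address) is an illustrative placeholder, not a required column in your dataset.

import boto3

fd = boto3.client("frauddetector")

# Entity type that initiates the event
fd.put_entity_type(name="applicant_mortgage", description="Mortgage applicant")

# Labels used to classify historical events
fd.put_label(name="fraud", description="Fraudulent mortgage application")
fd.put_label(name="legit", description="Legitimate mortgage application")

# Example event variable (replace with the columns in your dataset)
fd.create_variable(
    name="email_address", dataType="STRING", dataSource="EVENT",
    defaultValue="<unknown>", variableType="EMAIL_ADDRESS",
)

# Event type that ties the variables, labels, and entity together
fd.put_event_type(
    name="docfraud",
    description="Mortgage document fraud",
    eventVariables=["email_address"],
    labels=["fraud", "legit"],
    entityTypes=["applicant_mortgage"],
)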

The following screenshot shows our event type details.

Event type details

The following screenshot shows our variables.

Model variables

The following screenshot shows our labels.

Labels

Create the model

After you have loaded the historical data and selected the required options to train a model, complete the following steps to create a model:

  1. On the Amazon Fraud Detector console, choose Models in the navigation pane.
  2. Choose Add model, and then choose Create model.
  3. On the Define model details page, enter mortgage_fraud_detection_model as the model’s name and an optional description of the model.
  4. For Model type, choose the Online Fraud Insights model.
  5. For Event type, choose docfraud. This is the event type that you created earlier.
  6. In the Historical event data section, provide the following information:
    1. For Event data source, choose Event data stored in S3.
    2. For IAM role, choose the role that you created earlier.
    3. For Training data location, enter the S3 URI path to your example data file.
  7. Choose Next.
  8. In the Model inputs section, leave all checkboxes checked. By default, Amazon Fraud Detector uses all variables from your historical event dataset as model inputs.
  9. In the Label classification section, for Fraud labels, choose fraud, which corresponds to the value that represents fraudulent events in the example dataset.
  10. For Legitimate labels, choose legit, which corresponds to the value that represents legitimate events in the example dataset.
  11. For Unlabeled events, keep the default selection Ignore unlabeled events for this example dataset.
  12. Choose Next.
  13. Review your settings, then choose Create and train model.

Amazon Fraud Detector creates a model and begins to train a new version of the model.

On the Model versions page, the Status column indicates the status of model training. Model training that uses the example dataset takes approximately 45 minutes to complete. The status changes to Ready to deploy after model training is complete.
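For reference, the equivalent boto3 calls look roughly like the following sketch; the model variables, S3 path, and IAM role ARN are placeholders that must match your dataset and account.

import boto3

fd = boto3.client("frauddetector")

fd.create_model(
    modelId="mortgage_fraud_detection_model",
    modelType="ONLINE_FRAUD_INSIGHTS",
    eventTypeName="docfraud",
)

# Train a new model version from the historical events stored in Amazon S3
fd.create_model_version(
    modelId="mortgage_fraud_detection_model",
    modelType="ONLINE_FRAUD_INSIGHTS",
    trainingDataSource="EXTERNAL_EVENTS",
    trainingDataSchema={
        "modelVariables": ["email_address"],  # variables from your dataset
        "labelSchema": {
            "labelMapper": {"FRAUD": ["fraud"], "LEGIT": ["legit"]},
            "unlabeledEventsTreatment": "IGNORE",
        },
    },
    externalEventsDetail={
        "dataLocation": "s3://your-fraud-detector-bucket/training/mortgage_fraud_history.csv",
        "dataAccessRoleArn": "arn:aws:iam::123456789012:role/your-afd-data-access-role",
    },
)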

Review model performance

After the model training is complete, Amazon Fraud Detector validates the model performance using 15% of your data that was not used to train the model and provides various tools, including a score distribution chart and confusion matrix, to assess model performance.

To view the model’s performance, complete the following steps:

  1. On the Amazon Fraud Detector console, choose Models in the navigation pane.
  2. Choose the model that you just trained (mortgage_fraud_detection_model), then choose 1.0. This is the version of your model that Amazon Fraud Detector created.
  3. Review the Model performance overall score and all other metrics that Amazon Fraud Detector generated for this model.

Model performance

Deploy the model

After you have reviewed the performance metrics of your trained model and are ready to use it to generate fraud predictions, you can deploy the model:

  1. On the Amazon Fraud Detector console, choose Models in the navigation pane.
  2. Choose the model mortgage_fraud_detection_model, and then choose the specific model version that you want to deploy. For this post, choose 1.0.
  3. On the Model version page, on the Actions menu, choose Deploy model version.

On the Model versions page, the Status shows the status of the deployment. The status changes to Active when the deployment is complete. This indicates that the model version is activated and available to generate fraud predictions.
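Deployment can also be done through the API by activating the trained model version. A minimal boto3 sketch, assuming version 1.0 from the previous step:

import boto3

fd = boto3.client("frauddetector")

# Deploy (activate) model version 1.0
fd.update_model_version_status(
    modelId="mortgage_fraud_detection_model",
    modelType="ONLINE_FRAUD_INSIGHTS",
    modelVersionNumber="1.0",
    status="ACTIVE",
)

# Check the deployment status until it reports ACTIVE
version = fd.get_model_version(
    modelId="mortgage_fraud_detection_model",
    modelType="ONLINE_FRAUD_INSIGHTS",
    modelVersionNumber="1.0",
)
print(version["status"])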

Create a detector

After you have deployed the model, you build a detector for the docfraud event type and add the deployed model. Complete the following steps:

  1. On the Amazon Fraud Detector console, choose Detectors in the navigation pane.
  2. Choose Create detector.
  3. On the Define detector details page, enter fraud_detector for the detector name and, optionally, enter a description for the detector, such as my sample fraud detector.
  4. For Event Type, choose docfraud. This is the event type that you created earlier.
  5. Choose Next.

Add rules to interpret model scores

After you have created the Amazon Fraud Detector model, you can use the Amazon Fraud Detector console or application programming interface (API) to define business-driven rules (conditions that tell Amazon Fraud Detector how to interpret the model score when evaluating an event for fraud). To align with the mortgage underwriting process, you can create rules that flag mortgage applications according to their associated risk levels and map them as fraudulent, legitimate, or needing review.

For example, you may want to automatically decline mortgage applications with a high fraud risk, considering parameters like tampered images of the required documents, missing documents like paystubs or income requirements, and so on. On the other hand, certain applications may need a human in the loop for making effective decisions.

Amazon Fraud Detector uses aggregated values (calculated by combining a set of raw variables) and raw values (the values provided for the variables) to generate model scores. The model scores range from 0 to 1000, where 0 indicates low fraud risk and 1000 indicates high fraud risk.

To add the respective business-driven rules, complete the following steps:

  1. On the Amazon Fraud Detector console, choose Rules in the navigation pane.
  2. Choose Add rule.
  3. In the Define a rule section, enter fraud for the rule name and, optionally, enter a description.
  4. For Expression, enter the rule expression using the Amazon Fraud Detector simplified rule expression language: $docfraud_insightscore >= 900
  5. For Outcomes, choose Create a new outcome. (An outcome is the result of a fraud prediction and is returned if the rule matches during an evaluation.)
  6. In the Create a new outcome section, enter decline as the outcome name and an optional description.
  7. Choose Save outcome.
  8. Choose Add rule to run the rule validation checker and save the rule.
  9. After it's created, Amazon Fraud Detector makes the following high_risk rule available for use in your detector:
    1. Rule name: fraud
    2. Outcome: decline
    3. Expression: $docfraud_insightscore >= 900
  10. Choose Add another rule, and then choose the Create rule tab to add the following two rules:
  11. Create a low_risk rule with the following details:
    1. Rule name: legit
    2. Outcome: approve
    3. Expression: $docfraud_insightscore <= 500
  12. Create a medium_risk rule with the following details:
    1. Rule name: review_needed
    2. Outcome: review
    3. Expression: $docfraud_insightscore <= 900 and $docfraud_insightscore >= 500

These values are examples used for this post. When you create rules for your own detector, use values that are appropriate for your model and use case.

  1. After you have created all three rules, choose Next.

Associated rules
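For reference, the detector, outcomes, and rules above can also be created with API calls. The following is a minimal boto3 sketch that mirrors the example thresholds used in this post; the outcome descriptions and the FIRST_MATCHED rule execution mode are illustrative choices.

import boto3

fd = boto3.client("frauddetector")

fd.put_detector(detectorId="fraud_detector", eventTypeName="docfraud")

rules = [
    ("fraud", "$docfraud_insightscore >= 900", "decline"),
    ("legit", "$docfraud_insightscore <= 500", "approve"),
    ("review_needed", "$docfraud_insightscore <= 900 and $docfraud_insightscore >= 500", "review"),
]

for rule_id, expression, outcome in rules:
    fd.put_outcome(name=outcome, description=f"{outcome} the application")
    fd.create_rule(
        ruleId=rule_id,
        detectorId="fraud_detector",
        expression=expression,
        language="DETECTORPL",
        outcomes=[outcome],
    )

# Assemble a detector version that combines the deployed model and the rules
fd.create_detector_version(
    detectorId="fraud_detector",
    rules=[{"detectorId": "fraud_detector", "ruleId": r, "ruleVersion": "1"} for r, _, _ in rules],
    modelVersions=[{
        "modelId": "mortgage_fraud_detection_model",
        "modelType": "ONLINE_FRAUD_INSIGHTS",
        "modelVersionNumber": "1.0",
    }],
    ruleExecutionMode="FIRST_MATCHED",
)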

Deploy the API to make predictions

After you have defined the rules-based actions, you can deploy an Amazon Fraud Detector API to evaluate lending applications and predict potential fraud. Predictions can be performed in batch or in real time.

Deploy Amazon Fraud Detector API
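For real-time predictions, the GetEventPrediction API evaluates a single event against the deployed detector. A minimal boto3 sketch, assuming the detector and event type created earlier and illustrative event variables:

import boto3
from datetime import datetime, timezone

fd = boto3.client("frauddetector")

response = fd.get_event_prediction(
    detectorId="fraud_detector",
    eventId="mortgage-application-0001",          # unique ID for this evaluation
    eventTypeName="docfraud",
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "applicant_mortgage", "entityId": "applicant-42"}],
    eventVariables={"email_address": "applicant@example.com"},  # your dataset's variables
)

print(response["modelScores"])   # docfraud_insightscore between 0 and 1000
print(response["ruleResults"])   # matched rules and outcomes (decline/approve/review)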

Integrate your SageMaker model (Optional)

If you already have a fraud detection model in SageMaker, you can integrate it with Amazon Fraud Detector and use both models when generating fraud predictions.

This implies that you can use both SageMaker and Amazon Fraud Detector models in your application to detect different types of fraud. For example, your application can use the Amazon Fraud Detector model to assess the fraud risk of customer accounts, and simultaneously use your SageMaker model to check for account compromise risk.

Clean up

To avoid incurring any future charges, delete the resources created for the solution, including the following:

  • S3 bucket
  • Amazon Fraud Detector endpoint

Conclusion

This post walked you through an automated and customized solution to detect fraud in the mortgage underwriting process. This solution allows you to detect fraudulent attempts closer to the time of fraud occurrence and helps underwriters with an effective decision-making process. Additionally, the flexibility of the implementation allows you to define business-driven rules to classify and capture the fraudulent attempts customized to specific business needs.

For more information about building an end-to-end mortgage document fraud detection solution, refer to Part 1 and Part 2 in this series.


About the authors


Anup Ravindranath
is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada working with Financial Services organizations. He helps customers to transform their businesses and innovate on cloud.

Vinnie Saini is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada. She has been helping Financial Services customers transform on cloud, with AI and ML driven solutions laid on strong foundational pillars of Architectural Excellence.

Accenture creates a regulatory document authoring solution using AWS generative AI services

This post is co-written with Ilan Geller, Shuyu Yang and Richa Gupta from Accenture.

Bringing innovative new pharmaceutical drugs to market is a long and stringent process. Companies face complex regulations and extensive approval requirements from governing bodies like the US Food and Drug Administration (FDA). A key part of the submission process is authoring regulatory documents like the Common Technical Document (CTD), a comprehensive standard formatted document for submitting applications, amendments, supplements, and reports to the FDA. This document contains over 100 highly detailed technical reports created during the process of drug research and testing. Manually creating CTDs is incredibly labor-intensive, requiring up to 100,000 hours per year for a typical large pharma company. The tedious process of compiling hundreds of documents is also prone to errors.

Accenture built a regulatory document authoring solution using automated generative AI that enables researchers and testers to produce CTDs efficiently. By extracting key data from testing reports, the system uses Amazon SageMaker JumpStart and other AWS AI services to generate CTDs in the proper format. This revolutionary approach compresses the time and effort spent on CTD authoring. Users can quickly review and adjust the computer-generated reports before submission.

Because of the sensitive nature of the data and effort involved, pharmaceutical companies need a higher level of control, security, and auditability. This solution relies on the AWS Well-Architected principles and guidelines to enable the control, security, and auditability requirements. The user-friendly system also employs encryption for security.

By harnessing AWS generative AI, Accenture aims to transform efficiency for regulated industries like pharmaceuticals. Automating the frustrating CTD document process accelerates new product approvals so innovative treatments can get to patients faster. AI delivers a major leap forward.

This post provides an overview of an end-to-end generative AI solution developed by Accenture for regulatory document authoring using SageMaker JumpStart and other AWS services.

Solution overview

Accenture built an AI-based solution that automatically generates a CTD document in the required format, along with the flexibility for users to review and edit the generated content​. The preliminary value is estimated at a 40–45% reduction in authoring time.

This generative AI-based solution extracts information from the technical reports produced as part of the testing process and delivers the detailed dossier in a common format required by the central governing bodies. Users then review and edit the documents, where necessary, and submit them to the central governing bodies. This solution uses the SageMaker JumpStart AI21 Jurassic Jumbo Instruct and AI21 Summarize models to extract and create the documents.

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. A user accesses the regulatory document authoring tool from their computer browser.
  2. A React application is hosted on AWS Amplify and is accessed from the user’s computer (for DNS, use Amazon Route 53).
  3. The React application uses the Amplify authentication library to detect whether the user is authenticated.
  4. Amazon Cognito provides a local user pool or can be federated with the user’s active directory.
  5. The application uses the Amplify libraries for Amazon Simple Storage Service (Amazon S3) and uploads documents provided by users to Amazon S3.
  6. The application writes the job details (app-generated job ID and Amazon S3 source file location) to an Amazon Simple Queue Service (Amazon SQS) queue. It captures the message ID returned by Amazon SQS. Amazon SQS enables a fault-tolerant decoupled architecture. Even if there are some backend errors while processing a job, having a job record inside Amazon SQS will ensure successful retries (see the sketch after this list).
  7. Using the job ID and message ID returned by the previous request, the client connects to the WebSocket API and sends the job ID and message ID to the WebSocket connection.
  8. The WebSocket triggers an AWS Lambda function, which creates a record in Amazon DynamoDB. The record is a key-value mapping of the job ID (WebSocket) with the connection ID and message ID.
  9. Another Lambda function gets triggered with a new message in the SQS queue. The Lambda function reads the job ID and invokes an AWS Step Functions workflow for processing data files.
  10. The Step Functions state machine invokes a Lambda function to process the source documents. The function code invokes Amazon Textract to analyze the documents. The response data is stored in DynamoDB. Based on specific requirements with processing data, it can also be stored in Amazon S3 or Amazon DocumentDB (with MongoDB compatibility).
  11. A Lambda function invokes the Amazon Textract AnalyzeDocument API to parse tabular data from source documents and stores the extracted data in DynamoDB.
  12. A Lambda function processes the data based on mapping rules stored in a DynamoDB table.
  13. A Lambda function invokes the prompt libraries and a series of actions using generative AI with a large language model hosted through Amazon SageMaker for data summarization.
  14. The document writer Lambda function writes a consolidated document in an S3 processed folder.
  15. The job callback Lambda function retrieves the callback connection details from the DynamoDB table, passing the job ID. Then the Lambda function makes a callback to the WebSocket endpoint and provides the processed document link from Amazon S3.
  16. A Lambda function deletes the message from the SQS queue so that it’s not reprocessed.
  17. A document generator web module converts the JSON data into a Microsoft Word document, saves it, and renders the processed document on the web browser.
  18. The user can view, edit, and save the documents back to the S3 bucket from the web module. This helps in reviews and corrections needed, if any.
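To make steps 6 and 8 concrete, the following is a minimal Python sketch of the job record written to Amazon SQS and the job-to-connection mapping stored in DynamoDB; the queue URL, table name, and attribute names are assumptions for illustration, not the solution's actual schema.

import json
import uuid
import boto3

sqs = boto3.client("sqs")
dynamodb = boto3.resource("dynamodb")

queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/ctd-authoring-jobs"  # placeholder
jobs_table = dynamodb.Table("ctd-authoring-job-connections")                       # placeholder

# Step 6: enqueue the job with its app-generated job ID and source document location
job_id = str(uuid.uuid4())
message = sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps({
        "jobId": job_id,
        "sourceLocation": "s3://ctd-authoring-input/reports/study-report.pdf",
    }),
)
message_id = message["MessageId"]  # returned to the client for the WebSocket callback

# Step 8: map the job ID to the WebSocket connection ID and message ID
jobs_table.put_item(Item={
    "jobId": job_id,
    "connectionId": "abc123-websocket-connection",  # provided by API Gateway WebSocket
    "messageId": message_id,
})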

The solution also uses SageMaker notebooks (labeled T in the preceding architecture) to perform domain adaption, fine-tune the models, and deploy the SageMaker endpoints.

Conclusion

In this post, we showcased how Accenture is using AWS generative AI services to implement an end-to-end approach towards a regulatory document authoring solution. In early testing, this solution has demonstrated a 60–65% reduction in the time required for authoring CTDs. We identified gaps in traditional regulatory authoring platforms, augmented them with generative AI for faster response times, and are continuously improving the system while engaging with users across the globe. Reach out to the Accenture Center of Excellence team to dive deeper into the solution and deploy it for your clients.

This joint program focused on generative AI will help increase the time-to-value for joint customers of Accenture and AWS. The effort builds on the 15-year strategic relationship between the companies and uses the same proven mechanisms and accelerators built by the Accenture AWS Business Group (AABG).

Connect with the AABG team at accentureaws@amazon.com to drive business outcomes by transforming to an intelligent data enterprise on AWS.

For further information about generative AI on AWS using Amazon Bedrock or SageMaker, refer to Generative AI on AWS: Technology and Get started with generative AI on AWS using Amazon SageMaker JumpStart.

You can also sign up for the AWS generative AI newsletter, which includes educational resources, blogs, and service updates.


About the Authors

Ilan Geller is a Managing Director in the Data and AI practice at Accenture.  He is the Global AWS Partner Lead for Data and AI and the Center for Advanced AI.  His roles at Accenture have primarily been focused on the design, development, and delivery of complex data, AI/ML, and most recently Generative AI solutions.

Shuyu Yang is Generative AI and Large Language Model Delivery Lead and also leads CoE (Center of Excellence) Accenture AI (AWS DevOps professional) teams.

Richa Gupta is a Technology Architect at Accenture, leading various AI projects. She has over 18 years of experience architecting scalable AI and generative AI solutions. Her areas of expertise are AI architecture, cloud solutions, and generative AI. She plays an instrumental role in various presales activities.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Sachin Thakkar is a Senior Solutions Architect at Amazon Web Services, working with a leading Global System Integrator (GSI). He brings over 23 years of experience as an IT Architect and as Technology Consultant for large institutions. His focus area is on Data, Analytics and Generative AI. Sachin provides architectural guidance and supports the GSI partner in building strategic industry solutions on AWS.

Integrate QnABot on AWS with ServiceNow

Do your employees wait for hours on the telephone to open an IT ticket? Do they wait for an agent to triage an issue, which sometimes only requires restarting the computer? Providing excellent IT support is crucial for any organization, but legacy systems have relied heavily on human agents being available to intake reports and triage issues. Conversational AI (or chatbots) can help triage some of these common IT problems and create a ticket for the tasks when human assistance is needed. Chatbots quickly resolve common business issues, improve employee experiences, and free up agents’ time to handle more complex problems.

QnABot on AWS is an open source solution built using AWS native services like Amazon Lex, Amazon OpenSearch Service, AWS Lambda, Amazon Transcribe, and Amazon Polly. QnABot version 5.4+ is also enhanced with generative AI capabilities.

According to Gartner Magic Quadrant 2023, ServiceNow is one of the leading IT Service Management (ITSM) providers on the market. ServiceNow’s Incident Management uses workflows to identify, track, and resolve high‑impact IT service incidents.

In this post, we demonstrate how to integrate the QnABot on AWS chatbot solution with ServiceNow. With this integration, users can chat with QnABot to triage their IT service issues and open an incident ticket in ServiceNow in real time by providing details to QnABot.

Watch the following video to see how users can ask questions to an IT service desk chatbot and get answers. For most frequently asked questions, chatbot answers can help resolve the issue. When a user determines that the answers provided are not useful, they can request the creation of a ticket in ServiceNow.

Solution overview

QnABot on AWS is a multi-channel, multi-language chatbot that responds to your customer’s questions, answers, and feedback. QnABot on AWS is a complete solution and can be deployed as part of your IT Service Desk ticketing workflow. Its distributed architecture allows for integrations with other systems like ServiceNow. If you wish to build your own chatbot using Amazon Lex or add only Amazon Lex as part of your application, refer to Integrate ServiceNow with Amazon Lex chatbot for ticket processing.

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. A QnABot administrator can configure the questions using the Content Designer UI delivered by Amazon API Gateway and Amazon Simple Storage Service (Amazon S3).
  2. The Content Designer Lambda function saves the input in OpenSearch Service in a question’s bank index.
  3. When QnABot users ask questions prompting ServiceNow integration, Amazon Lex fetches the questions and requests the user to provide a description of the issue. When the description is provided, it invokes a Lambda function.
  4. The Lambda function fetches secrets from AWS Secrets Manager, where environment variables are stored, and makes an HTTP call to create a ticket in ServiceNow. The ticket number is then returned to the user.

When building a diagnostic workflow, you may require inputs to different questions before you can create a ticket in ServiceNow. You can use response bots and the document chaining capabilities of QnABot to achieve this capability.

Response bots are bots created to elicit a response from users and store it as part of session variables or slot values. You can use built-in response bots or create a custom response bot. Response chatbot names must start with the letters “QNA.”

This solution provides a set of built-in response bots. Refer to Configuring the chatbot to ask the questions and use response bots for implementation details.

You can use document chaining to elicit the response and invoke Lambda functions. The chaining rule is a JavaScript programming expression used to test the value of the session attribute set to elicit a response, and either route to another bot or invoke a Lambda function. You can identify the next question in the document by specifying the question ID (QID) in the Document Chaining:Chaining Rule field as ‘QID::‘ followed by the QID value of the document. For example, a rule that evaluates to “QID::Admin.001” will chain to item Admin.001.

When using a chaining rule for Lambda, the function name must start with the letters “QNA,” and is specified in the Document Chaining:Chaining Rule field as ‘Lambda::FunctionNameorARN’. All chaining rules must be enclosed in single quotes.

Deploy the QnABot solution

Complete the following steps to deploy the solution:

  1. Choose Launch Solution on the QnABot implementation guide to deploy the latest QnABot template via AWS CloudFormation.
  2. Provide a name for the bot.
  3. Provide an email address where you will receive a message to reset your password.
  4. Make sure that EnableCognitoLogin is set to true.
  5. For all other parameters, accept the defaults (see the implementation guide for parameter definitions), and launch the QnABot stack.

This post uses a static webpage hosted on Amazon CloudFront, and the QnABot chatbot is embedded in the page using the Amazon Lex web UI sample plugin. We also provide instructions for testing this solution using the QnABot client page.

Create a ServiceNow account

This section walks through the steps to create a ServiceNow account and ServiceNow developer instance:

  1. First, sign up for a ServiceNow account.

  1. Go to your email and confirm this email address for your ServiceNow ID.
  2. As part of the verification, you’ll be asked to provide the six-digit verification code sent to your email.
  3. You can skip the page that asks you to set up two-factor authentication. You’re redirected to the landing page with the ServiceNow Developer program.
  4. In the Getting Started steps, choose Yes, I need a developer oriented IDE.

  1. Choose Start Building to set up an instance.

When the build is complete, which may take a few minutes, you will be provided with the instance URL, user name, and password details. Save this information to use in later steps.

  1. Log in to the site using the following URL (provide your instance): https://devXXXXXX.service-now.com/now/nav/ui/classic/params/target/change_request_list.do.

Be sure to stay logged in to the ServiceNow developer instance throughout the process.

If logged out, use your email and password to log back in and wake up the instance and prevent hibernation.

  1. Choose All in the navigation bar, then choose Incidents.

  1. Select All to remove all of the filters.

All incidents will be shown on this page.

Create users in ServiceNow and an Amazon Cognito pool

You can create an incident using the userid of the chatbot user. For that, we need to confirm that the userId of the chatbot user exists in ServiceNow. First, we create the ServiceNow user, then we create a user with the same ID in an Amazon Cognito user pool. Amazon Cognito is an AWS service to authenticate clients and provide temporary AWS credentials.

  1. Create a ServiceNow user. Be sure to include a first name, last name, and email.

Note down the user ID of the newly created user. You will need this when creating an Amazon Cognito user in a user pool.

  1. On the Amazon Cognito console, choose User pools in the navigation pane.

If you have deployed the Amazon Lex web UI plugin, you will see two user pool names; if you did not, you’ll see only one user pool name.

  1. Select the user pool that has your QnABot name and create a new user. Use the same userId as that of the ServiceNow user.
  2. If you are using the Amazon Lex web UI, create a user in the appropriate Amazon Cognito user pool by following the preceding steps.

Note that the userId you created will be used for the QnABot client and Amazon Lex Web UI client.
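If you prefer to create the Amazon Cognito user programmatically instead of through the console, the following is a minimal boto3 sketch; the user pool ID, user ID, email, and temporary password are placeholders, and the user ID must match the ServiceNow user you created.

import boto3

cognito = boto3.client("cognito-idp")

cognito.admin_create_user(
    UserPoolId="us-east-1_XXXXXXXXX",          # the QnABot user pool
    Username="servicenow.user",                 # same userId as the ServiceNow user
    UserAttributes=[
        {"Name": "email", "Value": "servicenow.user@example.com"},
        {"Name": "email_verified", "Value": "true"},
    ],
    TemporaryPassword="ChangeMe123!",           # the user resets this on first login
)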

Create a Lambda function for invoking ServiceNow

In this step, you create a Lambda function that invokes the ServiceNow API to create a ticket.

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose Create function.

  1. Select Author from scratch.
  2. For Function name, enter a name, such as qna-ChatBotLambda. (Remember that QnABot requires the prefix qna- in the name.)
  3. For Runtime, choose Node.js 18.x.

By default, this Lambda function creates a new role. If you want to use an existing role, you can change the default AWS Identity and Access Management (IAM) execution role by selecting Use existing role.

  1. Choose Create function.
  2. After you create the function, use the inline editor to edit the code for index.js.
  3. Right-click on index.js and rename it to index.mjs.
  4. Enter the following code, which is sample code for the function that you’re using as the compute layer for our logic:
import { SecretsManager } from '@aws-sdk/client-secrets-manager';

const incident="incident";
const secret_name = "servicenow/password";

export const handler = async (event, context) => {
    console.log('Received event:',JSON.stringify(event, null,2));
    // make async call createticket which creates serviceNow ticket
    await createTicket( event).then(response => event=response);
    return event;
    
};

// async function to create servicenow ticket
async function createTicket( event){
 
    var password='';
    await getSecretValue().then(response => password=response);
    
    // fetch description and userid from event
    var shortDesc = event.req._event.inputTranscript;
    console.log("received slots value", shortDesc);
    // userName of the logged in user
    var userName = event.req._userInfo.UserName;
    console.log("userId", userName);

    // description provided by user is added to short_description
    var requestData = {
        "short_description": shortDesc,
        "caller_id": userName
      };
      var postData = JSON.stringify(requestData);

    // create url from hostname fetched from environment variables. Remaining path is constant.
    const url = "https://"+process.env.SERVICENOW_HOST+":443/api/now/table/"+incident;

    // create incident in servicenow and return event with ticket information
    try {
            await fetch(url,{
                method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Accept': 'application/json',
                'Authorization': 'Basic ' + Buffer.from(process.env.SERVICENOW_USERNAME + ":" + password).toString('base64'),
                'Content-Length': Buffer.byteLength(postData),
            },
            'body': postData
            }).then(response=>response.json())
            .then(data=>{ console.log(data); 
                var ticketNumber = data.result.number;
                var ticketType = data.result.sys_class_name;
                event.res.message="Done! I've opened an " + ticketType + " ticket for you in ServiceNow. Your ticket number is: " + ticketNumber + ".";
            });  
            return event;
        }
        catch (e) {
            console.error(e);
            return 500;
        }

}

// get secret value from secrets manager
async function getSecretValue(){
    var secret;
    var client = new SecretsManager({
        region: process.env.AWS_REGION
    });
   // await to get secret value
    try {
        secret = await client.getSecretValue({SecretId: secret_name});
    }
    catch (err) {
        console.log("error", err);
    
    }   
   const secretString = JSON.parse(secret.SecretString);
    return secretString.password;
}

This function uses the ServiceNow Incident API. For more information, refer to Create an incident.
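To verify the integration independently of Lambda, you can call the same ServiceNow Table API endpoint directly. The following is a minimal Python sketch using the requests library; the instance host, credentials, and field values are placeholders and it is not part of the deployed solution.

import requests

instance_host = "devXXXXXX.service-now.com"    # your developer instance
auth = ("admin", "<your-password>")             # or credentials retrieved from Secrets Manager

response = requests.post(
    f"https://{instance_host}/api/now/table/incident",
    auth=auth,
    headers={"Content-Type": "application/json", "Accept": "application/json"},
    json={
        "short_description": "reset password",  # description collected by QnABot
        "caller_id": "servicenow.user",          # userId of the chatbot user
    },
    timeout=30,
)

result = response.json()["result"]
print(result["number"], result["sys_class_name"])  # ticket number and ticket type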

  1. Choose Deploy to deploy this code to the $LATEST version of the Lambda function.
  2. On the Configuration tab, in the Environment variables section, add the following:
      • Add SERVICENOW_HOST with the value devXXXXXX.service-now.com.
      • Add SERVICENOW_USERNAME with the value admin.

  3. Copy the Lambda function ARN. You will need it at a later stage.

The next step is to store your ServiceNow user name and password in Secrets Manager.

  1. On the Secrets Manager console, create a new secret.
  2. Select Other type of secret.
  3. Add your key-value pairs as shown and choose Next.

  1. For Secret name, enter a descriptive name (for this post, servicenow/password). If you choose a different name, update the value of const secret_name in the Lambda function code.
  2. Choose Next.
  3. Leave Configure rotation on default and choose Next.
  4. Review the secret information and choose Store.
  5. Copy the ARN of the newly created secret.

Now let’s give Lambda permissions to Secrets Manager.

  1. On the Lambda function page, go to the Configuration tab and navigate to the Permissions section.

  1. Choose the execution role name to open the IAM page for the role.
  2. In the following inline policy, provide the ARN of the secret you created earlier:
{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "SecretsManagerRead",
			"Effect": "Allow",
			"Action": ["secretsmanager:GetResourcePolicy",
				"secretsmanager:GetSecretValue",
				"secretsmanager:DescribeSecret",
				"secretsmanager:ListSecrets",
				"secretsmanager:ListSecretVersionIds"
],
			"Resource": "<ARN>"
		}
	]
}
  1. Add the inline policy to the role.

Configure QnABot configurations

In this section, we first create some knowledge questions using the Questions feature of QnABot. We then create a response bot that elicits a response from a user when they ask for help. This bot uses document chaining to call another bot, and triggers Lambda to create a ServiceNow ticket.

For more information about using QnABot with generative AI, refer to Deploy generative AI self-service question answering using the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra, and Amazon Bedrock.

Create knowledge question 1

Create a knowledge question for installing software:

  1. On the AWS CloudFormation console, navigate to the QnABot stack.
  2. On the Outputs tab, open the link for ContentDesignerURL.
  3. Log in to the QnABot Content Designer using admin credentials.
  4. Choose Add to add a new question.
  5. Select qna.
  6. For Item ID, enter software.001.
  7. Under Questions/Utterances, enter the following:
    a.	How to install a software 
    b.	How to install developer tools 
    c.	can you give me instructions to install software 
    

  8. Under Answer, enter the following answer:
Installing from Self Service does not require any kind of permissions or admin credentials. It will show you software that is available for you, without any additional requests.
1. Click the search icon in the menu at the top. Type Self Service and press Enter.
2. Sign in with your security key credentials.
3. Search for your desired software in the top right corner.
4. Click the Install button.

  1. Expand the Advanced section and enter the same text in Markdown Answer.

  1. Leave the rest as default, and choose Create to save the question.

Create knowledge question 2

Now you create the second knowledge question.

  1. Choose Add to add a new question.
  2. Select qna.
  3. For Item ID, enter knowledge.001.
  4. Under Questions/Utterances, enter Want to learn more about Amazon Lex.
  5. Under Answer, enter the following answer:
### Amazon Lex
Here is a video of Amazon Lex Introduction <iframe width="580" height="327" src="https://www.youtube.com/embed/Q2yJf4bn5fQ" title="Conversational AI powered by Amazon Lex | Amazon Web Services" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
Do you want to learn more about it?<br>
Here are some resources<br>
1. [Introduction to Amazon Lex](https://explore.skillbuilder.aws/learn/course/external/view/elearning/249/introduction-to-amazon-lex)
2. [Building better bots using Amazon Connect](https://explore.skillbuilder.aws/learn/course/external/view/elearning/481/building-better-bots-using-amazon-connect)
3. [Amazon Lex V2 getting started- Streaming APIs](https://aws.amazon.com/blogs/machine-learning/delivering-natural-conversational-experiences-using-amazon-lex-streaming-apis/)

  1. Expand the Advanced section and enter the same answer under Markdown Answer.

  1. Leave the rest as default, and choose Create to save the question.

Create knowledge question 3

Complete the following steps to add another knowledge question:

  1. Choose Add to add a new question.
  2. Select qna.
  3. For Item ID, enter password.reset.
  4. Under Questions/Utterances, enter I need to reset my password.
  5. Under Answer, enter the following answer:
#### Password Reset Instructions
Please follow below instructions to reset your password
1. Please go to AnyTech's IT web page. 
2. Use the Password Reset Tool on the left hand navigation. 
3. In the Password Reset Tool, provide your new password and save. 
4. Once you change your password, please log out of your laptop and login.
<br><br>
**Note**: If you are logged out of your computer, you can ask your manager to reset the password.

  1. Expand the Advanced section and enter the same text for Markdown Answer.
  2. Choose Create to save the question.

Create a response bot

Complete the following steps to create the first response bot, which elicits a response:

  1. Choose Add to add a new question.
  2. Select qna.
  3. For Item ID, enter ElicitResponse.001.
  4. Under Questions/Utterances, enter Please create a ticket.
  5. Under Answer, enter the following answer:
Sure, I can help you with that!! Please give a short description of your problem.

  1. Expand the Advanced section and navigate to the Elicit Response section.
  2. For Elicit Response: ResponseBot Hook, enter QNAFreeText.
  3. For Elicit Response: Response Session Attribute Namespace, enter short_description.

This creates a slot named short_description that captures the response or description for the incident. This slot uses the built-in QNAFreeText, which is used for capturing free text.

  1. For Document Chaining: Chaining Rule, enter 'QID::item.002' (enclosed in single quotes). Remember this chaining rule to use when creating your document chain.
  2. Leave the rest as default.

  1. Choose Create to save the question.

Create a document chain

Now we create a document chain in QnABot that will trigger the Lambda function to create a ticket and respond with a ticket number. Document chaining allows you to chain two bots based on the rule you configured. Complete the following steps:

  1. Choose Add to add a new question.
  2. Select qna.
  3. For Item ID, enter item.002. This should match the QID value given in the document chain rule earlier.
  4. Under Questions/Utterances, enter servicenow integration.
  5. Under Answer, enter the following answer:
There was an error, please contact system administrator
  1. In the Advanced section, add the Lambda function ARN for Lambda Hook.

  1. Choose Create to save the question.

Test the QnABot

To test the QnABot default client, complete the following steps:

  1. Choose the options menu in the Content Designer and choose QnABot Client.

The QnABot client will open in a new browser tab.

  1. Log in using the newly created user credentials to begin the test.

If you plan to use the Amazon Lex Web UI on a static page, follow these instructions.

  1. Choose the chat icon at the bottom of the page to start the chat.
  2. To log in, choose Login on the menu.

You will be routed to the login page.

  1. Provide the userId created earlier.
  2. For first-time logins, you will be prompted to reset your password.

  1. Now we can test the chatbot with example use cases. For our first use case, we want to learn about Amazon Lex and enter the question “I want to learn about Amazon Lex, can you give me some information about it?” QnABot provides a video and some links to resources.

  1. In our next example, we need to install software on our laptop, and ask “Can you give me instructions to install software.” QnABot understands that the user is requesting help installing software and provides answers from the knowledge bank. You can follow those instructions and install the software you need.

  1. While installing the software, what if you locked your password due to multiple failed login attempts? To request a password reset, you can ask “I need to reset my password.”

  1. You might need additional assistance resetting the password and want to create a ticket. In this case, enter “Please create a ticket.” QnABot asks for a description of the problem; you can enter “reset password.” QnABot creates a ticket with the description provided and provides the ticket number as part of the response.

  1. You can verify the incident ticket was created on the ServiceNow console under Incidents. If the ticket is not shown on the first page, search for the ticket number using the search toolbar.

Clean up

To avoid incurring future charges, delete the resources you created. For instructions to uninstall the QnABot solution plugin, refer to Uninstall the solution.

Conclusion

Integrating QnABot on AWS with ServiceNow provides an end-to-end solution for automated customer support. With QnABot’s conversational AI capabilities to understand customer questions and ServiceNow’s robust incident management features, companies can streamline ticket creation and resolution. You can also extend this solution to show a list of tickets created by the user. For more information about incorporating these techniques into your bots, see QnABot on AWS.


About the Authors

Sujatha Dantuluri is a Senior Solutions Architect in the US federal civilian team at AWS. She has over 20 years of experience supporting commercial and federal government. She works closely with customers in building and architecting mission-critical solutions. She has also contributed to IEEE standards.

Maia Haile is a Solutions Architect at Amazon Web Services based in the Washington, D.C. area. In that role, she helps public sector customers achieve their mission objectives with well-architected solutions on AWS. She has 5 years of experience spanning nonprofit healthcare, media and entertainment, and retail. Her passion is using AI and ML to help public sector customers achieve their business and technical goals.

Deploy large language models for a healthtech use case on Amazon SageMaker

In 2021, the pharmaceutical industry generated $550 billion in US revenue. Pharmaceutical companies sell a variety of different, often novel, drugs on the market, where sometimes unintended but serious adverse events can occur.

These events can be reported anywhere, from hospitals or at home, and must be responsibly and efficiently monitored. Traditional manual processing of adverse events is made challenging by the increasing amount of health data and costs. Pharmacovigilance activities were projected to cost the healthcare industry $384 billion by 2022. To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.

In this post, we show how to develop an ML-driven solution using Amazon SageMaker for detecting adverse events using the publicly available Adverse Drug Reaction Dataset on Hugging Face. In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the Pubmed dataset and performs the best out of those tried.

We implemented the solution using the AWS Cloud Development Kit (AWS CDK). However, we don’t cover the specifics of building the solution in this post. For more information on the implementation of this solution, refer to Build a system for catching adverse events in real-time using Amazon SageMaker and Amazon QuickSight.

This post delves into several key areas, providing a comprehensive exploration of the following topics:

  • The data challenges encountered by AWS Professional Services
  • The landscape and application of large language models (LLMs):
    • Transformers, BERT, and GPT
    • Hugging Face
  • The fine-tuned LLM solution and its components:
    • Data preparation
    • Model training

Data challenge

Data skew is often a problem in classification tasks. You would ideally like to have a balanced dataset, and this use case is no exception.

We address this skew with generative AI models (Falcon-7B and Falcon-40B), which were prompted to generate event samples based on five examples from the training set to increase the semantic diversity and increase the sample size of labeled adverse events. The Falcon models are advantageous here because, unlike some LLMs on Hugging Face, their training dataset is published, so you can be sure that none of your test set examples are contained within the Falcon training set and avoid data contamination.
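The following is a minimal sketch of how such few-shot generation could be invoked against a Falcon model hosted on a SageMaker endpoint; the endpoint name, prompt, and payload format are illustrative assumptions based on typical SageMaker JumpStart text-generation containers, not the exact code used in this solution.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Five labeled adverse-event examples from the training set (illustrative)
examples = [
    "The patient developed severe hives after the second dose.",
    "She reported dizziness and nausea within an hour of taking the tablet.",
    "He experienced acute liver toxicity attributed to the medication.",
    "The infusion was stopped after the patient went into anaphylactic shock.",
    "Prolonged QT interval was observed following drug administration.",
]

# Few-shot prompt asking the model to continue the list with a new sample
prompt = (
    "The following are reports of adverse drug events:\n- "
    + "\n- ".join(examples)
    + "\n- "
)

response = runtime.invoke_endpoint(
    EndpointName="falcon-40b-instruct-endpoint",   # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 64, "temperature": 0.8}}),
)
print(json.loads(response["Body"].read()))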

The other data challenge for healthcare customers is HIPAA compliance requirements. Encryption at rest and in transit must be incorporated into the solution to meet these requirements.

Transformers, BERT, and GPT

The transformer architecture is a neural network architecture that is used for natural language processing (NLP) tasks. It was first introduced in the paper “Attention Is All You Need” by Vaswani et al. (2017). The transformer architecture is based on the attention mechanism, which allows the model to learn long-range dependencies between words. Transformers, as laid out in the original paper, consist of two main components: the encoder and the decoder. The encoder takes the input sequence as input and produces a sequence of hidden states. The decoder then takes these hidden states as input and produces the output sequence. The attention mechanism is used in both the encoder and the decoder. The attention mechanism allows the model to attend to specific words in the input sequence when generating the output sequence. This allows the model to learn long-range dependencies between words, which is essential for many NLP tasks, such as machine translation and text summarization.

One of the more popular and useful of the transformer architectures, Bidirectional Encoder Representations from Transformers (BERT), is a language representation model that was introduced in 2018. BERT is trained on sequences where some of the words in a sentence are masked, and it has to fill in those words taking into account both the words before and after the masked words. BERT can be fine-tuned for a variety of NLP tasks, including question answering, natural language inference, and sentiment analysis.

The other popular transformer architecture that has taken the world by storm is Generative Pre-trained Transformer (GPT). The first GPT model was introduced in 2018 by OpenAI. It works by being trained to strictly predict the next word in a sequence, only aware of the context before the word. GPT models are trained on a massive dataset of text and code, and they can be fine-tuned for a range of NLP tasks, including text generation, question answering, and summarization.

In general, BERT is better at tasks that require deeper understanding of the context of words, whereas GPT is better suited for tasks that require generating text.
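The difference is easy to see with the Hugging Face Transformers pipelines: a BERT-style model fills in a masked token using context on both sides, while a GPT-style model continues a prompt from left to right. A quick illustrative sketch:

from transformers import pipeline

# BERT-style masked language modeling: uses context before and after the mask
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The patient was treated for an adverse [MASK] reaction."))

# GPT-style causal language modeling: predicts the next words from left context only
generate = pipeline("text-generation", model="gpt2")
print(generate("The patient was treated for an adverse", max_new_tokens=10))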

Hugging Face

Hugging Face is an artificial intelligence company that specializes in NLP. It provides a platform with tools and resources that enable developers to build, train, and deploy ML models focused on NLP tasks. One of the key offerings of Hugging Face is its library, Transformers, which includes pre-trained models that can be fine-tuned for various language tasks such as text classification, translation, summarization, and question answering.

Hugging Face integrates seamlessly with SageMaker, which is a fully managed service that enables developers and data scientists to build, train, and deploy ML models at scale. This synergy benefits users by providing a robust and scalable infrastructure to handle NLP tasks with the state-of-the-art models that Hugging Face offers, combined with the powerful and flexible ML services from AWS. You can also access Hugging Face models directly from Amazon SageMaker JumpStart, making it convenient to start with pre-built solutions.

Solution overview

We used the Hugging Face Transformers library to fine-tune transformer models on SageMaker for the task of adverse event classification. The training job is built using the SageMaker PyTorch estimator. SageMaker JumpStart also has some complementary integrations with Hugging Face that make it straightforward to implement. In this section, we describe the major steps involved in data preparation and model training.

Data preparation

We used the Adverse Drug Reaction dataset (ade_corpus_v2) from Hugging Face with an 80/20 training/test split. The required data structure for our model training and inference has two columns:

  • One column for text content as model input data.
  • Another column for the label class. We have two possible classes for a text: Not_AE and Adverse_Event.
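The following is a minimal sketch of loading this dataset and producing the 80/20 split with the Hugging Face datasets library; the configuration name and label mapping shown are assumptions about the classification subset of the corpus.

from datasets import load_dataset

# Load the binary classification subset of the ADE corpus (configuration name assumed)
dataset = load_dataset("ade_corpus_v2", "Ade_corpus_v2_classification")

# 80/20 training/test split
split = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = split["train"], split["test"]

# Each record has a text column and a label column (mapped in this post to Not_AE / Adverse_Event)
print(train_ds[0])
print(train_ds.features)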

Model training and experimentation

In order to efficiently explore the space of possible Hugging Face models to fine-tune on our combined data of adverse events, we constructed a SageMaker hyperparameter optimization (HPO) job and passed in different Hugging Face models as a hyperparameter, along with other important hyperparameters such as training batch size, sequence length, and learning rate. The training jobs used an ml.p3dn.24xlarge instance and took an average of 30 minutes per job with that instance type. Training metrics were captured through the Amazon SageMaker Experiments tool, and each training job ran through 10 epochs.

We specify the following in our code:

  • Training batch size – Number of samples that are processed together before the model weights are updated
  • Sequence length – Maximum length of the input sequence that BERT can process
  • Learning rate – How quickly the model updates its weights during training
  • Models – Hugging Face pretrained models
# we use the Hyperparameter Tuner
from sagemaker.tuner import ContinuousParameter, CategoricalParameter, HyperparameterTuner

tuning_job_name = 'ade-hpo'

# Define exploration boundaries
hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(5e-6, 5e-4),
    'max_seq_length': CategoricalParameter(['16', '32', '64', '128', '256']),
    'train_batch_size': CategoricalParameter(['16', '32', '64', '128', '256']),
    'model_name': CategoricalParameter([
        "emilyalsentzer/Bio_ClinicalBERT",
        "dmis-lab/biobert-base-cased-v1.2",
        "monologg/biobert_v1.1_pubmed",
        "pritamdeka/BioBert-PubMed200kRCT",
        "saidhr20/pubmed-biobert-text-classification"
    ])
}

# create Optimizer
Optimizer = HyperparameterTuner(
    estimator=bert_estimator,
    hyperparameter_ranges=hyperparameter_ranges,
    base_tuning_job_name=tuning_job_name,
    objective_type='Maximize',
    objective_metric_name='f1',
    metric_definitions=[
        {'Name': 'f1',
         'Regex': "f1: ([0-9.]+).*$"}],
    max_jobs=40,
    max_parallel_jobs=4,
)

Optimizer.fit({'training': inputs_data}, wait=False)

Results

The model that performed the best in our use case was the monologg/biobert_v1.1_pubmed model hosted on Hugging Face, which is a version of the BERT architecture that has been pre-trained on the Pubmed dataset, which consists of 19,717 scientific publications. Pre-training BERT on this dataset gives this model extra expertise when it comes to identifying context around medically related scientific terms. This boosts the model’s performance for the adverse event detection task because it has been pre-trained on medically specific syntax that shows up often in our dataset.

The following table summarizes our evaluation metrics.

| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Base BERT | 0.87 | 0.95 | 0.91 |
| BioBert | 0.89 | 0.95 | 0.92 |
| BioBERT with HPO | 0.89 | 0.96 | 0.929 |
| BioBERT with HPO and synthetically generated adverse event | 0.90 | 0.96 | 0.933 |

Although these are relatively small and incremental improvements over the base BERT model, this nevertheless demonstrates some viable strategies to improve model performance through these methods. Synthetic data generation with Falcon seems to hold a lot of promise and potential for performance improvements, especially as these generative AI models get better over time.

Clean up

To avoid incurring future charges, delete the resources you created, such as the model and model endpoint, using the following code:

# Delete resources
model_predictor.delete_model()
model_predictor.delete_endpoint()

Conclusion

Many pharmaceutical companies today would like to automate the process of identifying adverse events from their customer interactions in a systematic way in order to help improve customer safety and outcomes. As we showed in this post, the fine-tuned LLM BioBERT with synthetically generated adverse events added to the data classifies the adverse events with high F1 scores and can be used to build a HIPAA-compliant solution for our customers.

As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.


About the authors

Zack Peterson is a data scientist in AWS Professional Services. He has been hands on delivering machine learning solutions to customers for many years and has a master’s degree in Economics.

Dr. Adewale Akinfaderin is a senior data scientist in Healthcare and Life Sciences at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global healthcare customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in Physics and a doctorate degree in Engineering.

Ekta Walia Bhullar, PhD, is a senior AI/ML consultant with the AWS Healthcare and Life Sciences (HCLS) Professional Services business unit. She has extensive experience in the application of AI/ML within the healthcare domain, especially in radiology. Outside of work, when not discussing AI in radiology, she likes to run and hike.

Han Man is a Senior Data Science & Machine Learning Manager with AWS Professional Services based in San Diego, CA. He has a PhD in Engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today, he is passionately working with key customers from a variety of industry verticals to develop and implement ML and generative AI solutions on AWS.

Read More

Announcing support for Llama 2 and Mistral models and streaming responses in Amazon SageMaker Canvas

Announcing support for Llama 2 and Mistral models and streaming responses in Amazon SageMaker Canvas

Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service for building and deploying machine learning (ML) models without the need to write any code. Ready-to-use Foundation Models (FMs) available in SageMaker Canvas enable customers to use generative AI for tasks such as content generation and summarization.

We are thrilled to announce the latest updates to Amazon SageMaker Canvas, which bring exciting new generative AI capabilities to the platform. With support for Meta Llama 2 and Mistral.AI models and the launch of streaming responses, SageMaker Canvas continues to empower everyone who wants to get started with generative AI without writing a single line of code. In this post, we discuss these updates and their benefits.

Introducing Meta Llama 2 and Mistral models

Llama 2 is a cutting-edge foundation model by Meta that offers improved scalability and versatility for a wide range of generative AI tasks. Users have reported that Llama 2 is capable of engaging in meaningful and coherent conversations, generating new content, and extracting answers from existing notes. Llama 2 is among the state-of-the-art large language models (LLMs) available today for the open source community to build their own AI-powered applications.

Mistral.AI, a leading French AI start-up, has developed Mistral 7B, a powerful language model with 7.3 billion parameters. Mistral models have been very well received by the open-source community thanks to their use of grouped-query attention (GQA) for faster inference, which makes them highly efficient and comparable in performance to models with two or three times the number of parameters.

Today, we are excited to announce that SageMaker Canvas now supports three Llama 2 model variants and two Mistral 7B variants.

To test these models, navigate to the SageMaker Canvas Ready-to-use models page, then choose Generate, extract and summarize content. This is where you’ll find the SageMaker Canvas GenAI chat experience. Here, you can use any model from Amazon Bedrock or SageMaker JumpStart by selecting it from the model drop-down menu.

In our case, we choose one of the Llama 2 models. Now you can provide your input or query. As you send the input, SageMaker Canvas forwards your input to the model.

Choosing which of the models available in SageMaker Canvas best fits your use case requires you to consider information about the models themselves: the Llama-2-70B-chat model is a bigger model (70 billion parameters, compared to 13 billion with Llama-2-13B-chat), which means that its performance is generally higher than that of the smaller one, at the cost of slightly higher latency and an increased cost per token. Mistral-7B offers performance comparable to Llama-2-7B or Llama-2-13B, but it is hosted on Amazon SageMaker, so the pricing model is different, moving from a dollar-per-token model to a dollar-per-hour model. This can be more cost-effective with a significant number of requests per hour and consistent usage at scale. All of the models above can perform well on a variety of use cases, so our suggestion is to evaluate which model best solves your problem, considering output quality, throughput, and cost trade-offs.
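
To make the per-token versus per-hour trade-off concrete, the following back-of-the-envelope sketch computes the request volume at which an hourly-priced endpoint becomes cheaper. All numbers are hypothetical placeholders, not actual Amazon Bedrock or SageMaker pricing; substitute the published rates for your Region and instance type:

# All prices below are hypothetical placeholders, not actual AWS pricing
price_per_1k_tokens = 0.002      # $ per 1,000 tokens for a token-priced model (placeholder)
price_per_hour = 4.00            # $ per hour for a SageMaker-hosted endpoint (placeholder)
avg_tokens_per_request = 1500    # prompt plus completion tokens per request (assumption)

cost_per_request = price_per_1k_tokens * avg_tokens_per_request / 1000
break_even_requests_per_hour = price_per_hour / cost_per_request
print(f"Break-even volume: about {break_even_requests_per_hour:.0f} requests per hour")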

If you’re looking for a straightforward way to compare how models behave, SageMaker Canvas natively provides this capability in the form of model comparisons. You can select up to three different models and send the same query to all of them at once. SageMaker Canvas then gets the responses from each model and shows them in a side-by-side chat UI. To do this, choose Compare and choose the other models to compare against, as shown in the following screenshot:

Introducing response streaming: Real-time interactions and enhanced performance

One of the key advancements in this release is the introduction of streamed responses. Streaming provides a richer, more natural chat experience: users receive feedback as the response is generated instead of waiting for the full answer, which makes interactions more responsive and improves overall satisfaction with the chatbot. The ability to receive immediate responses in a chat-like manner creates a more natural conversation flow.

With this feature, you can now interact with your AI models in real time, receiving instant responses and enabling seamless integration into a variety of applications and workflows. All models that can be queried in SageMaker Canvas—from Amazon Bedrock and SageMaker JumpStart—can stream responses to the user.

Get started today

Whether you’re building a chatbot, recommendation system, or virtual assistant, the Llama 2 and Mistral models combined with streamed responses bring enhanced performance and interactivity to your projects.

To use the latest features of SageMaker Canvas, make sure to delete and recreate the app. To do that, log out from the app by choosing Log out, then open SageMaker Canvas again. You should see the new models and enjoy the latest releases. Logging out of the SageMaker Canvas application will release all resources used by the workspace instance, therefore avoiding incurring additional unintended charges.

Conclusion

To get started with the new streamed responses for the Llama 2 and Mistral models in SageMaker Canvas, visit the SageMaker console and explore the intuitive interface. To learn more about how SageMaker Canvas and generative AI can help you achieve your business goals, refer to Empower your business users to extract insights from company documents using Amazon SageMaker Canvas and Generative AI and Overcoming common contact center challenges with generative AI and Amazon SageMaker Canvas.

If you want to learn more about SageMaker Canvas features and deep dive on other ML use cases, check out the other posts available in the SageMaker Canvas category of the AWS ML Blog. We can’t wait to see the amazing AI applications you will create with these new capabilities!


About the authors

Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML. He is based in Brussels and works closely with customers all around the globe that are looking to adopt Low-Code/No-Code Machine Learning technologies, and Generative AI. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Dan Sinnreich is a Senior Product Manager at AWS, helping to democratize low-code/no-code machine learning. Prior to AWS, Dan built and commercialized enterprise SaaS platforms and time-series models used by institutional investors to manage risk and construct optimal portfolios. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Read More

How HSR.health is limiting risks of disease spillover from animals to humans using Amazon SageMaker geospatial capabilities

How HSR.health is limiting risks of disease spillover from animals to humans using Amazon SageMaker geospatial capabilities

This is a guest post co-authored by Ajay K Gupta, Jean Felipe Teotonio and Paul A Churchyard from HSR.health.

HSR.health is a geospatial health risk analytics firm whose vision is that global health challenges are solvable through human ingenuity and the focused and accurate application of data analytics. In this post, we present one approach for zoonotic disease prevention that uses Amazon SageMaker geospatial capabilities to create a tool that provides more accurate disease spread information to health scientists to help them save more lives, quicker.

Zoonotic diseases affect both animals and humans. The transition of a disease from animal to human, known as spillover, is a phenomenon that continually occurs on our planet. According to health organizations such as the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), a spillover event at a wet market in Wuhan, China most likely caused the coronavirus disease 2019 (COVID-19). Studies suggest that a virus found in fruit bats underwent significant mutations, allowing it to infect humans. The initial patient, or ‘patient zero’, for COVID-19 probably started a subsequent local outbreak that eventually spread internationally. HSR.health’s Zoonotic Spillover Risk Index aims to assist in the identification of these early outbreaks before they cross international borders and lead to widespread global impact.

The main weapon public health has against the propagation of regional outbreaks is disease surveillance: an entire interlocking system of disease reporting, investigation, and data communication between different levels of a public health system. This system is dependent not only on human factors, but also on technology and resources to collect disease data, analyze patterns, and create a consistent and continuous stream of data transfer from local to regional to central health authorities.

The speed at which COVID-19 went from a local outbreak to a global disease present on every continent should be a sobering example of the dire need to harness innovative technology to create more efficient and accurate disease surveillance systems.

The risk of zoonotic disease spillover is sharply correlated with multiple social, environmental, and geographic factors that influence how often human beings interact with wildlife. HSR.health’s Zoonotic Disease Spillover Risk Index uses over 20 distinct geographic, social, and environmental factors historically known to affect the risk of human-wildlife interaction and therefore zoonotic disease spillover risk. Many of these factors can be mapped through a combination of satellite imagery and remote sensing.

In this post, we explore how HSR.health uses SageMaker geospatial capabilities to retrieve relevant features from satellite imagery and remote sensing for developing the risk index. SageMaker geospatial capabilities make it easy for data scientists and machine learning (ML) engineers to build, train, and deploy models using geospatial data. With SageMaker geospatial capabilities, you can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained ML models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.

Using ML and geospatial data for risk mitigation

ML is highly effective for anomaly detection on spatial or temporal data due to its ability to learn from data without being explicitly programmed to identify specific types of anomalies. Spatial data, which relates to the physical position and shape of objects, often contains complex patterns and relationships that may be difficult for traditional algorithms to analyze.

Incorporating ML with geospatial data enhances the capability to detect anomalies and unusual patterns systematically, which is essential for early warning systems. These systems are crucial in fields such as environmental monitoring, disaster management, and security. Predictive modeling using historical geospatial data allows organizations to identify and prepare for potential future events. These events range from natural disasters and traffic disruptions to, as this post discusses, disease outbreaks.

Detecting Zoonotic spillover risks

To predict zoonotic spillover risks, HSR.health has adopted a multimodal approach. By using a blend of data types—including environmental, biogeographical, and epidemiological information—this method enables a comprehensive assessment of disease dynamics. Such a multifaceted perspective is critical for developing proactive measures and enabling a rapid response to outbreaks.

The approach includes the following components:

  • Disease and outbreak data – HSR.health uses the extensive disease and outbreak data provided by Gideon and the World Health Organization (WHO), two trusted sources of global epidemiological information. This data serves as a fundamental pillar in the analytics framework. For Gideon, the data can be accessed through an API, and for the WHO, HSR.health has built a large language model (LLM) to mine outbreak data from past disease outbreak reports.
  • Earth observation data – Environmental factors, land use analysis, and detection of habitat changes are integral to assessing zoonotic risk. These insights can be derived from satellite-based earth observation data. HSR.health is able to streamline the use of earth observation data by using SageMaker geospatial capabilities to access and manipulate large-scale geospatial datasets. SageMaker geospatial offers a rich data catalog, including datasets from USGS Landsat-8, Sentinel-1, Sentinel-2, and others. It is also possible to bring in other datasets, such as high-resolution imagery from Planet Labs.
  • Social determinants of risk – Beyond biological and environmental factors, the team at HSR.health also considered social determinants, which encompass various socioeconomic and demographic indicators, and play a pivotal role in shaping zoonotic spillover dynamics.

From these components, HSR.health evaluated a range of different factors, and the following features have been identified as influential for identifying zoonotic spillover risks:

  • Animal habitats and habitable zones – Understanding the habitats of potential zoonotic hosts and their habitable zones is fundamental to assessing transmission risk.
  • Population centers – Proximity to densely populated areas is a key consideration because it influences the likelihood of human-animal interactions.
  • Loss of habitat – The degradation of natural habitats, particularly through deforestation, can accelerate zoonotic spillover events.
  • Human-wildland interface – Areas where human settlements intersect with wildlife habitats are potential hotspots for zoonotic transmission.
  • Social characteristics – Socioeconomic and cultural factors can significantly impact zoonotic risk, and HSR.health examines these as well.
  • Human health characteristics – The health status of local human populations is an essential variable because it affects susceptibility and transmission dynamics.

Solution overview

HSR.health’s workflow encompasses data preprocessing, feature extraction, and the creation of informative visualizations using ML techniques. This allows for a clear understanding of the data’s evolution from its raw form to actionable insights.

The following is a visual representation of the workflow, starting with input data from Gideon, earth observation data, and social determinant of risk data.

Solution overview

Retrieve and process satellite imagery using SageMaker geospatial capabilities

Satellite data forms a cornerstone of the analysis performed to build the risk index, providing critical information on environmental changes. To generate insights from satellite imagery, HSR.health uses Earth Observation Jobs (EOJs). EOJs enable the acquisition and transformation of raster data gathered from the Earth’s surface. An EOJ obtains satellite imagery from a designated data source—for instance, a satellite constellation—over a specific area and time period. It then applies one or more models to the retrieved images.

Additionally, Amazon SageMaker Studio offers a geospatial notebook pre-installed with commonly-used geospatial libraries. This notebook enables direct visualization and processing of geospatial data within a Python notebook environment. EOJs can be created in the geospatial notebook environment.

To configure an EOJ, the following parameters are used:

  • InputConfig – The input configuration specifies the data sources and the filtering criteria to be used during data acquisition:
    • RasterDataCollectionArn – Specifies the satellite from which to collect data.
    • AreaOfInterest – The geographical area of interest (AOI) defines the polygon boundaries for image collection.
    • TimeRangeFilter – The time range of interest: {StartTime: <string>, EndTime: <string>}.
    • PropertyFilters – Additional property filters, such as acceptable percentage of cloud coverage or desired sun azimuth angles.
  • JobConfig – This configuration defines the type of job to be applied to the retrieved satellite image data. It supports operations such as band math, resampling, geomosaic, or cloud removal.

The following example code demonstrates running an EOJ for cloud removal, representative of the steps performed by HSR.health:

import boto3

# SageMaker geospatial client; execution_role is the ARN of a SageMaker execution role
# with geospatial permissions, defined earlier in the workflow
geospatial_client = boto3.client("sagemaker-geospatial")

eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": "arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8",
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [
                            [-76.23240119828894, -6.268815697653608],
                            [-76.23240119828894, -6.339419992332921],
                            [-76.13834453776985, -6.339419992332921],
                            [-76.13834453776985, -6.268815697653608],
                            [-76.23240119828894, -6.268815697653608]
                        ]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2022-03-01T00:00:00Z",
            "EndTime": "2022-06-30T23:59:59Z",
        },
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0.0, "UpperBound": 2.0}}}],
            "LogicalOperator": "AND",
        },
    }
}

eoj_job_config = {
    "CloudRemovalConfig": {
        "AlgorithmName": "INTERPOLATION",
        "InterpolationValue": "-9999",
        "TargetBands": ["red", "green", "blue", "nir", "swir16"],
    }
}

eoj = geospatial_client.start_earth_observation_job(
    Name="eoj-analysis-loreto",
    InputConfig=eoj_input_config,
    JobConfig=eoj_job_config,
    ExecutionRoleArn=execution_role,
)

HSR.health used several operations to preprocess the data and extract relevant features. These include operations such as land cover classification, mapping temperature variation, and computing vegetation indices.

One vegetation index relevant for indicating vegetation health is the Normalized Difference Vegetation Index (NDVI). The NDVI quantifies vegetation health by using near-infrared light, which vegetation reflects, and red light, which vegetation absorbs. Monitoring the NDVI over time can reveal changes in vegetation, such as the impact of human activities like deforestation.

The following code snippet demonstrates how to calculate a vegetation index like the NDVI based on the data that has been passed through cloud removal:

eoj_input_config = {
    "PreviousEarthObservationJobArn": eoj["Arn"]
}
eoj_job_config = {
  "BandMathConfig": {
    "CustomIndices": {
        "Operations": [
            {
                "Equation": "(nir - red) / (nir + red)",
                "Name": "ndvi",
                "OutputType": "FLOAT32"
            }
        ]
    }
  }
}
eoj = geospatial_client.start_earth_observation_job(
    Name="eoj-vi-ndvi",
    InputConfig=eoj_input_config,
    JobConfig=eoj_job_config,
    ExecutionRoleArn=execution_role,
)

EOJ visualization

We can visualize the job output using SageMaker geospatial capabilities. SageMaker geospatial capabilities can help you overlay model predictions on a base map and provide layered visualization to make collaboration easier. With the GPU-powered interactive visualizer and Python notebooks, it’s possible to explore millions of data points in one view, facilitating the collaborative exploration of insights and results.

The steps outlined in this post demonstrate just one of the many raster-based features that HSR.health has extracted to create the risk index.

Combining raster-based features with health and social data

After extracting the relevant features in raster format, HSR.health used zonal statistics to aggregate the raster data within the administrative boundary polygons to which the social and health data are assigned. The analysis incorporates a combination of raster and vector geospatial data. This kind of aggregation allows for the management of raster data in a geodataframe, which facilitates its integration with the health and social data to produce the final risk index.

The following code snippet demonstrates how to aggregate raster data to administrative vector boundaries:

import geopandas as gp
import numpy as np
import pandas as pd
import rasterio
from rasterstats import zonal_stats

def get_proportions(inRaster, inVector, classDict, idCols, year):
    # Reading In Vector File
    if '.parquet' in inVector:
        vector = gp.read_parquet(inVector)
    else:
        vector = gp.read_file(inVector)
    raster = rasterio.open(inRaster)
    vector = vector.to_crs(raster.crs)
    # Retrieving the Bounding Box for the Raster Image
    xmin, ymin, xmax, ymax = raster.bounds
    # Selecting the Vector Features that Intersect with the Raster Bounding Box
    vector = vector.cx[xmin:xmax, ymin:ymax]
    vector = vector.reset_index()
    # Calculate the sum of pixels of each class in the vector geometries
    stats = zonal_stats(vector.geometry, raster.read(1), affine=raster.transform, nodata=raster.nodata, categorical=True)
    # Creating a dataframe with the class sum of pixels and the id fields of the vector geometries
    df1 = pd.DataFrame(data=stats)
    df1 = df1.fillna(0)
    df1['totalpixels'] = df1.sum(axis=1)  
    df1['year'] = year 
    if 'year' in vector.columns.tolist():
        vector = vector.drop(columns=['year'])
    # Merging the class sum of pixels dataframe with the vector geodataframe
    df = vector.merge(df1, left_index=True, right_index=True)
    # Renaming Columns
    cdict = pd.read_csv(classDict)
    cdict = cdict.set_index("Value")['Class_name'].to_dict()
    df = df.rename(columns=cdict)
    keptCols = [x for x in df.columns.tolist() if x in idCols + list(cdict.values()) + ['totalpixels', 'year']]
    df = df[keptCols]
    return(df)

def aggregateData(rasterList, inVector, classDict, idCols, years):
    dfList = []
    # Creating aggregated raster to vector geodataframes for all rasters in rasterList
    for tiff in rasterList:
        inRaster = tiff
        year = [x for x in years if x in tiff][0]
        dfList.append(get_proportions(inRaster, inVector, classDict, idCols, year))
    # Concating into a single geodataframe
    allDf = pd.concat(dfList, ignore_index=True)
    classDictDf = pd.read_csv(classDict)
    # Renaming the numerical values of the categories to the string version of the category name
    classCols = classDictDf['Class_name'].unique().tolist()
    # Summing the pixel counts by administrative division as a single administrative division might cover more than one raster image
    for col in classCols:
        allDf[col] = allDf[col].fillna(0)
        allDf[col] = allDf.groupby(idCols + ['year'])[col].transform(lambda x: x.sum())
    # Removing Duplicates from the dataframe
    allDf = allDf.groupby(idCols + ['year']).first().reset_index()
    # Reattaching the geometry to the aggregated raster data
    if '.parquet' in inVector:
        vector = gp.read_parquet(inVector)
    else:
        vector = gp.read_file(inVector)
    allDf = vector.merge(allDf, on=idCols)
    return(allDf)

To evaluate the extracted features effectively, ML models are used to predict factors representing each feature. One of the models used is a support vector machine (SVM). The SVM model assists in revealing patterns and associations within data that inform risk assessments.
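
As an illustration of how such a model could be trained, the following is a minimal scikit-learn sketch; the feature columns and the spillover label are hypothetical stand-ins for the factors described earlier, not HSR.health’s actual training pipeline:

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical feature columns aggregated per administrative division
feature_cols = ["forest_loss", "population_density", "human_wildland_interface"]
X = allDf[feature_cols].fillna(0)
y = allDf["spillover_label"]  # hypothetical binary label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scale, so standardize before fitting
svm_model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
svm_model.fit(X_train, y_train)
print(f"Test accuracy: {svm_model.score(X_test, y_test):.2f}")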

The index represents a quantitative assessment of risk levels, calculated as a weighted average of these factors, to aid in understanding potential spillover events in various regions.

import pandas as pd
import numpy as np
import geopandas as gp

def finalIndicatorCalculation(inputLayer, weightDictionary, outLayer):
    # Creating a dictionary with the weights for each factor in the indicator
    weightsDict = pd.read_csv(weightDictionary).set_index('metric')['weight'].to_dict()
    # Reading in the data from the layer
    layer = gp.read_file(inputLayer)
    # Initializing the Sum of the Weights
    layer['sumweight'] = 0
    # Calculating the sum of the weighted factors
    for col in weightsDict.keys():
        layer[col] = layer[col].fillna(0)
        layer['sumweight'] = layer['sumweight'] + (layer[col] * weightsDict[col])
    # Calculating Raw Zoonotic Spillover Risk Index
    layer['raw_idx'] = np.log(layer['e_pop']) * layer['sumweight']
    # Normalizing the Index between 0 and 100
    layer['zs_idx'] = ((layer['raw_idx'] - layer['raw_idx'].min()) / (layer['raw_idx'].max() - layer['raw_idx'].min()) * 100).round(2)
    return(layer)

The following figure on the left shows the aggregation of the image classification from the test area scene in northern Peru, aggregated to the district administrative level, with the calculated change in forest area between 2018 and 2023. Deforestation is one of the key factors that determine the risk of zoonotic spillover. The figure on the right highlights the zoonotic spillover risk severity levels within the regions covered, ranging from the highest (red) to the lowest (dark green) risk. The area was chosen as one of the training areas for the image classification due to the diversity of land cover captured in the scene, including urban, forest, sand, water, grassland, and agriculture, among others. Additionally, this is one of many areas of interest for potential zoonotic spillover events due to deforestation and interaction between humans and animals.

Zoonotic spillover risk severity levels in northern Peru

By adopting this multimodal approach, encompassing historical disease outbreak data, Earth observation data, social determinants, and ML techniques, we can better understand and predict zoonotic spillover risk, ultimately directing disease surveillance and prevention strategies to the areas of greatest outbreak risk. The following screenshot shows a dashboard of the output from a zoonotic spillover risk analysis. This risk analysis highlights where resources and surveillance for potential new zoonotic outbreaks should be focused so that the next disease can be contained before it becomes endemic or grows into a new pandemic.

Zoonotic spillover risk analysis dashboard

A novel approach to pandemic prevention

Between the fall of 1998 and the spring of 1999, along the Nipah River in Malaysia, 265 people were infected with a then-unknown virus that caused acute encephalitis and severe respiratory distress; 105 of them died, a 39.6% fatality rate. COVID-19’s untreated fatality rate, by contrast, is 6.3%. Since then, the Nipah virus, as it is now dubbed, has transitioned out of its forest habitat and caused over 20 deadly outbreaks, mostly in India and Bangladesh.

Viruses such as Nipah surface each year, posing challenges to our daily lives, particularly in countries where establishing strong, lasting, and robust systems for disease surveillance and detection is more difficult. These detection systems are crucial for reducing the risks associated with such viruses.

Solutions that use ML and geospatial data, such as the Zoonotic Spillover Risk Index, can assist local public health authorities in prioritizing resource allocation to areas of highest risk. By doing so, they can establish targeted and localized surveillance measures to detect and halt regional outbreaks before they extend beyond borders. This approach can significantly limit the impact of a disease outbreak and save lives.

Conclusion

This post demonstrated how HSR.health successfully developed the Zoonotic Spillover Risk Index by integrating geospatial data, health, social determinants, and ML. By using SageMaker, the team created a scalable workflow that can pinpoint the most substantial threats of a potential future pandemic. Effective management of these risks can lead to a reduction in the global disease burden. The substantial economic and social advantages of reducing pandemic risk cannot be overstated, with benefits extending regionally and globally.

HSR.health used SageMaker geospatial capabilities for an initial implementation of the Zoonotic Spillover Risk Index and is now seeking partnerships, as well as support from host countries and funding sources, to develop the index further and extend its application to additional regions around the world. For more information about HSR.health and the Zoonotic Spillover Risk Index, visit www.hsr.health.

Discover the potential of integrating Earth observation data into your healthcare initiatives by exploring SageMaker geospatial features. For more information, refer to Amazon SageMaker geospatial capabilities, or engage with additional examples to get hands-on experience.


About the Authors

Ajay K Gupta is Co-Founder and CEO of HSR.health, a firm that disrupts and innovates health risk analytics through geospatial tech and AI techniques to predict the spread and severity of disease, and provides these insights to industry, governments, and the health sector so they can anticipate, mitigate, and take advantage of future risks. Outside of work, you can find Ajay behind the mic bursting eardrums while belting out his favorite pop music tunes from U2, Sting, George Michael, or Imagine Dragons.

Jean Felipe Teotonio is a driven physician and a passionate expert in healthcare quality and infectious disease epidemiology who leads the HSR.health public health team. He works towards the shared goal of improving public health by reducing the global burden of disease, leveraging GeoAI approaches to develop solutions for the greatest health challenges of our time. Outside of work, his hobbies include reading sci-fi books, hiking, the English Premier League, and playing bass guitar.

Paul A Churchyard, CTO and Chief Geospatial Engineer for HSR.health, uses his broad technical skills and expertise to build the core infrastructure for the firm as well as its patented and proprietary GeoMD Platform. Additionally, he and the data science team incorporate geospatial analytics and AI/ML techniques into all health risk indices HSR.health produces. Outside of work, Paul is a self-taught DJ and loves snow.

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in geospatial AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions that capitalize on geospatial data. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in complex domains such as autonomous driving.

Emmett Nelson is an Account Executive at AWS supporting Nonprofit Research customers across the Healthcare & Life Sciences, Earth / Environmental Sciences, and Education verticals. His primary focus is enabling use cases across analytics, AI/ML, high performance computing (HPC), genomics, and medical imaging. Emmett joined AWS in 2020 and is based in Austin, TX.

Read More

Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpStart

Monitor embedding drift for LLMs deployed from Amazon SageMaker JumpStart

One of the most useful application patterns for generative AI workloads is Retrieval Augmented Generation (RAG). In the RAG pattern, we find pieces of reference content related to an input prompt by performing similarity searches on embeddings. Embeddings capture the information content in bodies of text, allowing natural language processing (NLP) models to work with language in a numeric form. Embeddings are just vectors of floating point numbers, so we can analyze them to help answer three important questions: Is our reference data changing over time? Are the questions users are asking changing over time? And finally, how well is our reference data covering the questions being asked?

In this post, you’ll learn about some of the considerations for embedding vector analysis and detecting signals of embedding drift. Because embeddings are an important source of data for NLP models in general and generative AI solutions in particular, we need a way to measure whether our embeddings are changing over time (drifting). You’ll see an example of performing drift detection on embedding vectors using a clustering technique with large language models (LLMs) deployed from Amazon SageMaker JumpStart. You’ll also be able to explore these concepts through two provided examples, including an end-to-end sample application or, optionally, a subset of the application.

Overview of RAG

The RAG pattern lets you retrieve knowledge from external sources, such as PDF documents, wiki articles, or call transcripts, and then use that knowledge to augment the instruction prompt sent to the LLM. This allows the LLM to reference more relevant information when generating a response. For example, if you ask an LLM how to make chocolate chip cookies, it can include information from your own recipe library. In this pattern, the recipe text is converted into embedding vectors using an embedding model, and stored in a vector database. Incoming questions are converted to embeddings, and then the vector database runs a similarity search to find related content. The question and the reference data then go into the prompt for the LLM.
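
Conceptually, the retrieval step is a nearest-neighbor search over embedding vectors. The following minimal sketch illustrates the idea with plain NumPy and cosine similarity; embed_fn stands in for a call to the deployed embedding model, and a production system would use a vector database such as OpenSearch Service instead of an in-memory array:

import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_top_k(question, documents, doc_embeddings, embed_fn, k=3):
    # doc_embeddings: one embedding vector per document, produced by the embedding model
    q_vec = embed_fn(question)
    scores = [cosine_similarity(q_vec, d) for d in doc_embeddings]
    top_idx = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top_idx]

# The retrieved passages are then placed into the prompt sent to the LLM, for example:
# prompt = "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + question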

Let’s take a closer look at the embedding vectors that get created and how to perform drift analysis on those vectors.

Analysis on embedding vectors

Embedding vectors are numeric representations of our data so analysis of these vectors can provide insight into our reference data that can later be used to detect potential signals of drift. Embedding vectors represent an item in n-dimensional space, where n is often large. For example, the GPT-J 6B model, used in this post, creates vectors of size 4096. To measure drift, assume that our application captures embedding vectors for both reference data and incoming prompts.

We start by performing dimension reduction using Principal Component Analysis (PCA). PCA tries to reduce the number of dimensions while preserving most of the variance in the data. In this case, we try to find the number of dimensions that preserves 95% of the variance, which should capture anything within two standard deviations.

Then we use K-Means to identify a set of cluster centers. K-Means tries to group points together into clusters such that each cluster is relatively compact and the clusters are as distant from each other as possible.

We calculate the following information based on the clustering output shown in the following figure:

  • The number of dimensions in PCA that explain 95% of the variance
  • The location of each cluster center, or centroid

Additionally, we look at the proportion (higher or lower) of samples in each cluster, as shown in the following figure.

Finally, we use this analysis to calculate the following:

  • Inertia – Inertia is the sum of squared distances to cluster centroids, which measures how well the data was clustered using K-Means.
  • Silhouette score – The silhouette score is a measure for the validation of the consistency within clusters, and ranges from -1 to 1. A value close to 1 means that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters. A visual representation of the silhouette score can be seen in the following figure.

We can periodically capture this information for snapshots of the embeddings for both the source reference data and the prompts. Capturing this data allows us to analyze potential signals of embedding drift.
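
The following is a minimal sketch of this analysis using scikit-learn, assuming embeddings is a NumPy array of shape (num_vectors, embedding_dim) loaded from a snapshot of the captured embedding archive, and that the number of clusters is a tunable choice:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Reduce dimensionality while keeping 95% of the variance
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(embeddings)
print(f"Dimensions explaining 95% of variance: {pca.n_components_}")

# Cluster the reduced vectors
kmeans = KMeans(n_clusters=8, n_init=10, random_state=42)
labels = kmeans.fit_predict(reduced)

# Inertia: sum of squared distances to the assigned centroids
print(f"Inertia: {kmeans.inertia_:.2f}")
# Silhouette score: cluster consistency, between -1 and 1
print(f"Silhouette score: {silhouette_score(reduced, labels):.3f}")

# Proportion of samples in each cluster
unique, counts = np.unique(labels, return_counts=True)
print(dict(zip(unique.tolist(), (counts / counts.sum()).round(3).tolist())))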

Detecting embedding drift

Periodically, we can compare the clustering information through snapshots of the data, which includes the reference data embeddings and the prompt embeddings. First, we can compare the number of dimensions needed to explain 95% of the variation in the embedding data, the inertia, and the silhouette score from the clustering job. As you can see in the following table, compared to a baseline, the latest snapshot of embeddings requires 39 more dimensions to explain the variance, indicating that our data is more dispersed. The inertia has gone up, indicating that the samples are in aggregate farther away from their cluster centers. Additionally, the silhouette score has gone down, indicating that the clusters are not as well defined. For prompt data, that might indicate that the types of questions coming into the system are covering more topics.

Next, in the following figure, we can see how the proportion of samples in each cluster has changed over time. This can show us whether our newer reference data is broadly similar to the previous set, or covers new areas.

Finally, we can see if the cluster centers are moving, which would show drift in the information in the clusters, as shown in the following table.

Reference data coverage for incoming questions

We can also evaluate how well our reference data aligns to the incoming questions. To do this, we assign each prompt embedding to a reference data cluster. We compute the distance from each prompt to its corresponding center, and look at the mean, median, and standard deviation of those distances. We can store that information and see how it changes over time.
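
A minimal sketch of this step, assuming kmeans is the K-Means model fit on the reference data embeddings (as in the previous sketch) and prompt_reduced contains the prompt embeddings projected with the same PCA transform:

import numpy as np

# Assign each prompt embedding to its nearest reference-data cluster
assigned = kmeans.predict(prompt_reduced)

# Distance from each prompt to the centroid of its assigned cluster
centroids = kmeans.cluster_centers_[assigned]
distances = np.linalg.norm(prompt_reduced - centroids, axis=1)

print(f"Mean distance:   {distances.mean():.4f}")
print(f"Median distance: {np.median(distances):.4f}")
print(f"Std deviation:   {distances.std():.4f}")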

The following figure shows an example of analyzing the distance between the prompt embedding and reference data centers over time.

As you can see, the mean, median, and standard deviation of the distances between prompt embeddings and reference data centers are decreasing between the initial baseline and the latest snapshot. Although the absolute value of the distance is difficult to interpret, we can use the trends to determine whether the semantic overlap between reference data and incoming questions is getting better or worse over time.

Sample application

In order to gather the experimental results discussed in the previous section, we built a sample application that implements the RAG pattern using embedding and generation models deployed through SageMaker JumpStart and hosted on Amazon SageMaker real-time endpoints.

The application has three core components:

  • We use an interactive flow, which includes a user interface for capturing prompts, combined with a RAG orchestration layer, using LangChain.
  • The data processing flow extracts data from PDF documents and creates embeddings that get stored in Amazon OpenSearch Service. We also use these in the final embedding drift analysis component of the application.
  • The embeddings are captured in Amazon Simple Storage Service (Amazon S3) via Amazon Kinesis Data Firehose, and we run a combination of AWS Glue extract, transform, and load (ETL) jobs and Jupyter notebooks to perform the embedding analysis.

The following diagram illustrates the end-to-end architecture.

The full sample code is available on GitHub. The provided code is available in two different patterns:

  • Sample full-stack application with a Streamlit frontend – This provides an end-to-end application, including a user interface using Streamlit for capturing prompts, combined with the RAG orchestration layer, using LangChain running on Amazon Elastic Container Service (Amazon ECS) with AWS Fargate
  • Backend application – For those that don’t want to deploy the full application stack, you can optionally choose to only deploy the backend AWS Cloud Development Kit (AWS CDK) stack, and then use the Jupyter notebook provided to perform RAG orchestration using LangChain

To create the provided patterns, there are several prerequisites detailed in the following sections, starting with deploying the generative and text embedding models then moving on to the additional prerequisites.

Deploy models through SageMaker JumpStart

Both patterns assume the deployment of an embedding model and generative model. For this, you’ll deploy two models from SageMaker JumpStart. The first model, GPT-J 6B, is used as the embedding model and the second model, Falcon-40b, is used for text generation.

You can deploy each of these models through SageMaker JumpStart from the AWS Management Console, Amazon SageMaker Studio, or programmatically. For more information, refer to How to use JumpStart foundation models. To simplify the deployment, you can use the provided notebook derived from notebooks automatically created by SageMaker JumpStart. This notebook pulls the models from the SageMaker JumpStart ML hub and deploys them to two separate SageMaker real-time endpoints.
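
If you prefer to deploy the models programmatically, the following sketch uses the JumpStartModel class from the SageMaker Python SDK; the model IDs shown are assumptions and should be verified against the current SageMaker JumpStart model catalog before use:

from sagemaker.jumpstart.model import JumpStartModel

# Model IDs are illustrative assumptions; verify them in the SageMaker JumpStart catalog
embedding_model = JumpStartModel(model_id="huggingface-textembedding-gpt-j-6b")
embedding_predictor = embedding_model.deploy(initial_instance_count=1)

text_model = JumpStartModel(model_id="huggingface-llm-falcon-40b-instruct-bf16")
text_predictor = text_model.deploy(initial_instance_count=1)

# Record the endpoint names; they are needed later as AWS CDK context values
print(embedding_predictor.endpoint_name, text_predictor.endpoint_name)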

The sample notebook also has a cleanup section. Don’t run that section yet, because it will delete the endpoints just deployed. You will complete the cleanup at the end of the walkthrough.

After confirming successful deployment of the endpoints, you’re ready to deploy the full sample application. However, if you’re more interested in exploring only the backend and analysis notebooks, you can optionally deploy only that, which is covered in the next section.

Option 1: Deploy the backend application only

This pattern allows you to deploy the backend solution only and interact with the solution using a Jupyter notebook. Use this pattern if you don’t want to build out the full frontend interface.

Prerequisites

You should have the following prerequisites:

  • A SageMaker JumpStart model endpoint deployed – Deploy the models to SageMaker real-time endpoints using SageMaker JumpStart, as previously outlined
  • Deployment parameters – Record the following:
    • Text model endpoint name – The endpoint name of the text generation model deployed with SageMaker JumpStart
    • Embeddings model endpoint name – The endpoint name of the embedding model deployed with SageMaker JumpStart

Deploy the resources using the AWS CDK

Use the deployment parameters noted in the previous section to deploy the AWS CDK stack. For more information about AWS CDK installation, refer to Getting started with the AWS CDK.

Make sure that Docker is installed and running on the workstation that will be used for AWS CDK deployment. Refer to Get Docker for additional guidance.

$ cd pattern1-rag/cdk
$ cdk deploy BackendStack --exclusively \
    -c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
    -c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

Alternatively, you can enter the context values in a file called cdk.context.json in the pattern1-rag/cdk directory and run cdk deploy BackendStack --exclusively.
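
For example, a minimal cdk.context.json for the backend-only deployment might look like the following, with the placeholder values replaced by your SageMaker endpoint names:

{
  "textModelEndpointName": "<your-text-generation-endpoint-name>",
  "embeddingsModelEndpointName": "<your-embeddings-endpoint-name>"
}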

The deployment will print out outputs, some of which will be needed to run the notebook. Before you can start question and answering, embed the reference documents, as shown in the next section.

Embed reference documents

For this RAG approach, reference documents are first embedded with a text embedding model and stored in a vector database. In this solution, an ingestion pipeline has been built that intakes PDF documents.

An Amazon Elastic Compute Cloud (Amazon EC2) instance has been created for the PDF document ingestion and an Amazon Elastic File System (Amazon EFS) file system is mounted on the EC2 instance to save the PDF documents. An AWS DataSync task is run every hour to fetch PDF documents found in the EFS file system path and upload them to an S3 bucket to start the text embedding process. This process embeds the reference documents and saves the embeddings in OpenSearch Service. It also saves an embedding archive to an S3 bucket through Kinesis Data Firehose for later analysis.

To ingest the reference documents, complete the following steps:

  1. Retrieve the sample EC2 instance ID that was created (see the AWS CDK output JumpHostId) and connect using Session Manager, a capability of AWS Systems Manager. For instructions, refer to Connect to your Linux instance with AWS Systems Manager Session Manager.
  2. Go to the directory /mnt/efs/fs1, which is where the EFS file system is mounted, and create a folder called ingest:
    $ cd /mnt/efs/fs1
    $ mkdir ingest && cd ingest

  3. Add your reference PDF documents to the ingest directory.

The DataSync task is configured to upload all files found in this directory to Amazon S3 to start the embedding process.

The DataSync task runs on an hourly schedule; you can optionally start the task manually to start the embedding process immediately for the PDF documents you added.

  1. To start the task, locate the task ID from the AWS CDK output DataSyncTaskID and start the task with defaults.
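
If you prefer the AWS CLI to the console, the task can be started along the following lines (the ARN is the DataSyncTaskID value from the AWS CDK output):

$ aws datasync start-task-execution \
    --task-arn <DataSyncTaskID value from the AWS CDK output>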

After the embeddings are created, you can start the RAG question and answering through a Jupyter notebook, as shown in the next section.

Question and answering using a Jupyter notebook

Complete the following steps:

  1. Retrieve the SageMaker notebook instance name from the AWS CDK output NotebookInstanceName and connect to JupyterLab from the SageMaker console.
  2. Go to the directory fmops/full-stack/pattern1-rag/notebooks/.
  3. Open and run the notebook query-llm.ipynb in the notebook instance to perform question and answering using RAG.

Make sure to use the conda_python3 kernel for the notebook.

This pattern is useful to explore the backend solution without needing to provision additional prerequisites that are required for the full-stack application. The next section covers the implementation of a full-stack application, including both the frontend and backend components, to provide a user interface for interacting with your generative AI application.

Option 2: Deploy the full-stack sample application with a Streamlit frontend

This pattern allows you to deploy the solution with a user frontend interface for question and answering.

Prerequisites

To deploy the sample application, you must have the following prerequisites:

  • SageMaker JumpStart model endpoint deployed – Deploy the models to your SageMaker real-time endpoints using SageMaker JumpStart, as outlined in the previous section, using the provided notebooks.
  • Amazon Route 53 hosted zone – Create an Amazon Route 53 public hosted zone to use for this solution. You can also use an existing Route 53 public hosted zone, such as example.com.
  • AWS Certificate Manager certificate – Provision an AWS Certificate Manager (ACM) TLS certificate for the Route 53 hosted zone domain name and its applicable subdomains, such as example.com and *.example.com for all subdomains. For instructions, refer to Requesting a public certificate. This certificate is used to configure HTTPS on Amazon CloudFront and the origin load balancer.
  • Deployment parameters – Record the following:
    • Frontend application custom domain name – A custom domain name used to access the frontend sample application. The domain name provided is used to create a Route 53 DNS record pointing to the frontend CloudFront distribution; for example, app.example.com.
    • Load balancer origin custom domain name – A custom domain name used for the CloudFront distribution load balancer origin. The domain name provided is used to create a Route 53 DNS record pointing to the origin load balancer; for example, app-lb.example.com.
    • Route 53 hosted zone ID – The Route 53 hosted zone ID to host the custom domain names provided; for example, ZXXXXXXXXYYYYYYYYY.
    • Route 53 hosted zone name – The name of the Route 53 hosted zone to host the custom domain names provided; for example, example.com.
    • ACM certificate ARN – The ARN of the ACM certificate to be used with the custom domain provided.
    • Text model endpoint name – The endpoint name of the text generation model deployed with SageMaker JumpStart.
    • Embeddings model endpoint name – The endpoint name of the embedding model deployed with SageMaker JumpStart.

Deploy the resources using the AWS CDK

Use the deployment parameters you noted in the prerequisites to deploy the AWS CDK stack. For more information, refer to Getting started with the AWS CDK.

Make sure Docker is installed and running on the workstation that will be used for the AWS CDK deployment.

$ cd pattern1-rag/cdk
$ cdk deploy --all -c appCustomDomainName=<Enter Custom Domain Name to be used for Frontend App> \
    -c loadBalancerOriginCustomDomainName=<Enter Custom Domain Name to be used for Load Balancer Origin> \
    -c customDomainRoute53HostedZoneID=<Enter Route53 Hosted Zone ID for the Custom Domain being used> \
    -c customDomainRoute53HostedZoneName=<Enter Route53 Hosted Zone Name> \
    -c customDomainCertificateArn=<Enter ACM Certificate ARN for Custom Domains provided> \
    -c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
    -c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

In the preceding code, -c represents a context value, in the form of the required prerequisites, provided on input. Alternatively, you can enter the context values in a file called cdk.context.json in the pattern1-rag/cdk directory and run cdk deploy --all.

Note that we specify the Region in the file bin/cdk.ts. Configuring ALB access logs requires a specified Region. You can change this Region before deployment.

The deployment will print out the URL to access the Streamlit application. Before you can start question and answering, you need to embed the reference documents, as shown in the next section.

Embed the reference documents

For a RAG approach, reference documents are first embedded with a text embedding model and stored in a vector database. In this solution, an ingestion pipeline has been built that intakes PDF documents.

As we discussed in the first deployment option, an example EC2 instance has been created for the PDF document ingestion and an EFS file system is mounted on the EC2 instance to save the PDF documents. A DataSync task is run every hour to fetch PDF documents found in the EFS file system path and upload them to an S3 bucket to start the text embedding process. This process embeds the reference documents and saves the embeddings in OpenSearch Service. It also saves an embedding archive to an S3 bucket through Kinesis Data Firehose for later analysis.

To ingest the reference documents, complete the following steps:

  1. Retrieve the sample EC2 instance ID that was created (see the AWS CDK output JumpHostId) and connect using Session Manager.
  2. Go to the directory /mnt/efs/fs1, which is where the EFS file system is mounted, and create a folder called ingest:
    $ cd /mnt/efs/fs1
    $ mkdir ingest && cd ingest

  3. Add your reference PDF documents to the ingest directory.

The DataSync task is configured to upload all files found in this directory to Amazon S3 to start the embedding process.

The DataSync task runs on an hourly schedule. You can optionally start the task manually to start the embedding process immediately for the PDF documents you added.

  1. To start the task, locate the task ID from the AWS CDK output DataSyncTaskID and start the task with defaults.

Question and answering

After the reference documents have been embedded, you can start the RAG question and answering by visiting the URL to access the Streamlit application. An Amazon Cognito authentication layer is used, so it requires creating a user account in the Amazon Cognito user pool deployed via the AWS CDK (see the AWS CDK output for the user pool name) for first-time access to the application. For instructions on creating an Amazon Cognito user, refer to Creating a new user in the AWS Management Console.
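
If you prefer the AWS CLI over the console, a user can be created along the following lines; the user pool ID comes from the AWS CDK output, and the email address is a placeholder:

$ aws cognito-idp admin-create-user \
    --user-pool-id <user pool ID from the AWS CDK output> \
    --username user@example.com \
    --user-attributes Name=email,Value=user@example.com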

Embed drift analysis

In this section, we show you how to perform drift analysis by first creating a baseline of the reference data embeddings and prompt embeddings, and then creating a snapshot of the embeddings over time. This allows you to compare the baseline embeddings to the snapshot embeddings.

Create an embedding baseline for the reference data and prompt

To create an embedding baseline of the reference data, open the AWS Glue console and select the ETL job embedding-drift-analysis. Set the parameters for the ETL job as follows and run the job:

  • Set --job_type to BASELINE.
  • Set --out_table to the Amazon DynamoDB table for reference embedding data. (See the AWS CDK output DriftTableReference for the table name.)
  • Set --centroid_table to the DynamoDB table for reference centroid data. (See the AWS CDK output CentroidTableReference for the table name.)
  • Set --data_path to the S3 bucket with the prefix; for example, s3://<REPLACE_WITH_BUCKET_NAME>/embeddingarchive/. (See the AWS CDK output BucketName for the bucket name.)

Similarly, using the ETL job embedding-drift-analysis, create an embedding baseline of the prompts. Set the parameters for the ETL job as follows and run the job:

  • Set --job_type to BASELINE
  • Set --out_table to the DynamoDB table for prompt embedding data. (See the AWS CDK output DriftTablePromptsName for the table name.)
  • Set --centroid_table to the DynamoDB table for prompt centroid data. (See the AWS CDK output CentroidTablePrompts for the table name.)
  • Set --data_path to the S3 bucket with the prefix; for example, s3://<REPLACE_WITH_BUCKET_NAME>/promptarchive/. (See the AWS CDK output BucketName for the bucket name.)
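
The same jobs can also be started from the AWS CLI instead of the AWS Glue console; for example, the following sketch starts the reference data baseline run, assuming the job is deployed with the name shown in the console and substituting your own table and bucket names:

$ aws glue start-job-run --job-name embedding-drift-analysis \
    --arguments '{"--job_type":"BASELINE","--out_table":"<DriftTableReference value>","--centroid_table":"<CentroidTableReference value>","--data_path":"s3://<REPLACE_WITH_BUCKET_NAME>/embeddingarchive/"}'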

Create an embedding snapshot for the reference data and prompt

After you ingest additional information into OpenSearch Service, run the ETL job embedding-drift-analysis again to snapshot the reference data embeddings. The parameters will be the same as the ETL job that you ran to create the embedding baseline of the reference data as shown in the previous section, with the exception of setting the --job_type parameter to SNAPSHOT.

Similarly, to snapshot the prompt embeddings, run the ETL job embedding-drift-analysis again. The parameters will be the same as the ETL job that you ran to create the embedding baseline for the prompts as shown in the previous section, with the exception of setting the --job_type parameter to SNAPSHOT.

Compare the baseline to the snapshot

To compare the embedding baseline and snapshot for reference data and prompts, use the provided notebook pattern1-rag/notebooks/drift-analysis.ipynb.

To look at embedding comparison for reference data or prompts, change the DynamoDB table name variables (tbl and c_tbl) in the notebook to the appropriate DynamoDB table for each run of the notebook.

The notebook variable tbl should be changed to the appropriate drift table name, which can be retrieved as follows:

  • For the reference embedding data, retrieve the drift table name from the AWS CDK output DriftTableReference
  • For the prompt embedding data, retrieve the drift table name from the AWS CDK output DriftTablePromptsName

In addition, the notebook variable c_tbl should be changed to the appropriate centroid table name, which can be retrieved as follows:

  • For the reference embedding data, retrieve the centroid table name from the AWS CDK output CentroidTableReference
  • For the prompt embedding data, retrieve the centroid table name from the AWS CDK output CentroidTablePrompts
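For illustration, the assignments in the notebook for a run against the reference embedding data might look like the following; the values are placeholders for your AWS CDK output values.

# Reference embedding data run (placeholders for AWS CDK output values)
tbl = "<DriftTableReference value>"
c_tbl = "<CentroidTableReference value>"

# For a prompt embedding data run, point the same variables at the prompt
# tables instead (DriftTablePromptsName and CentroidTablePrompts)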

Analyze the prompt distance from the reference data

First, run the AWS Glue job embedding-distance-analysis. This job determines which cluster, from the K-Means evaluation of the reference data embeddings, each prompt belongs to, and then calculates the mean, median, and standard deviation of the distance from each prompt to the center of its corresponding cluster.
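To illustrate the calculation, the following sketch (not the Glue job's actual code) assigns each prompt embedding to its nearest reference-data centroid and summarizes the distances with NumPy.

import numpy as np

def summarize_prompt_distances(prompts: np.ndarray, centroids: np.ndarray) -> dict:
    # prompts: (num_prompts, dim) prompt embeddings
    # centroids: (k, dim) K-Means centroids from the reference data embeddings
    dists = np.linalg.norm(prompts[:, None, :] - centroids[None, :, :], axis=2)
    nearest_dist = dists.min(axis=1)   # distance from each prompt to its closest cluster center
    return {
        "mean": float(nearest_dist.mean()),
        "median": float(np.median(nearest_dist)),
        "std": float(nearest_dist.std()),
        "cluster": dists.argmin(axis=1),   # cluster assignment per prompt
    }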

You can run the notebook pattern1-rag/notebooks/distance-analysis.ipynb to see the trends in the distance metrics over time. This will give you a sense of the overall trend in the distribution of the prompt embedding distances.

The notebook pattern1-rag/notebooks/prompt-distance-outliers.ipynb is an AWS Glue notebook that looks for outliers, which can help you identify whether you’re getting more prompts that are not related to the reference data.

Monitor similarity scores

All similarity scores from OpenSearch Service are logged in Amazon CloudWatch under the rag namespace. The dashboard RAG_Scores shows the average score and the total number of scores ingested.
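You can also query these custom metrics with boto3, as in the following sketch; the metric name SimilarityScore is an assumption for illustration, so check the RAG_Scores dashboard or the rag namespace in CloudWatch for the actual metric names.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="rag",
    MetricName="SimilarityScore",   # assumed metric name; verify in CloudWatch
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average", "SampleCount"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["SampleCount"])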

Clean up

To avoid incurring future charges, delete all the resources that you created.

Delete the deployed SageMaker models

Refer to the cleanup section of the provided example notebook to delete the deployed SageMaker JumpStart models, or delete the models on the SageMaker console.
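If you prefer to clean up programmatically, the following is a minimal boto3 sketch; the endpoint name is a placeholder, and you would repeat the calls for both the text generation and text embeddings endpoints.

import boto3

sm = boto3.client("sagemaker")
endpoint_name = "<your JumpStart endpoint name>"   # placeholder

# Look up the endpoint config and models before deleting the endpoint
endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
config_name = endpoint["EndpointConfigName"]
config = sm.describe_endpoint_config(EndpointConfigName=config_name)

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
for variant in config["ProductionVariants"]:
    sm.delete_model(ModelName=variant["ModelName"])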

Delete the AWS CDK resources

If you entered your parameters in a cdk.context.json file, clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all

If you entered your parameters on the command line and only deployed the backend application (the backend AWS CDK stack), clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all \
    -c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
    -c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

If you entered your parameters on the command line and deployed the full solution (the frontend and backend AWS CDK stacks), clean up as follows:

$ cd pattern1-rag/cdk
$ cdk destroy --all -c appCustomDomainName=<Enter Custom Domain Name to be used for Frontend App> \
    -c loadBalancerOriginCustomDomainName=<Enter Custom Domain Name to be used for Load Balancer Origin> \
    -c customDomainRoute53HostedZoneID=<Enter Route53 Hosted Zone ID for the Custom Domain being used> \
    -c customDomainRoute53HostedZoneName=<Enter Route53 Hosted Zone Name> \
    -c customDomainCertificateArn=<Enter ACM Certificate ARN for Custom Domains provided> \
    -c textModelEndpointName=<Enter the SageMaker Endpoint Name for the Text generation model> \
    -c embeddingsModelEndpointName=<Enter the SageMaker Endpoint Name for the Text embeddings model>

Conclusion

In this post, we provided a working example of an application that captures embedding vectors for both reference data and prompts in the RAG pattern for generative AI. We showed how to perform clustering analysis to determine whether reference or prompt data is drifting over time, and how well the reference data covers the types of questions users are asking. If you detect drift, it can provide a signal that the environment has changed and your model is getting new inputs that it may not be optimized to handle. This allows for proactive evaluation of the current model against changing inputs.


About the Authors

Abdullahi Olaoye is a Senior Solutions Architect at Amazon Web Services (AWS). Abdullahi holds an MSc in Computer Networking from Wichita State University and is a published author who has held roles across various technology domains such as DevOps, infrastructure modernization, and AI. He is currently focused on generative AI and plays a key role in helping enterprises architect and build cutting-edge solutions powered by generative AI. Beyond the realm of technology, he finds joy in the art of exploration. When not crafting AI solutions, he enjoys traveling with his family to explore new places.

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She has been in technology for 24 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background into the domain of MLOps to help customers deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is a co-creator and instructor of the Practical Data Science specialization on Coursera. She is also the Co-Director of Women In Big Data (WiBD), Denver chapter. In her spare time, she likes to spend time with her family, friends, and overactive dogs.

Read More

Designing generative AI workloads for resilience

Designing generative AI workloads for resilience

Resilience plays a pivotal role in the development of any workload, and generative AI workloads are no different. There are unique considerations when engineering generative AI workloads through a resilience lens. Understanding and prioritizing resilience is crucial for generative AI workloads to meet organizational availability and business continuity requirements. In this post, we discuss the different stacks of a generative AI workload and what those considerations should be.

Full stack generative AI

Although a lot of the excitement around generative AI focuses on the models, a complete solution involves people, skills, and tools from several domains. Consider the following picture, which is an AWS view of the a16z emerging application stack for large language models (LLMs).

Taxonomy of LLM App Stack on AWS

Compared to a more traditional solution built around AI and machine learning (ML), a generative AI solution now involves the following:

  • New roles – You have to consider model tuners as well as model builders and model integrators
  • New tools – The traditional MLOps stack doesn’t extend to cover the type of experiment tracking or observability necessary for prompt engineering or agents that invoke tools to interact with other systems

Agent reasoning

Unlike traditional AI models, Retrieval Augmented Generation (RAG) allows for more accurate and contextually relevant responses by integrating external knowledge sources. The following are some considerations when using RAG:

  • Setting appropriate timeouts is important to the customer experience. Nothing says bad user experience more than being in the middle of a chat and getting disconnected.
  • Make sure to validate prompt input data and prompt input size against the character limits defined by your model.
  • If you’re performing prompt engineering, you should persist your prompts to a reliable data store. That will safeguard your prompts in case of accidental loss or as part of your overall disaster recovery strategy.
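As an illustration of the first two points, the following sketch sets an explicit client timeout and validates prompt size before invoking a model through Amazon Bedrock; the model ID, character limit, and request payload shape are assumptions for illustration, because both vary by model.

import json
import boto3
from botocore.config import Config

MAX_PROMPT_CHARS = 8000                  # assumed limit; use your model's documented limit
MODEL_ID = "example.text-model-v1"       # placeholder model ID

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    config=Config(read_timeout=30, retries={"max_attempts": 3, "mode": "adaptive"}),
)

def invoke_with_guardrails(prompt: str) -> str:
    if not prompt.strip():
        raise ValueError("Prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_CHARS} characters")
    response = bedrock_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({"prompt": prompt}),   # payload shape varies by model
    )
    return response["body"].read().decode("utf-8")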

Data pipelines

In cases where you need to provide contextual data to the foundation model using the RAG pattern, you need a data pipeline that can ingest the source data, convert it to embedding vectors, and store the embedding vectors in a vector database. This pipeline could be a batch pipeline if you prepare contextual data in advance, or a low-latency pipeline if you’re incorporating new contextual data on the fly. In the batch case, there are a couple of challenges compared to typical data pipelines.

The data sources may be PDF documents on a file system, data from a software as a service (SaaS) system like a CRM tool, or data from an existing wiki or knowledge base. Ingesting from these sources is different from the typical data sources like log data in an Amazon Simple Storage Service (Amazon S3) bucket or structured data from a relational database. The level of parallelism you can achieve may be limited by the source system, so you need to account for throttling and use backoff techniques. Some of the source systems may be brittle, so you need to build in error handling and retry logic.

The embedding model could be a performance bottleneck, regardless of whether you run it locally in the pipeline or call an external model. Embedding models are foundation models that run on GPUs and do not have unlimited capacity. If the model runs locally, you need to assign work based on GPU capacity. If the model runs externally, you need to make sure you’re not saturating the external model. In either case, the level of parallelism you can achieve will be dictated by the embedding model rather than how much CPU and RAM you have available in the batch processing system.
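To illustrate both points, the following sketch bounds parallelism with a small worker pool sized to the embedding model's capacity and retries throttled calls with exponential backoff; embed_batch is a hypothetical stand-in for whichever embedding model call you use.

import random
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4     # sized to the embedding model's capacity, not the batch host's CPUs
MAX_RETRIES = 5

def embed_batch(texts):
    # Hypothetical stand-in for a call to your embedding model or endpoint
    raise NotImplementedError

def embed_with_backoff(texts):
    for attempt in range(MAX_RETRIES):
        try:
            return embed_batch(texts)
        except Exception:
            # Exponential backoff with jitter before retrying a throttled or failed call
            time.sleep(min(2 ** attempt + random.random(), 30))
    raise RuntimeError("Embedding failed after retries")

def embed_all(batches):
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(embed_with_backoff, batches))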

In the low-latency case, you need to account for the time it takes to generate the embedding vectors. The calling application should invoke the pipeline asynchronously.

Vector databases

A vector database has two functions: store embedding vectors, and run a similarity search to find the closest k matches to a new vector. There are three general types of vector databases:

  • Dedicated SaaS options like Pinecone.
  • Vector database features built into other services. This includes native AWS services like Amazon OpenSearch Service and Amazon Aurora.
  • In-memory options that can be used for transient data in low-latency scenarios.

We don’t cover the similarity searching capabilities in detail in this post. Although they’re important, they are a functional aspect of the system and don’t directly affect resilience. Instead, we focus on the resilience aspects of a vector database as a storage system:

  • Latency – Can the vector database perform well against a high or unpredictable load? If not, the calling application needs to handle rate limiting and backoff and retry.
  • Scalability – How many vectors can the system hold? If you exceed the capacity of the vector database, you’ll need to look into sharding or other solutions.
  • High availability and disaster recovery – Embedding vectors are valuable data, and recreating them can be expensive. Is your vector database highly available in a single AWS Region? Does it have the ability to replicate data to another Region for disaster recovery purposes?

Application tier

There are three unique considerations for the application tier when integrating generative AI solutions:

  • Potentially high latency – Foundation models often run on large GPU instances and may have finite capacity. Make sure to use best practices for rate limiting, backoff and retry, and load shedding. Use asynchronous designs so that high latency doesn’t interfere with the application’s main interface.
  • Security posture – If you’re using agents, tools, plugins, or other methods of connecting a model to other systems, pay extra attention to your security posture. Models may try to interact with these systems in unexpected ways. Follow the normal practice of least-privilege access, for example restricting incoming prompts from other systems.
  • Rapidly evolving frameworks – Open source frameworks like LangChain are evolving rapidly. Use a microservices approach to isolate other components from these less mature frameworks.

Capacity

We can think about capacity in two contexts: inference and training model data pipelines. Capacity is a consideration when organizations are building their own pipelines. CPU and memory requirements are two of the biggest considerations when choosing instances to run your workloads.

Instances that can support generative AI workloads can be more difficult to obtain than your average general-purpose instance type. Instance flexibility can help with capacity and capacity planning. Depending on what AWS Region you are running your workload in, different instance types are available.

For the user journeys that are critical, organizations will want to consider either reserving or pre-provisioning instance types to ensure availability when needed. This pattern achieves a statically stable architecture, which is a resiliency best practice. To learn more about static stability in the AWS Well-Architected Framework reliability pillar, refer to Use static stability to prevent bimodal behavior.

Observability

Besides the resource metrics you typically collect, like CPU and RAM utilization, you need to closely monitor GPU utilization if you host a model on Amazon SageMaker or Amazon Elastic Compute Cloud (Amazon EC2). GPU utilization can change unexpectedly if the base model or the input data changes, and running out of GPU memory can put the system into an unstable state.
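For example, if you host the model on a SageMaker real-time endpoint, you can alarm on GPU utilization with a sketch like the following; the endpoint name, variant name, and threshold are placeholders, and note that GPUUtilization is reported per instance and can exceed 100 percent on multi-GPU instances.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average GPU utilization stays above the threshold for 15 minutes
cloudwatch.put_metric_alarm(
    AlarmName="genai-endpoint-gpu-high",
    Namespace="/aws/sagemaker/Endpoints",
    MetricName="GPUUtilization",
    Dimensions=[
        {"Name": "EndpointName", "Value": "<your endpoint name>"},   # placeholder
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=90.0,              # placeholder; adjust for multi-GPU instances
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)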

Higher up the stack, you will also want to trace the flow of calls through the system, capturing the interactions between agents and tools. Because the interface between agents and tools is less formally defined than an API contract, you should monitor these traces not only for performance but also to capture new error scenarios. To monitor the model or agent for any security risks and threats, you can use tools like Amazon GuardDuty.

You should also capture baselines of embedding vectors, prompts, context, and output, and the interactions between these. If these change over time, it may indicate that users are using the system in new ways, that the reference data is not covering the question space in the same way, or that the model’s output is suddenly different.

Disaster recovery

Having a business continuity plan with a disaster recovery strategy is a must for any workload. Generative AI workloads are no different. Understanding the failure modes that are applicable to your workload will help guide your strategy. If you are using AWS managed services for your workload, such as Amazon Bedrock and SageMaker, make sure the service is available in your recovery AWS Region. As of this writing, these AWS services don’t support replication of data across AWS Regions natively, so you need to think about your data management strategies for disaster recovery, and you also may need to fine-tune in multiple AWS Regions.

Conclusion

This post described how to take resilience into account when building generative AI solutions. Although generative AI applications have some interesting nuances, the existing resilience patterns and best practices still apply. It’s just a matter of evaluating each part of a generative AI application and applying the relevant best practices.

For more information about generative AI and using it with AWS services, refer to the following resources:


About the Authors

Jennifer Moran is an AWS Senior Resiliency Specialist Solutions Architect based out of New York City. She has a diverse background, having worked in many technical disciplines, including software development, agile leadership, and DevOps, and is an advocate for women in tech. She enjoys helping customers design resilient solutions to improve resilience posture and publicly speaks about all topics related to resilience.

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Read More

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

Analyze security findings faster with no-code data preparation using generative AI and Amazon SageMaker Canvas

Data is the foundation to capturing the maximum value from AI technology and solving business problems quickly. To unlock the potential of generative AI technologies, however, there’s a key prerequisite: your data needs to be appropriately prepared. In this post, we describe how to use generative AI to update and scale your data pipeline using Amazon SageMaker Canvas for data prep.

Typically, data pipeline work requires a specialized skill to prepare and organize data for security analysts to use to extract value, which can take time, increase risks, and increase time to value. With SageMaker Canvas, security analysts can effortlessly and securely access leading foundation models to prepare their data faster and remediate cyber security risks.

Data prep involves careful formatting and thoughtful contextualization, working backward from the customer problem. Now with the SageMaker Canvas chat for data prep capability, analysts with domain knowledge can quickly prepare, organize, and extract value from data using a chat-based experience.

Solution overview

Generative AI is revolutionizing the security domain by providing personalized and natural language experiences that enhance risk identification and remediation while boosting business productivity. For this use case, we use SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon Security Lake, and Amazon Simple Storage Service (Amazon S3). Amazon Security Lake allows you to aggregate and normalize security data for analysis to gain a better understanding of security across your organization. Amazon S3 enables you to store and retrieve any amount of data at any time or place. It offers industry-leading scalability, data availability, security, and performance.

SageMaker Canvas now supports comprehensive data preparation capabilities powered by SageMaker Data Wrangler. With this integration, SageMaker Canvas provides an end-to-end no-code workspace to prepare data, build, and use machine learning (ML) and Amazon Bedrock foundation models to accelerate the time from data to business insights. You can now discover and aggregate data from over 50 data sources and explore and prepare data using over 300 built-in analyses and transformations in the SageMaker Canvas visual interface. You’ll also see faster performance for transforms and analyses, and benefit from a natural language interface to explore and transform data for ML.

In this post, we demonstrate three key transformations on the security findings dataset: filtering, column renaming, and text extraction from a column. We also demonstrate using the chat for data prep feature in SageMaker Canvas to analyze the data and visualize your findings.

Prerequisites

Before starting, you need an AWS account. You also need to set up an Amazon SageMaker Studio domain. For instructions on setting up SageMaker Canvas, refer to Generate machine learning predictions without code.

Access the SageMaker Canvas chat interface

Complete the following steps to start using the SageMaker Canvas chat feature:

  1. On the SageMaker Canvas console, choose Data Wrangler.
  2. Under Datasets, choose Amazon S3 as your source and specify the security findings dataset from Amazon Security Lake.
  3. Choose your data flow and choose Chat for data prep, which will display a chat interface experience with guided prompts.

Filter data

For this post, we first want to filter for critical and high severity findings, so we enter instructions in the chat box to remove findings that are not critical or high severity. SageMaker Canvas removes the matching rows, displays a preview of the transformed data, and provides the option to use the generated code. We can then add the transformation to the list of steps in the Steps pane.
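The generated transform is roughly equivalent to the following pandas sketch; the severity column name is an assumption, so use the column name in your dataset.

import pandas as pd

def keep_critical_and_high(df: pd.DataFrame) -> pd.DataFrame:
    # "severity" is an assumed column name for the finding severity
    return df[df["severity"].isin(["Critical", "High"])]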

Rename columns

Next, we want to rename two columns, so we enter a prompt in the chat box to rename the desc and title columns to Finding and Remediation. SageMaker Canvas generates a preview, and if you’re happy with the results, you can add the transformed data to the data flow steps.
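Under the hood, the generated transform is roughly equivalent to a pandas rename such as the following sketch.

import pandas as pd

def rename_finding_columns(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns={"desc": "Finding", "title": "Remediation"})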

Extract text

To determine the source Regions of the findings, you can enter chat instructions to extract the Region text from the UID column based on the pattern arn:aws:security:securityhub:region:* and create a new column called Region. SageMaker Canvas then generates code to create the new Region column. The data preview shows the findings originate from one Region: us-west-2. You can add this transformation to the data flow for downstream analysis.
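If you want to reproduce this transformation outside of SageMaker Canvas, a rough pandas equivalent follows; the uid column name is an assumption, and it assumes the identifier follows the standard ARN layout of arn:partition:service:region:account:resource, so the Region is the fourth colon-separated field.

import pandas as pd

def add_region_column(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Take the fourth colon-separated field of the ARN-style identifier as the Region
    out["Region"] = out["uid"].str.split(":").str[3]
    return out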

Analyze the data

Finally, we want to analyze the data to determine if there is a correlation between time of day and number of critical findings. You can enter a request to summarize critical findings by time of day into the chat, and SageMaker Canvas returns insights that are useful for your investigation and analysis.
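A rough pandas equivalent of this summary follows; the time and severity column names are assumptions, so adjust them to your dataset.

import pandas as pd

def critical_findings_by_hour(df: pd.DataFrame) -> pd.Series:
    out = df.copy()
    out["hour"] = pd.to_datetime(out["time"]).dt.hour   # hour of day for each finding
    return out[out["severity"] == "Critical"].groupby("hour").size()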

Visualize findings

Next, we visualize the findings by severity over time to include in a leadership report. You can ask SageMaker Canvas to generate a bar chart of severity compared to time of day. In seconds, SageMaker Canvas creates the chart grouped by severity. You can add this visualization to the analysis in the data flow and download it for your report. The data shows the findings originate from one Region and happen at specific times. This gives us confidence about where to focus our investigation of the security findings to determine root causes and corrective actions.

Clean up

To avoid incurring unintended charges, complete the following steps to clean up your resources:

  1. Empty the S3 bucket you used as a source.
  2. Log out of SageMaker Canvas.

Conclusion

In this post, we showed you how to use SageMaker Canvas as an end-to-end no-code workspace for data preparation, and how to build and use Amazon Bedrock foundation models to accelerate the time from data to business insights.

Note that this approach is not limited to security findings; you can apply this to any generative AI use case that uses data preparation at its core.

The future belongs to businesses that can effectively harness the power of generative AI and large language models. But to do so, we must first develop a solid data strategy and understand the art of data preparation. By using generative AI to structure our data intelligently, and working backward from the customer, we can solve business problems faster. With SageMaker Canvas chat for data preparation, it’s effortless for analysts to get started and capture immediate value from AI.


About the Authors

Sudeesh Sasidharan is a Senior Solutions Architect at AWS, within the Energy team. Sudeesh loves experimenting with new technologies and building innovative solutions that solve complex business challenges. When he is not designing solutions or tinkering with the latest technologies, you can find him on the tennis court working on his backhand.

John Klacynski is a Principal Customer Solution Manager within the AWS Independent Software Vendor (ISV) team. In this role, he programmatically helps ISV customers adopt AWS technologies and services to reach their business goals more quickly. Prior to joining AWS, John led Data Product Teams for large Consumer Package Goods companies, helping them leverage data insights to improve their operations and decision making.

Read More