Time series forecasting with Amazon SageMaker AutoML

Time series forecasting is a critical component in various industries for making informed decisions by predicting future values of time-dependent data. A time series is a sequence of data points recorded at regular time intervals, such as daily sales revenue, hourly temperature readings, or weekly stock market prices. These forecasts are pivotal for anticipating trends and future demands in areas such as product demand, financial markets, energy consumption, and many more.

However, creating accurate and reliable forecasts poses significant challenges because of factors such as seasonality, underlying trends, and external influences that can dramatically impact the data. Additionally, traditional forecasting models often require extensive domain knowledge and manual tuning, which can be time-consuming and complex.

In this post, we explore a comprehensive approach to time series forecasting using the Amazon SageMaker AutoMLV2 Software Development Kit (SDK). SageMaker AutoMLV2 is part of the SageMaker Autopilot suite, which automates the end-to-end machine learning workflow from data preparation to model deployment. Throughout this post, we use the term AutoML to refer to the SageMaker Autopilot APIs as well as the Amazon SageMaker Canvas AutoML capabilities. We’ll walk through the data preparation process, explain the configuration of the time series forecasting model, detail the inference process, and highlight key aspects of the project. This methodology offers insights into effective strategies for forecasting future data points in a time series, using the power of machine learning without requiring deep expertise in model development. The code for this post can be found in the GitHub repo.

The following diagram depicts the basic AutoMLV2 APIs, all of which are relevant to this post. The diagram shows the workflow for building and deploying models using the AutoMLV2 API. In the training phase, CSV data is uploaded to Amazon S3, followed by the creation of an AutoML job, model creation, and checking for job completion. The deployment phase allows you to choose between real-time inference via an endpoint or batch inference using a scheduled transform job that stores results in S3.

Basic AutoMLV2 APIs

1. Data preparation

The foundation of any machine learning project is data preparation. For this project, we used a synthetic dataset containing time series data of product sales across various locations, focusing on attributes such as product code, location code, timestamp, unit sales, and promotional information. The dataset can be found in an Amazon-owned, public Amazon Simple Storage Service (Amazon S3) dataset.

When preparing your CSV file for input into a SageMaker AutoML time series forecasting model, you must ensure that it includes at least three essential columns (as described in the SageMaker AutoML V2 documentation):

  1. Item identifier attribute name: This column contains unique identifiers for each item or entity for which predictions are desired. Each identifier distinguishes the individual data series within the dataset. For example, if you’re forecasting sales for multiple products, each product would have a unique identifier.
  2. Target attribute name: This column represents the numerical values that you want to forecast. These could be sales figures, stock prices, energy usage amounts, and so on. It’s crucial that the data in this column is numeric because the forecasting models predict quantitative outcomes.
  3. Timestamp attribute name: This column indicates the specific times when the observations were recorded. The timestamp is essential for analyzing the data in a chronological context, which is fundamental to time series forecasting. The timestamps should be in a consistent and appropriate format that reflects the regularity of your data (for example, daily or hourly).

All other columns in the dataset are optional and can be used to include additional time-series related information or metadata about each item. Therefore, your CSV file should have columns named according to the preceding attributes (item identifier, target, and timestamp) as well as any other columns needed to support your use case. For instance, if your dataset is about forecasting product demand, your CSV might look something like this:

  • Product_ID (item identifier): Unique product identifiers.
  • Sales (target): Historical sales data to be forecasted.
  • Date (timestamp): The dates on which sales data was recorded.

The process of splitting the training and test data in this project uses a methodical and time-aware approach to ensure that the integrity of the time series data is maintained. Here’s a detailed overview of the process:

Ensuring timestamp integrity

The first step involves converting the timestamp column of the input dataset to a datetime format using pd.to_datetime. This conversion is crucial for sorting the data chronologically in subsequent steps and for ensuring that operations on the timestamp column are consistent and accurate.

Sorting the data

The sorted dataset is critical for time series forecasting, because it ensures that data is processed in the correct temporal order. The input_data DataFrame is sorted based on three columns: product_code, location_code, and timestamp. This multi-level sort guarantees that the data is organized first by product and location, and then chronologically within each product-location grouping. This organization is essential for the logical partitioning of data into training and test sets based on time.

Splitting into training and test sets

The splitting mechanism is designed to handle each combination of product_code and location_code separately, respecting the unique temporal patterns of each product-location pair. For each group:

  • The initial test set is determined by selecting the last eight timestamps (yellow + green below). This subset represents the most recent data points that are candidates for testing the model’s forecasting ability.
  • The final test set is refined by removing the last four timestamps from the initial test set, resulting in a test dataset that includes the four timestamps immediately preceding the latest data (green below). This strategy ensures the test set is representative of the near-future periods the model is expected to predict, while also leaving out the most recent data to simulate a realistic forecasting scenario.
  • The training set comprises the remaining data points, excluding the last eight timestamps (blue below). This ensures the model is trained on historical data that precedes the test period, avoiding any data leakage and ensuring that the model learns from genuinely past observations.

This process is visualized in the following figure with an arbitrary value on the Y axis and the days of February on the X axis.

Time series data split

The test dataset is used to evaluate the performance of the trained model and compute various loss metrics, such as mean absolute error (MAE) and root-mean-squared error (RMSE). These metrics quantify the model’s accuracy in forecasting the actual values in the test set, providing a clear indication of the model’s quality and its ability to make accurate predictions. The evaluation process is detailed in the “Inference: Batch, real-time, and asynchronous” section, where we discuss the comprehensive approach to model evaluation and conditional model registration based on the computed metrics.
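
For reference, the following is a minimal sketch of how MAE and RMSE could be computed with NumPy; the arrays are illustrative rather than values from the dataset.

import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error: average magnitude of the forecast errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root-mean-squared error: penalizes large errors more heavily than MAE."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example: compare actual unit sales in the test set with the model's p50 forecasts
y_true = np.array([120.0, 135.0, 128.0, 140.0])
y_pred = np.array([118.0, 130.0, 131.0, 150.0])
print(f"MAE:  {mae(y_true, y_pred):.2f}")   # 5.00
print(f"RMSE: {rmse(y_true, y_pred):.2f}")  # 5.87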

Creating and saving the datasets

After the data for each product-location group is categorized into training and test sets, the subsets are aggregated into comprehensive training and test DataFrames using pd.concat. This aggregation step combines the individual DataFrames stored in train_dfs and test_dfs lists into two unified DataFrames:

  • train_df for training data
  • test_df for testing data

Finally, the DataFrames are saved to CSV files (train.csv for training data and test.csv for test data), making them accessible for model training and evaluation processes. This saving step not only facilitates a clear separation of data for modeling purposes but also enables reproducibility and sharing of the prepared datasets.
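
The following condensed sketch illustrates this preparation flow, assuming the product_code, location_code, timestamp, and unit_sales column names described earlier; the file paths are placeholders.

import pandas as pd

# Load the raw data and make sure timestamps are proper datetimes
input_data = pd.read_csv("synthetic_sales.csv")  # placeholder path
input_data["timestamp"] = pd.to_datetime(input_data["timestamp"])

# Sort by product, location, and time so each series is in chronological order
input_data = input_data.sort_values(["product_code", "location_code", "timestamp"])

train_dfs, test_dfs = [], []
for (product, location), group in input_data.groupby(["product_code", "location_code"]):
    # The last 8 timestamps are held out; the 4 immediately preceding the latest 4 form the test set
    holdout = group.tail(8)
    test_dfs.append(holdout.head(4))   # test set: the 4 timestamps preceding the latest data
    train_dfs.append(group.iloc[:-8])  # training set: everything before the holdout window

train_df = pd.concat(train_dfs, ignore_index=True)
test_df = pd.concat(test_dfs, ignore_index=True)

train_df.to_csv("train.csv", index=False)
test_df.to_csv("test.csv", index=False)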

Summary

This data preparation strategy meticulously respects the chronological nature of time series data and ensures that the training and test sets are appropriately aligned with real-world forecasting scenarios. By splitting the data based on the last known timestamps and carefully excluding the most recent periods from the training set, the approach mimics the challenge of predicting future values based on past observations, thereby setting the stage for a robust evaluation of the forecasting model’s performance.

2. Training a model with AutoMLV2

SageMaker AutoMLV2 reduces the resources needed to train, tune, and deploy machine learning models by automating the heavy lifting involved in model development. It provides a straightforward way to create high-quality models tailored to your specific problem type, be it classification, regression, or forecasting, among others. In this section, we delve into the steps to train a time series forecasting model with AutoMLV2.

Step 1: Define the time series forecasting configuration

The first step involves defining the problem configuration. This configuration guides AutoMLV2 in understanding the nature of your problem and the type of solution it should seek, whether it involves classification, regression, time-series classification, computer vision, natural language processing, or fine-tuning of large language models. This versatility is crucial because it allows AutoMLV2 to adapt its approach based on the specific requirements and complexities of the task at hand. For time series forecasting, the configuration includes details such as the frequency of forecasts, the horizon over which predictions are needed, and any specific quantiles or probabilistic forecasts. Configuring the AutoMLV2 job for time series forecasting involves specifying parameters that would best use the historical sales data to predict future sales.

The AutoMLTimeSeriesForecastingConfig is a configuration object in the SageMaker AutoMLV2 SDK designed specifically for setting up time series forecasting tasks. Each argument provided to this configuration object tailors the AutoML job to the specifics of your time series data and the forecasting objectives.

from sagemaker.automl.automlv2 import AutoMLTimeSeriesForecastingConfig

time_series_config = AutoMLTimeSeriesForecastingConfig(
    forecast_frequency='W',
    forecast_horizon=4,
    forecast_quantiles=['p50', 'p60', 'p70', 'p80', 'p90'],
    filling=filling_config,  # dictionary of per-column filling strategies
    item_identifier_attribute_name='product_code',
    target_attribute_name='unit_sales',
    timestamp_attribute_name='timestamp',
    grouping_attribute_names=['location_code'],
)

The following is a detailed explanation of each configuration argument used in your time series configuration:

  • forecast_frequency
    • Description: Specifies how often predictions should be made.
    • Value ‘W’: Indicates that forecasts are expected on a weekly basis. The model will be trained to understand and predict data as a sequence of weekly observations. Valid intervals are an integer followed by Y (year), M (month), W (week), D (day), H (hour), and min (minute). For example, 1D indicates every day and 15min indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, you must use a frequency of 1H instead of 60min.
  • forecast_horizon
    • Description: Defines the number of future time-steps the model should predict.
    • Value 4: The model will forecast four time-steps into the future. Given the weekly frequency, this means the model will predict the next four weeks of data from the last known data point.
  • forecast_quantiles
    • Description: Specifies the quantiles at which to generate probabilistic forecasts.
    • Values [p50,p60,p70,p80,p90]: These quantiles represent the 50th, 60th, 70th, 80th, and 90th percentiles of the forecast distribution, providing a range of possible outcomes and capturing forecast uncertainty. For instance, the p50 quantile (median) might be used as a central forecast, while the p90 quantile provides a higher-end forecast, where 90% of the actual data is expected to fall below the forecast, accounting for potential variability.
  • filling
    • Description: Defines how missing data should be handled before training; specifying filling strategies for different scenarios and columns.
    • Value filling_config: This should be a dictionary detailing how to fill missing values in your dataset, such as filling missing promotional data with zeros or specific columns with predefined values. This ensures the model has a complete dataset to learn from, improving its ability to make accurate forecasts. An illustrative example is shown after this list.
  • item_identifier_attribute_name
    • Description: Specifies the column that uniquely identifies each time series in the dataset.
    • Value ’product_code’: This setting indicates that each unique product code represents a distinct time series. The model will treat data for each product code as a separate forecasting problem.
  • target_attribute_name
    • Description: The name of the column in your dataset that contains the values you want to predict.
    • Value unit_sales: Designates the unit_sales column as the target variable for forecasts, meaning the model will be trained to predict future sales figures.
  • timestamp_attribute_name
    • Description: The name of the column indicating the time point for each observation.
    • Value ‘timestamp’: Specifies that the timestamp column contains the temporal information necessary for modeling the time series.
  • grouping_attribute_names
    • Description: A list of column names that, in combination with the item identifier, can be used to create composite keys for forecasting.
    • Value [‘location_code’]: This setting means that forecasts will be generated for each combination of product_code and location_code. It allows the model to account for location-specific trends and patterns in sales data.

The configuration provided instructs the SageMaker AutoML to train a model capable of weekly sales forecasts for each product and location, accounting for uncertainty with quantile forecasts, handling missing data, and recognizing each product-location pair as a unique series. This detailed setup aims to optimize the forecasting model’s relevance and accuracy for your specific business context and data characteristics.

Step 2: Initialize the AutoMLV2 job

Next, initialize the AutoMLV2 job by specifying the problem configuration, the AWS role with permissions, the SageMaker session, a base job name for identification, and the output path where the model artifacts will be stored.

automl_sm_job = AutoMLV2(
    problem_config=time_series_config,
    role=role,
    sagemaker_session=sagemaker_session,
    base_job_name='time-series-forecasting-job',
    output_path=f's3://{bucket}/{prefix}/output'
)

Step 3: Fit the model

To start the training process, call the fit method on your AutoMLV2 job object. This method requires specifying the input data’s location in Amazon S3 and whether SageMaker should wait for the job to complete before proceeding further. During this step, AutoMLV2 will automatically pre-process your data, select algorithms, train multiple models, and tune them to find the best solution.

automl_sm_job.fit(
    inputs=[AutoMLDataChannel(s3_data_type='S3Prefix', s3_uri=train_uri, channel_type='training')],
    wait=True,
    logs=True
)

Please note that model fitting may take several hours, depending on the size of your dataset and compute budget. A larger compute budget allows for more powerful instance types, which can accelerate the training process. If you aren’t running this code as part of the provided SageMaker notebook (which handles the order of code cell processing correctly), you will need to implement custom code that monitors the training status before retrieving and deploying the best model, as sketched below.
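
The following is a minimal polling sketch using the boto3 SageMaker client; the way the job name is obtained here is an assumption and should be adapted to how you track your AutoML job.

import time
import boto3

sm_client = boto3.client('sagemaker')
job_name = automl_sm_job.current_job_name  # assumption: name of the running AutoML V2 job

while True:
    response = sm_client.describe_auto_ml_job_v2(AutoMLJobName=job_name)
    status = response['AutoMLJobStatus']  # InProgress, Completed, Failed, Stopped, ...
    print(f"AutoML job status: {status} ({response['AutoMLJobSecondaryStatus']})")
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)  # poll once a minute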

3. Deploying a model with AutoMLV2

Deploying a machine learning model into production is a critical step in your machine learning workflow, enabling your applications to make predictions from new data. SageMaker AutoMLV2 not only helps build and tune your models but also provides a seamless deployment experience. In this section, we’ll guide you through deploying your best model from an AutoMLV2 job as a fully managed endpoint in SageMaker.

Step 1: Identify the best model and extract name

After your AutoMLV2 job completes, the first step in the deployment process is to identify the best performing model, also known as the best candidate. This can be achieved by using the best_candidate method of your AutoML job object. You can either use this method immediately after fitting the AutoML job or specify the job name explicitly if you’re operating on a previously completed AutoML job.

# Option 1: Directly after fitting the AutoML job
best_candidate = automl_sm_job.best_candidate()

# Option 2: Specifying the job name directly
best_candidate = automl_sm_job.best_candidate(job_name='your-auto-ml-job-name')

best_candidate_name = best_candidate['CandidateName']

Step 2: Create a SageMaker model

Before deploying, create a SageMaker model from the best candidate. This model acts as a container for the artifacts and metadata necessary to serve predictions. Use the create_model method of the AutoML job object to complete this step.

endpoint_name = f"ep-{best_candidate_name}-automl-ts"

# Create a SageMaker model from the best candidate
automl_sm_model = automl_sm_job.create_model(name=best_candidate_name, candidate=best_candidate)

4. Inference: Batch, real-time, and asynchronous

For deploying the trained model, we explore batch, real-time, and asynchronous inference methods to cater to different use cases.

The following figure is a decision tree to help you decide what type of endpoint to use. The diagram outlines a decision-making process for selecting between batch, asynchronous, or real-time inference endpoints. Starting with the need for immediate responses, it guides you through considerations like the size of the payload and the computational complexity of the model. Depending on these factors, you can choose a faster option with lower computational requirements or a slower batch process for larger datasets.

Decision tree for selecting between batch, asynchronous, or real-time inference endpoints

Batch inference using SageMaker pipelines

  • Usage: Ideal for generating forecasts in bulk, such as monthly sales predictions across all products and locations.
  • Process: We used SageMaker’s batch transform feature to process a large dataset of historical sales data, outputting forecasts for the specified horizon.

The inference pipeline used for batch inference demonstrates a comprehensive approach to deploying, evaluating, and conditionally registering a machine learning model for time series forecasting using SageMaker. This pipeline is structured to ensure a seamless flow from data preprocessing, through model inference, to post-inference evaluation and conditional model registration. Here’s a detailed breakdown of its construction:

  • Batch transform step
    • Transformer Initialization: A Transformer object is created, specifying the model to use for batch inference, the compute resources to allocate, and the output path for the results.
    • Transform step creation: This step invokes the transformer to perform batch inference on the specified input data. The step is configured to handle data in CSV format, a common choice for structured time series data.
  • Evaluation step
    • Processor setup: Initializes an SKLearn processor with the specified role, framework version, instance count, and type. This processor is used for the evaluation of the model’s performance.
    • Evaluation processing: Configures the processing step to use the SKLearn processor, taking the batch transform output and test data as inputs. The processing script (evaluation.py) is specified here, which will compute evaluation metrics based on the model’s predictions and the true labels.
    • Evaluation strategy: We adopted a comprehensive evaluation approach, using metrics like mean absolute error (MAE) and root-mean-squared error (RMSE) to quantify the model’s accuracy and adjusting the forecasting configuration based on these insights.
    • Outputs and property files: The evaluation step produces an output file (evaluation_metrics.json) that contains the computed metrics. This file is stored in Amazon S3 and registered as a property file for later access in the pipeline.
  • Conditional model registration
    • Model metrics setup: Defines the model metrics to be associated with the model package, including statistics and explainability reports sourced from specified Amazon S3 URIs.
    • Model registration: Prepares for model registration by specifying content types, inference and transform instance types, model package group name, approval status, and model metrics.
    • Conditional registration step: Implements a condition based on the evaluation metrics (for example, MAE). If the condition (for example, MAE is less than or equal to a threshold) is met, the model is registered; otherwise, the pipeline concludes without model registration.
  • Pipeline creation and runtime
    • Pipeline definition: Assembles the pipeline by naming it and specifying the sequence of steps to run: batch transform, evaluation, and conditional registration.
    • Pipeline upserting and runtime: The pipeline.upsert method is called to create or update the pipeline based on the provided definition, and pipeline.start() runs the pipeline.
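
The following condensed sketch shows how these steps could be wired together with the SageMaker Pipelines SDK. The model name, S3 locations, evaluation script inputs, JSON path, and MAE threshold are illustrative assumptions, and the registration step (register_step) is assumed to be defined separately rather than shown in full.

from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import TransformStep, ProcessingStep
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.pipeline import Pipeline

# 1. Batch transform step: run inference on the test data with the trained model
transformer = Transformer(
    model_name=best_candidate_name,  # model created from the best candidate
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{bucket}/{prefix}/transform-output',
    sagemaker_session=sagemaker_session,
)
transform_step = TransformStep(
    name='BatchTransformStep',
    transformer=transformer,
    inputs=TransformInput(data=test_uri, content_type='text/csv'),
)

# 2. Evaluation step: compute MAE/RMSE and write evaluation_metrics.json
evaluation_report = PropertyFile(
    name='EvaluationReport', output_name='evaluation', path='evaluation_metrics.json'
)
sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1', role=role, instance_type='ml.m5.xlarge', instance_count=1
)
evaluation_step = ProcessingStep(
    name='EvaluationStep',
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=transform_step.properties.TransformOutput.S3OutputPath,
                        destination='/opt/ml/processing/predictions'),
        ProcessingInput(source=test_uri, destination='/opt/ml/processing/test'),
    ],
    outputs=[ProcessingOutput(output_name='evaluation', source='/opt/ml/processing/evaluation')],
    code='evaluation.py',
    property_files=[evaluation_report],
)

# 3. Conditional registration: register the model only if MAE is at or below a threshold
condition = ConditionLessThanOrEqualTo(
    left=JsonGet(step_name=evaluation_step.name, property_file=evaluation_report, json_path='mae'),
    right=100.0,  # illustrative threshold
)
condition_step = ConditionStep(
    name='CheckEvaluationStep',
    conditions=[condition],
    if_steps=[register_step],  # a model registration step defined elsewhere
    else_steps=[],
)

pipeline = Pipeline(
    name='ts-forecasting-inference-pipeline',
    steps=[transform_step, evaluation_step, condition_step],
    sagemaker_session=sagemaker_session,
)
pipeline.upsert(role_arn=role)
pipeline.start()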

The following figure is an example of the SageMaker Pipeline directed acyclic graph (DAG).

SageMaker Pipeline directed acyclic graph (DAG) for this problem.

This pipeline effectively integrates several stages of the machine learning lifecycle into a cohesive workflow, showcasing how Amazon SageMaker can be used to automate the process of model deployment, evaluation, and conditional registration based on performance metrics. By encapsulating these steps within a single pipeline, the approach enhances efficiency, ensures consistency in model evaluation, and streamlines the model registration process—all while maintaining the flexibility to adapt to different models and evaluation criteria.

Inferencing with Amazon SageMaker Endpoint in (near) real-time

But what if you want to run inference in real-time or asynchronously? SageMaker real-time endpoint inference offers the capability to deliver immediate predictions from deployed machine learning models, crucial for scenarios demanding quick decision making. When an application sends a request to a SageMaker real-time endpoint, it processes the data in real time and returns the prediction almost immediately. This setup is optimal for use cases that require near-instant responses, such as personalized content delivery, immediate fraud detection, and live anomaly detection.

  • Usage: Suited for on-demand forecasts, such as predicting next week’s sales for a specific product at a particular location.
  • Process: We deployed the model as a SageMaker endpoint, allowing us to make real-time predictions by sending requests with the required input data.

Deployment involves specifying the number of instances and the instance type to serve predictions. This step creates an HTTPS endpoint that your applications can invoke to perform real-time predictions.

# Deploy the model to a SageMaker endpoint
predictor = automl_sm_model.deploy(initial_instance_count=1, endpoint_name=endpoint_name, instance_type='ml.m5.xlarge')

The deployment process is asynchronous, and SageMaker takes care of provisioning the necessary infrastructure, deploying your model, and ensuring the endpoint’s availability and scalability. After the model is deployed, your applications can start sending prediction requests to the endpoint URL provided by SageMaker.
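
After the endpoint is in service, you can invoke it with the SageMaker runtime client. The following sketch is illustrative; the CSV payload must contain the same columns the model was trained on, and the example rows shown here are assumptions.

import boto3

runtime = boto3.client('sagemaker-runtime')

# Illustrative CSV payload with the same columns used at training time
payload = (
    "product_code,location_code,timestamp,unit_sales,promo\n"
    "P001,L001,2024-02-05,120,0\n"
    "P001,L001,2024-02-12,135,1\n"
)

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body=payload,
)
print(response['Body'].read().decode('utf-8'))  # forecasted quantiles for the requested series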

While real-time inference is suitable for many use cases, there are scenarios where a slightly relaxed latency requirement can be beneficial. SageMaker Asynchronous Inference provides a queue-based system that efficiently handles inference requests, scaling resources as needed to maintain performance. This approach is particularly useful for applications that require processing of larger datasets or complex models, where the immediate response is not as critical.

  • Usage: Examples include generating detailed reports from large datasets, performing complex calculations that require significant computational time, or processing high-resolution images or lengthy audio files. This flexibility makes it a complementary option to real-time inference, especially for businesses that face fluctuating demand and seek to maintain a balance between performance and cost.
  • Process: The process of using asynchronous inference is straightforward yet powerful. Users submit their inference requests to a queue, from which SageMaker processes them sequentially. This queue-based system allows SageMaker to efficiently manage and scale resources according to the current workload, ensuring that each inference request is handled as promptly as possible.
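
The following is a minimal sketch of how the same model could be deployed behind an asynchronous endpoint and invoked; the output path, input location, and endpoint name suffix are assumptions.

import boto3
from sagemaker.async_inference import AsyncInferenceConfig

# Deploy the model behind an asynchronous endpoint (results are written to S3)
async_predictor = automl_sm_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=f"{endpoint_name}-async",
    async_inference_config=AsyncInferenceConfig(
        output_path=f's3://{bucket}/{prefix}/async-output'
    ),
)

# Queue a request: the payload is read from S3, and the response points to the output location
runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint_async(
    EndpointName=f"{endpoint_name}-async",
    InputLocation=f's3://{bucket}/{prefix}/async-input/request.csv',
    ContentType='text/csv',
)
print(response['OutputLocation'])  # S3 URI where the prediction will be written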

Clean up

To avoid incurring unnecessary charges and to tidy up resources after completing the experiments or running the demos described in this post, follow these steps to delete all deployed resources:

  1. Delete the SageMaker endpoints: To delete any deployed real-time or asynchronous endpoints, use the SageMaker console or the AWS SDK. This step is crucial as endpoints can accrue significant charges if left running.
  2. Delete the SageMaker Pipeline: If you have set up a SageMaker Pipeline, delete it to ensure that there are no residual executions that might incur costs.
  3. Delete S3 artifacts: Remove all artifacts stored in your S3 buckets that were used for training, storing model artifacts, or logging. Ensure you delete only the resources related to this project to avoid data loss.
  4. Clean up any additional resources: Depending on your specific implementation and additional setup modifications, there may be other resources to consider, such as roles or logs. Check your AWS Management Console for any resources that were created and delete them if they are no longer needed.
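
As a reference, the following sketch shows how the resources created earlier in this post could be removed programmatically; the pipeline name and S3 prefix are the illustrative values used above, so adjust them to match your deployment.

import boto3

# Delete the real-time endpoint (and its configuration) created during deployment
predictor.delete_endpoint()

# Delete the SageMaker Pipeline used for batch inference
sm_client = boto3.client('sagemaker')
sm_client.delete_pipeline(PipelineName='ts-forecasting-inference-pipeline')

# Remove project artifacts from S3 (be careful to target only this project's prefix)
s3 = boto3.resource('s3')
s3.Bucket(bucket).objects.filter(Prefix=prefix).delete()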

Conclusion

This post illustrates the effectiveness of Amazon SageMaker AutoMLV2 for time series forecasting. By carefully preparing the data, thoughtfully configuring the model, and using both batch and real-time inference, we demonstrated a robust methodology for predicting future sales. This approach not only saves time and resources but also empowers businesses to make data-driven decisions with confidence.

If you’re inspired by the possibilities of time series forecasting and want to experiment further, consider exploring the SageMaker Canvas UI. SageMaker Canvas provides a user-friendly interface that simplifies the process of building and deploying machine learning models, even if you don’t have extensive coding experience.

Visit the SageMaker Canvas page to learn more about its capabilities and how it can help you streamline your forecasting projects. Begin your journey towards more intuitive and accessible machine learning solutions today!


About the Authors

Nick McCarthy is a Senior Machine Learning Engineer at AWS, based in London. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time travelling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.

Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Automate user on-boarding for financial services with a digital assistant powered by Amazon Bedrock

In this post, we present a solution that harnesses the power of generative AI to streamline the user onboarding process for financial services through a digital assistant. Onboarding new customers in the banking industry is a crucial step in the customer journey, involving a series of activities designed to fulfill know your customer (KYC) requirements, conduct necessary verifications, and introduce them to the bank’s products or services. Traditionally, customer onboarding has been a tedious and heavily manual process. Our solution provides practical guidance on addressing this challenge by using a generative AI assistant on AWS.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock, we build a digital assistant that automates document processing and identity verification, and engages customers through conversational interactions. As a result, customers can be onboarded in a matter of minutes through secure, automated workflows. In this post, we provide a solution and the accompanying code that banks can use to dramatically enhance the customer experience and establish a strong customer relationship from the outset.

Challenges with traditional onboarding

The traditional onboarding process for banks faces challenges in the current digital landscape because many institutions don’t have fully automated account-opening systems. While customers in other sectors have access to intelligent assistants, those in banking often encounter legacy processes. As the financial services industry adapts to changing consumer expectations, there’s a need to address the demand for instant and 24/7 availability of services.

The challenges associated with the manual onboarding process include, but aren’t limited to, the following:

  • Time-consuming paperwork – New customers are asked to manually fill out extensive paperwork including account opening forms, disclosures, and so on. Reviewing physical documents also takes up valuable staff time. This lengthy paperwork process can result in slow onboarding and a poor customer experience.
  • Security risks – Paper documents and in-person ID verification lack security compared to digital processes because of their susceptibility to tampering, loss, and lack of traceability. For example, there’s a greater risk of identity theft and fraud with physical documents, because they can be altered or misplaced without leaving an audit trail.
  • Accessibility issues – Requiring in-person account opening at branches can create accessibility challenges for many customers, including senior citizens and disabled individuals.
  • Limited service hours – The account opening process is available only during branch operating hours, which limits the timeframe when customers can complete the onboarding process. This constraint impacts the flexibility for customers to initiate account opening at their preferred time.
  • High costs – Manual paperwork processing and in-person verification are labor-intensive tasks that require significant staff time and resources, leading to high operational costs.

AI-powered services enable automated, secure, and compliant processes for self-service account opening. Providing onboarding experiences aligned with current digital standards might offer a competitive edge for banks in the future.

Solution overview

The solution allows users to open bank accounts remotely through a conversational interface, eliminating the need to visit a physical branch. We created a digital assistant named Penny to guide users through the process, including uploading KYC documents and facilitating identity verification using document scanning and facial recognition. The approach uses Retrieval Augmented Generation (RAG), which combines text generation capabilities with database querying to provide contextually relevant responses to customer inquiries. Implementing digital onboarding reduces the accessibility barriers present in traditional manual account opening processes. The code for this solution is available in a GitHub repository.

The brain of our application is a custom LangChain Agent. When a user wants to open a new bank account, the agent will help them complete the onboarding process using preconfigured stages corresponding to each onboarding step. Each stage might use a LangChain tool, allowing for the automation and orchestration of onboarding. These tools call on AWS service APIs for the required functionality.

The following figure represents the high-level architecture of the proposed solution.

User on-boarding architecture diagram

The flow of the application is as follows:

  1. Users access the frontend website hosted within AWS Amplify. AWS Amplify is an end-to-end solution that enables frontend web developers to build and deploy secure, scalable full stack applications.
  2. The website invokes an Amazon CloudFront endpoint to interact with the digital assistant, Penny, which is containerized and deployed in AWS Fargate. Fargate is a serverless compute engine for containers that manages and scales your containers for you, compatible with Amazon Elastic Container Service (Amazon ECS).
  3. The digital assistant uses a custom LangChain agent to answer questions on the bank’s products and services and orchestrate the onboarding flow.
  4. If the user asks a general question related to the bank’s products or services, the agent will use a custom LangChain tool called ProductSearch. This tool uses Amazon Kendra linked with an Amazon Simple Storage Service (Amazon S3) data source that contains the bank’s data. Amazon Kendra is an intelligent enterprise search service powered by machine learning that enables companies to index and search content across their document stores.
  5. If the user indicates that they want to open a new account, the agent will prompt the user for their email. After the user responds, the application will invoke a custom LangChain tool called EmailValidation. This tool checks if there is an existing account in the bank’s Amazon DynamoDB database by calling an endpoint deployed in Amazon API Gateway.
  6. After the email validation, KYC information is gathered, such as first and last name. Then, the user is prompted for an identity document, which is uploaded to Amazon S3.
  7. The agent will invoke a custom LangChain tool called IDVerification. This tool checks if the user details entered during the session match the ID by calling an endpoint deployed in Amazon API Gateway. The details are verified by extracting the document text using Amazon Textract, a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents.
  8. After the ID verification, the user is asked for a selfie. The image is uploaded to Amazon S3. Then, the agent will invoke a custom LangChain tool called SelfieVerification. This tool checks if the uploaded selfie matches the face on the ID by calling an endpoint deployed in API Gateway. The face match is detected using Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos.
  9. After the face verification is successful, the agent will use a custom LangChain tool called SaveData. This tool creates a new account in the bank’s DynamoDB database by calling an endpoint deployed in API Gateway.
  10. The user is notified that their new account has been created successfully, using Amazon Simple Email Service (Amazon SES).

Prompt design for agent orchestration

Now, let’s take a look at how we give our digital assistant, Penny, the capability to handle onboarding for financial services. The key is the prompt engineering for the custom LangChain agent. This has been specified in PennyAgent.py. This prompt includes onboarding stages and relevant LangChain tools that the agent might need to complete the onboarding steps.

To begin, we provide the agent with a name, role and company.

AGENT_TOOLS_PROMPT = """
Never forget your name is {assistant_name}. You work as a {assistant_role}.
You work at company named {bank_name}

Next, we define the various stages of onboarding and specify the respective tools and expected responses. Having stages in a sequential and structured format while also providing awareness of all possible stages helps the agent determine the onboarding stage with accuracy.

<STAGES>

These are the stages:

Introduction or greeting:  When conversation history is empty, choose stage 1
Response: Start the conversation with a greeting. Say that you can help with {bank_name} related questions or open a bank account for them. Do this only during the start of the conversation.
Tool: 
    
General Banking Questions: Customer asks general questions about AnyBank
Response: Use ProductSearch tool to get the relevant information and answer the question like a banking assistant. Never assume anything.
Tool: ProductSearch
    
Account Open 1: Customer has requested to open an account.
Response: Customer has requested to open an account. Now, respond with a question asking for the customer's email address only to get them started with onboarding. We need the email address to start the process.
Tool:
    
Account Open 2: User provided their email.
Response: Take the email and validate it using a EmailValidation tool. If it is valid and there is no existing account with the email, ask for account type: either CHEQUING or SAVINGS. If it is invalid or there is an existing account with the email, the user must try again. 
Tool: EmailValidation
    
Account Open 3: User provided which account type to open.
Response: Ask the user for their first name
Tool: 

Account Open 4: User provided first name.
Response: Ask the user for their last name
Tool: 

Account Open 5: User provided last name.
Response: Ask the user to upload an identity document.
Tool:
    
Account Open 6: Penny asked for identity document and then System notified that a new file has been uploaded
Response: Take the identity file name and verify it using the IDVerification tool. If the verification is unsuccessful, ask the user to try again. 
Tool: IDVerification
    
Account Open 7: The ID document is valid. 
Response: Ask the user to upload their selfie to compare their face to the ID.
Tool:
    
Account Open 8: Penny asked user for their selfie and then "System notified that a file has been uploaded. "
Response: Take the "selfie" file name and verify it using the SelfieVerification tool. If there is no face match, ask the user to try again.
Tool: SelfieVerification: Use this tool to verify the user selfie and compare faces. 
    
Account Open 9: Face match verified
Response: Give the summary of the all the information you collected and ask user to confirm. 
Tool:
        
Account Open 10: Confirmation
Response: Save the user data for future reference using SaveData tool. Upon saving the data, let the user know that they will receive an email confirmation of the bank account opening.
Tool: SaveData

We append the tools, their descriptions, and their response formats to the prompt. When calling on a specific tool, the agent can generate input parameters as required. Access to all the tools helps the agent identify the best tool choice based on the conversation stage.

TOOLS:
------
Penny has access to the following tools:
{tools}
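
For context, tools like the ones referenced in the stages could be registered with a LangChain agent roughly as follows. The tool names mirror the prompt, but the wrapped functions and the API Gateway URL are placeholders rather than the repository’s actual implementation.

import requests
from langchain.agents import Tool

API_BASE = "https://example.execute-api.us-east-1.amazonaws.com/prod"  # placeholder endpoint

def validate_email(email: str) -> str:
    """Check whether an account already exists for this email via the backing API."""
    response = requests.post(f"{API_BASE}/email-validation", json={"email": email})
    return response.text

def verify_id_document(file_name: str) -> str:
    """Ask the backend to extract and verify the uploaded identity document."""
    response = requests.post(f"{API_BASE}/id-verification", json={"file_name": file_name})
    return response.text

tools = [
    Tool(name="EmailValidation", func=validate_email,
         description="Validates the customer's email and checks for an existing account."),
    Tool(name="IDVerification", func=verify_id_document,
         description="Verifies the uploaded identity document against the details provided."),
    # ProductSearch, SelfieVerification, and SaveData would be defined the same way
]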

We include some guidelines that the agent needs to follow while generating outputs. By using emotion-based prompt engineering, we minimize hallucinations and deviation from expected outputs. These guidelines were chosen after extensive testing to minimize edge cases and help prevent common agent mistakes.

<GUIDELINES>

1. If you ever assume any user response without asking, it may cause significant consequences.
2. It is of high priority that you respond and use appropriate tools in their respective stages. If not, it may cause significant consequences.
3. It is of high priority that you never reveal the tools or tool names to the user. Only communicate the outcome.
4. It is critical that you never reveal any details provided by the System including file names. 
5. If ever the user deviates by asking general question during your account opening process, Retrieve the necessary information using 'ProductSearch' tool and answer the question. With confidence, ask user if they want to resume the account opening process and continue from where we left off. 

The agent uses the ReAct framework to make decisions about how to respond based on user input. ReAct provides the agent with a thinking structure, through which it selects the most appropriate tool for a given task. Such frameworks make LLM agents versatile and adaptable to different use cases.

Based on the stage descriptions and the tools available, if the LLM generates a response that requires access to an external tool, then the response of the LLM will include Thought, Decision, Action, Action Input, and Observation. The agent comes with a string matcher, which will detect Action and Action Input from the LLM’s response and trigger the respective tool. Based on the response from the tool, the LLM will decide whether to proceed with the Final Answer, and then the output will be returned by the agent.

FORMAT:
------

To use a tool, please always use the following format:
```
Thought: {input}
Decision: Do I need to use a tool? y
Action: what tool to use, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```
When I am finished, I will have a response like this: 

Final Answer: [your response as a banking assistant]

Finally, we give the agent access to the conversation history to better decide what stage the conversation is currently in. In addition, we also give access to an agent scratchpad where it can store its thought processes to execute certain actions.

Be confident that you are a banking assistant and only respond with final answer.
Begin!

<Conversation history>
{conversation_history}

{agent_scratchpad}

Orchestrating intelligent digital assistants requires thoughtful prompt engineering to handle complex tasks. By structuring the conversation into stages, providing tooling, and setting guidelines, we enable the assistant to systematically complete the onboarding process. This approach allows assistants to scale across use cases while maintaining accuracy. With the right guardrails, assistants can deliver smooth, trustworthy customer experiences.

Prompt design is key to unlocking the versatility of LLMs for real-world automation. Amazon Bedrock Prompt Management can be used to streamline the creation, evaluation, versioning and testing of prompts. This will help developers and prompt engineers save time by applying the same prompt to different onboarding processes. When you create a prompt, you can select a different model for inference and adjust the variables to obtain the best-suited results for a variety of workflows.

The following sections explain how to deploy the solution in your AWS account.

Note: Running this workload would have an estimated hourly cost of $1.34 for the Oregon (us-west-2) AWS Region. Check the pricing details for each service to understand the costs you might be charged for different usage tiers and resource configurations.

Setup

To deploy the agent, visit the project GitHub repository and use the following instructions:

  1. Ensure the prerequisites are completed as mentioned in the README.
  2. Deploy the solution including the agent, tools infrastructure, and demo application—in that order—based on the instructions in the README.
  3. After the deployment is successful, visit the outputted domain where the demo application is running. You can now begin testing the agent.

Testing the agent

Begin your exploration by accessing the Amplify endpoint where the demonstration is hosted. The demonstration incorporates an interactive chat interface, enabling you to engage in a conversational exchange with the digital assistant, Penny. Whenever you want to initiate a new instance of the agent, refresh the web page.

Let’s start talking to Penny:

  1. Enter Hi

Penny will respond with a friendly greeting.

  2. Enter What are the cutoff times to receive wire transfers on the same day?

Penny will use the ProductSearch tool to find the relevant information from the loaded product catalog. You can try asking other questions about the bank’s products or services, including the AnyBank Travel Rewards Visa Infinite Card or New Vehicle Loans.

  3. Enter I would like to open a new bank account

Penny will recognize that the account opening flow needs to be initiated and will proceed with the first step, which is asking you for an email address.

Open bank account

  4. Enter the verified customer email you registered with the Amazon SES identity. For our demonstration, we will use anup@test.com (the SesCustomerEmail parameter used in the example command to set up the infrastructure).

Penny will take the email address and run the EmailValidation Tool. If there is an existing account with this email, it will ask you to retry. Otherwise, it will move on to the next step which is gathering your account type.

  5. Enter I want a savings account or indicate that you want a checking account.

Penny will record your account type and move on to the KYC questions.

  6. Enter Anup

Penny will record your first name and continue gathering the remaining KYC information.

KYC information

  7. Enter Ravi

It will record your last name and prompt you for an ID next. We used Ravi to match the ID document provided below.

  8. Download the picture ID. It’s also located at ./api/lambdas/test/passport.png

Sample passport

Upload it to the chat by selecting Choose File.

After uploading the image, you will receive a confirmation message on the chat stating We have received your document. Penny will use ID Verification to compare the name entered during the session to the document. After verification is complete, Penny will prompt you to upload a selfie.

  9. Upload the selfie located at ./api/lambdas/test/selfie.png to the chat by selecting Choose File.

Sample selfie

After the upload is complete, you will receive a confirmation message on the chat stating We have received your document. Penny will use Selfie Verification to compare the face on the ID to the selfie for a face match. After verification is complete, Penny will prompt you to confirm that you want to proceed.

ID verification

  10. Enter Yes I confirm

Confirmation email

Penny will use Create Account to complete the onboarding process and send an email confirmation. It will inform you of this update in the chat.

New account creation

Check the customer email you used. The email address specified as the SesCustomerEmail parameter (in this example: anup@test.com) during setup will receive a new email from the email address you set as the SesBankEmail parameter (in this example: owner@anybank.com).

  11. Go to the DynamoDB console, choose Tables in the navigation pane, and select the table created by the AWS CloudFormation stack. This is the accounts table in the bank’s AWS account. From the table page, choose Explore items. You will see a new account created with the details that you entered.

Account creation DynamoDB

Guardrails and security

Security is a critical part of any application and must be rigorously addressed when developing and deploying solutions, especially those that involve handling sensitive data or interacting with users. For a solution similar to the example in this post, several robust security measures should be implemented to maintain the confidentiality, integrity, and availability of the system.

  • Address the security of the service itself. One approach to mitigate potential biases, toxicity, or other undesirable outputs is to use Constitutional AI techniques, such as those provided by the LangChain library or Guardrails for Amazon Bedrock. By defining and enforcing a set of rules or constraints, the system can be trained to generate outputs that align with predefined ethical principles and values, thereby enhancing the trustworthiness and reliability of the service.
  • To maintain data protection and privacy, implementing a write-only database architecture is recommended. In this setup, the agent or service can write data to the database but is prohibited from reading or retrieving sensitive stored information. This measure effectively isolates sensitive user data, making sure that the agent would be unable to access or disclose confidential details even in the event of a compromise.
  • Prompt injection attacks, where malicious inputs are crafted to manipulate the system’s behavior, are a serious concern in conversational AI systems. To mitigate this risk, it’s crucial to implement robust input validation and sanitization mechanisms. This could include techniques like whitelisting permissible characters, filtering out potentially harmful patterns, and employing context-aware input processing.
  • Secure coding practices, such as input validation, output encoding, and proper error handling, should be rigorously followed throughout the development process. Regular security audits, penetration testing, and vulnerability assessments should be conducted to identify and address potential weaknesses in the system.
  • Amazon API Gateway, a fully managed service, securely handles API traffic, acting as a front door for applications running on AWS. It supports multiple security mechanisms, including AWS Identity and Access Management (IAM) for authentication and authorization, AWS WAF for web application protection, AWS Secrets Manager for securely storing and retrieving secrets, and integration with AWS CloudTrail for API activity logging. API Gateway also supports client-side SSL certificates, API keys, and resource policies for granular access control.
  • Communication between users, the solution, and its internal dependencies should be protected using TLS to encrypt data in transit.
  • Additionally, the data should be encrypted using data-at-rest encryption with AWS Key Management Service (AWS KMS) customer managed keys (CMK).

By implementing these robust security measures and fostering a culture of continuous security awareness and improvement, the solution can better protect against potential threats, safeguard user privacy, and maintain the integrity and reliability of the service.

Cleanup

Follow the cleanup instructions in the README of the GitHub repository to remove the environment from your account.

Conclusion

In this post, we presented an end-to-end solution that demonstrates how banks can transform user onboarding with an AI-powered digital assistant. By orchestrating workflows across AWS services, we enabled automated, secure account opening within minutes. The conversational interface delivers exceptional customer experiences while reducing operational costs.

This solution can be quickly deployed and enhanced using the features of Amazon Bedrock. Amazon Bedrock Agents streamlines workflows by executing multistep tasks and integrating with company systems and data sources. Amazon Bedrock Knowledge Bases provides contextual information from proprietary data sources, enhancing the accuracy and relevance of responses. Additionally, Amazon Bedrock Guardrails implements safeguards to enable responsible AI usage, filtering harmful content and protecting sensitive information. These can enable a robust and secure deployment of an AI-powered onboarding solution.

Key outcomes of this solution include:

  • Fully digital onboarding without paper forms or branch visits
  • Automated KYC verification using documents and facial recognition
  • Customers onboarded securely in minutes with email confirmations
  • Lower costs by reducing manual verification workloads
  • Personalized assistance for any product questions 24/7

Instant, secure, and scalable delivery has become the norm that customers demand. This AI assistant solution, powered by AWS, showcases the potential future of user onboarding for financial institutions. As consumer behaviors and expectations continue to be influenced by the latest digital experiences across industries, banks that invest in advanced technologies will gain a competitive edge over their rivals.

Ready to future-proof your banking experience? Visit Artificial Intelligence and Machine learning for Financial services with AWS.


About the authors

Anup Ravindranath is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada working with Financial Services organizations. He helps customers to transform their businesses and innovate on cloud.

Arya Subramanyam is a Solutions Architect based in Toronto, Canada. She works with Enterprise Greenfield customers as well as Small & Medium businesses as a technical advisor, helping them solve business challenges with cloud solutions. Arya holds a Bachelor of Applied Science in Computer Engineering from the University of British Columbia, Vancouver. Her passion for Generative AI has led her to develop various solutions leveraging Large Language Models (LLMs) with a focus on prompt engineering and AI agents.

Venkata Satyanarayana Chivatam is a Solutions Architect at AWS. He specializes in Generative AI and Computer Vision, with a particular focus on driving adoption across industries such as healthcare and finance. At AWS, he helps ISV and SMB customers leverage cutting-edge AI technologies to unlock new possibilities and solve complex challenges. He is passionate about supporting businesses of all sizes in their AI journey.

Akshata Ramesh Rao is a Solutions Architect in Toronto, Canada. Akshata works with enterprise customers to accelerate innovation and advise them through technical challenges. She also loves working with SMB customers, helping them reach their business objectives quickly, safely, and cost-effectively with AWS services, frameworks, and best practices. Prior to joining AWS, Akshata worked as a DevOps engineer at Amazon and holds a master’s degree in computer science from the University of Ottawa.

Build a generative AI Slack chat assistant using Amazon Bedrock and Amazon Kendra

Despite the proliferation of information and data in business environments, employees and stakeholders often find themselves searching for information and struggling to get their questions answered quickly and efficiently. This can lead to productivity losses, frustration, and delays in decision-making.

A generative AI Slack chat assistant can help address these challenges by providing a readily available, intelligent interface for users to interact with and obtain the information they need. By using the natural language processing and generation capabilities of generative AI, the chat assistant can understand user queries, retrieve relevant information from various data sources, and provide tailored, contextual responses.

By harnessing the power of generative AI and Amazon Web Services (AWS) services Amazon Bedrock, Amazon Kendra, and Amazon Lex, this solution provides a sample architecture to build an intelligent Slack chat assistant that can streamline information access, enhance user experiences, and drive productivity and efficiency within organizations.

Why use Amazon Kendra for building a RAG application?

Amazon Kendra is a fully managed service that provides out-of-the-box semantic search capabilities for state-of-the-art ranking of documents and passages. You can use Amazon Kendra to quickly build high-accuracy generative AI applications on enterprise data and source the most relevant content and documents to maximize the quality of your Retrieval Augmented Generation (RAG) payload, yielding better large language model (LLM) responses than using conventional or keyword-based search solutions. Amazon Kendra offers simple-to-use deep learning search models that are pre-trained on 14 domains and don’t require machine learning (ML) expertise. Amazon Kendra can index content from a wide range of sources, including databases, content management systems, file shares, and web pages.

Further, the FAQ feature in Amazon Kendra complements the broader retrieval capabilities of the service, allowing the RAG system to seamlessly switch between providing prewritten FAQ responses and dynamically generating responses by querying the larger knowledge base. This makes it well-suited for powering the retrieval component of a RAG system, allowing the model to access a broad knowledge base when generating responses. By integrating the FAQ capabilities of Amazon Kendra into a RAG system, the model can use a curated set of high-quality, authoritative answers for commonly asked questions. This can improve the overall response quality and user experience, while also reducing the burden on the language model to generate these basic responses from scratch.

This solution balances retaining customizations in terms of model selection, prompt engineering, and adding FAQs with not having to deal with word embeddings, document chunking, and other lower-level complexities typically required for RAG implementations.

Solution overview

The chat assistant is designed to assist users by answering their questions and providing information on a variety of topics. The purpose of the chat assistant is to be an internal-facing Slack tool that can help employees and stakeholders find the information they need.

The architecture uses Amazon Lex for intent recognition, AWS Lambda for processing queries, Amazon Kendra for searching through FAQs and web content, and Amazon Bedrock for generating contextual responses powered by LLMs. By combining these services, the chat assistant can understand natural language queries, retrieve relevant information from multiple data sources, and provide humanlike responses tailored to the user’s needs. The solution showcases the power of generative AI in creating intelligent virtual assistants that can streamline workflows and enhance user experiences based on model choices, FAQs, and modifying system prompts and inference parameters.

Architecture diagram

The following diagram illustrates a RAG approach where the user sends a query through the Slack application and receives a generated response based on the data indexed in Amazon Kendra. In this post, we use Amazon Kendra Web Crawler as the data source and include FAQs stored on Amazon Simple Storage Service (Amazon S3). See Data source connectors for a list of supported data source connectors for Amazon Kendra.

ML-16837-arch-diag

The step-by-step workflow for the architecture is the following:

  1. The user sends a query such as What is the AWS Well-Architected Framework? through the Slack app.
  2. The query goes to Amazon Lex, which identifies the intent.
  3. Currently two intents are configured in Amazon Lex (Welcome and FallbackIntent).
  4. The welcome intent is configured to respond with a greeting when a user enters a greeting such as “hi” or “hello.” The assistant responds with “Hello! I can help you with queries based on the documents provided. Ask me a question.”
  5. The fallback intent is fulfilled with a Lambda function.
    1. The Lambda function searches Amazon Kendra FAQs through the search_Kendra_FAQ method by taking the user query and Amazon Kendra index ID as inputs. If there’s a match with a high confidence score, the answer from the FAQ is returned to the user.
      def search_Kendra_FAQ(question, kendra_index_id):
          """
          This function takes in the question from the user, and checks if the question exists in the Kendra FAQs.
          :param question: The question the user is asking that was asked via the frontend input text box.
          :param kendra_index_id: The kendra index containing the documents and FAQs
          :return: If found in FAQs, returns the answer along with any relevant links. If not, returns False and then calls kendra_retrieve_document function.
          """
          kendra_client = boto3.client('kendra')
          response = kendra_client.query(IndexId=kendra_index_id, QueryText=question, QueryResultTypeFilter='QUESTION_ANSWER')
          for item in response['ResultItems']:
              score_confidence = item['ScoreAttributes']['ScoreConfidence']
              # Taking answers from FAQs that have a very high confidence score only
              if score_confidence == 'VERY_HIGH' and len(item['AdditionalAttributes']) > 1:
                  text = item['AdditionalAttributes'][1]['Value']['TextWithHighlightsValue']['Text']
                  url = "None"
                  if item['DocumentURI'] != '':
                      url = item['DocumentURI']
                  return (text, url)
          return (False, False)

    2. If there isn’t a match with a high enough confidence score, relevant documents from Amazon Kendra with a high confidence score are retrieved through the kendra_retrieve_document method and sent to Amazon Bedrock to generate a response as the context.
      def kendra_retrieve_document(question, kendra_index_id):
          """
          This function takes in the question from the user, and retrieves relevant passages based on default PageSize of 10.
          :param question: The question the user is asking that was asked via the frontend input text box.
          :param kendra_index_id: The kendra index containing the documents and FAQs
          :return: Returns the context to be sent to the LLM and document URIs to be returned as relevant data sources.
          """
          kendra_client = boto3.client('kendra')
          documents = kendra_client.retrieve(IndexId=kendra_index_id, QueryText=question)
          text = ""
          uris = set()
          if len(documents['ResultItems']) > 0:
              for i in range(len(documents['ResultItems'])):
                  score_confidence = documents['ResultItems'][i]['ScoreAttributes']['ScoreConfidence']
                  if score_confidence == 'VERY_HIGH' or score_confidence == 'HIGH':
                      text += documents['ResultItems'][i]['Content'] + "\n"
                      uris.add(documents['ResultItems'][i]['DocumentURI'])
          return (text, uris)

    3. The response is generated from Amazon Bedrock with the invokeLLM method. The following is a snippet of the invokeLLM method within the fulfillment function. Read more on inference parameters and system prompts to modify parameters that are passed into the Amazon Bedrock invoke model request.
      def invokeLLM(question, context, modelId):
          """
          This function takes in the question from the user, along with the Kendra responses as context to generate an answer
          for the user on the frontend.
          :param question: The question the user is asking that was asked via the frontend input text box.
          :param context: The response from the Kendra document retrieve query, used as context to generate a better answer.
          :return: Returns the final answer that will be provided to the end-user of the application who asked the original
          question.
          """
          # Setup Bedrock client
          bedrock = boto3.client('bedrock-runtime')
          # configure model specifics such as specific model
          modelId = modelId
      
          # body of data with parameters that is passed into the bedrock invoke model request
          body = json.dumps({"max_tokens": 350,
                  "system": "You are a truthful AI assistant. Your goal is to provide informative and substantive responses to queries based on the documents provided. If you do not know the answer to a question, you truthfully say you do not know.",
                  "messages": [{"role": "user", "content": "Answer this user query:" + question + "with the following context:" + context}],
                  "anthropic_version": "bedrock-2023-05-31",
                      "temperature":0,
                  "top_k":250,
                  "top_p":0.999})
      
          # Invoking the bedrock model with your specifications
          response = bedrock.invoke_model(body=body,
                                          modelId=modelId)
          # the body of the response that was generated
          response_body = json.loads(response.get('body').read())
          # retrieving the specific completion field, where your answer will be
          answer = response_body.get('content')
          # returning the answer as a final result, which ultimately gets returned to the end user
          return answer

    4. Finally, the response generated from Amazon Bedrock along with the relevant referenced URLs are returned to the end user.
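
    The repository’s fulfillment code handles this final packaging; as a rough illustration only, the following sketch shows one way the answer text and source URLs could be wrapped in the response format expected by an Amazon Lex V2 Lambda code hook. The helper name and message formatting are assumptions, not code from the solution.

      def build_lex_response(intent_name, answer_text, urls):
          """
          Illustrative helper: wraps the generated answer and reference URLs
          in the response format expected by an Amazon Lex V2 Lambda code hook.
          """
          message = answer_text
          if urls:
              message += "\nSources: " + ", ".join(sorted(urls))
          return {
              "sessionState": {
                  "dialogAction": {"type": "Close"},
                  "intent": {"name": intent_name, "state": "Fulfilled"},
              },
              "messages": [{"contentType": "PlainText", "content": message}],
          }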

    When selecting websites to index, adhere to the AWS Acceptable Use Policy and other AWS terms. Remember that you can only use Amazon Kendra Web Crawler to index your own web pages or web pages that you have authorization to index. Visit the Amazon Kendra Web Crawler data source guide to learn more about using the web crawler as a data source. Using Amazon Kendra Web Crawler to aggressively crawl websites or web pages you don’t own is not considered acceptable use.

    Supported features

    The chat assistant supports the following features:

    1. Support for the following Anthropic models on Amazon Bedrock:
      • claude-v2
      • claude-3-haiku-20240307-v1:0
      • claude-instant-v1
      • claude-3-sonnet-20240229-v1:0
    2. Support for FAQs and the Amazon Kendra Web Crawler data source
    3. Returns FAQ answers only if the confidence score is VERY_HIGH
    4. Retrieves only documents from Amazon Kendra that have a HIGH or VERY_HIGH confidence score
    5. If documents with a high confidence score aren’t found, the chat assistant returns “No relevant documents found”

    Prerequisites

    To implement the solution, you need the following prerequisites:

    • Basic knowledge of AWS
    • An AWS account with access to Amazon S3 and Amazon Kendra
    • An S3 bucket to store your documents. For more information, see Step 1: Create your first S3 bucket and the Amazon S3 User Guide.
    • A Slack workspace to integrate the chat assistant
    • Permission to install Slack apps in your Slack workspace
    • Seed URLs for the Amazon Kendra Web Crawler data source
      • You’ll need authorization to crawl and index any websites provided
    • AWS CloudFormation for deploying the solution resources

    Build a generative AI Slack chat assistant

    To build a Slack application, use the following steps:

    1. Request model access on Amazon Bedrock for all Anthropic models
    2. Create an S3 bucket in the us-east-1 (N. Virginia) AWS Region.
    3. Upload the AIBot-LexJson.zip and SampleFAQ.csv files to the S3 bucket
    4. Launch the CloudFormation stack in the us-east-1 (N. Virginia) AWS Region to create the solution resources
    5. Enter a Stack name of your choice
    6. For S3BucketName, enter the name of the S3 bucket created in Step 2
    7. For S3KendraFAQKey, enter the name of the SampleFAQ.csv file uploaded to the S3 bucket in Step 3
    8. For S3LexBotKey, enter the name of the Amazon Lex .zip file uploaded to the S3 bucket in Step 3
    9. For SeedUrls, enter up to 10 URLs for the web crawler as a comma delimited list. In the example in this post, we give the publicly available Amazon Bedrock service page as the seed URL
    10. Leave the rest as defaults. Choose Next. Choose Next again on the Configure stack options
    11. Acknowledge by selecting the box and choose Submit, as shown in the following screenshot
      ML-16837-cfn-checkbox
    12. Wait for the stack creation to complete
    13. Verify all resources are created
    14. Test on the AWS Management Console for Amazon Lex
      1. On the Amazon Lex console, choose your chat assistant ${YourStackName}-AIBot
      2. Choose Intents
      3. Choose Version 1 and choose Test, as shown in the following screenshot
        ML-16837-lex-version1
      4. Select the AIBotProdAlias and choose Confirm, as shown in the following screenshot. If you want to make changes to the chat assistant, you can use the draft version, publish a new version, and assign the new version to the AIBotProdAlias. Learn more about Versioning and Aliases.
      5. Test the chat assistant with questions such as, “Which AWS service has 11 nines of durability?” and “What is the AWS Well-Architected Framework?” and verify the responses. The following table shows the three FAQs included in the sample .csv file.
        Question | Answer | Source URI
        Which AWS service has 11 nines of durability? | Amazon S3 | https://aws.amazon.com/s3/
        What is the AWS Well-Architected Framework? | The AWS Well-Architected Framework enables customers and partners to review their architectures using a consistent approach and provides guidance to improve designs over time. | https://aws.amazon.com/architecture/well-architected/
        In what Regions is Amazon Kendra available? | Amazon Kendra is currently available in the following AWS Regions: Northern Virginia, Oregon, and Ireland | https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
      6. The following screenshot shows the question “Which AWS service has 11 nines of durability?” and its response. You can observe that the response is the same as in the FAQ file and includes a link.
        ML-16837-Q1inLex
      7. Based on the pages you have crawled, ask a question in the chat. For this example, the publicly available Amazon Bedrock page was crawled and indexed. The following screenshot shows the question, “What are agents in Amazon Bedrock?” and a generated response that includes relevant links.
        ML-16837-Q2inLex
    15. For integration of the Amazon Lex chat assistant with Slack, see Integrating an Amazon Lex V2 bot with Slack. Choose the AIBotProdAlias under Alias in the Channel Integrations section.

    Run sample queries to test the solution

    1. In Slack, go to the Apps section. In the dropdown menu, choose Manage and select Browse apps.
      ML-16837-slackBrowseApps
    2. Search for ${AIBot} in App Directory and choose the chat assistant. This will add the chat assistant to the Apps section in Slack. You can now start asking questions in the chat. The following screenshot shows the question “Which AWS service has 11 nines of durability?” and its response. You can observe that the response is the same as in the FAQ file and includes a link.
      ML-16837-Q1slack
    3. The following screenshot shows the question, “What is the AWS Well-Architected Framework?” and its response.
      ML-16837-Q2slack
    4. Based on the pages you have crawled, ask a question in the chat. For this example, the publicly available Amazon Bedrock page was crawled and indexed. The following screenshot shows the question, “What are agents in Amazon Bedrock?” and a generated response that includes relevant links.
      ML-16837-Q3slack
    5. The following screenshot shows the question, “What is amazon polly?” Because there is no Amazon Polly documentation indexed, the chat assistant responds with “No relevant documents found,” as expected.
      ML-16837-Q4slack

    These examples show how the chat assistant retrieves documents from Amazon Kendra and provides answers based on the documents retrieved. If no relevant documents are found, the chat assistant responds with “No relevant documents found.”

    Clean up

    To clean up the resources created by this solution:

    1. Delete the CloudFormation stack by navigating to the CloudFormation console
    2. Select the stack you created for this solution and choose Delete
    3. Confirm the deletion by entering the stack name in the provided field. This will remove all the resources created by the CloudFormation template, including the Amazon Kendra index, Amazon Lex chat assistant, Lambda function, and other related resources.

    Conclusion

    This post describes the development of a generative AI Slack application powered by Amazon Bedrock and Amazon Kendra. This is designed to be an internal-facing Slack chat assistant that helps answer questions related to the indexed content. The solution architecture includes Amazon Lex for intent identification, a Lambda function for fulfilling the fallback intent, Amazon Kendra for FAQ searches and indexing crawled web pages, and Amazon Bedrock for generating responses. The post walks through the deployment of the solution using a CloudFormation template, provides instructions for running sample queries, and discusses the steps for cleaning up the resources. Overall, this post demonstrates how to use various AWS services to build a powerful generative AI–powered chat assistant application.

    This solution demonstrates the power of generative AI in building intelligent chat assistants and search assistants. Explore the generative AI Slack chat assistant: Invite your teams to a Slack workspace and start getting answers to your indexed content and FAQs. Experiment with different use cases and see how you can harness the capabilities of services like Amazon Bedrock and Amazon Kendra to enhance your business operations. For more information about using Amazon Bedrock with Slack, refer to Deploy a Slack gateway for Amazon Bedrock.


    About the authors

    Kruthi Jayasimha Rao is a Partner Solutions Architect with a focus on AI and ML. She provides technical guidance to AWS Partners in following best practices to build secure, resilient, and highly available solutions in the AWS Cloud.

    Mohamed Mohamud is a Partner Solutions Architect with a focus on Data Analytics. He specializes in streaming analytics, helping partners build real-time data pipelines and analytics solutions on AWS. With expertise in services like Amazon Kinesis, Amazon MSK, and Amazon EMR, Mohamed enables data-driven decision-making through streaming analytics.

Read More

Create your fashion assistant application using Amazon Titan models and Amazon Bedrock Agents

Create your fashion assistant application using Amazon Titan models and Amazon Bedrock Agents

In the generative AI era, agents that simulate human actions and behaviors are emerging as a powerful tool for enterprises to create production-ready applications. Agents can interact with users, perform tasks, and exhibit decision-making abilities, mimicking humanlike intelligence. By combining agents with foundation models (FMs) from the Amazon Titan family in Amazon Bedrock, customers can develop multimodal, complex applications that enable the agent to understand and generate natural language or images.

For example, in the fashion retail industry, an assistant powered by agents and multimodal models can provide customers with a personalized and immersive experience. The assistant can engage in natural language conversations, understanding the customer’s preferences and intents. It can then use the multimodal capabilities to analyze images of clothing items and make recommendations based on the customer’s input. Additionally, the agent can generate visual aids, such as outfit suggestions, enhancing the overall customer experience.

In this post, we implement a fashion assistant agent using Amazon Bedrock Agents and the Amazon Titan family of models. The fashion assistant provides a personalized, multimodal conversational experience. Among other capabilities, Amazon Titan Image Generator can inpaint and outpaint images to generate fashion inspirations and edit user photos. Amazon Titan Multimodal Embeddings models can be used to search a database for a style, using either a text prompt or a reference image provided by the user, to find similar styles. Anthropic Claude 3 Sonnet orchestrates the agent’s actions, for example, searching for the current weather so the agent can make weather-appropriate outfit recommendations. A simple web UI through Streamlit provides the user with the best experience to interact with the agent.

The fashion assistant agent can be smoothly integrated into existing ecommerce platforms or mobile applications, providing customers with a seamless and delightful experience. Customers can upload their own images, describe their desired style, or even provide a reference image, and the agent will generate personalized recommendations and visual inspirations.

The code used in this solution is available in the GitHub repository.

Solution overview

The fashion assistant agent uses the power of Amazon Titan models and Amazon Bedrock Agents to provide users with a comprehensive set of style-related functionalities:

  • Image-to-image or text-to-image search – This tool allows customers to find products similar to styles they like from the catalog, enhancing their user experience. We use the Titan Multimodal Embeddings model to embed each product image and store the embeddings in Amazon OpenSearch Serverless for future retrieval (see the sketch after this list).
  • Text-to-image generation – If the desired style is not available in the database, this tool generates unique, customized images based on the user’s query, enabling the creation of personalized styles.
  • Weather API connection – By fetching weather information for a given location mentioned in the user’s prompt, the agent can suggest appropriate styles for the occasion, making sure the customer is dressed for the weather.
  • Outpainting – Users can upload an image and request to change the background, allowing them to visualize their preferred styles in different settings.
  • Inpainting – This tool enables users to modify specific clothing items in an uploaded image, such as changing the design or color, while keeping the background intact.
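
To make the image-to-image search building block more concrete, the following is a minimal sketch of embedding a catalog image with the Amazon Titan Multimodal Embeddings model; the function name and the optional description field are illustrative assumptions, and the OpenSearch Serverless indexing step is omitted.

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_product_image(image_path, description=None):
    """Illustrative sketch: returns an embedding vector for a catalog image."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    body = {"inputImage": image_b64}
    if description:
        body["inputText"] = description  # optional text paired with the image

    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps(body),
    )
    return json.loads(response["body"].read())["embedding"]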

The following flow chart illustrates the decision-making process:

Agent Execution Flowchart

And the corresponding architecture diagram:

Prerequisites

To set up the fashion assistant agent, make sure you have the following:

  • An active AWS account and AWS Identity and Access Management (IAM) role with Amazon Bedrock, AWS Lambda, and Amazon Simple Storage Service (Amazon S3) access
  • Installation of required Python libraries such as Streamlit
  • Anthropic Claude 3 Sonnet, Amazon Titan Image Generator and Amazon Titan Multimodal Embeddings models enabled in Amazon Bedrock. You can confirm these are enabled on the Model access page of the Amazon Bedrock console. If these models are enabled, the access status will show as Access granted, as shown in the following screenshot.

Before executing the notebook provided in the GitHub repo to start building the infrastructure, make sure your AWS account has permission to:

  • Create managed IAM roles and policies
  • Create and invoke Lambda functions
  • Create, read from, and write to S3 buckets
  • Access and manage Amazon Bedrock agents and models

If you want to enable the image-to-image or text-to-image search capabilities, additional permissions for your AWS account are required:

  • Create a security policy, access policy, collection, index, and index mapping on OpenSearch Serverless
  • Call the BatchGetCollection API on OpenSearch Serverless

Set up the fashion assistant agent

To set up the fashion assistant agent, follow these steps:

  1. Clone the GitHub repository using the command
    git clone

  2. Complete the prerequisites to grant sufficient permissions
  3. Follow the deployment steps outlined in the README.md
  4. (Optional) If you want to use the image_lookup feature, execute code snippets in opensearch_ingest.ipynb to use Amazon Titan Multimodal Embeddings to embed and store sample images
  5. Run the Streamlit UI to interact with the agent using the command
    streamlit run frontend/app.py

By following these steps, you can create a powerful and engaging fashion assistant agent that combines the capabilities of Amazon Titan models with the automation and decision-making capabilities of Amazon Bedrock Agents.

Test the fashion assistant

After the fashion assistant is set up, you can interact with it through the Streamlit UI. Follow these steps:

  1. Navigate to your Streamlit UI, as shown in the following screenshot

  2. Upload an image or enter a text prompt describing the desired style, depending on the desired action, for example, image search, image generation, outpainting, or inpainting. The following screenshot shows an example prompt.

Streamlit UI Example Two

  3. Press Enter to send the prompt to the agent. You can view the chain-of-thought (CoT) process of the agent in the UI, as shown in the following screenshot

Streamlit UI Example Three

  4. When the response is ready, you can view the agent’s response in the UI, as shown in the following screenshot. The response may include generated images, similar style recommendations, or modified images based on your request. You can download the generated images directly from the UI or check the image in your S3 bucket.

Streamlit UI Example Four

Clean up

To avoid unnecessary costs, make sure to delete the resources used in this solution. You can do this by running the following command.

cdk destroy

Conclusion

The fashion assistant agent, powered by Amazon Titan models and Amazon Bedrock Agents, is an example of how retailers can create innovative applications that enhance the customer experience and drive business growth. By using this solution, retailers can gain a competitive edge, offering personalized style recommendations, visual inspirations, and interactive fashion advice to their customers.

We encourage you to explore the potential of building more agents like this fashion assistant by checking out the examples available on the aws-samples GitHub repository.


 About the Authors

Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based solutions. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers leverage GenAI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a Ph.D. degree in Electrical Engineering. Outside of work, she loves traveling, working out and exploring new things.

Antonia Wiebeler is a Data Scientist at the AWS Generative AI Innovation Center, where she enjoys building proofs of concept for customers. Her passion is exploring how generative AI can solve real-world problems and create value for customers. While she is not coding, she enjoys running and competing in triathlons.

Alex Newton is a Data Scientist at the AWS Generative AI Innovation Center, helping customers solve complex problems with generative AI and machine learning. He enjoys applying state-of-the-art ML solutions to solve real-world challenges. In his free time, you’ll find Alex playing in a band or watching live music.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Read More

How Aviva built a scalable, secure, and reliable MLOps platform using Amazon SageMaker

How Aviva built a scalable, secure, and reliable MLOps platform using Amazon SageMaker

This post is co-written with Dean Steel and Simon Gatie from Aviva.

With a presence in 16 countries and serving over 33 million customers, Aviva is a leading insurance company headquartered in London, UK. With a history dating back to 1696, Aviva is one of the oldest and most established financial services organizations in the world. Aviva’s mission is to help people protect what matters most to them—be it their health, home, family, or financial future. To achieve this effectively, Aviva harnesses the power of machine learning (ML) across more than 70 use cases. Previously, ML models at Aviva were developed using a graphical UI-driven tool and deployed manually. This approach led to data scientists spending more than 50% of their time on operational tasks, leaving little room for innovation, and posed challenges in monitoring model performance in production.

In this post, we describe how Aviva built a fully serverless MLOps platform based on the AWS Enterprise MLOps Framework and Amazon SageMaker to integrate DevOps best practices into the ML lifecycle. This solution establishes MLOps practices to standardize model development, streamline ML model deployment, and provide consistent monitoring. We illustrate the entire setup of the MLOps platform using a real-world use case that Aviva has adopted as its first ML use case.

The Challenge: Deploying and operating ML models at scale

Approximately 47% of ML projects never reach production, according to Gartner. Despite the advancements in open source data science frameworks and cloud services, deploying and operating these models remains a significant challenge for organizations. This struggle highlights the importance of establishing consistent processes, integrating effective monitoring, and investing in the necessary technical and cultural foundations for a successful MLOps implementation.

For companies like Aviva, which handles approximately 400,000 insurance claims annually, with expenditures of about £3 billion in settlements, the pressure to deliver a seamless digital experience to customers is immense. To meet this demand amidst rising claim volumes, Aviva recognizes the need for increased automation through AI technology. Therefore, developing and deploying more ML models is crucial to support their growing workload.

To prove the platform can handle onboarding and industrialization of ML models, Aviva picked their Remedy use case as their first project. This use case concerns a claim management system that employs a data-driven approach to determine whether submitted car insurance claims qualify as either total loss or repair cases, as illustrated in the following diagram

Remedy Use Case

The workflow consists of the following steps:

  1. The workflow begins when a customer experiences a car accident.
  2. The customer contacts Aviva, providing information about the incident and details about the damage.
  3. To determine the estimated cost of repair, 14 ML models and a set of business rules are used to process the request.
  4. The estimated cost is compared with the car’s current market value from external data sources.
  5. Information related to similar cars for sale nearby is included in the analysis.
  6. Based on the processed data, a recommendation is made by the model to either repair or write off the car. This recommendation, along with the supporting data, is provided to the claims handler, and the pipeline reaches its final state.

The successful deployment and evaluation of the Remedy use case on the MLOps platform was intended to serve as a blueprint for future use cases, providing maximum efficiency by using templated solutions.

Solution overview of the MLOps platform

To handle the complexity of operationalizing ML models at scale, AWS provides an MLOps offering called the AWS Enterprise MLOps Framework, which can be used for a wide variety of use cases. The offering encapsulates a best practices approach to building and managing MLOps platforms, based on the consolidated knowledge gained from a multitude of customer engagements carried out by AWS Professional Services over the last 5 years. The proposed baseline architecture can be logically divided into four building blocks that are sequentially deployed into the provided AWS accounts, as illustrated in the following diagram.

ML Ops Framework

The building blocks are as follows:

  • Networking – A virtual private cloud (VPC), subnets, security groups, and VPC endpoints are deployed across all accounts.
  • Amazon SageMaker Studio – SageMaker Studio offers a fully integrated ML integrated development environment (IDE) acting as a data science workbench and control panel for all ML workloads.
  • Amazon SageMaker Projects templates – These ready-made infrastructure sets cover the ML lifecycle, including continuous integration and delivery (CI/CD) pipelines and seed code. You can launch these from SageMaker Studio with a few clicks, either choosing from preexisting templates or creating custom ones.
  • Seed code – This refers to the data science code tailored for a specific use case, divided between two repositories: training (covering processing, training, and model registration) and inference (related to SageMaker endpoints). The majority of time in developing a use case should be dedicated to modifying this code.

The framework implements the infrastructure deployment from a primary governance account to separate development, staging, and production accounts. Developers can use the AWS Cloud Development Kit (AWS CDK) to customize the solution to align with the company’s specific account setup. In adapting the AWS Enterprise MLOps Framework to a three-account structure, Aviva has designated accounts as follows: development, staging, and production. This structure is depicted in the following architecture diagram. The governance components, which facilitate model promotions with consistent processes across accounts, have been integrated into the development account.

Architecture Diagram
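
While the framework’s own CDK code is the authoritative source, a simplified sketch of how a three-account structure can be expressed with the AWS CDK in Python might look like the following; the stack contents, account IDs, and Region are placeholders rather than Aviva’s actual configuration.

import aws_cdk as cdk
from aws_cdk import Stack
from constructs import Construct


class MlopsPlatformStack(Stack):
    """Placeholder stack standing in for the framework's building blocks."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Networking, SageMaker Studio, and project template resources would be defined here


app = cdk.App()

# Hypothetical account IDs for the development, staging, and production accounts
environments = {
    "Dev": "111111111111",
    "Staging": "222222222222",
    "Prod": "333333333333",
}

for name, account_id in environments.items():
    MlopsPlatformStack(
        app,
        f"MlopsPlatform-{name}",
        env=cdk.Environment(account=account_id, region="eu-west-2"),
    )

app.synth()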

Building reusable ML pipelines

The processing, training, and inference code for the Remedy use case was developed by Aviva’s data science team in SageMaker Studio, a cloud-based environment designed for collaborative work and rapid experimentation. When experimentation is complete, the resulting seed code is pushed to an AWS CodeCommit repository, initiating the CI/CD pipeline for the construction of a SageMaker pipeline. This pipeline comprises a series of interconnected steps for data processing, model training, parameter tuning, model evaluation, and the registration of the generated models in the Amazon SageMaker Model Registry.

SageMaker Pipeline

Amazon SageMaker Automatic Model Tuning enabled Aviva to utilize advanced tuning strategies and overcome the complexities associated with implementing parallelism and distributed computing. The initial step involved a hyperparameter tuning process (Bayesian optimization), during which approximately 100 model variations were trained (5 steps with 20 models trained concurrently in each step). This feature integrates with Amazon SageMaker Experiments to provide data scientists with insights into the tuning process. The optimal model is then evaluated in terms of accuracy, and if it exceeds a use case-specific threshold, it is registered in the SageMaker Model Registry. A custom approval step was constructed, such that only Aviva’s lead data scientist can permit the deployment of a model through a CI/CD pipeline to a SageMaker real-time inference endpoint in the development environment for further testing and subsequent promotion to the staging and production environment.
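
The following is a minimal sketch of how a comparable tuning job could be configured with the SageMaker Python SDK (roughly 100 trials, 20 running in parallel); the training image, hyperparameter ranges, objective metric, and data locations are illustrative assumptions rather than Aviva’s actual setup.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Illustrative estimator; the training image and instance type are placeholders
estimator = Estimator(
    image_uri="<training-image-uri>",
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    sagemaker_session=session,
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.001, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    metric_definitions=[{"Name": "validation:auc", "Regex": "validation-auc: ([0-9\\.]+)"}],
    strategy="Bayesian",          # Bayesian optimization over the search space
    max_jobs=100,                 # roughly 5 rounds of
    max_parallel_jobs=20,         # 20 concurrently trained model variations
)

tuner.fit({"train": "s3://<bucket>/train/", "validation": "s3://<bucket>/validation/"})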

Serverless workflow for orchestrating ML model inference

To realize the actual business value of Aviva’s ML model, it was necessary to integrate the inference logic with Aviva’s internal business systems. The inference workflow is responsible for combining the model predictions, external data, and business logic to generate a recommendation for claims handlers. The recommendation is based on three possible outcomes:

  • Write off a vehicle (expected repairs cost exceeds the value of the vehicle)
  • Seek a repair (value of the vehicle exceeds repair cost)
  • Require further investigation given a borderline estimation of the value of damage and the price for a replacement vehicle

The following diagram illustrates the workflow.

Inference Workflow

The workflow starts with a request to an API endpoint hosted on Amazon API Gateway originating from a claims management system, which invokes an AWS Step Functions workflow that uses AWS Lambda to complete the following steps:

  1. The input data of the REST API request is transformed into encoded features, which are used by the ML model.
  2. ML model predictions are generated by feeding the input to the SageMaker real-time inference endpoints. Because Aviva processes daily claims at irregular intervals, real-time inference endpoints help overcome the challenge of providing predictions consistently at low latency.
  3. ML model predictions are further processed by custom business logic to derive a final decision (of the three aforementioned options), as illustrated in the sketch after this list.
  4. The final decision, along with the generated data, is consolidated and transmitted back to the claims management system as a REST API response.
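
A condensed sketch of the Lambda logic behind steps 1-3 might look like the following; the endpoint name, feature encoding, and decision thresholds are purely illustrative assumptions and not Aviva’s actual business rules.

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = "remedy-model-endpoint"  # placeholder endpoint name


def handler(event, context):
    # Step 1: transform the REST API payload into encoded model features (greatly simplified)
    features = [event["repair_estimate"], event["vehicle_age"], event["mileage"]]

    # Step 2: get a prediction from the SageMaker real-time inference endpoint
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=",".join(str(f) for f in features),
    )
    predicted_repair_cost = float(response["Body"].read().decode("utf-8"))

    # Step 3: apply business logic to derive one of the three outcomes
    market_value = event["market_value"]
    if predicted_repair_cost > market_value:
        decision = "WRITE_OFF"
    elif predicted_repair_cost < 0.8 * market_value:  # illustrative threshold
        decision = "REPAIR"
    else:
        decision = "FURTHER_INVESTIGATION"

    return {"decision": decision, "predicted_repair_cost": predicted_repair_cost}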

Monitor ML model decisions to elevate confidence amongst users

The ability to obtain real-time access to detailed data for each state machine run and task is critically important for effective oversight and enhancement of the system. This includes providing claim handlers with comprehensive details behind decision summaries, such as model outputs, external API calls, and applied business logic, to make sure recommendations are based on accurate and complete information. Snowflake is the preferred data platform, and it receives data from Step Functions state machine runs through Amazon CloudWatch Logs. A series of filters screens for data pertinent to the business. This data is then transmitted to an Amazon Data Firehose delivery stream and subsequently relayed to an Amazon Simple Storage Service (Amazon S3) bucket, which is accessed by Snowflake. The data generated by all runs is used by Aviva business analysts to create dashboards and management reports, facilitating insights such as monthly views of total losses by region or average repair costs by vehicle manufacturer and model.
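
One way to wire the state machine’s CloudWatch logs into the Firehose delivery stream is a subscription filter, sketched below; the log group name, filter pattern, delivery stream ARN, and IAM role are assumptions for illustration.

import boto3

logs = boto3.client("logs")

# Placeholder names and ARNs; the filter pattern keeps only data pertinent to the business
logs.put_subscription_filter(
    logGroupName="/aws/vendedlogs/states/remedy-inference-workflow",
    filterName="business-events-to-snowflake",
    filterPattern='{ $.type = "TaskStateExited" }',
    destinationArn="arn:aws:firehose:eu-west-2:111122223333:deliverystream/remedy-decisions",
    roleArn="arn:aws:iam::111122223333:role/CWLtoFirehoseRole",
)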

Security

The described solution processes personally identifiable information (PII), making customer data protection the core security focus of the solution. The customer data is protected by employing networking restrictions, because processing runs inside the VPC, where data is logically separated in transit. The data is encrypted in transit between steps of the processing and encrypted at rest using AWS Key Management Service (AWS KMS). Access to the production customer data is restricted on a need-to-know basis, where only the authorized parties are allowed to access the production environment where this data resides.

The second security focus of the solution is protecting Aviva’s intellectual property. The code the data scientists and engineers are working on is stored securely in the dev AWS account, private to Aviva, in the CodeCommit git repositories. The training data and the artifacts of the trained models are stored securely in the S3 buckets in the dev account, protected by AWS KMS encryption at rest, with AWS Identity and Access Management (IAM) policies restricting access to the buckets to only the authorized SageMaker endpoints. The code pipelines are private to the account as well, and reside in the customer’s AWS environment.

The auditability of the workflows is provided by logging the steps of inference and decision-making in the CloudWatch logs. The logs are encrypted at rest as well with AWS KMS, and are configured with a lifecycle policy, guaranteeing availability of audit information for the required compliance period. To maintain security of the project and operate it securely, the accounts are enabled with Amazon GuardDuty and AWS Config. AWS CloudTrail is used to monitor the activity within the accounts. The software to monitor for security vulnerabilities resides primarily in the Lambda functions implementing the business workflows. The processing code is primarily written in Python using libraries that are periodically updated.

Conclusion

This post provided an overview of the partnership between Aviva and AWS, which resulted in the construction of a scalable MLOps platform. This platform was developed using the open source AWS Enterprise MLOps Framework, which integrated DevOps best practices into the ML lifecycle. Aviva is now capable of replicating consistent processes and deploying hundreds of ML use cases in weeks rather than months. Furthermore, Aviva has transitioned entirely to a pay-as-you-go model, resulting in a 90% reduction in infrastructure costs compared to the company’s previous on-premises ML platform solution.

Explore the AWS Enterprise MLOps Framework on GitHub and learn more about MLOps on Amazon SageMaker to see how it can accelerate your organization’s MLOps journey.


About the Authors

Dean Steel is a Senior MLOps Engineer at Aviva with a background in Data Science and actuarial work. He is passionate about all forms of AI/ML with experience developing and deploying a diverse range of models for insurance-specific applications, from large transformers through to linear models. With an engineering focus, Dean is a strong advocate of combining AI/ML with DevSecOps in the cloud using AWS. In his spare time, Dean enjoys exploring music technology, restaurants and film.

Simon Gatie, Principal Analytics Domain Authority at Aviva in Norwich, brings a diverse background in Physics, Accountancy, IT, and Data Science to his role. He leads Machine Learning projects at Aviva, driving innovation in data science and advanced technologies for financial services.

Gabriel Rodriguez is a Machine Learning Engineer at AWS Professional Services in Zurich. In his current role, he has helped customers achieve their business goals on a variety of ML use cases, ranging from setting up MLOps pipelines to developing a fraud detection application. Whenever he is not working, he enjoys doing physical exercises, listening to podcasts, or traveling.

Marco Geiger is a Machine Learning Engineer at AWS Professional Services based in Zurich. He works with customers from various industries to develop machine learning solutions that use the power of data for achieving business goals and innovate on behalf of the customer. Besides work, Marco is a passionate hiker, mountain biker, football player, and hobby barista.

Andrew Odendaal is a Senior DevOps Consultant at AWS Professional Services based in Dubai. He works across a wide range of customers and industries to bridge the gap between software and operations teams and provides guidance and best practices for senior management when he’s not busy automating something. Outside of work, Andrew is a family man that loves nothing more than a binge-watching marathon with some good coffee on tap.

Read More

Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker

Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker

This post is co-written with Ike Bennion from Visier.

Visier’s mission is rooted in the belief that people are the most valuable asset of every organization and that optimizing their potential requires a nuanced understanding of workforce dynamics.

Paycor is an example of the many world-leading enterprise people analytics companies that trust and use the Visier platform to process large volumes of data to generate informative analytics and actionable predictive insights.

Visier’s predictive analytics has helped organizations such as Providence Healthcare retain critical employees within their workforce and saved an estimated $6 million by identifying and preventing employee attrition by using a framework built on top of Visier’s risk-of-exit predictions.

Trusted sources like Sapient Insights Group, Gartner, G2, Trust Radius, and RedThread Research have recognized Visier for its inventiveness, great user experience, and vendor and customer satisfaction. Today, over 50,000 organizations in 75 countries use the Visier platform as the driver to shape business strategies and drive better business results.

Unlocking growth potential by overcoming the tech stack barrier

Visier’s analytics and predictive power is what makes its people analytics solution so valuable. Users without data science or analytics experience can generate rigorous data-backed predictions to answer big questions like time-to-fill for important positions, or resignation risk for crucial employees.

It was an executive priority at Visier to continue innovating in their analytics and predictive capabilities because those make up one of the cornerstones of what their users love about their product.

The challenge for Visier was that their data science tech stack was holding them back from innovating at the rate they wanted to. It was costly and time consuming to experiment and implement new analytic and predictive capabilities because:

  • The data science tech stack was tightly coupled with the entire platform development. The data science team couldn’t roll out changes independently to production. This limited the team to fewer and slower iteration cycles.
  • The data science tech stack was a collection of solutions from multiple vendors, which led to additional management and support overhead for the data science team.

Streamlining model management and deployment with SageMaker

Amazon SageMaker is a managed machine learning platform that provides data scientists and data engineers with familiar concepts and tools to build, train, deploy, govern, and manage the infrastructure needed to have highly available and scalable model inference endpoints. Amazon SageMaker Inference Recommender is an example of a tool that can help data scientists and data engineers be more autonomous and less reliant on outside teams by providing guidance on right-sizing inference instances.

The existing data science tech stack was one of the many services comprising Visier’s application platform. Using the SageMaker platform, Visier built an API-based microservices architecture for the analytics and predictive services that was decoupled from the application platform. This gave the data science team the desired autonomy to deploy changes independently and release new updates more frequently.

Analytics and Predictive Model Microservice Architecture

The results

The first improvement Visier saw after migrating the analytics and predictive services to SageMaker was that it allowed the data science team to spend more time on innovations—such as the build-up of a prediction model validation pipeline—rather than having to spend time on deployment details and vendor tooling integration.

Prediction model validation

The following figure shows the prediction model validation pipeline.

Predictive Model Evaluation Pipeline

Using SageMaker, Visier built a prediction model validation pipeline that does the following (see the sketch after these steps):

  1. Pulls the training dataset from the production databases
  2. Gathers additional validation measures that describe the dataset and specific corrections and enhancements on the dataset
  3. Performs multiple cross-validation measurements using different split strategies
  4. Stores the validation results along with metadata about the run in a permanent datastore
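
A rough sketch of steps 3 and 4, using scikit-learn for the cross-validation measurements and Amazon S3 as the permanent datastore, could look like the following; the split strategies, model, scoring metric, and bucket name are illustrative assumptions rather than Visier’s actual pipeline code.

import json
from datetime import datetime, timezone

import boto3
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit, cross_val_score


def validate_model(X, y, bucket="example-validation-results"):
    """Runs cross-validation with several split strategies and stores the results in S3."""
    splitters = {
        "kfold": KFold(n_splits=5, shuffle=True, random_state=42),
        "stratified": StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
        "time_series": TimeSeriesSplit(n_splits=5),
    }
    results = {}
    for name, splitter in splitters.items():
        scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=splitter, scoring="roc_auc")
        results[name] = {"mean_auc": scores.mean(), "per_fold": scores.tolist()}

    record = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset_rows": len(y),
        "results": results,
    }
    # Persist the validation run along with its metadata
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"validation-runs/{record['run_timestamp']}.json",
        Body=json.dumps(record),
    )
    return results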

The validation pipeline allowed the team to deliver a stream of advancements in the models that improved prediction performance by 30% across their whole customer base.

Train customer-specific predictive models at scale

Visier develops and manages thousands of customer-specific predictive models for their enterprise customers. The second workflow improvement the data science team made was to develop a highly scalable method to generate all of the customer-specific predictive models. This allowed the team to deliver ten times as many models with the same number of resources.

Base model customization

As shown in the preceding figure, the team developed a model-training pipeline where model changes are made in a central prediction codebase. This codebase is executed separately for each Visier customer to train a sequence of custom models (for different points in time) that are sensitive to the specialized configuration of each customer and their data. Visier uses this pattern to scalably push innovation in a single model design to thousands of custom models across their customer base. To ensure state-of-the-art training efficiency for large models, SageMaker provides libraries that support parallel (SageMaker Model Parallel Library) and distributed (SageMaker Distributed Data Parallelism Library) model training. To learn more about how effective these libraries are, see Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries.

Using the model validation workload shown earlier, changes made to a predictive model can be validated in as little as three hours.
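
The per-customer training pattern could be sketched as follows with the SageMaker Python SDK, where the same central codebase is parameterized for each customer; the container image, tenant IDs, hyperparameters, and data layout are assumptions for illustration.

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

customers = ["customer-a", "customer-b", "customer-c"]  # placeholder tenant IDs

for customer_id in customers:
    estimator = Estimator(
        image_uri="<prediction-codebase-image-uri>",  # single central prediction codebase
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        hyperparameters={"customer_id": customer_id},  # customer-specific configuration
        output_path=f"s3://<model-bucket>/{customer_id}/models/",
        sagemaker_session=session,
    )
    # Each customer's data partition feeds its own sequence of custom models
    estimator.fit(
        inputs={"train": f"s3://<data-bucket>/{customer_id}/train/"},
        job_name=f"prediction-{customer_id}-{sagemaker.utils.sagemaker_timestamp()}",
        wait=False,  # launch jobs asynchronously so many customers train in parallel
    )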

Process unstructured data

Iterative improvements, a scalable deployment, and consolidation of data science technology were an excellent start, but when Visier adopted SageMaker, the goal was to enable innovation that was entirely out of reach by the previous tech stack.

A unique advantage that Visier has is the ability to learn from the collective employee behaviors across all their customer base. Tedious data engineering tasks like pulling data into the environment and database infrastructure costs were eliminated by securely storing their vast amount of customer-related datasets within Amazon Simple Storage Service (Amazon S3) and using Amazon Athena to directly query the data using SQL. Visier used these AWS services to combine relevant datasets and feed them directly into SageMaker, resulting in the creation and release of a new prediction product called Community Predictions. Visier’s Community Predictions give smaller organizations the power to create predictions based on the entire community’s data, rather than just their own. That gives a 100-person organization access to the kind of predictions that otherwise would be reserved for enterprises with thousands of employees.
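
Querying the pooled datasets in Amazon S3 directly with SQL might look like the following minimal Athena sketch; the database, table, query, and output location are assumptions, not Visier’s actual schema.

import time

import boto3

athena = boto3.client("athena")

# Placeholder database, table, and result location
query = athena.start_query_execution(
    QueryString="SELECT tenure_band, AVG(exit_risk) FROM community_events GROUP BY tenure_band",
    QueryExecutionContext={"Database": "community_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll until the query finishes, then page through the results
query_id = query["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    # rows[0] holds the column headers; subsequent entries hold the aggregated values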

For information about how you can manage and process your own unstructured data, see Unstructured data management and governance using AWS AI/ML and analytics services.

Use Visier Data in Amazon SageMaker

With the transformative success Visier had internally, they wanted to ensure their end-customers could also benefit from the Amazon SageMaker platform to develop their own AI and machine learning (AI/ML) models.

Visier has written a full tutorial about how to use Visier Data in Amazon SageMaker and has also built a Python connector, available on their GitHub repo. The Python connector allows customers to pipe Visier data to their own AI/ML projects to better understand the impact of their people on financials, operations, customers, and partners. These results are often then imported back into the Visier platform to distribute these insights and drive derivative analytics to further improve outcomes across the employee lifecycle.

Conclusion

Visier’s success with Amazon SageMaker demonstrates the power and flexibility of this managed machine learning platform. By using the capabilities of SageMaker, Visier increased their model output by 10 times, accelerated innovation cycles, and unlocked new opportunities such as processing unstructured data for their Community Predictions product.

If you’re looking to streamline your machine learning workflows, scale your model deployments, and unlock insights from your data, explore the possibilities with SageMaker and built-in capabilities such as Amazon SageMaker Pipelines.

Get started today: create an AWS account, go to the Amazon SageMaker console, and reach out to your AWS account team to set up an Experience-based Acceleration engagement to unlock the full potential of your data and build custom generative AI and ML models that drive actionable insights and business impact.


About the authors

Kinman Lam is a Solution Architect at AWS. He is accountable for the health and growth of some of the largest ISV/DNB companies in Western Canada. He is also a member of the AWS Canada Generative AI vTeam and has helped a growing number of Canadian companies successfully launch advanced generative AI use cases.

Ike Bennion is the Vice President of Platform & Platform Marketing at Visier and a recognized thought leader in the intersection between people, work, and technology. With a rich history in implementation, product development, product strategy, and go-to-market, he specializes in market intelligence, business strategy, and innovative technologies, including AI and blockchain. Ike is passionate about using data to drive equitable and intelligent decision-making. Outside of work, he enjoys dogs, hip hop, and weightlifting.

Read More

Implement model-independent safety measures with Amazon Bedrock Guardrails

Implement model-independent safety measures with Amazon Bedrock Guardrails

Generative AI models can produce information on a wide range of topics, but their application brings new challenges. These include maintaining relevance, avoiding toxic content, protecting sensitive information like personally identifiable information (PII), and mitigating hallucinations. Although foundation models (FMs) on Amazon Bedrock offer built-in protections, these are often model-specific and might not fully align with an organization’s use cases or responsible AI principles. As a result, developers frequently need to implement additional customized safety and privacy controls. This need becomes more pronounced when organizations use multiple FMs across different use cases, because maintaining consistent safeguards is crucial for accelerating development cycles and implementing a uniform approach to responsible AI.

In April 2024, we announced the general availability of Amazon Bedrock Guardrails to help you introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. With Amazon Bedrock Guardrails, you can implement safeguards in your generative AI applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, improving user experiences and standardizing safety controls across generative AI applications.

In addition, to enable safeguarding applications using different FMs, Amazon Bedrock Guardrails now supports the ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture, as shown in the following figure.

Overview of topics that Amazon Bedrock Guardrails filter

Solution overview

For this post, we create a guardrail that stops our FM from providing fiduciary advice. The full list of configurations for the guardrail is available in the GitHub repo. You can modify the code as needed for your use case.

Prerequisites

Make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails.

Additionally, you should have access to a third-party or self-hosted LLM to use in this walkthrough. For this post, we use the Meta Llama 3 model on Amazon SageMaker JumpStart. For more details, see AWS Managed Policies for SageMaker projects and JumpStart.

You can create a guardrail using the Amazon Bedrock console, infrastructure as code (IaC), or the API. For the example code to create the guardrail, see the GitHub repo. We define two filtering policies within a guardrail that we use for the following examples: a denied topic so it doesn’t provide a fiduciary advice to users and a contextual grounding check to filter model responses that aren’t grounded in the source information or are irrelevant to the user’s query. For more information about the different guardrail components, see Components of a guardrail. Make sure you’ve created a guardrail before moving forward.
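
As a condensed illustration of that configuration, the following sketch creates a guardrail with a denied topic and contextual grounding filters using Boto3; the names, definitions, thresholds, and messages here are abbreviated examples, so refer to the GitHub repo for the full configuration used in this post.

import boto3

bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="fiduciary-advice-guardrail",
    description="Blocks fiduciary advice and ungrounded responses",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Fiduciary Advice",
                "definition": "Providing personalized advice on managing financial assets or investments in a fiduciary capacity.",
                "examples": ["Should I invest my retirement savings in this fund?"],
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="Sorry, I cannot help with that request.",
    blockedOutputsMessaging="Sorry, I cannot provide that information.",
)

# Publish a numbered version to use with the ApplyGuardrail API
version = bedrock.create_guardrail_version(guardrailIdentifier=guardrail["guardrailId"])
print(guardrail["guardrailId"], version["version"])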

Using the ApplyGuardrail API

The ApplyGuardrail API allows you to invoke a guardrail regardless of the model used. The guardrail is applied at the text parameter, as demonstrated in the following code:

content = [
    {
        "text": {
            "text": "Is the AB503 Product a better investment than the S&P 500?"
        }
    }
]

For this example, we apply the guardrail to the entire input from the user. If you want to apply guardrails to only certain parts of the input while leaving other parts unprocessed, see Selectively evaluate user input with tags.

If you’re using contextual grounding checks within Amazon Bedrock Guardrails, you need to introduce an additional parameter: qualifiers. These qualifiers tell the API which part of the content is the grounding_source (the information to use as the source of truth), which is the query (the prompt sent to the model), and which is the guard_content (the part of the model response to ground against the grounding source). Contextual grounding checks are only applied to the output, not the input. See the following code:

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

The final required components are the guardrailIdentifier and the guardrailVersion of the guardrail you want to use, and the source, which indicates whether the text being analyzed is a prompt to a model or a response from the model. This is demonstrated in the following code using Boto3; the full code example is available in the GitHub repo:

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime')

# Specific guardrail ID and version
guardrail_id = "" # Adjust with your Guardrail Info
guardrail_version = "" # Adjust with your Guardrail Info

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT', # or 'INPUT' depending on your use case
        content=content
    )
    
    # Process the response
    print("API Response:")
    print(json.dumps(response, indent=2))
    
    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("nGuardrail intervened. Output:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("nGuardrail did not intervene.")

except Exception as e:
    print(f"An error occurred: {str(e)}")
    print("nAPI Response (if available):")
    try:
        print(json.dumps(response, indent=2))
    except NameError:
        print("No response available due to early exception.")

The response of the API provides the following details:

  • Whether the guardrail intervened.
  • Why the guardrail intervened.
  • The guardrail policy units consumed by the request. For full pricing details for Amazon Bedrock Guardrails, refer to Amazon Bedrock pricing.

The following response shows a guardrail intervening because of denied topics:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 0,
    "contextualGroundingPolicyUnits": 0
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "topicPolicy": {
        "topics": [
          {
            "name": "Fiduciary Advice",
            "type": "DENY",
            "action": "BLOCKED"
          }
        ]
      }
    }
  ]
}

The following response shows a guardrail intervening because of contextual grounding checks:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.9,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

From the response to the first request, you can observe that the guardrail intervened so it wouldn’t provide fiduciary advice to a user who asked for a recommendation on a financial product. From the response to the second request, you can observe that the guardrail intervened to filter a hallucinated guaranteed return rate in the model response that deviated from the information in the grounding source. In both cases, the guardrail intervened as expected to make sure that the model responses provided to the user avoid restricted topics and are factually grounded in the source, helping meet regulatory requirements or internal company policies.

Using the ApplyGuardrail API with a self-hosted LLM

A common use case for the ApplyGuardrail API is in conjunction with an LLM from a third-party provider or a model that you self-host. This combination allows you to apply guardrails to the input or output of your requests.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your model for inference.
  4. Receive the output from your model.
  5. Apply the guardrail to your output.
  6. If the output passes the guardrail, return the final output.
  7. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted LLM

See the provided code example for an implementation of this workflow.

We use the Meta-Llama-3-8B model hosted on an Amazon SageMaker endpoint. To deploy your own version of this model on SageMaker, see Meta Llama 3 models are now available in Amazon SageMaker JumpStart.

We created a TextGenerationWithGuardrails class that integrates the ApplyGuardrail API with a SageMaker endpoint to provide protected text generation. This class includes the following key methods:

  • generate_text – Calls our LLM through a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies our guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply our guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and associated messages.

The class implements the workflow in the preceding diagram. It works as follows:

  1. It checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it generates text using generate_text.
  3. The generated text is then checked using analyze_output.
  4. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It’s designed to integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.
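The following is a minimal sketch of this pattern, assuming a SageMaker endpoint that accepts and returns JSON. The class name mirrors the description above, but the payload format, response parsing, and the combined generate_with_guardrails helper are illustrative assumptions rather than the exact implementation in the repo, and the contextual grounding qualifiers shown earlier are omitted for brevity.

import json
import boto3


class TextGenerationWithGuardrails:
    """Minimal sketch of the guarded text generation flow described above."""

    def __init__(self, endpoint_name, guardrail_id, guardrail_version):
        self.bedrock_runtime = boto3.client("bedrock-runtime")
        self.sagemaker_runtime = boto3.client("sagemaker-runtime")
        self.endpoint_name = endpoint_name
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version

    def analyze_text(self, text, source):
        """Apply the guardrail to text and return (passed, intervention_message)."""
        response = self.bedrock_runtime.apply_guardrail(
            guardrailIdentifier=self.guardrail_id,
            guardrailVersion=self.guardrail_version,
            source=source,  # 'INPUT' or 'OUTPUT'
            content=[{"text": {"text": text}}],
        )
        if response["action"] == "GUARDRAIL_INTERVENED":
            return False, response["outputs"][0]["text"]
        return True, ""

    def generate_text(self, prompt):
        """Call the self-hosted LLM on the SageMaker endpoint."""
        # Payload format and response schema depend on the serving container (assumption)
        payload = {"inputs": prompt, "parameters": {"max_new_tokens": 512}}
        response = self.sagemaker_runtime.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        result = json.loads(response["Body"].read())
        return result["generated_text"] if isinstance(result, dict) else result[0]["generated_text"]

    def generate_with_guardrails(self, prompt):
        """Check the input, generate text, then check the output."""
        passed, message = self.analyze_text(prompt, source="INPUT")
        if not passed:
            return message
        output = self.generate_text(prompt)
        passed, message = self.analyze_text(output, source="OUTPUT")
        return output if passed else message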

We can test this by providing the following inputs:

query = "What is the Guaranteed Rate of Return for AB503 Product"
grounding_source = "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%"

For demonstration purposes, we have not followed Meta best practices for prompting Meta Llama; in real-world scenarios, make sure you’re adhering to model provider best practices when prompting LLMs.

The model responds with the following:

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals

This is a hallucinated response to our question. You can see this demonstrated through the outputs of the workflow.

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:37:01 GMT",
      "content-type": "application/json",
      "content-length": "1637",
      "connection": "keep-alive",
      "x-amzn-requestid": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 3,
    "contentPolicyUnits": 3,
    "wordPolicyUnits": 3,
    "sensitiveInformationPolicyUnits": 3,
    "sensitiveInformationPolicyFreeUnits": 3,
    "contextualGroundingPolicyUnits": 3
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.01,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 1.0,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

In the workflow output, you can see that the input prompt passed the guardrail check and the workflow proceeded to generate a response. The workflow then called the guardrail to check the model output before presenting it to the user. The contextual grounding check intervened because it detected that the model response wasn’t factually accurate based on the grounding source. As a result, the workflow returned the defined intervention message instead of an ungrounded and factually incorrect response.

Using the ApplyGuardrail API within a self-managed RAG pattern

Another common use case applies the ApplyGuardrail API to an LLM from a third-party provider, or a model that you self-host, within a RAG pattern.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your embeddings model for query embedding, and query your vector embeddings.
  4. Receive the output from your embeddings model and use it as context.
  5. Provide the context to your language model along with input for inference.
  6. Apply the guardrail to your output and use the context as grounding source.
  7. If the output passes the guardrail, return the final output.
  8. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted RAG

See the provided code example for an implementation of this workflow.

For our examples, we use a model hosted on a SageMaker endpoint for our LLM, but this could be a third-party model as well.

We use the Meta-Llama-3-8B model hosted on a SageMaker endpoint. For embeddings, we use the voyage-large-2-instruct model. To learn more about Voyage AI embeddings models, see Voyage AI.

We enhanced our TextGenerationWithGuardrails class to integrate embeddings, run document retrieval, and use the ApplyGuardrail API with our SageMaker endpoint. This provides protected text generation that is grounded in contextually relevant information. The class now includes the following key methods:

  • generate_text – Calls our LLM using a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies the guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply the guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and any associated message.
  • embed_text – Embeds the given text using a specified embedding model.
  • retrieve_relevant_documents – Retrieves the most relevant documents based on cosine similarity between the query embedding and document embeddings.
  • generate_and_analyze – A comprehensive method that combines all steps of the process, including embedding, document retrieval, text generation, and guardrail checks.

The enhanced class implements the following workflow:

  1. It first checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it embeds the query and retrieves relevant documents.
  3. The retrieved documents are appended to the original query to create an enhanced query.
  4. Text is generated using generate_text with the enhanced query.
  5. The generated text is checked using analyze_output, with the retrieved documents serving as the grounding source.
  6. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, while also incorporating relevant context from a document collection. It’s designed with the following objectives:

  • Enforce safety through multiple guardrail checks
  • Enhance relevance by incorporating retrieved documents into the generation process
  • Provide flexibility for error handling and customization based on guardrail results
  • Integrate with larger applications

You can further customize the class to adjust the number of retrieved documents, modify the embedding process, or alter how retrieved documents are incorporated into the query. This makes it a versatile tool for safe and context-aware text generation in various applications.
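As an illustration of the retrieval step, the following is a minimal sketch of cosine similarity retrieval over precomputed embeddings. The function signature matches the description above, but the implementation details are assumptions rather than the exact code in the repo.

import numpy as np

def retrieve_relevant_documents(query_embedding, document_embeddings, documents, top_k=1):
    """Return the top_k documents most similar to the query embedding."""
    query = np.asarray(query_embedding)
    docs = np.asarray(document_embeddings)
    # Cosine similarity between the query and every document embedding
    scores = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]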

Let’s test out the implementation with the following input prompt:

query = "What is the Guaranteed Rate of Return for AB503 Product?"

We use the following documents as inputs into the workflow:

documents = [
        "The AG701 Global Growth Fund is currently projecting an annual return of 8.5%, focusing on emerging markets and technology sectors.",
        "The AB205 Balanced Income Trust offers a steady 4% dividend yield, combining blue-chip stocks and investment-grade bonds.",
        "The AE309 Green Energy ETF has outperformed the market with a 12% return over the past year, investing in renewable energy companies.",
        "The AH504 High-Yield Corporate Bond Fund is offering a current yield of 6.75%, targeting BB and B rated corporate debt.",
        "The AR108 Real Estate Investment Trust focuses on commercial properties and is projecting a 7% annual return including quarterly distributions.",
        "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."]

The following is an example output of the workflow:

=== Query Embedding ===

Query: What is the Guaranteed Rate of Return for AB503 Product?
Query embedding (first 5 elements): [-0.024676240980625153, 0.0432446151971817, 0.008557720109820366, 0.059132225811481476, -0.045152030885219574]...


=== Document Embedding ===

Document 1: The AG701 Global Growth Fund is currently projecti...
Embedding (first 5 elements): [-0.012595066800713539, 0.052137792110443115, 0.011615722440183163, 0.017397189512848854, -0.06500907987356186]...

Document 2: The AB205 Balanced Income Trust offers a steady 4%...
Embedding (first 5 elements): [-0.024578886106610298, 0.03796630725264549, 0.004817029926925898, 0.03752804920077324, -0.060099825263023376]...

Document 3: The AE309 Green Energy ETF has outperformed the ma...
Embedding (first 5 elements): [-0.016489708796143532, 0.04436756297945976, 0.006371065974235535, 0.0194888636469841, -0.07305170595645905]...

Document 4: The AH504 High-Yield Corporate Bond Fund is offeri...
Embedding (first 5 elements): [-0.005198546685278416, 0.05041510611772537, -0.007950469851493835, 0.047702062875032425, -0.06752850860357285]...

Document 5: The AR108 Real Estate Investment Trust focuses on ...
Embedding (first 5 elements): [-0.03276287764310837, 0.04030522331595421, 0.0025598432403057814, 0.022755954414606094, -0.048687443137168884]...

Document 6: The AB503 Financial Product is currently offering ...
Embedding (first 5 elements): [-0.00174321501981467, 0.05635036155581474, -0.030949480831623077, 0.028832541778683662, -0.05486077815294266]...


=== Document Retrieval ===

Retrieved Document:
[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

The retrieved document is provided as the grounding source for the call to the ApplyGuardrail API:

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:52:36 GMT",
      "content-type": "application/json",
      "content-length": "1638",
      "connection": "keep-alive",
      "x-amzn-requestid": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.97,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

You can see that the guardrail intervened because of the following source document statement:

[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

Whereas the model responded with the following:

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.

This demonstrates a hallucination; the guardrail intervened and presented the user with the defined message instead of the hallucinated answer.

Pricing

Pricing for the solution is largely dependent on the following factors:

  • Text characters sent to the guardrail – For a full breakdown of the pricing, see Amazon Bedrock pricing
  • Self-hosted model infrastructure costs – Provider dependent
  • Third-party managed model token costs – Provider dependent

Clean up

To delete any infrastructure provisioned in this example, follow the instructions in the GitHub repo.

Conclusion

You can use the ApplyGuardrail API to decouple safeguards for your generative AI applications from FMs. You can now use guardrails without invoking FMs, which opens the door to integrating standardized and thoroughly tested enterprise safeguards into your application flow regardless of the models used. Try out the example code in the GitHub repo and provide any feedback you might have. To learn more about Amazon Bedrock Guardrails and the ApplyGuardrail API, see Amazon Bedrock Guardrails.


About the Authors

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Aarushi Karandikar is a Solutions Architect at Amazon Web Services (AWS), responsible for providing Enterprise ISV customers with technical guidance on their cloud journey. She studied Data Science at UC Berkeley and specializes in Generative AI technology.

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Read More

How Schneider Electric uses Amazon Bedrock to identify high-potential business opportunities

How Schneider Electric uses Amazon Bedrock to identify high-potential business opportunities

This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Adrian Boeh, Senior Data Scientist – NAM AI, from Schneider Electric.

Schneider Electric is a global leader in the digital transformation of energy management and automation. The company specializes in providing integrated solutions that make energy safe, reliable, efficient, and sustainable. Schneider Electric serves a wide range of industries, including smart manufacturing, resilient infrastructure, future-proof data centers, intelligent buildings, and intuitive homes. They offer products and services that encompass electrical distribution, industrial automation, and energy management. Their innovative technologies, extensive range of products, and commitment to sustainability position Schneider Electric as a key player in advancing smart and green solutions for the modern world.

As demand for renewable energy continues to rise, Schneider Electric faces high demand for sustainable microgrid infrastructure. This demand comes in the form of requests for proposals (RFPs), each of which needs to be manually reviewed by a microgrid subject matter expert (SME) at Schneider. Manual review of each RFP was proving too costly and couldn’t be scaled to meet the industry needs. To solve the problem, Schneider turned to Amazon Bedrock and generative artificial intelligence (AI). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In this post, we show how the team at Schneider collaborated with the AWS Generative AI Innovation Center (GenAIIC) to build a generative AI solution on Amazon Bedrock to solve this problem. The solution processes and evaluates each RFP and then routes high-value RFPs to the microgrid SME for approval and recommendation.

Problem Statement

Microgrid infrastructure is a critical element of the growing renewable energy market. A microgrid includes on-site power generation and storage that allow a system to disconnect from the main grid. Schneider Electric offers several important products that allow customers to build microgrid solutions to make their residential buildings, schools, or manufacturing centers more sustainable. Growing public and private investment in this sector has led to an exponential increase in the number of RFPs for microgrid systems.

The RFP documents contain technically complex textual and visual information such as scope of work, parts lists, and electrical diagrams. Moreover, they can be hundreds of pages long. The following figure provides several examples of RFP documents. The RFP size and complexity makes reviewing them costly and labor intensive. An experienced SME is usually required to review an entire RFP and provide an assessment for its applicability to the business and potential for conversion.

Microgrid Request for Proposal (RFP) Examples

Sample request for proposal (RFP) input data

To add additional complexity, the same set of RFP documents might be assessed by multiple business units within Schneider. Each unit might be looking for different requirements that make the opportunity relevant to that sales team.

Given the size and complexity of the RFP documents, the Schneider team needed a way to quickly and accurately identify opportunities where Schneider products offer a competitive advantage and a high potential for conversion. Failure to respond to viable opportunities could result in potential revenue loss, while devoting resources to proposals where the company lacks a distinct competitive edge would lead to an inefficient use of time and effort.

They also needed a solution that could be repurposed for other business units, allowing the impact to extend to the entire enterprise. Successfully handling the influx of RFPs would not only allow the Schneider team to expand their microgrid business, but help businesses and industries adopt a new renewable energy paradigm.

Amazon Bedrock and Generative AI

To help solve this problem, the Schneider team turned to generative AI and Amazon Bedrock. Large language models (LLMs) are now enabling more efficient business processes through their ability to identify and summarize specific categories of information with human-like precision. The volume and complexity of the RFP documents made them an ideal candidate to use generative AI for document processing.

You can use Amazon Bedrock to build and scale generative AI applications with a broad range of FMs. Amazon Bedrock is a fully managed service that includes FMs from Amazon and third-party models supporting a range of use cases. For more details about the FMs available, see Supported foundation models on Amazon Bedrock. Amazon Bedrock enables developers to create unique experiences with generative AI capabilities supporting a broad range of programming languages and frameworks.

The solution uses Anthropic Claude on Amazon Bedrock, specifically the Anthropic Claude Sonnet model. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence.

Solution Overview

Traditional Retrieval Augmented Generation (RAG) systems can’t identify the relevancy of RFP documents to a given sales team because of the extensively long list of one-time business requirements and the large taxonomy of electrical components or services, which might or might not be present in the documents.

Other existing approaches require either expensive domain-specific fine-tuning to the LLM or the use of filtering for noise and data elements, which leads to suboptimal performance and scalability impacts.

Instead, the AWS GenAIC team worked with Schneider Electric to package business objectives onto the LLM through multiple prisms of semantic transformations: concepts, functions, and components. For example, in the domain of smart grids, the underlying business objectives might be defined as resiliency, isolation, and sustainability. Accordingly, the corresponding functions would involve energy generation, consumption, and storage. The following figure illustrates these components.

Microgrid Concept Diagram

Microgrid semantic components

The approach of concept-driven information extraction resembles ontology-based prompting. It allows engineering teams to customize the initial list of concepts and scale onto different domains of interest. The decomposition of complex concepts into specific functions incentivizes the LLM to detect, interpret, and extract the associated data elements.

The LLM was prompted to read RFPs and retrieve quotes pertinent to the defined concepts and functions. These quotes substantiate the presence of electrical equipment satisfying the high-level objectives and were used as weight of evidence indicating the downstream relevance of an RFP to the original sales team.

For example, in the following extracted quote, the term BESS stands for battery energy storage system and provides evidence of power storage.

{
    "quote": "2.3W / 2MWh Saft IHE LFP (1500V) BESS (1X)",
    "function": "Power Storage",
    "relevance": 10,
    "summary": "Specifies a lithium iron phosphate battery energy storage system."
}

In the following example, the term EPC indicates the presence of a solar plant.

{
    "quote": "EPC 2.0MW (2X)",
    "function": "Power Generation",
    "relevance": 9,
    "summary": "Specifies 2 x 2MW solar photovoltaic inverters."
}

The overall solution encompasses three phases:

  • Document chunking and preprocessing
  • LLM-based quote retrieval
  • LLM-based quote summarization and evaluation

The first step uses standard document chunking as well as Schneider’s proprietary document processing pipelines to group similar text elements into a single chunk. Each chunk is processed by the quote retrieval LLM, which identifies relevant quotes within each chunk if they’re available. This brings relevant information to the forefront and filters out irrelevant content. Finally, the relevant quotes are compiled and fed to a final LLM that summarizes the RFP and determines its overall relevance to the microgrid family of RFPs. The following diagram illustrates this pipeline.

GenAI solution flow diagram
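As a rough illustration of the quote retrieval step, the following hedged sketch sends a single document chunk to Anthropic Claude 3 Sonnet on Amazon Bedrock and asks for structured quotes. The prompt text, sample chunk, and output handling are illustrative assumptions; the actual prompts used in the solution are proprietary.

import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")

# Hypothetical chunk and prompt; the production prompts are proprietary
chunk = "2.3W / 2MWh Saft IHE LFP (1500V) BESS (1X) ..."
prompt = (
    "You are reviewing a microgrid RFP. From the text below, extract quotes that "
    "provide evidence of power generation, consumption, or storage. Return a JSON "
    "list of objects with quote, function, relevance, and summary fields.\n\n"
    f"<chunk>{chunk}</chunk>"
)

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": prompt}],
})

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=body,
)
quotes = json.loads(response["body"].read())["content"][0]["text"]
print(quotes)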

The final determination about the RFP is made using the following prompt structure. The details of the actual prompt are proprietary, but the structure includes the following:

  • We first provide the LLM with a brief description of the business unit in question.
  • We then define a persona and tell the LLM where to locate evidence.
  • We provide criteria for RFP categorization.
  • We specify the output format, which includes:
    • A single designation of yes, no, or maybe.
    • A relevance score from 1–10.
    • A brief explanation of the reasoning.

prompt = """ 
[1] <DESCRIPTION OF THE BUSINESS UNIT> 
[2] You're an expert in <BUSINESS UNIT> and have to evaluate if a given RFP is related to <BUSINESS UNIT>… 

The quotes are provided below… 

<QUOTES> 

[3] Determine the relevancy to <BUSINESS UNIT> using … criteria: 

<CRITERIA> 

[4] <RESPONSE_FORMAT> 
[4a] A designation of Yes, No, or Maybe. 
[4b] A relevance score. 
[4c] A brief summary of justification and explanation. 
"""

The result compresses a relatively large corpus of RFP documents into a focused, concise, and informative representation by precisely capturing and returning the most important aspects. The structure allows the SME to quickly filter for specific LLM labels, and the summary quotes allow them to better understand which quotes are driving the LLM’s decision-making process. In this way, the Schneider SME team can spend less time reading through pages of RFP proposals and can instead focus their attention on the content that matters most to their business. The sample below shows both a classification result and qualitative feedback for a sample RFP.

GenAI solution output

Internal teams are already experiencing the advantages of our new AI-driven RFP Assistant:

“At Schneider Electric, we are committed to solving real-world problems by creating a sustainable, digitized, and new electric future. We leverage AI and LLMs to further enhance and accelerate our own digital transformation, unlocking efficiency and sustainability in the energy sector.”

– Anthony Medeiros, Manager of Solutions Engineering and Architecture, Schneider Electric.

Conclusion

In this post, the AWS GenAIIC team, working with Schneider Electric, demonstrated the remarkable general capability of LLMs available on Amazon Bedrock to assist sales teams and optimize their workloads.

The RFP assistant solution allowed Schneider Electric to achieve 94% accuracy in the task of identifying microgrid opportunities. By making small adjustments to the prompts, the solution can be scaled and adapted to other lines of business.

By precisely guiding the prompts, the team can derive distinct and objective perspectives from identical sets of documents. The proposed solution enables RFPs to be viewed through the interchangeable lenses of various business units, each pursuing a diverse range of objectives. These previously obscured insights have the potential to unveil novel business prospects and generate supplementary revenue streams.

These capabilities will allow Schneider Electric to seamlessly integrate AI-powered insights and recommendations into its day-to-day operations. This integration will facilitate well-informed and data-driven decision-making processes, streamline operational workflows for heightened efficiency, and elevate the quality of customer interactions, ultimately delivering superior experiences.


About the Authors

Anthony MedeirosAnthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.

Adrian BoehAdrian Boeh is a Senior Data Scientist working on advanced data tasks for Schneider Electric’s North American Customer Transformation Organization. Adrian has 13 years of experience at Schneider Electric and is AWS Machine Learning Certified with a proven ability to innovate and improve organizations using data science methods and technology.

Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.

Dan VolkDan Volk is a Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master’s in Data Science from UC Berkeley. He is passionate about transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Negin Sokhandan is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is statistical inference, computer vision, and multimodal systems.

Read More

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Large enterprises are building strategies to harness the power of generative AI across their organizations. However, scaling up generative AI and making adoption easier for different lines of businesses (LOBs) comes with challenges around making sure data privacy and security, legal, compliance, and operational complexities are governed on an organizational level. In this post, we discuss how to address these challenges holistically.

Managing bias, intellectual property, prompt safety, and data integrity are critical considerations when deploying generative AI solutions at scale. Because this is an emerging area, best practices, practical guidance, and design patterns are difficult to find in an easily consumable basis. In this post, we share AWS guidance that we have learned and developed as part of real-world projects into practical guides oriented towards the AWS Well-Architected Framework, which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.

Amazon Bedrock plays a pivotal role in this endeavor. It’s a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. You can securely integrate and deploy generative AI capabilities into your applications using services such as AWS Lambda, enabling seamless data management, monitoring, and compliance (for more details, see Monitoring and observability). This integration makes sure enterprises can take advantage of the full power of generative AI while adhering to best practices in operational excellence.

With Amazon Bedrock, enterprises can achieve the following:

  • Scalability – Scale generative AI applications across different LOBs without compromising performance
  • Security and compliance – Enforce data privacy, security, and compliance with industry standards and regulations
  • Operational efficiency – Streamline operations with built-in tools for monitoring, logging, and automation, aligned with the AWS Well-Architected Framework
  • Innovation – Access cutting-edge AI models and continually improve them with real-time data and feedback

This approach enables enterprises to deploy generative AI at scale while maintaining operational excellence, ultimately driving innovation and efficiency across their organizations.

What’s different about operating generative AI workloads and solutions?

The operational excellence pillar of the Well-Architected Framework is mainly focused on supporting the development and running of workloads effectively, gaining insight into their operations, and continuously improving supporting processes and procedures to deliver business value. However, if we were to apply a generative AI lens, we would need to address the intricate challenges and opportunities arising from its innovative nature, encompassing the following aspects:

  • Complexity can be unpredictable due to the ability of large language models (LLMs) to generate new content
  • Potential intellectual property infringement is a concern due to the lack of transparency in the model training data
  • Low accuracy in generative AI can create incorrect or controversial content
  • Resource utilization requires a specific operating model to meet the substantial computational resources required for training and prompt and token sizes
  • Continuous learning necessitates additional data annotation and curation strategies
  • Compliance is also a rapidly evolving area, where data governance becomes more nuanced and complex, and poses challenges
  • Integration with legacy systems requires careful consideration of compatibility, data flow between systems, and potential performance impacts

Any generative AI lens therefore needs to combine the following elements, each with varying levels of prescription and enforcement, to address these challenges and provide the basis for responsible AI usage:

  • Policy – The system of principles to guide decisions
  • Guardrails – The rules that create boundaries to keep you within the policy
  • Mechanisms – The process and tools

AWS has advanced responsible AI by introducing Amazon Bedrock Guardrails as a protection to prevent harmful responses from the LLMs, providing an additional layer of safeguards regardless of the underlying FM. However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.

As cloud adoption has matured for more traditional IT workloads and applications, the need to help developers select the right cloud solution that minimizes corporate risk and simplifies the developer experience has emerged. This is often referred to as platform engineering and can be neatly summarized by the mantra “You (the developer) build and test, and we (the platform engineering team) do all the rest!”

This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value. This is illustrated in the following diagram.

GenAI cloud center of excellence

Where to start?

We start this post by reviewing the foundational operational elements a generative AI platform team needs to initially focus on as they transition generative solutions from a proof of concept or prototype phase to a production-ready solution.

Specifically, we cover how you can safely develop, deploy, and monitor models, mitigating operational and compliance risks, thereby reducing the friction in adopting AI at scale and for production use. We focus on the following four design principles:

  • Establish control through promoting transparency of model details, setting up guardrails or safeguards, and providing visibility into costs, metrics, logs, and traces
  • Automate model fine-tuning, training, validation, and deployment using large language model operations (LLMOps) or foundation model operations (FMOps)
  • Manage data through standard methods for ingestion, governance, and indexing
  • Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

In the following sections, we explain this using an architecture diagram while diving into the best practices of the control pillar.

Provide control through transparency of models, guardrails, and costs using metrics, logs, and traces

The control pillar of the generative AI framework focuses on observability, cost management, and governance, making sure enterprises can deploy and operate their generative AI solutions securely and efficiently. The following diagram illustrates the key components of this pillar:

Control Pillar of Generative AI Well architected solutions

Observability

Setting up observability measures lays the foundations for the other two components, namely FinOps and Governance. Observability is crucial for monitoring the performance, reliability, and cost-efficiency of generative AI solutions. By using AWS services such as Amazon CloudWatch, AWS CloudTrail, and Amazon OpenSearch Service, enterprises can gain visibility into model metrics, usage patterns, and potential issues, enabling proactive management and optimization.

Amazon Bedrock is compatible with robust observability features to monitor and manage ML models and applications. Key metrics integrated with CloudWatch include invocation counts, latency, client and server errors, throttles, input and output token counts, and more (for more details, see Monitor Amazon Bedrock with Amazon CloudWatch). You can also use Amazon EventBridge to monitor events related to Amazon Bedrock. This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). CloudTrail can log all API calls made to Amazon Bedrock by a user, role, or AWS service in an AWS environment. This is particularly useful for tracking access to sensitive resources such as personally identifiable information (PII), model updates, and other critical activities, enabling enterprises to maintain a robust audit trail and compliance. To learn more, see Log Amazon Bedrock API calls using AWS CloudTrail.
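As a hedged example of pulling these metrics programmatically, the following sketch retrieves hourly invocation counts for a model from CloudWatch. The model ID, dimension name, and time window are illustrative assumptions that you would adjust for your own deployment.

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Retrieve hourly invocation counts for a specific model over the last day
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Sum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])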

Amazon Bedrock supports the metrics and telemetry needed for implementing an observability maturity model for LLMs, which includes the following:

  • Capturing and analyzing LLM-specific metrics such as model performance, prompt properties, and cost metrics through CloudWatch
  • Implementing alerts and incident management tailored to LLM-related issues
  • Providing security compliance and robust monitoring mechanisms, because Amazon Bedrock is in scope for common compliance standards and offers automated abuse detection mechanisms
  • Using CloudWatch and CloudTrail for anomaly detection, usage and costs forecasting, optimizing performance, and resource utilization
  • Using AWS forecasting services for better resource planning and cost management

CloudWatch provides a unified monitoring and observability service that collects logs, metrics, and events from various AWS services and on-premises sources. This allows enterprises to track key performance indicators (KPIs) for their generative AI models, such as I/O volumes, latency, and error rates. You can use CloudWatch dashboards to create custom visualizations and alerts, so teams are quickly notified of any anomalies or performance degradation.
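For example, the following hedged sketch creates a CloudWatch alarm on Amazon Bedrock invocation latency; the threshold, evaluation periods, and SNS topic ARN are illustrative assumptions that you would adjust to your own service-level objectives.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alert when average invocation latency stays above a chosen threshold (illustrative values)
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-invocation-latency-high",
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=5000,  # milliseconds; adjust to your latency SLO
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],  # hypothetical SNS topic
)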

For more advanced observability requirements, enterprises can use OpenSearch Service, a fully managed service for deploying, operating, and scaling OpenSearch and Kibana. OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics.

Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Before you can enable invocation logging, you need to set up an Amazon Simple Storage Service (Amazon S3) or CloudWatch Logs destination. You can enable invocation logging through either the AWS Management Console or the API. By default, logging is disabled.
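The following is a minimal sketch of enabling invocation logging to CloudWatch Logs through the API; the log group name and IAM role ARN are hypothetical and must already exist in your account.

import boto3

bedrock = boto3.client("bedrock")

# Enable model invocation logging to CloudWatch Logs (log group and role are hypothetical)
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/model-invocations",
            "roleArn": "arn:aws:iam::111122223333:role/BedrockLoggingRole",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)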

Cost management and optimization (FinOps)

Generative AI solutions can quickly scale and consume significant cloud resources, and a robust FinOps practice is essential. With services like AWS Cost Explorer and AWS Budgets, enterprises can track their usage and optimize their generative AI spending, achieving cost-effective deployment and scaling.

Cost Explorer provides detailed cost analysis and forecasting capabilities, enabling you to understand your tenant-related expenditures, identify cost drivers, and plan for future growth. Teams can create custom cost allocation reports, set custom budgets using AWS Budgets and alerts, and explore cost trends over time.
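As a hedged example, the following sketch creates a monthly cost budget with an 80% actual-spend alert; the account ID, budget amount, and subscriber email are illustrative assumptions.

import boto3

budgets = boto3.client("budgets")

# Hypothetical monthly cost budget with an 80% actual-spend notification
budgets.create_budget(
    AccountId="111122223333",
    Budget={
        "BudgetName": "genai-monthly-budget",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "finops@example.com"}],
        }
    ],
)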

Analyzing the cost and performance of generative AI models is crucial for making informed decisions about model deployment and optimization. EventBridge, CloudTrail, and CloudWatch provide the necessary tools to track and analyze these metrics, helping enterprises make data-driven decisions. With this information, you can identify optimization opportunities, such as scaling down under-utilized resources.

With EventBridge, you can configure Amazon Bedrock to respond automatically to status change events in Amazon Bedrock. This enables you to handle API rate limit issues, API updates, and reduction in additional compute resources. For more details, see Monitor Amazon Bedrock events in Amazon EventBridge.
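The following hedged sketch creates an EventBridge rule that matches Amazon Bedrock events and routes them to a notification target; the rule name, event pattern scope, and SNS topic ARN are illustrative assumptions.

import boto3
import json

events = boto3.client("events")

# Match events emitted by Amazon Bedrock and route them to a hypothetical SNS topic
events.put_rule(
    Name="bedrock-status-change-events",
    EventPattern=json.dumps({"source": ["aws.bedrock"]}),
    State="ENABLED",
)
events.put_targets(
    Rule="bedrock-status-change-events",
    Targets=[{"Id": "ops-topic", "Arn": "arn:aws:sns:us-east-1:111122223333:ops-alerts"}],
)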

As discussed in the previous section, CloudWatch can monitor Amazon Bedrock to collect raw data and process it into readable, near real-time cost metrics. You can graph the metrics using the CloudWatch console. You can also set alarms that watch for certain thresholds and send notifications or take actions when values exceed those thresholds. For more information, see Monitor Amazon Bedrock with Amazon CloudWatch.

Governance

Implementation of robust governance measures, including continuous evaluation and multi-layered guardrails, is fundamental for the responsible and effective deployment of generative AI solutions in enterprise environments. Let’s look at them one by one:

  • Performance monitoring and evaluation – Continuously evaluating the performance, safety, and compliance of generative AI models is critical. You can achieve this in several ways:
    • Enterprises can use AWS services like Amazon SageMaker Model Monitor, Amazon Bedrock Guardrails, or Amazon Comprehend to monitor model behavior, detect drift, and make sure generative AI solutions are performing as expected (or better) and adhering to organizational policies.
    • You can deploy open-source evaluation metrics like RAGAS as custom metrics to make sure LLM responses are grounded, mitigate bias, and prevent hallucinations.
    • Model evaluation jobs allow you to compare model outputs and choose the best-suited model for your use case. A job can be automated against ground truth data, or you can bring in human reviewers for subject-matter expertise. You can also use FMs from Amazon Bedrock to evaluate your applications. To learn more about this approach, refer to Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.
  • Guardrails – Generative AI solutions should include robust, multi-level guardrails to enforce responsible AI and oversight:
    • First, you need guardrails around the LLM itself to mitigate risks around bias and safeguard the application with responsible AI policies. You can do this with Amazon Bedrock Guardrails, which lets you set up custom guardrails around a model (FM or fine-tuned) to configure denied topics, content filters, and blocked messaging (see the sketch after this list).
    • The second level is to set guardrails around the framework for each use case. This includes implementing access controls, data governance policies, and proactive monitoring and alerting to make sure sensitive information is properly secured and monitored. For example, you can use AWS data analytics services such as Amazon Redshift for data warehousing, AWS Glue for data integration, and Amazon QuickSight for business intelligence (BI).
  • Compliance measures – Enterprises need to set up a robust compliance framework to meet regulatory requirements and industry standards such as GDPR, CCPA, or industry-specific standards. This helps make sure generative AI solutions remain secure, compliant, and efficient in handling sensitive information across different use cases. This approach minimizes the risk of data breaches or unauthorized data access, thereby protecting the integrity and confidentiality of critical data assets. Enterprises can take the following organization-level actions to create a comprehensive governance structure:
    • Establish a clear incident response plan for addressing compliance breaches or AI system malfunctions.
    • Conduct periodic compliance assessments and third-party audits to identify and address potential risks or violations.
    • Provide ongoing training to employees on compliance requirements and best practices in AI governance.
  • Model transparency – Although achieving full transparency in generative AI models remains challenging, organizations can take several steps to enhance model transparency and explainability:
    • Provide model cards on the model’s intended use, performance, capabilities, and potential biases.
    • Ask the model to self-explain, meaning provide explanations for its own decisions. This can also be built into a more complex system; for example, agents could perform multi-step planning and improve through self-explanation.
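To make the guardrails point concrete, the following is a minimal sketch of creating an Amazon Bedrock guardrail with one denied topic and two content filters; the guardrail name, topic definition, filter strengths, and blocked messages are illustrative assumptions you would replace with your own policy.

```python
import boto3

bedrock = boto3.client("bedrock")

# A guardrail with one denied topic and two content filters. The name, topic
# definition, filter strengths, and blocked messages are illustrative only.
guardrail = bedrock.create_guardrail(
    name="enterprise-default-guardrail",
    description="Baseline responsible AI guardrail for internal assistants",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "FinancialAdvice",
            "definition": "Providing personalized investment or financial advice.",
            "type": "DENY",
        }]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="This request is outside the assistant's approved scope.",
    blockedOutputsMessaging="The response was blocked by the organization's AI policy.",
)
print(guardrail["guardrailId"], guardrail["version"])
```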

Automate model lifecycle management with LLMOps or FMOps

Implementing LLMOps is crucial for efficiently managing the lifecycle of generative AI models at scale. To grasp the concept of LLMOps, a subset of FMOps, and the key differentiators compared to MLOps, see FMOps/LLMOps: Operationalize generative AI and differences with MLOps. In that post, you can learn more about the developmental lifecycle of a generative AI application and the additional skills, processes, and technologies needed to operationalize generative AI applications.

Manage data through standard methods of data ingestion and use

Enriching LLMs with new data is imperative for LLMs to provide more contextual answers without the need for extensive fine-tuning or the overhead of building a specific corporate LLM. Managing data ingestion, extraction, transformation, cataloging, and governance is a complex, time-consuming process that needs to align with corporate data policies and governance frameworks.

AWS provides several services to support this; the following diagram illustrates these at a high level. For a more detailed description, see Scaling AI and Machine Learning Workloads with Ray on AWS and Build a RAG data ingestion pipeline for large scale ML workloads.

This workflow includes the following steps:

  1. Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer family. You can use AWS Identity and Access Management (IAM) and AWS PrivateLink to control and secure access to data and generative AI resources, making sure data remains within the organization’s boundaries and complies with the relevant regulations.
  2. When the data is in Amazon S3, you can use AWS Glue to extract and transform data (for example, into Parquet format) and store metadata about the ingested data, facilitating data governance and cataloging.
  3. The third component is the GPU cluster, which could potentially be a Ray cluster. You can employ various orchestration engines, such as AWS Step Functions, Amazon SageMaker Pipelines, or AWS Batch, to run the jobs (or create pipelines) to create embeddings and ingest the data into a data store or vector store.
  4. Embeddings can be stored in a vector store such as OpenSearch, enabling efficient retrieval and querying. Alternatively, you can use a solution such as Amazon Bedrock Knowledge Bases to ingest data from Amazon S3 or other data sources, enabling seamless integration with generative AI solutions (see the sketch after these steps).
  5. You can use Amazon DataZone to manage access control to the raw data stored in Amazon S3 and the vector store, enforcing role-based or fine-grained access control for data governance.
  6. For cases where you need a semantic understanding of your data, you can use Amazon Kendra for intelligent enterprise search. Amazon Kendra has inbuilt ML capabilities and is easy to integrate with various data sources like S3, making it adaptable for different organizational needs.
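As a hedged sketch of steps 3 and 4, the following computes an embedding for a document chunk with a Titan embedding model and, alternatively, triggers a managed Amazon Bedrock Knowledge Bases sync job; the model ID, chunk text, knowledge base ID, and data source ID are placeholders.

```python
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Step 3: create an embedding for a document chunk with a Titan embedding model.
# In a real pipeline this runs inside your Step Functions, SageMaker Pipelines,
# or AWS Batch job; the model ID and chunk text are placeholders.
response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "Quarterly sales summary for product P-100 ..."}),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # vector dimension, ready to index into a vector store

# Step 4 (alternative): let Amazon Bedrock Knowledge Bases ingest from S3 instead.
# The knowledge base and data source IDs are placeholders.
bedrock_agent = boto3.client("bedrock-agent")
bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB12345678",
    dataSourceId="DS12345678",
)
```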

The choice of which components to use will depend on the specific requirements of the solution, but a consistent approach to data management should exist and be codified into blueprints (discussed in the following section).

Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

There are a number of ways to build and deploy a generative AI solution. AWS offers key services such as Amazon Bedrock, Amazon Kendra, OpenSearch Service, and more, which can be configured to support multiple generative AI use cases, such as text summarization, Retrieval Augmented Generation (RAG), and others.

The simplest way is to allow each team that needs generative AI to build its own custom solution on AWS, but this will inevitably increase costs and cause organization-wide inconsistencies. A more scalable option is to have a centralized team build standard generative AI solutions codified into blueprints or constructs and allow other teams to deploy and use them. This team can provide a platform that abstracts away these constructs behind a user-friendly, integrated API and provide additional services such as LLMOps, data management, FinOps, and more. The following diagram illustrates these options.

different approaches to scale out GenAI solutions

Establishing blueprints and constructs for generative AI runtimes, APIs, prompts, and orchestration frameworks such as LangChain, LiteLLM, and so on will simplify adoption of generative AI and increase overall safe usage. Offering standard APIs with access controls, and consistent AI, data, and cost management, makes usage straightforward, cost-efficient, and secure (see the sketch that follows).
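The following is one possible shape for such a standard API, sketched with LiteLLM against Amazon Bedrock; the `invoke_llm` function, the `APPROVED_MODEL` constant, and the tenant metadata convention are hypothetical choices a platform team might make, not an established interface.

```python
# A thin, centrally owned wrapper that routes all calls through one approved
# Bedrock model via LiteLLM and attaches tenant metadata for cost tracking.
# Function and parameter names are illustrative, not a standard API.
import litellm

APPROVED_MODEL = "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"  # assumed model ID

def invoke_llm(prompt: str, tenant_id: str, max_tokens: int = 512) -> str:
    response = litellm.completion(
        model=APPROVED_MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        metadata={"tenant_id": tenant_id},  # surfaced to logging/cost callbacks
    )
    return response.choices[0].message.content
```

A centralized team could layer authentication, prompt catalogs, and guardrail enforcement behind the same function without changing callers.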

For more information about how to enforce isolation of resources in a multi-tenant architecture and key patterns in isolation strategies while building solutions on AWS, refer to the whitepaper SaaS Tenant Isolation Strategies.

Conclusion

By focusing on the operational excellence pillar of the Well-Architected Framework from a generative AI lens, enterprises can scale their generative AI initiatives with confidence, building solutions that are secure, cost-effective, and compliant. Introducing a standardized skeleton framework for generative AI runtimes, prompts, and orchestration will empower your organization to seamlessly integrate generative AI capabilities into your existing workflows.

As a next step, you can establish proactive monitoring and alerting, helping your enterprise swiftly detect and mitigate potential issues, such as the generation of biased or harmful output.

Don’t wait: take a proactive stance toward adopting these best practices. Conduct regular audits of your generative AI systems to maintain ethical AI practices, and invest in training your team on generative AI operational excellence techniques. By taking these actions now, you’ll be well positioned to harness the transformative potential of generative AI while navigating the complexities of this technology wisely.


About the Authors

Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based services and products. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions. Her diverse interests span development, entrepreneurship, and research.

Malcolm Orr is a principal engineer at AWS and has a long history of building platforms and distributed systems using AWS services. He brings a structured, systems view to generative AI and is helping define how customers can adopt it safely, securely, and cost-effectively across their organizations.

Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions in the cloud. Prior to joining AWS, she was a consultant in industries such as transportation networking, retail, and financial services. She is passionate about enabling customers on their data and AI journey to the cloud.

Zorina Alliata is a Principal AI Strategist, working with global customers to find solutions that speed up operations and enhance processes using Artificial Intelligence and Machine Learning. Zorina helps companies across several industries identify strategies and tactical execution plans for their AI use cases, platforms, and AI at scale implementations.
