Solve forecasting challenges for the retail and CPG industry using Amazon SageMaker Canvas

Businesses today deal with a reality that is increasingly complex and volatile. Companies across retail, manufacturing, healthcare, and other sectors face pressing challenges in accurate planning and forecasting. Predicting future inventory needs, setting achievable strategic goals, and budgeting effectively involve grappling with ever-changing consumer demand and global market forces. Inventory shortages, surpluses, and unmet customer expectations pose constant threats. Supply chain forecasting is critical to helping businesses tackle these uncertainties.

By using historical sales and supply data to anticipate future shifts in demand, supply chain forecasting supports executive decision-making on inventory, strategy, and budgeting. Analyzing past trends while accounting for impacts ranging from seasons to world events provides insights to guide business planning. Organizations that tap predictive capabilities to inform decisions can thrive amid fierce competition and market volatility. Overall, mastering demand predictions allows businesses to fulfill customer expectations by providing the right products at the right times.

In this post, we show you how Amazon Web Services (AWS) helps in solving forecasting challenges by customizing machine learning (ML) models for forecasting. We dive into Amazon SageMaker Canvas and explain how SageMaker Canvas can solve forecasting challenges for retail and consumer packaged goods (CPG) enterprises.

Introduction to Amazon SageMaker Canvas

Amazon SageMaker Canvas is a powerful no-code ML service that gives business analysts and data professionals the tools to build accurate ML models without writing a single line of code. This visual, point-and-click interface democratizes ML so users can take advantage of the power of AI for various business applications. SageMaker Canvas supports multiple ML modalities and problem types, catering to a wide range of use cases based on data types, such as tabular data (our focus in this post), computer vision, natural language processing, and document analysis. To learn more about the modalities that Amazon SageMaker Canvas supports, visit the Amazon SageMaker Canvas product page.

For time-series forecasting use cases, SageMaker Canvas uses AutoML to train six algorithms on your historical time-series dataset and combines them using a stacking ensemble method to create an optimal forecasting model. The algorithms are: Convolutional Neural Network – Quantile Regression (CNN-QR), DeepAR+, Prophet, Non-Parametric Time Series (NPTS), Autoregressive Integrated Moving Average (ARIMA), and Exponential Smoothing (ETS). To learn more about these algorithms, visit Algorithms support for time-series forecasting in the Amazon SageMaker documentation.

How Amazon SageMaker Canvas can help retail and CPG manufacturers solve their forecasting challenges

The combination of a user-friendly UI and automated ML technology in SageMaker Canvas gives users the tools to efficiently build, deploy, and maintain ML models with little to no coding required. For example, business analysts who have no coding or cloud engineering expertise can quickly use Amazon SageMaker Canvas to upload their time-series data and make forecasting predictions. And this isn’t a service to be used by business analysts only. Any team at a retail or CPG company can use this service to generate forecasting data using the user-friendly UI of SageMaker Canvas.

To effectively use Amazon SageMaker Canvas for retail forecasting, customers should use their sales data for the set of SKUs they want to forecast demand for. It’s crucial to have data across all months of the year, given the seasonal variation in demand in a retail environment. It’s also important to provide a few years’ worth of history so that anomalies or outliers in the data don’t skew the forecast.
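Before uploading data, a quick readiness check can confirm that each SKU has full seasonal coverage and enough history. The following pandas sketch is illustrative only; the file name and the item_id, ts, and demand columns are assumptions based on the schema used later in this walkthrough.

import pandas as pd

# File name and column names (item_id, ts, demand) are assumptions for illustration.
df = pd.read_csv("consumer_electronics_sales.csv", parse_dates=["ts"])

coverage = (
    df.assign(year=df["ts"].dt.year, month=df["ts"].dt.month)
      .groupby("item_id")
      .agg(years_of_history=("year", "nunique"), months_covered=("month", "nunique"))
)

# Flag SKUs missing full seasonal coverage or with less than two years of history.
print(coverage[(coverage["months_covered"] < 12) | (coverage["years_of_history"] < 2)])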

Retail and CPG organizations rely on industry standard methods in their approach to forecasting. One of these methods is quantiles. Quantiles in forecasting represent specific points in the predicted distribution of possible future values. They allow ML models to provide probabilistic forecasts rather than merely single point estimates. Quantiles help quantify the uncertainty in predictions by showing the range and spread of possible outcomes. Common quantiles used are the 10th, 50th (median), and 90th percentiles. For example, the 90th percentile forecast means there’s a 90% chance the actual value will be at or below that level.
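To make this concrete, the following minimal Python sketch (not SageMaker Canvas functionality) computes the 10th, 50th, and 90th percentiles from a set of simulated demand outcomes for a single SKU; the numbers are purely illustrative.

import numpy as np

rng = np.random.default_rng(seed=42)

# 1,000 simulated demand outcomes for one SKU next month (purely illustrative).
simulated_demand = rng.normal(loc=500, scale=80, size=1_000)

p10, p50, p90 = np.quantile(simulated_demand, [0.10, 0.50, 0.90])
print(f"P10: {p10:.0f} units (10% chance actual demand falls at or below this)")
print(f"P50: {p50:.0f} units (median forecast)")
print(f"P90: {p90:.0f} units (90% chance actual demand falls at or below this)")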

By providing a probabilistic view of future demand, quantile forecasting enables retail and CPG organizations to make more informed decisions in the face of uncertainty, ultimately leading to improved operational efficiency and financial performance.

Amazon SageMaker Canvas addresses this need with ML models coupled with quantile regression. With quantile regression, you can select from a wide range of planning scenarios, which are expressed as quantiles, rather than rely on single point forecasts. It’s these quantiles that offer choice.

What do these quantiles mean? Check the following figure, which is a sample time-series forecasting prediction from Amazon SageMaker Canvas. The figure provides a visual of a time-series forecast with multiple outcomes, made possible through quantile regression. The red line, denoted p05, indicates that the actual value, whatever it may be, is expected to fall below the p05 line about 5% of the time. Conversely, 95% of the time the true value will likely fall above the p05 line.

Retail or CPG organizations can evaluate multiple quantile prediction points with a consideration for the over- and under-supply costs of each item to automatically select the quantile likely to provide the most profit in future periods. When necessary, you can override the selection when business rules desire a fixed quantile over a dynamic one.

quantiles
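As an illustration of this reasoning, the following sketch applies the classic newsvendor critical ratio to pick a stocking quantile from hypothetical under-supply (lost margin) and over-supply (holding or markdown) costs. This is not a built-in SageMaker Canvas calculation, just one way to translate item economics into a quantile choice.

# Illustrative only: translate per-unit economics into a stocking quantile
# using the newsvendor critical ratio. The costs below are hypothetical.
def optimal_quantile(under_supply_cost: float, over_supply_cost: float) -> float:
    """Quantile (service level) that balances lost sales against overstock costs."""
    return under_supply_cost / (under_supply_cost + over_supply_cost)

# Heavily promoted item: missing a sale costs $12 of margin, excess stock costs $3.
print(optimal_quantile(12.0, 3.0))  # 0.80 -> stock near the p80 forecast

# Bulky, slow-moving item: lost margin $4, overstock cost $6.
print(optimal_quantile(4.0, 6.0))   # 0.40 -> stock near the p40 forecast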

To learn more about how to use quantiles for your business, check out Beyond forecasting: The delicate balance of serving customers and growing your business.

Another powerful feature that Amazon SageMaker Canvas offers is what-if analysis, which complements quantile forecasting with the ability to interactively explore how changes in input variables affect predictions. Users can change model inputs and immediately observe how these changes impact individual predictions. This feature allows for real-time exploration of different scenarios without needing to retrain the model.

What-if analysis in SageMaker Canvas can be applied to various scenarios, such as:

  • Forecasting inventory in coming months
  • Predicting sales for the next quarter
  • Assessing the effect of price reductions on holiday season sales
  • Estimating customer footfall in stores over the next few hours

How to generate forecasts

The following example illustrates the steps to follow to generate forecasts from a time-series dataset. We use a consumer electronics dataset to forecast 5 months of sales based on current and historical demand. To download a copy of this dataset, visit .

In order to access Amazon SageMaker Canvas, you can either directly sign in using the AWS Management Console and navigate to Amazon SageMaker Canvas, or you can access Amazon SageMaker Canvas directly using single sign-on as detailed in Enable single sign-on access of Amazon SageMaker Canvas using AWS IAM Identity Center. In this post, we access Amazon SageMaker Canvas through the AWS console.

Generate forecasts

To generate forecasts, follow these steps:

  1. On the Amazon SageMaker console, in the left navigation pane, choose Canvas.
  2. Choose Open Canvas on the right side under Get Started, as shown in the following screenshot. If this is your first time using SageMaker Canvas, you need to create a SageMaker Canvas user by following the prompts on the screen. A new browser tab will open for the SageMaker Canvas console.

SageMaker Canvas

  1. In the left navigation pane, choose Datasets.
  2. To import your time-series dataset, choose the Import data dropdown menu and then choose Tabular, as shown in the following screenshot.

Import Data

  1. In Dataset name, enter a name such as Consumer_Electronics and then choose Create, as shown in the following screenshot.

Create Dataset

  1. Upload your dataset (in CSV or Parquet format) from your computer or an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Preview the data, then choose Create dataset, as shown in the following screenshot.

Preview Dataset

Under Status, your dataset import will show as Processing. When it shows as Complete, proceed to the next step.

Processing Dataset Import

  1. Now that you have your dataset created and your time-series data file uploaded, create a new model to generate forecasts for your dataset. In the left navigation pane, choose My Models, then choose New model, as shown in the following screenshot.

Create Model

  1. In Model name, enter a name such as consumer_electronics_forecast. Under Problem type, select your use case type. Our use case is Predictive analysis, which builds models using tabular datasets for different problems, including forecasts.
  2. Choose Create.

Model Type

  1. You will be transferred to the Build tab. In the Target column dropdown menu, select the column for which you want to generate forecasts. This is the demand column in our dataset, as shown in the following screenshot. After you select the target column, SageMaker Canvas will automatically select Time series forecasting as the Model type.
  2. Choose Configure model.

Configure Model

  1. A window will pop up asking you to provide more information, as shown in the following screenshot. Enter the following details:
    1. Choose the column that uniquely identifies the items in your dataset – This configuration determines how you identify your items in the datasets in a unique way. For this use case, select item_id because we’re planning to forecast sales per store.
    2. Choose a column that groups the forecast by the values in the column – If you have logical groupings of the items selected in the previous field, you can choose that feature here. We don’t have one for this use case, but examples would be state, region, country, or other groupings of stores.
    3. Choose the column that contains the time stamps – The timestamp is the feature that contains the timestamp information. SageMaker Canvas requires timestamps in the format YYYY-MM-DD HH:mm:ss (for example, 2022-01-01 01:00:00); a small reformatting sketch follows the screenshots after this list.
    4. Specify the number of months you want to forecast into the future – SageMaker Canvas forecasts values up to the point in time specified in the timestamp field. For this use case, we will forecast values up to 5 months in the future. You may choose to enter any valid value, but be aware that a higher number will impact the accuracy of predictions and may take longer to compute.
    5. You can use a holiday schedule to improve your prediction accuracy – (Optional) You can enable Use holiday schedule and choose a relevant country if you want to learn how it helps with accuracy. However, it might not have much impact on this use case because our dataset is synthetic.

Configure Model 2

Configure Model 3
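If your source system stores timestamps in a different format, a small preprocessing step can reformat them before upload. The following pandas sketch assumes a hypothetical input file with a ts column and rewrites it in the YYYY-MM-DD HH:mm:ss format that SageMaker Canvas expects.

import pandas as pd

# Input file and column name are assumptions for illustration.
df = pd.read_csv("consumer_electronics_sales.csv")
df["ts"] = pd.to_datetime(df["ts"]).dt.strftime("%Y-%m-%d %H:%M:%S")
df.to_csv("consumer_electronics_sales_formatted.csv", index=False)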

  1. To change the quantiles from the default values as explained previously, in the left navigation pane, choose Forecast quantiles. In the Forecast quantiles field, enter your own values, as shown in the following screenshot.

Change Quantiles

SageMaker Canvas chooses an AutoML algorithm based on your data and then trains an ensemble model to make predictions for time-series forecasting problems. Using time-series forecasts, you can make predictions that can vary with time, such as forecasting:

  • Your inventory in the coming months
  • Your sales for the coming months
  • The effect of reducing the price on sales during the holiday season
  • The number of customers entering a store in the next several hours
  • How a reduction in the price of a product affects sales over a time period

If you’re not sure which forecasting algorithms to try, select all of them. To help you decide which algorithms to select, refer to Algorithms support for time-series forecasting, where you can learn more details and compare algorithms.

  1. Choose Save.

Train the model

Now that the configuration is done, you can train the model. SageMaker Canvas offers two build options:

  • Quick build – Builds a model in a fraction of the time compared to a standard build. Potential accuracy is exchanged for speed.
  • Standard build – Builds the best model from an optimized process powered by AutoML. Speed is exchanged for greatest accuracy.
  1. For this walkthrough, we choose Standard build, as shown in the following screenshot.

Build Model

  1. When the model training finishes, you will be routed to the Analyze tab. There, you can find the average prediction accuracy and the column impact on prediction outcome.

Your numbers might differ from what the following screenshot shows. This is due to the stochastic nature of the ML process.

Monitor Model

Here are explanations of what these metrics mean and how you can use them:

  • wQL – The average Weighted Quantile Loss (wQL) evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles (unless the user has changed them). A lower value indicates a more accurate model. In our example, we used the default quantiles. If you choose quantiles with different percentiles, wQL will center on the numbers you choose.
  • MAPE – Mean absolute percentage error (MAPE) is the percentage error (percent difference of the mean forecasted value compared to the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.
  • WAPE – Weighted Absolute Percent Error (WAPE) is the sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model, where WAPE = 0 is a model with no errors.
  • RMSE – Root mean square error (RMSE) is the square root of the average squared errors. A lower RMSE indicates a more accurate model, where RMSE = 0 is a model with no errors.
  • MASE – Mean absolute scaled error (MASE) is the mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.

You can change the default metric based on your needs. wQL is the default metric. Choose a metric that aligns with your specific business goals and is straightforward for stakeholders to interpret; the choice should be driven by the characteristics of the demand data, the business objectives, and the interpretability requirements of stakeholders.

For instance, a high-traffic grocery store that sells perishable items requires the lowest possible wQL. This is crucial to prevent lost sales from understocking while also avoiding overstocking, which can lead to spoilage of those perishables.

It’s often recommended to evaluate multiple metrics and select the one that best aligns with the company’s forecasting goals and data patterns. For example, wQL is a robust metric that can handle intermittent demand and provide a more comprehensive evaluation of forecast accuracy across different quantiles. However, RMSE gives higher weight to larger errors due to the squaring operation, making it more sensitive to outliers.
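If you export your actuals and batch predictions, you can also recompute several of these metrics yourself, for example to compare accuracy on only your highest-velocity SKUs. The following sketch shows plain NumPy implementations of MAPE, WAPE, and RMSE on small, made-up arrays.

import numpy as np

# Hypothetical actuals and forecasts for four periods.
actual = np.array([120.0, 95.0, 140.0, 80.0])
forecast = np.array([110.0, 100.0, 150.0, 70.0])

mape = np.mean(np.abs((actual - forecast) / actual)) * 100     # average percent error per point
wape = np.abs(actual - forecast).sum() / np.abs(actual).sum()  # total error normalized by total demand
rmse = np.sqrt(np.mean((actual - forecast) ** 2))              # penalizes large errors more heavily

print(f"MAPE: {mape:.1f}%  WAPE: {wape:.3f}  RMSE: {rmse:.1f}")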

  1. Choose Predict to open the Predict tab.

To generate forecast predictions for all the items in the dataset, select Batch prediction. To generate forecast predictions for a specific item (for example, to predict demand in real time), select Single prediction. The following steps show how to perform both operations.

Predictions

To generate forecast predictions for a specific item, follow these steps:

  1. Choose Single item and select any of the items from the item dropdown list. SageMaker Canvas generates a prediction for our item, showing the average prediction (that is, the demand for that item over time). SageMaker Canvas provides results for the upper bound, lower bound, and expected forecast.

It’s a best practice to have bounds rather than a single prediction point so that you can pick whichever best fits your use case. For example, you might want to reduce the waste of overstocking by using the lower bound, or you might follow the upper bound to make sure that you meet customer demand. For instance, a highly advertised item in a promotional flyer might be stocked at the 90th percentile (p90) to ensure availability and prevent customer disappointment. On the other hand, accessories or bulky items that are less likely to drive customer traffic could be stocked at the 40th percentile (p40). It’s generally not advisable to stock below the 40th percentile, to avoid being consistently out of stock.

  1. To export the forecast, select the Download prediction dropdown menu to download the forecast prediction chart as an image or the forecast prediction values as a CSV file.

View Predictions

You can use the What if scenario button to explore how changing the price will affect the demand for an item. To use this feature, you must leave the future-dated rows empty for the feature you’re predicting. This dataset has empty cells for a few items, which means that this feature is enabled for them. Choose What if scenario and edit the values for the different dates to view how changing the price will affect demand. This feature helps organizations test specific scenarios without making changes to the underlying data.
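To prepare a dataset for this kind of exploration, the future-dated rows should leave the target empty while carrying the scenario values for features such as price. The following pandas sketch is a simplified illustration; the file names, column names, dates, and 10% discount are assumptions, not values from the sample dataset.

import pandas as pd

history = pd.read_csv("consumer_electronics_sales.csv", parse_dates=["ts"])

# Five future months per item with demand left empty and a 10% price cut.
last_price = history.groupby("item_id")["price"].last().reset_index()
future_months = pd.DataFrame({"ts": pd.date_range("2025-01-01", periods=5, freq="MS")})
future = last_price.merge(future_months, how="cross")
future["price"] = future["price"] * 0.9
future["demand"] = pd.NA

scenario = pd.concat([history, future], ignore_index=True)
scenario.to_csv("consumer_electronics_whatif.csv", index=False)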

To generate batch predictions on the entire dataset, follow these steps:

  1. Choose All items and then choose Start Predictions. The Status will show as Generating predictions, as shown in the following screenshot.

Generate Predictions

  1. When it’s complete, the Status will show as Ready, as shown in the following screenshot. Select the three-dot additional options icon and choose Preview. This will open the prediction results in a preview page.

Preview Predictions

  1. Choose Download to export these results to your local computer or choose Send to Amazon QuickSight for visualization, as shown in the following screenshot.

Download Predictions

Training time and performance

SageMaker Canvas provides efficient training times and offers valuable insights into model performance. You can inspect model accuracy, perform backtesting, and evaluate various performance metrics for the underlying models. By combining multiple algorithms in the background, SageMaker Canvas significantly reduces the time required to train models compared to training each model individually. Additionally, by using the model leaderboard dashboard, you can assess the performance of each trained algorithm against your specific time-series data, ranked based on the selected performance metric (wQL by default).

This dashboard also displays other metrics, which you can use to compare different algorithms trained on your data across various performance measures, facilitating informed decision-making and model selection.

To view the leaderboard, choose Model leaderboard, as shown in the following screenshot.

Model Leader board

The model leaderboard shows you the different algorithms used to train your data along with their performance based on all the available metrics, as shown in the following screenshot.

Algorithms used

Integration

Retail and CPG organizations often rely on applications such as inventory lifecycle management, order management systems, and business intelligence (BI) dashboards, which incorporate forecasting capabilities. In these scenarios, organizations can seamlessly integrate the SageMaker Canvas forecasting service with their existing applications, enabling them to harness the power of forecasting data. To use the forecasting data within these applications, an endpoint for the forecasting model is required. Although SageMaker Canvas models can be deployed to provide endpoints, this process may require additional effort from a machine learning operations (MLOps) perspective. Fortunately, Amazon SageMaker streamlines this process, simplifying the deployment and integration of SageMaker Canvas models.

The following steps show how you can deploy SageMaker Canvas models using SageMaker:

  1. On the SageMaker console, in the left navigation pane, choose My Models.
  2. Select the three-dot additional options icon next to the model you want to deploy and choose Deploy, as shown in the following screenshot.

Deploy Model

  1. Under Instance type, select the instance type your model will be deployed to. Choose Deploy and wait until your deployment status changes to In service.

Select Instance

  1. After your deployment is in service, in the left navigation pane, choose ML Ops to get your deployed model endpoint, as shown in the following screenshot. You can test your deployment or start using the endpoint in your applications.

Deployed Model Endpoint
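After the endpoint is in service, applications can call it through the SageMaker runtime. The following boto3 sketch uses a hypothetical endpoint name and a CSV payload mirroring the training schema; confirm the content type and payload format your deployment expects before relying on it.

import boto3

runtime = boto3.client("sagemaker-runtime")

# One future row per item: item_id, timestamp, empty demand target, price.
payload = "item_id,ts,demand,price\nSKU-1001,2025-01-01 00:00:00,,249.99\n"

response = runtime.invoke_endpoint(
    EndpointName="canvas-consumer-electronics-forecast",  # hypothetical endpoint name
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))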

Reproducibility and API management

It’s important to understand that Amazon SageMaker Canvas builds its time-series models on SageMaker AutoML APIs, which you can also call directly for reproducible, programmatic workflows. To learn more, see Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs in the AWS Machine Learning Blog.
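For teams that want reproducible, code-driven retraining alongside the Canvas UI, the time-series capability can also be reached through the SageMaker CreateAutoMLJobV2 API. The following boto3 sketch is a simplified outline; the bucket, role ARN, column names, and frequency value are placeholders, and you should consult the API reference for the full set of configuration options.

import boto3

sm = boto3.client("sagemaker")

sm.create_auto_ml_job_v2(
    AutoMLJobName="consumer-electronics-forecast-job",
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "ContentType": "text/csv;header=present",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://your-bucket/consumer_electronics_sales.csv",  # placeholder
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://your-bucket/autopilot-output/"},  # placeholder
    AutoMLProblemTypeConfig={"TimeSeriesForecastingJobConfig": {
        "ForecastFrequency": "1M",   # monthly; check the documented frequency format
        "ForecastHorizon": 5,
        "TimeSeriesConfig": {
            "TargetAttributeName": "demand",
            "TimestampAttributeName": "ts",
            "ItemIdentifierAttributeName": "item_id",
        },
    }},
    RoleArn="arn:aws:iam::123456789012:role/YourSageMakerExecutionRole",  # placeholder
)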

Insights

Retail and CPG enterprises typically use visualization tools such as Amazon QuickSight or third-party software such as Tableau to understand forecast results and share them across business units. To streamline visualization, SageMaker Canvas provides embedded visualization for exploring forecast results. Retail and CPG enterprises that want to visualize the forecasting data in their own BI dashboard systems (such as Amazon QuickSight, Tableau, or Qlik) can deploy SageMaker Canvas forecasting models to generate forecasting endpoints. Users can also send a batch prediction file to Amazon QuickSight from the predict window, as shown in the following screenshot.

Quicksight Integration

The following screenshot shows the batch prediction file in QuickSight as a dataset that you can use for analysis.

Dataset selection from Quicksight

When your dataset is in Amazon QuickSight, you can start analyzing or even visualizing your data using the visualization tools, as shown in the following screenshot.

Quicksight Analysis

Cost

Amazon SageMaker Canvas offers a flexible, cost-effective pricing model based on three key components: workspace instance runtime, utilization of pre-built models, and resource consumption for custom model creation and prediction generation. The billing cycle commences upon launching the SageMaker Canvas application, encompassing a range of essential tasks including data ingestion, preparation, exploration, model experimentation, and analysis of prediction and explainability results. This comprehensive approach means that users only pay for the resources they actively use, providing a transparent and efficient pricing structure. To learn more about pricing examples, check out Amazon SageMaker Canvas pricing.

Ownership and portability

Many retail and CPG enterprises have embraced multi-cloud deployments. To streamline portability of models built and trained on Amazon SageMaker Canvas to other cloud providers or on-premises environments, Amazon SageMaker Canvas provides downloadable model artifacts.

Also, several retail and CPG companies have many business units (such as merchandising, planning, or inventory management) within the organization that all use forecasting to solve different use cases. To streamline ownership of a model and facilitate straightforward sharing between business units, Amazon SageMaker Canvas now extends its Model Registry integration to time-series forecasting models. With a single click, customers can register the ML models built on Amazon SageMaker Canvas with the SageMaker Model Registry, as shown in the following screenshot. Register a Model Version in the Amazon SageMaker Developer Guide shows you where to find the S3 bucket location where your model’s artifacts are stored.

Model Registry

Clean up

To avoid incurring unnecessary costs, you can delete the model you just built, then delete the dataset, and sign out of your Amazon SageMaker Canvas domain. If you also signed up for Amazon QuickSight, you can unsubscribe and remove your Amazon QuickSight account.

Conclusion

Amazon SageMaker Canvas empowers retail and CPG companies with a no-code forecasting solution. It delivers automated time-series predictions for inventory planning and demand anticipation, featuring an intuitive interface and rapid model development. With seamless integration capabilities and cost-effective insights, it enables businesses to enhance operational efficiency, meet customer expectations, and gain a competitive edge in the fast-paced retail and consumer goods markets.

We encourage you to evaluate how you can improve your forecasting capabilities using Amazon SageMaker Canvas. Use the intuitive no-code interface to analyze and improve the accuracy of your demand predictions for retail and CPG products, enhancing inventory management and operational efficiency. To get started, you can review the workshop Amazon SageMaker Canvas Immersion Day.


About the Authors

Aditya Pendyala is a Principal Solutions Architect at AWS based out of NYC. He has extensive experience in architecting cloud-based applications. He is currently working with large enterprises to help them craft highly scalable, flexible, and resilient cloud architectures, and guides them on all things cloud. He has a Master of Science degree in Computer Science from Shippensburg University and believes in the quote “When you cease to learn, you cease to grow.”

Julio Hanna, an AWS Solutions Architect based in New York City, specializes in enterprise technology solutions and operational efficiency. With a career focused on driving innovation, he currently leverages Artificial Intelligence, Machine Learning, and Generative AI to help organizations navigate their digital transformation journeys. Julio’s expertise lies in harnessing cutting-edge technologies to deliver strategic value and foster innovation in enterprise environments.

Read More

Enabling generative AI self-service using Amazon Lex, Amazon Bedrock, and ServiceNow

Enabling generative AI self-service using Amazon Lex, Amazon Bedrock, and ServiceNow

Chat-based assistants have become an invaluable tool for providing automated customer service and support. This post builds on a previous post, Integrate QnABot on AWS with ServiceNow, and explores how to build an intelligent assistant using Amazon Lex, Amazon Bedrock Knowledge Bases, and a custom ServiceNow integration to create an automated incident management support experience.

Amazon Lex is powered by the same deep learning technologies used in Alexa. With it, developers can quickly build conversational interfaces that can understand natural language, engage in realistic dialogues, and fulfill customer requests. Amazon Lex can be configured to respond to customer questions using Amazon Bedrock foundation models (FMs) to search and summarize FAQ responses. Amazon Bedrock Knowledge Bases provides the capability of amassing data sources into a repository of information. Using knowledge bases, you can effortlessly create an application that uses Retrieval Augmented Generation (RAG), a technique where the retrieval of information from data sources enhances the generation of model responses.

ServiceNow is a cloud-based platform for IT workflow management and automation. With its robust capabilities for ticketing, knowledge management, human resources (HR) services, and more, ServiceNow is already powering many enterprise service desks.

By connecting an Amazon Lex chat assistant with Amazon Bedrock Knowledge Bases and ServiceNow, companies can provide 24/7 automated support and self-service options to customers and employees. In this post, we demonstrate how to integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. The ServiceNow knowledge bank is exported into Amazon Simple Storage Service (Amazon S3), which will be used as the data source for Amazon Bedrock Knowledge Bases. Data in Amazon S3 is encrypted by default. You can further enhance security by Using server-side encryption with AWS KMS keys (SSE-KMS).
  2. Amazon AppFlow can be used to sync between ServiceNow and Amazon S3. Other alternatives like AWS Glue can also be used to ingest data from ServiceNow.
  3. Amazon Bedrock Knowledge Bases is created with Amazon S3 as the data source and Amazon Titan (or any other model of your choice) as the embedding model.
  4. When users of the Amazon Lex chat assistant ask queries, Amazon Lex fetches answers from Amazon Bedrock Knowledge Bases.
  5. If the user requests a ServiceNow ticket to be created, Amazon Lex invokes an AWS Lambda function.
  6. The Lambda function fetches secrets from AWS Secrets Manager and makes an HTTP call to create a ServiceNow ticket (a sketch of such a function follows these workflow steps).
  7. Application Auto Scaling is enabled on AWS Lambda to automatically scale Lambda according to user interactions.
  8. The solution conforms to responsible AI policies, with Guardrails for Amazon Bedrock enforcing organizational responsible AI policies.
  9. The solution is monitored using Amazon CloudWatch, AWS CloudTrail, and Amazon GuardDuty.

Be sure to follow least privilege access policies while giving access to any system resources.
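To illustrate steps 6 and 7, the following is a minimal sketch of a Lambda handler that reads ServiceNow credentials from Secrets Manager and calls the ServiceNow Table API to create an incident. The secret name, instance URL, and field mappings are assumptions for illustration; the CloudFormation template in this post provisions its own implementation.

# Minimal sketch of a Lambda handler that creates a ServiceNow incident.
# Secret name, instance URL, and field mappings are illustrative assumptions.
import base64
import json
import urllib.request

import boto3

secrets = boto3.client("secretsmanager")

def lambda_handler(event, context):
    # Fetch the ServiceNow credentials stored in Secrets Manager.
    secret = json.loads(secrets.get_secret_value(SecretId="servicenow/credentials")["SecretString"])
    auth = base64.b64encode(f"{secret['username']}:{secret['password']}".encode()).decode()

    # Create an incident through the ServiceNow Table API.
    body = json.dumps({"short_description": event.get("description", "Created by Amazon Lex assistant")})
    request = urllib.request.Request(
        url="https://your-instance.service-now.com/api/now/table/incident",  # placeholder instance
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read())
    return {"incident_number": result["result"]["number"]}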

Prerequisites

The following prerequisites need to be completed before building the solution.

  1. On the Amazon Bedrock console, sign up for access to the Anthropic Claude model of your choice using the instructions at Manage access to Amazon Bedrock foundation models. For information about pricing for using Amazon Bedrock, see Amazon Bedrock pricing.
  2. Sign up for a ServiceNow account if you do not have one. Save your username and password. You will need to store them in AWS Secrets Manager later in this walkthrough.
  3. Create a ServiceNow instance following the instructions in Integrate QnABot on AWS with ServiceNow.
  4. Create a user with permissions to create incidents in ServiceNow using the instructions at Create a user. Make a note of these credentials for use later in this walkthrough.

The instructions provided in this walkthrough are for demonstration purposes. Follow ServiceNow documentation to create community instances and follow their best practices.

Solution walkthrough

To integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow, follow the steps in the next sections.

Deployment with AWS CloudFormation console

In this step, you first create the solution architecture discussed in the solution overview, except for the Amazon Lex assistant, which you will create later in the walkthrough. Complete the following steps:

  1. On the CloudFormation console, verify that you are in the correct AWS Region and choose Create stack to create the CloudFormation stack.
  2. Download the CloudFormation template and upload it in the Specify template section. Choose Next.
  3. For Stack name, enter a name such as ServiceNowBedrockStack.
  4. In the Parameters section, for ServiceNow details, provide the values of ServiceNow host and ServiceNow username created earlier.
  5. Keep the other values as default. Under Capabilities on the last page, select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit to create the CloudFormation stack.
  6. After the successful deployment of the whole stack, from the Outputs tab, make a note of the output key value BedrockKnowledgeBaseId because you will need it later during creation of the Amazon Lex assistant.

Integration of Lambda with Application Auto Scaling is beyond the scope of this post. For guidance, refer to the instructions at AWS Lambda and Application Auto Scaling.

Store the secrets in AWS Secrets Manager

Follow these steps to store your ServiceNow username and password in AWS Secrets Manager:

  1. On the CloudFormation console, on the Resources tab, enter the word “secrets” to filter search results. Under Physical ID, select the console URL of the AWS Secrets Manager secret you created using the CloudFormation stack.
  2. On the AWS Secrets Manager console, on the Overview tab, under Secret value, choose Retrieve secret value.
  3. Select Edit and enter the username and password of the ServiceNow instance you created earlier. Make sure that both the username and password are correct.

Download knowledge articles

You need access to ServiceNow knowledge articles. Follow these steps:

  1. Create a knowledge base if you don’t have one. Periodically, you may need to sync your knowledge base to keep it up to date.
  2. Sync the data from ServiceNow to Amazon S3 using Amazon AppFlow by following instructions at ServiceNow. Alternatively, you can use AWS Glue to ingest data from ServiceNow to Amazon S3 by following instructions at the blog post, Extract ServiceNow data using AWS Glue Studio in an Amazon S3 data lake and analyze using Amazon Athena.
  3. Download a sample article.

Sync Amazon Bedrock Knowledge Bases

This solution uses the fully managed Knowledge Base for Amazon Bedrock to seamlessly power a RAG workflow, eliminating the need for custom integrations and data flow management. As the data source for the knowledge base, the solution uses Amazon S3. The following steps outline uploading ServiceNow articles to an S3 bucket created by a CloudFormation template.

  1. On the CloudFormation console, on the Resources tab, enter “S3” to filter search results. Under Physical ID, select the URL for the S3 bucket created using the CloudFormation stack.
  2. Upload the previously downloaded knowledge articles to this S3 bucket.

Next you need to sync the data source.

  1. On the CloudFormation console, on the Outputs tab, enter “Knowledge” to filter search results. Under Value, select the console URL of the knowledge bases that you created using the CloudFormation stack. Open that URL in a new browser tab.
  2. Scroll down to Data source and select the data source. Choose Sync.

You can test the knowledge base by choosing the model in the Test the knowledge base section and asking the model a question.
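You can run the same kind of test programmatically with the Bedrock agent runtime. The following boto3 sketch assumes the knowledge base ID from your CloudFormation outputs and a Claude 3 Haiku model ARN; substitute the values for your account and Region.

# Query the knowledge base programmatically. The knowledge base ID and
# model ARN are placeholders; use the values from your CloudFormation outputs.
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What benefits does AnyCompany offer to its employees?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)
print(response["output"]["text"])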

Responsible AI using Guardrails for Amazon Bedrock

Conversational AI applications require robust guardrails to safeguard sensitive user data, adhere to privacy regulations, enforce ethical principles, and mitigate hallucinations, fostering responsible development and deployment. Guardrails for Amazon Bedrock allow you to configure your organizational policies against the knowledge bases. They help keep your generative AI applications safe by evaluating both user inputs and model responses.

To set up guardrails, follow these steps:

  1. Follow the instructions at the Amazon Bedrock User Guide to create a guardrail.

You can reduce hallucinations in the model responses by enabling grounding check and relevance check and adjusting the thresholds.

  1. Create a version of the guardrail.
  2. Select the newly created guardrail and copy the guardrail ID. You will use this ID later in the intent creation.

Amazon Lex setup

In this section, you configure your Amazon Lex chat assistant with intents to call Amazon Bedrock. This walkthrough uses Amazon Lex V2.

  1. On the CloudFormation console, on the Outputs tab, copy the value of BedrockKnowledgeBaseId. You will need this ID later in this section.
  2. On the Outputs tab, under Outputs, enter “bot” to filter search results. Choose the console URL of the Amazon Lex assistant you created using the CloudFormation stack. Open that URL in a new browser tab.
  3. On the Amazon Lex Intents page, choose Create another intent. On the Add intent dropdown menu, choose Use built-in intent.
  4. On the Use built-in intent screen, under Built-in intent, choose QnAIntent - GenAI feature.
  5. For Intent name, enter BedrockKb and select Add.
  6. In the QnA configuration section, under Select model, choose Anthropic and Claude 3 Haiku or a model of your choice.
  7. Expand Additional Model Settings and enter the Guardrail ID for the guardrails you created earlier. Under Guardrail Version, enter a number that corresponds to the number of versions you have created.
  8. Enter the Knowledge base for Amazon Bedrock Id that you captured earlier in the CloudFormation outputs section. Choose Save intent at the bottom.

You can now add more QnAIntents pointing to different knowledge bases.

  1. Return to the intents list by choosing Back to intents list in the navigation pane.
  2. Select Build to build the assistant.

A green banner on the top of the page with the message Successfully built language English (US) in bot: servicenow-lex-bot indicates the Amazon Lex assistant is now ready.

Test the solution

To test the solution, follow these steps:

  1. In the navigation pane, choose Aliases. Under Aliases, select TestBotAlias.
  2. Under Languages, choose English (US). Choose Test.
  3. A new test window will pop up at the bottom of the screen.
  4. Enter the question “What benefits does AnyCompany offer to its employees?” Then press Enter.

The chat assistant generates a response based on the content in knowledge base.

  1. To test Amazon Lex to create a ServiceNow ticket for information not present in the knowledge base, enter the question “Create a ticket for password reset” and press Enter.

The chat assistant generates a new ServiceNow ticket because this information is not available in the knowledge base.

To search for the incident, log in to the ServiceNow endpoint that you configured earlier.

Monitoring

You can use CloudWatch logs to review the performance of the assistant and to troubleshoot issues with conversations. From the CloudFormation stack that you deployed, you have already configured your Amazon Lex assistant CloudWatch log group with appropriate permissions.

To view the conversation logs from the Amazon Lex assistant, follow these directions.

On the CloudFormation console, on the Outputs tab, enter “Log” to filter search results. Under Value, choose the console URL of the CloudWatch log group that you created using the CloudFormation stack. Open that URL in a new browser tab.

To protect sensitive data, Amazon Lex obscures slot values in conversation logs. As security best practice, do not store any slot values in request or session attributes. Amazon Lex V2 doesn’t obscure the slot value in audio. You can selectively capture only text using the instructions at Selective conversation log capture.

Enable logging for Amazon Bedrock ingestion jobs

You can monitor Amazon Bedrock ingestion jobs using CloudWatch. To configure logging for an ingestion job, follow the instructions at Knowledge bases logging.

AWS CloudTrail logs

AWS CloudTrail is an AWS service that tracks actions taken by a user, role, or an AWS service. CloudTrail is enabled on your AWS account when you create the account. When activity occurs in your AWS account, that activity is recorded in a CloudTrail event along with other AWS service events in Event history. You can view, search, and download recent events in your AWS account. For more information, see Working with CloudTrail Event history.

As security best practice, you should monitor any access to your environment. You can configure Amazon GuardDuty to identify any unexpected and potentially unauthorized activity in your AWS environment.

Cleanup

To avoid incurring future charges, delete the resources you created. To clean up the AWS environment, use the following steps:

  1. Empty the contents of the S3 bucket you created as part of the CloudFormation stack.
  2. Delete the CloudFormation stack you created.

Conclusion

As customer expectations continue to evolve, embracing innovative technologies like conversational AI and knowledge management systems becomes essential for businesses to stay ahead of the curve. By implementing this integrated solution, companies can enhance operational efficiency and deliver superior service to both their customers and employees, while also adapting the responsible AI policies of the organization.

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Marcelo Silva is an experienced tech professional who excels in designing, developing, and implementing cutting-edge products. Starting off his career at Cisco, Marcelo worked on various high-profile projects including deployments of the first ever carrier routing system and the successful rollout of ASR9000. His expertise extends to cloud technology, analytics, and product management, having served as senior manager for several companies such as Cisco, Cape Networks, and AWS before joining GenAI. Currently working as a Conversational AI/GenAI Product Manager, Marcelo continues to excel in delivering innovative solutions across industries.

Sujatha Dantuluri is a seasoned Senior Solutions Architect on the US federal civilian team at AWS, with over two decades of experience supporting commercial and federal government clients. Her expertise lies in architecting mission-critical solutions and working closely with customers to ensure their success. Sujatha is an accomplished public speaker, frequently sharing her insights and knowledge at industry events and conferences. She has contributed to IEEE standards and is passionate about empowering others through her engaging presentations and thought-provoking ideas.

NagaBharathi Challa is a solutions architect on the US federal civilian team at Amazon Web Services (AWS). She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family and spreading the power of meditation.

Pranit Raje is a Cloud Architect on the AWS Professional Services India team. He specializes in DevOps, operational excellence, and automation using DevSecOps practices and infrastructure as code. Outside of work, he enjoys going on long drives with his beloved family, spending time with them, and watching movies.

Read More

How Kyndryl integrated ServiceNow and Amazon Q Business

How Kyndryl integrated ServiceNow and Amazon Q Business

This post is co-written with Sujith R Pillai from Kyndryl.

In this post, we show you how Kyndryl, an AWS Premier Tier Services Partner and IT infrastructure services provider that designs, builds, manages, and modernizes complex, mission-critical information systems, integrated Amazon Q Business with ServiceNow in a few simple steps. You will learn how to configure Amazon Q Business and ServiceNow, how to create a generative AI plugin for your ServiceNow incidents, and how to test and interact with ServiceNow using the Amazon Q Business web experience. By the end of this post, you will be able to enhance your ServiceNow experience with Amazon Q Business and enjoy the benefits of a generative AI–powered interface.

Solution overview

Amazon Q Business has three main components: a front-end chat interface, a data source connector and retriever, and a ServiceNow plugin. Amazon Q Business uses AWS Secrets Manager secrets to store the ServiceNow credentials securely. The following diagram shows the architecture for the solution.

High level architecture

Chat

Users interact with ServiceNow through the generative AI–powered chat interface using natural language.

Data source connector and retriever

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business has two types of retrievers: native retrievers and existing retrievers using Amazon Kendra. The native retrievers support a wide range of Amazon Q Business connectors, including ServiceNow. The existing retriever option is for those who already have an Amazon Kendra retriever and would like to use that for their Amazon Q Business application. For the ServiceNow integration, we use the native retriever.

ServiceNow plugin

Amazon Q Business provides a plugin feature for performing actions such as creating incidents in ServiceNow.

The following high-level steps show how to configure the Amazon Q Business – ServiceNow integration:

  1. Create a user in ServiceNow for Amazon Q Business to communicate with ServiceNow
  2. Create knowledge base articles in ServiceNow if they do not exist already
  3. Create an Amazon Q Business application and configure the ServiceNow data source and retriever in Amazon Q Business
  4. Synchronize the data source
  5. Create a ServiceNow plugin in Amazon Q Business

Prerequisites

To run this application, you must have an Amazon Web Services (AWS) account, an AWS Identity and Access Management (IAM) role, and a user that can create and manage the required resources. If you are not an AWS account holder, see How do I create and activate a new Amazon Web Services account?

You need an AWS IAM Identity Center set up in the AWS Organizations organizational unit (OU) or AWS account in which you are building the Amazon Q Business application. You should have a user or group created in IAM Identity Center. You will assign this user or group to the Amazon Q Business application during the application creation process. For guidance, refer to Manage identities in IAM Identity Center.

You also need a ServiceNow user with incident_manager and knowledge_admin permissions to create and view knowledge base articles and to create incidents. We use a developer instance of ServiceNow for this post as an example. You can find out how to get the developer instance in Personal Developer Instances.

Solution walkthrough

To integrate ServiceNow and Amazon Q Business, use the steps in the following sections.

Create a knowledge base article

Follow these steps to create a knowledge base article:

  1. Sign in to ServiceNow and navigate to Self-Service > Knowledge
  2. Choose Create an Article
  3. On the Create new article page, select a knowledge base and choose a category. Optionally, you may create a new category.
  4. Provide a Short description and type in the Article body
  5. Choose Submit to create the article, as shown in the following screenshot

Repeat these steps to create a couple of knowledge base articles. In this example, we created a hypothetical enterprise named Example Corp for demonstration purposes.

Create ServiceNow Knowledgebase

Create an Amazon Q Business application

Amazon Q offers three subscription plans: Amazon Q Business Lite, Amazon Q Business Pro, and Amazon Q Developer Pro. Read the Amazon Q Documentation for more details. For this example, we used Amazon Q Business Lite.

Create application

Follow these steps to create an application:

  1. In the Amazon Q Business console, choose Get started, then choose Create application to create a new Amazon Q Business application, as shown in the following screenshot

  1. Name your application in Application name. In Service access, select Create and use a new service-linked role (SLR). For more information about example service roles, see IAM roles for Amazon Q Business. For information on service-linked roles, including how to manage them, see Using service-linked roles for Amazon Q Business. We named our application ServiceNow-Helpdesk. Next, select Create, as shown in the following screenshot.

Choose a retriever and index provisioning

To choose a retriever and index provisioning, follow these steps in the Select retriever screen, as shown in the following screenshot:

  1. For Retrievers, select Use native retriever
  2. For Index provisioning, choose Starter
  3. Choose Next

Connect data sources

Amazon Q Business has ready-made connectors for common data sources and business systems.

  1. Enter “ServiceNow” to search and select ServiceNow Online as the data source, as shown in the following screenshot

  1. Enter the URL and the version of your ServiceNow instance. We used the ServiceNow version Vancouver for this post.

  1. Scroll down the page to provide additional details about the data source. Under Authentication, select Basic authentication. Under AWS Secrets Manager secret, select Create and add a new secret from the dropdown menu as shown in the screenshot.

  1. Provide the Username and Password you created in ServiceNow to create an AWS Secrets Manager secret. Choose Save.

  1. Under Configure VPC and security group, keep the setting as No VPC because you will be connecting to ServiceNow over the internet. You may choose to create a new service role under IAM role. This will create a role specifically for this application.

  1. In the example, we synchronize the ServiceNow knowledge base articles and incidents. Provide the information as shown in the following image. Notice that for Filter query, the example uses the following code.
workflow_state=published^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab^article_type=text^active=true^EQ

This filter query aims to sync the articles that meet the following criteria:

  • workflow_state = published
  • kb_knowledge_base = dfc19531bf2021003f07e2c1ac0739ab (This is the default Sys ID for the knowledge base named “Knowledge” in ServiceNow).
  • Type = text (This field contains the text in the Knowledge article).
  • Active = true (This field filters the articles to sync only the ones that are active).

The filter fields are separated by ^, and the end of the query is represented by EQ. You can find more details about the Filter query and other parameters in Connecting Amazon Q Business to ServiceNow Online using the console.

  1. Provide the Sync scope for the Incidents, as shown in the following screenshot

  1. You may select Full sync initially so that a complete synchronization is performed. You need to select the frequency of the synchronization as well. For this post, we chose Run on demand. If you need to keep the knowledge base and incident data more up-to-date with the ServiceNow instance, choose a shorter window.

  1. A field mapping will be provided for you to validate. You won’t be able to change the field mapping at this stage. Choose Add data source to proceed.

This completes the data source configuration for Amazon Q Business. The configuration takes a few minutes to complete. Watch the screen for any errors and updates. Once the data source is created, you will be greeted with the message You successfully created the following data source: ‘ServiceNow-Datasource’.

Add users and groups

Follow these steps to add users and groups:

  1. Choose Next
  2. On the Add groups and users page, choose Add groups and users. You will be presented with the option of Add and assign new users or Assign existing users and groups. Select Assign existing users and groups. Choose Next, as shown in the following image.

  1. Search for an existing user or group in your IAM Identity Center, select one, and choose Assign. After selecting the right user or group, choose Done.

This completes the activity of assigning the user and group access to the Amazon Q Business application.

Create a web experience

Follow these steps to create a web experience in the Add groups and users screen, as shown in the following screenshot.

  1. Choose Create and use a new service role in the Web experience service access section
  2. Choose Create application

The deployed application with the application status will be shown in the Amazon Q Business > Applications console as shown in the following screenshot.

Synchronize the data source

Once the data source is configured successfully, it’s time to start the synchronization. To begin this process, the ServiceNow fields that require synchronization must be updated. Because we intend to get answers from the knowledge base content, the text field needs to be synchronized. To do so, follow these steps:

  1. In the Amazon Q Business console, select Applications in the navigation pane
  2. Select ServiceNow-Helpdesk and then ServiceNow-Datasource
  3. Choose Actions. From the dropdown, choose Edit, as shown in the following screenshot.

  1. Scroll down to the Field mappings section at the bottom of the page. Select text and description.

  1. Choose Update. After the update, choose Sync now.

The synchronization takes a few minutes to complete depending on the amount of data to be synchronized. Make sure that the Status is Completed, as shown in the following screenshot, before proceeding further. If you notice any error, you can choose the error hyperlink. The error hyperlink will take you to Amazon CloudWatch Logs to examine the logs for further troubleshooting.

Create ServiceNow plugin

A ServiceNow plugin in Amazon Q Business helps you create incidents in ServiceNow through Amazon Q Business chat. To create one, follow these steps:

  1. In the Amazon Q Business console, select Enhancements from the navigation pane
  2. Under Plugins, choose Add plugin, as shown in the following screenshot

  1. On the Add Plugin page, shown in the following screenshot, select the ServiceNow plugin

  1. Provide a Name for the plugin
  2. Enter the ServiceNow URL and use the previously created AWS Secrets Manager secret for the Authentication
  3. Select Create and use a new service role
  4. Choose Add plugin

  1. The status of the plugin will be shown in the Plugins section. If Plugin status is Active, the plugin is configured and ready to use.

Use the Amazon Q Business chat interface

To use the Amazon Q Business chat interface, follow these steps:

  1. In the Amazon Q Business console, choose Applications from the navigation pane. The web experience URL will be provided for each Amazon Q Business application.

  1. Choose the Web experience URL to open the chat interface. Enter an IAM Identity Center username and password that was assigned to this application. The following screenshot shows the Sign in page.

You can now ask questions and receive responses, as shown in the following image. The answers will be specific to your organization and are retrieved from the knowledge base in ServiceNow.

You can ask the chat interface to create incidents as shown in the next screenshot.

A new pop-up window will appear, providing additional information related to the incident. In this window, you can provide more information related to the ticket and choose Create.

This will create a ServiceNow incident using the web experience of Amazon Q Business without signing in to ServiceNow. You may verify the ticket in the ServiceNow console as shown in the next screenshot.

Conclusion

In this post, we showed how Kyndryl is using Amazon Q Business to enable natural language conversations with ServiceNow using the ServiceNow connector provided by Amazon Q Business. We also showed how to create a ServiceNow plugin that allows users to create incidents in ServiceNow directly from the Amazon Q Business chat interface. We hope that this tutorial will help you take advantage of the power of Amazon Q Business for your ServiceNow needs.


About the authors

Asif Fouzi is a Principal Solutions Architect leading a team of seasoned technologists supporting Global Service Integrators (GSI) such as Kyndryl in their cloud journey. When he is not innovating on behalf of users, he likes to play guitar, travel, and spend time with his family.


Sujith R Pillai is a cloud solution architect in the Cloud Center of Excellence at Kyndryl with extensive experience in infrastructure architecture and implementation across various industries. With his strong background in cloud solutions, he has led multiple technology transformation projects for Kyndryl customers.

Read More

HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design

HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design

This post introduces HCLTech’s AutoWise Companion, a transformative generative AI solution designed to enhance customers’ vehicle purchasing journey. By tailoring recommendations based on individuals’ preferences, the solution guides customers toward the best vehicle model for them. Simultaneously, it empowers vehicle manufacturers (original equipment manufacturers (OEMs)) by using real customer feedback to drive strategic decisions, boosting sales and company profits. Powered by generative AI services on AWS and large language models’ (LLMs’) multi-modal capabilities, HCLTech’s AutoWise Companion provides a seamless and impactful experience.

In this post, we analyze the current industry challenges and guide readers through the AutoWise Companion solution functional flow and architecture design using built-in AWS services and open source tools. Additionally, we discuss the design from security and responsible AI perspectives, demonstrating how you can apply this solution to a wider range of industry scenarios.

Opportunities

Purchasing a vehicle is a crucial decision that can induce stress and uncertainty for customers. The following are some of the real-life challenges customers and manufacturers face:

  • Choosing the right brand and model – Even after narrowing down the brand, customers must navigate through a multitude of vehicle models and variants. Each model has different features, price points, and performance metrics, making it difficult to make a confident choice that fits their needs and budget.
  • Analyzing customer feedback – OEMs face the daunting task of sifting through extensive quality reporting tool (QRT) reports. These reports contain vast amounts of data, which can be overwhelming and time-consuming to analyze.
  • Aligning with customer sentiments – OEMs must align their findings from QRT reports with the actual sentiments of customers. Understanding customer satisfaction and areas needing improvement from raw data is complex and often requires advanced analytical tools.

HCLTech’s AutoWise Companion solution addresses these pain points, benefiting both customers and manufacturers by simplifying the decision-making process for customers and enhancing data analysis and customer sentiment alignment for manufacturers.

The solution extracts valuable insights from diverse data sources, including OEM transactions, vehicle specifications, social media reviews, and OEM QRT reports. By employing a multi-modal approach, the solution connects relevant data elements across various databases. Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer’s preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.

Solution overview

The overall solution is divided into functional modules for both customers and OEMs.

Customer assist

Every customer has unique preferences, even when considering the same vehicle brand and model. The solution is designed to provide customers with a detailed, personalized explanation of their preferred features, empowering them to make informed decisions. The solution presents the following capabilities:

  • Natural language queries – Customers can ask questions in plain language about vehicle features, such as overall ratings, pricing, and more. The system is equipped to understand and respond to these inquiries effectively.
  • Tailored interaction – The solution allows customers to select specific features from an available list, enabling a deeper exploration of their preferred options. This helps customers gain a comprehensive understanding of the features that best suit their needs.
  • Personalized brochure generation – The solution considers the customer’s feature preferences and generates a customized feature explanation brochure (with specific feature images). This personalized document helps the customer gain a deeper understanding of the vehicle and supports their decision-making process.

OEM assist

OEMs in the automotive industry must proactively address customer complaints and feedback regarding various automobile parts. This comprehensive solution enables OEM managers to analyze and summarize customer complaints and reported quality issues across different categories, thereby empowering them to formulate data-driven strategies efficiently. This enhances decision-making and competitiveness in the dynamic automotive industry. The solution enables the following:

  • Insight summaries – The system gives OEMs an insightful summary by integrating and aggregating data from various sources, such as QRT reports, vehicle transaction sales data, and social media reviews.
  • Detailed view – OEMs can seamlessly access specific details about issues, reports, complaints, or data points in natural language, with the system providing the relevant information from the referenced reviews data, transaction data, or unstructured QRT reports.

To better understand the solution, we use the seven steps shown in the following figure to explain the overall function flow.

Flow map explaining the overall function flow

The overall function flow consists of the following steps:

  1. The user (customer or OEM manager) interacts with the system through a natural language interface to ask various questions.
  2. The system’s natural language interpreter, powered by a generative AI engine, analyzes the query’s context, intent, and relevant persona to identify the appropriate data sources.
  3. Based on the identified data sources, the respective multi-source query execution plan is generated by the generative AI engine.
  4. The query agent parses the execution plan and sends queries to the respective query executor.
  5. Requested information is intelligently fetched from multiple sources such as company product metadata, sales transactions, OEM reports, and more to generate meaningful responses.
  6. The system seamlessly combines the collected information from the various sources, applying contextual understanding and domain-specific knowledge to generate a well-crafted, comprehensive, and relevant response for the user.
  7. The system generates the response for the original query and empowers the user to continue the interaction, either by asking follow-up questions within the same context or exploring new areas of interest, all while benefiting from the system’s ability to maintain contextual awareness and provide consistently relevant and informative responses.

Technical architecture

The overall solution is implemented using AWS services and LangChain. Multiple LangChain functions, such as CharacterTextSplitter and embedding vectors, are used for text handling and embedding model invocations. In the application layer, the GUI for the solution is created using Streamlit in Python. The app container is deployed using a cost-optimal, microservices-based AWS architecture with Amazon Elastic Container Service (Amazon ECS) clusters and AWS Fargate.

The solution contains the following processing layers:

  • Data pipeline – The various data sources, such as sales transactional data, unstructured QRT reports, social media reviews in JSON format, and vehicle metadata, are processed, transformed, and stored in the respective databases.
  • Vector embedding and data cataloging – To support natural language query similarity matching, the respective data is vectorized and stored as vector embeddings. Additionally, to enable the natural language to SQL (text-to-SQL) feature, the corresponding data catalog is generated for the transactional data.
  • LLM (request and response formation) – The system invokes LLMs at various stages to understand the request, formulate the context, and generate the response based on the query and context.
  • Frontend application – Customers or OEMs interact with the solution using an assistant application designed to enable natural language interaction with the system.

The solution uses the following AWS data stores and analytics services:

The following figure depicts the technical flow of the solution.

Detailed architecture design on AWS

The workflow consists of the following steps:

  1. The user’s query, expressed in natural language, is processed by an orchestrating AWS Lambda function.
  2. The Lambda function tries to find the query match from the LLM cache. If a match is found, the response is returned from the LLM cache. If no match is found, the function invokes the respective LLMs through Amazon Bedrock. This solution uses LLMs (Anthropic’s Claude 2 and Claude 3 Haiku) on Amazon Bedrock for response generation. The Amazon Titan Embeddings G1 – Text LLM is used to convert the knowledge documents and user queries into vector embeddings. A minimal sketch of this cache-and-invoke pattern follows the list.
  3. Based on the context of the query and the available catalog, the LLM identifies the relevant data sources:
    1. The transactional sales data, social media reviews, vehicle metadata, and more, are transformed and used for customer and OEM interactions.
    2. The data in this step is restricted and is only accessible for OEM personas to help diagnose the quality related issues and provide insights on the QRT reports. This solution uses Amazon Textract as a data extraction tool to extract text from PDFs (such as quality reports).
  4. The LLM generates queries (text-to-SQL) to fetch data from the respective data channels according to the identified sources.
  5. The responses from each data channel are assembled to generate the overall context.
  6. Additionally, to generate a personalized brochure, relevant images (described as text-based embeddings) are fetched based on the query context. Amazon OpenSearch Serverless is used as a vector database to store the embeddings of text chunks extracted from quality report PDFs and image descriptions.
  7. The overall context is then passed to a response generator LLM to generate the final response to the user. The cache is also updated.
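Steps 2 and 7 above revolve around the LLM cache and model invocation through Amazon Bedrock. The following is a minimal, illustrative sketch of that pattern using boto3; the in-memory dictionary cache, prompt normalization, and model ID are assumptions for the example rather than the production implementation, which persists request-response pairs as described later in the key learnings.

import hashlib
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")  # assumes AWS credentials and Region are configured

# Hypothetical in-memory cache keyed by a hash of the normalized prompt
llm_cache = {}


def cache_key(prompt):
    """Build a cache key from the normalized prompt text."""
    return hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()


def generate_response(prompt, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    """Return a cached response if available, otherwise invoke the model on Amazon Bedrock."""
    key = cache_key(prompt)
    if key in llm_cache:  # step 2: LLM cache lookup
        return llm_cache[key]

    body = {  # Anthropic Messages API request format on Amazon Bedrock
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    result = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(body))
    answer = json.loads(result["body"].read())["content"][0]["text"]

    llm_cache[key] = answer  # step 7: update the cache with the new response
    return answer

In the actual solution, cached entries would be stored durably and matched semantically rather than by exact hash, but the control flow is the same.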

Responsible generative AI and security considerations

Customers implementing generative AI projects with LLMs are increasingly prioritizing security and responsible AI practices. This focus stems from the need to protect sensitive data, maintain model integrity, and enforce ethical use of AI technologies. The AutoWise Companion solution uses AWS services to enable customers to focus on innovation while maintaining the highest standards of data protection and ethical AI use.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards that can be applied to user input and foundation model output as safety and privacy controls. By incorporating guardrails, the solution proactively steers users away from potential risks or errors, promoting better outcomes and adherence to established standards. In the automobile industry, OEM vendors usually apply safety filters for vehicle specifications. For example, they want to validate the input to make sure that the queries are about legitimate existing models. Amazon Bedrock Guardrails provides denied topics and contextual grounding checks to make sure the queries about non-existent automobile models are identified and denied with a custom response.
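As an illustration of this pattern, the following sketch creates a guardrail with a denied topic and contextual grounding checks using boto3. The guardrail name, topic definition, example, messaging strings, and thresholds are hypothetical placeholders rather than values from the AutoWise Companion deployment, and the created guardrail would still need to be versioned and referenced at model invocation time.

import boto3

bedrock = boto3.client("bedrock")  # control-plane client used to manage guardrails

# Hypothetical guardrail: deny questions about non-existent vehicle models and
# enforce contextual grounding on retrieved context (all values are placeholders)
response = bedrock.create_guardrail(
    name="autowise-demo-guardrail",
    description="Blocks queries about unsupported or non-existent vehicle models.",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "NonExistentModels",
                "definition": "Questions about vehicle models that the OEM does not manufacture or sell.",
                "examples": ["Tell me the specs of the XYZ-9000 hover car."],
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="Sorry, I can only answer questions about supported vehicle models.",
    blockedOutputsMessaging="Sorry, I can only provide information about supported vehicle models.",
)
guardrail_id = response["guardrailId"]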

Security considerations

The system employs a RAG framework that relies on customer data, making data security the foremost priority. By design, Amazon Bedrock provides a layer of data security by making sure that customer data stays encrypted and protected and is neither used to train the underlying LLM nor shared with the model providers. Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, CSA STAR Level 2, is HIPAA eligible, and customers can use Amazon Bedrock in compliance with the GDPR.

For raw document storage on Amazon S3, transactional data storage, and retrieval, these data sources are encrypted, and respective access control mechanisms are put in place to maintain restricted data access.

Key learnings

The solution offered the following key learnings:

  • LLM cost optimization – In the initial stages of the solution, based on the user query, multiple independent LLM calls were required, which led to increased costs and execution time. By using the AWS Glue Data Catalog, we have improved the solution to use a single LLM call to find the best source of relevant information.
  • LLM caching – We observed that a significant percentage of queries received were repetitive. To optimize performance and cost, we implemented a caching mechanism that stores the request-response data from previous LLM model invocations. This cache lookup allows us to retrieve responses from the cached data, thereby reducing the number of calls made to the underlying LLM. This caching approach helped minimize cost and improve response times.
  • Image to text – Generating personalized brochures based on customer preferences was challenging. However, the latest vision-capable multimodal LLMs, such as Anthropic’s Claude 3 models (Haiku and Sonnet), have significantly improved accuracy. A minimal invocation sketch follows this list.
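To make the image-to-text step concrete, the following is a minimal sketch of sending a vehicle feature image to Anthropic’s Claude 3 Haiku on Amazon Bedrock and asking for a brochure-ready description; the image path and prompt text are placeholders, not the solution’s actual prompts.

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholder image; in the solution, feature images are fetched based on the query context
with open("feature_image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
                {"type": "text", "text": "Describe the vehicle feature shown in this image for a customer brochure."},
            ],
        }
    ],
}

result = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
print(json.loads(result["body"].read())["content"][0]["text"])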

Industrial adoption

The aim of this solution is to help customers make informed decisions while purchasing vehicles and to empower OEM managers to analyze factors contributing to sales fluctuations and formulate targeted, data-driven strategies to boost sales. The solution can also be adopted in other sectors, as shown in the following table.

Industry Solution adoption
Retail and ecommerce By closely monitoring customer reviews, comments, and sentiments expressed on social media channels, the solution can assist customers in making informed decisions when purchasing electronic devices.
Hospitality and tourism The solution can assist hotels, restaurants, and travel companies to understand customer sentiments, feedback, and preferences and offer personalized services.
Entertainment and media It can assist television, movie studios, and music companies to analyze and gauge audience reactions and plan content strategies for the future.

Conclusion

The solution discussed in this post demonstrates the power of generative AI on AWS by empowering customers to use natural language conversations to obtain personalized, data-driven insights to make informed decisions during the purchase of their vehicle. It also supports OEMs in enhancing customer satisfaction, improving features, and driving sales growth in a competitive market.

Although the focus of this post has been on the automotive domain, the presented approach holds potential for adoption in other industries to provide a more streamlined and fulfilling purchasing experience.

Overall, the solution demonstrates the power of generative AI to provide accurate information based on various structured and unstructured data sources governed by guardrails to help avoid unauthorized conversations. For more information, see the HCLTech GenAI Automotive Companion in AWS Marketplace.


About the Authors

Bhajan Deep Singh leads the AWS Gen AI/AIML Center of Excellence at HCL Technologies. He plays an instrumental role in developing proof-of-concept projects and use cases utilizing AWS’s generative AI offerings. He has successfully led numerous client engagements to deliver data analytics and AI/machine learning solutions. He holds AWS’s AI/ML Specialty, AI Practitioner certification and authors technical blogs on AI/ML services and solutions. With his expertise and leadership, he enables clients to maximize the value of AWS generative AI.

Mihir Bhambri works as AWS Senior Solutions Architect at HCL Technologies. He specializes in tailored generative AI solutions, driving industry-wide innovation in sectors such as financial services, life sciences, manufacturing, and automotive. He leverages AWS Cloud services and diverse large language models (LLMs) to develop multiple proofs of concept that support business improvements. He also holds an AWS Solutions Architect certification and has contributed to the research community by co-authoring papers and winning multiple AWS generative AI hackathons.

Yajuvender Singh is an AWS Senior Solution Architect at HCLTech, specializing in AWS Cloud and Generative AI technologies. As an AWS-certified professional, he has delivered innovative solutions across insurance, automotive, life science and manufacturing industries and also won multiple AWS GenAI hackathons in India and London. His expertise in developing robust cloud architectures and GenAI solutions, combined with his contributions to the AWS technical community through co-authored blogs, showcases his technical leadership.

Sara van de Moosdijk, simply known as Moose, is an AI/ML Specialist Solution Architect at AWS. She helps AWS partners build and scale AI/ML solutions through technical enablement, support, and architectural guidance. Moose spends her free time figuring out how to fit more books in her overflowing bookcase.

Jerry Li is a Senior Partner Solution Architect at AWS Australia, collaborating closely with HCLTech in APAC for over four years. He also works with the HCLTech Data & AI Center of Excellence team, focusing on AWS data analytics and generative AI skills development, solution building, and go-to-market (GTM) strategy.


About HCLTech

HCLTech is at the vanguard of generative AI technology, using the robust AWS Generative AI tech stack. The company offers cutting-edge generative AI solutions that are poised to revolutionize the way businesses and individuals approach content creation, problem-solving, and decision-making. HCLTech has developed a suite of readily deployable generative AI assets and solutions, encompassing the domains of customer experience, software development life cycle (SDLC) integration, and industrial processes.

Read More

Mitigating risk: AWS backbone network traffic prediction using GraphStorm

Mitigating risk: AWS backbone network traffic prediction using GraphStorm

The AWS global backbone network is the critical foundation enabling reliable and secure service delivery across AWS Regions. It connects our 34 launched Regions (with 108 Availability Zones), our more than 600 Amazon CloudFront POPs, and 41 Local Zones and 29 Wavelength Zones, providing high-performance, ultralow-latency connectivity for mission-critical services across 245 countries and territories.

This network requires continuous management through planning, maintenance, and real-time operations. Although most changes occur without incident, the dynamic nature and global scale of this system introduce the potential for unforeseen impacts on performance and availability. The complex interdependencies between network components make it challenging to predict the full scope and timing of these potential impacts, necessitating advanced risk assessment and mitigation strategies.

In this post, we show how you can use our enterprise graph machine learning (GML) framework GraphStorm to solve prediction challenges on large-scale complex networks inspired by our practices of exploring GML to mitigate the AWS backbone network congestion risk.

Problem statement

At its core, the problem we are addressing is how to safely manage and modify a complex, dynamic network while minimizing service disruptions (such as the risk of congestion, site isolation, or increased latency). Specifically, we need to predict how changes to one part of the AWS global backbone network might affect traffic patterns and performance across the entire system. In the case of congestive risk for example, we want to determine whether taking a link out of service is safe under varying demands. Key questions include:

  • Can the network handle customer traffic with remaining capacity?
  • How long before congestion appears?
  • Where will congestion likely occur?
  • How much traffic is at risk of being dropped?

This challenge of predicting and managing network disruptions is not unique to telecommunication networks. Similar problems arise in various complex networked systems across different industries. For instance, supply chain networks face comparable challenges when a key supplier or distribution center goes offline, necessitating rapid reconfiguration of logistics. In air traffic control systems, the closure of an airport or airspace can lead to complex rerouting scenarios affecting multiple flight paths. In these cases, the fundamental problem remains similar: how to predict and mitigate the ripple effects of localized changes in a complex, interconnected system where the relationships between components are not always straightforward or immediately apparent.

Today, teams at AWS operate a number of safety systems that maintain a high operational readiness bar, and work relentlessly on improving safety mechanisms and risk assessment processes. We conduct a rigorous planning process on a recurring basis to inform how we design and build our network, and maintain resiliency under various scenarios. We rely on simulations at multiple levels of detail to eliminate risks and inefficiencies from our designs. In addition, every change (no matter how small) is thoroughly tested before it is deployed into the network.

However, at the scale and complexity of the AWS backbone network, simulation-based approaches face challenges in real-time operational settings (such as expensive and time-consuming computational process), which impact the efficiency of network maintenance. To complement simulations, we are therefore investing in data-driven strategies that can scale to the size of the AWS backbone network without a proportional increase in computational time. In this post, we share our progress along this journey of model-assisted network operations.

Approach

In recent years, GML methods have achieved state-of-the-art performance in traffic-related tasks, such as routing, load balancing, and resource allocation. In particular, graph neural networks (GNNs) demonstrate an advantage over classical time series forecasting, due to their ability to capture structure information hidden in network topology and their capacity to generalize to unseen topologies when networks are dynamic.

In this post, we frame the physical network as a heterogeneous graph, where nodes represent entities in the networked system, and edges represent both demands between endpoints and actual traffic flowing through the network. We then apply GNN models to this heterogeneous graph for an edge regression task.

Unlike common GML edge regression that predicts a single value for an edge, we need to predict a time series of traffic on each edge. For this, we adopt the sliding-window prediction method. During training, we start from a time point T and use historical data in a time window of size W to predict the value at T+1. We then slide the window one step ahead to predict the value at T+2, and so on. During inference, we use predicted values rather than actual values to form the inputs in a time window as we slide the window forward, making the method an autoregressive sliding-window one. For a more detailed explanation of the principles behind this method, please refer to this link.
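To make the distinction concrete, the following is a minimal numpy sketch of the autoregressive sliding-window procedure, where predict_next() is a hypothetical stand-in for the trained GNN; during training, the window would instead be filled with ground-truth values at every step.

import numpy as np

def predict_next(window):
    """Stand-in for the trained GNN: predict the next value from a window of past values."""
    return float(np.mean(window))  # placeholder logic

def autoregressive_forecast(history, window_size, horizon):
    """Forecast `horizon` future steps, feeding each prediction back into the input window."""
    window = np.array(history[-window_size:], dtype=float)
    predictions = []
    for _ in range(horizon):
        next_value = predict_next(window)            # one-step prediction
        predictions.append(next_value)
        window = np.append(window[1:], next_value)   # slide the window, reusing the prediction
    return np.array(predictions)

# Example: two weeks of 5-minute samples for one edge, forecasting the next 6 hours (72 steps)
traffic = np.random.rand(2 * 7 * 24 * 12)
future = autoregressive_forecast(traffic, window_size=12, horizon=72)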

We train GNN models with historical demand and traffic data, along with other features (network incidents and maintenance events) by following the sliding-window method. We then use the trained model to predict future traffic on all links of the backbone network using the autoregressive sliding-window method because in a real application, we can only use the predicted values for next-step predictions.

In the next section, we show the result of adapting this method to AWS backbone traffic forecasting, for improving operational safety.

Applying GNN-based traffic prediction to the AWS backbone network

For the backbone network traffic prediction application at AWS, we need to ingest a number of data sources into the GraphStorm framework. First, we need the network topology (the graph). In our case, this is composed of devices and physical interfaces that are logically grouped into individual sites. One site may contain dozens of devices and hundreds of interfaces. The edges of the graph represent the fiber connections between physical interfaces on the devices (these are the OSI layer 2 links). For each interface, we measure the outgoing traffic utilization in bps and as a percentage of the link capacity. Finally, we have a traffic matrix that holds the traffic demands between any two pairs of sites. This is obtained using flow telemetry.

The ultimate goal of our application is to improve safety on the network. For this purpose, we measure the performance of traffic prediction along three dimensions:

  • First, we look at the absolute percentage error between the actual and predicted traffic on each link. We want this error metric to be low to make sure that our model actually learned the routing pattern of the network under varying demands and a dynamic topology.
  • Second, we quantify the model’s propensity for under-predicting traffic. It is critical to limit this behavior as much as possible because predicting traffic below its actual value can lead to increased operational risk.
  • Third, we quantify the model’s propensity for over-predicting traffic. Although this is not as critical as the second metric, it’s nonetheless important to address over-predictions because they slow down maintenance operations.

We share some of our results for a test conducted on 85 backbone segments, over a 2-week period. Our traffic predictions are at a 5-minute time resolution. We trained our model on 2 weeks of data and ran the inference on a 6-hour time window. Using GraphStorm, training took less than 1 hour on an m8g.12xlarge instance for the entire network, and inference took under 2 seconds per segment, for the entire 6-hour window. In contrast, simulation-based traffic prediction requires dozens of instances for a similar network sample, and each simulation takes more than 100 seconds to go through the various scenarios.

In terms of the absolute percentage error, we find our p90 (90th percentile) to be on the order of 13%. This means that 90% of the time, the model’s prediction is less than 13% away from the actual traffic. Because this is an absolute metric, the model’s prediction can be either above or below the network traffic. Compared to classical time series forecasting with XGBoost, our approach yields a 35% improvement.

Next, we consider all the time intervals in which the model under-predicted traffic. We find the p90 in this case to be below 5%. This means that, in 90% of the cases when the model under-predicts traffic, the deviation from the actual traffic is less than 5%.

Finally, we look at all the time intervals in which the model over-predicted traffic (again, this is to evaluate permissiveness for maintenance operations). We find the p90 in this case to be below 14%. This means that, in 90% of the cases when the model over-predicted traffic, the deviation from the actual traffic was less than 14%.

These measurements demonstrate how we can tune the performance of the model to value safety above the pace of routine operations.
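For reference, the three metrics described earlier can be computed from arrays of actual and predicted link traffic roughly as in the following sketch; the percentile level and the example arrays are placeholders, not AWS data.

import numpy as np

def error_percentiles(actual, predicted, q=90.0):
    """Compute the q-th percentile of absolute, under-, and over-prediction percentage errors."""
    pct_error = (predicted - actual) / actual * 100.0
    under = -pct_error[pct_error < 0]   # under-predictions (predicted below actual)
    over = pct_error[pct_error > 0]     # over-predictions (predicted above actual)
    return {
        "abs_error": np.percentile(np.abs(pct_error), q),
        "under_prediction": np.percentile(under, q) if under.size else 0.0,
        "over_prediction": np.percentile(over, q) if over.size else 0.0,
    }

# Example with synthetic placeholder data
actual = np.random.rand(1000) + 0.5
predicted = actual * (1 + 0.1 * np.random.randn(1000))
print(error_percentiles(actual, predicted))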

Finally, in this section, we provide a visual representation of the model output around a maintenance operation. This operation consists of removing a segment of the network out of service for maintenance. As shown in the following figure, the model is able to predict the changing nature of traffic on two different segments: one where traffic increases sharply as a result of the operation (left) and the second referring to the segment that was taken out of service and where traffic drops to zero (right).

Backbone traffic prediction around a maintenance operation (left: segment where traffic increases sharply; right: segment taken out of service)

An example for GNN-based traffic prediction with synthetic data

Unfortunately, we can’t share the details about the AWS backbone network including the data we used to train the model. To still provide you with some code that makes it straightforward to get started solving your network prediction problems, we share a synthetic traffic prediction problem instead. We have created a Jupyter notebook that generates synthetic airport traffic data. This dataset simulates a global air transportation network using major world airports, creating fictional airlines and flights with predefined capacities. The following figure illustrates these major airports and the simulated flight routes derived from our synthetic data.

world map with airlines

Our synthetic data includes: major world airports, simulated airlines and flights with predefined capacities for cargo demands, and generated air cargo demands between airport pairs, which will be delivered by simulated flights.

We employ a simple routing policy to distribute these demands evenly across all shortest paths between two airports. This policy is intentionally hidden from our model, mimicking real-world scenarios where the exact routing mechanisms are not always known. If flight capacity is insufficient to meet incoming demands, we simulate the excess as inventory stored at the airport. The total inventory at each airport serves as our prediction target. Unlike real air transportation networks, we didn’t follow a hub-and-spoke topology. Instead, our synthetic network uses a point-to-point structure.

Using this synthetic air transportation dataset, we now demonstrate a node time series regression task, predicting the total inventory at each airport every day. As illustrated in the following figure, the total inventory amount at an airport is influenced by its own local demands, the traffic passing through it, and the capacity that it can output. By design, the output capacity of an airport is limited to make sure that most airport-to-airport demands require multiple-hop fulfillment.

airport inventory explanation

In the remainder of this section, we cover the data preprocessing steps necessary for using the GraphStorm framework, before customizing a GNN model for our application. Towards the end of the post, we also provide an architecture for an operational safety system built using GraphStorm and in an environment of AWS services.

Data preprocessing for graph time series forecasting

To use GraphStorm for node time series regression, we need to structure our synthetic air traffic dataset according to GraphStorm’s input data format requirements. This involves preparing three key components: a set of node tables, a set of edge tables, and a JSON file describing the dataset.

We abstract the synthetic air traffic network into a graph with one node type (airport) and two edge types. The first edge type, (airport, demand, airport), represents demand between any pair of airports. The second, (airport, traffic, airport), captures the amount of traffic sent between connected airports.

The following diagram illustrates this graph structure.

Our airport nodes have two types of associated features: static features (longitude and latitude) and time series features (daily total inventory amount). For each edge, the src_code and dst_code capture the source and destination airport codes. The edge features also include a demand and a traffic time series. Finally, edges for connected airports also hold the capacity as a static feature.

The synthetic data generation notebook also creates a JSON file, which describes the air traffic data and provides instructions for GraphStorm’s graph construction tool to follow. Using these artifacts, we can employ the graph construction tool to convert the air traffic graph data into a distributed DGL graph. In this format:

  • Demand and traffic time series data is stored as E*T tensors in edges, where E is the number of edges of a given type, and T is the number of days in our dataset.
  • Inventory amount time series data is stored as an N*T tensor in nodes, where N is the number of airport nodes.

This preprocessing step makes sure our data is optimally structured for time series forecasting using GraphStorm.
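For intuition, the raw tables behind this graph look roughly like the following pandas sketch; the column names, airport codes, and parquet output are illustrative assumptions rather than the exact schema written by the synthetic data notebook.

import numpy as np
import pandas as pd

T = 30  # number of days in the synthetic time series (placeholder)

# Node table: one row per airport with static features and a T-day inventory time series
airports = pd.DataFrame({
    "node_id": ["JFK", "LHR", "NRT"],
    "longitude": [-73.78, -0.45, 140.39],
    "latitude": [40.64, 51.47, 35.77],
})
airports["inventory_ts"] = [np.random.rand(T).tolist() for _ in range(len(airports))]

# Edge table for the (airport, traffic, airport) relation: static capacity plus a traffic time series
traffic_edges = pd.DataFrame({
    "src_code": ["JFK", "LHR"],
    "dst_code": ["LHR", "NRT"],
    "capacity": [500.0, 350.0],
})
traffic_edges["traffic_ts"] = [np.random.rand(T).tolist() for _ in range(len(traffic_edges))]

# A similar (airport, demand, airport) table would hold the demand time series between airport pairs.
airports.to_parquet("airport_nodes.parquet")            # requires pyarrow or fastparquet
traffic_edges.to_parquet("airport_traffic_edges.parquet")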

Model

To predict the next total inventory amount for each airport, we employ GNN models, which are well-suited for capturing these complex relationships. Specifically, we use GraphStorm’s Relational Graph Convolutional Network (RGCN) module as our GNN model. This allows us to effectively pass information (demands and traffic) among airports in our network. To support the sliding-window prediction method we described earlier, we created a customized RGCN model.

The detailed implementation of the node time series regression model can be found in the Python file. In the following sections, we explain a few key implementation points.

Customized RGCN model

The GraphStorm v0.4 release adds support for edge features. This means that we can use a for-loop to iterate along the T dimension of the time series tensor, thereby implementing the sliding-window method in the forward() function during model training, as shown in the following pseudocode:

def forward(self, ......):
    ......
    # ---- Process Time Series Data Step by Step Using Sliding Windows ---- #
    step_losses = []
    for step in range(0, (self._ts_size - self._window_size)):
       # extract one step time series feature based on time window arguments
       ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size, step)
       ......
       # extract one step time series labels
       new_labels = get_ts_labels(labels, self._ts_size, self._window_size, step)
       ......
       # compute the loss for this window and collect it
       step_loss = self.model(ts_feats, new_labels)
       step_losses.append(step_loss)
    # sum all step losses and average them
    ts_loss = sum(step_losses) / len(step_losses)

The actual code of the forward() function is in the following code snippet.

In contrast, because the inference step needs to use the autoregressive sliding-window method, we implement a one-step prediction function in the predict() routine:

def predict(self, ....., use_ar=False, predict_step=-1):
    ......
    # ---- Use Autoregressive Method in Inference ---- #
    # It is the inferrer's responsibility to provide the ``predict_step`` value.
    if use_ar:
        # extract one step time series feature based on the given predict_step
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size,
                                         predict_step)
        ......
        # compute the prediction only
        pred = self.model(ts_feats)
    else:
        # ------------- Same as the forward() method ------------- #
        ......

The actual code of the predict() function is in the following code snippet.

Customized node trainer

GraphStorm’s default node trainer (GSgnnNodePredictionTrainer), which handles the model training loop, can’t process the time series feature requirement. Therefore, we implement a customized node trainer by inheriting from GSgnnNodePredictionTrainer and using our own customized node_mini_batch_gnn_predict() method. This is shown in the following code snippet.

Customized node_mini_batch_predict() method

The customized node_mini_batch_predict() method calls the customized model’s predict() method, passing the two additional arguments that are specific to our use case. These are used to determine whether the autoregressive property is used or not, along with the current prediction step for appropriate indexing (see the following code snippet).

Customized node predictor (inferrer)

Similar to the node trainer, GraphStorm’s default node inference class, which drives the inference pipeline (GSgnnNodePredictionInferrer), can’t handle the time series feature processing we need in this application. We therefore create a customized node inferrer by inheriting GSgnnNodePredictionInferrer, and add two specific arguments. In this customized inferrer, we use a for-loop to iterate over the T dimensions of the time series feature tensor. Unlike the for-loop we used in model training, the inference loop uses the predicted values in subsequent prediction steps (this is shown in the following code snippet).

So far, we have focused on the node prediction example with our dataset and modeling. However, our approach allows for various other prediction tasks, such as:

  • Forecasting traffic between specific airport pairs.
  • More complex scenarios like predicting potential airport congestion or increased utilization of alternative routes when reducing or eliminating flights between certain airports.

With the customized model and pipeline classes, we can use the following Jupyter notebook to run the overall training and inference pipeline for our airport inventory amount prediction task. We encourage you to explore these possibilities, adapt the provided example to your specific use cases or research interests, and refer to our Jupyter notebooks for a comprehensive understanding of how to use GraphStorm APIs for various GML tasks.

System architecture for GNN-based network traffic prediction

In this section, we propose a system architecture for enhancing operational safety within a complex network, such as the ones we discussed earlier. Specifically, we employ GraphStorm within an AWS environment to build, train, and deploy graph models. The following diagram shows the various components we need to achieve the safety functionality.

system architecture

The complex system in question is represented by the network shown at the bottom of the diagram, overlaid on the map of the continental US. This network emits telemetry data that can be stored on Amazon Simple Storage Service (Amazon S3) in a dedicated bucket. The evolving topology of the network should also be extracted and stored.

On the top right of the preceding diagram, we show how Amazon Elastic Compute Cloud (Amazon EC2) instances can be configured with the necessary GraphStorm dependencies using direct access to the project’s GitHub repository. After they’re configured, we can build GraphStorm Docker images on them. These images can then be pushed to Amazon Elastic Container Registry (Amazon ECR) and made available to other services (for example, Amazon SageMaker).

During training, SageMaker jobs use those instances along with the network data to train a traffic prediction model such as the one we demonstrated in this post. The trained model can then be stored on Amazon S3. It might be necessary to repeat this training process periodically, to make sure that the model’s performance keeps up with changes to the network dynamics (such as modifications to the routing schemes).

Above the network representation, we show two possible actors: operators and automation systems. These actors call on a network safety API implemented in AWS Lambda to make sure that the actions they intend to take are safe for the anticipated time horizon (for example, 1 hour, 6 hours, 24 hours). To provide an answer, the Lambda function uses the on-demand inference capabilities of SageMaker. During inference, SageMaker uses the pre-trained model to produce the necessary traffic predictions. These predictions can also be stored on Amazon S3 to continuously monitor the model’s performance over time, triggering training jobs when significant drift is detected.

Conclusion

Maintaining operational safety for the AWS backbone network, while supporting the dynamic needs of our global customer base, is a unique challenge. In this post, we demonstrated how the GML framework GraphStorm can be effectively applied to predict traffic patterns and potential congestion risks in such complex networks. By framing our network as a heterogeneous graph and using GNNs, we’ve shown that it’s possible to capture the intricate interdependencies and dynamic nature of network traffic. Our approach, tested on both synthetic data and the actual AWS backbone network, has demonstrated significant improvements over traditional time series forecasting methods, with a 35% reduction in prediction error compared to classical approaches like XGBoost.

The proposed system architecture, integrating GraphStorm with various AWS services like Amazon S3, Amazon EC2, SageMaker, and Lambda, provides a scalable and efficient framework for implementing this approach in production environments. This setup allows for continuous model training, rapid inference, and seamless integration with existing operational workflows.

We will keep you posted about our progress in taking our solution to production and share the benefits it brings to AWS customers.

We encourage you to explore the provided Jupyter notebooks, adapt our approach to your specific use cases, and contribute to the ongoing development of graph-based ML techniques for managing complex networked systems. To learn how to use GraphStorm to solve a broader class of ML problems on graphs, see the GitHub repo.


About the Authors

Jian Zhang is a Senior Applied Scientist who has been using machine learning techniques to help customers solve various problems, such as fraud detection, decoration image generation, and more. He has successfully developed graph-based machine learning, particularly graph neural network, solutions for customers in China, the US, and Singapore. As an enlightener of AWS graph capabilities, Zhang has given many public presentations about GraphStorm, the GNN, the Deep Graph Library (DGL), Amazon Neptune, and other AWS services.

Fabien Chraim is a Principal Research Scientist in AWS networking. Since 2017, he’s been researching all aspects of network automation, from telemetry and anomaly detection to root causing and actuation. Before Amazon, he co-founded and led research and development at Civil Maps (acquired by Luminar). He holds a PhD in electrical engineering and computer sciences from UC Berkeley.

Patrick Taylor is a Senior Data Scientist in AWS networking. Since 2020, he has focused on impact reduction and risk management in networking software systems and operations research in networking operations teams. Previously, Patrick was a data scientist specializing in natural language processing and AI-driven insights at Hyper Anna (acquired by Alteryx) and holds a Bachelor’s degree from the University of Sydney.

Xiang Song is a Senior Applied Scientist at AWS AI Research and Education (AIRE), where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in a graph database. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture at Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting science teams like the graph machine learning group, and ML Systems teams working on large scale distributed training, inference, and fault resilience. Before joining AWS, Florian lead technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist—a field in which he holds a PhD.

Read More

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.

Solution overview

For organizations processing or storing sensitive information such as personally identifiable information (PII), customers have asked for AWS Global Infrastructure to address these specific localities, including mechanisms to make sure that data is being stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud with the low latency and local processing capabilities of an on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.

Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.

Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:

  • Latency – Running these computationally intensive models on an edge infrastructure can significantly reduce latency and improve real-time responsiveness, which is critical for many time-sensitive applications like virtual assistants, augmented reality, and autonomous systems.
  • Privacy and security – Processing sensitive data at the edge, rather than sending it to the cloud, can enhance privacy and security by minimizing data exposure. This is particularly useful in healthcare, financial services, and legal sectors.

In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.

Fully local RAG

For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. You can use a local vector database either hosted on Amazon Elastic Compute Cloud (Amazon EC2) or using Amazon Relational Database Service (Amazon RDS) for PostgreSQL on the Outpost rack with the pgvector extension to store embeddings. See the following figure for an example.

Local RAG Concept Diagram

Hybrid RAG

Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative purposes and remain compliant with security regulations. To orchestrate the behavior of such a distributed system, you need a system that can understand the nuances of your prompt and direct you to the right FM running in a compliant environment. Amazon Bedrock Agents makes this distributed system in hybrid systems possible.

Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.

Hybrid RAG Concept Diagram

In the following sections, we dive deep into both solutions and their implementation.

Fully local RAG: Solution deep dive

To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need to find the Outpost Amazon Resource Name (ARN) on which you want to create the subnet, as well as the Availability Zone of the Outpost. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application, including the following components.

  • Vector store – To support RAG, deploy an open source vector database, such as ChromaDB or Faiss, on an EC2 instance (C5 family) on AWS Outposts. This vector database stores the vector representations of your documents, serving as a key component of your local knowledge base. Your selected embedding model is used to convert text (both documents and queries) into these vector representations, enabling efficient storage and retrieval. The actual knowledge base consists of the original text documents and their corresponding vector representations stored in the vector database. To query this knowledge base and generate a response based on the retrieved results, you can use LangChain to chain the related documents retrieved by the vector search to the prompt fed to your LLM. This approach allows for retrieval and integration of relevant information into the LLM’s generation process, enhancing its responses with local, domain-specific knowledge. A minimal pgvector sketch appears after this list.
  • Chatbot application – On a second EC2 instance (C5 family), deploy the following two components: a backend service responsible for ingesting prompts and proxying the requests back to the LLM running on the Outpost, and a simple React application that allows users to prompt a local generative AI chatbot with questions.
  • LLM or SLM – On a third EC2 instance (G4 family), deploy an LLM or SLM to conduct edge inferencing via popular frameworks such as Ollama. Additionally, you can use ModelBuilder from the SageMaker Python SDK to deploy to a local endpoint, such as an EC2 instance running at the edge.

Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.
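As an illustration of the vector store component described in the list above, the following is a minimal sketch of using PostgreSQL with the pgvector extension through psycopg2; the connection details, table name, and embedding dimension are placeholders and must match your chosen embedding model.

import psycopg2

# Placeholder connection details for the PostgreSQL instance running on the Outpost rack
conn = psycopg2.connect(host="10.0.0.10", dbname="rag", user="raguser", password="REPLACE_ME")
cur = conn.cursor()

# One-time setup: enable pgvector and create a table for document chunks and their embeddings
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS doc_chunks (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(384)  -- dimension must match the embedding model (placeholder)
    );
""")
conn.commit()

def add_chunk(text, embedding):
    """Store a document chunk and its embedding vector."""
    cur.execute(
        "INSERT INTO doc_chunks (content, embedding) VALUES (%s, %s::vector)",
        (text, str(embedding)),
    )
    conn.commit()

def similar_chunks(query_embedding, k=4):
    """Return the k chunks closest to the query embedding by cosine distance."""
    cur.execute(
        "SELECT content FROM doc_chunks ORDER BY embedding <=> %s::vector LIMIT %s",
        (str(query_embedding), k),
    )
    return [row[0] for row in cur.fetchall()]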

The components intercommunicate through the traffic flow illustrated in the following figure.

Local RAG traffic flow diagram

The workflow consists of the following steps:

  1. Using the frontend application, the user uploads documents that will serve as the knowledge base and are stored in Amazon EBS on the Outpost rack. These documents are chunked by the application and are sent to the embedding model.
  2. The embedding model, which is hosted on the same EC2 instance as the local LLM API inference server, converts the text chunks into vector representations.
  3. The generated embeddings are sent to the vector database and stored, completing the knowledge base creation.
  4. Through the frontend application, the user prompts the chatbot interface with a question.
  5. The prompt is forwarded to the local LLM API inference server instance, where the prompt is tokenized and is converted into a vector representation using the local embedding model.
  6. The question’s vector representation is sent to the vector database where a similarity search is performed to get matching data sources from the knowledge base.
  7. After the local LLM has the query and the relevant context from the knowledge base, it processes the prompt, generates a response, and sends it back to the chatbot application (a minimal sketch of calling a locally served model follows this list).
  8. The chatbot application presents the LLM response to the user through its interface.

To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Hybrid RAG: Solution deep dive

To start, you need to configure a VPC with an edge subnet, either corresponding to an Outpost rack or Local Zone depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as the fully local RAG: a vector store, backend API server, embedding model and a local LLM.

In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.

In this example, the customer service bot is a shoe retailer assistant that provides support for purchasing shoes by offering options in a human-like conversation. We also assume that the knowledge base surrounding the practice of shoemaking is proprietary and, therefore, resides at the edge. As a result, questions surrounding shoemaking will be addressed by the knowledge base and local FM running at the edge.

To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be addressed by the AWS hybrid and edge locations.

As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates the logic for the next step it should take, including a prediction of the most prudent action in an action group. In this example, we want the prompt, “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt, “How is leather used for shoemaking?” could be directed to the /hybrid_rag action group.

The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.

Hybrid RAG Reference Architecture

To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag, with a detailed description, structure, and parameters that define it as an API operation specifically focused on a data domain only available in a specific edge location.
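For illustration, a fragment of such an OpenAPI schema for the hybrid_rag action could look like the following, expressed here as a Python dictionary serialized to JSON; the operation ID, parameter name, and descriptions are assumptions for the example rather than the exact schema used in the workshop.

import json

# Hypothetical OpenAPI fragment describing the /hybrid_rag action for the agent's action group
hybrid_rag_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Shoe retailer edge actions", "version": "1.0.0"},
    "paths": {
        "/hybrid_rag": {
            "post": {
                "operationId": "hybrid_rag",
                "description": "Answers questions about shoemaking using the proprietary knowledge base and the foundation model hosted at the edge location.",
                "parameters": [
                    {
                        "name": "prompt",
                        "in": "query",
                        "required": True,
                        "description": "The user question to route to the edge-hosted model.",
                        "schema": {"type": "string"},
                    }
                ],
                "responses": {
                    "200": {
                        "description": "Answer generated by the edge-hosted model.",
                        "content": {"application/json": {"schema": {"type": "string"}}},
                    }
                },
            }
        }
    },
}

print(json.dumps(hybrid_rag_schema, indent=2))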

After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.

def lambda_handler(event, context):
    global cursor
    if cursor is None:
        cursor = load_data()
    id = ''
    api_path = event['apiPath']
    logger.info('API Path')
    logger.info(api_path)
    response_code = 200

    if api_path == '/customer/{CustomerName}':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "CustomerName":
                cName = parameter["value"]
        body = return_customer_info(cName)
    elif api_path == '/place_order':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "ShoeID":
                id = parameter["value"]
            if parameter["name"] == "CustomerID":
                cid = parameter["value"]
        body = place_shoe_order(id, cid)
    elif api_path == '/check_inventory':
        body = return_shoe_inventory()
    elif api_path == "/hybrid_rag":
        # Proxy the prompt to the self-managed FM running at the edge location
        prompt = event['parameters'][0]["value"]
        body = queryEdgeModel(prompt)
    else:
        response_code = 404
        body = "{} is not a valid api, try another one.".format(api_path)

    response_body = {
        'application/json': {
            'body': json.dumps(body)
        }
    }

    # Return the result in the response format expected by Amazon Bedrock Agents
    action_response = {
        'actionGroup': event['actionGroup'],
        'apiPath': api_path,
        'httpMethod': event['httpMethod'],
        'httpStatusCode': response_code,
        'responseBody': response_body
    }
    return {'messageVersion': '1.0', 'response': action_response}

However, in the action group corresponding to the edge LLM (as seen in the code below), the business logic won’t include Region-based FM invocations, such as using Amazon Bedrock APIs. Instead, the customer-managed endpoint will be invoked, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.

def queryEdgeModel(prompt):
    import urllib.request

    # Compose the JSON payload expected by the edge-hosted model server
    payload = {'text': prompt}
    data = json.dumps(payload).encode('utf-8')
    headers = {'Content-type': 'application/json'}

    # Send a POST request to the model server running in the Local Zone or on the Outpost,
    # reachable over the private IP address of its EC2 instance
    req = urllib.request.Request(url="http://<your-private-ip-address>:5000/", data=data, headers=headers, method='POST')
    with urllib.request.urlopen(req) as response:
        return response.read().decode('utf-8')
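
On the edge side, the endpoint that queryEdgeModel calls only needs to accept the {'text': prompt} JSON payload, run the prompt through the locally hosted model, and return the generated text. The following is a hypothetical, minimal sketch of such a server using Flask; generate_with_local_llm is a placeholder for however you invoke the FM hosted on the EC2 instance (for example, through a local inference server).

# Hypothetical minimal model server running on the EC2 instance in the Local Zone or on the Outpost.
# generate_with_local_llm is a placeholder for your local inference call.
from flask import Flask, request

app = Flask(__name__)

def generate_with_local_llm(prompt):
    # Placeholder: invoke the locally hosted LLM and return its completion
    return "Generated answer for: " + prompt

@app.route("/", methods=["POST"])
def handle_prompt():
    payload = request.get_json()
    prompt = payload["text"]                # Matches the {'text': prompt} payload sent by the Lambda function
    return generate_with_local_llm(prompt)  # Returned as plain text, which the Lambda function decodes

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)      # Listen on port 5000, as referenced in queryEdgeModel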

After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, “How are the rubber heels of shoes made?” Even though most of the prompts will be exclusively focused on retail customer service operations for ordering shoes, the native orchestration support by Amazon Bedrock Agents seamlessly directs the prompt to your edge FM running the LLM for shoemaking.
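
Outside of the console, you can exercise the same routing programmatically with the InvokeAgent API. The following is a minimal sketch using the AWS SDK for Python (Boto3); the agent ID and alias ID are placeholders that you would replace with the values of your own Amazon Bedrock agent.

import uuid
import boto3

# Minimal sketch of calling the agent programmatically; replace the placeholder IDs
# with the values of your own Amazon Bedrock agent and alias.
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="YOUR_AGENT_ID",
    agentAliasId="YOUR_AGENT_ALIAS_ID",
    sessionId=str(uuid.uuid4()),
    inputText="How are the rubber heels of shoes made?",
)

# The completion is returned as an event stream; concatenate the text chunks
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)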

To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Conclusion

In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.

To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.


About the Authors

Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS edge computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking, and the edge cloud.

Aditya Lolla is a Sr. Hybrid Edge Specialist Solutions Architect at Amazon Web Services. He assists customers across the world with their migration and modernization journey from on-premises environments to the cloud and also helps them build hybrid architectures on AWS edge infrastructure. Aditya’s areas of interest include private networks, public and private cloud platforms, multi-access edge computing, hybrid and multi-cloud strategies, and computer vision applications.

Read More

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Large language model (LLM) based AI agents that have been specialized for specific tasks have demonstrated great problem-solving capabilities. By combining the reasoning power of multiple intelligent specialized agents, multi-agent collaboration has emerged as a powerful approach to tackle more intricate, multistep workflows.

The concept of multi-agent systems isn’t entirely new—it has its roots in distributed artificial intelligence research dating back to the 1980s. However, with recent advancements in LLMs, the capabilities of specialized agents have significantly expanded in areas such as reasoning, decision-making, understanding, and generation through language and other modalities. For instance, a single attraction research agent can perform web searches and list potential destinations based on user preferences. By creating a network of specialized agents, we can combine the strengths of multiple specialist agents to solve increasingly complex problems, such as creating and optimizing an entire travel plan by considering weather forecasts in nearby cities, traffic conditions, flight and hotel availability, restaurant reviews, attraction ratings, and more.

The research team at AWS has worked extensively on building and evaluating the multi-agent collaboration (MAC) framework so customers can orchestrate multiple AI agents on Amazon Bedrock Agents. In this post, we explore the concept of MAC and its benefits, as well as the key components of our MAC framework. We also go deeper into our evaluation methodology and present insights from our studies. More technical details can be found in our technical report.

Benefits of multi-agent systems

Multi-agent collaboration offers several key advantages over single-agent approaches, primarily stemming from distributed problem-solving and specialization.

Distributed problem-solving refers to the ability to break down complex tasks into smaller subtasks that can be handled by specialized agents. By breaking down tasks, each agent can focus on a specific aspect of the problem, leading to more efficient and effective problem-solving. For example, a travel planning problem can be decomposed into subtasks such as checking weather forecasts, finding available hotels, and selecting the best routes.

The distributed aspect also contributes to the extensibility and robustness of the system. As the scope of a problem increases, we can simply add more agents to extend the capability of the system rather than try to optimize a monolithic agent packed with instructions and tools. On robustness, the system can be more resilient to failures because multiple agents can compensate for and even potentially correct errors produced by a single agent.

Specialization allows each agent to focus on a specific area within the problem domain. For example, in a network of agents working on software development, a coordinator agent can manage overall planning, a programming agent can generate correct code and test cases, and a code review agent can provide constructive feedback on the generated code. Each agent can be designed and customized to excel at a specific task.

For developers building agents, this means the workload of designing and implementing an agentic system can be organically distributed, leading to faster development cycles and better quality. Within enterprises, development teams often have distributed expertise that is ideal for developing specialist agents. Such specialist agents can be further reused by other teams across the entire organization.

In contrast, developing a single agent to perform all subtasks would require the agent to plan the problem-solving strategy at a high level while also keeping track of low-level details. For example, in the case of travel planning, the agent would need to maintain a high-level plan for checking weather forecasts, searching for hotel rooms and attractions, while simultaneously reasoning about the correct usage of a set of hotel-searching APIs. This single-agent approach can easily lead to confusion for LLMs because long-context reasoning becomes challenging when different types of information are mixed. Later in this post, we provide evaluation data points to illustrate the benefits of multi-agent collaboration.

A hierarchical multi-agent collaboration framework

The MAC framework for Amazon Bedrock Agents starts from a hierarchical approach and expands to other mechanisms in the future. The framework consists of several key components designed to optimize performance and efficiency.

Here’s an explanation of each of the components of the multi-agent team (a conceptual sketch follows the list):

  • Supervisor agent – This is an agent that coordinates a network of specialized agents. It’s responsible for organizing the overall workflow, breaking down tasks, and assigning subtasks to specialist agents. In our framework, a supervisor agent can assign and delegate tasks; however, the responsibility of solving the problem isn’t transferred.
  • Specialist agents – These are agents with specific expertise, designed to handle particular aspects of a given problem.
  • Inter-agent communication – Communication is the key component of multi-agent collaboration, allowing agents to exchange information and coordinate their actions. We use a standardized communication protocol that allows the supervisor agents to send and receive messages to and from the specialist agents.
  • Payload referencing – This mechanism enables efficient sharing of large content blocks (like code snippets or detailed travel itineraries) between agents, significantly reducing communication overhead. Instead of repeatedly transmitting large pieces of data, agents can reference previously shared payloads using unique identifiers. This feature is particularly valuable in domains such as software development.
  • Routing mode – For simpler tasks, this mode allows direct routing to specialist agents, bypassing the full orchestration process to improve efficiency for latency-sensitive applications.
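
To make these roles concrete, the following is a conceptual, framework-agnostic sketch in Python. It is not the Amazon Bedrock Agents API; it only illustrates how a supervisor can delegate subtasks to specialists while retaining responsibility for the final answer, and how payload referencing lets agents exchange large content by identifier instead of copying it between messages.

# Conceptual sketch only; this is not the Amazon Bedrock Agents API. It illustrates
# supervisor/specialist delegation and payload referencing with plain Python objects.
import uuid

class PayloadStore:
    """Shared store so agents can exchange large content by reference."""
    def __init__(self):
        self._payloads = {}

    def put(self, content):
        ref = "payload-" + uuid.uuid4().hex[:8]
        self._payloads[ref] = content
        return ref

    def get(self, ref):
        return self._payloads[ref]

class SpecialistAgent:
    def __init__(self, name, instruction, handler):
        self.name = name
        self.instruction = instruction  # Collaborator instruction describing the agent's expertise
        self.handler = handler          # Stand-in for an LLM plus its tools

    def handle(self, subtask, store):
        result = self.handler(subtask)
        return store.put(result)        # Return a payload reference instead of the full content

class SupervisorAgent:
    """Coordinates specialists; delegates subtasks but keeps responsibility for the answer."""
    def __init__(self, specialists, store):
        self.specialists = specialists
        self.store = store

    def solve(self, request):
        # A real supervisor would use an FM to plan and route; here every specialist
        # simply receives the request to keep the sketch short.
        refs = [agent.handle(request, self.store) for agent in self.specialists]
        return "\n".join(self.store.get(ref) for ref in refs)

store = PayloadStore()
weather = SpecialistAgent("weather", "Answers weather questions", lambda t: "Forecast for: " + t)
hotels = SpecialistAgent("hotels", "Finds and books hotels", lambda t: "Hotel options for: " + t)
supervisor = SupervisorAgent([weather, hotels], store)
print(supervisor.solve("Plan a weekend trip to Las Vegas"))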

The following figure shows inter-agent communication in an interactive application. The user first initiates a request to the supervisor agent. After coordinating with the subagents, the supervisor agent returns a response to the user.

Evaluation of multi-agent collaboration: A comprehensive approach

Evaluating the effectiveness and efficiency of multi-agent systems presents unique challenges due to several complexities:

  1. Users can follow up and provide additional instructions to the supervisor agent.
  2. For many problems, there are multiple ways to resolve them.
  3. The success of a task often requires an agentic system to correctly perform multiple subtasks.

Conventional evaluation methods based on matching ground-truth actions or states often fall short in providing intuitive results and insights. To address this, we developed a comprehensive framework that calculates success rates based on automatic judgments of human-annotated assertions. We refer to this approach as “assertion-based benchmarking.” Here’s how it works:

  • Scenario creation – We create a diverse set of scenarios across different domains, each with specific goals that an agent must achieve to obtain success.
  • Assertions – For each scenario, we manually annotate a set of assertions that must be true for the task to be considered successful. These assertions cover both user-observable outcomes and system-level behaviors.
  • Agent and user simulation – We simulate the behavior of the agent in a sandbox environment, where the agent is asked to solve the problems described in the scenarios. Whenever user interaction is required, we use an independent LLM-based user simulator to provide feedback.
  • Automated evaluation – We use an LLM to automatically judge whether each assertion is true based on the conversation transcript.
  • Human evaluation – Instead of using LLMs, we ask humans to directly judge the success based on simulated trajectories.

Here is an example of a scenario and corresponding assertions for assertion-based benchmarking:

  • Goals:
    • User needs the weather conditions expected in Las Vegas for tomorrow, January 5, 2025.
    • User needs to search for a direct flight from Denver International Airport to McCarran International Airport, Las Vegas, departing tomorrow morning, January 5, 2025.
  • Assertions:
    • User is informed about the weather forecast for Las Vegas tomorrow, January 5, 2025.
    • User is informed about the available direct flight options for a trip from Denver International Airport to McCarran International Airport in Las Vegas for tomorrow, January 5, 2025.
    • get_tomorrow_weather_by_city is triggered to find information on the weather conditions expected in Las Vegas tomorrow, January 5, 2025.
    • search_flights is triggered to search for a direct flight from Denver International Airport to McCarran International Airport departing tomorrow, January 5, 2025.

For better user simulation, we also include additional contextual information as part of the scenario. A multi-agent collaboration trajectory is judged as successful only when all assertions are met.

Key metrics

Our evaluation framework focuses on evaluating a high-level success rate across multiple tasks to provide a holistic view of system performance:

Goal success rate (GSR) – This is our primary measure of success, indicating the percentage of scenarios where all assertions were evaluated as true. The overall GSR is aggregated into a single number for each problem domain.
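
As a concrete illustration, the GSR can be computed from per-scenario assertion judgments as in the following minimal sketch. Here, judge_assertion is a stand-in for the LLM-based (or human) judgment of a single assertion against the conversation transcript.

# Minimal sketch of computing the GSR from assertion judgments. judge_assertion stands in
# for the LLM-based (or human) judgment of one assertion against a conversation transcript.
def judge_assertion(transcript, assertion):
    # Placeholder judge: in practice, an LLM reads the transcript and returns True or False
    return assertion.lower() in transcript.lower()

def goal_success_rate(scenarios):
    """A scenario counts as successful only if all of its assertions are judged true."""
    successes = sum(
        all(judge_assertion(s["transcript"], a) for a in s["assertions"])
        for s in scenarios
    )
    return successes / len(scenarios)

# Toy example with a single simulated scenario
scenarios = [
    {
        "transcript": "The agent shared the weather forecast for Las Vegas and direct flight options.",
        "assertions": ["weather forecast for Las Vegas", "direct flight options"],
    },
]
print("Overall GSR: {:.0%}".format(goal_success_rate(scenarios)))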

Evaluation results

The following table shows the evaluation results of multi-agent collaboration on Amazon Bedrock Agents across three enterprise domains (travel planning, mortgage financing, and software development):

Evaluation method       Dataset                 Overall GSR
Automatic evaluation    Travel planning         87%
Automatic evaluation    Mortgage financing      90%
Automatic evaluation    Software development    77%
Human evaluation        Travel planning         93%
Human evaluation        Mortgage financing      97%
Human evaluation        Software development    73%

All experiments were conducted in a setting where the supervisor agents were driven by Anthropic’s Claude 3.5 Sonnet models.

Comparing to single-agent systems

We also conducted an apples-to-apples comparison with the single-agent approach under equivalent settings. The MAC approach achieved a 90% success rate across all three domains. In contrast, the single-agent approach scored 60%, 80%, and 53% in the travel planning, mortgage financing, and software development datasets, respectively, which are significantly lower than the multi-agent approach. Upon analysis, we found that when presented with many tools, a single agent tended to hallucinate tool calls and failed to reject some out-of-scope requests. These results highlight the effectiveness of our multi-agent system in handling complex, real-world tasks across diverse domains.

To understand the reliability of the automatic judgments, we conducted a human evaluation on the same scenarios to investigate the correlation between the model and human judgments and found high correlation on end-to-end GSR.

Comparison with other frameworks

To understand how our MAC framework stacks up against existing solutions, we conducted a comparative analysis with a widely adopted open source framework (OSF) under equivalent conditions, with Anthropic’s Claude 3.5 Sonnet driving the supervisor agent and Anthropic’s Claude 3.0 Sonnet driving the specialist agents. The results are summarized in the following figure:

These results demonstrate a significant performance advantage for our MAC framework across all the tested domains.

Best practices for building multi-agent systems

The design of multi-agent teams can significantly impact the quality and efficiency of problem-solving across tasks. Among the many lessons we learned, we found it crucial to carefully design team hierarchies and agent roles.

Design multi-agent hierarchies based on performance targets
It’s important to design the hierarchy of a multi-agent team by considering the priorities of different targets in a use case, such as success rate, latency, and robustness. For example, if the use case involves building a latency-sensitive customer-facing application, it might not be ideal to include too many layers of agents in the hierarchy because routing requests through multiple tertiary agents can add unnecessary delays. Similarly, to optimize latency, it’s better to avoid agents with overlapping functionalities, which can introduce inefficiencies and slow down decision-making.

Define agent roles clearly
Each agent must have a well-defined area of expertise. On Amazon Bedrock Agents, this can be achieved through collaborator instructions when configuring multi-agent collaboration. These instructions should be written in a clear and concise manner to minimize ambiguity. Moreover, there should be no confusion in the collaborator instructions across multiple agents because this can lead to inefficiencies and errors in communication.

The following is a clear, detailed instruction:

Trigger this agent for 1) searching for hotels in a given location, 2) checking availability of one or multiple hotels, 3) checking amenities of hotels, 4) asking for price quote of one or multiple hotels, and 5) answering questions of check-in/check-out time and cancellation policy of specific hotels.

The following instruction is too brief, making it unclear and ambiguous.

Trigger this agent for helping with accommodation.

The second, unclear example can lead to confusion and lower collaboration efficiency when multiple specialist agents are involved. Because the instruction doesn’t explicitly define the capabilities of the hotel specialist agent, the supervisor agent may overcommunicate, even when the user query is out of scope.

Conclusion

Multi-agent systems represent a powerful paradigm for tackling complex real-world problems. By using the collective capabilities of multiple specialized agents, we demonstrate that these systems can achieve impressive results across a wide range of domains, outperforming single-agent approaches.

Multi-agent collaboration provides a framework for developers to combine the reasoning power of numerous AI agents powered by LLMs. As we continue to push the boundaries of what is possible, we can expect even more innovative and complex applications, such as networks of agents working together to create software or generate financial analysis reports. On the research front, it’s important to explore how different collaboration patterns, including cooperative and competitive interactions, will emerge and be applied to real-world scenarios.


About the Authors

Raphael Shu is a Senior Applied Scientist at Amazon Bedrock. He received his PhD from the University of Tokyo in 2020, earning a Dean’s Award. His research primarily focuses on Natural Language Generation, Conversational AI, and AI Agents, with publications in conferences such as ICLR, ACL, EMNLP, and AAAI. His work on the attention mechanism and latent variable models received an Outstanding Paper Award at ACL 2017 and the Best Paper Award for JNLP in 2018 and 2019. At AWS, he led the Dialog2API project, which enables large language models to interact with the external environment through dialogue. In 2023, he led a team aiming to develop the agentic capability for Amazon Titan. Since 2024, Raphael has worked on multi-agent collaboration with LLM-based agents.

Nilaksh Das is an Applied Scientist at AWS, where he works with the Bedrock Agents team to develop scalable, interactive and modular AI systems. His contributions at AWS have spanned multiple initiatives, including the development of foundational models for semantic speech understanding, integration of function calling capabilities for conversational LLMs and the implementation of communication protocols for multi-agent collaboration. Nilaksh completed his PhD in AI Security at Georgia Tech in 2022, where he was also conferred the Outstanding Dissertation Award.

Michelle Yuan is an Applied Scientist on Amazon Bedrock Agents. Her work focuses on scaling customer needs through generative and agentic AI services. She has industry experience, multiple first-author publications in top ML/NLP conferences, and a strong foundation in mathematics and algorithms. She obtained her Ph.D. in Computer Science at the University of Maryland before joining Amazon in 2022.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6.5 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Dr. Yi Zhang is a Principal Applied Scientist at AWS working on Amazon Bedrock. With 25 years of combined industrial and academic research experience, Yi’s research focuses on syntactic and semantic understanding of natural language in dialogues and their application in the development of conversational and interactive systems with speech and text/chat. He has been technically leading the development of modeling solutions behind AWS services such as Amazon Bedrock Agents, Amazon Lex, and AWS HealthScribe.

Read More

How BQA streamlines education quality reporting using Amazon Bedrock

How BQA streamlines education quality reporting using Amazon Bedrock

Given the value of data today, organizations across various industries are working with vast amounts of data across multiple formats. Manually reviewing and processing this information can be a challenging and time-consuming task, with a margin for potential errors. This is where intelligent document processing (IDP), coupled with the power of generative AI, emerges as a game-changing solution.

Enhancing the capabilities of IDP is the integration of generative AI, which harnesses large language models (LLMs) and generative techniques to understand and generate human-like text. This integration allows organizations to not only extract data from documents, but to also interpret, summarize, and generate insights from the extracted information, enabling more intelligent and automated document processing workflows.

The Education and Training Quality Authority (BQA) plays a critical role in improving the quality of education and training services in the Kingdom of Bahrain. BQA reviews the performance of all education and training institutions, including schools, universities, and vocational institutes, thereby promoting the professional advancement of the nation’s human capital.

BQA oversees a comprehensive quality assurance process, which includes setting performance standards and conducting objective reviews of education and training institutions. The process involves the collection and analysis of extensive documentation, including self-evaluation reports (SERs), supporting evidence, and various media formats from the institutions being reviewed.

The collaboration between BQA and AWS was facilitated through the Cloud Innovation Center (CIC) program, a joint initiative by AWS, Tamkeen, and leading universities in Bahrain, including Bahrain Polytechnic and University of Bahrain. The CIC program aims to foster innovation within the public sector by providing a collaborative environment where government entities can work closely with AWS consultants and university students to develop cutting-edge solutions using the latest cloud technologies.

As part of the CIC program, BQA has built a proof of concept solution, harnessing the power of AWS services and generative AI capabilities. The primary purpose of this proof of concept was to test and validate the proposed technologies, demonstrating their viability and potential for streamlining BQA’s reporting and data management processes.

In this post, we explore how BQA used the power of Amazon Bedrock, Amazon SageMaker JumpStart, and other AWS services to streamline the overall reporting workflow.

The challenge: Streamlining self-assessment reporting

BQA has traditionally provided education and training institutions with a template for the SER as part of the review process. Institutions are required to submit a review portfolio containing the completed SER and supporting material as evidence, which sometimes did not adhere fully to the established reporting standards.

The existing process had some challenges:

  • Inaccurate or incomplete submissions – Institutions might provide incomplete or inaccurate information in the submitted reports and supporting evidence, leading to gaps in the data required for a comprehensive review.
  • Missing or insufficient supporting evidence – The supporting material provided as evidence by institutions frequently did not substantiate the claims made in their reports, which challenged the evaluation process.
  • Time-consuming and resource-intensive – The process required dedicating significant time and resources to reviewing the submissions manually and following up with institutions to request additional information to rectify the submissions, which slowed down the overall review process.

These challenges highlighted the need for a more streamlined and efficient approach to the submission and review process.

Solution overview

The proposed solution uses Amazon Bedrock and the Amazon Titan Text Express model to enable IDP functionalities. The architecture seamlessly integrates multiple AWS services with Amazon Bedrock, allowing for efficient data extraction and comparison.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.

The following diagram illustrates the solution architecture.

solution architecture diagram

The solution consists of the following steps:

  1. Relevant documents are uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket.
  2. An event notification is sent to an Amazon Simple Queue Service (Amazon SQS) queue, which holds each file for further processing. Amazon SQS serves as a buffer, enabling the different components to send and receive messages in a reliable manner without being directly coupled, enhancing scalability and fault tolerance of the system.
  3. The text extraction AWS Lambda function is invoked by the SQS queue, processing each queued file and using Amazon Textract to extract text from the documents (a sketch of this step follows the list).
  4. The extracted text data is placed into another SQS queue for the next processing step.
  5. The text summarization Lambda function is invoked by this new queue containing the extracted text. This function sends a request to SageMaker JumpStart, where a Meta Llama text generation model is deployed to summarize the content based on the provided prompt.
  6. In parallel, the InvokeSageMaker Lambda function is invoked to perform comparisons and assessments. It compares the extracted text against the BQA standards that the model was trained on, evaluating the text for compliance, quality, and other relevant metrics.
  7. The summarized data and assessment results are stored in an Amazon DynamoDB table.
  8. Upon request, the InvokeBedrock Lambda function invokes Amazon Bedrock to generate generative AI summaries and comments. The function constructs a detailed prompt designed to guide the Amazon Titan Express model in evaluating the university’s submission.
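
To make steps 3 and 4 concrete, the following is a minimal sketch of what the text extraction function could look like. It is an illustration, not the project’s actual code: OUTPUT_QUEUE_URL is a placeholder environment variable, error handling is omitted, and it uses the synchronous Amazon Textract DetectDocumentText API, which suits single-page documents (multi-page PDFs would use the asynchronous StartDocumentTextDetection API instead).

import json
import os
import boto3

# Illustrative sketch of the text extraction Lambda function (steps 3 and 4).
# OUTPUT_QUEUE_URL is a placeholder environment variable; error handling is omitted.
textract = boto3.client("textract")
sqs = boto3.client("sqs")

def lambda_handler(event, context):
    for record in event["Records"]:                 # One record per SQS message
        s3_event = json.loads(record["body"])       # The S3 event notification is the message body
        for s3_record in s3_event["Records"]:
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]

            # Extract text from the document with Amazon Textract
            result = textract.detect_document_text(
                Document={"S3Object": {"Bucket": bucket, "Name": key}}
            )
            text = "\n".join(
                block["Text"] for block in result["Blocks"] if block["BlockType"] == "LINE"
            )

            # Queue the extracted text for the summarization step
            sqs.send_message(
                QueueUrl=os.environ["OUTPUT_QUEUE_URL"],
                MessageBody=json.dumps({"bucket": bucket, "key": key, "text": text}),
            )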

Prompt engineering using Amazon Bedrock

To take advantage of the power of Amazon Bedrock and make sure the generated output adhered to the desired structure and formatting requirements, a carefully crafted prompt was developed according to the following guidelines:

  • Evidence submission – Present the evidence submitted by the institution under the relevant indicator, providing the model with the necessary context for evaluation
  • Evaluation criteria – Outline the specific criteria the evidence should be assessed against
  • Evaluation instructions – Instruct the model as follows:
    • Indicate N/A if the evidence is irrelevant to the indicator
    • Evaluate the university’s self-assessment based on the criteria
    • Assign a score from 1–5 for each comment, citing evidence directly from the content
  • Response format – Specify the response as bullet points, focusing on relevant analysis and evidence, with a word limit of 100 words

To use this prompt template, you can create a custom Lambda function with your project. The function should handle the retrieval of the required data, such as the indicator name, the university’s submitted evidence, and the rubric criteria. Within the function, include the prompt template and dynamically populate the placeholders (${indicatorName}, ${JSON.stringify(allContent)}, and ${JSON.stringify(c.comment)}) with the retrieved data.

The Amazon Titan Text Express model will then generate the evaluation response based on the provided prompt instructions, adhering to the specified format and guidelines. You can process and analyze the model’s response within your function, extracting the compliance score, relevant analysis, and evidence.

The following is an example prompt template:

// Iterate over the rubric comments and build one evaluation prompt per comment
for (const c of comments) {
  const prompt = `
  Below is the evidence submitted by the university under the indicator "${indicatorName}":
  ${JSON.stringify(allContent)}

  Analyze and evaluate the university's evidence based on the provided rubric criteria:
  ${JSON.stringify(c.comment)}

  - If the evidence does not relate to the indicator, indicate that it is not applicable (N/A) without any additional commentary.

  Choose one of the compliance scores below based on the evidence submitted:
  1. Non-compliant: The comment does not meet the criteria or standards.
  2. Compliant with recommendation: The comment meets the criteria but includes a suggestion or recommendation for improvement.
  3. Compliant: The comment meets the criteria or standards.

  AT THE END OF THE RESPONSE THERE SHOULD BE A SCORE: [SCORE: COMPLIANT OR NON-COMPLIANT OR COMPLIANT WITH RECOMMENDATION]
  Write your response in concise bullet points, focusing strictly on relevant analysis and evidence.
  **LIMIT YOUR RESPONSE TO 100 WORDS ONLY.**
  `;

  logger.info(`Prompt for comment ${c.commentId}: ${prompt}`);

  // Request body for the Amazon Titan Text Express model on Amazon Bedrock
  const body = JSON.stringify({
    inputText: prompt,
    textGenerationConfig: {
      maxTokenCount: 4096,
      stopSequences: [],
      temperature: 0,
      topP: 0.1,
    },
  });

  // The body is then passed to the Amazon Bedrock InvokeModel API to generate the evaluation
}

The following screenshot shows an example of the Amazon Bedrock generated response.

Amazon Bedrock generated response

Results

The implementation of Amazon Bedrock provided institutions with transformative benefits. By automating and streamlining the collection and analysis of extensive documentation, including SERs, supporting evidence, and various media formats, institutions can achieve greater accuracy and consistency in their reporting processes and improve their readiness for the review process. This not only reduces the time and cost associated with manual data processing, but also improves compliance with quality expectations, thereby enhancing their credibility and quality.

For BQA, the implementation helped achieve one of its strategic objectives: streamlining its reporting processes and achieving significant improvements across a range of critical metrics, substantially enhancing the overall efficiency and effectiveness of its operations.

Key success metrics anticipated include:

  • Faster turnaround times for generating 70% accurate and standards-compliant self-evaluation reports, leading to improved overall efficiency.
  • Reduced risk of errors or non-compliance in the reporting process, enforcing adherence to established guidelines.
  • Ability to summarize lengthy submissions into concise bullet points, allowing BQA reviewers to quickly analyze and comprehend the most pertinent information, reducing evidence analysis time by 30%.
  • More accurate compliance feedback functionality, empowering reviewers to effectively evaluate submissions against established standards and guidelines, while achieving 30% reduced operational costs through process optimizations.
  • Enhanced transparency and communication through seamless interactions, enabling users to request additional documents or clarifications with ease.
  • Real-time feedback, allowing institutions to make necessary adjustments promptly. This is particularly useful to maintain submission accuracy and completeness.
  • Enhanced decision-making by providing insights on the data. This helps universities identify areas for improvement and make data-driven decisions to enhance their processes and operations.

The following screenshot shows an example of generating new evaluations using Amazon Bedrock.

generating new evaluations using Amazon Bedrock

Conclusion

This post outlined the implementation of Amazon Bedrock at the Education and Training Quality Authority (BQA), demonstrating the transformative potential of generative AI in revolutionizing the quality assurance processes in the education and training sectors. For those interested in exploring the technical details further, the full code for this implementation is available in the following GitHub repo. If you are interested in conducting a similar proof of concept with us, submit your challenge idea to the Bahrain Polytechnic or University of Bahrain CIC website.


About the Author

Maram AlSaegh is a Cloud Infrastructure Architect at Amazon Web Services (AWS), where she supports AWS customers in accelerating their journey to cloud. Currently, she is focused on developing innovative solutions that leverage generative AI and machine learning (ML) for public sector entities.

Read More

Boosting team innovation, productivity, and knowledge sharing with Amazon Q Business – Web experience

Boosting team innovation, productivity, and knowledge sharing with Amazon Q Business – Web experience

Amazon Q Business can increase productivity across diverse teams, including developers, architects, site reliability engineers (SREs), and product managers. Amazon Q Business as a web experience makes AWS best practices readily accessible, providing cloud-centered recommendations quickly and making it straightforward to access AWS service functions, limits, and implementations. These elements are brought together in a web integration that serves various job roles and personas exactly when they need it.

As enterprises continue to grow their applications, environments, and infrastructure, it has become difficult to keep pace with technology trends, best practices, and programming standards. Enterprises provide their developers, engineers, and architects with a range of knowledge bases and documents, such as usage guides, wikis, and tools. But these resources tend to become siloed over time and inaccessible across teams, resulting in reduced knowledge, duplication of work, and reduced productivity.

MuleSoft from Salesforce provides the Anypoint platform, which gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating incredible digital experiences, all on a single, user-friendly platform.

This post shows how MuleSoft introduced a generative AI-powered assistant using Amazon Q Business to enhance their internal Cloud Central dashboard. This individualized portal shows assets owned, costs and usage, and well-architected recommendations to over 100 engineers. For more on MuleSoft’s journey to cloud computing, refer to Why a Cloud Operating Model?

Developers, engineers, FinOps, and architects can get the right answer at the right time when they’re ready to troubleshoot, address an issue, have an inquiry, or want to understand AWS best practices and cloud-centered deployments.

This post covers how to integrate Amazon Q Business into your enterprise setup.

Solution overview

The Amazon Q Business web experience provides seamless access to information, step-by-step instructions, troubleshooting, and prescriptive guidance so teams can deploy well-architected applications or cloud-centered infrastructure. Team members can chat directly or upload documents and receive summarization, analysis, or answers to calculations. Amazon Q Business uses supported connectors such as Confluence, Amazon Relational Database Service (Amazon RDS), and web crawlers. The following diagram shows the reference architecture for various personas, including developers, support engineers, DevOps, and FinOps, to connect with internal databases and the web using Amazon Q Business.

Reference Architecture

In this reference architecture, you can see how various user personas, spanning teams and business units, use the Amazon Q Business web experience as an access point for information, step-by-step instructions, troubleshooting, or prescriptive guidance for deploying a well-architected application or cloud-centered infrastructure. The web experience allows team members to chat directly with an AI assistant or upload documents and receive summarization, analysis, or answers to calculations.

Use cases for Amazon Q Business

Small, medium, and large enterprises, depending on their mode of operation, type of business, and level of investment in IT, will have varying approaches and policies on providing access to information. Amazon Q Business is part of the AWS suite of generative AI services and provides a web-based utility to set up, manage, and interact with Amazon Q. It can answer questions, provide summaries, generate content, and complete tasks using the data and expertise found in your enterprise systems. You can connect internal and external datasets without compromising security to seamlessly incorporate your specific standard operating procedures, guidelines, playbooks, and reference links. With Amazon Q, MuleSoft’s engineering teams were able to address their AWS-specific inquiries (such as support ticket escalation, operational guidance, and AWS Well-Architected best practices) at scale.

The Amazon Q Business web experience allows business users across various job titles and functions to interact with Amazon Q through the web browser. With the web experience, teams can access the same information and receive similar recommendations based on their prompt or inquiry, level of experience, and knowledge, ranging from beginner to advanced.

The following demos are examples of what the Amazon Q Business web experience looks like. Amazon Q Business securely connects to over 40 commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3). Point Amazon Q Business at your enterprise data, and it will search your data, summarize it logically, analyze trends, and engage in dialogue with end users about the data. This helps users access their data no matter where it resides in their organization.

Amazon Q Business underscores prompting and response for prescriptive guidance. Using Amazon Elastic Block Store (Amazon EBS) volume optimization as an example, it provided detailed migration steps from gp2 to gp3. This is a well-known use case that several MuleSoft teams have asked about.

Through the web experience, you can effortlessly upload documents and prompt for summaries, calculations, or recommendations based on your document. You have the flexibility to upload .pdf, .xls, .xlsx, or .csv files directly into the chat interface. You can also assume a persona such as FinOps or DevOps and get personalized recommendations or responses.

MuleSoft engineers used the Amazon Q Business web summarization feature to better understand Split Cost Allocation Data (SCAD) for Amazon Elastic Kubernetes Service (Amazon EKS). They uploaded the SCAD PDF documents to Amazon Q and got straightforward summaries. This helped them understand their customer’s use of MuleSoft Anypoint platform running on Amazon EKS.

Amazon Q helped analyze IPv4 costs by processing an uploaded Excel file. As the video shows, it calculated expenses for elastic IPs and outbound data transfers, supporting a proposed network estimate.

Amazon Q Business demonstrates its ability to provide tailored advice by responding to a specific user scenario. As the video shows, a user took on the role of a FinOps professional and asked Amazon Q to recommend AWS tools for cost optimization. Amazon Q then offered personalized suggestions based on this FinOps persona perspective.

Prerequisites

To get started with your Amazon Q Business web experience, you need the following prerequisites:

Create an Amazon Q Business web experience

Complete the following steps to create your web experience:

The web experience can be used by a variety of business users or personas to yield accurate and repeatable recommendations for level 100, 200, and 300 inquiries. Amazon Q supports a variety of data sources and data connectors to personalize your user experience. You can also further enrich your dataset with knowledge bases within Amazon Q. With Amazon Q Business set up with your own datasets and sources, teams and business units within your enterprise can index from the same information on common topics such as cost optimization, modernization, and operational excellence while maintaining their own unique area of expertise, responsibility, and job function.

Clean up

After trying the Amazon Q Business web experience, remember to remove any resources you created to avoid unnecessary charges. Complete the following steps:

  1. Delete the web experience (for a programmatic alternative, see the sketch after this list):
    • On the Amazon Q Business console, navigate to the Web experiences section within your application.
    • Select the web experience you want to remove.
    • On the Actions menu, choose Delete.
    • Confirm the deletion by following the prompts.
  2. If you granted specific users access to the web experience, revoke their permissions. This might involve updating AWS Identity and Access Management (IAM) policies or removing users from specific groups in IAM Identity Center.
  3. If you set up any custom configurations for the web experience, such as specific data source filters or custom prompts, make sure to remove these.
  4. If you integrated the web experience with other tools or services, remove those integrations.
  5. Check for and delete any Amazon CloudWatch alarms or logs specifically set up for monitoring this web experience.
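
If you prefer to script this cleanup, step 1 can also be performed with the AWS SDK for Python (Boto3). The following is a minimal sketch that assumes the Amazon Q Business DeleteWebExperience operation is available in your SDK version; the application ID and web experience ID are placeholders for your own values.

import boto3

# Minimal cleanup sketch; assumes the Amazon Q Business (qbusiness) API's
# DeleteWebExperience operation. Replace the placeholder IDs with your own values.
qbusiness = boto3.client("qbusiness")

qbusiness.delete_web_experience(
    applicationId="YOUR_APPLICATION_ID",
    webExperienceId="YOUR_WEB_EXPERIENCE_ID",
)
print("Web experience deletion requested")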

After deletion, review your AWS billing to make sure that charges related to the web experience have stopped.

Deleting a web experience is irreversible. Make sure you have any necessary backups or exports of important data before proceeding with the deletion. Also, keep in mind that deleting a web experience doesn’t automatically delete the entire Amazon Q Business application or its associated data sources. If you want to remove everything, follow the Amazon Q Business application clean-up procedure for the entire application.

Conclusion

Amazon Q Business web experience is your gateway to a powerful generative AI assistant. Want to take it further? Integrate Amazon Q with Slack for an even more interactive experience.

Every organization has unique needs when it comes to AI. That’s where Amazon Q shines. It adapts to your business needs, user applications, and end-user personas. The best part? You don’t need to do the heavy lifting. No complex infrastructure setup. No need for teams of data scientists. Amazon Q connects to your data and makes sense of it with just a click. It’s AI power made simple, giving you the intelligence you need without the hassle.

To learn more about the power of a generative AI assistant in your workplace, see Amazon Q Business.


About the Authors

Rueben Jimenez is an AWS Sr Solutions Architect who designs and implements complex data analytics, machine learning, generative AI, and cloud infrastructure solutions.

Sona Rajamani is a Sr. Manager Solutions Architect at AWS.  She lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In her spare time, she enjoys traveling and hiking.

Erick Joaquin is a Sr Customer Solutions Manager for Strategic Accounts at AWS. As a member of the account team, he is focused on evolving his customers’ maturity in the cloud to achieve operational efficiency at scale.

Read More