How Kyndryl integrated ServiceNow and Amazon Q Business

This post is co-written with Sujith R Pillai from Kyndryl.

In this post, we show you how Kyndryl, an AWS Premier Tier Services Partner and IT infrastructure services provider that designs, builds, manages, and modernizes complex, mission-critical information systems, integrated Amazon Q Business with ServiceNow in a few simple steps. You will learn how to configure Amazon Q Business and ServiceNow, how to create a generative AI plugin for your ServiceNow incidents, and how to test and interact with ServiceNow using the Amazon Q Business web experience. By the end of this post, you will be able to enhance your ServiceNow experience with Amazon Q Business and enjoy the benefits of a generative AI–powered interface.

Solution overview

Amazon Q Business has three main components: a front-end chat interface, a data source connector and retriever, and a ServiceNow plugin. Amazon Q Business uses AWS Secrets Manager secrets to store the ServiceNow credentials securely. The following diagram shows the architecture for the solution.

High level architecture

Chat

Users interact with ServiceNow through the generative AI–powered chat interface using natural language.

Data source connector and retriever

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business has two types of retrievers: native retrievers and existing retrievers using Amazon Kendra. The native retrievers support a wide range of Amazon Q Business connectors, including ServiceNow. The existing retriever option is for those who already have an Amazon Kendra retriever and would like to use that for their Amazon Q Business application. For the ServiceNow integration, we use the native retriever.

ServiceNow plugin

Amazon Q Business provides a plugin feature for performing actions such as creating incidents in ServiceNow.

The following high-level steps show how to configure the Amazon Q Business – ServiceNow integration:

  1. Create a user in ServiceNow for Amazon Q Business to communicate with ServiceNow
  2. Create knowledge base articles in ServiceNow if they do not exist already
  3. Create an Amazon Q Business application and configure the ServiceNow data source and retriever in Amazon Q Business
  4. Synchronize the data source
  5. Create a ServiceNow plugin in Amazon Q Business

Prerequisites

To run this application, you must have an Amazon Web Services (AWS) account, an AWS Identity and Access Management (IAM) role, and a user that can create and manage the required resources. If you are not an AWS account holder, see How do I create and activate a new Amazon Web Services account?

You need AWS IAM Identity Center set up in the AWS Organizations organizational unit (OU) or AWS account in which you are building the Amazon Q Business application. You should have a user or group created in IAM Identity Center. You will assign this user or group to the Amazon Q Business application during the application creation process. For guidance, refer to Manage identities in IAM Identity Center.

You also need a ServiceNow user with incident_manager and knowledge_admin permissions to create and view knowledge base articles and to create incidents. We use a developer instance of ServiceNow for this post as an example. You can find out how to get the developer instance in Personal Developer Instances.

Solution walkthrough

To integrate ServiceNow and Amazon Q Business, use the steps in the following sections.

Create a knowledge base article

Follow these steps to create a knowledge base article:

  1. Sign in to ServiceNow and navigate to Self-Service > Knowledge
  2. Choose Create an Article
  3. On the Create new article page, select a knowledge base and choose a category. Optionally, you may create a new category.
  4. Provide a Short description and type in the Article body
  5. Choose Submit to create the article, as shown in the following screenshot

Repeat these steps to create a couple of knowledge base articles. In this example, we created a hypothetical enterprise named Example Corp for demonstration purposes.

Create ServiceNow Knowledgebase

Create an Amazon Q Business application

Amazon Q offers three subscription plans: Amazon Q Business Lite, Amazon Q Business Pro, and Amazon Q Developer Pro. Read the Amazon Q Documentation for more details. For this example, we used Amazon Q Business Lite.

Create application

Follow these steps to create an application:

  1. In the Amazon Q Business console, choose Get started, then choose Create application to create a new Amazon Q Business application, as shown in the following screenshot

  2. Name your application in Application name. In Service access, select Create and use a new service-linked role (SLR). For more information about example service roles, see IAM roles for Amazon Q Business. For information on service-linked roles, including how to manage them, see Using service-linked roles for Amazon Q Business. We named our application ServiceNow-Helpdesk. Next, select Create, as shown in the following screenshot.

Choose a retriever and index provisioning

To choose a retriever and index provisioning, follow these steps in the Select retriever screen, as shown in the following screenshot:

  1. For Retrievers, select Use native retriever
  2. For Index provisioning, choose Starter
  3. Choose Next

Connect data sources

Amazon Q Business has ready-made connectors for common data sources and business systems.

  1. Enter “ServiceNow” to search and select ServiceNow Online as the data source, as shown in the following screenshot

  2. Enter the URL and the version of your ServiceNow instance. We used the ServiceNow version Vancouver for this post.

  3. Scroll down the page to provide additional details about the data source. Under Authentication, select Basic authentication. Under AWS Secrets Manager secret, select Create and add a new secret from the dropdown menu as shown in the screenshot.

  4. Provide the Username and Password you created in ServiceNow to create an AWS Secrets Manager secret. Choose Save.

  5. Under Configure VPC and security group, keep the setting as No VPC because you will be connecting to ServiceNow over the internet. You may choose to create a new service role under IAM role. This will create a role specifically for this application.

  6. In this example, we synchronize the ServiceNow knowledge base articles and incidents. Provide the information as shown in the following image. Notice that for Filter query the example shows the following code.
workflow_state=published^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab^article_type=text^active=true^EQ

This filter query aims to sync the articles that meet the following criteria:

  • workflow_state = published
  • kb_knowledge_base = dfc19531bf2021003f07e2c1ac0739ab (This is the default Sys ID for the knowledge base named “Knowledge” in ServiceNow).
  • article_type = text (This field contains the text in the knowledge article).
  • active = true (This field filters the articles to sync only the ones that are active).

The filter fields are separated by ^, and the end of the query is represented by EQ. You can find more details about the Filter query and other parameters in Connecting Amazon Q Business to ServiceNow Online using the console. A quick way to sanity-check such a query is sketched after the remaining data source configuration steps.

  7. Provide the Sync scope for the Incidents, as shown in the following screenshot

  8. You may select Full sync initially so that a complete synchronization is performed. You need to select the frequency of the synchronization as well. For this post, we chose Run on demand. If you need to keep the knowledge base and incident data more up-to-date with the ServiceNow instance, choose a shorter window.

  9. A field mapping will be provided for you to validate. You won’t be able to change the field mapping at this stage. Choose Add data source to proceed.

This completes the data source configuration for Amazon Q Business. The configuration takes a few minutes to be completed. Watch the screen for any errors and updates. Once the data source is created, you will be greeted with a message You successfully created the following data source: ‘ServiceNow-Datasource’
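If you want to verify what the filter query will return before running a sync, you can call the ServiceNow Table API directly with the same user and the same encoded query. The following is a minimal sketch; the instance URL, credentials, and knowledge base Sys ID are placeholders that you should replace with your own values.

import requests

# Placeholders: your developer instance URL and the ServiceNow user created for Amazon Q Business
INSTANCE_URL = "https://devXXXXX.service-now.com"
USERNAME = "qbusiness-user"
PASSWORD = "your-password"

# The same encoded filter query configured in the data source (fields separated by ^, terminated by EQ)
FILTER_QUERY = (
    "workflow_state=published"
    "^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab"
    "^article_type=text^active=true^EQ"
)

# Query the kb_knowledge table with the encoded query and list the matching articles
response = requests.get(
    f"{INSTANCE_URL}/api/now/table/kb_knowledge",
    auth=(USERNAME, PASSWORD),
    headers={"Accept": "application/json"},
    params={"sysparm_query": FILTER_QUERY, "sysparm_fields": "number,short_description"},
    timeout=30,
)
response.raise_for_status()
articles = response.json()["result"]
print(f"{len(articles)} article(s) match the filter query")
for article in articles:
    print(article["number"], "-", article["short_description"])

If the article count matches what you expect in ServiceNow, the same query should select the same set of documents during the Amazon Q Business sync.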

Add users and groups

Follow these steps to add users and groups:

  1. Choose Next
  2. On the Add groups and users page, choose Add groups and users. You will be presented with the options Add and assign new users and Assign existing users and groups. Select Assign existing users and groups. Choose Next, as shown in the following image.

  3. Search for an existing user or group in your IAM Identity Center, select one, and choose Assign. After selecting the right user or group, choose Done.

This completes the activity of assigning the user and group access to the Amazon Q Business application.

Create a web experience

Follow these steps to create a web experience in the Add groups and users screen, as shown in the following screenshot.

  1. Choose Create and use a new service role in the Web experience service access section
  2. Choose Create application

The deployed application and its status will be shown on the Amazon Q Business > Applications console page, as shown in the following screenshot.

Synchronize the data source

Once the data source is configured successfully, it’s time to start the synchronization. To begin this process, the ServiceNow fields that require synchronization must be updated. Because we intend to get answers from the knowledge base content, the text field needs to be synchronized. To do so, follow these steps:

  1. In the Amazon Q Business console, select Applications in the navigation pane
  2. Select ServiceNow-Helpdesk and then ServiceNow-Datasource
  3. Choose Actions. From the dropdown, choose Edit, as shown in the following screenshot.

  4. Scroll down to the bottom of the page to the Field mappings section. Select text and description.

  5. Choose Update. After the update, choose Sync now.

The synchronization takes a few minutes to complete, depending on the amount of data to be synchronized. Make sure that the Status is Completed, as shown in the following screenshot, before proceeding further. If you notice any errors, you can choose the error hyperlink, which will take you to Amazon CloudWatch Logs so you can examine the logs for further troubleshooting.
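You can also start and monitor the synchronization programmatically instead of using the console. The following sketch uses the AWS SDK for Python (Boto3) with the Amazon Q Business client; the application, index, and data source IDs are placeholders, and the status values returned in the sync history should be confirmed against the Amazon Q Business API reference.

import time
import boto3

# Placeholders: copy these identifiers from the Amazon Q Business console
APPLICATION_ID = "your-application-id"
INDEX_ID = "your-index-id"
DATA_SOURCE_ID = "your-data-source-id"

qbusiness = boto3.client("qbusiness")

# Start an on-demand sync, equivalent to choosing Sync now in the console
sync_job = qbusiness.start_data_source_sync_job(
    applicationId=APPLICATION_ID,
    indexId=INDEX_ID,
    dataSourceId=DATA_SOURCE_ID,
)
print("Started sync job:", sync_job["executionId"])

# Poll the sync history until the most recent job is no longer syncing
while True:
    history = qbusiness.list_data_source_sync_jobs(
        applicationId=APPLICATION_ID,
        indexId=INDEX_ID,
        dataSourceId=DATA_SOURCE_ID,
    ).get("history", [])
    latest = max(history, key=lambda job: job["startTime"], default=None)
    status = latest["status"] if latest else "UNKNOWN"
    print("Latest sync status:", status)
    if status not in ("SYNCING", "SYNCING_INDEXING"):
        break
    time.sleep(60)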

Create ServiceNow plugin

A ServiceNow plugin in Amazon Q Business helps you create incidents in ServiceNow through Amazon Q Business chat. To create one, follow these steps:

  1. In the Amazon Q Business console, select Enhancements from the navigation pane
  2. Under Plugins, choose Add plugin, as shown in the following screenshot

  3. On the Add Plugin page, shown in the following screenshot, select the ServiceNow plugin

  4. Provide a Name for the plugin
  5. Enter the ServiceNow URL and use the previously created AWS Secrets Manager secret for the Authentication
  6. Select Create and use a new service role
  7. Choose Add plugin

  8. The status of the plugin will be shown on the Plugins page. If the Plugin status is Active, the plugin is configured and ready to use.

Use the Amazon Q Business chat interface

To use the Amazon Q Business chat interface, follow these steps:

  1. In the Amazon Q Business console, choose Applications from the navigation pane. The web experience URL will be provided for each Amazon Q Business application.

  2. Choose the Web experience URL to open the chat interface. Enter an IAM Identity Center username and password that was assigned to this application. The following screenshot shows the Sign in page.

You can now ask questions and receive responses, as shown in the following image. The answers will be specific to your organization and are retrieved from the knowledge base in ServiceNow.

You can ask the chat interface to create incidents as shown in the next screenshot.

A new pop-up window will appear, providing additional information related to the incident. In this window, you can provide more information related to the ticket and choose Create.

This will create a ServiceNow incident using the web experience of Amazon Q Business without signing in to ServiceNow. You may verify the ticket in the ServiceNow console as shown in the next screenshot.

Conclusion

In this post, we showed how Kyndryl is using Amazon Q Business to enable natural language conversations with ServiceNow using the ServiceNow connector provided by Amazon Q Business. We also showed how to create a ServiceNow plugin that allows users to create incidents in ServiceNow directly from the Amazon Q Business chat interface. We hope that this tutorial will help you take advantage of the power of Amazon Q Business for your ServiceNow needs.


About the authors

Asif Fouzi is a Principal Solutions Architect leading a team of seasoned technologists supporting Global Service Integrators (GSI) such as Kyndryl in their cloud journey. When he is not innovating on behalf of users, he likes to play guitar, travel, and spend time with his family.


Sujith R Pillai is a cloud solution architect in the Cloud Center of Excellence at Kyndryl with extensive experience in infrastructure architecture and implementation across various industries. With his strong background in cloud solutions, he has led multiple technology transformation projects for Kyndryl customers.

HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design

This post introduces HCLTech’s AutoWise Companion, a transformative generative AI solution designed to enhance customers’ vehicle purchasing journey. By tailoring recommendations based on individuals’ preferences, the solution guides customers toward the best vehicle model for them. Simultaneously, it empowers vehicle manufacturers (original equipment manufacturers (OEMs)) by using real customer feedback to drive strategic decisions, boosting sales and company profits. Powered by generative AI services on AWS and large language models’ (LLMs’) multi-modal capabilities, HCLTech’s AutoWise Companion provides a seamless and impactful experience.

In this post, we analyze the current industry challenges and guide readers through the AutoWise Companion solution functional flow and architecture design using built-in AWS services and open source tools. Additionally, we discuss the design from security and responsible AI perspectives, demonstrating how you can apply this solution to a wider range of industry scenarios.

Opportunities

Purchasing a vehicle is a crucial decision that can induce stress and uncertainty for customers. The following are some of the real-life challenges customers and manufacturers face:

  • Choosing the right brand and model – Even after narrowing down the brand, customers must navigate through a multitude of vehicle models and variants. Each model has different features, price points, and performance metrics, making it difficult to make a confident choice that fits their needs and budget.
  • Analyzing customer feedback – OEMs face the daunting task of sifting through extensive quality reporting tool (QRT) reports. These reports contain vast amounts of data, which can be overwhelming and time-consuming to analyze.
  • Aligning with customer sentiments – OEMs must align their findings from QRT reports with the actual sentiments of customers. Understanding customer satisfaction and areas needing improvement from raw data is complex and often requires advanced analytical tools.

HCLTech’s AutoWise Companion solution addresses these pain points, benefiting both customers and manufacturers by simplifying the decision-making process for customers and enhancing data analysis and customer sentiment alignment for manufacturers.

The solution extracts valuable insights from diverse data sources, including OEM transactions, vehicle specifications, social media reviews, and OEM QRT reports. By employing a multi-modal approach, the solution connects relevant data elements across various databases. Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer’s preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.

Solution overview

The overall solution is divided into functional modules for both customers and OEMs.

Customer assist

Every customer has unique preferences, even when considering the same vehicle brand and model. The solution is designed to provide customers with a detailed, personalized explanation of their preferred features, empowering them to make informed decisions. The solution presents the following capabilities:

  • Natural language queries – Customers can ask questions in plain language about vehicle features, such as overall ratings, pricing, and more. The system is equipped to understand and respond to these inquiries effectively.
  • Tailored interaction – The solution allows customers to select specific features from an available list, enabling a deeper exploration of their preferred options. This helps customers gain a comprehensive understanding of the features that best suit their needs.
  • Personalized brochure generation – The solution considers the customer’s feature preferences and generates a customized feature explanation brochure (with specific feature images). This personalized document helps the customer gain a deeper understanding of the vehicle and supports their decision-making process.

OEM assist

OEMs in the automotive industry must proactively address customer complaints and feedback regarding various automobile parts. This comprehensive solution enables OEM managers to analyze and summarize customer complaints and reported quality issues across different categories, thereby empowering them to formulate data-driven strategies efficiently. This enhances decision-making and competitiveness in the dynamic automotive industry. The solution enables the following:

  • Insight summaries – The system provides OEMs with insightful summaries by integrating and aggregating data from various sources, such as QRT reports, vehicle transaction sales data, and social media reviews.
  • Detailed view – OEMs can seamlessly access specific details about issues, reports, complaints, or data points in natural language, with the system providing the relevant information from the referenced reviews data, transaction data, or unstructured QRT reports.

To better understand the solution, we use the seven steps shown in the following figure to explain the overall function flow.

flow map explaining the overall function flow

The overall function flow consists of the following steps:

  1. The user (customer or OEM manager) interacts with the system through a natural language interface to ask various questions.
  2. The system’s natural language interpreter, powered by a generative AI engine, analyzes the query’s context, intent, and relevant persona to identify the appropriate data sources.
  3. Based on the identified data sources, the respective multi-source query execution plan is generated by the generative AI engine.
  4. The query agent parses the execution plan and sends queries to the respective query executor.
  5. Requested information is intelligently fetched from multiple sources such as company product metadata, sales transactions, OEM reports, and more to generate meaningful responses.
  6. The system seamlessly combines the collected information from the various sources, applying contextual understanding and domain-specific knowledge to generate a well-crafted, comprehensive, and relevant response for the user.
  7. The system generates the response for the original query and empowers the user to continue the interaction, either by asking follow-up questions within the same context or exploring new areas of interest, all while benefiting from the system’s ability to maintain contextual awareness and provide consistently relevant and informative responses.

Technical architecture

The overall solution is implemented using AWS services and LangChain. Multiple LangChain functions, such as CharacterTextSplitter and embedding vectors, are used for text handling and embedding model invocations. In the application layer, the GUI for the solution is created using Streamlit in Python language. The app container is deployed using a cost-optimal AWS microservice-based architecture using Amazon Elastic Container Service (Amazon ECS) clusters and AWS Fargate.

The solution contains the following processing layers:

  • Data pipeline – The various data sources, such as sales transactional data, unstructured QRT reports, social media reviews in JSON format, and vehicle metadata, are processed, transformed, and stored in the respective databases.
  • Vector embedding and data cataloging – To support natural language query similarity matching, the respective data is vectorized and stored as vector embeddings. Additionally, to enable the natural language to SQL (text-to-SQL) feature, the corresponding data catalog is generated for the transactional data. A minimal embedding call is sketched after this list.
  • LLM (request and response formation) – The system invokes LLMs at various stages to understand the request, formulate the context, and generate the response based on the query and context.
  • Frontend application – Customers or OEMs interact with the solution using an assistant application designed to enable natural language interaction with the system.
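To illustrate the vector embedding layer, the following is a minimal sketch of converting a text chunk into an embedding with the Amazon Titan Embeddings G1 – Text model on Amazon Bedrock (the same embedding model referenced later in the workflow). The sample chunk is hypothetical, and chunking and storage logic are omitted.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed_text(text):
    # Convert a text chunk into a vector embedding with Amazon Titan Embeddings G1 - Text
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]

# Hypothetical chunk from a QRT report, embedded before being stored in the vector store
chunk = "Customer reported intermittent infotainment system reboots after the latest software update."
vector = embed_text(chunk)
print(len(vector), "dimensions")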

The solution uses the following AWS data stores and analytics services:

The following figure depicts the technical flow of the solution.

details architecture design on aws

The workflow consists of the following steps:

  1. The user’s query, expressed in natural language, is processed by an orchestrator AWS Lambda function.
  2. The Lambda function tries to find a match for the query in the LLM cache. If a match is found, the response is returned from the cache. If no match is found, the function invokes the respective LLMs through Amazon Bedrock. This solution uses LLMs (Anthropic’s Claude 2 and Claude 3 Haiku) on Amazon Bedrock for response generation. The Amazon Titan Embeddings G1 – Text LLM is used to convert the knowledge documents and user queries into vector embeddings. A minimal sketch of this cache-then-invoke pattern follows this list.
  3. Based on the context of the query and the available catalog, the LLM identifies the relevant data sources:
    1. The transactional sales data, social media reviews, vehicle metadata, and more are transformed and used for customer and OEM interactions.
    2. The data in this step is restricted and is only accessible for OEM personas to help diagnose quality-related issues and provide insights on the QRT reports. This solution uses Amazon Textract as a data extraction tool to extract text from PDFs (such as quality reports).
  4. The LLM generates queries (text-to-SQL) to fetch data from the respective data channels according to the identified sources.
  5. The responses from each data channel are assembled to generate the overall context.
  6. Additionally, to generate a personalized brochure, relevant images (described as text-based embeddings) are fetched based on the query context. Amazon OpenSearch Serverless is used as a vector database to store the embeddings of text chunks extracted from quality report PDFs and image descriptions.
  7. The overall context is then passed to a response generator LLM to generate the final response to the user. The cache is also updated.
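To make step 2 concrete, the following is a minimal sketch of the cache-then-invoke pattern, assuming an exact-match cache keyed on the request text and the Amazon Bedrock Converse API with Anthropic’s Claude 3 Haiku. The actual solution may use a persistent or semantic cache rather than the in-memory dictionary shown here, and the example query and context are hypothetical.

import hashlib
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Simplified in-memory cache; a production Lambda function would back this with a shared, persistent store
llm_cache = {}

def cached_generate(query, context):
    # Return a cached response for an identical request, otherwise invoke Claude 3 Haiku on Amazon Bedrock
    cache_key = hashlib.sha256((query + "\n" + context).encode("utf-8")).hexdigest()
    if cache_key in llm_cache:
        return llm_cache[cache_key]

    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{
            "role": "user",
            "content": [{"text": f"Context:\n{context}\n\nQuestion: {query}"}],
        }],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    llm_cache[cache_key] = answer
    return answer

# Example call with hypothetical retrieved context
print(cached_generate("Which trims offer adaptive cruise control?", "retrieved vehicle specification snippets"))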

Responsible generative AI and security considerations

Customers implementing generative AI projects with LLMs are increasingly prioritizing security and responsible AI practices. This focus stems from the need to protect sensitive data, maintain model integrity, and enforce ethical use of AI technologies. The AutoWise Companion solution uses AWS services to enable customers to focus on innovation while maintaining the highest standards of data protection and ethical AI use.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards that can be applied to user input and foundation model output as safety and privacy controls. By incorporating guardrails, the solution proactively steers users away from potential risks or errors, promoting better outcomes and adherence to established standards. In the automobile industry, OEM vendors usually apply safety filters for vehicle specifications. For example, they want to validate the input to make sure that the queries are about legitimate existing models. Amazon Bedrock Guardrails provides denied topics and contextual grounding checks to make sure the queries about non-existent automobile models are identified and denied with a custom response.
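As an illustration, the following sketch creates a guardrail with a denied topic for non-existent vehicle models using the Amazon Bedrock control plane API. The topic name, definition, example, and blocked messages are hypothetical and would need to be tailored to the OEM’s catalog and policies.

import boto3

bedrock = boto3.client("bedrock")

# Illustrative guardrail that denies questions about vehicle models outside the published catalog
guardrail = bedrock.create_guardrail(
    name="autowise-vehicle-catalog-guardrail",
    description="Blocks queries about non-existent or unreleased vehicle models.",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "NonExistentModels",
                "definition": "Questions or claims about vehicle models, trims, or variants "
                              "that are not part of the published catalog.",
                "examples": ["What is the price of the 2030 hover edition?"],
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="I can only answer questions about existing vehicle models.",
    blockedOutputsMessaging="I can only provide information about existing vehicle models.",
)
print("Created guardrail:", guardrail["guardrailId"], "version", guardrail["version"])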

Security considerations

The system employs a RAG framework that relies on customer data, making data security the foremost priority. By design, Amazon Bedrock provides a layer of data security by making sure that customer data stays encrypted and protected and is neither used to train the underlying LLM nor shared with the model providers. Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, CSA STAR Level 2, is HIPAA eligible, and customers can use Amazon Bedrock in compliance with the GDPR.

For raw document storage on Amazon S3, transactional data storage, and retrieval, these data sources are encrypted, and respective access control mechanisms are put in place to maintain restricted data access.

Key learnings

The solution offered the following key learnings:

  • LLM cost optimization – In the initial stages of the solution, based on the user query, multiple independent LLM calls were required, which led to increased costs and execution time. By using the AWS Glue Data Catalog, we have improved the solution to use a single LLM call to find the best source of relevant information.
  • LLM caching – We observed that a significant percentage of queries received were repetitive. To optimize performance and cost, we implemented a caching mechanism that stores the request-response data from previous LLM model invocations. This cache lookup allows us to retrieve responses from the cached data, thereby reducing the number of calls made to the underlying LLM. This caching approach helped minimize cost and improve response times.
  • Image to text – Generating personalized brochures based on customer preferences was challenging. However, the latest vision-capable multimodal LLMs, such as Anthropic’s Claude 3 models (Haiku and Sonnet), have significantly improved accuracy.

Industrial adoption

The aim of this solution is to help customers make an informed decision while purchasing vehicles and empowering OEM managers to analyze factors contributing to sales fluctuations and formulate corresponding targeted sales boosting strategies, all based on data-driven insights. The solution can also be adopted in other sectors, as shown in the following table.

Industry | Solution adoption
Retail and ecommerce | By closely monitoring customer reviews, comments, and sentiments expressed on social media channels, the solution can assist customers in making informed decisions when purchasing electronic devices.
Hospitality and tourism | The solution can assist hotels, restaurants, and travel companies to understand customer sentiments, feedback, and preferences and offer personalized services.
Entertainment and media | It can assist television, movie studios, and music companies to analyze and gauge audience reactions and plan content strategies for the future.

Conclusion

The solution discussed in this post demonstrates the power of generative AI on AWS by empowering customers to use natural language conversations to obtain personalized, data-driven insights to make informed decisions during the purchase of their vehicle. It also supports OEMs in enhancing customer satisfaction, improving features, and driving sales growth in a competitive market.

Although the focus of this post has been on the automotive domain, the presented approach holds potential for adoption in other industries to provide a more streamlined and fulfilling purchasing experience.

Overall, the solution demonstrates the power of generative AI to provide accurate information based on various structured and unstructured data sources governed by guardrails to help avoid unauthorized conversations. For more information, see the HCLTech GenAI Automotive Companion in AWS Marketplace.


About the Authors

Bhajan Deep Singh leads the AWS Gen AI/AIML Center of Excellence at HCL Technologies. He plays an instrumental role in developing proof-of-concept projects and use cases utilizing AWS’s generative AI offerings. He has successfully led numerous client engagements to deliver data analytics and AI/machine learning solutions. He holds AWS’s AI/ML Specialty, AI Practitioner certification and authors technical blogs on AI/ML services and solutions. With his expertise and leadership, he enables clients to maximize the value of AWS generative AI.

Mihir Bhambri works as AWS Senior Solutions Architect at HCL Technologies. He specializes in tailored Generative AI solutions, driving industry-wide innovation in sectors such as Financial Services, Life Sciences, Manufacturing, and Automotive. Leveraging AWS cloud services and diverse Large Language Models (LLMs) to develop multiple proof-of-concepts to support business improvements. He also holds AWS Solutions Architect Certification and has contributed to the research community by co-authoring papers and winning multiple AWS generative AI hackathons.

Yajuvender Singh is an AWS Senior Solution Architect at HCLTech, specializing in AWS Cloud and Generative AI technologies. As an AWS-certified professional, he has delivered innovative solutions across insurance, automotive, life science and manufacturing industries and also won multiple AWS GenAI hackathons in India and London. His expertise in developing robust cloud architectures and GenAI solutions, combined with his contributions to the AWS technical community through co-authored blogs, showcases his technical leadership.

Sara van de Moosdijk, simply known as Moose, is an AI/ML Specialist Solution Architect at AWS. She helps AWS partners build and scale AI/ML solutions through technical enablement, support, and architectural guidance. Moose spends her free time figuring out how to fit more books in her overflowing bookcase.

Jerry Li, is a Senior Partner Solution Architect at AWS Australia, collaborating closely with HCLTech in APAC for over four years. He also works with HCLTech Data & AI Center of Excellence team, focusing on AWS data analytics and generative AI skills development, solution building, and go-to-market (GTM) strategy.


About HCLTech

HCLTech is at the vanguard of generative AI technology, using the robust AWS Generative AI tech stack. The company offers cutting-edge generative AI solutions that are poised to revolutionize the way businesses and individuals approach content creation, problem-solving, and decision-making. HCLTech has developed a suite of readily deployable generative AI assets and solutions, encompassing the domains of customer experience, software development life cycle (SDLC) integration, and industrial processes.

Mitigating risk: AWS backbone network traffic prediction using GraphStorm

The AWS global backbone network is the critical foundation enabling reliable and secure service delivery across AWS Regions. It connects our 34 launched Regions (with 108 Availability Zones), our more than 600 Amazon CloudFront POPs, and 41 Local Zones and 29 Wavelength Zones, providing high-performance, ultralow-latency connectivity for mission-critical services across 245 countries and territories.

This network requires continuous management through planning, maintenance, and real-time operations. Although most changes occur without incident, the dynamic nature and global scale of this system introduce the potential for unforeseen impacts on performance and availability. The complex interdependencies between network components make it challenging to predict the full scope and timing of these potential impacts, necessitating advanced risk assessment and mitigation strategies.

In this post, we show how you can use our enterprise graph machine learning (GML) framework GraphStorm to solve prediction challenges on large-scale complex networks inspired by our practices of exploring GML to mitigate the AWS backbone network congestion risk.

Problem statement

At its core, the problem we are addressing is how to safely manage and modify a complex, dynamic network while minimizing service disruptions (such as the risk of congestion, site isolation, or increased latency). Specifically, we need to predict how changes to one part of the AWS global backbone network might affect traffic patterns and performance across the entire system. In the case of congestive risk for example, we want to determine whether taking a link out of service is safe under varying demands. Key questions include:

  • Can the network handle customer traffic with remaining capacity?
  • How long before congestion appears?
  • Where will congestion likely occur?
  • How much traffic is at risk of being dropped?

This challenge of predicting and managing network disruptions is not unique to telecommunication networks. Similar problems arise in various complex networked systems across different industries. For instance, supply chain networks face comparable challenges when a key supplier or distribution center goes offline, necessitating rapid reconfiguration of logistics. In air traffic control systems, the closure of an airport or airspace can lead to complex rerouting scenarios affecting multiple flight paths. In these cases, the fundamental problem remains similar: how to predict and mitigate the ripple effects of localized changes in a complex, interconnected system where the relationships between components are not always straightforward or immediately apparent.

Today, teams at AWS operate a number of safety systems that maintain a high operational readiness bar, and work relentlessly on improving safety mechanisms and risk assessment processes. We conduct a rigorous planning process on a recurring basis to inform how we design and build our network, and maintain resiliency under various scenarios. We rely on simulations at multiple levels of detail to eliminate risks and inefficiencies from our designs. In addition, every change (no matter how small) is thoroughly tested before it is deployed into the network.

However, at the scale and complexity of the AWS backbone network, simulation-based approaches face challenges in real-time operational settings (such as expensive and time-consuming computational process), which impact the efficiency of network maintenance. To complement simulations, we are therefore investing in data-driven strategies that can scale to the size of the AWS backbone network without a proportional increase in computational time. In this post, we share our progress along this journey of model-assisted network operations.

Approach

In recent years, GML methods have achieved state-of-the-art performance in traffic-related tasks, such as routing, load balancing, and resource allocation. In particular, graph neural networks (GNNs) demonstrate an advantage over classical time series forecasting, due to their ability to capture structure information hidden in network topology and their capacity to generalize to unseen topologies when networks are dynamic.

In this post, we frame the physical network as a heterogeneous graph, where nodes represent entities in the networked system, and edges represent both demands between endpoints and actual traffic flowing through the network. We then apply GNN models to this heterogeneous graph for an edge regression task.

Unlike common GML edge regression that predicts a single value for an edge, we need to predict a time series of traffic on each edge. For this, we adopt the sliding-window prediction method. During training, we start from a time point T and use historical data in a time window of size W to predict the value at T+1. We then slide the window one step ahead to predict the value at T+2, and so on. During inference, we use predicted values rather than actual values to form the inputs in a time window as we slide the window forward, making the method an autoregressive sliding-window one. For a more detailed explanation of the principles behind this method, please refer to this link.
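The following is a minimal, framework-agnostic sketch of the autoregressive sliding-window method described above. A trivial averaging function stands in for the trained GNN model, which in practice maps a window of past graph features to the next traffic value.

import numpy as np

def autoregressive_forecast(history, model, window_size, horizon):
    # Predict `horizon` future values by repeatedly feeding the model its own predictions
    window = list(history[-window_size:])
    predictions = []
    for _ in range(horizon):
        next_value = model(np.array(window))    # predict T+1 from the last W values
        predictions.append(next_value)
        window = window[1:] + [next_value]      # slide the window forward, reusing the prediction
    return np.array(predictions)

# Toy usage: a plain averaging function stands in for the trained GNN model
traffic_history = np.sin(np.linspace(0, 10, 200))
forecast = autoregressive_forecast(traffic_history, model=np.mean, window_size=12, horizon=6)
print(forecast)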

We train GNN models with historical demand and traffic data, along with other features (network incidents and maintenance events) by following the sliding-window method. We then use the trained model to predict future traffic on all links of the backbone network using the autoregressive sliding-window method because in a real application, we can only use the predicted values for next-step predictions.

In the next section, we show the result of adapting this method to AWS backbone traffic forecasting, for improving operational safety.

Applying GNN-based traffic prediction to the AWS backbone network

For the backbone network traffic prediction application at AWS, we need to ingest a number of data sources into the GraphStorm framework. First, we need the network topology (the graph). In our case, this is composed of devices and physical interfaces that are logically grouped into individual sites. One site may contain dozens of devices and hundreds of interfaces. The edges of the graph represent the fiber connections between physical interfaces on the devices (these are the OSI layer 2 links). For each interface, we measure the outgoing traffic utilization in bps and as a percentage of the link capacity. Finally, we have a traffic matrix that holds the traffic demands between any two pairs of sites. This is obtained using flow telemetry.

The ultimate goal of our application is to improve safety on the network. For this purpose, we measure the performance of traffic prediction along three dimensions:

  • First, we look at the absolute percentage error between the actual and predicted traffic on each link. We want this error metric to be low to make sure that our model actually learned the routing pattern of the network under varying demands and a dynamic topology.
  • Second, we quantify the model’s propensity for under-predicting traffic. It is critical to limit this behavior as much as possible because predicting traffic below its actual value can lead to increased operational risk.
  • Third, we quantify the model’s propensity for over-predicting traffic. Although this is not as critical as the second metric, it’s nonetheless important to address over-predictions because they slow down maintenance operations.

We share some of our results for a test conducted on 85 backbone segments, over a 2-week period. Our traffic predictions are at a 5-minute time resolution. We trained our model on 2 weeks of data and ran the inference on a 6-hour time window. Using GraphStorm, training took less than 1 hour on an m8g.12xlarge instance for the entire network, and inference took under 2 seconds per segment, for the entire 6-hour window. In contrast, simulation-based traffic prediction requires dozens of instances for a similar network sample, and each simulation takes more than 100 seconds to go through the various scenarios.

In terms of the absolute percentage error, we find that our p90 (90th percentile) to be on the order of 13%. This means that 90% of the time, the model’s prediction is less than 13% away from the actual traffic. Because this is an absolute metric, the model’s prediction can be either above or below the network traffic. Compared to classical time series forecasting with XGBoost, our approach yields a 35% improvement.

Next, we consider all the time intervals in which the model under-predicted traffic. We find the p90 in this case to be below 5%. This means that, in 90% of the cases when the model under-predicts traffic, the deviation from the actual traffic is less than 5%.

Finally, we look at all the time intervals in which the model over-predicted traffic (again, this is to evaluate permissiveness for maintenance operations). We find the p90 in this case to be below 14%. This means that, in 90% of the cases when the model over-predicted traffic, the deviation from the actual traffic was less than 14%.

These measurements demonstrate how we can tune the performance of the model to value safety above the pace of routine operations.

Finally, in this section, we provide a visual representation of the model output around a maintenance operation. This operation consists of removing a segment of the network out of service for maintenance. As shown in the following figure, the model is able to predict the changing nature of traffic on two different segments: one where traffic increases sharply as a result of the operation (left) and the second referring to the segment that was taken out of service and where traffic drops to zero (right).

Backbone traffic around a maintenance operation: traffic increasing sharply on an adjacent segment (left) and traffic dropping to zero on the segment taken out of service (right)

An example for GNN-based traffic prediction with synthetic data

Unfortunately, we can’t share the details about the AWS backbone network including the data we used to train the model. To still provide you with some code that makes it straightforward to get started solving your network prediction problems, we share a synthetic traffic prediction problem instead. We have created a Jupyter notebook that generates synthetic airport traffic data. This dataset simulates a global air transportation network using major world airports, creating fictional airlines and flights with predefined capacities. The following figure illustrates these major airports and the simulated flight routes derived from our synthetic data.

world map with airlines

Our synthetic data includes: major world airports, simulated airlines and flights with predefined capacities for cargo demands, and generated air cargo demands between airport pairs, which will be delivered by simulated flights.

We employ a simple routing policy to distribute these demands evenly across all shortest paths between two airports. This policy is intentionally hidden from our model, mimicking the real-world scenarios where the exact routing mechanisms are not always known. If flight capacity is insufficient to meet incoming demands, we simulate the excess as inventory stored at the airport. The total inventory at each airport serves as our prediction target. Unlike real air transportation networks, we didn’t follow a hub-and-spoke topology. Instead, our synthetic network uses a point-to-point structure. Using this synthetic air transportation dataset, we now demonstrate a node time series regression task, predicting the total inventory at each airport every day. As illustrated in the following figure, the total inventory amount at an airport is influenced by its own local demands, the traffic passing through it, and the capacity that it can output. By design, the output capacity of an airport is limited to make sure that most airport-to-airport demands require multiple-hop fulfillment.

airport inventory explanation
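The following sketch shows the kind of routing policy hidden from the model, using NetworkX on a toy four-airport network: each demand is split evenly across all shortest paths, and any load above an edge’s capacity would be held back as inventory. The airports, capacities, and demand values are made up; the actual notebook implements a richer version of this logic.

import networkx as nx

# Toy network: nodes are airport codes, edges are flights with a daily cargo capacity
G = nx.Graph()
G.add_edge("SYD", "SIN", capacity=100)
G.add_edge("SIN", "FRA", capacity=100)
G.add_edge("SYD", "NRT", capacity=100)
G.add_edge("NRT", "FRA", capacity=100)

demands = {("SYD", "FRA"): 150}   # made-up cargo demand between an airport pair

traffic = {tuple(sorted(edge)): 0.0 for edge in G.edges}
for (src, dst), demand in demands.items():
    # Split the demand evenly across all shortest paths between the two airports
    paths = list(nx.all_shortest_paths(G, src, dst))
    share = demand / len(paths)
    for path in paths:
        for u, v in zip(path, path[1:]):
            traffic[tuple(sorted((u, v)))] += share

# Load above an edge's capacity would be held back as inventory at the upstream airport
for edge, load in traffic.items():
    print(edge, "load:", load, "capacity:", G.edges[edge]["capacity"])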

In the remainder of this section, we cover the data preprocessing steps necessary for using the GraphStorm framework, before customizing a GNN model for our application. Towards the end of the post, we also provide an architecture for an operational safety system built using GraphStorm and in an environment of AWS services.

Data preprocessing for graph time series forecasting

To use GraphStorm for node time series regression, we need to structure our synthetic air traffic dataset according to GraphStorm’s input data format requirements. This involves preparing three key components: a set of node tables, a set of edge tables, and a JSON file describing the dataset.

We abstract the synthetic air traffic network into a graph with one node type (airport) and two edge types. The first edge type, (airport, demand, airport), represents demand between any pair of airports. The second one, (airport, traffic, airport), captures the amount of traffic sent between connected airports.

The following diagram illustrates this graph structure.

Our airport nodes have two types of associated features: static features (longitude and latitude) and time series features (daily total inventory amount). For each edge, the src_code and dst_code capture the source and destination airport codes. The edge features also include a demand and a traffic time series. Finally, edges for connected airports also hold the capacity as a static feature.

The synthetic data generation notebook also creates a JSON file, which describes the air traffic data and provides instructions for GraphStorm’s graph construction tool to follow. Using these artifacts, we can employ the graph construction tool to convert the air traffic graph data into a distributed DGL graph. In this format:

  • Demand and traffic time series data is stored as E*T tensors in edges, where E is the number of edges of a given type, and T is the number of days in our dataset.
  • Inventory amount time series data is stored as an N*T tensor in nodes, where N is the number of airport nodes.

This preprocessing step makes sure our data is optimally structured for time series forecasting using GraphStorm.
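As a simple illustration of this layout, the following sketch pivots hypothetical long-format daily inventory records into the N x T matrix used for a per-node time series tensor. The column names and values are made up, and the notebook’s actual schema may differ.

import numpy as np
import pandas as pd

# Hypothetical long-format records: one row per airport per day
inventory = pd.DataFrame({
    "airport":   ["SYD", "SYD", "SIN", "SIN", "FRA", "FRA"],
    "day":       ["2024-01-01", "2024-01-02"] * 3,
    "inventory": [10.0, 12.5, 3.0, 4.5, 0.0, 1.0],
})

# Pivot into an N x T matrix (airports x days) for the per-node time series tensor
node_ts = (
    inventory.pivot(index="airport", columns="day", values="inventory")
    .sort_index()
    .to_numpy(dtype=np.float32)
)
print(node_ts.shape)   # (N airports, T days)

# Edge time series (demand or traffic) can be built the same way,
# indexed by the (source airport, destination airport) pair instead of a single node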

Model

To predict the next total inventory amount for each airport, we employ GNN models, which are well-suited for capturing these complex relationships. Specifically, we use GraphStorm’s Relational Graph Convolutional Network (RGCN) module as our GNN model. This allows us to effectively pass information (demands and traffic) among airports in our network. To support the sliding-window prediction method we described earlier, we created a customized RGCN model.

The detailed implementation of the node time series regression model can be found in the Python file. In the following sections, we explain a few key implementation points.

Customized RGCN model

The GraphStorm v0.4 release adds support for edge features. This means that we can use a for-loop to iterate along the T dimensions in the time series tensor, thereby implementing the sliding-window method in the forward() function during model training, as shown in the following pseudocode:

def forward(self, ......):
    ......
    # ---- Process Time Series Data Step by Step Using Sliding Windows ---- #
    step_losses = []
    for step in range(0, (self._ts_size - self._window_size)):
        # extract one step of time series features based on the time window arguments
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size, step)
        ......
        # extract one step of time series labels
        new_labels = get_ts_labels(labels, self._ts_size, self._window_size, step)
        ......
        # compute the loss for this window and collect it
        step_loss = self.model(ts_feats, new_labels)
        step_losses.append(step_loss)
    # sum all step losses and average them
    ts_loss = sum(step_losses) / len(step_losses)

The actual code of the forward() function can be found in the accompanying Python file.

In contrast, because the inference step needs to use the autoregressive sliding-window method, we implement a one-step prediction function in the predict() routine:

def predict(self, ....., use_ar=False, predict_step=-1):
    ......
    # ---- Use the Autoregressive Method in Inference ---- #
    # It is the inferrer's responsibility to provide the ``predict_step`` value.
    if use_ar:
        # extract one step of time series features based on the given predict_step
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size,
                                         predict_step)
        ......
        # compute the prediction only
        predi = self.model(ts_feats)
    else:
        # ------------- Same as the forward() method ------------- #
        ......

The actual code of the predict() function can be found in the accompanying Python file.

Customized node trainer

GraphStorm’s default node trainer (GSgnnNodePredictionTrainer), which handles the model training loop, can’t process the time series feature requirement. Therefore, we implement a customized node trainer by inheriting from GSgnnNodePredictionTrainer and using our own customized node_mini_batch_gnn_predict() method, which is included in the accompanying Python file.

Customized node_mini_batch_predict() method

The customized node_mini_batch_predict() method calls the customized model’s predict() method, passing the two additional arguments that are specific to our use case. These are used to determine whether the autoregressive property is used or not, along with the current prediction step for appropriate indexing (see the accompanying Python file).

Customized node predictor (inferrer)

Similar to the node trainer, GraphStorm’s default node inference class, which drives the inference pipeline (GSgnnNodePredictionInferrer), can’t handle the time series feature processing we need in this application. We therefore create a customized node inferrer by inheriting from GSgnnNodePredictionInferrer and adding two specific arguments. In this customized inferrer, we use a for-loop to iterate over the T dimension of the time series feature tensor. Unlike the for-loop we used in model training, the inference loop uses the predicted values in subsequent prediction steps (the implementation is in the accompanying Python file).

So far, we have focused on the node prediction example with our dataset and modeling. However, our approach allows for various other prediction tasks, such as:

  • Forecasting traffic between specific airport pairs.
  • More complex scenarios like predicting potential airport congestion or increased utilization of alternative routes when reducing or eliminating flights between certain airports.

With the customized model and pipeline classes, we can use the following Jupyter notebook to run the overall training and inference pipeline for our airport inventory amount prediction task. We encourage you to explore these possibilities, adapt the provided example to your specific use cases or research interests, and refer to our Jupyter notebooks for a comprehensive understanding of how to use GraphStorm APIs for various GML tasks.

System architecture for GNN-based network traffic prediction

In this section, we propose a system architecture for enhancing operational safety within a complex network, such as the ones we discussed earlier. Specifically, we employ GraphStorm within an AWS environment to build, train, and deploy graph models. The following diagram shows the various components we need to achieve the safety functionality.

system architecture

The complex system in question is represented by the network shown at the bottom of the diagram, overlaid on the map of the continental US. This network emits telemetry data that can be stored on Amazon Simple Storage Service (Amazon S3) in a dedicated bucket. The evolving topology of the network should also be extracted and stored.

On the top right of the preceding diagram, we show how Amazon Elastic Compute Cloud (Amazon EC2) instances can be configured with the necessary GraphStorm dependencies using direct access to the project’s GitHub repository. After they’re configured, we can build GraphStorm Docker images on them. These images then can be put on Amazon Elastic Container Registry (Amazon ECR) and be made available to other services (for example, Amazon SageMaker).

During training, SageMaker jobs use those instances along with the network data to train a traffic prediction model such as the one we demonstrated in this post. The trained model can then be stored on Amazon S3. It might be necessary to repeat this training process periodically, to make sure that the model’s performance keeps up with changes to the network dynamics (such as modifications to the routing schemes).

Above the network representation, we show two possible actors: operators and automation systems. These actors call on a network safety API implemented in AWS Lambda to make sure that the actions they intend to take are safe for the anticipated time horizon (for example, 1 hour, 6 hours, 24 hours). To provide an answer, the Lambda function uses the on-demand inference capabilities of SageMaker. During inference, SageMaker uses the pre-trained model to produce the necessary traffic predictions. These predictions can also be stored on Amazon S3 to continuously monitor the model’s performance over time, triggering training jobs when significant drift is detected.
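The following is a minimal sketch of what the network safety API’s Lambda handler could look like, invoking a SageMaker real-time endpoint that hosts the trained model. The endpoint name and the request and response payloads are hypothetical and depend on how the GraphStorm model is packaged for inference.

import json
import boto3

# Hypothetical name of the SageMaker endpoint hosting the trained traffic prediction model
ENDPOINT_NAME = "backbone-traffic-prediction"

sagemaker_runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Network safety API: given a proposed change and a time horizon, return the predicted traffic
    payload = {
        "segment_id": event["segment_id"],              # link the caller intends to take out of service
        "horizon_minutes": event.get("horizon_minutes", 360),
    }
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}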

Conclusion

Maintaining operational safety for the AWS backbone network, while supporting the dynamic needs of our global customer base, is a unique challenge. In this post, we demonstrated how the GML framework GraphStorm can be effectively applied to predict traffic patterns and potential congestion risks in such complex networks. By framing our network as a heterogeneous graph and using GNNs, we’ve shown that it’s possible to capture the intricate interdependencies and dynamic nature of network traffic. Our approach, tested on both synthetic data and the actual AWS backbone network, has demonstrated significant improvements over traditional time series forecasting methods, with a 35% reduction in prediction error compared to classical approaches like XGBoost.

The proposed system architecture, integrating GraphStorm with various AWS services like Amazon S3, Amazon EC2, SageMaker, and Lambda, provides a scalable and efficient framework for implementing this approach in production environments. This setup allows for continuous model training, rapid inference, and seamless integration with existing operational workflows.

We will keep you posted about our progress in taking our solution to production, and share the benefit for AWS customers.

We encourage you to explore the provided Jupyter notebooks, adapt our approach to your specific use cases, and contribute to the ongoing development of graph-based ML techniques for managing complex networked systems. To learn how to use GraphStorm to solve a broader class of ML problems on graphs, see the GitHub repo.


About the Authors

Jian Zhang is a Senior Applied Scientist who has been using machine learning techniques to help customers solve various problems, such as fraud detection, decoration image generation, and more. He has successfully developed graph-based machine learning, particularly graph neural network, solutions for customers in China, the US, and Singapore. As an enlightener of AWS graph capabilities, Zhang has given many public presentations about GraphStorm, the GNN, the Deep Graph Library (DGL), Amazon Neptune, and other AWS services.

Fabien Chraim is a Principal Research Scientist in AWS networking. Since 2017, he’s been researching all aspects of network automation, from telemetry and anomaly detection to root causing and actuation. Before Amazon, he co-founded and led research and development at Civil Maps (acquired by Luminar). He holds a PhD in electrical engineering and computer sciences from UC Berkeley.

Patrick Taylor is a Senior Data Scientist in AWS networking. Since 2020, he has focused on impact reduction and risk management in networking software systems and operations research in networking operations teams. Previously, Patrick was a data scientist specializing in natural language processing and AI-driven insights at Hyper Anna (acquired by Alteryx) and holds a Bachelor’s degree from the University of Sydney.

Xiang Song is a Senior Applied Scientist at AWS AI Research and Education (AIRE), where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in a graph database. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture at Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting science teams like the graph machine learning group, and ML Systems teams working on large scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist—a field in which he holds a PhD.

Read More

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.

Solution overview

Organizations processing or storing sensitive information such as personally identifiable information (PII) have asked AWS to extend its Global Infrastructure to specific localities, including mechanisms to make sure that data is stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud together with the low latency and local processing capabilities of an on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.

Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.

Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:

  • Latency – Running these computationally intensive models on an edge infrastructure can significantly reduce latency and improve real-time responsiveness, which is critical for many time-sensitive applications like virtual assistants, augmented reality, and autonomous systems.
  • Privacy and security – Processing sensitive data at the edge, rather than sending it to the cloud, can enhance privacy and security by minimizing data exposure. This is particularly useful in healthcare, financial services, and legal sectors.

In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.

Fully local RAG

For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. You can use a local vector database, either hosted on Amazon Elastic Compute Cloud (Amazon EC2) or using Amazon Relational Database Service (Amazon RDS) for PostgreSQL on the Outpost rack with the pgvector extension, to store embeddings. See the following figure for an example.

Local RAG Concept Diagram

Hybrid RAG

Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative purposes and remain compliant with security regulations. To orchestrate the behavior of such a distributed system, you need a system that can understand the nuances of your prompt and direct it to the right FM running in a compliant environment. Amazon Bedrock Agents makes this kind of distributed system possible in hybrid environments.

Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.

Hybrid RAG Concept Diagram

In the following sections, we dive deep into both solutions and their implementation.

Fully local RAG: Solution deep dive

To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need the Outpost Amazon Resource Name (ARN) on which you want to create the subnet, as well as the Availability Zone of the Outpost. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application.
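If you prefer to script this step, the following is a minimal sketch of creating the edge subnet with the AWS SDK for Python (Boto3). The VPC ID, CIDR block, Availability Zone, route table ID, and Outpost ARN are placeholders that you would replace with your own values.

import boto3

ec2 = boto3.client("ec2")

# Placeholder identifiers; substitute your own VPC, CIDR range, AZ, and Outpost ARN
subnet = ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",
    CidrBlock="10.0.3.0/24",
    AvailabilityZone="us-west-2a",
    OutpostArn="arn:aws:outposts:us-west-2:111122223333:outpost/op-0123456789abcdef0",
)

# Associate the new subnet with a route table that targets your internet gateway
ec2.associate_route_table(
    RouteTableId="rtb-0123456789abcdef0",
    SubnetId=subnet["Subnet"]["SubnetId"],
)

The EC2 instances you launch into this subnet host the following components: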

  • Vector store – To support RAG, deploy an open source vector database, such as ChromaDB or Faiss, on an EC2 instance (C5 family) on AWS Outposts. This vector database stores the vector representations of your documents, serving as a key component of your local knowledge base. Your selected embedding model converts text (both documents and queries) into these vector representations, enabling efficient storage and retrieval. The knowledge base itself consists of the original text documents and their corresponding vector representations stored in the vector database. To query this knowledge base and generate a response based on the retrieved results, you can use LangChain to chain the related documents retrieved by the vector search into the prompt fed to your LLM. This approach retrieves and integrates relevant information into the LLM’s generation process, enhancing its responses with local, domain-specific knowledge.
  • Chatbot application – On a second EC2 instance (C5 family), deploy two components: a backend service responsible for ingesting prompts and proxying the requests to the LLM running on the Outpost, and a simple React application that allows users to prompt a local generative AI chatbot with questions.
  • LLM or SLM – On a third EC2 instance (G4 family), deploy an LLM or SLM for edge inferencing using popular frameworks such as Ollama. Additionally, you can use ModelBuilder in the SageMaker Python SDK to deploy to a local endpoint, such as an EC2 instance running at the edge.

Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.
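To illustrate how the vector store and the local model fit together, the following is a minimal sketch that assumes ChromaDB as the vector database and Ollama serving the model; for simplicity it runs both on one host, whereas in the architecture above they run on separate instances, and the model name and document text are illustrative only.

import json
import urllib.request

import chromadb

# Build a small local knowledge base; ChromaDB uses its default embedding function here
client = chromadb.Client()
collection = client.create_collection("local-knowledge-base")
collection.add(
    ids=["doc-1"],
    documents=["Example internal document text that never leaves the Outpost ..."],
)

def ask_local_llm(question: str) -> str:
    # Retrieve the most relevant chunks from the local vector store
    results = collection.query(query_texts=[question], n_results=2)
    context = "\n".join(results["documents"][0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # Call the locally served model through the Ollama REST API (model name is illustrative)
    payload = json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

In the full solution, the backend service on the chatbot instance performs this retrieval and generation flow on behalf of the React frontend.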

The components intercommunicate through the traffic flow illustrated in the following figure.

Local RAG traffic flow diagram

The workflow consists of the following steps:

  1. Using the frontend application, the user uploads documents that will serve as the knowledge base and are stored in Amazon EBS on the Outpost rack. These documents are chunked by the application and are sent to the embedding model.
  2. The embedding model, which is hosted on the same EC2 instance as the local LLM API inference server, converts the text chunks into vector representations.
  3. The generated embeddings are sent to the vector database and stored, completing the knowledge base creation.
  4. Through the frontend application, the user prompts the chatbot interface with a question.
  5. The prompt is forwarded to the local LLM API inference server instance, where the prompt is tokenized and is converted into a vector representation using the local embedding model.
  6. The question’s vector representation is sent to the vector database where a similarity search is performed to get matching data sources from the knowledge base.
  7. After the local LLM has the query and the relevant context from the knowledge base, it processes the prompt, generates a response, and sends it back to the chatbot application.
  8. The chatbot application presents the LLM response to the user through its interface.

To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Hybrid RAG: Solution deep dive

To start, you need to configure a VPC with an edge subnet corresponding to either an Outpost rack or a Local Zone, depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as the fully local RAG: a vector store, a backend API server, an embedding model, and a local LLM.

In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.

In this example, the customer service bot is a shoe retailer assistant that provides support for purchasing shoes by offering options in a human-like conversation. We also assume that the knowledge base about the practice of shoemaking is proprietary and therefore resides at the edge. As a result, questions about shoemaking are addressed by the knowledge base and the local FM running at the edge.

To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be addressed by the AWS hybrid and edge locations.

As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates the logic for the next step it should take, including a prediction of the most prudent action in an action group. In this example, we want the prompt “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt “How is leather used for shoemaking?” should be directed to the /hybrid_rag action group.
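For reference, the following is a minimal Boto3 sketch of invoking the agent programmatically through the Amazon Bedrock Agents runtime API; the agent ID, alias ID, and session ID are placeholders.

import boto3

agents_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder identifiers for the deployed agent and alias
response = agents_runtime.invoke_agent(
    agentId="<your-agent-id>",
    agentAliasId="<your-agent-alias-id>",
    sessionId="demo-session-1",
    inputText="How is leather used for shoemaking?",
)

# The response is streamed; concatenate the chunks to get the final answer
answer = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        answer += chunk["bytes"].decode("utf-8")
print(answer)

For this prompt, the agent’s orchestration should select the /hybrid_rag action group and route the request to the edge-hosted FM.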

The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.

Hybrid RAG Reference Architecture

To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag, with a detailed description, structure, and parameters that define it as an API operation focused on a data domain that is only available in a specific edge location.

After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

cursor = None  # lazily initialized data handle, reused across warm invocations

# load_data, return_customer_info, place_shoe_order, return_shoe_inventory, and
# queryEdgeModel are supporting functions defined elsewhere in the handler module.
def lambda_handler(event, context):
    global cursor
    if cursor is None:
        cursor = load_data()
    id = ''
    response_code = 200
    api_path = event['apiPath']
    logger.info('API Path: %s', api_path)

    if api_path == '/customer/{CustomerName}':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "CustomerName":
                cName = parameter["value"]
        body = return_customer_info(cName)
    elif api_path == '/place_order':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "ShoeID":
                id = parameter["value"]
            if parameter["name"] == "CustomerID":
                cid = parameter["value"]
        body = place_shoe_order(id, cid)
    elif api_path == '/check_inventory':
        body = return_shoe_inventory()
    elif api_path == "/hybrid_rag":
        # Proxy the prompt to the self-hosted FM running at the edge
        prompt = event['parameters'][0]["value"]
        body = queryEdgeModel(prompt)
    else:
        response_code = 404
        body = "{} is not a valid api, try another one.".format(api_path)

    response_body = {
        'application/json': {
            'body': json.dumps(body)
        }
    }

    # Return the result in the response format expected by Amazon Bedrock Agents
    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': event['actionGroup'],
            'apiPath': api_path,
            'httpMethod': event['httpMethod'],
            'httpStatusCode': response_code,
            'responseBody': response_body
        }
    }

However, in the action group corresponding to the edge LLM (shown in the following code), the business logic won’t include Region-based FM invocations, such as Amazon Bedrock API calls. Instead, it invokes the customer-managed endpoint, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or on an Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.

def queryEdgeModel(prompt):
    import json, urllib.request

    # Compose the payload for the self-hosted model's API
    payload = {'text': prompt}
    data = json.dumps(payload).encode('utf-8')
    headers = {'Content-type': 'application/json'}

    # Send a POST request to the edge inference server (the private IP address of the
    # EC2 instance hosting the FM in the Local Zone or on the Outpost)
    req = urllib.request.Request(url="http://<your-private-ip-address>:5000/", data=data, headers=headers, method='POST')
    with urllib.request.urlopen(req) as response:
        response_text = response.read().decode('utf-8')
        return response_text
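The receiving side can be as simple as a small web service in front of the local model. The following hypothetical sketch shows such an endpoint using Flask, with Ollama as the example model server; the framework choice, port, and model name are assumptions rather than part of the reference implementation.

import json
import urllib.request

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def generate():
    # Accept the {'text': prompt} payload sent by the Lambda function above
    prompt = request.get_json()["text"]

    # Forward the prompt to the locally served model (Ollama shown as an example)
    payload = json.dumps({"model": "llama3", "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)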

After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, “How are the rubber heels of shoes made?” Even though most of the prompts are exclusively focused on retail customer service operations for ordering shoes, the native orchestration support in Amazon Bedrock Agents seamlessly directs the prompt to your edge FM running the LLM for shoemaking.

To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Conclusion

In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.

To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.


About the Authors

Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS edge computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking, and the edge cloud.

Aditya Lolla is a Sr. Hybrid Edge Specialist Solutions Architect at Amazon Web Services. He assists customers across the world with their migration and modernization journey from on-premises environments to the cloud and also builds hybrid architectures on AWS edge infrastructure. Aditya’s areas of interest include private networks, public and private cloud platforms, multi-access edge computing, hybrid and multicloud strategies, and computer vision applications.

Read More

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Large language model (LLM) based AI agents that have been specialized for specific tasks have demonstrated great problem-solving capabilities. By combining the reasoning power of multiple intelligent specialized agents, multi-agent collaboration has emerged as a powerful approach to tackle more intricate, multistep workflows.

The concept of multi-agent systems isn’t entirely new—it has its roots in distributed artificial intelligence research dating back to the 1980s. However, with recent advancements in LLMs, the capabilities of specialized agents have significantly expanded in areas such as reasoning, decision-making, understanding, and generation through language and other modalities. For instance, a single attraction research agent can perform web searches and list potential destinations based on user preferences. By creating a network of specialized agents, we can combine the strengths of multiple specialist agents to solve increasingly complex problems, such as creating and optimizing an entire travel plan by considering weather forecasts in nearby cities, traffic conditions, flight and hotel availability, restaurant reviews, attraction ratings, and more.

The research team at AWS has worked extensively on building and evaluating the multi-agent collaboration (MAC) framework so customers can orchestrate multiple AI agents on Amazon Bedrock Agents. In this post, we explore the concept of MAC and its benefits, as well as the key components of our MAC framework. We also go deeper into our evaluation methodology and present insights from our studies. More technical details can be found in our technical report.

Benefits of multi-agent systems

Multi-agent collaboration offers several key advantages over single-agent approaches, primarily stemming from distributed problem-solving and specialization.

Distributed problem-solving refers to the ability to break down complex tasks into smaller subtasks that can be handled by specialized agents. By breaking down tasks, each agent can focus on a specific aspect of the problem, leading to more efficient and effective problem-solving. For example, a travel planning problem can be decomposed into subtasks such as checking weather forecasts, finding available hotels, and selecting the best routes.

The distributed aspect also contributes to the extensibility and robustness of the system. As the scope of a problem increases, we can simply add more agents to extend the capability of the system rather than try to optimize a monolithic agent packed with instructions and tools. On robustness, the system can be more resilient to failures because multiple agents can compensate for and even potentially correct errors produced by a single agent.

Specialization allows each agent to focus on a specific area within the problem domain. For example, in a network of agents working on software development, a coordinator agent can manage overall planning, a programming agent can generate correct code and test cases, and a code review agent can provide constructive feedback on the generated code. Each agent can be designed and customized to excel at a specific task.

For developers building agents, this means the workload of designing and implementing an agentic system can be organically distributed, leading to faster development cycles and better quality. Within enterprises, often development teams have distributed expertise that is ideal for developing specialist agents. Such specialist agents can be further reused by other teams across the entire organization.

In contrast, developing a single agent to perform all subtasks would require the agent to plan the problem-solving strategy at a high level while also keeping track of low-level details. For example, in the case of travel planning, the agent would need to maintain a high-level plan for checking weather forecasts, searching for hotel rooms and attractions, while simultaneously reasoning about the correct usage of a set of hotel-searching APIs. This single-agent approach can easily lead to confusion for LLMs because long-context reasoning becomes challenging when different types of information are mixed. Later in this post, we provide evaluation data points to illustrate the benefits of multi-agent collaboration.

A hierarchical multi-agent collaboration framework

The MAC framework for Amazon Bedrock Agents starts from a hierarchical approach and expands to other mechanisms in the future. The framework consists of several key components designed to optimize performance and efficiency.

Here’s an explanation of each of the components of the multi-agent team:

  • Supervisor agent – This is an agent that coordinates a network of specialized agents. It’s responsible for organizing the overall workflow, breaking down tasks, and assigning subtasks to specialist agents. In our framework, a supervisor agent can assign and delegate tasks; however, the responsibility for solving the problem isn’t transferred.
  • Specialist agents – These are agents with specific expertise, designed to handle particular aspects of a given problem.
  • Inter-agent communication – Communication is the key component of multi-agent collaboration, allowing agents to exchange information and coordinate their actions. We use a standardized communication protocol that allows the supervisor agents to send and receive messages to and from the specialist agents.
  • Payload referencing – This mechanism enables efficient sharing of large content blocks (like code snippets or detailed travel itineraries) between agents, significantly reducing communication overhead. Instead of repeatedly transmitting large pieces of data, agents can reference previously shared payloads using unique identifiers. This feature is particularly valuable in domains such as software development.
  • Routing mode – For simpler tasks, this mode allows direct routing to specialist agents, bypassing the full orchestration process to improve efficiency for latency-sensitive applications.

The following figure shows inter-agent communication in an interactive application. The user first initiates a request to the supervisor agent. After coordinating with the subagents, the supervisor agent returns a response to the user.
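To make the hierarchy concrete, the following is a toy sketch of the supervisor pattern in plain Python. It is not the Amazon Bedrock Agents implementation; the specialist names, the hard-coded task breakdown, and the routing shortcut are all illustrative.

from typing import Callable, Dict

# Hypothetical specialist agents, each focused on one area of expertise
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "weather": lambda task: f"[weather specialist] forecast for {task}",
    "flights": lambda task: f"[flight specialist] flight options for {task}",
}

def supervisor(request: str) -> str:
    # Routing mode: simple requests go straight to one specialist, skipping full orchestration
    if request.lower().startswith("weather"):
        return SPECIALISTS["weather"](request)

    # Otherwise, break the request into subtasks and delegate each to a specialist;
    # the supervisor keeps ownership of composing the final answer for the user
    subtasks = {
        "weather": "Las Vegas on January 5, 2025",
        "flights": "Denver to Las Vegas on the morning of January 5, 2025",
    }
    results = [SPECIALISTS[name](task) for name, task in subtasks.items()]
    return "\n".join(results)

print(supervisor("Plan a trip to Las Vegas for tomorrow"))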

Evaluation of multi-agent collaboration: A comprehensive approach

Evaluating the effectiveness and efficiency of multi-agent systems presents unique challenges due to several complexities:

  1. Users can follow up and provide additional instructions to the supervisor agent.
  2. For many problems, there are multiple ways to resolve them.
  3. The success of a task often requires an agentic system to correctly perform multiple subtasks.

Conventional evaluation methods based on matching ground-truth actions or states often fall short in providing intuitive results and insights. To address this, we developed a comprehensive framework that calculates success rates based on automatic judgments of human-annotated assertions. We refer to this approach as “assertion-based benchmarking.” Here’s how it works:

  • Scenario creation – We create a diverse set of scenarios across different domains, each with specific goals that an agent must achieve to obtain success.
  • Assertions – For each scenario, we manually annotate a set of assertions that must be true for the task to be considered successful. These assertions cover both user-observable outcomes and system-level behaviors.
  • Agent and user simulation – We simulate the behavior of the agent in a sandbox environment, where the agent is asked to solve the problems described in the scenarios. Whenever user interaction is required, we use an independent LLM-based user simulator to provide feedback.
  • Automated evaluation – We use an LLM to automatically judge whether each assertion is true based on the conversation transcript.
  • Human evaluation – Instead of using LLMs, we ask humans to directly judge the success based on simulated trajectories.

Here is an example of a scenario and corresponding assertions for assertion-based benchmarking:

  • Goals:
    • User needs the weather conditions expected in Las Vegas for tomorrow, January 5, 2025.
    • User needs to search for a direct flight from Denver International Airport to McCarran International Airport, Las Vegas, departing tomorrow morning, January 5, 2025.
  • Assertions:
    • User is informed about the weather forecast for Las Vegas tomorrow, January 5, 2025.
    • User is informed about the available direct flight options for a trip from Denver International Airport to McCarran International Airport in Las Vegas for tomorrow, January 5, 2025.
    • get_tomorrow_weather_by_city is triggered to find information on the weather conditions expected in Las Vegas tomorrow, January 5, 2025.
    • search_flights is triggered to search for a direct flight from Denver International Airport to McCarran International Airport departing tomorrow, January 5, 2025.

For better user simulation, we also include additional contextual information as part of the scenario. A multi-agent collaboration trajectory is judged as successful only when all assertions are met.

Key metrics

Our evaluation framework focuses on evaluating a high-level success rate across multiple tasks to provide a holistic view of system performance:

Goal success rate (GSR) – This is our primary measure of success, indicating the percentage of scenarios where all assertions were evaluated as true. The overall GSR is aggregated into a single number for each problem domain.
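As a minimal illustration of how the GSR is computed from per-assertion judgments, consider the following sketch; the judge function is a stand-in for the LLM-based judgment described above, and the scenarios are toy examples.

def judge(assertion: str, transcript: str) -> bool:
    # Placeholder for the LLM judge: here we simply check for the assertion text
    return assertion.lower() in transcript.lower()

def goal_success_rate(scenarios) -> float:
    # A scenario succeeds only when every annotated assertion is judged true
    successes = sum(
        all(judge(a, s["transcript"]) for a in s["assertions"]) for s in scenarios
    )
    return successes / len(scenarios)

scenarios = [
    {"assertions": ["weather forecast", "direct flight"],
     "transcript": "Here is the weather forecast ... and the available direct flight options ..."},
    {"assertions": ["hotel availability"],
     "transcript": "I could not find that information."},
]
print(f"Overall GSR: {goal_success_rate(scenarios):.0%}")  # 50% in this toy example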

Evaluation results

The following table shows the evaluation results of multi-agent collaboration on Amazon Bedrock Agents across three enterprise domains (travel planning, mortgage financing, and software development):

Evaluation method      Dataset                 Overall GSR
Automatic evaluation   Travel planning         87%
Automatic evaluation   Mortgage financing      90%
Automatic evaluation   Software development    77%
Human evaluation       Travel planning         93%
Human evaluation       Mortgage financing      97%
Human evaluation       Software development    73%

All experiments are conducted in a setting where the supervisor agents are driven by Anthropic’s Claude 3.5 Sonnet models.

Comparing to single-agent systems

We also conducted an apples-to-apples comparison with the single-agent approach under equivalent settings. The MAC approach achieved a 90% success rate across all three domains. In contrast, the single-agent approach scored 60%, 80%, and 53% in the travel planning, mortgage financing, and software development datasets, respectively, which are significantly lower than the multi-agent approach. Upon analysis, we found that when presented with many tools, a single agent tended to hallucinate tool calls and failed to reject some out-of-scope requests. These results highlight the effectiveness of our multi-agent system in handling complex, real-world tasks across diverse domains.

To understand the reliability of the automatic judgments, we conducted a human evaluation on the same scenarios to investigate the correlation between the model and human judgments and found high correlation on end-to-end GSR.

Comparison with other frameworks

To understand how our MAC framework stacks up against existing solutions, we conducted a comparative analysis with a widely adopted open source framework (OSF) under equivalent conditions, with Anthropic’s Claude 3.5 Sonnet driving the supervisor agent and Anthropic’s Claude 3.0 Sonnet driving the specialist agents. The results are summarized in the following figure:

These results demonstrate a significant performance advantage for our MAC framework across all the tested domains.

Best practices for building multi-agent systems

The design of multi-agent teams can significantly impact the quality and efficiency of problem-solving across tasks. Among the many lessons we learned, we found it crucial to carefully design team hierarchies and agent roles.

Design multi-agent hierarchies based on performance targets

It’s important to design the hierarchy of a multi-agent team by considering the priorities of different targets in a use case, such as success rate, latency, and robustness. For example, if the use case involves building a latency-sensitive customer-facing application, it might not be ideal to include too many layers of agents in the hierarchy, because routing requests through multiple intermediate agents can add unnecessary delays. Similarly, to optimize latency, it’s better to avoid agents with overlapping functionalities, which can introduce inefficiencies and slow down decision-making.

Define agent roles clearly

Each agent must have a well-defined area of expertise. On Amazon Bedrock Agents, this can be achieved through collaborator instructions when configuring multi-agent collaboration. These instructions should be written in a clear and concise manner to minimize ambiguity. Moreover, there should be no confusion in the collaborator instructions across multiple agents, because this can lead to inefficiencies and errors in communication.

The following is a clear, detailed instruction:

Trigger this agent for 1) searching for hotels in a given location, 2) checking availability of one or multiple hotels, 3) checking amenities of hotels, 4) asking for price quote of one or multiple hotels, and 5) answering questions of check-in/check-out time and cancellation policy of specific hotels.

The following instruction is too brief, making it unclear and ambiguous:

Trigger this agent for helping with accommodation.

The second, unclear, example can lead to confusion and lower collaboration efficiency when multiple specialist agents are involved. Because the instruction doesn’t explicitly define the capabilities of the hotel specialist agent, the supervisor agent may overcommunicate, even when the user query is out of scope.

Conclusion

Multi-agent systems represent a powerful paradigm for tackling complex real-world problems. By using the collective capabilities of multiple specialized agents, we demonstrate that these systems can achieve impressive results across a wide range of domains, outperforming single-agent approaches.

Multi-agent collaboration provides a framework for developers to combine the reasoning power of numerous AI agents powered by LLMs. As we continue to push the boundaries of what is possible, we can expect even more innovative and complex applications, such as networks of agents working together to create software or generate financial analysis reports. On the research front, it’s important to explore how different collaboration patterns, including cooperative and competitive interactions, will emerge and be applied to real-world scenarios.


About the Authors

Raphael Shu is a Senior Applied Scientist at Amazon Bedrock. He received his PhD from the University of Tokyo in 2020, earning a Dean’s Award. His research primarily focuses on Natural Language Generation, Conversational AI, and AI Agents, with publications in conferences such as ICLR, ACL, EMNLP, and AAAI. His work on the attention mechanism and latent variable models received an Outstanding Paper Award at ACL 2017 and the Best Paper Award for JNLP in 2018 and 2019. At AWS, he led the Dialog2API project, which enables large language models to interact with the external environment through dialogue. In 2023, he led a team aiming to develop the agentic capability for Amazon Titan. Since 2024, Raphael has worked on multi-agent collaboration with LLM-based agents.

Nilaksh Das is an Applied Scientist at AWS, where he works with the Bedrock Agents team to develop scalable, interactive and modular AI systems. His contributions at AWS have spanned multiple initiatives, including the development of foundational models for semantic speech understanding, integration of function calling capabilities for conversational LLMs and the implementation of communication protocols for multi-agent collaboration. Nilaksh completed his PhD in AI Security at Georgia Tech in 2022, where he was also conferred the Outstanding Dissertation Award.

Michelle Yuan is an Applied Scientist on Amazon Bedrock Agents. Her work focuses on scaling customer needs through generative and agentic AI services. She has industry experience, multiple first-author publications in top ML/NLP conferences, and a strong foundation in mathematics and algorithms. She obtained her PhD in Computer Science at the University of Maryland before joining Amazon in 2022.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6.5 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Dr. Yi Zhang is a Principal Applied Scientist at AWS Bedrock. With 25 years of combined industrial and academic research experience, Yi’s research focuses on syntactic and semantic understanding of natural language in dialogues and their application in the development of conversational and interactive systems with speech and text/chat. He has been technically leading the development of modeling solutions behind AWS services such as Amazon Bedrock Agents, Amazon Lex, and AWS HealthScribe.

Read More

How BQA streamlines education quality reporting using Amazon Bedrock

How BQA streamlines education quality reporting using Amazon Bedrock

Given the value of data today, organizations across various industries are working with vast amounts of data across multiple formats. Manually reviewing and processing this information can be a challenging and time-consuming task, with a margin for potential errors. This is where intelligent document processing (IDP), coupled with the power of generative AI, emerges as a game-changing solution.

Enhancing the capabilities of IDP is the integration of generative AI, which harnesses large language models (LLMs) and generative techniques to understand and generate human-like text. This integration allows organizations to not only extract data from documents, but to also interpret, summarize, and generate insights from the extracted information, enabling more intelligent and automated document processing workflows.

The Education and Training Quality Authority (BQA) plays a critical role in improving the quality of education and training services in the Kingdom of Bahrain. BQA reviews the performance of all education and training institutions, including schools, universities, and vocational institutes, thereby promoting the professional advancement of the nation’s human capital.

BQA oversees a comprehensive quality assurance process, which includes setting performance standards and conducting objective reviews of education and training institutions. The process involves the collection and analysis of extensive documentation, including self-evaluation reports (SERs), supporting evidence, and various media formats from the institutions being reviewed.

The collaboration between BQA and AWS was facilitated through the Cloud Innovation Center (CIC) program, a joint initiative by AWS, Tamkeen, and leading universities in Bahrain, including Bahrain Polytechnic and University of Bahrain. The CIC program aims to foster innovation within the public sector by providing a collaborative environment where government entities can work closely with AWS consultants and university students to develop cutting-edge solutions using the latest cloud technologies.

As part of the CIC program, BQA has built a proof of concept solution, harnessing the power of AWS services and generative AI capabilities. The primary purpose of this proof of concept was to test and validate the proposed technologies, demonstrating their viability and potential for streamlining BQA’s reporting and data management processes.

In this post, we explore how BQA used the power of Amazon Bedrock, Amazon SageMaker JumpStart, and other AWS services to streamline the overall reporting workflow.

The challenge: Streamlining self-assessment reporting

BQA has traditionally provided education and training institutions with a template for the SER as part of the review process. Institutions are required to submit a review portfolio containing the completed SER and supporting material as evidence, which sometimes did not adhere fully to the established reporting standards.

The existing process had some challenges:

  • Inaccurate or incomplete submissions – Institutions might provide incomplete or inaccurate information in the submitted reports and supporting evidence, leading to gaps in the data required for a comprehensive review.
  • Missing or insufficient supporting evidence – The supporting material provided as evidence by institutions frequently did not substantiate the claims made in their reports, which challenged the evaluation process.
  • Time-consuming and resource-intensive – The process required significant time and resources to review submissions manually and to follow up with institutions for additional information needed to rectify the submissions, which slowed down the overall review process.

These challenges highlighted the need for a more streamlined and efficient approach to the submission and review process.

Solution overview

The proposed solution uses Amazon Bedrock and the Amazon Titan Text Express model to enable IDP functionalities. The architecture seamlessly integrates multiple AWS services with Amazon Bedrock, allowing for efficient data extraction and comparison.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.

The following diagram illustrates the solution architecture.

solution architecture diagram

The solution consists of the following steps:

  1. Relevant documents are uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket.
  2. An event notification is sent to an Amazon Simple Queue Service (Amazon SQS) queue to align each file for further processing. Amazon SQS serves as a buffer, enabling the different components to send and receive messages in a reliable manner without being directly coupled, enhancing scalability and fault tolerance of the system.
  3. The text extraction AWS Lambda function is invoked by the SQS queue, processing each queued file and using Amazon Textract to extract text from the documents (a minimal sketch of this function follows the list).
  4. The extracted text data is placed into another SQS queue for the next processing step.
  5. The text summarization Lambda function is invoked by this new queue containing the extracted text. This function sends a request to SageMaker JumpStart, where a Meta Llama text generation model is deployed to summarize the content based on the provided prompt.
  6. In parallel, the InvokeSageMaker Lambda function is invoked to perform comparisons and assessments. It compares the extracted text against the BQA standards that the model was trained on, evaluating the text for compliance, quality, and other relevant metrics.
  7. The summarized data and assessment results are stored in an Amazon DynamoDB table.
  8. Upon request, the InvokeBedrock Lambda function invokes Amazon Bedrock to generate generative AI summaries and comments. The function constructs a detailed prompt designed to guide the Amazon Titan Text Express model in evaluating the university’s submission.
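To make step 3 concrete, the following is a minimal sketch of the text extraction function. It assumes the SQS message body carries the raw S3 event notification; the downstream queue URL is a placeholder, and multi-page PDFs would use the asynchronous StartDocumentTextDetection API instead of the synchronous call shown here.

import json
import boto3

textract = boto3.client("textract")
sqs = boto3.client("sqs")

def lambda_handler(event, context):
    for record in event["Records"]:
        # Assumes the SQS message body contains the S3 event notification
        s3_event = json.loads(record["body"])
        s3_object = s3_event["Records"][0]["s3"]
        bucket = s3_object["bucket"]["name"]
        key = s3_object["object"]["key"]

        # Extract the document text with Amazon Textract
        result = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        text = "\n".join(
            block["Text"] for block in result["Blocks"] if block["BlockType"] == "LINE"
        )

        # Queue the extracted text for the summarization step (queue URL is a placeholder)
        sqs.send_message(
            QueueUrl="<extracted-text-queue-url>",
            MessageBody=json.dumps({"document": key, "text": text}),
        )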

Prompt engineering using Amazon Bedrock

To take advantage of the power of Amazon Bedrock and make sure the generated output adhered to the desired structure and formatting requirements, a carefully crafted prompt was developed according to the following guidelines:

  • Evidence submission – Present the evidence submitted by the institution under the relevant indicator, providing the model with the necessary context for evaluation
  • Evaluation criteria – Outline the specific criteria the evidence should be assessed against
  • Evaluation instructions – Instruct the model as follows:
    • Indicate N/A if the evidence is irrelevant to the indicator
    • Evaluate the university’s self-assessment based on the criteria
    • Assign a score from 1–5 for each comment, citing evidence directly from the content
  • Response format – Specify the response as bullet points, focusing on relevant analysis and evidence, with a word limit of 100 words

To use this prompt template, you can create a custom Lambda function with your project. The function should handle the retrieval of the required data, such as the indicator name, the university’s submitted evidence, and the rubric criteria. Within the function, include the prompt template and dynamically populate the placeholders (${indicatorName}, ${JSON.stringify(allContent)}, and ${JSON.stringify(c.comment)}) with the retrieved data.

The Amazon Titan Text Express model will then generate the evaluation response based on the provided prompt instructions, adhering to the specified format and guidelines. You can process and analyze the model’s response within your function, extracting the compliance score, relevant analysis, and evidence.

The following is an example prompt template:

for (const c of comments) {
        const prompt = `
        Below is the evidence submitted by the university under the indicator "${indicatorName}":
        ${JSON.stringify(allContent)}

        Analyze and evaluate the university's evidence based on the provided rubric criteria:
        ${JSON.stringify(c.comment)}

        - If the evidence does not relate to the indicator, indicate that it is not applicable (N/A) without any additional commentary.

        Choose one of the compliance scores below based on the evidence submitted:
        1. Non-compliant: The comment does not meet the criteria or standards.
        2. Compliant with recommendation: The comment meets the criteria but includes a suggestion or recommendation for improvement.
        3. Compliant: The comment meets the criteria or standards.

        AT THE END OF THE RESPONSE THERE SHOULD BE A SCORE: [SCORE: COMPLIANT OR NON-COMPLIANT OR COMPLIANT WITH RECOMMENDATION]
        Write your response in concise bullet points, focusing strictly on relevant analysis and evidence.
        **LIMIT YOUR RESPONSE TO 100 WORDS ONLY.**
        `;

        logger.info(`Prompt for comment ${c.commentId}: ${prompt}`);

        const body = JSON.stringify({
          inputText: prompt,
          textGenerationConfig: {
            maxTokenCount: 4096,
            stopSequences: [],
            temperature: 0,
            topP: 0.1,
          },
        });

        // The body is then sent to the Amazon Titan Text Express model through the
        // Amazon Bedrock Runtime InvokeModel API (the invocation call is omitted in this excerpt).
}
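For reference, the following is a minimal Python (Boto3) sketch of the Amazon Bedrock invocation that would be made with the constructed body; the prompt placeholder stands in for the template output above.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

body = json.dumps({
    "inputText": "<prompt built from the template above>",
    "textGenerationConfig": {
        "maxTokenCount": 4096,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.1,
    },
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    contentType="application/json",
    accept="application/json",
    body=body,
)

# The Titan Text response contains a list of results with the generated text
result = json.loads(response["body"].read())
evaluation = result["results"][0]["outputText"]
print(evaluation)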

The following screenshot shows an example of the Amazon Bedrock generated response.

Amazon Bedrock generated response

Results

The implementation of Amazon Bedrock provided institutions with transformative benefits. By automating and streamlining the collection and analysis of extensive documentation, including SERs, supporting evidence, and various media formats, institutions can achieve greater accuracy and consistency in their reporting processes and better readiness for the review process. This not only reduces the time and cost associated with manual data processing, but also improves compliance with quality expectations, thereby enhancing the credibility and quality of the institutions.

For BQA, the implementation helped achieve one of its strategic objectives focused on streamlining reporting processes, delivering significant improvements across a range of critical metrics and substantially enhancing the overall efficiency and effectiveness of its operations.

Key success metrics anticipated include:

  • Faster turnaround times for generating 70% accurate and standards-compliant self-evaluation reports, leading to improved overall efficiency.
  • Reduced risk of errors or non-compliance in the reporting process, enforcing adherence to established guidelines.
  • Ability to summarize lengthy submissions into concise bullet points, allowing BQA reviewers to quickly analyze and comprehend the most pertinent information, reducing evidence analysis time by 30%.
  • More accurate compliance feedback functionality, empowering reviewers to effectively evaluate submissions against established standards and guidelines, while achieving 30% reduced operational costs through process optimizations.
  • Enhanced transparency and communication through seamless interactions, enabling users to request additional documents or clarifications with ease.
  • Real-time feedback, allowing institutions to make necessary adjustments promptly. This is particularly useful to maintain submission accuracy and completeness.
  • Enhanced decision-making by providing insights on the data. This helps universities identify areas for improvement and make data-driven decisions to enhance their processes and operations.

The following screenshot shows an example of generating new evaluations using Amazon Bedrock.

generating new evaluations using Amazon Bedrock

Conclusion

This post outlined the implementation of Amazon Bedrock at the Education and Training Quality Authority (BQA), demonstrating the transformative potential of generative AI in revolutionizing the quality assurance processes in the education and training sectors. For those interested in exploring the technical details further, the full code for this implementation is available in the following GitHub repo. If you are interested in conducting a similar proof of concept with us, submit your challenge idea to the Bahrain Polytechnic or University of Bahrain CIC website.


About the Author

Maram AlSaegh is a Cloud Infrastructure Architect at Amazon Web Services (AWS), where she supports AWS customers in accelerating their journey to the cloud. Currently, she is focused on developing innovative solutions that use generative AI and machine learning (ML) for public sector entities.

Read More

Boosting team innovation, productivity, and knowledge sharing with Amazon Q Business – Web experience

Boosting team innovation, productivity, and knowledge sharing with Amazon Q Business – Web experience

Amazon Q Business can increase productivity across diverse teams, including developers, architects, site reliability engineers (SREs), and product managers. Amazon Q Business as a web experience makes AWS best practices readily accessible, providing cloud-centered recommendations quickly and making it straightforward to access AWS service functions, limits, and implementations. These elements are brought together in a web integration that serves various job roles and personas exactly when they need it.

As enterprises continue to grow their applications, environments, and infrastructure, it has become difficult to keep pace with technology trends, best practices, and programming standards. Enterprises provide their developers, engineers, and architects with a range of knowledge bases and documents, such as usage guides, wikis, and tools. But these resources tend to become siloed over time and inaccessible across teams, resulting in reduced knowledge, duplication of work, and reduced productivity.

MuleSoft from Salesforce provides the Anypoint platform, which gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating incredible digital experiences, all on a single, user-friendly platform.

This post shows how MuleSoft introduced a generative AI-powered assistant using Amazon Q Business to enhance their internal Cloud Central dashboard. This individualized portal shows assets owned, costs and usage, and well-architected recommendations to over 100 engineers. For more on MuleSoft’s journey to cloud computing, refer to Why a Cloud Operating Model?

Developers, engineers, FinOps, and architects can get the right answer at the right time when they’re ready to troubleshoot, address an issue, have an inquiry, or want to understand AWS best practices and cloud-centered deployments.

This post covers how to integrate Amazon Q Business into your enterprise setup.

Solution overview

The Amazon Q Business web experience provides seamless access to information, step-by-step instructions, troubleshooting, and prescriptive guidance so teams can deploy well-architected applications or cloud-centered infrastructure. Team members can chat directly or upload documents and receive summarization, analysis, or answers to calculations. Amazon Q Business uses supported connectors such as Confluence, Amazon Relational Database Service (Amazon RDS), and web crawlers. The following diagram shows the reference architecture for various personas, including developers, support engineers, DevOps, and FinOps, to connect with internal databases and the web using Amazon Q Business.

Reference Architecture

In this reference architecture, you can see how various user personas, spanning teams and business units, use the Amazon Q Business web experience as an access point for information, step-by-step instructions, troubleshooting, or prescriptive guidance for deploying a well-architected application or cloud-centered infrastructure. The web experience allows team members to chat directly with an AI assistant or upload documents and receive summarization, analysis, or answers to calculations.

Use cases for Amazon Q Business

Small, medium, and large enterprises, depending on their mode of operation, type of business, and level of investment in IT, will have varying approaches and policies on providing access to information. Amazon Q Business is part of the AWS suite of generative AI services and provides a web-based utility to set up, manage, and interact with Amazon Q. It can answer questions, provide summaries, generate content, and complete tasks using the data and expertise found in your enterprise systems. You can connect internal and external datasets without compromising security to seamlessly incorporate your specific standard operating procedures, guidelines, playbooks, and reference links. With Amazon Q, MuleSoft’s engineering teams were able to address their AWS-specific inquiries (such as support ticket escalation, operational guidance, and AWS Well-Architected best practices) at scale.

The Amazon Q Business web experience allows business users across various job titles and functions to interact with Amazon Q through the web browser. With the web experience, teams can access the same information and receive similar recommendations based on their prompt or inquiry, level of experience, and knowledge, ranging from beginner to advanced.

The following demos are examples of what the Amazon Q Business web experience looks like. Amazon Q Business securely connects to over 40 commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3). Point Amazon Q Business at your enterprise data, and it will search your data, summarize it logically, analyze trends, and engage in dialogue with end users about the data. This helps users access their data no matter where it resides in their organization.

Amazon Q Business excels at prompt-and-response prescriptive guidance. Using Amazon Elastic Block Store (Amazon EBS) volume optimization as an example, it provided detailed steps for migrating from gp2 to gp3 volumes, a well-known use case raised by several MuleSoft teams.

Through the web experience, you can effortlessly perform document uploads and prompts for summary, calculation, or recommendations based on your document. You have the flexibility to upload .pdf, .xls, .xlsx, or .csv files directly into the chat interface. You can also assume a persona such as FinOps or DevOps and get personalized recommendations or responses.

MuleSoft engineers used the Amazon Q Business web summarization feature to better understand Split Cost Allocation Data (SCAD) for Amazon Elastic Kubernetes Service (Amazon EKS). They uploaded the SCAD PDF documents to Amazon Q and got straightforward summaries. This helped them understand their customer’s use of MuleSoft Anypoint platform running on Amazon EKS.

In another demo, Amazon Q analyzed IPv4 costs from an uploaded Excel file, calculating expenses for Elastic IP addresses and outbound data transfers to support a proposed network estimate.

Amazon Q Business also demonstrated its ability to provide tailored advice for a specific user scenario: a user took on the role of a FinOps professional and asked Amazon Q to recommend AWS tools for cost optimization, and Amazon Q offered personalized suggestions from that FinOps persona’s perspective.

Prerequisites

To get started with your Amazon Q Business web experience, you need the following prerequisites:

Create an Amazon Q Business web experience

Complete the following steps to create your web experience:

The web experience can be used by a variety of business users or personas to yield accurate and repeatable recommendations for level 100, 200, and 300 inquiries. Amazon Q supports a variety of data sources and data connectors to personalize your user experience. You can also further enrich your dataset with knowledge bases within Amazon Q. With Amazon Q Business set up with your own datasets and sources, teams and business units within your enterprise can index from the same information on common topics such as cost optimization, modernization, and operational excellence while maintaining their own unique area of expertise, responsibility, and job function.

Clean Up

After trying the Amazon Q Business web experience, remember to remove any resources you created to avoid unnecessary charges. Complete the following steps:

  1. Delete the web experience:
    • On the Amazon Q Business console, navigate to the Web experiences section within your application.
    • Select the web experience you want to remove.
    • On the Actions menu, choose Delete.
    • Confirm the deletion by following the prompts.
  2. If you granted specific users access to the web experience, revoke their permissions. This might involve updating AWS Identity and Access Management (IAM) policies or removing users from specific groups in IAM Identity Center.
  3. If you set up any custom configurations for the web experience, such as specific data source filters or custom prompts, make sure to remove these.
  4. If you integrated the web experience with other tools or services, remove those integrations.
  5. Check for and delete any Amazon CloudWatch alarms or logs specifically set up for monitoring this web experience.

After deletion, review your AWS billing to make sure that charges related to the web experience have stopped.

Deleting a web experience is irreversible. Make sure you have any necessary backups or exports of important data before proceeding with the deletion. Also, keep in mind that deleting a web experience doesn’t automatically delete the entire Amazon Q Business application or its associated data sources. If you want to remove everything, follow the Amazon Q Business application clean-up procedure for the entire application.
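If you prefer to script the deletion of the web experience itself, the following is a minimal sketch using Boto3; the application ID and web experience ID are placeholders, and the remaining cleanup steps above still apply.

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder identifiers for your Amazon Q Business application and web experience
qbusiness.delete_web_experience(
    applicationId="<your-application-id>",
    webExperienceId="<your-web-experience-id>",
)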

Conclusion

The Amazon Q Business web experience is your gateway to a powerful generative AI assistant. Want to take it further? Integrate Amazon Q with Slack for an even more interactive experience.

Every organization has unique needs when it comes to AI. That’s where Amazon Q shines. It adapts to your business needs, user applications, and end-user personas. The best part? You don’t need to do the heavy lifting. No complex infrastructure setup. No need for teams of data scientists. Amazon Q connects to your data and makes sense of it with just a click. It’s AI power made simple, giving you the intelligence you need without the hassle.

To learn more about the power of a generative AI assistant in your workplace, see Amazon Q Business.


About the Authors

Rueben Jimenez is an AWS Sr Solutions Architect who designs and implements complex data analytics, machine learning, generative AI, and cloud infrastructure solutions.

Sona Rajamani is a Sr. Manager Solutions Architect at AWS.  She lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In her spare time, she enjoys traveling and hiking.

Erick Joaquin is a Sr Customer Solutions Manager for Strategic Accounts at AWS. As a member of the account team, he is focused on evolving his customers’ maturity in the cloud to achieve operational efficiency at scale.

Read More

Build an Amazon Bedrock based digital lending solution on AWS

Build an Amazon Bedrock based digital lending solution on AWS

Digital lending is a critical business enabler for banks and financial institutions. Customers apply for a loan online after completing the know your customer (KYC) process. A typical digital lending process involves various activities, such as user onboarding (including steps to verify the user through KYC), credit verification, risk verification, credit underwriting, and loan sanctioning. Currently, some of these activities are done manually, leading to delays in loan sanctioning and impacting the customer experience.

In India, the KYC verification usually involves identity verification through identification documents for Indian citizens, such as a PAN card or Aadhar card, address verification, and income verification. Credit checks in India are normally done using the PAN number of a customer. The ideal way to address these challenges is to automate them to the extent possible.

The digital lending solution primarily needs orchestration of a sequence of steps and other features such as natural language understanding, image analysis, real-time credit checks, and notifications. You can seamlessly build automation around these features using Amazon Bedrock Agents. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock Agents, you can orchestrate multi-step processes and integrate with enterprise data using natural language instructions.

In this post, we propose DigitalDhan, a generative AI-based solution to automate customer onboarding and digital lending. The proposed solution uses Amazon Bedrock Agents to automate services related to KYC verification, credit and risk assessment, and notification. Financial institutions can use this solution to help automate the customer onboarding, KYC verification, credit decisioning, credit underwriting, and notification processes. This post demonstrates how you can gain a competitive advantage by automating a complex business process with Amazon Bedrock Agents.

Why generative AI is best suited for assistants that support customer journeys

Traditional AI assistants that use rules-based navigation or natural language processing (NLP) based guidance fall short when handling the nuances of complex human conversations. For instance, in a real-world customer conversation, the customer might provide inadequate information (for example, missing documents), ask random or unrelated questions that aren’t part of the predefined flow (for example, asking for loan pre-payment options while verifying identity documents), or phrase inputs in varied natural language (such as representing twenty thousand as “20K,” “20000,” or “20,000”). Additionally, rules-based assistants don’t provide additional reasoning and explanations (such as why a loan was denied). Some of the rigid, linear flow rules either force customers to start the process over again or require human assistance to complete the conversation.

Generative AI assistants excel at handling these challenges. With well-crafted instructions and prompts, a generative AI-based assistant can ask for missing details, converse in human-like language, and handle errors gracefully while explaining the reasoning for their actions when required. You can add guardrails to make sure that these assistants don’t deviate from the main topic and provide flexible navigation options that account for real-world complexities. Context-aware assistants also enhance customer engagement by flexibly responding to the various off-the-flow customer queries.

Solution overview

DigitalDhan, the proposed digital lending solution, is powered by Amazon Bedrock Agents. It fully automates the customer onboarding, KYC verification, and credit underwriting process. The DigitalDhan service provides the following features:

  • Customers can understand the step-by-step loan process and the documents required through the solution
  • Customers can upload KYC documents such as PAN and Aadhar, which DigitalDhan verifies through automated workflows
  • DigitalDhan fully automates the credit underwriting and loan application process
  • DigitalDhan notifies the customer about the loan application through email

We have modeled the digital lending process close to a real-world scenario. The high-level steps of the DigitalDhan solution are shown in the following figure.

Digital Lending Process

The key business process steps are:

  1. The loan applicant initiates the loan application flow by accessing the DigitalDhan solution.
  2. The loan applicant begins the loan application journey. Sample prompts for the loan application include:
    1. “What is the process to apply for loan?”
    2. “I would like to apply for loan.”
    3. “My name is Adarsh Kumar. PAN is ABCD1234 and email is john_doe@example.org. I need a loan for 150000.”
    4. The applicant uploads their PAN card.
    5. The applicant uploads their Aadhar card.
  3. The DigitalDhan solution processes each of the natural language prompts. As part of the document verification process, the solution extracts key details from the uploaded PAN and Aadhar cards, such as name, address, and date of birth. The solution then identifies whether the user is an existing customer using the PAN.
    1. If the user is an existing customer, the solution gets the internal risk score for the customer.
    2. If the user is a new customer, the solution gets the credit score based on the PAN details.
  4. The solution uses the internal risk score for an existing customer to check for credit worthiness.
  5. The solution uses the external credit score for a new customer to check for credit worthiness.
  6. The credit underwriting process involves credit decisioning based on the credit score and risk score, and calculates the final loan amount for the approved customer.
  7. The loan application details along with the decision are sent to the customer through email.

Technical solution architecture

The solution primarily uses Amazon Bedrock Agents (to orchestrate the multi-step process), Amazon Textract (to extract data from the PAN and Aadhar cards), and Amazon Comprehend (to identify the entities from the PAN and Aadhar card). The solution architecture is shown in the following figure.

Technical Solution Architecture for Digital Dhan Solution

The key solution components of the DigitalDhan solution architecture are:

  1. A user begins the onboarding process with the DigitalDhan application. They provide various documents (including PAN and Aadhar) and a loan amount as part of the KYC process.
  2. After the documents are uploaded, they’re automatically processed using various artificial intelligence and machine learning (AI/ML) services, as sketched after this list.
  3. Amazon Textract is used to extract text information from the uploaded documents.
  4. Amazon Comprehend is used to identify entities such as PAN and Aadhar.
  5. The credit underwriting flow is powered by Amazon Bedrock Agents.
    1. The knowledge base contains loan-related documents to respond to loan-related queries.
    2. The loan handler AWS Lambda function uses the information in the KYC documents to check the credit score and internal risk score. After the credit checks are complete, the function calculates the loan eligibility and processes the loan application.
    3. The notification Lambda function emails information about the loan application to the customer.
  6. The Lambda function can be integrated with external credit APIs.
  7. Amazon Simple Email Service (Amazon SES) is used to notify customers of the status of their loan application.
  8. The events are logged using Amazon CloudWatch.

Amazon Bedrock Agents deep dive

Because we used Amazon Bedrock Agents heavily in the DigitalDhan solution, let’s look at the overall functioning of Amazon Bedrock Agents. The flow of the various components of Amazon Bedrock Agents is shown in the following figure.

Amazon Bedrock Agents Flow

The Amazon Bedrock agents break each task into subtasks, determine the right sequence, and perform actions and knowledge searches. The detailed steps are:

  1. Processing the loan application is the primary task performed by the Amazon Bedrock agents in the DigitalDhan solution.
  2. The Amazon Bedrock agents use the user prompts, conversation history, knowledge base, instructions, and action groups to orchestrate the sequence of steps related to loan processing. The Amazon Bedrock agent takes natural language prompts as inputs. The following are the instructions given to the agent:
You are DigitalDhan, an advanced AI lending assistant designed to provide personal loan-related information and create loan applications. Always ask for relevant information and avoid making assumptions. If you're unsure about something, clearly state "I don't have that information."

Always greet the user by saying the following: "Hi there! I am DigitalDhan bot. I can help you with loans over this chat. To apply for a loan, kindly provide your full name, PAN Number, email, and the loan amount."

When a user expresses interest in applying for a loan, follow these steps in order, always ask the user for necessary details:

1. Determine user status: Identify if they're an existing or new customer.

2. User greeting (mandatory, do not skip): After determining user status, welcome the user using the following format:

  Existing customer: Hi {customerName}, I see you are an existing customer. Please upload your PAN for KYC.

  New customer: Hi {customerName}, I see you are a new customer. Please upload your PAN and Aadhar for KYC.

3. Call Pan Verification step using the uploaded PAN document

4. Call Aadhaar Verification step using the uploaded Aadhaar document. Request the user to upload their Aadhaar card document for verification.

5. Loan application: Collect all necessary details to create the loan application.

6. If the loan is approved (email will be sent with details):

   For existing customers: If the loan officer approves the application, inform the user that their loan application has been approved using following format: Congratulations {customerName}, your loan is sanctioned. Based on your PAN {pan}, your risk score is {riskScore} and your overall credit score is {cibilScore}. I have created your loan and the application ID is {loanId}. The details have been sent to your email.

   For new customers: If the loan officer approves the application, inform the user that their loan application has been approved using following format: Congratulations {customerName}, your loan is sanctioned. Based on your PAN {pan} and {aadhar}, your risk score is {riskScore} and your overall credit score is {cibilScore}. I have created your loan and the application ID is {loanId}. The details have been sent to your email.

7. If the loan is rejected (no emails sent):

   For new customers: If the loan officer rejects the application, inform the user that their loan application has been rejected using following format: Hello {customerName}, Based on your PAN {pan} and aadhar {aadhar}, your overall credit score is {cibilScore}. Because of the low credit score, unfortunately your loan application cannot be processed.

   For existing customers: If the loan officer rejects the application, inform the user that their loan application has been rejected using following format: Hello {customerName}, Based on your PAN {pan}, your overall credit score is {creditScore}. Because of the low credit score, unfortunately your loan application cannot be processed.

Remember to maintain a friendly, professional tone and prioritize the user's needs and concerns throughout the interaction. Be short and direct in your responses and avoid making assumptions unless specifically requested by the user.

Be short and prompt in responses, do not answer queries beyond the lending domain and respond saying you are a lending assistant
  3. We configured the agent preprocessing and orchestration instructions to validate and perform the steps in a predefined sequence. The few-shot examples specified in the agent instructions improve the accuracy of the agent’s performance. Based on the instructions and the API descriptions, the Amazon Bedrock agent creates a logical sequence of steps to complete an action. In the DigitalDhan example, the instructions are specified so that the Amazon Bedrock agent creates the following sequence:
    1. Greet the customer.
    2. Collect the customer’s name, email, PAN, and loan amount.
    3. Ask for the PAN card and Aadhar card to read and verify the PAN and Aadhar number.
    4. Categorize the customer as an existing or new customer based on the verified PAN.
    5. For an existing customer, calculate the customer internal risk score.
    6. For a new customer, get the external credit score.
    7. Use the internal risk score (for existing customers) or credit score (for external customers) for credit underwriting. If the internal risk score is less than 300 or if the credit score is more than 700, sanction the loan amount.
    8. Email the credit decision to the customer’s email address.
  4. Action groups define the APIs for performing actions such as creating the loan, checking the user, and fetching the risk score. We described each of the APIs in the OpenAPI schema, which the agent uses to select the most appropriate API to perform the action. A Lambda function is associated with the action group (a minimal handler sketch follows this list). The following code is an example of the create_loan API. The Amazon Bedrock agent uses the description for the create_loan API while performing the action. The API schema also specifies customerName, address, loanAmt, pan, and riskScore as required elements for the API. Therefore, the corresponding APIs read the PAN number for the customer (verify_pan_card API), calculate the risk score for the customer (fetch_risk_score API), and identify the customer’s name and address (verify_aadhar_card API) before calling the create_loan API.
"/create_loan":
  post:
    summary: Create New Loan application
    description: Create new loan application for the customer. This API must be
      called for each new loan application request after calculating riskscore and
      creditScore
    operationId: createLoan
    requestBody:
      required: true
      content:
        application/json:
          schema:
            type: object
            properties:
              customerName:
                type: string
                description: Customer’s Name for creating the loan application
                minLength: 3
              loanAmt:
                type: string
                description: Preferred loan amount for the loan application
                minLength: 5
              pan:
                type: string
                description: Customer's PAN number for the loan application
                minLength: 10
              riskScore:
                type: string
                description: Risk Score of the customer
                minLength: 2
              creditScore:
                type: string
                description: Credit score of the customer
                minLength: 3
              address:
                type: string
                description: Customer's address for the loan application
            required:
            - customerName
            - address
            - loanAmt
            - pan
            - riskScore
            - creditScore
    responses:
      '200':
        description: Success
        content:
          application/json:
            schema:
              type: object
              properties:
                loanId:
                  type: string
                  description: Identifier for the created loan application
                status:
                  type: string
                  description: Status of the loan application creation process
  5. Amazon Bedrock Knowledge Bases provides a cloud-based Retrieval Augmented Generation (RAG) experience to the customer. We added documents related to loan processing, general information, and the loan information guide to the knowledge base, and we specified instructions for when to use it. Therefore, at the beginning of a customer journey, when the customer is in the exploration stage, they get responses with how-to instructions and general loan-related information. For instance, if the customer asks “What is the process to apply for a loan?” the Amazon Bedrock agent fetches the relevant step-by-step details from the knowledge base.
  6. After the required steps are complete, the Amazon Bedrock agent curates the final response to the customer.
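The GitHub repository contains the complete loan handler; purely as an illustration of the action group pattern, the following is a minimal sketch of a Lambda handler for the create_loan API, applying the decisioning rule from the instructions (internal risk score below 300 or credit score above 700). The event and response shapes follow the general Amazon Bedrock Agents contract for OpenAPI-based action groups and may differ in detail from the actual implementation.

import json
import uuid


def lambda_handler(event, context):
    """Minimal action group handler sketch for the create_loan API."""
    # Parameters arrive in the request body defined by the OpenAPI schema
    properties = (
        event.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("properties", [])
    )
    params = {item["name"]: item["value"] for item in properties}

    risk_score = int(params.get("riskScore", "999"))
    credit_score = int(params.get("creditScore", "0"))

    # Credit decisioning rule from the agent instructions:
    # internal risk score < 300 (existing customer) or credit score > 700 (new customer)
    approved = risk_score < 300 or credit_score > 700
    body = {
        "loanId": str(uuid.uuid4()) if approved else "",
        "status": "APPROVED" if approved else "REJECTED",
    }

    # Response wrapper expected by Amazon Bedrock Agents for OpenAPI action groups
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {"application/json": {"body": json.dumps(body)}},
        },
    }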

Let’s explore an example flow for an existing customer. For this example, we have depicted various actions performed by Amazon Bedrock Agents for an existing customer. First, the customer begins the loan journey by asking exploratory questions. We have depicted one such question—“What is the process to apply for a loan?”—in the following figure. Amazon Bedrock responds to such questions by providing a step-by-step guide fetched from the configured knowledge base.

Conversation with Digital Lending Solution

The customer proceeds to the next step and tries to apply for a loan. The DigitalDhan solution asks for the user details such as the customer name, email address, PAN number, and desired loan amount. After the customer provides those details, the solution asks for the actual PAN card to verify the details, as shown in the following figure.

Identity Verification with Digital Lending Solution

When the PAN verification and the risk score checks are complete, the DigitalDhan solution creates a loan application and notifies the customer of the decision through the email, as shown in the following figure.

Notification in Digital Lending Solution

Prerequisites

This project is built using the AWS Cloud Development Kit (AWS CDK).

For reference, the following versions of Node.js and the AWS CDK are used:

  • Node.js: v20.16.0
  • AWS CDK: 2.143.0
  • The command to install a specific version of the AWS CDK is npm install -g aws-cdk@<X.YY.Z>

Deploy the solution

Complete the following steps to deploy the solution. For more details, refer to the GitHub repo.

  1. Clone the repository:
    git clone https://github.com/aws-samples/DigitalDhan-GenAI-FSI-LendingSolution-India.git

  2. Enter the code sample backend directory:
    cd DigitalDhan-GenAI-FSI-LendingSolution-India/

  3. Install packages:
    npm install
    npm install -g aws-cdk

  4. Bootstrap AWS CDK resources on the AWS account. If deployed in any AWS Region other than us-east-1, the stack might fail because of a Lambda layer dependency. You can either comment out the layer and deploy in another Region or deploy in us-east-1.
    cdk bootstrap aws://<ACCOUNT_ID>/<REGION>

  5. You must explicitly enable access to models before they can be used with the Amazon Bedrock service. Follow the steps in Access Amazon Bedrock foundation models to enable access to the models (Anthropic::Claude (Sonnet) and Cohere::Embed English).
  6. Deploy the sample in your account. The following command deploys one stack in your account: cdk deploy --all
    To protect against unintended changes that might affect your security posture, the AWS CDK prompts you to approve security-related changes before deploying them. You will need to answer yes to fully deploy the stack.

The AWS Identity and Access Management (IAM) role creation in this example is for illustration only. Always provision IAM roles with the least required privileges. The stack deployment takes approximately 10–15 minutes. After the stack is successfully deployed, you can find InsureAssistApiAlbDnsName in the output section of the stack—this is the application endpoint.

Enable user input

After deployment is complete, enable user input so the agent can prompt the customer to provide additional information if necessary. A programmatic alternative is sketched after the following steps.

  1. Open the Amazon Bedrock console in the deployed Region and edit the agent.
  2. Modify the additional settings to enable User Input to allow the agent to prompt for additional information from the user when it doesn’t have enough information to respond to a prompt.
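As a programmatic alternative to the console steps above, user input can be enabled by adding the reserved AMAZON.UserInput action group to the draft agent and then preparing the agent again. The following is a minimal sketch using the boto3 bedrock-agent client; the agent ID is a placeholder for the agent created by the stack.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholder value; use the agent ID created by the stack
AGENT_ID = "your-bedrock-agent-id"

# Enable user input by adding the reserved AMAZON.UserInput action group to the draft agent
bedrock_agent.create_agent_action_group(
    agentId=AGENT_ID,
    agentVersion="DRAFT",
    actionGroupName="UserInputAction",
    parentActionGroupSignature="AMAZON.UserInput",
    actionGroupState="ENABLED",
)

# Prepare the agent so the change takes effect
bedrock_agent.prepare_agent(agentId=AGENT_ID)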

Test the solution

We covered three test scenarios in the solution. The sample data and prompts for the three scenarios can be found in the GitHub repo.

  • Scenario 1 is an existing customer who will be approved for the requested loan amount
  • Scenario 2 is a new customer who will be approved for the requested loan amount
  • Scenario 3 is a new customer whose loan application will be denied because of a low credit score

Clean up

To avoid future charges, delete the sample data stored in Amazon Simple Storage Service (Amazon S3) and the stack (a scripted alternative for the bucket cleanup follows these steps):

  1. Remove all data from the S3 bucket.
  2. Delete the S3 bucket.
  3. Use the following command to destroy the stack: cdk destroy
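If you prefer to script steps 1 and 2, the following is a minimal sketch using boto3; the bucket name is a placeholder for the bucket created by the stack. Run cdk destroy afterward as described in step 3.

import boto3

# Placeholder value; replace with the name of the S3 bucket created by the stack
BUCKET_NAME = "your-digitaldhan-sample-data-bucket"

bucket = boto3.resource("s3").Bucket(BUCKET_NAME)

# Step 1: remove all objects (and object versions, if versioning is enabled) from the bucket
bucket.object_versions.delete()
bucket.objects.all().delete()

# Step 2: delete the now-empty bucket
bucket.delete()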

Summary

The proposed digital lending solution discussed in this post onboards a customer by verifying the KYC documents (including the PAN and Aadhar cards) and categorizes the customer as an existing customer or a new customer. For an existing customer, the solution uses an internal risk score, and for a new customer, the solution uses the external credit score.

The solution uses Amazon Bedrock Agents to orchestrate the digital lending processing steps. The documents are processed using Amazon Textract and Amazon Comprehend, after which Amazon Bedrock Agents processes the workflow steps. The customer identification, credit checks, and customer notification are implemented using Lambda.

The solution demonstrates how you can automate a complex business process with the help of Amazon Bedrock Agents and enhance customer engagement through a natural language interface and flexible navigation options.

Test some Amazon Bedrock banking use cases, such as building customer service bots, email classification, and sales assistants, by using the powerful FMs and Amazon Bedrock Knowledge Bases that provide a managed RAG experience. Explore using Amazon Bedrock Agents to help orchestrate and automate complex banking processes such as customer onboarding, document verification, digital lending, loan origination, and customer servicing.


About the Authors

Shailesh Shivakumar is a FSI Sr. Solutions Architect with AWS India. He works with financial enterprises such as banks, NBFCs, and trading enterprises to help them design secure cloud services and engages with them to accelerate their cloud journey. He builds demos and proofs of concept to demonstrate the possibilities of AWS Cloud. He leads other initiatives such as customer enablement workshops, AWS demos, cost optimization, and solution assessments to make sure that AWS customers succeed in their cloud journey. Shailesh is part of Machine Learning TFC at AWS, handling the generative AI and machine learning-focused customer scenarios. Security, serverless, containers, and machine learning in the cloud are his key areas of interest.

Reena Manivel is an AWS FSI Solutions Architect. She specializes in analytics and works with customers in lending and banking businesses to create secure, scalable, and efficient solutions on AWS. Besides her technical pursuits, she is also a writer and enjoys spending time with her family.

Read More

Build AI-powered malware analysis using Amazon Bedrock with Deep Instinct

Build AI-powered malware analysis using Amazon Bedrock with Deep Instinct

This post is co-written with Yaniv Avolov, Tal Furman and Maor Ashkenazi from Deep Instinct.

Deep Instinct is a cybersecurity company that offers a state-of-the-art, comprehensive zero-day data security solution—Data Security X (DSX), for safeguarding your data repositories across the cloud, applications, network attached storage (NAS), and endpoints. DSX provides unmatched prevention and explainability by using a powerful combination of deep learning-based DSX Brain and generative AI DSX Companion to protect systems from known and unknown malware and ransomware in real-time.

Using deep neural networks (DNNs), Deep Instinct analyzes threats with unmatched accuracy, adapting to identify new and unknown risks that traditional methods might miss. This approach significantly reduces false positives and enables unparalleled threat detection rates, making it popular among large enterprises and critical infrastructure sectors such as finance, healthcare, and government.

In this post, we explore how Deep Instinct’s generative AI-powered malware analysis tool, DIANNA, uses Amazon Bedrock to revolutionize cybersecurity by providing rapid, in-depth analysis of known and unknown threats, enhancing the capabilities of security operations center (SOC) teams and addressing key challenges in the evolving threat landscape.

Main challenges for SecOps

There are two main challenges for SecOps:

  • The growing threat landscape – With a rapidly evolving threat landscape, SOC teams are becoming overwhelmed with a continuous increase of security alerts that require investigation. This situation hampers proactive threat hunting and exacerbates team burnout. Most importantly, the surge in alert storms increases the risk of missing critical alerts. A solution is needed that provides the explainability necessary to allow SOC teams to perform quick risk assessments regarding the nature of incidents and make informed decisions.
  • The challenges of malware analysis – Malware analysis has become an increasingly critical and complex field. The challenge of zero-day attacks lies in the limited information about why a file was blocked and classified as malicious. Threat analysts often spend considerable time assessing whether it was a genuine exploit or a false positive.

Let’s explore some of the key challenges that make malware analysis demanding:

  • Identifying malware – Modern malware has become incredibly sophisticated in its ability to disguise itself. It often mimics legitimate software, making it challenging for analysts to distinguish between benign and malicious code. Some malware can even disable security tools or evade scanners, further obfuscating detection.
  • Preventing zero-day threats – The rise of zero-day threats, which have no known signatures, adds another layer of difficulty. Identifying unknown malware is crucial, because failure can lead to severe security breaches and potentially incapacitate organizations.
  • Information overload – The powerful malware analysis tools currently available can be both beneficial and detrimental. Although they offer high explainability, they can also produce an overwhelming amount of data, forcing analysts to sift through a digital haystack to find indicators of malicious activity, increasing the possibility of analysts overlooking critical compromises.
  • Connecting the dots – Malware often consists of multiple components interacting in complex ways. Not only do analysts need to identify the individual components, but they also need to understand how they interact. This process is like assembling a jigsaw puzzle to form a complete picture of the malware’s capabilities and intentions, with pieces constantly changing shape.
  • Keeping up with cybercriminals – The world of cybercrime is fluid, with bad actors relentlessly developing new techniques and exploiting newly emerging vulnerabilities, leaving organizations struggling to keep up. The time window between the discovery of a vulnerability and its exploitation in the wild is narrowing, putting pressure on analysts to work faster and more efficiently. This rapid evolution means that malware analysts must constantly update their skill set and tools to stay one step ahead of the cybercriminals.
  • Racing against the clock – In malware analysis, time is of the essence. Malicious software can spread rapidly across networks, causing significant damage in a matter of minutes, often before the organization realizes an exploit has occurred. Analysts face the pressure of conducting thorough examinations while also providing timely insights to prevent or mitigate exploits.

DIANNA, the DSX Companion

There is a critical need for malware analysis tools that can provide precise, real-time, in-depth malware analysis for both known and unknown threats, supporting SecOps efforts. Deep Instinct, recognizing this need, has developed DIANNA (Deep Instinct’s Artificial Neural Network Assistant), the DSX Companion. DIANNA is a groundbreaking malware analysis tool powered by generative AI to tackle real-world issues, using Amazon Bedrock as its large language model (LLM) infrastructure. It offers on-demand features that provide flexible and scalable AI capabilities tailored to the unique needs of each client. Amazon Bedrock is a fully managed service that grants access to high-performance foundation models (FMs) from top AI companies through a unified API. By concentrating our generative AI models on specific artifacts, we can deliver comprehensive yet focused responses to address this gap effectively.
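DIANNA’s prompts, models, and translation engines are proprietary, so the following is not its implementation; it is only a generic sketch of the kind of unified API call Amazon Bedrock offers, sending a hypothetical natural language description of file behavior to a foundation model through the boto3 Converse API. The model ID shown is just one example of a model available in Amazon Bedrock.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Hypothetical output of a translation engine: file behavior rendered as natural language
file_behavior = (
    "The executable writes a copy of itself to the startup folder, "
    "disables Windows Defender via registry edits, and opens an outbound "
    "connection to an unregistered domain."
)

# Ask a foundation model for an analyst-style assessment through the unified Converse API
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": "You are a malware analyst. Explain whether the following "
                    f"behavior is likely malicious and why:\n\n{file_behavior}"
                }
            ],
        }
    ],
)

print(response["output"]["message"]["content"][0]["text"])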

DIANNA is a sophisticated malware analysis tool that acts as a virtual team of malware analysts and incident response experts. It enables organizations to shift strategically toward zero-day data security by integrating with Deep Instinct’s deep learning capabilities for a more intuitive and effective defense against threats.

DIANNA’s unique approach

Current cybersecurity solutions use generative AI to summarize data from existing sources, but this approach is limited to retrospective analysis with limited context. DIANNA enhances this by integrating the collective expertise of numerous cybersecurity professionals within the LLM, enabling in-depth malware analysis of unknown files and accurate identification of malicious intent.

DIANNA’s unique approach to malware analysis sets it apart from other cybersecurity solutions. Unlike traditional methods that rely solely on retrospective analysis of existing data, DIANNA harnesses generative AI to empower itself with the collective knowledge of countless cybersecurity experts, sources, blog posts, papers, threat intelligence reputation engines, and chats. This extensive knowledge base is effectively embedded within the LLM, allowing DIANNA to delve deep into unknown files and uncover intricate connections that would otherwise go undetected.

At the heart of this process are DIANNA’s advanced translation engines, which transform complex binary code into natural language that LLMs can understand and analyze. This unique approach bridges the gap between raw code and human-readable insights, enabling DIANNA to provide clear, contextual explanations of a file’s intent, malicious aspects, and potential system impact. By translating the intricacies of code into accessible language, DIANNA addresses the challenge of information overload, distilling vast amounts of data into concise, actionable intelligence.

This translation capability is key for linking between different components of complex malware. It allows DIANNA to identify relationships and interactions between various parts of the code, offering a holistic view of the threat landscape. By piecing together these components, DIANNA can construct a comprehensive picture of the malware’s capabilities and intentions, even when faced with sophisticated threats. DIANNA doesn’t stop at simple code analysis—it goes deeper. It provides insights into why unknown events are malicious, streamlining what is often a lengthy process. This level of understanding allows SOC teams to focus on the threats that matter most.

Solution overview

DIANNA’s integration with Amazon Bedrock allows us to harness the power of state-of-the-art language models while maintaining agility to adapt to evolving client requirements and security considerations. DIANNA benefits from the robust features of Amazon Bedrock, including seamless scaling, enterprise-grade security, and the ability to fine-tune models for specific use cases.

The integration offers the following benefits:

  • Accelerated development with Amazon Bedrock – The fast-paced evolution of the threat landscape necessitates equally responsive cybersecurity solutions. DIANNA’s collaboration with Amazon Bedrock has played a crucial role in optimizing our development process and speeding up the delivery of innovative capabilities. The service’s versatility has enabled us to experiment with different FMs, exploring their strengths and weaknesses in various tasks. This experimentation has led to significant advancements in DIANNA’s ability to understand and explain complex malware behaviors. We have also benefited from the following features:
    • Fine-tuning – Alongside its core functionalities, Amazon Bedrock provides a range of ready-to-use features for customizing the solution. One such feature is model fine-tuning, which allows you to train FMs on proprietary data to enhance your performance in specific domains. For example, organizations can fine-tune an LLM-based malware analysis tool to recognize industry-specific jargon or detect threats associated with particular vulnerabilities.
    • Retrieval Augmented Generation – Another valuable feature is the use of Retrieval Augmented Generation (RAG), enabling access to and the incorporation of relevant information from external sources, such as knowledge bases or threat intelligence feeds. This enhances the model’s ability to provide contextually accurate and informative responses, improving the overall effectiveness of malware analysis.
  • A landscape for innovation and comparison – Amazon Bedrock has also served as a valuable landscape for conducting LLM-related research and comparisons.
  • Seamless integration, scalability, and customization – Integrating Amazon Bedrock into DIANNA’s architecture was a straightforward process. The user-friendly and well-documented Amazon Bedrock API facilitated seamless integration with our existing infrastructure. Furthermore, the service’s on-demand nature allows us to scale our AI capabilities up or down based on customer demand. This flexibility makes sure that DIANNA can handle fluctuating workloads without compromising performance.
  • Prioritizing data security and compliance – Data security and compliance are paramount in the cybersecurity domain. Amazon Bedrock offers enterprise-grade security features that provide us with the confidence to handle sensitive customer data. The service’s adherence to industry-leading security standards, coupled with the extensive experience of AWS in data protection, makes sure DIANNA meets the highest regulatory requirements such as GDPR. By using Amazon Bedrock, we can offer our customers a solution that not only protects their assets, but also demonstrates our commitment to data privacy and security.

By combining Deep Instinct’s proprietary prevention algorithms with the advanced language processing capabilities of Amazon Bedrock, DIANNA offers a unique solution that not only identifies and analyzes threats with high accuracy, but also communicates its findings in clear, actionable language. This synergy between Deep Instinct’s expertise in cybersecurity and the leading AI infrastructure of Amazon positions DIANNA at the forefront of AI-driven malware analysis and threat prevention.

The following diagram illustrates DIANNA’s architecture.

DIANNA’s architecture

Evaluating DIANNA’s malware analysis

In our task, the input is a malware sample, and the output is a comprehensive, in-depth report on the behaviors and intents of the file. However, generating ground truth data is particularly challenging. The behaviors and intents of malicious files aren’t readily available in standard datasets and require expert malware analysts for accurate reporting. Therefore, we needed a custom evaluation approach.

We focused our evaluation on two core dimensions:

  • Technical features – This dimension focuses on objective, measurable capabilities. We used programmable metrics to assess how well DIANNA handled key technical aspects, such as extracting indicators of compromise (IOCs), detecting critical keywords, and processing the length and structure of threat reports (a minimal example of one such metric follows this list). These metrics allowed us to quantitatively assess the model’s basic analysis capabilities.
  • In-depth semantics – Because DIANNA is expected to generate complex, human-readable reports on malware behavior, we relied on domain experts (malware analysts) to assess the quality of the analysis. The reports were evaluated based on the following:
    • Depth of information – Whether DIANNA provided a detailed understanding of the malware’s behavior and techniques.
    • Accuracy – How well the analysis aligned with the true behaviors of the malware.
    • Clarity and structure – Evaluating the organization of the report, making sure the output was clear and comprehensible for security teams.
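As an illustration of the technical dimension (not Deep Instinct’s actual metrics), the following sketch shows the kind of programmable check that can be applied to a generated report: counting common IOC types with regular expressions, measuring coverage of critical keywords, and recording report length.

import re

# Simple regexes for common IOC types (illustrative, not exhaustive)
IOC_PATTERNS = {
    "ipv4": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
    "domain": r"\b[a-z0-9-]+\.(?:com|net|org|io|ru|cn)\b",
    "sha256": r"\b[a-fA-F0-9]{64}\b",
}

CRITICAL_KEYWORDS = ["persistence", "injection", "exfiltration", "ransomware", "obfuscation"]


def score_report(report: str) -> dict:
    """Return simple technical metrics for a generated malware analysis report."""
    ioc_counts = {
        name: len(re.findall(pattern, report, flags=re.IGNORECASE))
        for name, pattern in IOC_PATTERNS.items()
    }
    keyword_coverage = sum(1 for kw in CRITICAL_KEYWORDS if kw in report.lower()) / len(
        CRITICAL_KEYWORDS
    )
    return {
        "ioc_counts": ioc_counts,
        "keyword_coverage": keyword_coverage,
        "report_length_words": len(report.split()),
    }


print(score_report("The sample achieves persistence and exfiltration via 203.0.113.7 ..."))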

Because human evaluation is labor-intensive, fine-tuning the key components (the model itself, the prompts, and the translation engines) involved iterative feedback loops. Small adjustments in a component led to significant variations in the output, requiring repeated validations by human experts. The meticulous nature of this process, combined with the continuous need for scaling, has subsequently led to the development of the auto-evaluation capability.

Fine-tuning process and human validation

The fine-tuning and validation process consisted of the following steps:

  • Gathering a malware dataset – To cover the breadth of malware techniques, families, and threat types, we collected a large dataset of malware samples, each with technical metadata.
  • Splitting the dataset – The data was split into subsets for training, validation, and evaluation. Validation data was continually used to test how well DIANNA adapted after each key component update.
  • Human expert evaluation – Each time we fine-tuned DIANNA’s model, prompts, and translation mechanisms, human malware analysts reviewed a portion of the validation data. This made sure improvements or degradations in the quality of the reports were identified early. Because DIANNA’s outputs are highly sensitive to even minor changes, each update required a full reevaluation by human experts to verify whether the response quality was improved or degraded.
  • Final evaluation on a broader dataset – After sufficient tuning based on the validation data, we applied DIANNA to a large evaluation set. Here, we gathered comprehensive statistics on its performance to confirm improvements in report quality, correctness, and overall technical coverage.

Automation of evaluation

To make this process more scalable and efficient, we introduced an automatic evaluation phase. We trained a language model specifically designed to critique DIANNA’s outputs, providing a level of automation in assessing how well DIANNA was generating reports. This critique model acted as an internal judge, allowing for continuous, rapid feedback on incremental changes during fine-tuning. This enabled us to make small adjustments across DIANNA’s three core components (model, prompts, and translation engines) while receiving real-time evaluations of the impact of those changes.

This automated critique model enhanced our ability to test and refine DIANNA without having to rely solely on the time-consuming manual feedback loop from human experts. It provided a consistent, reliable measure of performance and allowed us to quickly identify which model adjustments led to meaningful improvements in DIANNA’s analysis.

Advanced integration and proactive analysis

DIANNA is integrated with Deep Instinct’s proprietary deep learning algorithms, enabling it to detect zero-day threats with high accuracy and a low false positive rate. This proactive approach helps security teams quickly identify unknown threats, reduce false positives, and allocate resources more effectively. Additionally, it streamlines investigations, minimizes cross-tool efforts, and automates repetitive tasks, making the decision-making process clearer and faster. This ultimately helps organizations strengthen their security posture and significantly reduce the mean time to triage.

This analysis offers the following key features and benefits:

  • Performs on-the-fly file scans, allowing for immediate assessment without prior setup or delays
  • Generates comprehensive malware analysis reports for a variety of file types in seconds, making sure users receive timely information about potential threats
  • Streamlines the entire file analysis process, making it more efficient and user-friendly, thereby reducing the time and effort required for thorough evaluations
  • Supports a wide range of common file formats, including Office documents, Windows executable files, script files, and Windows shortcut files (.lnk), providing compatibility with various types of data
  • Offers in-depth contextual analysis, malicious file triage, and actionable insights, greatly enhancing the efficiency of investigations into potentially harmful files
  • Empowers SOC teams to make well-informed decisions without relying on manual malware analysis by providing clear and concise insights into the behavior of malicious files
  • Alleviates the need to upload files to external sandboxes or VirusTotal, thereby enhancing security and privacy while facilitating quicker analysis

Explainability and insights into better decision-making for SOC teams

DIANNA stands out by offering clear insights into why unknown events are flagged as malicious. Traditional AI tools often rely on lengthy, retrospective analyses that can take hours or even days to generate, and often lead to vague conclusions. DIANNA dives deeper, understanding the intent behind the code and providing detailed explanations of its potential impact. This clarity allows SOC teams to prioritize the threats that matter most.

Example scenario of DIANNA in action

In this section, we explore some DIANNA use cases.

For example, DIANNA can perform investigations on malicious files.

The following screenshot is an example of a Windows executable file analysis.

Windows executable file analysis

The following screenshot is an example of an Office file analysis.

Office file analysis

You can also quickly triage incidents with enriched data on file analysis provided by DIANNA. The following screenshot is an example using Windows shortcut files (LNK) analysis.

Windows shortcut files (LNK) analysis

The following screenshot is an example with a script file (JavaScript) analysis.

Script file (JavaScript) analysis

The following figure presents a before and after comparison of the analysis process.

Comparison of the analysis process

Additionally, a key advantage of DIANNA is its ability to provide explainability by correlating and summarizing the intentions of malicious files in a detailed narrative. This is especially valuable for zero-day and unknown threats that aren’t yet recognized, making investigations challenging when starting from scratch without any clues.

Potential advancements in AI-driven cybersecurity

AI capabilities are enhancing daily operations, but adversaries are also using AI to create sophisticated malicious events and advanced persistent threats. This leaves organizations, particularly SOC and cybersecurity teams, dealing with more complex incidents.

Although detection controls are useful, they often require significant resources and can be ineffective on their own. In contrast, using AI engines for prevention controls—such as a high-efficacy deep learning engine—can lower the total cost of ownership and help SOC analysts streamline their tasks.

Conclusion

The Deep Instinct solution can predict and prevent known, unknown, and zero-day threats in under 20 milliseconds—750 times faster than the fastest ransomware encryption. This makes it essential for security stacks, offering comprehensive protection in hybrid environments.

DIANNA provides expert malware analysis and explainability for zero-day attacks and can enhance the incident response process for the SOC team, allowing them to efficiently tackle and investigate unknown threats with minimal time investment. This, in turn, reduces the resources and expenses that Chief Information Security Officers (CISOs) need to allocate, enabling them to invest in more valuable initiatives.

DIANNA’s collaboration with Amazon Bedrock accelerated development, enabled innovation through experimentation with various FMs, and facilitated seamless integration, scalability, and data security. The rise of AI-based threats is becoming more pronounced. As a result, defenders must outpace increasingly sophisticated bad actors by moving beyond traditional AI tools and embracing advanced AI, especially deep learning. Companies, vendors, and cybersecurity professionals must consider this shift to effectively combat the growing prevalence of AI-driven exploits.


About the Authors

Tzahi Mizrahi is a Solutions Architect at Amazon Web Services with experience in cloud architecture and software development. His expertise includes designing scalable systems, implementing DevOps best practices, and optimizing cloud infrastructure for enterprise applications. He has a proven track record of helping organizations modernize their technology stack and improve operational efficiency. In his free time, he enjoys music and plays the guitar.

Tal Panchek is a Senior Business Development Manager for Artificial Intelligence and Machine Learning with Amazon Web Services. As a BD Specialist, he is responsible for growing adoption, utilization, and revenue for AWS services. He gathers customer and industry needs and partners with AWS product teams to innovate, develop, and deliver AWS solutions.

Yaniv Avolov is a Principal Product Manager at Deep Instinct, bringing a wealth of experience in the cybersecurity field. He focuses on defining and designing cybersecurity solutions that leverage AIML, including deep learning and large language models, to address customer needs. In addition, he leads the endpoint security solution, ensuring it is robust and effective against emerging threats. In his free time, he enjoys cooking, reading, playing basketball, and traveling.

Tal Furman is a Data Science and Deep Learning Director at Deep Instinct. He is focused on applying machine learning and deep learning algorithms to tackle real-world challenges, and takes pride in leading people and technology to shape the future of cybersecurity. In his free time, Tal enjoys running, swimming, reading, and playfully trolling his kids and dogs.

Maor Ashkenazi is a deep learning research team lead at Deep Instinct, and a PhD candidate at Ben-Gurion University of the Negev. He has extensive experience in deep learning, neural network optimization, computer vision, and cyber security. In his spare time, he enjoys traveling, cooking, practicing mixology and learning new things.

Read More