At Apple, we believe privacy is a fundamental human right. It’s also one of our core values, influencing both our research and the design of Apple’s products and services.
Understanding how people use their devices often helps in improving the user experience. However, accessing the data that provides such insights — for example, what users type on their keyboards and the websites they visit — can compromise user privacy. We develop system architectures that enable learning at scale by leveraging advances in machine learning (ML), such as private federated learning (PFL), combined with…
Implementing tenant isolation using Agents for Amazon Bedrock in a multi-tenant environment
The number of generative artificial intelligence (AI) features is growing within software offerings, especially after market-leading foundational models (FMs) became consumable through an API using Amazon Bedrock. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
Agents for Amazon Bedrock enables software builders to complete actions and tasks based on user input and organization data. A common challenge in multi-tenant offerings, such as software as a service (SaaS) products, is tenant isolation. Tenant isolation makes sure each tenant can access only their own resources—even if all tenants run on shared infrastructure.
You can isolate tenants in an application using different multi-tenant architecture patterns. In some cases, isolation can be achieved by having entire stacks of resources dedicated to one tenant (silo model) with coarse-grained policies to prevent cross-tenant access. In other scenarios, you might have pooled resources (such as one database table containing rows from different tenants) that require fine-grained policies to control access. Oftentimes, Amazon Web Services (AWS) customers design their applications using a mix of both models to balance the models’ tradeoffs.
Isolating tenants in a pooled model is achieved by using tenant context information in different application components. The tenant context can be injected by an authoritative source, such as the identity provider (IdP) during the authentication of a user. Integrity of the tenant context must be preserved throughout the system to prevent malicious users from acting on behalf of a tenant that they shouldn’t have access to, resulting in potentially sensitive data being disclosed or modified.
FMs act on unstructured data and respond in a probabilistic fashion. These properties make FMs unfit to handle tenant context securely. For example, FMs are susceptible to prompt injection, which can be used by malicious actors to change the tenant context. Instead, tenant context should be securely passed between deterministic components of an application, which can in turn consume FM capabilities, giving the FM only information that is already scoped down to the specific tenant.
In this blog post, you will learn how to implement tenant isolation using Amazon Bedrock agents within a multi-tenant environment. We’ll demonstrate this using a sample multi-tenant e-commerce application that provides a service for various tenants to create online stores. This application uses Amazon Bedrock agents to develop an AI assistant or chatbot capable of providing tenant-specific information, such as return policies and user-specific information like order counts and status updates. This architecture showcases how you can use pooled Amazon Bedrock agents and enforce tenant isolation at both the tenant level for return policy information and the user level for user-related data, providing a secure and personalized experience for each tenant and their users.
Architecture overview
Figure 1: Architecture of the sample AI assistant application
Let’s explore the different components this solution is using.
- A tenant user signs in to an identity provider such as Amazon Cognito. They get a JSON Web Token (JWT), which they use for API requests. The JWT contains claims such as the user ID (or subject, sub), which identifies the tenant user, and the tenantId, which defines which tenant the user belongs to.
- The tenant user inputs their question into the client application. The client application sends the question to a GraphQL API endpoint provided by AWS AppSync, in the form of a GraphQL mutation. You can learn more about this pattern in the blog post Build a Real-time, WebSockets API for Amazon Bedrock. The client application authenticates to AWS AppSync using the JWT from Amazon Cognito. The user is authorized using the Cognito User Pools integration.
- The GraphQL mutation is resolved using an Amazon EventBridge resolver. The resulting event triggers an AWS Lambda function through an EventBridge rule.
- The Lambda function calls the Amazon Bedrock InvokeAgent API. The function uses a tenant isolation policy to scope down its permissions and generates tenant-specific scoped credentials (you can read more about this pattern in the blog post Building a Multi-Tenant SaaS Solution Using AWS Serverless Services). It then sends the tenant ID, user ID, and tenant-specific scoped credentials to the API using the sessionAttributes parameter of the agent's sessionState.
- The Amazon Bedrock agent determines what it needs to do to satisfy the user request by using the reasoning capabilities of the associated large language model (LLM). A variety of LLMs can be used; for this solution we used Anthropic Claude 3 Sonnet. The agent passes the sessionAttributes object to the action group determined to help with the request, thereby securely forwarding the tenant and user ID for further processing steps.
- This Lambda function uses the provided tenant-specific scoped credentials and tenant ID to fetch information from Amazon DynamoDB. Tenant configuration data is stored in a single, shared table, while user data is split into one table per tenant. After the correct data is fetched, it is returned to the agent. The agent interacts with the LLM a second time to formulate a natural-language answer for the user based on the provided data.
- The agent’s response is published as another GraphQL mutation through AWS AppSync.
- The client listens to the response using a GraphQL subscription. It renders the response to the user after it’s received from the server.
Note that each component in this sample architecture can be changed to fit into your pre-existing architecture and knowledge in the organization. For example, you might choose to use a WebSocket implementation through Amazon API Gateway instead of using GraphQL or implement a synchronous request and response pattern. Whichever technology stack you choose to use, verify that you securely pass tenant and user context between its different layers. Do not rely on probabilistic components of your stack, such as an LLM, to accurately transmit security information.
How tenant and user data is isolated
This section describes how user and tenant data is isolated when a request is processed throughout the system. Each step is discussed in more detail following the diagram. For each prompt in the UI, the frontend sends the prompt as a mutation request to the AWS AppSync API and listens for the response through a subscription, as explained in step 8 of Figure 1 shown above. The subscription is needed to receive the answer from the prompt, as the agent is invoked asynchronously. Both the request and response are authenticated using Amazon Cognito, and the request’s context, including user and tenant ID, is made available to downstream components.
Figure 2: User and tenant data isolation
- For each prompt created in the sample UI, a unique ID (answerId) is generated. The answerId is needed to correlate the input prompt with the answer from the agent. It uses the Cognito user ID (stored in the sub field of the JWT and accessible as userId in the AWS Amplify SDK) as a prefix to enable fine-grained permissions. This is explained in more depth in step 3. The answerId is generated in the page.tsx file:
- The frontend uses the AWS Amplify SDK, which takes care of authenticating the GraphQL request. This is done both for the prompt request (a GraphQL mutation) and for the response (a GraphQL subscription that listens for an answer to the prompt). The authentication mode is set in the tsx file. Amplify uses the Amazon Cognito user pool it has been configured with. Also, the previously generated answerId is used as a unique identifier for the request.
- The frontend sends the GraphQL mutation request, and the response is received by the subscription. To correlate the mutation request and the response in the subscription, the answerId generated in step 1 is used. By running the code below in a resolver attached to the subscription, user isolation is enforced: users cannot subscribe to arbitrary mutations and receive their responses. The code verifies that the userId in the mutation request matches the userId in the response received by the subscription. The ctx variable is populated by AWS AppSync with the request's payload and metadata such as the user identity.
Note that the authorization is checked against the cryptographically signed JWT from the Amazon Cognito user pool. Hence, even if a malicious user could tamper with the token locally to change the userId, the authorization check would still fail.
- The userId and tenantId (from the AWS AppSync context) are passed on to Amazon EventBridge and to AWS Lambda, which invokes the agent. The Lambda function gets the user information from the event object in the file invokeAgent/index.py:
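The repository's exact event shape isn't reproduced here; a minimal sketch of reading the identifiers from the EventBridge event (field names are illustrative) might look like this:

```python
# invokeAgent/index.py (sketch) -- field names are illustrative, not the exact
# payload produced by the AppSync/EventBridge integration in the sample repo.
def handler(event, context):
    detail = event["detail"]          # EventBridge wraps the payload in "detail"
    user_id = detail["userId"]        # set from the Cognito "sub" claim upstream
    tenant_id = detail["tenantId"]    # set from the JWT tenantId claim upstream
    prompt = detail["prompt"]
    answer_id = detail["answerId"]
    # ... generate tenant-scoped credentials and call InvokeAgent (see below) ...
```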
The Lambda function assumes the IAM role below, which has permissions scoped down to a specific tenant, and generates tenant-specific scoped credentials. This role grants access only to DynamoDB items that have the given tenant ID as the leading key.
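A sketch of how such tenant-specific scoped credentials could be generated with an STS session policy is shown below; the role ARN, table ARN, and action list are placeholders, and the actual policy in the repository may differ:

```python
import json
import boto3

sts = boto3.client("sts")

def get_tenant_scoped_credentials(tenant_id: str) -> dict:
    # The session policy further restricts the assumed role to items whose
    # partition key (leading key) equals the tenant ID.
    session_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:Query", "dynamodb:GetItem"],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/*",  # placeholder
            "Condition": {
                "ForAllValues:StringEquals": {"dynamodb:LeadingKeys": [tenant_id]}
            },
        }],
    }
    response = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/TenantIsolationRole",  # placeholder
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=json.dumps(session_policy),
    )
    return response["Credentials"]
```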
- This identity information and the tenant-specific scoped credentials are passed to the agent through sessionAttributes in the Amazon Bedrock InvokeAgent API call, as shown below.
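The exact call in the sample repository may differ; a minimal sketch using boto3 (agent ID, alias ID, and attribute names are placeholders) could look like this:

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def ask_agent(prompt, answer_id, user_id, tenant_id, scoped_creds):
    response = bedrock_agent_runtime.invoke_agent(
        agentId="AGENT_ID",             # placeholder
        agentAliasId="AGENT_ALIAS_ID",  # placeholder
        sessionId=answer_id,
        inputText=prompt,
        sessionState={
            "sessionAttributes": {
                "userId": user_id,
                "tenantId": tenant_id,
                # Scoped credentials are forwarded so the action group Lambda
                # can only reach this tenant's DynamoDB items.
                "accessKeyId": scoped_creds["AccessKeyId"],
                "secretAccessKey": scoped_creds["SecretAccessKey"],
                "sessionToken": scoped_creds["SessionToken"],
            }
        },
    )
    # The response is a streaming EventStream of completion chunks.
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode("utf-8")
    return completion
```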
- The sessionAttributes are used within the agent task to grant the agent access to only the database tables and rows for the specific tenant user. The task creates a DynamoDB client using the tenant-scoped credentials. Using the scoped client, it looks up the correct order table name in the tenant configuration and queries the order table for data:
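A simplified sketch of this agent task is shown below; the session attribute keys, table names, and key schema are illustrative:

```python
import boto3

CONFIG_TABLE = "TenantConfiguration"  # placeholder name for the shared config table

def handler(event, context):
    attrs = event["sessionAttributes"]
    tenant_id = attrs["tenantId"]
    user_id = attrs["userId"]

    # Build a DynamoDB client from the tenant-scoped credentials; it cannot
    # read items whose leading key differs from tenant_id.
    dynamodb = boto3.client(
        "dynamodb",
        aws_access_key_id=attrs["accessKeyId"],
        aws_secret_access_key=attrs["secretAccessKey"],
        aws_session_token=attrs["sessionToken"],
    )

    # Look up this tenant's order table name in the shared configuration table.
    config = dynamodb.get_item(
        TableName=CONFIG_TABLE,
        Key={"tenantId": {"S": tenant_id}},
    )
    order_table = config["Item"]["orderTableName"]["S"]

    # Query the tenant's order table for this user's orders.
    orders = dynamodb.query(
        TableName=order_table,
        KeyConditionExpression="userId = :u",
        ExpressionAttributeValues={":u": {"S": user_id}},
    )
    # In the real action group handler, this result is wrapped in the Bedrock
    # agent response format before being returned to the agent.
    return {"orders": orders.get("Items", [])}
```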
When modifying or debugging this function, make sure that you don't log any credentials or the whole event object.
Walkthrough
In this section, you will set up the sample AI assistant described in the previous sections in your own AWS account.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account with administrator access to the us-east-1 (North Virginia) AWS Region
- AWS Cloud Development Kit (CDK) installed and configured on your machine
- git client
Enable large language model
An agent needs a large language model (LLM) to reason about the best way to fulfill a user request and formulate natural-language answers. Follow the Amazon Bedrock model access documentation to enable Anthropic Claude 3 Sonnet model access in the us-east-1 (N. Virginia) Region. After enabling the LLM, you will see the following screen with a status of Access granted:
Figure 3: You have now enabled Anthropic Claude 3 Sonnet in Amazon Bedrock for your AWS account.
Deploy sample application
We prepared most of the sample application’s infrastructure as an AWS Cloud Development Kit (AWS CDK) project.
If you have never used the CDK in the current account and Region (us-east-1), you must first bootstrap the environment by running cdk bootstrap.
Using your local command line interface, issue the following commands to clone the project repository and deploy the CDK project to your AWS account:
This takes about 3 minutes, after which you should see output similar to the following:
In addition to the AWS resources shown in Figure 1, this AWS CDK stack provisions three users, one for each tenant, into your AWS account. Note down the passwords for the three users from the CDK output, labeled MultiTenantAiAssistantStack.tenantXPassword. You will need them in the next section. If you come back to this walkthrough later, you can retrieve these values from the file cdk/cdk-output.json generated by the CDK. Note that these are only initial passwords and must be changed on each user's first sign-in.
You have now successfully deployed the stack called MultiTenantAiAssistantStack.
Start the frontend and sign in
Now that the backend is deployed and configured, you can start the frontend on your local machine, which is built in JavaScript using React. The frontend automatically pulls information from the AWS CDK output, so you don’t need to configure it manually.
- Issue the following commands to install dependencies and start the local webserver:
Open the frontend application by visiting localhost:3000 in your browser. You should see a sign-in page:
Figure 4: Sign-in screen
- For Username, enter tenant1-user. For Password, enter the password you previously retrieved from the CDK output.
- Set a new password for the user.
- On the page Account recovery requires verified contact information, choose Skip.
You’re now signed in and can start interacting with the agent.
Interact with the agent
You have completed the setup of the architecture shown in Figure 1 in your own environment. You can start exploring the web application by yourself or follow the steps suggested below.
- Logged in as tenant1-user, under Enter your Prompt, enter the following question: What is your return policy? You should receive a response that you can return items for 10 days. Tenant 2 has a return policy of 20 days, tenant 3 of 30 days.
- Under Enter your Prompt, enter the following question: Which orders did I place? You should receive a response that you have not placed any orders yet.
Figure 5: Sample application screenshot
You have now verified the functionality of the application. You can also try to access data from another tenant; you will not get an answer because of the scoped IAM policy. For example, you can modify the agent and hardcode a tenant ID (such as tenant2). In the UI, sign in as the tenant1 user, and you will see that with the generated tenant1-scoped credentials you cannot access tenant2 resources and you receive an AccessDeniedException. You can also see the error in the CloudWatch Logs for the AgentTask Lambda function:
[ERROR] ClientError: An error occurred (AccessDeniedException) when calling the Query operation: User: *****/agentTaskLambda is not authorized to perform: dynamodb:Query on resource: TABLE because no identity-based policy allows the dynamodb:Query action
Add test data
To simplify the process of adding orders to your database, we have written a bash script that inserts entries into the order tables.
- In your CLI, from the repository root folder, issue this command to add an order for tenant1-user:
./manage-orders.sh tenant1-user add
- Return to the web application and issue the following prompt: Which orders did I place? The agent should now respond with the order that you created.
- Issue the following command to delete the orders for tenant1-user:
./manage-orders.sh tenant1-user clear
Repeat steps 1 through 3 with multiple orders. You can create a new user in Amazon Cognito and sign in to see that no data from other users can be accessed. The implementation is detailed in Figure 2.
Clean up
To avoid incurring future charges, delete the resources created during this walkthrough. From the cdk folder of the repository, run the following command:
cdk destroy
Conclusion
Enabling secure multi-tenant capabilities in AI assistants is crucial for maintaining data privacy and preventing unauthorized access. By following the approach outlined in this blog post, you can create an AI assistant that isolates tenants while using the power of large language models.
The key points to remember are:
- When building multi-tenant SaaS applications, always enforce tenant isolation (use IAM wherever possible).
- Securely pass tenant and user context between deterministic components of your application, without relying on an AI model to handle this sensitive information.
- Use Agents for Amazon Bedrock to help build an AI assistant that can securely pass along tenant context.
- Implement isolation at different layers of your application to verify that users can only access data and resources associated with their respective tenant and user context.
By following these principles, you can build AI-powered applications that provide a personalized experience to users while maintaining strict isolation and security. As AI capabilities continue to advance, it’s essential to design architectures that use these technologies responsibly and securely.
Remember, the sample application demonstrated in this blog post is just one way to approach multi-tenant AI assistants. Depending on your specific requirements, you might need to adapt the architecture or use different AWS services.
To continue learning about generative AI patterns on AWS, visit the AWS Machine Learning Blog. To explore SaaS on AWS, start by visiting our SaaS landing page. If you have any questions, you can start a new thread on AWS re:Post or reach out to AWS Support.
About the authors
Ulrich Hinze is a Solutions Architect at AWS. He partners with software companies to architect and implement cloud-based solutions on AWS. Before joining AWS, he worked for AWS customers and partners in software engineering, consulting, and architecture roles for 8+ years.
Florian Mair is a Senior Solutions Architect and data streaming expert at AWS. He is a technologist that helps customers in Europe succeed and innovate by solving business challenges using AWS Cloud services. Besides working as a Solutions Architect, Florian is a passionate mountaineer and has climbed some of the highest mountains across Europe.
Connect the Amazon Q Business generative AI coding companion to your GitHub repositories with Amazon Q GitHub (Cloud) connector
Incorporating generative artificial intelligence (AI) into your development lifecycle can offer several benefits. For example, using an AI-based coding companion such as Amazon Q Developer can boost development productivity by up to 30 percent. Additionally, reducing the developer context switching that stems from frequent interactions with many different development tools can also increase developer productivity. In this post, we show you how development teams can quickly obtain answers based on the knowledge distributed across your development environment using generative AI.
GitHub (Cloud) is a popular development platform, used by more than 100 million developers and over 4 million organizations worldwide, that helps teams build, scale, and deliver software. GitHub helps developers host and manage Git repositories, collaborate on code, track issues, and automate workflows through features such as pull requests, code reviews, and continuous integration and deployment (CI/CD) pipelines.
Amazon Q Business is a fully managed, generative AI–powered assistant designed to enhance enterprise operations. You can tailor it to specific business needs by connecting to company data, information, and systems using over 40 built-in connectors.
You can connect your GitHub (Cloud) instance to Amazon Q Business using an out-of-the-box connector to provide a natural language interface to help your team analyze the repositories, commits, issues, and pull requests contained in your GitHub (Cloud) organization. After establishing the connection and synchronizing data, your teams can use Amazon Q Business to perform natural language queries in the supported GitHub (Cloud) data entities, streamlining access to this information.
Overview of solution
To create an Amazon Q Business application to connect to your GitHub repositories using AWS IAM Identity Center and AWS Secrets Manager, follow these high-level steps:
- Create an Amazon Q Business application
- Perform sync
- Run sample queries to test the solution
The following screenshot shows the solution architecture.
In this post, we show how developers and other relevant users can use the Amazon Q Business web experience to perform natural language–based Q&A over the indexed information reflective of the associated access control lists (ACLs). For this post, we set up a dedicated GitHub (Cloud) organization with four repositories and two teams—review and development. Two of the repositories are private and are only accessible to the members of the review team. The remaining two repositories are public and are accessible to all members and teams.
Prerequisites
To implement the solution, make sure you have the following prerequisites in place:
- Have an AWS account with privileges necessary to administer Amazon Q Business
- Have access to the AWS region in which Amazon Q Business is available (Supported regions)
- Enable the IAM Identity Center and add a user (Guide to enable IAM Identity Center, Guide to add user)
- Have a GitHub account with an organization and repositories (Guide to create organization)
- Have a GitHub access token classic (Guide to create access tokens, Permissions needed for tokens)
Create, sync, and test an Amazon Q Business application with IAM Identity Center
To create the Amazon Q Business application, you need to select the retriever, connect the data sources, and add groups and users.
Create application
- On the AWS Management Console, search for Amazon Q Business in the search bar, then select Amazon Q Business.
- On the Amazon Q Business landing page, choose Get started.
- On the Amazon Q Business Applications screen, at the bottom, choose Create application.
- Under Create application, provide the required values. For example, in Application name, enter anycompany-git-application. For Service access, select Create and use a new service-linked role (SLR). Under Application connected to IAM Identity Center, note the ARN for the associated IAM Identity Center instance. Choose Create.
Select retriever
Under Select retriever, in Retrievers, select Use native retriever. Under Index provisioning, enter “1.”
Amazon Q Business pricing is based on the chosen document index capacity. You can choose up to 50 capacity units as part of index provisioning. Each unit can contain up to 20,000 documents or 200 MB, whichever comes first. You can adjust this number as needed for your use case.
Choose Next at the bottom of the screen.
Connect data sources
- Under Connect data sources, in the search field under All, enter “GitHub” and select the plus sign to the right of the GitHub selection. Choose Next to configure the data source.
You can use the following examples to create a default configuration with file type exclusions to bypass crawling common image and stylesheet files.
- Enter anycompany-git-datasource in the Data source name and Description fields.
- In the GitHub organization name field, enter your GitHub organization name. Under Authentication, provide a new access token or select an existing access token stored in AWS Secrets Manager.
- Under IAM role, select Create a new service role and enter the role name under Role name for the data source.
- Define Sync scope by selecting the desired repositories and content types to be synced.
- Complete the Additional configuration and Sync mode sections.
This optional configuration can be used to specify file names, types, or file paths using regex patterns to define the sync scope. The Sync mode setting defines the types of content changes to sync when your data source content changes.
- For the purposes of this post, under Sync run schedule, select Run on demand under Frequency so you can manually invoke the sync process. Other options for automated periodic sync runs are also supported. In the Field Mappings section, keep the default settings. After you complete the retriever creation, you can modify field mappings and add custom field attributes. You can access field mapping by editing the data source.
Add groups and users
There are two users we will use for testing: one with full permissions on all the repositories in the GitHub (Cloud) organization, and a second user with permission only on one specific repository.
- Choose Add groups and users.
- Select Assign existing users and groups. This will show you the option to select the users from the IAM Identity Center and add them to this Amazon Q Business application. Choose Next.
- Search for the username or name and select the user from the listed options. Repeat for all of the users you wish to test with.
- Assign the desired subscription to the added users.
- For Web experience service access, use the default value of Create and use a new service role. Choose Create Application and wait for the application creation process to complete.
Perform sync
To sync your new Amazon Q Business application with your desired data sources, follow these steps:
- Select the newly created data source under Data sources and choose Sync now.
Depending on the number of supported data entities in the source GitHub (Cloud) organization, the sync process might take several minutes to complete.
- Once the sync is complete, choose the data source name to show the sync history, including the number of objects scanned, added, deleted, modified, and failed. You can also access the associated Amazon CloudWatch logs to inspect the sync process and failed objects.
- To access the Amazon Q Business application, select Web experience settings and choose Deployed URL. A new tab will open and ask you for sign-in details. Provide the details of the user you created earlier and choose Sign in.
Run sample queries to test the solution
You should now see the home screen of Amazon Q Business, including the associated web experience. Now we can ask questions in natural language and Amazon Q Business will provide answers based on the information indexed from your GitHub (Cloud) organization.
- To begin, enter a natural language question in the Enter a prompt field.
- You can ask questions about the information from the synced GitHub (Cloud) data entities. For example, you can enter, “Tell me how to start a new Serverless application from scratch?” and obtain a response based on the information from the associated repository README.md file.
- Because you are logged in as the first user and mapped to a GitHub (Cloud) user belonging to the review team, you should also be able to ask questions about the contents of private repositories accessible by the members of that team.
As shown in the following screenshot, you can ask questions about the private repository called aws-s3-object-management and obtain the response based on the README.md in that repository.
However, when you attempt to ask the same question when logged in as the second user, which has no access to the associated GitHub (Cloud) repository, Amazon Q Business will provide an ACL-filtered response.
Troubleshooting and frequently asked questions:
1. Why isn’t Amazon Q Business answering any of my questions?
If you are not getting answers to your questions from Amazon Q Business, verify the following:
- Permissions – document ACLs indexed by Amazon Q Business may not allow you to query certain data entities as demonstrated in our example. If this is the case, please reach out to your GitHub (Cloud) administrator to verify that your user has access to the restricted documents and repeat the sync process.
- Data connector sync – a failed data source sync may prevent the documents from being indexed, meaning that Amazon Q Business would be unable to answer questions about the documents that failed to sync. Please refer to the official documentation to troubleshoot data source connectors.
2. My connector is unable to sync.
Please refer to the official documentation to troubleshoot data source connectors. Please also verify that all of the required prerequisites for connecting Amazon Q Business to GitHub (Cloud) are in place.
3. I updated the contents of my data source, but Amazon Q Business answers using old data.
Verifying the sync status and sync schedule frequency for your GitHub (Cloud) data connector should reveal when the last sync ran successfully. It could be that your data connector sync run schedule is set to run on demand or has not yet been triggered for its next periodic run. If the sync is set to run on demand, it will need to be manually triggered.
4. How can I know if the reason I don’t see answers is due to ACLs?
If different users are getting different answers to the same questions, including differences in source attribution with citation, it is likely that the chat responses are being filtered based on user document access level represented via associated ACLs.
5. How can I sync documents without ACLs?
Access control list (ACL) crawling is on by default and can’t be turned off.
Cleanup
To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q Business application:
- On the Amazon Q Business console, choose Applications in the navigation pane.
- Select the application you created.
- On the Actions menu, choose Delete.
- Delete the AWS Identity and Access Management (IAM) roles created for the application and data retriever. You can identify the IAM roles used by the created Amazon Q Business application and data retriever by inspecting the associated configuration using the AWS console or AWS Command Line Interface (AWS CLI).
- If you created an IAM Identity Center instance for this walkthrough, delete it.
Conclusion
In this post, we walked through the steps to connect your GitHub (Cloud) organization to Amazon Q Business using the out-of-the-box GitHub (Cloud) connector. We demonstrated how to create an Amazon Q Business application integrated with AWS IAM Identity Center as the identity provider. We then configured the GitHub (Cloud) connector to crawl and index supported data entities such as repositories, commits, issues, pull requests, and associated metadata from your GitHub (Cloud) organization. We showed how to perform natural language queries over the indexed GitHub (Cloud) data using the AI-powered chat interface provided by Amazon Q Business. Finally, we covered how Amazon Q Business applies ACLs associated with the indexed documents to provide permissions-filtered responses.
Beyond the web-based chat experience, Amazon Q Business offers a Chat API to create custom conversational interfaces tailored to your specific use cases. You can also use the associated API operations using the AWS CLI or AWS SDK to manage Amazon Q Business applications, retriever, sync, and user configurations.
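For example, a minimal sketch of calling the synchronous chat API with boto3 might look like the following; the application ID and user identifier are placeholders, and depending on how identity is configured for your application you may need identity-aware credentials instead of passing a user ID:

```python
import boto3

qbusiness = boto3.client("qbusiness")

# Placeholders: supply your Amazon Q Business application ID and an identity
# that maps to a user the application knows about.
response = qbusiness.chat_sync(
    applicationId="APPLICATION_ID",
    userId="user@example.com",
    userMessage="Tell me how to start a new Serverless application from scratch?",
)

print(response["systemMessage"])  # the generated answer
for attribution in response.get("sourceAttributions", []):
    # Each attribution points at an indexed GitHub (Cloud) document.
    print(attribution.get("title"), attribution.get("url"))
```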
By integrating Amazon Q Business with your GitHub (Cloud) organization, development teams can streamline access to information scattered across repositories, issues, and pull requests. The natural language interface powered by generative AI reduces context switching and can provide timely answers in a conversational manner.
To learn more about Amazon Q connector for GitHub (Cloud), refer to Connecting GitHub (Cloud) to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.
About the Authors
Maxim Chernyshev is a Senior Solutions Architect working with mining, energy, and industrial customers at AWS. Based in Perth, Western Australia, Maxim helps customers devise solutions to complex and novel problems using a broad range of applicable AWS services and features. Maxim is passionate about industrial Internet of Things (IoT), scalable IT/OT convergence, and cyber security.
Manjunath Arakere is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, based in Atlanta, Georgia. He works with public sector partners to design and scale well-architected solutions and supports their cloud migrations and modernization initiatives. Manjunath specializes in migration, modernization, and serverless technology.
Mira Andhale is a Software Development Engineer on the Amazon Q and Amazon Kendra engineering team. She works on the Amazon Q connector design, development, integration and test operations.
Elevate customer experience through an intelligent email automation solution using Amazon Bedrock
Organizations spend a lot of resources, effort, and money on running their customer care operations to answer customer questions and provide solutions. Your customers may ask questions through various channels, such as email, chat, or phone, and deploying a workforce to answer those queries can be resource intensive, time-consuming, and unproductive if the answers to those questions are repetitive.
Although your organization might have the data assets for customer queries and answers, you may still struggle to implement an automated process to reply to your customers. Challenges might include unstructured data, different languages, and a lack of expertise in artificial intelligence (AI) and machine learning (ML) technologies.
In this post, we show you how to overcome such challenges by using Amazon Bedrock to automate email responses to customer queries. With our solution, you can identify the intent of customer emails and send an automated response if the intent matches your existing knowledge base or data sources. If the intent doesn’t have a match, the email goes to the support team for a manual response.
Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock offers a serverless experience so you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage infrastructure.
The following are some common customer intents when contacting customer care:
- Transaction status (for example, status of a money transfer)
- Password reset
- Promo code or discount
- Hours of operation
- Find an agent location
- Report fraud
- Unlock account
- Close account
Agents for Amazon Bedrock can help you perform classification and entity detection on emails for these intents. For this solution, we show how to classify customer emails for the first three intents. You can also use Agents for Amazon Bedrock to detect key information from emails, so you can automate your business processes with some actions. For example, you can use Agents for Amazon Bedrock to automate the reply to a customer request with specific information related to that query.
Moreover, Agents for Amazon Bedrock can serve as an intelligent conversational interface, facilitating seamless interactions with both internal team members and external clients, efficiently addressing inquiries and implementing desired actions. Currently, Agents for Amazon Bedrock supports Anthropic Claude models and the Amazon Titan Text G1 – Premier model on Amazon Bedrock.
Solution overview
To build our customer email response flow, we use the following services:
- Agents for Amazon Bedrock
- Amazon DynamoDB
- AWS Lambda
- Amazon Simple Email Service (Amazon SES)
- Amazon Simple Notification Service (Amazon SNS)
- Amazon WorkMail
Although we illustrate this use case using WorkMail, you can use another email tool that allows integration with serverless functions or webhooks to accomplish similar email automation workflows. Agents for Amazon Bedrock enables you to build and configure autonomous agents in your application. An agent helps your end-users complete actions based on organization data and user input. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. In addition, agents automatically call APIs to take actions and invoke knowledge bases to supplement information for these actions. Developers can save weeks of development effort by integrating agents to accelerate the delivery of generative AI applications. For this use case, we use the Anthropic Claude 3 Sonnet model.
When you create your agent, you enter details to tell the agent what it should do and how it should interact with users. The instructions replace the $instructions$ placeholder in the orchestration prompt template.
The following is an example of instructions we used for our use cases:
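The exact wording from the sample repository isn't reproduced here; a representative instruction for the three intents covered in this post might read:

"You are a customer support assistant for a money transfer service. Classify each incoming email as MONEYTRANSFER, PASSRESET, or PROMOCODE. For MONEYTRANSFER, extract the transfer ID and look up the transfer status. For PASSRESET, reply with password reset instructions. For PROMOCODE, reply with the currently available promotions. If the email does not match any of these intents, do not answer; escalate it for human review."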
An action group defines actions that the agent can help the user perform. For example, you could define an action group called GetTransferStatus with an OpenAPI schema and Lambda function attached to it. Agents for Amazon Bedrock takes care of constructing the API based on the OpenAPI schema and fulfills actions using the Lambda function to get the status from the DynamoDB money_transfer_status table.
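A simplified sketch of such an action group Lambda function is shown below; the table key schema and parameter names are illustrative, and the real handler in the sample may differ:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("money_transfer_status")

def handler(event, context):
    # The agent passes parameters extracted from the email, per the OpenAPI schema.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    transfer_id = params.get("transferId")  # illustrative parameter name

    item = table.get_item(Key={"transferId": transfer_id}).get("Item", {})
    status = item.get("status", "UNKNOWN")

    # Return the result in the Bedrock agent action group response format.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {
                    "body": f'{{"transferId": "{transfer_id}", "status": "{status}"}}'
                }
            },
        },
    }
```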
The following architecture diagram highlights the end-to-end solution.
The solution workflow includes the following steps:
- A customer initiates the process by sending an email to the dedicated customer support email address created within WorkMail.
- Upon receiving the email, WorkMail invokes a Lambda function, setting the subsequent workflow in motion.
- The Lambda function seamlessly relays the email content to Agents for Amazon Bedrock for further processing.
- The agent employs the natural language processing capabilities of Anthropic Claude 3 Sonnet to understand the email’s content classification based on the predefined agent instruction configuration. If relevant entities are detected within the email, such as a money transfer ID, the agent invokes a Lambda function to retrieve the corresponding payment status.
- If the email classification doesn’t pertain to a money transfer inquiry, the agent generates an appropriate email response (for example, password reset instructions) and calls a Lambda function to facilitate the response delivery.
- For inquiries related to money transfer status, the agent action group Lambda function queries the DynamoDB table to fetch the relevant status information based on the provided transfer ID and relays the response back to the agent.
- With the retrieved information, the agent crafts a tailored email response for the customer and invokes a Lambda function to initiate the delivery process.
- The Lambda function uses Amazon SES to send the email response, providing the email body, subject, and customer’s email address (see the sketch after this list).
- Amazon SES delivers the email message to the customer’s inbox, providing seamless communication.
- In scenarios where the agent can’t discern the customer’s intent accurately, it escalates the issue by pushing the message to an SNS topic. This mechanism allows a subscribed ticketing system to receive the notification and create a support ticket for further investigation and resolution.
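As referenced in step 8, a minimal sketch of the email delivery step might look like this; the sender address is a placeholder and must be a verified Amazon SES identity:

```python
import boto3

ses = boto3.client("ses")

def send_reply(recipient: str, subject: str, body: str) -> None:
    # "support@example.com" is a placeholder; in the sample it would be the
    # WorkMail support address, verified as an SES identity.
    ses.send_email(
        Source="support@example.com",
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
```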
Prerequisites
Refer to the README.md file in the GitHub repo to make sure you meet the prerequisites to deploy this solution.
Deploy the solution
The solution comprises three AWS Cloud Development Kit (AWS CDK) stacks:
- WorkmailOrgUserStack – Creates the WorkMail account with domain, user, and inbox access
- BedrockAgentCreation – Creates the Amazon Bedrock agent, agent action group, OpenAPI schema, S3 bucket, DynamoDB table, and agent group Lambda function for getting the transfer status from DynamoDB
- EmailAutomationWorkflowStack – Creates the classification Lambda function that interacts with the agent and integration Lambda function, which is integrated with WorkMail
To deploy the solution, you also perform some manual configurations using the AWS Management Console.
For full instructions, refer to the README.md file in the GitHub repo.
Test the solution
To test the solution, send an email from your personal email to the support email created as part of the AWS CDK deployment (for this post, we use support@vgs-workmail-org.awsapps.com). We use the following three intents in our sample data for custom classification training:
- MONEYTRANSFER – The customer wants to know the status of a money transfer
- PASSRESET – The customer has a login, account locked, or password request
- PROMOCODE – The customer wants to know about a discount or promo code available for a money transfer
The following screenshot shows a sample customer email requesting the status of a money transfer.
The following screenshot shows the email received in a WorkMail inbox.
The following screenshot shows a response from the agent regarding the customer query.
If the customer email isn’t classified, the content of the email is forwarded to an SNS topic. The following screenshot shows an example customer email.
The following screenshot shows the agent response.
Whoever is subscribed to the topic receives the email content as a message. We subscribed to this SNS topic with the email address that we passed in the human_workflow_email parameter during deployment.
Clean up
To avoid incurring ongoing costs, delete the resources you created as part of this solution when you’re done. For instructions, refer to the README.md file.
Conclusion
In this post, you learned how to configure an intelligent email automation solution using Agents for Amazon Bedrock, WorkMail, Lambda, DynamoDB, Amazon SNS, and Amazon SES. This solution can provide the following benefits:
- Improved email response time
- Improved customer satisfaction
- Cost savings regarding time and resources
- Ability to focus on key customer issues
You can expand this solution to other areas in your business and to other industries. Also, you can use this solution to build a self-service chatbot by deploying the BedrockAgentCreation stack to answer customer or internal user queries using Agents for Amazon Bedrock.
As next steps, check out Agents for Amazon Bedrock to start using its features. Follow Amazon Bedrock on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.
About the Author
Godwin Sahayaraj Vincent is an Enterprise Solutions Architect at AWS who is passionate about Machine Learning and providing guidance to customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play cricket with his friends and tennis with his three kids.
Ramesh Kumar Venkatraman is a Senior Solutions Architect at AWS who is passionate about Generative AI, Containers and Databases. He works with AWS customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play with his two kids and follows cricket.
Build an end-to-end RAG solution using Knowledge Bases for Amazon Bedrock and the AWS CDK
Retrieval Augmented Generation (RAG) is a state-of-the-art approach to building question answering systems that combines the strengths of retrieval and generative language models. RAG models retrieve relevant information from a large corpus of text and then use a generative language model to synthesize an answer based on the retrieved information.
The complexity of developing and deploying an end-to-end RAG solution involves several components, including a knowledge base, retrieval system, and generative language model. Building and deploying these components can be complex and error-prone, especially when dealing with large-scale data and models.
This post demonstrates how to seamlessly automate the deployment of an end-to-end RAG solution using Knowledge Bases for Amazon Bedrock and the AWS Cloud Development Kit (AWS CDK), enabling organizations to quickly set up a powerful question answering system.
Solution overview
The solution provides an automated end-to-end deployment of a RAG workflow using Knowledge Bases for Amazon Bedrock. By using the AWS CDK, the solution sets up the necessary resources, including an AWS Identity and Access Management (IAM) role, Amazon OpenSearch Serverless collection and index, and knowledge base with its associated data source.
The RAG workflow enables you to use your document data stored in an Amazon Simple Storage Service (Amazon S3) bucket and integrate it with the powerful natural language processing (NLP) capabilities of foundation models (FMs) provided by Amazon Bedrock. The solution simplifies the setup process by allowing you to programmatically modify the infrastructure, deploy the model, and start querying your data using the selected FM.
Prerequisites
To implement the solution provided in this post, you should have the following:
- An active AWS account and familiarity with FMs, Amazon Bedrock, and Amazon OpenSearch Service.
- Model access enabled for the required models that you intend to experiment with.
- The AWS CDK already set up. For installation instructions, refer to the AWS CDK workshop.
- An S3 bucket set up with your documents in a supported format (.txt, .md, .html, .doc/docx, .csv, .xls/.xlsx, .pdf).
- The Amazon Titan Embeddings V2 model enabled in Amazon Bedrock. You can confirm it’s enabled on the Model Access page of the Amazon Bedrock console. If the Amazon Titan Embeddings V2 model is enabled, the access status will show as Access granted, as shown in the following screenshot.
Set up the solution
When the prerequisite steps are complete, you’re ready to set up the solution:
- Clone the GitHub repository containing the solution files:
- Navigate to the solution directory:
- Create and activate the virtual environment:
The activation of the virtual environment differs based on the operating system; refer to the AWS CDK workshop for activating in other environments.
- After the virtual environment is activated, you can install the required dependencies:
You can now prepare the code .zip file and synthesize the AWS CloudFormation template for this code.
- In your terminal, export your AWS credentials for a role or user in ACCOUNT_ID. The role needs to have all necessary permissions for CDK deployment:
export AWS_REGION="<region>" # Same Region as ACCOUNT_REGION above
export AWS_ACCESS_KEY_ID="<access-key>" # Set to the access key of your role/user
export AWS_SECRET_ACCESS_KEY="<secret-key>" # Set to the secret key of your role/user
- Create the dependency:
- If you’re deploying the AWS CDK for the first time, run the following command:
- To synthesize the CloudFormation template, run the following command:
- Because this deployment contains multiple stacks, you have to deploy them in a specific sequence. Deploy the stacks in the following order:
- Once deployment is finished, you can see the deployed stacks by visiting the AWS CloudFormation console, as shown below. You can also note the knowledge base details (such as its name and ID) on the Resources tab.
Test the solution
Now that you have deployed the solution using the AWS CDK, you can test it with the following steps:
- On the Amazon Bedrock console, choose Knowledge bases in the navigation page.
- Select the knowledge base you created.
- Choose Sync to initiate the data ingestion job.
- After the data ingestion job is complete, choose the desired FM to use for retrieval and generation. (This requires model access to be granted to this FM in Amazon Bedrock before using.)
- Start querying your data using natural language queries.
That’s it! You can now interact with your documents using the RAG workflow powered by Amazon Bedrock.
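If you also want to query the knowledge base programmatically rather than through the console, a minimal sketch using the RetrieveAndGenerate API could look like the following; the knowledge base ID is a placeholder from your deployment, and the model ARN must reference an FM you have access to:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What does the uploaded document say about data retention?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KNOWLEDGE_BASE_ID",  # placeholder from the CDK output
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

# The generated answer, grounded in the documents retrieved from your S3 bucket.
print(response["output"]["text"])
```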
Clean up
To avoid incurring future charges on the AWS account, complete the following steps:
- Delete all files within the provisioned S3 bucket.
- Run the following command in the terminal to delete the CloudFormation stack provisioned using the AWS CDK:
Conclusion
In this post, we demonstrated how to quickly deploy an end-to-end RAG solution using Knowledge Bases for Amazon Bedrock and the AWS CDK.
This solution streamlines the process of setting up the necessary infrastructure, including an IAM role, OpenSearch Serverless collection and index, and knowledge base with an associated data source. The automated deployment process enabled by the AWS CDK minimizes the complexities and potential errors associated with manually configuring and deploying the various components required for a RAG solution. By taking advantage of the power of FMs provided by Amazon Bedrock, you can seamlessly integrate your document data with advanced NLP capabilities, enabling you to efficiently retrieve relevant information and generate high-quality answers to natural language queries.
This solution not only simplifies the deployment process, but also provides a scalable and efficient way to use the capabilities of RAG for question-answering systems. With the ability to programmatically modify the infrastructure, you can quickly adapt the solution to help meet your organization’s specific needs, making it a valuable tool for a wide range of applications that require accurate and contextual information retrieval and generation.
About the Authors
Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, machine learning, and system design. He has successfully delivered state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.
Manoj Krishna Mohan is a Machine Learning Engineer at Amazon. He specializes in building AI/ML solutions using Amazon SageMaker. He is passionate about developing ready-to-use solutions for customers. Manoj holds a master’s degree in Computer Science with a specialization in Data Science from the University of North Carolina, Charlotte.
Mani Khanuja is a Tech Lead – Generative AI Specialist, author of the book Applied Machine Learning and High-Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.
Research Focus: Week of August 26, 2024
EVENT
Register now for Research Forum on September 3
Discover what’s next in the world of AI at Microsoft Research Forum (opens in new tab), an event series that explores recent research advances, bold new ideas, and important discussions with the global research community.
In Episode 4, learn about Microsoft’s research initiatives at the frontiers of multimodal AI. Discover novel models, benchmarks, and infrastructure for self-improvement, agents, weather prediction, and more.
Your one-time registration includes access to our live chat with researchers on the event day.
Episode 4 will air Tuesday, September 3 at 9:00 AM Pacific Time.
microsoft research podcast
What’s Your Story: Weishung Liu
Principal PM Manager Weishung Liu shares how a career delivering products and customer experiences aligns with her love of people and storytelling and how—despite efforts to defy the expectations that come with growing up in Silicon Valley—she landed in tech.
NEW RESEARCH
Can LLMs Learn by Teaching? A Preliminary Study
Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in large language models (LLMs). However, for humans, teaching not only improves students but also improves teachers. In a recent paper: Can LLMs Learn by Teaching? A Preliminary Study, researchers from Microsoft and external colleagues explore whether that rule also applies to LLMs. If so, this could potentially enable the models to advance and improve continuously without solely relying on human-produced data or stronger models.
In this paper, the researchers show that learning by teaching (LbT) practices can be incorporated into existing LLM training/prompting pipelines and provide noticeable improvements. They design three methods, each mimicking one of the three levels of LbT in humans: observing students’ feedback; learning from the feedback; and learning iteratively, with the goals of improving answer accuracy without training and improving the models’ inherent capability with fine-tuning. The results show that LbT is a promising paradigm to improve LLMs’ reasoning ability and outcomes on several complex tasks (e.g., mathematical reasoning, competition-level code synthesis). The key findings are: (1) LbT can induce weak-to-strong generalization—strong models can improve themselves by teaching other weak models; (2) Diversity in student models might help—teaching multiple student models could be better than teaching one student model or the teacher itself. This study also offers a roadmap for integrating more educational strategies into the learning processes of LLMs in the future.
NEW RESEARCH
Arena Learning: Building a data flywheel for LLMs post-training via simulated chatbot arena
Conducting human-annotated competitions between chatbots is a highly effective approach to assessing the effectiveness of large language models (LLMs). However, this process comes with high costs and time demands, complicating the enhancement of LLMs via post-training. In a recent preprint: Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena, researchers from Microsoft and external colleagues introduce an innovative offline strategy designed to simulate these arena battles. This includes a comprehensive set of instructions for simulated battles employing AI-driven annotations to assess battle outcomes, facilitating continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. A crucial aspect of this approach is ensuring precise evaluations and achieving consistency between offline simulations and online competitions.
To this end, the researchers present WizardArena, a pipeline crafted to accurately predict the Elo rankings of various models using a meticulously designed offline test set. Their findings indicate that WizardArena’s predictions are closely aligned with those from the online arena. They apply this novel framework to train a model, WizardLM-β, which demonstrates significant performance enhancements across various metrics. This fully automated training and evaluation pipeline paves the way for ongoing incremental advancements in various LLMs via post-training.
NEW RESEARCH
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Computational challenges of large language model (LLM) inference restrict their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8 billion parameter LLM to process a prompt of 1 million tokens (i.e., the pre-filling stage) on a single NVIDIA A100 graphics processing unit (GPU). Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency.
In a recent preprint: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, researchers from Microsoft introduce a sparse calculation method designed to accelerate pre-filling of long-sequence processing. They identify three unique patterns in long-context attention matrices – the A-shape, Vertical-Slash, and Block-Sparse – that can be leveraged for efficient sparse computation on GPUs. They determine the optimal pattern for each attention head offline and dynamically build sparse indices based on the assigned pattern during inference. They then perform efficient sparse attention calculations via optimized GPU kernels to reduce latency in the pre-filling stage of long-context LLMs. The research demonstrates that MInference (million tokens inference) reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy.
NEW RESEARCH
Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Regular expressions (regex) are used to represent and match patterns in text documents in a variety of applications: content moderation, input validation, firewalls, clinical trials, and more. Existing use cases assume that the regex and the document are both readily available to the querier, so they can match the regex on their own with standard algorithms. But what about situations where the document is actually held by someone else who does not wish to disclose to the querier anything about the document besides the fact that it matches or does not match a particular regex? The ability to prove such facts enables interesting new applications.
In a recent paper: Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs, researchers from Microsoft and the University of Pennsylvania present a system for generating publicly verifiable, succinct, non-interactive, zero-knowledge proofs that a committed document matches or does not match a regular expression. They describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Experimental evaluation confirms that Reef can generate proofs for documents with 32 million characters; the proofs are small and cheap to verify, taking less than one second.
Reef is built on an open-source project from Microsoft Research, Nova: High-speed recursive arguments from folding schemes, which implements earlier research work described in a paper titled Nova: Recursive Zero-Knowledge Arguments from Folding Schemes by researchers from Microsoft, Carnegie Mellon University, and New York University.
NEW RESEARCH
HyperNova: Recursive arguments for customizable constraint systems
Incrementally verifiable computation (IVC) is a powerful cryptographic tool that allows its user to produce a proof of the correct execution of a “long running” computation in an incremental fashion. IVC enables a wide variety of applications in decentralized settings, including verifiable delay functions, succinct blockchains, rollups, verifiable state machines, and proofs of machine executions.
In a recent paper: HyperNova: Recursive arguments for customizable constraint systems, researchers from Microsoft and Carnegie Mellon University introduce a new recursive argument for proving incremental computations whose steps are expressed with CCS, a customizable constraint system that simultaneously generalizes Plonkish, R1CS, and AIR without overheads. HyperNova resolves four major problems in the area of recursive arguments.
First, it provides a folding scheme for CCS where the prover’s cryptographic cost is a single multiscalar multiplication (MSM) of size equal to the number of variables in the constraint system, which is optimal when using an MSM-based commitment scheme. This makes it easier to build generalizations of IVC, such as proof carrying data (PCD). Second, the cost of proving program executions on stateful machines (e.g., EVM, RISC-V) is proportional only to the size of the circuit representing the instruction invoked by the program step. Third, the researchers use a folding scheme to “randomize” IVC proofs, achieving zero-knowledge for “free” and without the need to employ zero-knowledge SNARKs. Fourth, the researchers show how to efficiently instantiate HyperNova over a cycle of elliptic curves.
The post Research Focus: Week of August 26, 2024 appeared first on Microsoft Research.
How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark
When we began studying jailbreak evaluations, we found a fascinating paper claiming that you could jailbreak frontier LLMs simply by translating forbidden prompts into obscure languages. Excited by this result, we attempted to reproduce it and found something unexpected.
NVIDIA Blackwell Sets New Standard for Generative AI in MLPerf Inference Debut
As enterprises race to adopt generative AI and bring new services to market, the demands on data center infrastructure have never been greater. Training large language models is one challenge, but delivering LLM-powered real-time services is another.
In the latest round of MLPerf industry benchmarks, Inference v4.1, NVIDIA platforms delivered leading performance across all data center tests. The first-ever submission of the upcoming NVIDIA Blackwell platform revealed up to 4x more performance than the NVIDIA H100 Tensor Core GPU on MLPerf’s biggest LLM workload, Llama 2 70B, thanks to its use of a second-generation Transformer Engine and FP4 Tensor Cores.
The NVIDIA H200 Tensor Core GPU delivered outstanding results on every benchmark in the data center category — including the latest addition to the benchmark, the Mixtral 8x7B mixture of experts (MoE) LLM, which features a total of 46.7 billion parameters, with 12.9 billion parameters active per token.
MoE models have gained popularity as a way to bring more versatility to LLM deployments, as they’re capable of answering a wide variety of questions and performing more diverse tasks in a single deployment. They’re also more efficient since they only activate a few experts per inference — meaning they deliver results much faster than dense models of a similar size.
The continued growth of LLMs is driving the need for more compute to process inference requests. To meet real-time latency requirements for serving today’s LLMs, and to do so for as many users as possible, multi-GPU compute is a must. NVIDIA NVLink and NVSwitch provide high-bandwidth communication between GPUs based on the NVIDIA Hopper architecture and provide significant benefits for real-time, cost-effective large model inference. The Blackwell platform will further extend NVLink Switch’s capabilities with larger NVLink domains with 72 GPUs.
In addition to the NVIDIA submissions, 10 NVIDIA partners — ASUSTek, Cisco, Dell Technologies, Fujitsu, Giga Computing, Hewlett Packard Enterprise (HPE), Juniper Networks, Lenovo, Quanta Cloud Technology and Supermicro — all made solid MLPerf Inference submissions, underscoring the wide availability of NVIDIA platforms.
Relentless Software Innovation
NVIDIA platforms undergo continuous software development, racking up performance and feature improvements on a monthly basis.
In the latest inference round, NVIDIA offerings, including the NVIDIA Hopper architecture, NVIDIA Jetson platform and NVIDIA Triton Inference Server, saw significant performance gains.
The NVIDIA H200 GPU delivered up to 27% more generative AI inference performance over the previous round, underscoring the added value customers get over time from their investment in the NVIDIA platform.
Triton Inference Server, part of the NVIDIA AI platform and available with NVIDIA AI Enterprise software, is a fully featured open-source inference server that helps organizations consolidate framework-specific inference servers into a single, unified platform. This helps lower the total cost of ownership of serving AI models in production and cuts model deployment times from months to minutes.
In this round of MLPerf, Triton Inference Server delivered near-equal performance to NVIDIA’s bare-metal submissions, showing that organizations no longer have to choose between using a feature-rich production-grade AI inference server and achieving peak throughput performance.
Going to the Edge
Deployed at the edge, generative AI models can transform sensor data, such as images and videos, into real-time, actionable insights with strong contextual awareness. The NVIDIA Jetson platform for edge AI and robotics is uniquely capable of running any kind of model locally, including LLMs, vision transformers and Stable Diffusion.
In this round of MLPerf benchmarks, NVIDIA Jetson AGX Orin system-on-modules achieved more than a 6.2x throughput improvement and 2.4x latency improvement over the previous round on the GPT-J LLM workload. Rather than developing for a specific use case, developers can now use this general-purpose 6-billion-parameter model to seamlessly interface with human language, transforming generative AI at the edge.
Performance Leadership All Around
This round of MLPerf Inference showed the versatility and leading performance of NVIDIA platforms — extending from the data center to the edge — on all of the benchmark’s workloads, supercharging the most innovative AI-powered applications and services. To learn more about these results, see our technical blog.
H200 GPU-powered systems are available today from CoreWeave — the first cloud service provider to announce general availability — and server makers ASUS, Dell Technologies, HPE, QCT and Supermicro.
More Than Fine: Multi-LoRA Support Now Available in NVIDIA RTX AI Toolkit
Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC users.
Large language models are driving some of the most exciting developments in AI with their ability to quickly understand, summarize and generate text-based content.
These capabilities power a variety of use cases, including productivity tools, digital assistants, non-playable characters in video games and more. But they’re not a one-size-fits-all solution, and developers often must fine-tune LLMs to fit the needs of their applications.
The NVIDIA RTX AI Toolkit makes it easy to fine-tune and deploy AI models on RTX AI PCs and workstations through a technique called low-rank adaptation, or LoRA. A new update, available today, enables support for using multiple LoRA adapters simultaneously within the NVIDIA TensorRT-LLM AI acceleration library, improving the performance of fine-tuned models by up to 6x.
Fine-Tuned for Performance
LLMs must be carefully customized to achieve higher performance and meet growing user demands.
These foundational models are trained on huge amounts of data but often lack the context needed for a developer’s specific use case. For example, a generic LLM can generate video game dialogue, but it will likely miss the nuance and subtlety needed to write in the style of a woodland elf with a dark past and a barely concealed disdain for authority.
To achieve more tailored outputs, developers can fine-tune the model with information related to the app’s use case.
Take the example of developing an app to generate in-game dialogue using an LLM. The fine-tuning process starts from the weights of a pretrained model, which already capture general knowledge, such as what a character might say in a game. To get the dialogue in the right style, a developer can then tune the model on a smaller dataset of examples, such as dialogue written in a more spooky or villainous tone.
In some cases, developers may want to run all of these different fine-tuning processes simultaneously. For example, they may want to generate marketing copy written in different voices for various content channels. At the same time, they may want to summarize a document and make stylistic suggestions — as well as draft a video game scene description and imagery prompt for a text-to-image generator.
It’s not practical to run multiple models simultaneously, as they won’t all fit in GPU memory at the same time. Even if they did, their inference time would be impacted by memory bandwidth — how fast data can be read from memory into GPUs.
Lo(RA) and Behold
A popular way to address these issues is to use fine-tuning techniques such as low-rank adaptation. A simple way of thinking of it is as a patch file containing the customizations from the fine-tuning process.
Once trained, customized LoRA adapters can integrate seamlessly with the foundation model during inference, adding minimal overhead. Developers can attach the adapters to a single model to serve multiple use cases. This keeps the memory footprint low while still providing the additional details needed for each specific use case.
In practice, this means that an app can keep just one copy of the base model in memory, alongside many customizations using multiple LoRA adapters.
This process is called multi-LoRA serving. When multiple calls are made to the model, the GPU can process all of the calls in parallel, maximizing the use of its Tensor Cores while minimizing memory and bandwidth demands so developers can efficiently use AI models in their workflows. Fine-tuned models using multi-LoRA adapters perform up to 6x faster.
In the example of the in-game dialogue application described earlier, the app’s scope could be expanded, using multi-LoRA serving, to generate both story elements and illustrations — driven by a single prompt.
The user could input a basic story idea, and the LLM would flesh out the concept, expanding on the idea to provide a detailed foundation. The application could then use the same model, enhanced with two distinct LoRA adapters, to refine the story and generate corresponding imagery. One LoRA adapter generates a Stable Diffusion prompt to create visuals using a locally deployed Stable Diffusion XL model. Meanwhile, the other LoRA adapter, fine-tuned for story writing, could craft a well-structured and engaging narrative.
In this case, the same model is used for both inference passes, ensuring that the space required for the process doesn’t significantly increase. The second pass, which involves both text and image generation, is performed using batched inference, making the process exceptionally fast and efficient on NVIDIA GPUs. This allows users to rapidly iterate through different versions of their stories, refining the narrative and the illustrations with ease.
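As a rough illustration of the idea, the sketch below uses Hugging Face Transformers and PEFT rather than TensorRT-LLM, with placeholder model and adapter paths: one copy of the base model hosts two LoRA adapters and switches between them per request. (True multi-LoRA serving additionally batches requests for different adapters into a single pass, which is what the TensorRT-LLM update provides.)

```python
# Minimal multi-adapter sketch with Transformers + PEFT (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"          # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# One copy of the base weights in memory, plus two small LoRA "patches".
model = PeftModel.from_pretrained(base, "./lora-story-writer", adapter_name="story")
model.load_adapter("./lora-sd-prompter", adapter_name="sd_prompt")

def generate(prompt, adapter):
    model.set_adapter(adapter)                 # switch adapters without reloading the base model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(out[0], skip_special_tokens=True)

story = generate("Expand this idea into a short scene: ...", "story")
sd_prompt = generate("Write a Stable Diffusion prompt for this scene: ...", "sd_prompt")
```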
This process is outlined in more detail in a recent technical blog.
LLMs are becoming one of the most important components of modern AI. As adoption and integration grows, demand for powerful, fast LLMs with application-specific customizations will only increase. The multi-LoRA support added today to the RTX AI Toolkit gives developers a powerful new way to accelerate these capabilities.
Index website contents using the Amazon Q Web Crawler connector for Amazon Q Business
Amazon Q Business is a fully managed service that lets you build interactive chat applications using your enterprise data. These applications can generate answers based on your data or on the knowledge of the underlying large language model (LLM). Your data is not used for training purposes, and the answers provided by Amazon Q Business are based solely on the data users have access to.
Enterprise data is often distributed across different sources, such as documents in Amazon Simple Storage Service (Amazon S3) buckets, database engines, websites, and more. In this post, we demonstrate how to create an Amazon Q Business application and index website contents using the Amazon Q Web Crawler connector for Amazon Q Business.
For this example, we use two data sources (websites). The first data source is an employee onboarding guide from a fictitious company, which requires basic authentication. We demonstrate how to set up authentication for the Web Crawler. The second data source is the official documentation for Amazon Q Business. For this data source, we demonstrate how to apply advanced settings, such as regular expressions, to instruct the Web Crawler to crawl only pages and links related to Amazon Q Business, ignoring pages related to other AWS services.
Overview of the Amazon Q Web Crawler connector
The Amazon Q Web Crawler connector makes it possible to crawl websites that use HTTPS and index their contents so you can build a generative artificial intelligence (AI) experience for your users based on the indexed data. This connector relies on the Selenium Web Crawler Package and a Chromium driver. The connector is fully managed and updates to these components are applied automatically without your intervention.
This connector crawls and indexes the contents of webpages and attachments. Amazon Q Business supports multiple connectors, and each connector has its own properties and entities that it considers documents. In the context of the Web Crawler connector, a document refers to a single page or attachment contents. Separately, an index is commonly referred to as a corpus of documents; think of it as the place where you add and sync your documents for Amazon Q Business to use for generating answers to user requests.
Each document has its own attributes, also known as metadata. Metadata can be mapped to fields in your Amazon Q Business index. By creating index fields, you can boost results based on document attributes. For example, there might be use cases where you want to give more relevance to results from a specific category, department, or creation date.
Amazon Q Business data source connectors are designed to crawl the default attributes in your data source automatically. You can also add custom document attributes and map them to custom fields in your index. To learn more, see Mapping document attributes in Amazon Q Business.
For a better understanding of what is indexed by the Web Crawler connector, we present a list of metadata indexed from webpages and attachments.
The following table lists webpage metadata indexed by the Amazon Q Web Crawler connector.
| Field | Data Source Field | Amazon Q Business Index Field (reserved) | Field Type |
| --- | --- | --- | --- |
| Category | category | _category | String |
| URL | sourceUrl | _source_uri | String |
| Title | title | _document_title | String |
| Meta Tags | metaTags | wc_meta_tags | String List |
| File Size | htmlSize | wc_html_size | Long (numeric) |
The following table lists attachments metadata indexed by the Amazon Q Web Crawler connector.
| Field | Data Source Field | Amazon Q Business Index Field (reserved) | Field Type |
| --- | --- | --- | --- |
| Category | category | _category | String |
| URL | sourceUrl | _source_uri | String |
| File Name | fileName | wc_file_name | String |
| File Type | fileType | wc_file_type | String |
| File Size | fileSize | wc_file_size | Long (numeric) |
When configuring the data source for your website, you can use URLs or sitemaps, which can be defined either manually or using a text file stored in Amazon S3.
To enforce secure access to protected websites, the Amazon Q Web Crawler supports the following authentication types and standards:
- Basic authentication
- NTLM/Kerberos authentication
- Form-based authentication
- SAML authentication
Unlike other data source connectors, the Amazon Q Web Crawler connector doesn’t support access control list (ACL) crawling or identity crawling.
Lastly, you have a range of options for configuring how and what data is synchronized. For example, you can choose to synchronize website domains only, website domains with subdomains only, or website domains with subdomains and the webpages included in links. Additionally, you can use regular expressions to filter which URLs to include or exclude in the crawling process.
Overview of solution
At a high level, this solution consists of an Amazon Q Business application that uses two data sources: a website hosting documents related to an employee onboarding guide, and the Amazon Q Business official documentation website. This solution demonstrates how to configure both websites as data sources for the Amazon Q Business application. The following steps will be performed:
- Deploy an AWS CloudFormation template containing a static website secured with basic authentication.
- Create an Amazon Q Business application.
- Create a Web Crawler data source for the Amazon Q Business documentation.
- Create a Web Crawler data source for the employee onboarding guide.
- Add groups and users to the Amazon Q Business application.
- Run sample queries to test the solution.
You can follow along using one or both data sources provided in this post or try your own URLs.
Prerequisites
To follow along with this demo, you should have the following prerequisites:
- An AWS account with privileges to create Amazon Q Business applications and AWS Identity and Access Management (IAM) roles and policies
- An IAM Identity Center instance with at least one user (and optionally, one or more groups)
- If you decide to use a public website, make sure you have permission to crawl the website
- Optionally, privileges to deploy CloudFormation templates
Deploy a CloudFormation template for the employee onboarding website secured with basic authentication
Deploying this CloudFormation template is optional, but we recommend using it so you can learn more about how the Web Crawler connector works with websites that require authentication.
We start by deploying a CloudFormation template. This template will create a simple static website secured with basic authentication.
- On the AWS CloudFormation console, choose Create stack and choose With new resources (standard).
- Select Choose an existing template.
- For Specify template, select Amazon S3 URL.
- For Amazon S3 URL, enter the following URL:
https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-16532/template-website.yml
- Choose Next.
- For Stack name, enter a name, for example, onboarding-website-for-q-business-sample.
- Choose Next.
- Leave all options in Configure stack options as default and choose Next.
- On the Review and create page, select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.
The deployment process will take a few minutes to complete. You can move to the next section of this post while it’s in progress. Keep this tab open; you’ll need to refer to the Outputs tab later.
Create an Amazon Q Business application
Before you start creating Amazon Q Business applications, you are required to enable and configure an IAM Identity Center instance. This step is mandatory because Amazon Q Business integrates with IAM Identity Center to manage user access to your Amazon Q Business applications. If you don’t have an IAM Identity Center instance set up when trying to create your first application, you will see the option to create one, as shown in the following screenshot.
If you already have an IAM Identity Center instance set up, you’re ready to start creating your first application by following these steps:
- On a new tab in your browser, open the Amazon Q Business console.
- Choose Get started or Create application (options will vary based on whether it’s your first time trying the service).
- For Application name, enter a name for your application, for example, my-q-business-app.
- For Service access, select Create and use a new service-linked role (SLR).
- Choose Create.
- For Retrievers, select Use native retriever.
- For Index provisioning, enter 1 for Number of units. One unit can index 20,000 documents (a document in this context is either a single page of content or a single attachment).
- Choose Next.
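If you prefer automating these steps, the console actions above map to Amazon Q Business API calls. The following boto3 sketch is illustrative only: the Identity Center ARN and display names are placeholders, and the exact parameter names should be verified against the current Amazon Q Business API reference.

```python
# Hedged sketch of creating the application, index, and native retriever with boto3.
import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder ARN; use your own IAM Identity Center instance ARN.
app = qbusiness.create_application(
    displayName="my-q-business-app",
    identityCenterInstanceArn="arn:aws:sso:::instance/ssoins-EXAMPLE",
)

# One capacity unit can index up to 20,000 documents.
index = qbusiness.create_index(
    applicationId=app["applicationId"],
    displayName="my-q-business-index",
    capacityConfiguration={"units": 1},
)

# Use the native retriever backed by the index created above.
qbusiness.create_retriever(
    applicationId=app["applicationId"],
    displayName="native-retriever",
    type="NATIVE_INDEX",
    configuration={"nativeIndexConfiguration": {"indexId": index["indexId"]}},
)
```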
Create a Web Crawler data source for the Amazon Q Business documentation
After you complete the steps in the previous section, you should see the Connect data sources page, as shown in the following screenshot.
If you closed the tab by accident, you can get to this page by navigating to the Amazon Q Business console, choosing your application name, and then choosing Add data source.
Let’s create the data source for the Amazon Q Business documentation website:
- On the Connect data sources page, choose Web crawler.
- For Data source name, enter a name, for example, q-business-documentation.
- For Description, enter a description.
- For Source, you have the option to provide either URLs or sitemaps. For this example, select Source URLs and enter the URL of the official documentation of Amazon Q:
https://docs.aws.amazon.com/amazonq/
Starting point URLs can be added directly in this UI (up to 10), or you could use a file hosted in Amazon S3 to list up to 100 starting point URLs. Likewise, sitemap URLs can be added in this UI (up to three), or you could add up to three sitemap XML files hosted in Amazon S3.
We refer to source URLs as starting point URLs; later in this post, you’ll have the opportunity to define what gets crawled, for example, domains and subdomains that the webpages might link to. It’s worth mentioning that the Web Crawler connector can only work with HTTPS.
- Select No authentication in the Authentication section because this is a public website.
- The Web proxy section is optional, so we leave it empty.
- For Configure VPC and security group, select No VPC.
- In the IAM role section, choose Create a new service role.
- In the Sync scope section, for Sync domain range, select Sync domains with subdomains only.
- For Maximum file size, you can keep the default value of 50 MB.
- Under Additional configuration, expand Scope settings.
- Leave Crawl depth set to 2, Maximum links per page set to 999, and Maximum throttling set to 300.
If you open the Amazon Q official documentation, you’ll see that there are links to Amazon Q Developer documentation and other AWS services. Because we’re only interested in crawling Amazon Q Business, we need to instruct the crawler to focus only on relevant links and pages related to Amazon Q Business. To achieve this, we use regular expressions to define exactly what URLs the crawler should crawl.
- Under Crawl URL Patterns, enter the following expressions one by one, and choose Add (a quick way to check these patterns locally is sketched after the list of expressions):
^https://docs.aws.amazon.com/amazonq/$
^https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/.*.html$
^https://docs.aws.amazon.com/amazonq/latest/business-use-dg/.*.html$
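If you want to sanity-check these patterns before adding them, a small Python snippet like the following can help; the example URLs are illustrative, not pages guaranteed to exist.

```python
# Check which example URLs the crawl patterns would include.
import re

patterns = [
    r"^https://docs.aws.amazon.com/amazonq/$",
    r"^https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/.*.html$",
    r"^https://docs.aws.amazon.com/amazonq/latest/business-use-dg/.*.html$",
]

urls = [
    "https://docs.aws.amazon.com/amazonq/",                                   # starting point: included
    "https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/example.html",   # included
    "https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/example.html",  # excluded (Amazon Q Developer)
]

for url in urls:
    included = any(re.match(p, url) for p in patterns)
    print(f"{'crawl' if included else 'skip '}  {url}")
```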
- In the Sync mode section, select Full sync. This option makes it possible to sync all contents regardless of their previous status.
- In the Sync run schedule section, you define how often Amazon Q Business should sync this data source. For Frequency, select Run on demand.
Choosing this option means you must run the sync operation manually; this is suitable given the simplicity of this example. For production workloads, you’ll want to define a schedule tailored to your needs, for example, hourly, daily, or weekly, or you could define your own schedule using a cron expression.
- The Tags section is optional, so we leave it empty.
The default values in the Field mappings section can’t be changed at this point. This can only be modified after the application and retriever have been created.
- Choose Add data source and wait a couple of seconds while changes are applied.
After the data source is created, you will be shown the same interface you saw at the beginning of this section, with the note that one Web Crawler data source has been added. Keep this tab open, because you’ll create a second data source for the employee onboarding guide in the next section.
Create a Web Crawler data source for the employee onboarding guide
Complete the following steps to create your second data source:
- On the Connect data sources page, choose Web crawler.
- Keep this tab open, navigate back to the AWS CloudFormation console tab, and verify that the stack’s status is CREATE_COMPLETE.
- If the status of the stack is CREATE_COMPLETE, choose the Outputs tab of the stack you deployed.
- Note the URL, user name, and password (the following screenshot shows sample values).
- Choose the link for WebsiteURL.
Although unlikely, if the URL isn’t working, it might be because Amazon CloudFront hasn’t finished replicating the website. In that case, you should wait a couple of minutes and try again.
- Sign in with your user name and password.
You should now be able to browse the employee onboarding guide. Take a few minutes to get familiar with the contents of the website, because you’ll be asking your Amazon Q Business application questions about this content in a later step.
- Return to the browser tab where you’re creating the new data source.
- For Data source name, enter a name, for example, onboarding-guide.
- For Source, select Source URLs and enter the website URL you saved earlier.
- For Authentication, select Basic authentication.
- Under Authentication credentials, for AWS Secrets Manager secret, choose Create and add new secret.
- For Secret name, enter a secret name of your preference.
- For User name and Password, use the values you saved earlier and make sure there are no extra whitespaces.
- Choose Save.
These credentials will be stored as a secret in AWS Secrets Manager.
Depending on the type of authentication you use, you’ll need certain fields present in your secret, as shown in the following table.
| Authentication Type | Fields present in secret |
| --- | --- |
| Form based | username, password, userNameFieldXpath, passwordFieldXpath, passwordButtonXpath, loginPageUrl |
| NTLM | username, password |
| Basic auth | username, password |
| No Authentication | NA |
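If you’d rather create this secret programmatically than through the console, a minimal boto3 sketch might look like the following; the secret name is a placeholder, and the credential values come from the CloudFormation Outputs tab.

```python
# Create the basic-auth secret with the field names expected by the Web Crawler connector.
import json
import boto3

secretsmanager = boto3.client("secretsmanager")

response = secretsmanager.create_secret(
    Name="onboarding-website-basic-auth",        # placeholder secret name
    SecretString=json.dumps({
        "username": "YOUR_USERNAME",             # value from the CloudFormation Outputs tab
        "password": "YOUR_PASSWORD",
    }),
)
print(response["ARN"])                           # reference this secret when configuring the data source
```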
- Leave the Web proxy section empty.
- Select No VPC in the Configure VPC and security group section.
- For IAM role, choose Create a new service role.
- Select Sync domains with subdomains only in the Sync scope section.
- Select Full sync in the Sync mode section.
- For Sync run schedule, choose Run on demand.
- Leave the sections Tags and Field mappings with their default values.
- Choose Add data source and wait a couple of seconds while changes are applied.
After changes are applied, the Connect data sources page shows two Web Crawler data sources have been added.
- Scroll down to the end of the page and choose Next.
We have added our two data sources. In the next section, we add groups and users to our Amazon Q Business application.
Add groups and users to the Amazon Q Business application
Complete the following steps to add groups and users:
- On the Add groups and users page, choose Add groups and users.
- Select Assign existing users and groups and choose Next.
If you’ve completed the prerequisite of setting up IAM Identity Center, you’ve likely added at least one user. Although it’s not mandatory, we recommend creating multiple users and groups. This will enable you to fully explore and understand all the features of Amazon Q Business beyond what’s covered in this post.
If you haven’t added any users to your Identity Center directory, you can create them here by choosing Add new users. However, you’ll need to complete additional steps, such as setting up their passwords on the IAM Identity Center console. To fully benefit from this tutorial, we recommend having active users and groups by the time you reach this step.
- In the search bar, enter either the display name or group name you want to add to the application.
- Choose the user (or group) and choose Assign.
If you added a group, you’ll see it on the Groups tab. If you added a user, you’ll see it on the Users tab.
The next step is choosing a subscription for your groups or users.
- Select the user (or group) you just added, and on the Current subscription dropdown menu, choose your subscription tier. For this example, we choose Q Business Pro.
This is a good time to get familiar with the Amazon Q Business subscription tiers and pricing. For this example, we use Q Business Pro, but you could also use a Q Business Lite subscription.
- In the Web experience service access section, select Create and use a new service role.
A web experience is the chat interface that your users will utilize to ask questions and perform tasks.
- Choose Create application.
After the application is created successfully, you’ll be redirected to the Amazon Q Business console, where you can see your new application. Your application is ready, but the data sources haven’t synced any data yet. We’ll do that in the next steps.
- Choose the name of your new application to open the Application Details page.
- In the Data sources section, select each data source and choose Sync now.
You will see the Current sync state for both data sources as Syncing. This process might take several minutes.
After the data sources are synced, you will see their Last sync status as Completed.
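If you want to trigger syncs from a script instead of the console (for example, as part of a pipeline), the equivalent boto3 call is sketched below; the IDs are placeholders you can copy from the Application Details page.

```python
# Start an on-demand sync for one data source (run once per data source).
import boto3

qbusiness = boto3.client("qbusiness")

qbusiness.start_data_source_sync_job(
    applicationId="APPLICATION_ID",   # placeholder
    indexId="INDEX_ID",               # placeholder
    dataSourceId="DATA_SOURCE_ID",    # placeholder
)
```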
You’re now ready to test your application! Keep this page open because you’ll need it for next steps.
Run sample queries to test the solution
At this point, you have created an Amazon Q Business application, added two data sources using the Amazon Q Web Crawler connector, added users to the application, and synchronized all data sources.
The next step is going through the full user experience: logging in to the application and running a few queries to test it.
- On the Application Details page, navigate to the Web experience settings section.
- Choose the link under Deployed URL.
You’ll be redirected to the AWS access portal URL, which is set up by IAM Identity Center.
- Enter the user name of a user previously added to your Amazon Q Business application and choose Next.
You’re now on your Amazon Q Business app and ready to start asking questions!
- Enter your question (prompt) in the Enter a prompt text field and press Enter.
For this example, we start by asking questions related to the employee onboarding website.
Amazon Q Business uses the onboarding guide data source you created earlier. If you choose Sources, you’ll see a list of in-text source citations in the form of a numbered list.
Now we ask questions related to the Amazon Q Business documentation.
Try it out with your own prompts!
Troubleshooting
In this section, we discuss several common issues and how to troubleshoot:
- Amazon Q Business isn’t answering your questions – If Amazon Q Business isn’t answering your questions, it’s likely because your data hasn’t been indexed correctly. Verify that your data sources have synced successfully and that their Last sync status shows Completed.
- The Web Crawler is unable to sync – If you used a starting point URL different from the ones in this post and the Web Crawler can’t sync, it might be due to permissions. If the website requires authentication, refer to the section where we create a data source for more information. Another common scenario is when settings on the web server or firewalls prevent the Web Crawler from accessing the data. Lastly, check whether a robots.txt file on your web server is explicitly denying access to the Web Crawler. For more details on how to configure a robots.txt file, refer to Configuring a robots.txt file for Amazon Q Business Web Crawler.
- Amazon Q Business answers questions using old data – When you create a data source, you have the option to tell Amazon Q Business how often it should sync your data source with your index. During the creation of our data sources, we chose to sync the data sources manually (Run on demand), which means the sync process will occur only when we choose Sync now on our data source. For more information, refer to Sync run schedule.
- Amazon Q Business provides an inaccurate answer or no answer at all – In situations where Amazon Q Business is providing an inaccurate answer, incomplete answers, or no answer at all, we recommend looking at the format of the data. Is the data part of an image? Is the data in a tabular format? Amazon Q Business works best with unstructured, plain text data.
Document enrichment
Although not covered in this post, we recommend exploring document enrichment. This functionality allows you to manipulate and enrich document attributes before they are added to an index. The following are a few ideas for advanced applications of document enrichment:
- Run an AWS Lambda function that sends your document to Amazon Textract. This service uses optical character recognition (OCR) to extract text from images containing handwriting, forms, tables, and more (a simplified sketch follows this list).
- Use Amazon Transcribe to convert videos or audio files in your documents into text.
- Use Amazon Comprehend to detect and redact personal identifiable information (PII).
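As a rough illustration of the first idea, a document enrichment Lambda function might look like the sketch below. The event fields and response shape are simplified assumptions; check the document enrichment documentation for the exact contract your function must follow.

```python
# Simplified sketch: extract text from an image document with Amazon Textract.
import boto3

s3 = boto3.client("s3")
textract = boto3.client("textract")

def handler(event, context):
    # Assumed event fields for illustration; verify against the enrichment contract.
    bucket = event["s3Bucket"]
    key = event["s3ObjectKey"]

    image = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    result = textract.detect_document_text(Document={"Bytes": image})

    # Keep only the detected text lines.
    lines = [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
    return {"extractedText": "\n".join(lines)}
```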
Clean up
After you finish testing the solution, clean up the resources you created as part of this solution to avoid incurring extra costs.
Let’s start by deleting the Amazon Q Business application.
- On the Amazon Q Business console, select your application from the application list and on the Actions menu, choose Delete.
- Confirm its deletion by entering Delete, then choose Delete.
You might be asked to complete an optional survey on your reasons for application deletion. You can select multiple reasons (or none), then choose Submit.
The next step is to delete the CloudFormation stack responsible for deploying the employee onboarding website we used as a data source.
- On the CloudFormation console, select the stack you created at the beginning of this walkthrough and choose Delete.
- Choose Delete to confirm the stack deletion.
The stack deletion might take a few minutes. When the deletion is complete, you’ll see the stack has been removed from your list of stacks.
Optionally, if you enabled IAM Identity Center only for this tutorial and want to delete your IAM Identity Center instance, follow these steps:
- On the IAM Identity Center console, choose Settings in the navigation pane.
- Choose the Management tab.
- Choose Delete.
- Select the acknowledgement check boxes, enter your instance, and choose Confirm.
Conclusion
The Amazon Q Business Web Crawler allows you to connect websites to your Amazon Q Business applications. This connector supports multiple forms of authentication (if required by your website) and can run sync jobs on a defined schedule.
To learn more about Amazon Q Business and its features, refer to the Amazon Q Business Developer Guide. For a comprehensive list of what can be done with this connector, refer to Connecting Web Crawler to Amazon Q Business.
About the Author
Guillermo Mansilla is a Senior Solutions Architect based in Orlando, Florida. He has had the opportunity to collaborate with startups and enterprise customers in the USA and Canada, assisting them in building and architecting their applications on AWS. Guillermo has developed a keen interest in serverless architectures and generative AI applications. Prior to his current role, he gained over a decade of experience working as a software developer. Away from work, Guillermo enjoys participating in chess tournaments at his local chess club, a pursuit that allows him to exercise his analytical skills in a different context.