October 2024 – Vedere AI

Startup Helps Surgeons Target Breast Cancers With AI-Powered 3D Visualizations

A new AI-powered, imaging-based technology that creates accurate three-dimensional models of tumors, veins and other soft tissue offers a promising new method to help surgeons operate on, and better treat, breast cancers.

The technology, from Illinois-based startup SimBioSys, converts routine black-and-white MRI images into spatially accurate, volumetric images of a patient’s breasts. It then illuminates different parts of the breast with distinct colors — the vascular system, or veins, may be red; tumors are shown in blue; surrounding tissue is gray.

Surgeons can then easily manipulate the 3D visualization on a computer screen, gaining important insight to help guide surgeries and influence treatment plans. The technology, called TumorSight, calculates key surgery-related measurements, including a tumor’s volume and how far tumors are from the chest wall and nipple.

It also provides key data about a tumor’s volume in relation to a breast’s overall volume, which can help determine — before a procedure begins — whether surgeons should try to preserve a breast or choose a mastectomy, which often presents cosmetic and painful side effects. Last year, TumorSight received FDA clearance.

Across the world, nearly 2.3 million women are diagnosed with breast cancer each year, according to the World Health Organization. Every year, breast cancer is responsible for the deaths of more than 500,000 women. Around 100,000 women in the U.S. annually undergo some form of mastectomy, according to the Brigham and Women’s Hospital.

According to Jyoti Palaniappan, chief commercial officer at SimBioSys, the company’s visualization technology offers a step-change improvement over the kind of data surgeons typically see before they begin surgery.

“Typically, surgeons will get a radiology report, which tells them, ‘Here’s the size and location of the tumor,’ and they’ll get one or two pictures of the patient’s tumor,” said Palaniappan. “If the surgeon wants to get more information, they’ll need to find the radiologist and have a conversation with them — which doesn’t always happen — and go through the case with them.”

Dr. Barry Rosen, the company’s chief medical officer, said one of the technology’s primary goals is to uplevel and standardize presurgical imaging, which he believes can have broad positive impacts on outcomes.

“We’re trying to move the surgical process from an art to a science by harnessing the power of AI to improve surgical planning,” Dr. Rosen said.

SimBioSys uses NVIDIA A100 Tensor Core GPUs in the cloud for pretraining its models. It also uses NVIDIA MONAI for training and validation data, and NVIDIA CUDA-X libraries including cuBLAS and MONAI Deploy to run its imaging technology. SimBioSys is part of the NVIDIA Inception program for startups.

SimBioSys is already working on additional AI use cases it hopes can improve breast cancer survival rates.

It has developed a novel technique to reconcile MRI images of a patient’s breasts, taken when the patient is lying face down, and converts those images into virtual, realistic 3D visualizations that show how the tumor and surrounding tissue will appear during surgery — when a patient is lying face up.

This 3D visualization is especially relevant for surgeons so they can visualize what a breast and any tumors will look like once surgery begins.

To create this imagery, the technology calculates gravity’s impact on different kinds of breast tissue and accounts for how different kinds of skin elasticity impact a breast’s shape when a patient is lying on the operating table.

The startup is also working on a new strategy that also relies on AI to quickly provide insights that can help avoid cancer recurrence.

Currently, hospital labs run pathology tests on tumors that surgeons have removed. The biopsies are then sent to a different outside lab, which conducts a more comprehensive molecular analysis.

This process routinely takes up to six weeks. Without knowing how aggressive a cancer in the removed tumor is, or how that type of cancer might respond to different treatments, patients and doctors are unable to quickly chart out treatment plans to avoid recurrence.

SimBioSys’s new technology uses an AI model to analyze the 3D volumetric features of the just-removed tumor, the hospital’s initial tumor pathology report and a patient’s demographic data. From that information, SimBioSys generates — in a matter of hours — a risk analysis for that patient’s cancer, which helps doctors quickly determine the best treatment to avoid recurrence.

According to SimBioSys’s Palaniappan, the startup’s new method matches or exceeds the risk of recurrence scoring ability of more traditional methodologies, based upon its internal studies. It also takes a fraction of the time of these other methods while costing far less money.

Introducing DRIFT Search: Combining global and local search methods to improve quality and efficiency

Three icons that represent local and global search and GraphRAG. These icons sit on a blue to pink gradient.

GraphRAG is a technique that uses large language models (LLMs) to create knowledge graphs and summaries from unstructured text documents and leverages them to improve retrieval-augmented generation (RAG) operations on private datasets. It offers comprehensive global overviews of large, private troves of unstructured text documents while also enabling exploration of detailed, localized information. By using LLMs to create comprehensive knowledge graphs that connect and describe entities and relationships contained in those documents, GraphRAG leverages semantic structuring of the data to generate responses to a wide variety of complex user queries. Uncharted (opens in new tab), one of Microsoft’s research collaborators, has recently been expanding the frontiers of this technology by developing a new approach to processing local queries: DRIFT search (Dynamic Reasoning and Inference with Flexible Traversal). This approach builds upon Microsoft’s GraphRAG technique, combining characteristics of both global and local search to generate detailed responses in a method that balances computational costs with quality outcomes.

How GraphRAG works

GraphRAG has two primary components, an indexing engine and a query engine.

The indexing engine breaks down documents into smaller chunks, converting them into a knowledge graph with entities and relationships. It then identifies communities within the graph and generates summaries—or “community reports”—that represent the global data structure.

The query engine utilizes LLMs to build graph indexes over unstructured text and query them in two primary modes:

Global search handles queries that span the entire dataset. This mode synthesizes information from diverse underlying sources to answer questions that require a broad understanding of the whole corpus. For example, in a dataset about tech company research efforts, a global query could be: “What trends in AI research have emerged over the past five years across multiple organizations?” While effective for connecting scattered information, global search can be resource intensive.
Local search optimizes for targeted queries, drawing from a smaller subset of documents that closely match the user’s input. This mode works best when the answer lies within a small number of text units. E.g. a query asking: “What new features and integrations did Microsoft’s Cosmos DB team release on October 4th?”

The creation of these summaries often involves a human in the loop (HITL), as user input shapes how information is summarized (e.g., what kinds of entities and relationships are extracted). To index documents using GraphRAG, a clear description of the intended user persona (as defined in the indexing phase) is needed, as it influences how nodes, edges, and community reports are structured.

Introducing DRIFT Search

DRIFT Search introduces a new approach to local search queries by including community information in the search process. This greatly expands the breadth of the query’s starting point and leads to retrieval and usage of a far higher variety of facts in the final answer. This addition expands the GraphRAG query engine by providing a more comprehensive option for local search, which uses community insights to refine a query into detailed follow-up questions. These follow-ups allow DRIFT to handle queries that may not fully align with the original extraction templates defined by the user at index time.

Answer details	Drift (DS_Default)	Local (LS)
Supply Chain	Traced back to cinnamon in Ecuador and Sri Lanka [Redacted Brand] and [Redacted Brand] Brands Impacted Products sold at [Redacted Brand] and [Redacted Brand]	Plants in Ecuador
Contamination Levels	2000 times higher than FDA max	Blood lead levels ranging from 4 to 29 micrograms per deciliter
Actions	Recalls and health advisories Investigating plant in Ecuador Issued warnings to retailers	Recalls and health advisories

Table 1: An example of summarized responses from two search techniques (DRIFT and Local Search) on a dataset of AP News articles to the query: “Describe what actions are being taken by the U.S. Food and Drug Administration and the Centers for Disease Control and Prevention to address the lead contamination in apple cinnamon fruit puree and applesauce pouches in the United States during November 2023”. As shown in the table, DRIFT search was able to surface details not immediately available with the two other approaches.

DRIFT Search: A step-by-step process

Primer: When a user submits a query, DRIFT compares it to the top K most semantically relevant community reports. This generates an initial answer along with several follow-up questions, which act as a lighter version of global search. To do this, we expand the query using Hypothetical Document Embeddings (HyDE), to increase sensitivity (recall), embed the query, look up the query against all community reports, select the top K and then use the top K to try to answer the query. The aim is to leverage high-level abstractions to guide further exploration.
Follow-Up: With the primer in place, DRIFT executes each follow-up using a local search variant. This yields additional intermediate answers and follow-up questions, creating a loop of refinement that continues until the search engine meets its termination criteria, which is currently configured for two iterations (further research will investigate reward functions to guide terminations). This phase represents a globally informed query refinement. Using global data structures, DRIFT navigates toward specific, relevant information within the knowledge graph even when the initial query diverges from the indexing persona. This follow-up process enables DRIFT to adjust its approach based on emerging information.
Output Hierarchy: The final output is a hierarchy of questions and answers ranked on their relevance to the original query. This hierarchical structure can be customized to fit specific user needs. During benchmark testing, a naive map-reduce approach aggregated all intermediate answers, with each answer weighted equally.

An image that shows a hierarchical tree with each node represented as a pie chart of weighting. — Figure 1. An entire DRIFT search hierarchy highlighting the three core phases of the DRIFT search process. **A (Primer):** DRIFT compares the user’s query with the top K most semantically relevant community reports, generating a broad initial answer and follow-up questions to steer further exploration. **B (Follow-Up):** DRIFT uses local search to refine queries, producing additional intermediate answers and follow-up questions that enhance specificity, guiding the engine towards context-rich information. A glyph on each node in the diagram shows the confidence the algorithm has to continue the query expansion step. **C (Output Hierarchy):** The final output is a hierarchical structure of questions and answers ranked by relevance, reflecting a balanced mix of global insights and local refinements, making the results adaptable and comprehensive.

Why DRIFT search is effective

DRIFT search excels by dynamically combining global insights with local refinement, enabling navigation from high-level summaries down to original text chunks within the knowledge graph. This layered approach ensures that detailed, context-rich information is preserved even when the initial query diverges from the persona used during indexing. By decomposing broad questions into fine-grained follow-ups, DRIFT captures granular details and adjusts based on the emerging context, making it adaptable to diverse query types. This makes it particularly effective when handling queries that require both breadth and depth without losing specific details.

Benchmarking DRIFT search

As shown, we tested the effectiveness of DRIFT search by performing a comparative analysis across a variety of use cases against GraphRAG local search and a highly tuned variant of semantic search methods. The analysis evaluated each method’s performance based on key metrics such as:

Comprehensiveness: Does the response answer all aspects of the question?
Diversity of responses: Does the response provide different perspectives and insights on the question?

In our results, DRIFT search provided significantly better results on both comprehensiveness and diversity in the metrics. We set up an experiment where we ingested 5K+ news articles from the Associated Press and ingested those articles using GraphRAG. After ingestion, we generated 50 “local” questions on this dataset and used both DRIFT and Local Search to generate answers for each of these questions. These “local” questions were questions that target specific details in the dataset that could be attributed to a small number of text units containing the answer. These answers were then used with an LLM judge to score for comprehensiveness and diversity.

On comprehensiveness, DRIFT search outperformed Local Search 78% of the time.
On diversity, DRIFT search outperformed Local Search 81% of the time.

Availability

DRIFT search is available now on the GraphRAG GitHub (opens in new tab).

Future research directions

A future version of DRIFT will incorporate an improved version of Global Search that will allow it to more directly address questions currently serviced best by global search. The hope is to then move towards a single query interface that can service questions of both local and global varieties. This work will further evolve DRIFT’s termination logic, potentially through a reward model that balances novel information with redundancy. Additionally, executing follow-up queries using either global or local search modes could improve efficiency. Some queries require broader data access, which can be achieved by leveraging a query router and a lite-global search variant that uses fewer community reports, tokens, and overall resources.

DRIFT search is the first of several major optimizations to GraphRAG that are being explored. It shows how a global index can even benefit local queries. In our future work, we plan to explore more approaches to bring greater efficiency to the system by leveraging the knowledge graph that GraphRAG creates.

The post Introducing DRIFT Search: Combining global and local search methods to improve quality and efficiency appeared first on Microsoft Research.

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

AWS offers powerful generative AI services, including Amazon Bedrock, which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. Many businesses want to integrate these cutting-edge AI capabilities with their existing collaboration tools, such as Google Chat, to enhance productivity and decision-making processes.

This post shows how you can implement an AI-powered business assistant, such as a custom Google Chat app, using the power of Amazon Bedrock. The solution integrates large language models (LLMs) with your organization’s data and provides an intelligent chat assistant that understands conversation context and provides relevant, interactive responses directly within the Google Chat interface.

This solution showcases how to bridge the gap between Google Workspace and AWS services, offering a practical approach to enhancing employee efficiency through conversational AI. By implementing this architectural pattern, organizations that use Google Workspace can empower their workforce to access groundbreaking AI solutions powered by Amazon Web Services (AWS) and make informed decisions without leaving their collaboration tool.

With this solution, you can interact directly with the chat assistant powered by AWS from your Google Chat environment, as shown in the following example.

Solution overview

We use the following key services to build this intelligent chat assistant:

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI
AWS Lambda, a serverless computing service, lets you handle the application logic, processing requests, and interaction with Amazon Bedrock
Amazon DynamoDB lets you store session memory data to maintain context across conversations
Amazon API Gateway lets you create a secure API endpoint for the custom Google Chat app to communicate with our AWS based solution.

The following figure illustrates the high-level design of the solution.

The workflow includes the following steps:

The process begins when a user sends a message through Google Chat, either in a direct message or in a chat space where the application is installed.
The custom Google Chat app, configured for HTTP integration, sends an HTTP request to an API Gateway endpoint. This request contains the user’s message and relevant metadata.
Before processing the request, a Lambda authorizer function associated with the API Gateway authenticates the incoming message. This verifies that only legitimate requests from the custom Google Chat app are processed.
After it’s authenticated, the request is forwarded to another Lambda function that contains our core application logic. This function is responsible for interpreting the user’s request and formulating an appropriate response.
The Lambda function interacts with Amazon Bedrock through its runtime APIs, using either the RetrieveAndGenerate API that connects to a knowledge base, or the Converse API to chat directly with an LLM available on Amazon Bedrock. This also allows the Lambda function to search through the organization’s knowledge base and generate an intelligent, context-aware response using the power of LLMs. The Lambda function also uses a DynamoDB table to keep track of the conversation history, either directly with a user or within a Google Chat space.
After receiving the generated response from Amazon Bedrock, the Lambda function sends this answer back through API Gateway to the Google Chat app.
Finally, the AI-generated response appears in the user’s Google Chat interface, providing the answer to their question.

This architecture allows for a seamless integration between Google Workspace and AWS services, creating an AI-driven assistant that enhances information accessibility within the familiar Google Chat environment. You can customize this architecture to connect other solutions that you develop in AWS to Google Chat.

In the following sections, we explain how to deploy this architecture.

Prerequisites

To implement the solution outlined in this post, you must have the following:

A Linux or MacOS development environment with at least 20 GB of free disk space. It can be a local machine or a cloud instance. If you use an AWS Cloud9 instance, make sure you have increased the disk size to 20 GB.
The AWS Command Line Interface (AWS CLI) installed on your development environment. This tool allows you to interact with AWS services through command line commands.
An AWS account and an AWS Identity and Access Management (IAM) principal with sufficient permissions to create and manage the resources needed for this application. If you don’t have an AWS account, refer to How do I create and activate a new Amazon Web Services account? To configure the AWS CLI with the associated credentials, typically, you set up an AWS access key ID and secret access key for a designated IAM user with appropriate permissions.
Request access to Amazon Bedrock FMs. In this post, we use either Anthropic’s Claude Sonnet 3 or Amazon Titan Text G1 Premier available in Amazon Bedrock, but you can also choose other models that are supported for Amazon Bedrock knowledge bases.
Optionally, an Amazon Bedrock knowledge base created in your account, which allows you to integrate your own documents into your generative AI applications. If you don’t have an existing knowledge base, refer to Create an Amazon Bedrock knowledge base. Alternatively, the solution proposes an option without a knowledge base, with answers generated only by the FM on the backend.
A Business or Enterprise Google Workspace account with access to Google Chat. You also need a Google Cloud project with billing enabled. To check that an existing project has billing enabled, see Verify the billing status of your projects.
Docker installed on your development environment.

Deploy the solution

The application presented in this post is available in the accompanying GitHub repository and provided as an AWS Cloud Development Kit (AWS CDK) project. Complete the following steps to deploy the AWS CDK project in your AWS account:

Clone the GitHub repository on your local machine.
Install the Python package dependencies that are needed to build and deploy the project. This project is set up like a standard Python project. We recommend that you create a virtual environment within this project, stored under the .venv. To manually create a virtual environment on MacOS and Linux, use the following command:
```
python3 -m venv .venv
```

After the initialization process is complete and the virtual environment is created, you can use the following command to activate your virtual environment:
```
source .venv/bin/activate
```

Install the Python package dependencies that are needed to build and deploy the project. In the root directory, run the following command:
```
pip install -r requirements.txt
```

Run the cdk bootstrap command to prepare an AWS environment for deploying the AWS CDK application.
Run the script init-script.bash:

chmod u+x init-script.bash
./init-script.bash

This script prompts you for the following:

The Amazon Bedrock knowledge base ID to associate with your Google Chat app (refer to the prerequisites section). Keep this blank if you decide not to use an existing knowledge base.
Which LLM you want to use in Amazon Bedrock for text generation. For this solution, you can choose between Anthropic’s Claude Sonnet 3 or Amazon Titan Text G1 – Premier

The following screenshot shows the input variables to the init-script.bash script.

The script deploys the AWS CDK project in your account. After it runs successfully, it outputs the parameter ApiEndpoint, whose value designates the invoke URL for the HTTP API endpoint deployed as part of this project. Note the value of this parameter because you use it later in the Google Chat app configuration.
The following screenshot shows the output of the init-script.bash script.

You can also find this parameter on the AWS CloudFormation console, on the stack’s Outputs tab.

Register a new app in Google Chat

To integrate the AWS powered chat assistant into Google Chat, you create a custom Google Chat app. Google Chat apps are extensions that bring external services and resources directly into the Google Chat environment. These apps can participate in direct messages, group conversations, or dedicated chat spaces, allowing users to access information and take actions without leaving their chat interface.

For our AI-powered business assistant, we create an interactive custom Google Chat app that uses the HTTP integration method. This approach allows our app to receive and respond to user messages in real time, providing a seamless conversational experience.

After you have deployed the AWS CDK stack in the previous section, complete the following steps to register a Google Chat app in the Google Cloud portal:

Open the Google Cloud portal and log in with your Google account.
Search for “Google Chat API” and navigate to the Google Chat API page, which lets you build Google Chat apps to integrate your services with Google Chat.
If this is your first time using the Google Chat API, choose ACTIVATE. Otherwise, choose MANAGE.
On the Configuration tab, under Application info, provide the following information, as shown in the following screenshot:
1. For App name, enter an app name (for example, bedrock-chat).
2. For Avatar URL, enter the URL for your app’s avatar image. As a default, you can provide the Google chat product icon.
3. For Description, enter a description of the app (for example, Chat App with Amazon Bedrock).

Under Interactive features, turn on Enable Interactive features.
Under Functionality, select Receive 1:1 messages and Join spaces and group conversations, as shown in the following screenshot.

Under Connection settings, provide the following information:
1. Select App URL.
2. For App URL, enter the Invoke URL associated with the deployment stage of the HTTP API gateway. This is the ApiEndpoint parameter that you noted at the end of the deployment of the AWS CDK template.
3. For Authentication Audience, select App URL, as shown in the following screenshot.

Under Visibility, select Make this Chat app available to specific people and groups in <your-company-name> and provide email addresses for individuals and groups who will be authorized to use your app. You need to add at least your own email if you want to access the app.

Choose Save.

The following animation illustrates these steps on the Google Cloud console.

By completing these steps, the new Amazon Bedrock chat app should be accessible on the Google Chat console for the persons or groups that you authorized in your Google Workspace.

To dispatch interaction events to the solution deployed in this post, Google Chat sends requests to your API Gateway endpoint. To verify the authenticity of these requests, Google Chat includes a bearer token in the Authorization header of every HTTPS request to your endpoint. The Lambda authorizer function provided with this solution verifies that the bearer token was issued by Google Chat and targeted at your specific app using the Google OAuth client library. You can further customize the Lambda authorizer function to implement additional control rules based on User or Space objects included in the request from Google Chat to your API Gateway endpoint. This allows you to fine-tune access control, for example, by restricting certain features to specific users or limiting the app’s functionality in particular chat spaces, enhancing security and customization options for your organization.

Converse with your custom Google Chat app

You can now converse with the new app within your Google Chat interface. Connect to Google Chat with an email that you authorized during the configuration of your app and initiate a conversation by finding the app:

Choose New chat in the chat pane, then enter the name of the application (bedrock-chat) in the search field.
Choose Chat and enter a natural language phrase to interact with the application.

Although we previously demonstrated a usage scenario that involves a direct chat with the Amazon Bedrock application, you can also invoke the application from within a Google chat space, as illustrated in the following demo.

Customize the solution

In this post, we used Amazon Bedrock to power the chat-based assistant. However, you can customize the solution to use a variety of AWS services and create a solution that fits your specific business needs.

To customize the application, complete the following steps:

Edit the file lambda/lambda-chat-app/lambda-chatapp-code.py in the GitHub repository you cloned to your local machine during deployment.
Implement your business logic in this file.

The code runs in a Lambda function. Each time a request is processed, Lambda runs the lambda_handler function:

def lambda_handler(event, context):
    if event['requestContext']['http']['method'] == 'POST':
        # A POST request indicates a Google Chat App Event sent by the application        
        data = json.loads(event['body'])
        # Invoke handle_post function that includes the logic to process Google chat app events
        response = handle_post(data)
        return { 'text': response }
    else:
        return {
            'statusCode': 405,
            'body': json.dumps("Method Not Allowed. This function must be called from Google Chat.")
        }

When Google Chat sends a request, the lambda_handler function calls the handle_post function.

Let’s replace the handle_post function with the following code:

def handle_post(data):
    if data['type'] == 'MESSAGE':
        user_message = data['message']['text']  
        space_name = data['space']['name']
        return f"Hello! You said: {user_message}nThe space name is: {space_name}"

Save your file, then run the following command in your terminal to deploy your new code:

cdk deploy

The deployment should take about a minute. When it’s complete, you can go to Google Chat and test your new business logic. The following screenshot shows an example chat.

As the image shows, your function gets the user message and a space name. You can use this space name as a unique ID for the conversation, which lets you to manage history.

As you become more familiar with the solution, you may want to explore advanced Amazon Bedrock features to significantly expand its capabilities and make it more robust and versatile. Consider integrating Amazon Bedrock Guardrails to implement safeguards customized to your application requirements and responsible AI policies. Consider also expanding the assistant’s capabilities through function calling, to perform actions on behalf of users, such as scheduling meetings or initiating workflows. You could also use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual builder. For more advanced interactions, you could explore implementing Amazon Bedrock Agents capable of reasoning about complex problems, making decisions, and executing multistep tasks autonomously.

Performance optimization

The serverless architecture used in this post provides a scalable solution out of the box. As your user base grows or if you have specific performance requirements, there are several ways to further optimize performance. You can implement API caching to speed up repeated requests or use provisioned concurrency for Lambda functions to eliminate cold starts. To overcome API Gateway timeout limitations in scenarios requiring longer processing times, you can increase the integration timeout on API Gateway, or you might replace it with an Application Load Balancer, which allows for extended connection durations. You can also fine-tune your choice of Amazon Bedrock model to balance accuracy and speed. Finally, Provisioned Throughput in Amazon Bedrock lets you provision a higher level of throughput for a model at a fixed cost.

Clean up

In this post, you deployed a solution that lets you interact directly with a chat assistant powered by AWS from your Google Chat environment. The architecture incurs usage cost for several AWS services. First, you will be charged for model inference and for the vector databases you use with Amazon Bedrock Knowledge Bases. AWS Lambda costs are based on the number of requests and compute time, and Amazon DynamoDB charges depend on read/write capacity units and storage used. Additionally, Amazon API Gateway incurs charges based on the number of API calls and data transfer. For more details about pricing, refer to Amazon Bedrock pricing.

There might also be costs associated with using Google services. For detailed information about potential charges related to Google Chat, refer to the Google Chat product documentation.

To avoid unnecessary costs, clean up the resources created in your AWS environment when you’re finished exploring this solution. Use the cdk destroy command to delete the AWS CDK stack previously deployed in this post. Alternatively, open the AWS CloudFormation console and delete the stack you deployed.

Conclusion

In this post, we demonstrated a practical solution for creating an AI-powered business assistant for Google Chat. This solution seamlessly integrates Google Workspace with AWS hosted data by using LLMs on Amazon Bedrock, Lambda for application logic, DynamoDB for session management, and API Gateway for secure communication. By implementing this solution, organizations can provide their workforce with a streamlined way to access AI-driven insights and knowledge bases directly within their familiar Google Chat interface, enabling natural language interaction and data-driven discussions without the need to switch between different applications or platforms.

Furthermore, we showcased how to customize the application to implement tailored business logic that can use other AWS services. This flexibility empowers you to tailor the assistant’s capabilities to their specific requirements, providing a seamless integration with your existing AWS infrastructure and data sources.

AWS offers a comprehensive suite of cutting-edge AI services to meet your organization’s unique needs, including Amazon Bedrock and Amazon Q. Now that you know how to integrate AWS services with Google Chat, you can explore their capabilities and build awesome applications!

About the Authors

Nizar Kheir is a Senior Solutions Architect at AWS with more than 15 years of experience spanning various industry segments. He currently works with public sector customers in France and across EMEA to help them modernize their IT infrastructure and foster innovation by harnessing the power of the AWS Cloud.

Lior Perez is a Principal Solutions Architect on the construction team based in Toulouse, France. He enjoys supporting customers in their digital transformation journey, using big data, machine learning, and generative AI to help solve their business challenges. He is also personally passionate about robotics and Internet of Things (IoT), and he constantly looks for new ways to use technologies for innovation.

Discover insights from Gmail using the Gmail connector for Amazon Q Business

A number of organizations use Gmail for their business email needs. Gmail for business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Gmail, and Google Calendar. Google Drive supports storing documents such as Emails contain a wealth of information found in different places, such as within the subject of an email, the message content, or even attachments. Performing an intelligent search on emails with co-workers can help you find answers to questions, improving productivity and enhancing the overall customer experience for the organization.

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

Amazon Q Business enables users in various roles, such as marketers, project managers, and sales representatives, to have tailored conversations, solve problems, generate content, take action, and more, all through a web-based interface. This tool aims to make employees work smarter, move faster, and drive more significant impact by providing immediate and relevant information and streamlining tasks.

With the Gmail connector for Amazon Q Business, you can enhance productivity and streamline communication processes within your organization. This integration empowers you to use advanced search capabilities and intelligent email management using natural language.

In this post, we guide you through the process of setting up the Gmail connector, enabling seamless interaction between Gmail and Amazon Q Business. Whether you’re a small startup or a large enterprise, this solution can help you maximize the potential of your Gmail data and empower your team with actionable insights.

Finding accurate answers from content in Gmail mailbox using Amazon Q Business

After you integrate Amazon Q Business with Gmail, you can ask a question and Amazon Q Business can index through your mailbox and find relevant answers. For example, you can make the following queries:

Natural language search – You can search for emails and attachments within your mailbox using natural language, making it effortless to find your desired information without having to remember specific keywords or filters
Summarization – You can request a concise summary of the conversations and attachments matching your search query, allowing you to quickly grasp the key points without having to manually sift through individual items
Query clarification – If your query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, so you receive the most relevant and accurate results

Overview of the Gmail connector for Amazon Q Business

To crawl and index contents in Gmail, you can configure the Gmail connector for Amazon Q Business as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your email data. After you set up the connector, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing emails from your Gmail account.

Types of documents

Gmail messages can be sorted and stored inside your email inbox using folders and labels.

Let’s looks at what are considered as documents in the context of the Gmail connector for Amazon Q Business. The connector supports the crawling of the following entities in Gmail:

Email – Each email is considered a single document
Attachment – Each email attachment is considered a single document

Additionally, supported custom metadata and custom objects are also crawled during the sync process.

The Gmail connector for Amazon Q Business also supports the indexing of a rich set of metadata from the various entities in Gmail. It further provides the ability to map these source metadata fields to Amazon Q index fields for indexing. These field mappings allow you to map Gmail field names to Amazon Q index field names. There are three types of metadata fields that Amazon Q connectors support:

Default fields – These are required with each document, such as the title, creation date, or author
Optional fields – These are provided by the data source, and the administrator can optionally choose one or more of these fields if they contain important and relevant information to produce accurate answers
Custom metadata fields – These are fields created in the data source in addition to what the data source already provides

Refer to Gmail data source connector field mappings for more information.

Authentication

Before we index the content from Gmail, we need to first establish a secure connection between the Gmail connector for Amazon Q Business with your Google service account. To establish a secure connection, we need to authenticate with the data source.

The connector supports authentication using a Google service account. We describe the process of creating an account later in this post. For more information about authentication, see Gmail connector overview.

Secure querying with ACL crawling and identity crawling

Secure querying is when a user runs a query and is returned answers only from documents that the user has access to. To enable users to do secure querying, Amazon Q Business honors the access control lists (ACLs) of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public. Additionally, the user’s credentials (email address) are passed along with the query so that answers from documents that are relevant and which user is authorized to access are displayed.

When connecting a Gmail data source, Amazon Q Business crawls the ACL information attached to a document (user and group information) from your Gmail instance. In Gmail, user IDs are mapped to _user_id. User IDs exist in Gmail on files with set access permissions. They’re mapped from the user emails as the IDs in Gmail.

When a user logs in to a web application to conduct a search, the user’s credentials, such as an email address, need to match what is in the ACL of the document to return results from that document. The web application that the user uses to retrieve answers is connected to an identity provider (IdP) or AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to.

Refer to How Amazon Q Business connector crawls Gmail ACLs for more information.

Solution overview

In the following sections, we demonstrate how to set up the Gmail connector for Amazon Q Business. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.

In our solution, we index emails from Gmail by configuring the Gmail data source connector. This connector allows you to query your Gmail data using Amazon Q Business as your query engine.

After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Gmail account to keep up to date with the email content. This process makes sure that your email interactions are systematically updated within Amazon Q Business, enabling you to query and uncover valuable insights from your Gmail data.

The following diagram illustrates the solution architecture. Google Workspace is the data source. Emails and attachments along with the ACL information are passed to Amazon Q Business from the Google workspace. The user submits a query to the Amazon Q Business application. Amazon Q Business retrieves the ACL of the user and provides answers based on the emails and attachments that the user has access to.

Prerequisites

You should have the following:

An Amazon Q Business application. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.
A Google Workspace account and an organization for your business with one or many users that have access to Gmail.
Administrator account credentials to Google Workspace and the Google Cloud console.
Access to AWS Secrets Manager.
Privileges to create a new Amazon Q application (or add data sources to existing applications), AWS resources, and AWS Identity and Access Management (IAM) roles and policies.

Configure the Gmail connector for an Amazon Q Business application

To enable Amazon Q Business to access and index emails from Gmail accounts within the organization, it’s essential to configure the organization’s Google workspace. In the steps that follow, we create a service account that will be used by the Gmail connector for Amazon Q Business to index emails.

We provide the service account with authorization scopes to allow access to the required Gmail APIs. The authorization scopes express the permissions you request users to authorize for your application and are applicable to emails within your organization’s Google workspace.

Complete the following steps:

Log in to your organization’s Google Cloud account.
Create a new project with an appropriate name and assign it to your organization. In our example, we name the project GmailConnector.
Choose Create.

After you create the project, on the navigation menu, choose APIs and Services and Library to view the API Library.

On the API Library page, search for and choose Admin SDK API.

The Admin SDK API enables managing the Google workspace account resources and audit usage.

Choose Enable.

Similarly, search for the Gmail API on the API Library

The Gmail API can help in viewing and managing the Gmail mailbox data like threads, messages, and labels.

Choose Enable to enable this API.

We now create a service account. The service account will be used by the Amazon Q Business Gmail data source connector to access the organization’s emails based on the allowed API scope.

On the navigation menu, choose IAM and Admin and Service accounts.

Choose Create service account.

Name the service account Amazon-q-integration-gmail, enter a description, and choose Create and continue.
Skip the optional sections Grant this service account access to project and Grant users access to this service account.
Choose Done.

Choose the service account you created to navigate to the service account details page.
Note the unique ID for the service account—the unique ID is also known as the client ID, and will be used in later steps.

Next, we create the keys for the service account, which will allow it to be used by the Gmail connector for Amazon Q Business.

On the Keys tab, choose Add key and Create new key.

When prompted for the key type, select the recommended option JSON and choose Create.

This will download the private key to your computer, which must be kept safe to allow configuration within the Amazon Q console. The following screenshot shows an example of the credentials JSON file.

On the Details tab, expand the Advanced settings section and choose View Google Workspace Admin console in the Domain-wide Delegation

Granting access to the service account using a domain-wide delegation to your organization’s data must be treated as a privileged operation and done with caution. You can reverse the access grant by disabling or deleting the service account or removing access through the Google Workspace Admin console.

Use the Google Workspace Admin credentials to log in to the Google Workspace Admin console.
Under Security on the navigation menu, under Access and data control, choose API controls.
In the Domain-wide delegation section, choose Manage domain-wide delegation.

Choose Add new.

In the Add a new client ID dialog, enter the unique ID for the service account you created.
Enter the following scopes to allow the service account to access the emails from Gmail:
- https://www.googleapis.com/auth/gmail.readonly – This scope allows to you to view your email messages and settings.
- https://www.googleapis.com/auth/admin.directory.user.readonly – This scope allows to see and download your organization’s Google Workspace directory.

For more details about all the scopes available, refer to OAuth 2.0 Scopes for Google APIs.

Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace Admin console.

Create the Gmail connector for an Amazon Q Business application

This post assumes that an Amazon Q Business application has already been created beforehand. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.

Complete the following steps to configure the connector:

On the Amazon Q Business console, choose Applications in the navigation pane.
Select the application that you want to add the Gmail connector to.
On the Actions menu, choose Edit.

On the Update application page, leave all values unchanged and choose Update.

On the Update retriever page, leave all values as default and choose Next.

On the Connect data sources page, on the All tab, search for Gmail in the search field.
Choose the plus sign next to Gmail, which will open up a page to set up the data source.

In the Name and description section, enter a name and description.

In the Authentication section, choose Create and add new secret.

In the Create an AWS Secrets Manager secret pop-up, provide the following information:
- Enter a name for your Secrets Manager secret.
- For Client email and Private key, refer to the JSON file that you downloaded to your local machine earlier.
- For Admin account email, enter the admin account for your Google
- For Private key, enter the private key details.
- Choose Save.

In the IAM role section, for IAM role, choose Create a new service role (recommended).

In the Sync scope section, select Message attachments and enter a value for Maximum file size.
Optionally, configure the following under Additional configuration (we leave everything as default for this post):
- For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
- For Email domains, enter the email from domains, email to domains, subject, CC emails, and BCC emails you want to include or exclude in your index.
- For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects
- For Labels, add regular expression patterns to include or exclude certain labels or attachment types. You can add up to 100 patterns.
- For Attachments, add regular expression patterns to include or exclude certain attachments. You can add up to 100 patterns.

In the Sync mode section, select New, modified, or deleted content sync.
In the Sync run schedule section, choose the frequency that works best for your use case. For this post, we choose Run on demand.

Choose Add data source and wait for the retriever to be created.

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

Verify your data source is added and choose Next.

On the Update groups and users page, choose Add groups and users.

The users and groups that you add in this section are from the IAM Identity Center users and groups set up by your administrator.

In the Add or assign users and groups pop-up window, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center, then choose Next.

Optionally, if you have permissions to add users to connected IAM Identity Center, you can select Add new users.

Choose Get started.

Search for users by user display name or groups by group name.
Choose the users or groups you want you add and choose Assign.

The groups and users that you added should now be available on the Groups or Users tabs.

Choose Assign.

For each group or user entry, an Amazon Q Business subscription tier needs to be assigned.

To enable a subscription for a group, on the Update groups and users page, choose the Groups tab (if individual users need to be assigned a subscription, choose the Users tab).
Under the Subscription column, select Choose subscription and choose a subscription (Q Business Lite or Q Business Pro).
Choose Update application to complete adding and setting up the Gmail connector for Amazon Q Business.

Configure Gmail field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

On the Amazon Q Business console, choose your application.
Under Data sources, select your data source.
On the Actions menu, choose Edit.

In the Field mappings section, select the required fields to crawl under Messages and Message attachments and any types that are available.

The Gmail connector setup for Amazon Q Business is now complete.

To test the connectivity to Gmail and initiate the data synchronization, choose Sync now. The initial sync process may take several minutes to complete.

When the sync is complete, in the Sync run history section, you can see the sync status along with a summary of how may total items were added, deleted, modified, and failed during the sync process.

Query Gmail data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize as per your needs.

You can customize the Title, Subtitle, and Welcome message fields according to your needs, which will be reflected in the UI.

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

Log in to the application using the credentials for the user that were added to the Amazon Q application. After the login is successful, you’re redirected to the Amazon Q assistant UI, where you can ask questions using natural language and get insights from your Gmail index.

The Gmail data source connected to this Amazon Q Business application has email and Gmail attachments. We demonstrate how the Amazon Q application lets you ask questions on your email using natural language and receive responses and insights for those queries.

Let’s begin by asking Amazon Q to summarize key points from Matt Garma’s (CEO of AWS) email. The following screenshot displays the response and it also includes the email source from where it is generating the response.

For our next example, let’s ask Amazon Q to provide details about return issue customer is facing for a bicycle order they placed with Amazon. Following screenshot shows the details about the issue being faced by the customer and includes the email source from where Amazon Q is generating the response.

Troubleshooting

Troubleshooting your Amazon Q Business Gmail connector provides information about error codes you might see for the Gmail connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. . See Troubleshooting Amazon Q Business and identity provider integration for common causes and how to address them.

Frequently asked questions

In this section, we provide guidance to frequently asked questions.

Amazon Q Business is unable to answer your questions

This could happen due to a several reasons:

No permissions – ACLs applied to your account doesn’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
Data connector sync failed – The data connector might have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If neither of these reasons are true in your case, open a support case to get this resolved.

How to generate responses from authoritative data sources

You can configure these options using Amazon Q Business application global controls under Admin controls and guardrails.

Log in as an Amazon Q Business application administrator.
Navigate to the application and choose Admin controls and guardrails in the navigation pane.
Choose Edit in the Global controls section to control these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with unique sync run schedule frequency. Verify the sync status and sync schedule frequency for your data connector to see when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be run manually. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new issues. Refer to Sync run schedule for more information on each option.

How to set up Amazon Q Business using a different IdP

You can set up Amazon Q Business with another SAML 2.0-compliant IdP, such as Okta, Entra ID, or Ping Identity. For more information, see Creating an Amazon Q Business application using Identity Federation through IAM.

Expand the solution

You can explore other features in Amazon Q Business. For example, the Amazon Q Business document enrichment feature helps you control both which documents and document attributes are ingested into your index and how they’re ingested. With document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. For example, you can scrub personally identifiable information (PII) by choosing to delete any document attributes related to PII.

Amazon Q Business also offers the following features:

Filtering using metadata – Use document attributes to customize and control users’ chat experience. This is currently supported only if you use the Amazon Q Business API.
Source attribution with citations – Verify responses using Amazon Q Business source attributions.
Upload files and chat – Let users upload files directly into chat and use uploaded file data to perform web experience tasks.
Quick prompts – Feature sample prompts to inform users of the capabilities of their Amazon Q Business web experience.

To improve retrieved results and customize the user chat experience, you can map document attributes from your data sources to fields in your Amazon Q index. To learn more, see Gmail data source connector field mappings.

Clean up

To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q application:

On the Amazon Q console, choose Applications in the navigation pane.
Select the dashboard you created.
On the Actions menu, choose Delete.
Delete the IAM roles created for the application and data retriever.
If you used IAM Identity Center for this walkthrough, delete your IAM Identity Center instance.

Conclusion

In this post, we discussed how to configure the Gmail connector for Amazon Q Business and use the AI-powered chat interface to gain insights from the connected data source.

To learn more about the Gmail connector for Amazon Q Business, refer to Connecting Gmail to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.

About the Authors

Divyajeet (DJ) Singh is a Sr. Solutions Architect at AWS Canada. He loves working with customers to help them solve their unique business challenges using the cloud. In his free time, he enjoys spending time with family and friends, and exploring new places.

Temi Aremu is a Solutions Architect at AWS Canada. She is passionate about helping customers solve their business problems with the power of the AWS Cloud. Temi’s areas of interest are analytics, machine learning, and empowering the next generation of women in STEM.

Vineet Kachhawaha is a Sr. Solutions Architect at AWS focusing on AI/ML and generative AI. He co-leads the AWS for Legal Tech team within AWS. He is passionate about working with enterprise customers and partners to design, deploy, and scale AI/ML applications to derive business value.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Dipti Kulkarni is a Software Development Manager on the Amazon Q and Amazon Kendra engineering team of Amazon Web Services, where she manages the connector development and integration teams.

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

Amazon SageMaker Ground Truth enables the creation of high-quality, large-scale training datasets, essential for fine-tuning across a wide range of applications, including large language models (LLMs) and generative AI. By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. Whether it’s annotating images, videos, or text, SageMaker Ground Truth allows you to build accurate datasets while maintaining human oversight and feedback at scale. This human-in-the-loop approach is crucial for aligning foundation models with human preferences, enhancing their ability to perform tasks tailored to your specific requirements.

To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation. Additionally, it offers the flexibility to create custom workflows, enabling you to design your own UI templates for specialized data labeling tasks, tailored to your unique requirements.

Previously, setting up a custom labeling job required specifying two AWS Lambda functions: a pre-annotation function, which is run on each dataset object before it’s sent to workers, and a post-annotation function, which is run on the annotations of each dataset object and consolidates multiple worker annotations if needed. Although these functions offer valuable customization capabilities, they also add complexity for users who don’t require additional data manipulation. In these cases, you would have to write functions that merely returned your input unchanged, increasing development effort and the potential for errors when integrating the Lambda functions with the UI template and input manifest file.

Today, we’re pleased to announce that you no longer need to provide pre-annotation and post-annotation Lambda functions when creating custom SageMaker Ground Truth labeling jobs. These functions are now optional on both the SageMaker console and the CreateLabelingJob API. This means you can create custom labeling workflows more efficiently when you don’t require extra data processing.

In this post, we show you how to set up a custom labeling job without Lambda functions using SageMaker Ground Truth. We guide you through configuring the workflow using a multimodal content evaluation template, explain how it works without Lambda functions, and highlight the benefits of this new capability.

Solution overview

When you omit the Lambda functions in a custom labeling job, the workflow simplifies:

No pre-annotation function – The data from the input manifest file is inserted directly into the UI template. You can reference the data object fields in your template without needing a Lambda function to map them.
No post-annotation function – Each worker’s annotation is saved directly to your specified Amazon Simple Storage Service (Amazon S3) bucket as an individual JSON file, with the annotation stored under a worker-response key. Without a post-annotation Lambda function, the output manifest file references these worker response files instead of including all annotations directly within the manifest.

In the following sections, we walk through how to set up a custom labeling job without Lambda functions using a multimodal content evaluation template, which allows you to evaluate model-generated descriptions of images. Annotators can review an image, a prompt, and the model’s response, then evaluate the response based on criteria such as accuracy, relevance, and clarity. This provides crucial human feedback for fine-tuning models using Reinforcement Learning from Human Feedback (RLHF) or evaluating LLMs.

Prepare the input manifest file

To set up our labeling job, we begin by preparing the input manifest file that the template will use. The input manifest is a JSON Lines file where each line represents a dataset item to be labeled. Each line contains a source field for embedded data or a source-ref field for references to data stored in Amazon S3. These fields are used to provide the data objects that annotators will label. For detailed information on the input manifest file structure, refer to Input manifest files.

For our specific task—evaluating model-generated descriptions of images—we structure the input manifest to include the following fields:

“source” – The prompt provided to the model
“image” – The S3 URI of the image associated with the prompt
“modelResponse” – The model’s generated description of the image

By including these fields, we’re able to present both the prompt and the related data directly to the annotators within the UI template. This approach eliminates the need for a pre-annotation Lambda function because all necessary information is readily accessible in the manifest file.

The following code is an example of what a line in our input manifest might look like:

{
  "source": "Describe the following image in four lines",
  "image": "s3://your-bucket-name/path-to-image/image.jpeg",
  "modelResponse": "The image features a stylish pair of over-ear headphones with cushioned ear cups and a tan leather headband on a wooden desk. Soft natural light fills a cozy home office, with a laptop, smartphone, and notebook nearby. A cup of coffee and a pen add to the workspace's relaxed vibe. The setting blends modern tech with a warm, inviting atmosphere."
}

Insert the prompt in the UI template

In your UI template, you can insert the prompt using {{ task.input.source }}, display the image using an <img> tag with src="{{ task.input.image | grant_read_access }}" (the grant_read_access Liquid filter provides the worker with access to the S3 object), and show the model’s response with {{ task.input.modelResponse }}. Annotators can then evaluate the model’s response based on predefined criteria, such as accuracy, relevance, and clarity, using tools like sliders or text input fields for additional comments. You can find the complete UI template for this task in our GitHub repository.

Create the labeling job on the SageMaker console

To configure the labeling job using the AWS Management Console, complete the following steps:

On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling job.
Choose Create labeling job.
Specify your input manifest location and output path.
Select Custom as the task type.
Choose Next.
Enter a task title and description.
Under Template, upload your UI template.

The annotation Lambda functions are now an optional setting under Additional configuration.

Choose Preview to display the UI template for review.

Choose Create to create the labeling job.

Create the labeling job using the CreateLabelingJob API

You can also create the custom labeling job programmatically by using the AWS SDK to invoke the CreateLabelingJob API. After uploading the input manifest files to an S3 bucket and setting up a work team, you can define your labeling job in code, omitting the Lambda function parameters if they’re not needed. The following example demonstrates how to do this using Python and Boto3.

In the API, the pre-annotation Lambda function is specified using the PreHumanTaskLambdaArn parameter within the HumanTaskConfig structure. The post-annotation Lambda function is specified using the AnnotationConsolidationLambdaArn parameter within the AnnotationConsolidationConfig structure. With the recent update, both PreHumanTaskLambdaArn and AnnotationConsolidationConfig are now optional. This means you can omit them if your labeling workflow doesn’t require additional data preprocessing or postprocessing.

The following code is an example of how to create a labeling job without specifying the Lambda functions:

response = sagemaker.create_labeling_job(
    LabelingJobName="Lambda-free-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://customer-bucket/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://customer-bucket/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/CustomerRole",

    # Notice, no PreHumanTaskLambdaArn or AnnotationConsolidationConfig!
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-west-2:058264523720:workteam/private-crowd/customer-work-team-name",
        "TaskDescription": " Evaluate model-generated text responses based on a reference image.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": " Evaluate Model Responses Based on Image References",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://customer-bucket/path-to-ui-template"
        }
    }
)

When the annotators submit their evaluations, their responses are saved directly to your specified S3 bucket. The output manifest file includes the original data fields and a worker-response-ref that points to a worker response file in S3. This worker response file contains all the annotations for that data object. If multiple annotators have worked on the same data object, their individual annotations are included within this file under an answers key, which is an array of responses. Each response includes the annotator’s input and metadata such as acceptance time, submission time, and worker ID.

This means that all annotations for a given data object are collected in one place, allowing you to process or analyze them later according to your specific requirements, without needing a post-annotation Lambda function. You have access to all the raw annotations and can perform any necessary consolidation or aggregation as part of your post-processing workflow.

Benefits of labeling jobs without Lambda functions

Creating custom labeling jobs without Lambda functions offers several benefits:

Simplified setup – You can create custom labeling jobs more quickly by skipping the creation and configuration of Lambda functions when they’re not needed.
Time savings – Reducing the number of components in your labeling workflow saves development and debugging time.
Reduced complexity – Fewer moving parts mean a lower chance of encountering configuration errors or integration issues.
Cost reduction – By not using Lambda functions, you reduce the associated costs of deploying and invoking these resources.
Flexibility – You retain the ability to use Lambda functions for preprocessing and annotation consolidation when your project requires these capabilities. This update offers simplicity for straightforward tasks and flexibility for more complex requirements.

This feature is currently available in all AWS Regions that support SageMaker Ground Truth. In the future, look out for built-in task types that don’t require annotation Lambda functions, providing a simplified experience for SageMaker Ground Truth across the board.

Conclusion

The introduction of workflows for custom labeling jobs in SageMaker Ground Truth without Lambda functions significantly simplifies the data labeling process. By making Lambda functions optional, we’ve made it simpler and faster to set up custom labeling jobs, reducing potential errors and saving valuable time.

This update maintains the flexibility of custom workflows while removing unnecessary steps for those who don’t require specialized data processing. Whether you’re conducting simple labeling tasks or complex multi-stage annotations, SageMaker Ground Truth now offers a more streamlined path to high-quality labeled data.

We encourage you to explore this new feature and see how it can enhance your data labeling workflows. To get started, check out the following resources:

Browse over 80 available UI templates to suit your labeling needs on GitHub
Follow the step-by-step guide on creating custom labeling workflows to tailor your data labeling tasks

About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Alan Ismaiel is a software engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products, like Amazon SageMaker Ground Truth and Amazon Bedrock Model Evaluation. Outside of work, Alan is learning how to play pickleball, with mixed results.

Yinan Lang is a software engineer at AWS GroundTruth. He worked on GroundTruth, MechanicalTurk and Bedrock infrastructure, as well as customer facing projects for GroundTruth Plus. He also focuses on product security and worked on fixing risks and creating security tests. In leisure time, he is an audiophile and particularly loves to practice keyboard compositions by Bach.

George King is a summer 2024 intern at Amazon AI. He studies Computer Science and Math at the University of Washington and is currently between his second and third year. George loves being outdoors, playing games (chess and all kinds of card games), and exploring Seattle, where he has lived his entire life.

Scale New Heights With ‘Dragon Age: The Veilguard’ in the Cloud on GeForce NOW

Even post-spooky season, GFN Thursday has some treats for GeForce NOW members: a new batch of 17 games joining the cloud in November.

Catch the five games available to stream this week, including Dragon Age: The Veilguard, the highly anticipated next installment in BioWare’s beloved fantasy role-playing game series. Players who purchased the GeForce NOW Ultimate bundle can stream the game at launch for free starting today.

Unite the Veilguard

Dragon Age: The Veilguard on GeForce NOW — *What’s your dragon age?*

In Dragon Age: The Veilguard, take on the role of Rook and stop a pair of corrupt ancient gods who’ve broken free from centuries of darkness, hellbent on destroying the world. Set in the rich world of Thedas, the game includes an epic story with meaningful choices, deep character relationships, and a mix of familiar and new companions to go on adventures with.

Select from three classes, each with distinct weapon types, and harness the classes’ unique, powerful abilities while coordinating with a team of seven companions, who have their own rich lives and deep backstories. An expansive skill-tree system allows for diverse character builds across the Warrior, Rogue and Mage classes.

Experience the adventure in the vibrant world of Thedas with enhanced visual fidelity and performance by tapping into a GeForce NOW membership. Performance members can enjoy the game at up to 1440p resolution and 60 frames per second (fps). Ultimate members can take advantage of 4K resolution, up to 120 fps and advanced features like NVIDIA DLSS 3, low-latency gameplay with NVIDIA Reflex, and enhanced image quality and immersion with ray-traced ambient occlusion and reflections, even on low-powered devices.

‘Resident Evil 4’ in the Cloud

RE4 on GeForce NOW — *Stream it from the cloud to survive.*

Capcom’s Resident Evil 4 is now available on GeForce NOW, bringing the horror to cloud gaming.

Survival is just the beginning. Six years have passed since the biological disaster in Raccoon City.

Agent Leon S. Kennedy, one of the incident’s survivors, has been sent to rescue the president’s kidnapped daughter. The agent tracks her to a secluded European village, where there’s something terribly wrong with the locals. The curtain rises on this story of daring rescue and grueling horror where life and death, terror and catharsis intersect.

Featuring modernized gameplay, a reimagined storyline and vividly detailed graphics, Resident Evil 4 marks the rebirth of an industry juggernaut. Relive the nightmare that revolutionized survival horror, with stunning high-dynamic-range visuals and immersive ray-tracing technology for Performance and Ultimate members.

Life Is Great With New Games

The Division 2 Y6S2 on GeForce NOW — *Time to gear up, agents.*

A new season for “Year 6” in Tom Clancy’s The Division 2 from Ubisoft is now available for members to stream. In Shades of Red, rogue ex-Division agent Aaron Keener has given himself up and is now in custody at the White House. The Division must learn what he knows to secure the other members of his team. New Seasonal Modifiers change gameplay and gear usage for players. A new revamped progression is also available. The Seasonal Journey comprises a series of missions, each containing a challenge-style objective for players to complete.

Look for the following games available to stream in the cloud this week:

Life Is Strange: Double Exposure (New release on Steam and Xbox, available in the Microsoft store, Oct. 29)
Dragon Age: The Veilguard (New release on Steam and EA App, Oct. 31)
Resident Evil 4 (Steam)
Resident Evil 4 Chainsaw Demo (Steam)
VRChat (Steam)

Here’s what members can expect for the rest of November:

Metal Slug Tactics (New release on Steam, Nov. 5)
Planet Coaster 2 (New release on Steam, Nov. 6)
Teenage Mutant Ninja Turtles: Splintered Fate (New Release on Steam, Nov. 6)
Empire of the Ants (New release on Steam, Nov. 7)
Unrailed 2: Back on Track (New release on Steam, Nov. 7)
Farming Simulator 25 (New release on Steam, Nov. 12)
Sea Power: Naval Combat in the Missile Age (New release on Steam, Nov. 12)
Industry Giant 4.0 (New release Steam, Nov. 15)
Towers of Aghasba (New release on Steam, Nov. 19)
S.T.A.L.K.E.R. 2: Heart of Chornobyl (New release on Steam and Xbox, available on PC Game Pass, Nov .20)
Star Wars Outlaws (New release on Steam, Nov. 21)
Dungeons & Degenerate Gamblers (Steam)
Headquarters: World War II (Steam)
PANICORE (Steam)
Slime Rancher (Steam)
Sumerian Six (Steam)
TCG Card Shop Simulator (Steam)

Outstanding October

In addition to the 22 games announced last month, eight more joined the GeForce NOW library:

Empyrion – Galactic Survival (New release on Epic Games Store, Oct. 10)
Assassin’s Creed Mirage (New release on Steam, Oct. 17)
Windblown (New release on Steam, Oct. 24)
Call of Duty HQ, including Call of Duty: Modern Warfare III and Call of Duty: Warzone (Xbox, available on PC Game Pass)
Dungeon Tycoon (Steam)
Off the Grid (Epic Games Store)
South Park: The Fractured but Whole (Available on PC Game Pass, Oct 16. Members need to activate access.)
Star Trucker (Steam and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

What’s video game character you’ve dressed up as for Halloween?

— NVIDIA GeForce NOW (@NVIDIAGFN) October 29, 2024

4 experiments with voice AI models to help you explore culture

Here are four AI voice models from Google Arts & Culture that offer a new way to experience and engage with art, history and culture.Read More

Deploying LLMs with TorchServe + vLLM

The vLLM engine is currently one of the top-performing ways to execute large language models (LLM). It provides the vllm serve command as an easy option to deploy a model on a single machine. While this is convenient, to serve these LLMs in production and at scale some advanced features are necessary.

TorchServe offers these essential production features (like custom metrics and model versioning) and through its flexible custom handler design, makes it very easy to integrate features such as retrieval-augmented generation (RAG) or safeguards like Llama Guard. It is therefore natural to pair the vLLM engine with TorchServe to create a full-fledged LLM serving solution for production.

Before going into the specifics of the integration, we will demonstrate the deployment of a Llama-3.1-70B-Instruct model using TorchServe’s vLLM docker image.

Quickly getting started with Llama 3.1 on TorchServe + vLLM

To get started we need to build the new TS LLM Docker container image by checking out the TorchServe repository and execute the following command from the main folder:

docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

The container uses our new LLM launcher script ts.llm_launcher which takes a Hugging Face model URI or local folder and spins up a local TorchServe instance with the vLLM engine running in the backend. To serve a model locally, you can create an instance of the container with the following command:

#export token=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 
8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --disable_token_auth

You can test the endpoint locally with this curl command:

curl -X POST -d '{"model":"meta-llama/Meta-Llama-3.1-70B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"

The docker stores the model weights in the local folder “data” which gets mounted as /data inside the container. To serve your custom local weights simply copy them into data and point the model_id to /data/<your weights>.

Internally, the container uses our new ts.llm_launcher script to launch TorchServe and deploy the model. The launcher simplifies the deployment of an LLM with TorchServe into a single command line and can also be used outside the container as an efficient tool for experimentation and testing. To use the launcher outside the docker, follow the TorchServe installation steps and then execute the following command to spin up a 8B Llama model:

# after installing TorchServe and vLLM run
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct  --disable_token_auth

If multiple GPUs are available the launcher will automatically claim all visible devices and apply tensor parallelism (see CUDA_VISIBLE_DEVICES to specify which GPUs to use).

While this is very convenient, it’s important to note that it does not encompass all the functionalities provided by TorchServe. For those looking to leverage more advanced features, a model archive needs to be created. While this process is a bit more involved than issuing a single command, it bears the advantage of custom handlers and versioning. While the former allows to implement RAG inside the preprocessing step, the latter lets you test different versions of a handler and model before deploying on a larger scale.

Before we provide the detailed steps to create and deploy a model archive, let’s dive into the details of the vLLM engine integration.

TorchServe’s vLLM Engine Integration

As a state-of-the-art serving framework, vLLM offers a plethora of advanced features, including PagedAttention, continuous batching, rapid model execution through CUDA graphs, and support for various quantization methods such as GPTQ, AWQ, INT4, INT8, and FP8. It also provides integration for important parameter-efficient adapter methods like LoRA and access to a wide range of model architectures including Llama and Mistral. vLLM is maintained by the vLLM team and a thriving open-source community.

To facilitate quick deployment, it offers a serving mode based on FastAPI to serve LLMs over HTTP. For a tighter, more flexible integration the project also provides the vllm.LLMEngine which offers interfaces to process requests on a continuous basis. We leveraged the asynchronous variant for the integration into TorchServe.

TorchServe is an easy-to-use, open-source solution for serving PyTorch models in production. As a production-tested serving solution, TorchServe offers numerous benefits and features beneficial for deploying PyTorch models at scale. By combining it with the inference performance of the vLLM engine these benefits can now also be used to deploy LLMs at scale.

To maximize hardware utilization it is generally a good practice to batch requests from multiple users together. Historically, TorchServe only offered a synchronized mode to collect requests from various users. In this mode, TorchServe waits for a predefined amount of time (e.g., batch_delay=200ms) or until enough requests (e.g., batch_size=8) have arrived. When one of these events is triggered, the batched data gets forwarded to the backend where the model is applied to the batch, and the model output is returned to the users through the frontend. This works especially well for traditional vision models where outputs for each request usually finish at the same time.

For generative use cases, particularly text generation, the assumption that requests are ready simultaneously is no longer valid, as responses will have varying lengths. Although TorchServe supports continuous batching (the ability to add and remove requests dynamically), this mode only accommodates a static maximum batch size. With the introduction of PagedAttention, even this assumption of a maximum batch size becomes more flexible, as vLLM can combine requests of different lengths in a highly adaptable manner to optimize memory utilization.

To achieve optimal memory utilization, i.e., to fill unused gaps in memory (think Tetris), vLLM requires complete control over the decision of which requests to process at any given time. To provide this flexibility, we had to reevaluate how TorchServe handles user requests. Instead of the previous synchronous processing mode, we introduced an asynchronous mode (see diagram below) where incoming requests are directly forwarded to the backend, making them available for vLLM. The backend feeds the vllm.AsyncEngine, which can now select from all available requests. If streaming mode is enabled and the first token of a request is available, the backend will send out the result immediately and continue sending tokens until the final token is generated.

Our implementation of the VLLMHandler enables users to quickly deploy any model compatible with vLLM using a configuration file, while still offering the same level of flexibility and customizability through a custom handler. Users are free to add e.g. custom preprocessing or post-processing steps by inheriting from VLLMHandler and overriding the respective class methods.

We also support single-node, multi-GPU distributed inference, where we configure vLLM to use tensor parallel sharding of the model to either increase capacity for smaller models or enable larger models that do not fit on a single GPU, such as the 70B Llama variants. Previously, TorchServe only supported distributed inference using torchrun, where multiple backend worker processes were spun up to shard the model. vLLM manages the creation of these processes internally, so we introduced the new “custom” parallelType to TorchServe which launches a single backend worker process and provides the list of assigned GPUs. The backend process can then launch its own subprocesses if necessary.

To facilitate integration of TorchServe + vLLM into docker-based deployments, we provide a separate Dockerfile based on TorchServe’s GPU docker image, with vLLM added as a dependency. We chose to keep the two separate to avoid increasing the docker image size for non-LLM deployments.

Next, we will demonstrate the steps required to deploy a Llama 3.1 70B model using TorchServe + vLLM on a machine with four GPUs.

Step-by-Step Guide

For this step-by-step guide we assume the installation of TorchServe has finished successfully. Currently, vLLM is not a hard-dependency for TorchServe so let’s install the package using pip:

$ pip install -U vllm==0.6.1.post2

In the following steps, we will (optionally) download the model weights, explain the configuration, create a model archive, deploy and test it:

1. (Optional) Download Model Weights

This step is optional, as vLLM can also handle downloading the weights when the model server is started. However, pre-downloading the model weights and sharing the cached files between TorchServe instances can be beneficial in terms of storage usage and startup time of the model worker. If you choose to download the weights, use the huggingface-cli and execute:

# make sure you have logged into huggingface with huggingface-cli login before
# and have your access request for the Llama 3.1 model weights approved

huggingface-cli download meta-llama/Meta-Llama-3.1-70B-Instruct --exclude original/*

This will download the files under $HF_HOME, and you can alter the variable if you want to place the files elsewhere. Please ensure that you update the variable wherever you run TorchServe and make sure it has access to that folder.

2. Configure the Model

Next, we create a YAML configuration file that contains all the necessary parameters for our model deployment. The first part of the config file specifies how the frontend should launch the backend worker, which will ultimately run the model in a handler. The second part includes parameters for the backend handler, such as the model to load, followed by various parameters for vLLM itself. For more information on possible configurations for the vLLM engine, please refer to this link.

echo '
# TorchServe frontend parameters
minWorkers: 1            
maxWorkers: 1            # Set the number of worker to create a single model instance
startupTimeout: 1200     # (in seconds) Give the worker time to load the model weights
deviceType: "gpu" 
asyncCommunication: true # This ensures we can cummunicate asynchronously with the worker
parallelType: "custom"   # This lets TS create a single backend prosses assigning 4 GPUs
parallelLevel: 4

# Handler parameters
handler:
    # model_path can be a model identifier for Hugging Face hub or a local path
    model_path: "meta-llama/Meta-Llama-3.1-70B-Instruct"
    vllm_engine_config:  # vLLM configuration which gets fed into AsyncVLLMEngine
        max_num_seqs: 16
        max_model_len: 512
        tensor_parallel_size: 4
        served_model_name:
            - "meta-llama/Meta-Llama-3.1-70B-Instruct"
            - "llama3"
'> model_config.yaml

3. Create the Model Folder

After creating the model configuration file (model_config.yaml), we will now create a model archive that includes the configuration and additional metadata, such as versioning information. Since the model weights are large, we will not include them inside the archive. Instead, the handler will access the weights by following the model_path specified in the model configuration. Note that in this example, we have chosen to use the “no-archive” format, which creates a model folder containing all necessary files. This allows us to easily modify the config files for experimentation without any friction. Later, we can also select the mar or tgz format to create a more easily transportable artifact.

mkdir model_store
torch-model-archiver --model-name vllm --version 1.0 --handler vllm_handler --config-file model_config.yaml --archive-format no-archive --export-path model_store/

4. Deploy the Model

The next step is to start a TorchServe instance and load the model. Please note that we have disabled token authentication for local testing purposes. It is highly recommended to implement some form of authentication when publicly deploying any model.

To start the TorchServe instance and load the model, run the following command:

torchserve --start --ncs  --model-store model_store --models vllm --disable-token-auth

You can monitor the progress of the model loading through the log statements. Once the model has finished loading, you can proceed to test the deployment.

5. Test the Deployment

The vLLM integration uses an OpenAI API compatible format so we can either use a specialized tool for this purpose or curl. The JSON data we are using here includes the model identifier as well as the prompt text. Other options and their default values can be found in the vLLMEngine docs.

echo '{
  "model": "llama3",
  "prompt": "A robot may not injure a human being",
  "stream": 0
}' | curl --header "Content-Type: application/json"   --request POST --data-binary @-   http://localhost:8080/predictions/vllm/1.0/v1/completions

The output of the request looks like this:

{
  "id": "cmpl-cd29f1d8aa0b48aebcbff4b559a0c783",
  "object": "text_completion",
  "created": 1727211972,
  "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " or, through inaction, allow a human being to come to harm.nA",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 26,
    "completion_tokens": 16
  }

When streaming is False TorchServe will collect the full answer and send it in one go after the last token was created. If we flip the stream parameter we will receive piecewise data containing a single token in each message.

Conclusion

In this blog post, we explored the new, native integration of the vLLM inference engine into TorchServe. We demonstrated how to locally deploy a Llama 3.1 70B model using the ts.llm_launcher script and how to create a model archive for deployment on any TorchServe instance. Additionally, we discussed how to build and run the solution in a Docker container for deployment on Kubernetes or EKS. In future works, we plan to enable multi-node inference with vLLM and TorchServe, as well as offer a pre-built Docker image to simplify the deployment process.

We would like to express our gratitude to Mark Saroufim and the vLLM team for their invaluable support in the lead-up to this blog post.

Ask a Techspert: What’s the difference between a CPU, GPU and TPU?

Learn more from a Google expert about CPUs, GPUs and TPUs — and Google latest TPU, Trillium.Read More

Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.

This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front-end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real-time, permitting immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution uses a combination of AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to deliver real-time transcription and document generation. This solution uses a combination of cutting-edge technologies to create a seamless knowledge capture process:

User interface – A React-based front-end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
Intelligent processing – A Lambda function, powered by generative AI models through Amazon Bedrock, analyzes and summarizes the transcribed text. It goes beyond simple summarization by performing the following actions:
- Extracting key concepts and terminologies.
- Structuring the information into a coherent, well-organized document.
Secure storage – Raw audio files, processed information, summaries, and generated content are securely stored in Amazon S3, providing scalable and durable storage for this valuable knowledge repository. S3 bucket policies and encryption are implemented to enforce data security and compliance.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons:

Simplicity – As a sample application, it doesn’t demand full user management or login functionality
Minimal user friction – Users don’t need to create accounts or log in, simplifying the user experience
Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
Temporary credential management – Businesses can use this approach to offer secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.

The following diagram illustrates the architecture of our solution.

The workflow includes the following steps:

Users access the front-end UI application, which is distributed through CloudFront
The React web application sends an initial request to Amazon API Gateway
API Gateway forwards the request to the authorization Lambda function
The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
The authorization function sends temporary credentials back to the front-end application through API Gateway
With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway
API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function
The orchestration function sends the transcribed text for summarization
The orchestration function receives summarized text from Amazon Bedrock to generate content
The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket

Prerequisites

You need the following prerequisites:

An active AWS account
Docker installed
The AWS CDK Toolkit 2.114.1+ installed and bootstrapped to the us-east-1 AWS Region
Python 3.12+ installed
Model access to Anthropic’s Claude enabled in Amazon Bedrock
An IAM user or role with access to Amazon Transcribe, Amazon Bedrock, Amazon S3, and Lambda

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

Amazon Bedrock
Amazon CloudFront
AWS CodeBuild
Amazon EventBridge
IAM
AWS Key Management Service (AWS KMS)
AWS Lambda
Amazon S3
AWS Systems Manager Parameter Store
Amazon Transcribe
AWS WAF

To deploy the solution, complete the following steps:

Clone the GitHub repository: genai-knowledge-capture-webapp
Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to the react-app-deploy.ts GitHub repo.

Invoke npm install to install the dependencies
Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.

Amazon Transcribe Streaming within React application

Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.

The real-time transcription offers several benefits for knowledge capture:

With the immediate feedback, speakers can correct or clarify their statements in the moment
The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
It reduces the cognitive load on the speaker, who doesn’t need to worry about note-taking or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

// Create Transcribe client
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.

Use the application

After deployment, open the ReactAppUrl link (https://<cloud front domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.

To use this application, complete the following steps:

Enter a question or topic.
Enter a file name for the document.
Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
After recording, you can edit the transcribed text.
You can also choose the play icon to play the recorded audio clips.
Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM model performs the following actions:

Organizes the content into logical sections with appropriate headings
Identifies and highlights important concepts or terminologies
Generates a brief executive summary at the beginning of the document
Applies consistent formatting and styling

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.

Choose View Document after you generate the document, and you will notice a professional PDF document generated with the user’s input in your browser, accessed through a presigned URL.

Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps:

Create a custom vocabulary file with your specialized terms
Use the Amazon Transcribe API to add this vocabulary to your account
Specify the custom vocabulary in your transcription requests

Asynchronous file uploads

For handling large audio files or improving user experience, implement an asynchronous upload process:

Create a separate Lambda function for file uploads
Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
Invoke the upload Lambda function using S3 Event Notifications

Multi-topic document generation

For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

Efficient capture of complex, multifaceted knowledge
Improved document structure and coherence
Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.

Clean up

When you’re done testing the solution, remove it from your AWS account to avoid future costs:

Invoke cdk destroy to remove the solution
You may also need to manually remove the S3 buckets created by the solution

Summary

This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.

By embracing this innovative approach to knowledge capture, organizations can unlock the full potential of their collective wisdom, driving continuous improvement and maintaining their competitive edge.

About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Service, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly-available and highly-scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers and their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Masters degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.

How GraphRAG works

Introducing DRIFT Search

MedFuzz: Exploring the robustness of LLMs on medical challenge problems

DRIFT Search: A step-by-step process

Why DRIFT search is effective

Benchmarking DRIFT search

Availability

Future research directions

Solution overview

Prerequisites

Deploy the solution

Register a new app in Google Chat

Converse with your custom Google Chat app

Customize the solution

Performance optimization

Clean up

Conclusion

About the Authors

Finding accurate answers from content in Gmail mailbox using Amazon Q Business

Overview of the Gmail connector for Amazon Q Business

Types of documents

Authentication

Secure querying with ACL crawling and identity crawling

Solution overview

Prerequisites

Configure the Gmail connector for an Amazon Q Business application

Create the Gmail connector for an Amazon Q Business application

Configure Gmail field mappings

Query Gmail data using the Amazon Q web experience

Troubleshooting

Frequently asked questions

Amazon Q Business is unable to answer your questions

How to generate responses from authoritative data sources

Amazon Q Business responds using old (stale) data even though your data source is updated

How to set up Amazon Q Business using a different IdP

Expand the solution

Clean up

Conclusion

About the Authors

Solution overview

Prepare the input manifest file

Insert the prompt in the UI template

Create the labeling job on the SageMaker console

Create the labeling job using the CreateLabelingJob API

Benefits of labeling jobs without Lambda functions

Conclusion

About the Authors

Unite the Veilguard

‘Resident Evil 4’ in the Cloud

Life Is Great With New Games

Outstanding October

Quickly getting started with Llama 3.1 on TorchServe + vLLM

TorchServe’s vLLM Engine Integration

Step-by-Step Guide

1. (Optional) Download Model Weights

2. Configure the Model

3. Create the Model Folder

4. Deploy the Model

5. Test the Deployment

Conclusion

Solution overview

Prerequisites

Deploy the solution with the AWS CDK

Amazon Transcribe Streaming within React application

Use the application

Additional information

Custom vocabulary with Amazon Transcribe

Asynchronous file uploads

Multi-topic document generation

Use captured knowledge as a knowledge base

Clean up

Summary

About the Authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.