We’re publishing our 2024 Responsible AI Progress Report and updating our Frontier Safety Framework and AI Principles.
Updating the Frontier Safety Framework
Our next iteration of the FSF sets out stronger security protocols on the path to AGI.
Adaptive Training Distributions with Scalable Online Bilevel Optimization
Large neural networks pretrained on web-scale corpora are central to modern machine learning. In this paradigm, the distribution of the large, heterogeneous pretraining data rarely matches that of the application domain. This work considers modifying the pretraining distribution in the case where one has a small sample of data reflecting the targeted test conditions. We propose an algorithm motivated by a recent formulation of this setting as an online, bilevel optimization problem. With scalability in mind, our algorithm prioritizes computing gradients at training points which are likely to… (Apple Machine Learning Research)
Accelerate video Q&A workflows using Amazon Bedrock Knowledge Bases, Amazon Transcribe, and thoughtful UX design
Organizations are often inundated with video and audio content that contains valuable insights. However, extracting those insights efficiently and with high accuracy remains a challenge. This post explores an innovative solution to accelerate video and audio review workflows through a thoughtfully designed user experience that enables human and AI collaboration. By approaching the problem from the user’s point of view, we can create a powerful tool that allows people to quickly find relevant information within long recordings without the risk of AI hallucinations.
Many professionals, from lawyers and journalists to content creators and medical practitioners, need to review hours of recorded content regularly to extract verifiably accurate insights. Traditional methods of manual review or simple keyword searches over transcripts are time-consuming and often miss important context. More advanced AI-powered summarization tools exist, but they risk producing hallucinations or inaccurate information, which can be dangerous in high-stakes environments like healthcare or legal proceedings.
Our solution, the Recorded Voice Insight Extraction Webapp (ReVIEW), addresses these challenges by providing a seamless method for humans to collaborate with AI, accelerating the review process while maintaining accuracy and trust in the results. The application is built on top of Amazon Transcribe and Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
User experience
To accelerate a user’s review of a long-form audio or video while mitigating the risk of hallucinations, we introduce the concept of timestamped citations. Not only are large language models (LLMs) capable of answering a user’s question based on the transcript of the file, they are also capable of identifying the timestamp (or timestamps) of the transcript during which the answer was discussed. By using a combination of transcript preprocessing, prompt engineering, and structured LLM output, we enable the user experience shown in the following screenshot, which demonstrates the conversion of LLM-generated timestamp citations into clickable buttons (shown underlined in red) that navigate to the correct portion of the source video.
The user in this example has uploaded a number of videos, including some recordings of AWS re:Invent talks. You’ll notice that the preceding answer actually contains a hallucination originating from an error in the transcript; the AI assistant replied that “Hyperpaths” was announced, when in reality the service is called Amazon SageMaker HyperPod.
The user in the preceding screenshot had the following journey:
- The user asks the AI assistant “What’s new with SageMaker?” The assistant searches the timestamped transcripts of the uploaded re:Invent videos.
- The assistant provides an answer with citations. Those citations contain both the name of the video and a timestamp, and the frontend displays buttons corresponding to the citations. Each citation can point to a different video, or to different timestamps within the same video.
- The user reads that SageMaker “Hyperpaths” was announced. They proceed to verify the accuracy of the generated answer by selecting the buttons, which auto play the source video starting at that timestamp.
- The user sees that the product is actually called Amazon SageMaker HyperPod, and can be confident that SageMaker HyperPod was the product announced at re:Invent.
This experience, which is at the heart of the ReVIEW application, enables users to efficiently get answers to questions based on uploaded audio or video files and to verify the accuracy of the answers by rewatching the source media for themselves.
Solution overview
The full code for this application is available on the GitHub repo.
The architecture of the solution is shown in the following diagram, showcasing the flow of data through the application.
The workflow consists of the following steps:
- A user accesses the application through an Amazon CloudFront distribution, which adds a custom header and forwards HTTPS traffic to an Elastic Load Balancing application load balancer. Behind the load balancer is a containerized Streamlit application running on Amazon Elastic Container Service (Amazon ECS).
- Amazon Cognito handles user logins to the frontend application and Amazon API Gateway.
- When a user uploads a media file through the frontend, a pre-signed URL is generated for the frontend to upload the file to Amazon Simple Storage Service (Amazon S3).
- The frontend posts the file to an application S3 bucket, at which point a file processing flow is initiated by a triggered AWS Lambda function. The file is sent to Amazon Transcribe, and the resulting transcript is stored in Amazon S3. The transcript is postprocessed into a text form more appropriate for use by an LLM, and an AWS Step Functions state machine syncs the transcript to a knowledge base configured in Amazon Bedrock Knowledge Bases. The knowledge base sync process handles chunking and embedding of the transcript, and stores the embedding vectors and file metadata in an Amazon OpenSearch Serverless vector database.
- If a user asks a question of one specific transcript (designated by the “pick media file” dropdown menu in the UI), the entire transcript is used to generate the response, so a retrieval step using the knowledge base is not required and an LLM is called directly through Amazon Bedrock.
- If the user is asking a question whose answer might appear in any number of source videos (by choosing Chat with all media files on the dropdown menu in the UI), the Amazon Bedrock Knowledge Bases RetrieveAndGenerate API is used to embed the user query, find semantically similar chunks in the vector database, input those chunks into an LLM prompt, and generate a specially formatted response.
- Throughout the process, application data such as transcription and ingestion status, mappings of user names to uploaded files, and cached responses is stored in Amazon DynamoDB.
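To make steps 3 and 6 of this workflow concrete, the following is a minimal boto3 sketch rather than the application’s actual code; the bucket name, object key, knowledge base ID, and model ARN are placeholders.

import boto3

# Step 3 (sketch): generate a pre-signed URL so the frontend can upload a media file directly to S3
s3 = boto3.client("s3")
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "example-application-bucket", "Key": "uploads/meeting.mp4"},  # placeholders
    ExpiresIn=3600,
)

# Step 6 (sketch): embed the query, retrieve semantically similar transcript chunks, and generate an answer
bedrock_agent = boto3.client("bedrock-agent-runtime")
response = bedrock_agent.retrieve_and_generate(
    input={"text": "What's new with SageMaker?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",  # placeholder
        },
    },
)
answer_text = response["output"]["text"]
retrieved_citations = response["citations"]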
One important characteristic of the architecture is the clear separation of frontend and backend logic through an API Gateway deployed REST API. This was a design decision to enable users of this application to replace the Streamlit frontend with a custom frontend. There are instructions for replacing the frontend in the README of the GitHub repository.
Timestamped citations
The key to this solution lies in the prompt engineering and structured output format. When generating a response to a user’s question, the LLM is instructed to not only provide an answer to the question (if possible), but also to cite its sources in a specific way.
The full prompt can be seen in the GitHub repository, but a shortened pseudo prompt (for brevity) is shown here:
You are an intelligent AI which attempts to answer questions based on retrieved chunks of automatically generated transcripts.
Below are retrieved chunks of transcript with metadata including the file name. Each chunk includes a <media_name> and lines of a transcript, each line beginning with a timestamp.
$$ retrieved transcript chunks $$
Your answer should be in json format, including a list of partial answers, each of which has a citation. The citation should include the source file name and timestamp. Here is the user’s question:
$$ user question $$
The frontend then parses the LLM response into a fixed schema data model, described with Pydantic BaseModels:
from typing import List

from pydantic import BaseModel


class Citation(BaseModel):
    """A single citation from a transcript"""
    media_name: str
    timestamp: int


class PartialQAnswer(BaseModel):
    """Part of a complete answer, to be concatenated with other partial answers"""
    partial_answer: str
    citations: List[Citation]


class FullQAnswer(BaseModel):
    """Full user query response including citations and one or more partial answers"""
    answer: List[PartialQAnswer]
This format allows the frontend to parse the response and display buttons for each citation that cue up the relevant media segment for user review.
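As an illustration (not code from the repository), the parsing step could look like the following, assuming Pydantic v2; the JSON payload here is a made-up example of the structured LLM output.

import json

# Hypothetical LLM response following the schema above
llm_json = json.dumps({
    "answer": [
        {
            "partial_answer": "Amazon SageMaker HyperPod was announced.",
            "citations": [{"media_name": "reinvent-keynote.mp4", "timestamp": 1425}],
        }
    ]
})

parsed = FullQAnswer.model_validate_json(llm_json)  # on Pydantic v1, use FullQAnswer.parse_raw(llm_json)
for partial in parsed.answer:
    for citation in partial.citations:
        # Each citation is rendered as a button that seeks the media player
        # to citation.timestamp within citation.media_name
        print(partial.partial_answer, citation.media_name, citation.timestamp)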
Deployment details
The solution is deployed in the form of one AWS Cloud Development Kit (AWS CDK) stack, which contains four nested stacks:
- A backend that handles transcribing uploaded media and tracking job statuses
- A Retrieval Augmented Generation (RAG) stack that handles setting up OpenSearch Serverless and Amazon Bedrock Knowledge Bases
- An API stack that stands up an Amazon Cognito authorized REST API and various Lambda functions to logically separate the frontend from the backend
- A frontend stack that consists of a containerized Streamlit application running as a load balanced service in an ECS cluster, with a CloudFront distribution connected to the load balancer
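As a rough illustration of this layout (the class and construct names below are hypothetical, not the repository’s actual ones), a CDK parent stack composes nested stacks like this:

from aws_cdk import App, NestedStack, Stack
from constructs import Construct


class BackendStack(NestedStack):
    """Transcription pipeline and job-status tracking (sketch)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # ... Lambda functions, Step Functions state machine, DynamoDB tables ...


class ReviewParentStack(Stack):
    """Parent stack that wires the nested stacks together (sketch)."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        BackendStack(self, "Backend")
        # Similarly: RagStack(self, "Rag"), ApiStack(self, "Api"), FrontendStack(self, "Frontend")


app = App()
ReviewParentStack(app, "ReVIEW")
app.synth()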
Prerequisites
The solution requires the following prerequisites:
- You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?
- You also need to request access to at least one Amazon Bedrock LLM (to generate answers to questions) and one embedding model (to find transcript chunks that are semantically similar to a user question). The following Amazon Bedrock models are the defaults, but they can be changed using a configuration file at deployment time, as described later in this post:
- Amazon Titan Embeddings V2 – Text
- Amazon Nova Pro
- You need a Python environment with AWS CDK dependencies installed. For instructions, see Working with the AWS CDK in Python.
- Docker is required to build the Streamlit frontend container at deployment time.
- The minimal IAM permissions needed to bootstrap and deploy the AWS CDK are described in the ReVIEW/infra/minimal-iam-policy.json file in the GitHub repository. Make sure the IAM user or role deploying the stacks has these permissions.
Clone the repository
Fork the repository, and clone it to the location of your choice. For example:
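The exact URL depends on where you forked the repository; a typical sequence looks like the following, with the placeholder replaced by your fork’s URL:

git clone https://github.com/<your-github-account>/ReVIEW.git
cd ReVIEW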
Edit the deployment config file
Optionally, edit the infra/config.yaml file to provide a descriptive base name for your stack. This file is also where you can choose specific Amazon Bedrock embedding models for semantic retrieval and LLMs for response generation, define chunking strategies for the knowledge base that will ingest transcriptions of uploaded media files, and reuse an existing Amazon Cognito user pool if you want to bootstrap your application with an existing user base.
Deploy the AWS CDK stacks
Deploy the AWS CDK stacks with the following code:
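The authoritative commands are in the repository README; a typical AWS CDK flow, run from the directory containing the CDK app (assumed here to be infra/), looks like this:

cd infra
cdk bootstrap   # one-time setup per AWS account
cdk deploy      # deploys the parent stack and its four nested stacks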
You only need to run the bootstrap command one time per AWS account. The deploy command deploys the parent stack and four nested stacks; the process takes approximately 20 minutes to complete.
When the deployment is complete, a CloudFront distribution URL of the form xxx.cloudfront.net will be printed on the console screen; use it to access the application. This URL can also be found on the AWS CloudFormation console by locating the stack whose name matches the value in the config file, then choosing the Outputs tab and locating the value associated with the key ReVIEWFrontendURL. That URL will lead you to a login screen like the following screenshot.
Create an Amazon Cognito user to access the app
To log in to the running web application, you have to create an Amazon Cognito user. Complete the following steps:
- On the Amazon Cognito console, navigate to the recently created user pool.
- In the Users section under User Management, choose Create user.
- Create a user name and password to log in to the ReVIEW application deployed in the account.
When the application deployment is destroyed (as described in the cleanup section), the Amazon Cognito pool remains to preserve the user base. The pool can be fully removed manually using the Amazon Cognito console.
Test the application
Test the application by uploading one or more audio or video files on the File Upload tab. The application supports media formats supported by Amazon Transcribe. If you are looking for a sample video, consider downloading a TED talk. After uploading, you will see the file appear on the Job Status tab. You can track processing progress through transcription, postprocessing, and knowledge base syncing steps on this tab. After at least one file is marked Complete, you can chat with it on the Chat With Your Media tab.
The Analyze Your Media tab allows you to create and apply custom LLM template prompts to individual uploaded files. For example, you can create a basic summary template, or an extract key information template, and apply it to your uploaded files here. This functionality was not described in detail in this post.
Clean up
The deployed application will incur ongoing costs even if it isn’t used, for example from OpenSearch Serverless indexing and search OCU minimums. To delete all resources created when deploying the application, run the following command:
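The repository README documents the exact command; with the AWS CDK, teardown is typically:

cdk destroy   # run from the same directory used for deployment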
Conclusion
The solution presented in this post demonstrates a powerful pattern for accelerating video and audio review workflows while maintaining human oversight. By combining the power of AI models in Amazon Bedrock with human expertise, you can create tools that not only boost productivity but also maintain the critical element of human judgment in important decision-making processes.
We encourage you to explore this fully open sourced solution, adapt it to your specific use cases, and provide feedback on your experiences.
For expert assistance, the AWS Generative AI Innovation Center, AWS Professional Services, and our AWS Partners are here to help.
About the Author
David Kaleko is a Senior Applied Scientist in the AWS Generative AI Innovation Center.
Boost team innovation, productivity, and knowledge sharing with Amazon Q Apps
As enterprises rapidly expand their applications, platforms, and infrastructure, it becomes increasingly challenging to keep up with technology trends, best practices, and programming standards. Enterprises typically provide their developers, engineers, and architects with a variety of knowledge resources such as user guides, technical wikis, code repositories, and specialized tools. However, over time these resources often become siloed within individual teams or organizational units, making it difficult for employees to easily access relevant information across the broader organization. This lack of knowledge sharing can lead to duplicated efforts, reduced productivity, and missed opportunities to use institutional expertise.
Imagine you’re a developer tasked with troubleshooting a complex issue in your company’s cloud infrastructure. You scour through outdated user guides and scattered conversations, but can’t find the right answer. Minutes turn into hours, sometimes days, as you struggle to piece together the information you need, all while your project falls behind.
To address these challenges, the MuleSoft team integrated Amazon Q Apps, a capability within Amazon Q Business, a generative AI-powered assistant service, directly into their Cloud Central portal, an individualized portal that shows assets owned, costs and usage, and AWS Well-Architected recommendations to over 100 engineering teams. Amazon Q Apps is designed to use Amazon Q Business and its ability to draw upon an enterprise’s own internal data, documents, and systems to provide conversational assistance to users. By tapping into these rich information sources, you can enable your users to create Amazon Q Apps that can answer questions, summarize key points, generate custom content, and even securely complete certain tasks, all without the user having to navigate through disparate repositories or systems. Prior to Amazon Q Apps, MuleSoft was using a chatbot that used Slack, Amazon Lex V2, and Amazon Kendra. The chatbot solution didn’t meet the needs of the engineering and development teams, which prompted the exploration of Amazon Q Apps.
In this post, we demonstrate how Amazon Q Apps can help maximize the value of existing knowledge resources and improve productivity among various teams, ranging from finance to DevOps to support engineers. We share specific examples of how the generative AI assistant can surface relevant information, distill complex topics, generate custom content, and execute workflows, all while maintaining robust security and data governance controls.
In addition to demonstrating the power of Amazon Q Apps, we provide guidance on prompt engineering and system prompts reflective of real-world use cases using the rich features of Amazon Q Apps. For instance, let’s consider the scenario of troubleshooting network connectivity. By considering personas and their specific lines of business, we can derive the optimal tone and language to provide a targeted, actionable response. This level of personalization is key to delivering optimized customer experiences and building trust.
Improve productivity with Amazon Q Apps
Amazon Q Apps is a feature within Amazon Q Business that assists you in creating lightweight, purpose-built applications within Amazon Q Business. You can create these apps in several ways: by describing the application you want in your own words to fit specific requirements, or by transforming your conversations with an Amazon Q Business assistant into prompts that can then be used to generate an application.
With Amazon Q Apps, you can build, share, and customize applications on enterprise data to streamline tasks and boost individual and team productivity. You can also publish applications to an admin-managed library and share them with your coworkers. Amazon Q Apps inherits user permissions, access controls, and enterprise guardrails from Amazon Q Business for secure sharing and adherence to data governance policies.
Amazon Q Apps is only available to users with a Pro subscription. If you have the Lite subscription, you will not be able to view or use Amazon Q Apps.
MuleSoft’s use case with Amazon Q Apps
The team needed a more personalized approach to Amazon Q Business. Upon the announcement of Amazon Q Apps, the team determined it could solve an immediate need across teams. Their Cloud Central portal is already geared for a personalized experience for its users. MuleSoft completed a successful proof of concept integrating Amazon Q Apps into their overall Cloud Central portal. Cloud Central (see the following screenshot) serves as a single pane of glass for both managers and team members to visualize and understand each persona’s personalized cloud assets, cost metrics, and Well-Architected status based on application or infrastructure.

Fig 1: Salesforce MuleSoft Cloud Central Portal
The MuleSoft support team was looking for a way to troubleshoot network traffic latency when rolling out a new customer into their production environment. The MuleSoft team found Amazon Q Apps helpful in providing possible causes for network latency for virtual private clouds (VPCs), as well as in providing prescriptive guidance on how to troubleshoot VPC network latencies. We explore a similar network latency use case in this post.
Solution overview
In this post, we focus on creating Amazon Q applications from the Amazon Q Business Chat and Amazon Q Apps Creator:
- Amazon Q Business Chat – You can use the Amazon Q Apps icon in the Amazon Q Business Chat assistant to generate a prompt that can be used to create an application. This feature summarizes the Amazon Q Business Chat conversation to create a prompt that you can review and edit before generating an application.
- Amazon Q Apps Creator – With Amazon Q Apps Creator, you can describe the type of application you want to build using your own words to generate an application. Amazon Q Apps will generate an application for you based on the provided prompt.
Prerequisites
Make sure you have an AWS account. If not, you can sign up for one. Refer to Pre-requisites for Amazon Q Apps for the steps to complete prior to deploying Amazon Q Apps. For more information, see Getting started with Amazon Q Business.
Create an application using Amazon Q Business Chat
You can choose the Amazon Q Apps icon from an Amazon Q chat conversation to generate an application prompt and use it to create an Amazon Q application. The icon is available in the conversations pane on the left, in the upper-right corner above the Amazon Q Assistant Chat conversation, or on the prompt dropdown menu.
Let’s explore an example of using an Amazon Q chat assistant conversation to create an application.
- Begin by asking the Amazon Q Business assistant a question related to the data that is provided in the Amazon Q Business application.
For this example, we ask about steps to troubleshoot network latency.
- After you’ve finished your conversation, choose the Amazon Q Apps icon in either the conversation pane or in the upper-right corner to launch Amazon Q App Creator.
- Review the generated prompt from the conversation and update the prompt to match your application purpose as needed.
- Choose Generate to create the application.
- To test the application, we enter “I am unable to reach my EC2 host via port 22” under User input and choose Run.
- Review the generated text output and confirm that the troubleshooting steps look correct.
- To share the app with everyone in the library, choose Publish.
The Amazon Q Apps library will show all published applications shared by your teammates. Only users who have access to the Amazon Q Business application will be able to view your published application.
- You can choose labels where the application will reside, relating to teams, personas, or categories.
Create an application using Amazon Q Apps Creator
You can start building an Amazon Q application with Amazon Q Apps Creator by describing the task you want to create an application for. Complete the following steps:
- Choose Apps in the navigation pane.
- Enter your prompt or use an example prompt.
For this post, we enter the prompt “Create an app that crafts insightful content for users to troubleshoot AWS services. It takes inputs such as a use case to work backward from on a solution. Based on these inputs, the app generates a tailored response for resolving the AWS service use case, providing steps to remediate and content links.”
- Choose Generate to create the application.
The Amazon Q application is created with AWS Use Case, Troubleshooting Steps, and Additional Resources sections derived from your prompt.
- To test the application, we enter “which AWS tool to manage many AWS accounts and take advantage of consolidated billing” under User input and choose Run.
The Troubleshooting Steps section highlights using AWS Organizations and provides a walkthrough. The Additional Resources section provides more information about your use case, while citing AWS customer references.
- Choose Share to publish your application and choose the appropriate labels.
Results
MuleSoft offers a prime example of the transformative impact of Amazon Q Apps. With this solution, MuleSoft was able to realize a 50% reduction in team inquiries—from 100 down to just 50. These inquiries spanned a wide range, from basic AWS service information to complex networking troubleshooting and even Amazon Elastic Block Store (Amazon EBS) volume migrations from gp2 to gp3.
Pricing
Amazon Q Business offers subscription options for you to customize your access. For more details, see Amazon Q Business pricing.
Conclusion
Amazon Q Business empowers enterprises to maximize the value of their knowledge resources by democratizing access to powerful conversational AI capabilities. Through Amazon Q Apps, organizations can create purpose-built applications using internal data and systems, unlocking new solutions and accelerating innovation.
The MuleSoft team demonstrated this by integrating Amazon Q Apps into their Cloud Central portal, enhancing user experience, streamlining collaboration, and optimizing cloud infrastructure while maintaining robust security and data governance.
Amazon Q Apps provides flexible generative AI application development using natural language, allowing organizations to build and securely publish custom applications tailored to their unique needs. This approach enables teams to boost innovation, productivity, and knowledge sharing across job functions.
By leveraging Amazon Q Business, enterprises can find answers, build applications, and drive productivity using their own enterprise data and conversational AI capabilities.
To learn about other Amazon Q Business customers’ success stories, see Amazon Q Developer customers.
*Note: Amazon Q Apps is only available to users with the Pro subscription; if you have the Lite subscription, you will not be able to view or use Amazon Q Apps.
About the Authors
Rueben Jimenez is an AWS Sr. Solutions Architect who designs and implements complex data analytics, machine learning, generative AI, and cloud infrastructure solutions.
Tiffany Myers is an AWS Product Manager for Amazon Q Apps, launching generative AI solutions for business users.
Summer Petersil is a Strategic Account Representative (SAR) on the AWS Salesforce team, where she leads Generative AI (GenAI) enablement efforts.
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted… (Apple Machine Learning Research)