Streamline work insights with the Amazon Q Business connector for Smartsheet

Amazon Q Business is a fully managed, generative AI–powered assistant that empowers enterprises to unlock the full potential of their data and organizational knowledge. With Amazon Q Business, you can quickly access answers to questions, generate summaries and content, and complete tasks by using the expertise and information stored across various data sources and enterprise systems within your organization. At the heart of this innovative solution are data source connectors, which seamlessly integrate and index content from multiple data sources and enterprise systems such as SharePoint, Confluence, and Smartsheet.

This post explains how to integrate Smartsheet with Amazon Q Business to use natural language and generative AI capabilities for enhanced insights. Smartsheet, the AI-enhanced enterprise-grade work management platform, helps users manage projects, programs, and processes at scale. By connecting Amazon Q Business with Smartsheet, business users, customer solutions managers, product managers, and others can gain a deeper understanding of their work by asking natural language questions.

The following are examples of questions you can ask Amazon Q Business to gain actionable insights:

  • Project status updates – Get quick insights into project health
    • What’s the status of the website redesign project?
    • Is the mobile app launch on track for the planned date?
    • Which projects are currently behind schedule in the Q3 roadmap?
  • Task management – Find information about tasks and action items
    • What tasks are assigned to John Doe?
    • Has the marketing plan been completed?
    • What’s the due date for the customer research presentation?
  • Resource allocation – Understand resource distribution and workload
    • How many resources are allocated to the product launch project?
    • Which projects require additional staffing based on current task loads?
  • Budget tracking – Monitor project and departmental budgets in real time
    • What is the current budget status for the marketing campaign?
    • How much budget is remaining for the customer service training initiative?

Overview of Smartsheet

Smartsheet combines the simplicity of a spreadsheet with powerful features for collaboration, workflow automation, content management, and reporting. Smartsheet powers mission-critical work securely and reliably at scale for thousands of organizations worldwide, including over 85% of Fortune 500 companies. Customers rely on Smartsheet to open thousands of new restaurant locations, distribute vaccines, build rockets, and more.

In this example, we’re using Smartsheet to track tasks for a software development project. This sheet includes columns for Task, Owner, Team, Stage, Start Date, End Date, and more.

Overview of the Smartsheet connector for Amazon Q Business

By integrating Smartsheet as a data source in Amazon Q Business, you can seamlessly extract insights. For example, service operations managers can use the new connector to deliver complex projects more efficiently and consistently. By asking the Amazon Q Business intelligent assistant specific questions, the team can access insights from multiple connected data sources, including sheets, conversations, and attachments in Smartsheet. The generative AI–powered assistant performs deep searches within the data while respecting access and permission levels, saving valuable time and enhancing project oversight. This streamlined process improves client retention, increases accuracy, and elevates overall service quality.

You can integrate Smartsheet with Amazon Q Business through the AWS Management Console, AWS Command Line Interface (AWS CLI), or the CreateDataSource API.
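
If you prefer to automate the setup, the following minimal sketch shows how a Smartsheet data source could be created programmatically with the AWS SDK for Python (Boto3) using the CreateDataSource API. The application ID, index ID, role ARN, secret ARN, and the Smartsheet-specific configuration keys are illustrative placeholders; refer to the Amazon Q Business documentation for the exact configuration schema.

```python
import boto3

# Minimal sketch: create a Smartsheet data source with the CreateDataSource API.
# All IDs, ARNs, and configuration keys below are placeholders/assumptions --
# consult the Amazon Q Business connector documentation for the exact schema.
qbusiness = boto3.client("qbusiness")

response = qbusiness.create_data_source(
    applicationId="your-application-id",   # placeholder application ID
    indexId="your-index-id",               # placeholder index ID
    displayName="smartsheet-data-source",
    roleArn="arn:aws:iam::111122223333:role/QBusinessSmartsheetRole",  # placeholder
    syncSchedule="cron(0 0 * * ? *)",      # optional: daily sync
    configuration={                        # illustrative structure only
        "type": "SMARTSHEET",
        "secretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:smartsheet-token",  # placeholder
        "syncMode": "FULL_CRAWL",
    },
)
print("Data source ID:", response["dataSourceId"])
```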

The Amazon Q Business Smartsheet connector understands user access permissions and strictly enforces them at the time of the query. This makes sure that users can’t access content they don’t have permissions for. For managing security, refer to Identity and access management for Amazon Q Business.

Prerequisites

Before you begin, make sure that you have completed the required prerequisites.

For detailed guidance on completing these steps, refer to Prerequisites for connecting Amazon Q Business to Smartsheet.

Configure and prepare the Amazon Q Business Smartsheet connector

Follow the steps below to create the retriever and data source:

  1. Under Enhancements in the navigation pane, select Data sources. Then choose Add an index, as shown in the following screenshot.
  2. Under Index provisioning, select Enterprise and then choose Add an index, as shown in the following screenshot. The Enterprise option is ideal for workloads requiring maximum update performance.
  3. Under Enhancements in the left navigation pane, select Data sources.
  4. On the Data sources page, choose Add data source, as shown in the following screenshot.
  5. On the Add data source page, in the Data sources section, add the Smartsheet data source to your Amazon Q Business application and follow the steps at Connecting Amazon Q Business to Smartsheet using the console.
  6. On the Smartsheet data source page, enter the following information:
    • Data source name
    • AWS Secrets Manager secret
    • IAM role and Role name
    • Sync scope
    • Frequency

Creating the data source should only take a few minutes. After it’s set up, you’ll notice a green success notification on the console and the data source will be displayed in the Data source details section, as shown in the following screenshot.

Next, you need to sync the Smartsheet data source. In the Sync history section, choose Sync now to initiate the process of crawling and ingesting data from your source into Amazon Q Business. After the sync job is complete, your data source will be fully ready for use, as shown in the following screenshots.
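
If you manage your deployment programmatically, the same sync can be triggered with Boto3. The following minimal sketch starts a sync job and lists recent sync runs; the IDs are placeholders, and the response field names should be checked against the current Amazon Q Business API reference.

```python
import boto3

# Minimal sketch: trigger "Sync now" programmatically. IDs are placeholders.
qbusiness = boto3.client("qbusiness")

ids = {
    "applicationId": "your-application-id",
    "indexId": "your-index-id",
    "dataSourceId": "your-smartsheet-data-source-id",
}

# Start crawling and ingesting data from Smartsheet into Amazon Q Business.
job = qbusiness.start_data_source_sync_job(**ids)
print("Started sync job:", job["executionId"])

# Review recent sync jobs and their status for this data source.
history = qbusiness.list_data_source_sync_jobs(**ids)
for run in history.get("history", []):
    print(run.get("executionId"), run.get("status"))
```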

Amazon Q Business and Smartsheet connector in action

After creating your Amazon Q Business application and successfully syncing the Smartsheet data source, you can test the integration. Ask questions related to a project and observe how the app responds in real time. Follow these steps:

  1. In the left navigation pane, select AmazonQ-smartsheet-connector application and choose Deployed URL, as shown in the following screenshot.

In our example, we asked the following questions regarding our project captured in Smartsheet, and Amazon Q Business generated responses for the project owners regarding status and provided additional information for each task.

Question 1 – What is the project status for creating the UI?

Amazon Q Business response – As shown in the following screenshot, Amazon Q Business generated a response identifying the status as in progress, the name of the team member performing the work, and the scheduled completion date.

Question 2 – List all the projects with their deadlines

Amazon Q Business response – As shown in the following screenshot, Amazon Q Business generated an answer and listed the projects with their deadlines, including citation links from Smartsheet.

Question 3 – What project is Chloe Evans handling? Can you provide more information about it?

Amazon Q Business response – As shown in the following screenshot, Amazon Q Business generated an answer summarizing the tasks that Chloe Evans is handling.

Troubleshooting

If you encounter issues while asking questions in Amazon Q Business, the problem might be due to missing permissions required to access certain information. Amazon Q Business strictly enforces the document permissions set in its data source. Follow these steps to troubleshoot:

  1. Check indexing status – Confirm whether the Smartsheet connector has been successfully indexed in the Amazon Q Business application. This makes sure that the data source is properly integrated.
  2. Verify user permissions – Make sure that the Smartsheet user account has the necessary permissions to access and read the information from the sheet. Proper permissions are critical for enabling Amazon Q Business to retrieve and process the required data.

Additionally, as an administrator managing the Amazon Q Business application, you can troubleshoot these issues using the document-level sync reports, which enhance visibility into data source sync operations. These reports provide comprehensive, detailed insights integrated into the sync history, including granular indexing status, metadata, and access control list (ACL) details for every document processed during a data source sync job.

The detailed document reports are stored in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Q Business application log group, making sure that critical sync job details are available on demand when troubleshooting. For more information, refer to document-level sync reports.

As shown in the following screenshot, we used Amazon CloudWatch Logs Insights to query the SYNC_RUN_HISTORY_REPORT log stream, allowing us to review the sync status in detail.
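
The query can also be run with the CloudWatch Logs Insights API. The sketch below shows one way to do this with Boto3; the log group name is an assumed placeholder, so replace it with the actual log group for your Amazon Q Business application, and adjust the filter to match your report schema.

```python
import time
import boto3

logs = boto3.client("logs")

# Assumed log group name for the Amazon Q Business application -- replace with
# the log group shown in your account.
LOG_GROUP = "/aws/qbusiness/your-application-id"

# Logs Insights query scoped to the SYNC_RUN_HISTORY_REPORT log stream.
query = """
fields @timestamp, @message
| filter @logStream like /SYNC_RUN_HISTORY_REPORT/
| sort @timestamp desc
| limit 50
"""

start = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=int(time.time()) - 24 * 3600,  # last 24 hours
    endTime=int(time.time()),
    queryString=query,
)

# Poll until the query reaches a terminal state, then print each result row.
while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})
```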

Clean up

Complete the following steps to clean up your resources:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application you created, and on the Actions menu, choose Delete.

Conclusion

In this post, we explored how Amazon Q Business can seamlessly integrate with Smartsheet to help enterprises unlock the full potential of their data and knowledge. With the Smartsheet connector, organizations can empower their teams to find answers quickly, accelerate project tracking, streamline task management, automate workflows, and enhance collaboration.

Now that you’ve learned how to integrate Amazon Q Business with your Smartsheet content, it’s time to tap into the full potential of your organization’s data. To get started, sign up for an Amazon Q Business account and follow the steps in this post to set up the Smartsheet connector. Then, you can start asking Amazon Q Business natural language questions and watch it surface insights in seconds.


About the Authors

Brandon Seiter is a Senior Director, Corporate Development at Smartsheet. He has 15+ years of experience helping companies develop their inorganic growth strategies and overseeing their Corporate Development activities, including mergers & acquisitions, strategic partnerships and new business incubation. At Smartsheet he plays a pivotal role in nurturing relationships with Smartsheet’s technology partners and developing joint partner initiatives.

Aidin Khosrowshahi is an Amazon Q Specialist Solutions Architect at AWS, where he brings his passion for generative AI and serverless applications to life. As an active member of the AI/ML and serverless community, he specializes in Amazon Q Business and Developer solutions while serving as a generative AI expert. He helps customers implement best practices while fostering collaboration and driving innovation across the AI/ML ecosystem.

Chinmayee Rane is a Generative AI Specialist Solutions Architect at AWS, with a core focus on generative AI. She helps Independent Software Vendors (ISVs) accelerate the adoption of generative AI by designing scalable and impactful solutions. With a strong background in applied mathematics and machine learning, she specializes in intelligent document processing and AI-driven innovation. Outside of work, she enjoys salsa and bachata dancing.

Lokesh Chauhan is a Sr. Technical Account Manager at AWS, where he partners with Enterprise Customers to optimize their AWS journey and drive cloud success. He is a member of the AI/ML community and serves as a generative AI expert. As a 12x AWS-certified TAM, he brings deep technical expertise across the AWS platform. Prior to joining AWS, he held leadership positions including Project Lead and Sr. Database admin, building extensive experience in database and operations across multiple organizations.

Level up your problem-solving and strategic thinking skills with Amazon Bedrock

Organizations across many industries are harnessing the power of foundation models (FMs) and large language models (LLMs) to build generative AI applications to deliver new customer experiences, boost employee productivity, and drive innovation.

Amazon Bedrock, a fully managed service that offers a choice of high-performing FMs from leading AI companies, provides the easiest way to build and scale generative AI applications with FMs.

Some of the most widely used and successful generative AI use cases on Amazon Bedrock include summarizing documents, answering questions, translating languages, and understanding and generating brand new multimodal content.

Business challenge

Problem-solving, logical reasoning, and critical thinking are critical competencies for achieving business success, accelerating decision-making, and fostering innovation. Although strategy consultants have honed these skills, many knowledge workers lack them due to inadequate training and limited access to appropriate tools. Developing these skills not only enhances individual productivity but also drives significant benefits for the organization.

Business use cases

In this post, we want to demonstrate some additional generative AI use cases on Amazon Bedrock. We show how Anthropic’s Claude 3.5 Sonnet in Amazon Bedrock can be used for a variety of business-related cognitive tasks, such as problem-solving, critical thinking, and ideation, to help augment human thinking and improve decision-making among knowledge workers to accelerate innovation. For this, we use several frameworks and tools widely used by the management consulting community: mutually exclusive, collectively exhaustive (MECE); strengths, weaknesses, opportunities, threats (SWOT) analysis; issue tree; value chain analysis; and value driver tree analysis.

Solution overview

To demonstrate these five use cases, we used the Amazon Bedrock playground with Anthropic’s Claude 3.5 Sonnet LLM. Where necessary, in addition to text prompts, we also used the model’s image-to-text capability to improve the accuracy of the responses generated.
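
Outside the playground, the same multimodal prompts can be sent programmatically. The following minimal sketch uses the Amazon Bedrock Converse API with Boto3 to pass a diagram image alongside a text prompt (here, the issue tree prompt used later in this post). The model ID, file name, and inference parameters are illustrative assumptions; verify the model ID available in your Region.

```python
import boto3

# Minimal sketch: send a text prompt plus a diagram image to Claude 3.5 Sonnet
# through the Amazon Bedrock Converse API. Model ID and file path are placeholders.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("issue_tree_skeleton.png", "rb") as f:  # hypothetical diagram file
    image_bytes = f.read()

prompt = (
    "Problem = profits at the AnyCompany Tile Factory are down 30%. "
    "Using the diagram as a guide, develop an issue tree identifying the main "
    "issues and sub-issues, with an analysis against each sub-issue."
)

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # verify the exact model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"text": prompt},
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            ],
        }
    ],
    inferenceConfig={"maxTokens": 2048, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```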

Explanations of the five use cases—together with the prompts and images used to feed the LLM and the responses generated—are shown in the following sections. To make the LLM’s text responses easier to follow, we’ve provided additional diagrams, where necessary, to complement each response (for example, a tree diagram corresponding to the LLM-generated response).

The following sections explain the solution flow for each use case.

MECE

MECE is a widely used framework for business problem-solving. MECE helps break down a problem into well-defined, nonoverlapping parts (mutually exclusive), but when all the parts are taken together, they’re comprehensive and exhaustive to address the entirety of the problem (collectively exhaustive).

In this example, we want to demonstrate the use of the MECE principle by asking the LLM to arrange food items (in this case, various fried rice options) listed in a restaurant menu, adhering to the MECE guidelines.

We provided the following prompt to the LLM:

Here are some fried rice options: chicken fried rice, vegetable fried rice, lamb fried rice, beef fried rice, egg fried rice, prawn fried rice, squid fried rice

Can you please arrange them using MECE (Mutually Exclusive Collectively Exhaustive) principle?

We get the following response from the LLM:

We arranged the LLM response in tree format for visual clarity, as shown in the following diagram.

Suppose the restaurant wants to add three more varieties to the menu: egg and prawn fried rice, lamb and prawn fried rice, and vegan fried rice. We sought the help of the LLM with the following prompt to rearrange the menu while preserving the MECE principle:

I want to add 3 more varieties to the menu: egg and prawn fried rice, lamb and prawn fried rice, vegan fried rice. Can you please rearrange the list in MECE?

We get the following modified response from the LLM:

We rearranged the LLM response in tree format for visual clarity. As shown in the following diagram, the LLM has preserved the MECE principle, intelligently adding new categories as needed to accommodate the menu changes.

Issue tree

An issue tree, also known as a logic tree or problem-solving tree, is a strategic analytical tool used to deconstruct complex problems into their constituent elements. This hierarchical framework facilitates a systematic approach to problem-solving by:

  • Disaggregating the primary issue into discrete, manageable subcomponents
  • Organizing these elements in a structured, top-down format
  • Providing comprehensive coverage through the application of the MECE principle

The visual representation afforded by an issue tree enables stakeholders to:

  • Identify key drivers and root causes
  • Prioritize areas for further investigation or resource allocation
  • Maintain a holistic view of the problem while focusing on specific aspects

By employing this methodology, organizations can enhance their decision-making processes, streamline strategic planning, and improve the efficiency of their problem-solving endeavors.

To demonstrate the LLM’s ability to solve problems using an issue tree, we used a fictitious company—AnyCompany Tile Factory—whose profits are down by 30%. AnyCompany’s management wants to use an issue tree to identify the main issues and subordinate issues, and then use it to analyze reasons for declining profits. To give additional context to the LLM, we provided the following diagram with a skeleton issue tree structure.

To prompt the LLM, we attached the preceding diagram and used the following text:

Problem = profits at the AnyCompany Tile Factory is down 30%. Using the diagram as a guide, can you develop an issue tree identifying the main issues, sub issues and then help with the corresponding analysis against each sub-issue to find the reasons for profit decline?

We get the following response from the LLM:

And we populated the issue tree with the response from the LLM for additional visual clarity, as shown in the following diagram.

As shown in the diagram, the LLM has intelligently identified the two main top-level issues contributing to profit decline at AnyCompany (revenue decline and cost increases) and under each category identified the secondary issues, together with a high-level analysis for the management to pursue.

Next, we asked the LLM to elaborate “facility overhead costs” using the prompt:

Please elaborate “facility overhead costs”

We get the following response from the LLM:

SWOT

A SWOT analysis is a strategic management tool that can be used to evaluate the strengths, weaknesses, opportunities, and threats of an organization, industry, or project. SWOT helps in decision-making and strategy formulation by identifying internal factors (strengths and weaknesses) and external factors (opportunities and threats) that can impact success. Management can then use the analysis to develop way forward strategies, using strengths, addressing weaknesses, capitalizing on opportunities, and mitigating threats, as identified in the SWOT.

In this example, we ask the LLM to develop a way forward strategy for the Australian higher education sector using the SWOT analysis diagram provided. We ask it to identify four key strategic themes for the sector, making sure the approach uses inherent strengths, addresses weaknesses, capitalizes on opportunities, and mitigates threats, as identified in the SWOT diagram and illustrated in the following graphic. We also ask the LLM to list critical activities to be pursued by the sector under each strategic theme.

To prompt the LLM, we attached the preceding diagram and used the following text:

Using the SWOT analysis for the Australian higher education sector, we want your expertise to help develop the way forward strategy. Please identify 4 key strategic themes for the sector, ensuring your approach leverages strengths, addresses weaknesses, capitalizes on opportunities, mitigates threats as identified in the SWOT diagram. Under each strategic theme, list critical activities to be pursued.

We get the following response from the LLM, which includes four strategic themes and activities to be pursued:

We constructed the following diagram based on the LLM response for visual clarity.

Value chain analysis

Value chain analysis is a strategic management tool that helps organizations evaluate each value-creating activity in their value chain, such as inbound logistics or operations, to identify opportunities to build competitive advantage, reduce costs, and increase efficiencies.

In this example, we want the LLM to perform a value chain analysis for the AnyCompany Tile Factory and make recommendations to improve profitability. As additional context to the LLM, we provided the following end-to-end value chain diagram for AnyCompany.

To prompt the LLM, we used the following text:

Profits at the AnyCompany Tile Factory are down 30%. The diagram shows their end-to-end value chain. Please perform a value chain analysis and make recommendations to improve profitability at AnyCompany.

We get the following response from the LLM, with recommendations for improving profitability across the five main areas:

We updated the value chain diagram with the recommendations supplied by the LLM under each category, as shown in the following diagram.

Value driver tree

A value driver tree is a framework that maps out key factors influencing an organization’s value or specific metrics such as revenue, profit, or customer satisfaction. This framework breaks down high-level business objectives and drivers into smaller, measurable components. By doing so, it reveals the cause-and-effect relationships between these elements, providing insights into how various factors contribute to overall business performance. Value driver trees are used for business performance improvement, strategic planning, and decision-making.

In this example, we want the LLM to define a value driver tree for the AnyCompany Tile Factory so the management team can analyze revenue, cost, and efficiency drivers contributing to low profitability and take action to remediate issues.

To prompt the LLM we used the following:

Profits at the AnyCompany Tile Factory are down 30%. Please help develop a value driver tree for the AnyCompany’s management to analyze the problem and take remedial action. Consider revenue, cost and efficiency drivers

We get the following response from the LLM, with a breakdown of the major components—revenue, costs, and efficiency—affecting profitability at AnyCompany. It also provides a five-step action plan for management to consider.

We constructed the following value driver diagram for AnyCompany Tile Factory in tree format, based on the responses provided by the LLM.

Conclusion

Problem-solving, critical thinking, and logical reasoning are cognitive processes we use to find a solution to a problem or reach an end goal, especially when the answer isn’t immediately obvious. As we’ve shown in the examples in this post, LLMs such as Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock can be used to improve your cognitive skills, especially in the areas of problem-solving, creative thinking, and ideation. This in turn will help improve team collaboration, cut decision times, and drive innovation. The examples we used are intentionally simple, to showcase the art of the possible. To improve LLM responses in complex problem-solving use cases, we recommend using Retrieval Augmented Generation (RAG) sources that are relevant to the problem, chain-of-thought prompting, and giving additional problem-specific context through prompt engineering.

We encourage you to begin exploring these capabilities through the Amazon Bedrock chat playground, a tool in the AWS Management Console that provides a visual interface to experiment with running inference on different LLMs and using different configurations.


About the Authors

Senaka Ariyasinghe is a Senior Partner Solutions Architect working with Global Systems Integrators at Amazon Web Services (AWS). In his role, Senaka guides AWS Partners in the APJ region to design and scale well-architected solutions, focusing on generative AI, machine learning, cloud migrations, and application modernization initiatives.

Deependra Shekhawat is a Senior Energy and Utilities Industry Specialist Solutions Architect based in Sydney, Australia. In his role, Deependra helps energy companies across the APJ region use cloud technologies to drive sustainability and operational efficiency. He specializes in creating robust data foundations and advanced workflows that enable organizations to harness the power of big data, analytics, and machine learning for solving critical industry challenges.

Optimizing AI implementation costs with Automat-it

This post was written by Claudiu Bota, Oleg Yurchenko, and Vladyslav Melnyk of AWS Partner Automat-it.

As organizations adopt AI and machine learning (ML), they’re using these technologies to improve processes and enhance products. AI use cases include video analytics, market predictions, fraud detection, and natural language processing, all relying on models that analyze data efficiently. Although these models achieve impressive accuracy with low latency, they often demand significant computing power, including GPUs, to run inference. Therefore, maintaining the right balance between performance and cost is essential, especially when deploying models at scale.

One of our customers encountered this exact challenge. To address this issue, they engaged Automat-it, an AWS Premier Tier Partner, to design and implement their platform on AWS, specifically using Amazon Elastic Kubernetes Service (Amazon EKS). Automat-it specializes in helping startups and scaleups grow through hands-on cloud DevOps, MLOps, and FinOps services. The collaboration aimed to achieve scalability and performance while optimizing costs. Their platform requires highly accurate models with low latency, and the costs for such demanding tasks escalate quickly without proper optimization.

In this post, we explain how Automat-it helped this customer achieve a more than twelvefold cost savings while keeping AI model performance within the required performance thresholds. This was accomplished through careful tuning of architecture, algorithm selection, and infrastructure management.

Customer challenge

Our customer specializes in developing AI models for video intelligence solutions using YOLOv8 and the Ultralytics library. An end-to-end YOLOv8 deployment consists of three stages:

  • Preprocessing – Prepares raw video frames through resizing, normalization, and format conversion
  • Inference – The YOLOv8 model generates predictions by detecting and classifying objects in the prepared video frames
  • Postprocessing – Predictions are refined using techniques such as non-maximum suppression (NMS), filtering, and output formatting

They provide their clients with models that analyze live video streams and extract valuable insights from the captured frames, each customized to a specific use case. Initially, the solution required each model to run on a dedicated GPU at runtime, which meant provisioning GPU instances per customer. This setup led to underutilized GPU resources and elevated operational costs.

Therefore, our primary objective was to optimize GPU utilization while lowering overall platform costs and keeping data processing time as minimal as possible. Specifically, we aimed to limit AWS infrastructure costs to $30 per camera per month while keeping the total processing time (preprocessing, inference, and postprocessing) under 500 milliseconds. Achieving these savings without lowering the model performance—particularly by maintaining low inference latency—remains essential to providing the desired level of service for each customer.

Initial approach

Our initial approach followed a client-server architecture, splitting the YOLOv8 end-to-end deployment into two components. The client component, running on CPU instances, handled the preprocessing and postprocessing stages. Meanwhile, the server component, running on GPU instances, was dedicated to inference and responded to requests from the client. This functionality was implemented using a custom gRPC wrapper, providing efficient communication between the components.

The goal of this approach was to reduce costs by using GPUs exclusively for the inference stage rather than for the entire end-to-end deployment. Additionally, we assumed that client-server communication latency would have a minimal impact on the overall inference time. To assess the effectiveness of this architecture, we conducted performance tests using the following baseline parameters:

  • Inference was performed on g4dn.xlarge GPU-based instances because the customer’s models were optimized to run on NVIDIA T4 GPUs
  • The customer’s models used the YOLOv8n model with Ultralytics version 8.2.71

The results were evaluated based on the following key performance indicators (KPIs):

  • Preprocessing time – The amount of time required to prepare the input data for the model
  • Inference time – The duration taken by the YOLOv8 model to process the input and produce results
  • Postprocessing time – The time needed to finalize and format the model’s output for use
  • Network communication time – The duration of communication between the client component running on CPU instances and the server component running on GPU instances
  • Total time – The overall duration from when an image is sent to the YOLOv8 model until results are received, including all processing stages

The findings were as follows:

                 Preprocess (ms)   Inference (ms)   Postprocess (ms)   Network communication (ms)   Total (ms)
Custom gRPC      2.7               7.9              1.1                10.26                        21.96

The GPU-based instance completed inference in 7.9 ms. However, the network communication overhead of 10.26 ms increased the total processing time. Although the total processing time was acceptable, each model required a dedicated GPU-based instance to run, resulting in unacceptable costs for the customer. Specifically, the inference cost per camera was $353.03 monthly, exceeding the customer’s budget.

Finding a better solution

Although the performance results were promising, even with the added latency from network communication, costs per camera were still too high, so our solution needed further optimization. Additionally, the custom gRPC wrapper lacked an automatic scaling mechanism to accommodate the addition of new models and required ongoing maintenance, adding to its operational complexity.

To address these challenges, we moved away from the client-server approach and implemented GPU time-slicing (fractionalization), which involves dividing GPU access into discrete time intervals. This approach allows AI models to share a single GPU, each utilizing a virtual GPU during its assigned slice. It’s similar to CPU time-slicing between processes, optimizing resource allocation without degrading performance. This approach was inspired by several AWS blog posts that can be found in the references section.

We implemented GPU time-slicing in the EKS cluster by using the NVIDIA Kubernetes device plugin. This allowed us to use the native Kubernetes scaling mechanisms, simplifying the scaling process to accommodate new models and reducing operational overhead. Moreover, by relying on the plugin, we avoided the need to maintain custom code, streamlining both implementation and long-term maintenance.

In this configuration, the GPU instance was set to split into 60 time-sliced virtual GPUs. We used the same KPIs as in the previous setup to measure efficiency and performance under these optimized conditions, making sure that cost reduction aligned with our service quality benchmarks.
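
For reference, the sketch below shows the kind of time-slicing configuration the NVIDIA Kubernetes device plugin consumes, applied here as a ConfigMap with the Kubernetes Python client. The replica count reflects our 60-slice setup, but treat the ConfigMap name, namespace, data key, and exact schema as assumptions to validate against the device plugin version you deploy.

```python
from kubernetes import client, config

# Time-slicing configuration consumed by the NVIDIA Kubernetes device plugin,
# splitting each physical GPU into 60 virtual GPUs as described above.
# Verify this schema against the device plugin version you are running.
TIME_SLICING_CONFIG = """
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 60
"""

config.load_kube_config()  # or load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

configmap = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(
        name="nvidia-device-plugin-config",   # assumed ConfigMap name
        namespace="nvidia-device-plugin",     # assumed namespace
    ),
    data={"any": TIME_SLICING_CONFIG},        # assumed config key
)

v1.create_namespaced_config_map(namespace="nvidia-device-plugin", body=configmap)
# With this in place, the device plugin advertises 60 schedulable nvidia.com/gpu
# resources per physical GPU, so each pod can request one "slice" in its limits.
```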

We conducted the tests in three stages, as described in the following sections.

Stage 1

In this stage, we ran one pod on a g4dn.xlarge GPU-based instance. Each pod runs the three phases of the end-to-end YOLOv8 deployment on the GPU and processes video frames from a single camera. The findings are shown in the following graph and table.

[Graph: performance with 1 ML pod]

          Preprocess (ms)   Inference (ms)   Postprocess (ms)   Total (ms)
1 pod     2                 7.8              1                  10.8

We successfully achieved an inference time of 7.8 ms and a total processing time of 10.8 ms, which aligned with the project’s requirements. The GPU memory usage for a single pod was 247 MiB, and the GPU processor utilization was 12%. The memory usage per pod indicated we could run approximately 60 processes (or pods) on a 16 GiB GPU (roughly 16,384 MiB / 247 MiB ≈ 66, leaving some headroom for runtime overhead).

Stage 2

In this stage, we ran 20 pods on a g4dn.2xlarge GPU-based instance. We changed the instance type from g4dn.xlarge to g4dn.2xlarge due to CPU overload associated with data processing and loading. The findings are shown in the following graph and table.

[Graph: performance with 20 ML pods]

          Preprocess (ms)   Inference (ms)   Postprocess (ms)   Total (ms)
20 pods   11                42               55                 108

At this stage, GPU memory usage reached 7,244 MiB, with GPU processor utilization peaking between 95% and 99%. A total of 20 pods utilized half of the GPU’s 16 GiB memory and fully consumed the GPU processor, leading to increased processing times. Although both inference and total processing times rose, this outcome was anticipated and deemed acceptable. The next objective was determining the maximum number of pods the GPU could support on its memory capacity.

Stage 3

At this stage, we aimed to run 60 pods on a g4dn.2xlarge GPU-based instance. Subsequently, we changed the instance type from g4dn.2xlarge to g4dn.4xlarge and then to g4dn.8xlarge.

The goal was to maximize GPU memory utilization. However, data processing and loading overloaded the instance’s CPU. This prompted us to switch to instances that still had one GPU but offered more CPUs.

The findings are shown in the following graph and table.

[Graph: performance with 54 ML pods]

          Preprocess (ms)   Inference (ms)   Postprocess (ms)   Total (ms)
54 pods   21                56               128                205

The GPU memory usage was 14,780 MiB, and the GPU processor utilization was 99–100%. Despite these adjustments, we encountered GPU out-of-memory errors that prevented us from scheduling all 60 pods. Ultimately, we could accommodate 54 pods, representing the maximum number of AI models that could fit on a single GPU.

In this scenario, the inference costs per camera associated with GPU usage were $27.81 per month per camera, a twelvefold reduction compared to the initial approach. By adopting this approach, we successfully met the customer’s cost requirements per camera per month while maintaining acceptable performance levels.

Conclusion

In this post, we explored how Automat-it helped one of our customers achieve a twelvefold cost reduction while maintaining the performance of their YOLOv8-based AI models within acceptable ranges. The test results demonstrate that GPU time-slicing enables the maximum number of AI models to operate efficiently on a single GPU, significantly reducing costs while providing high performance. Furthermore, this method necessitates minimal maintenance and modifications to the model code, enhancing scalability and ease of use.

References

To learn more, refer to the following resources:

AWS

Community

Disclaimer

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the authors

Claudiu Bota is a Senior Solutions Architect at Automat-it, helping customers across the entire EMEA region migrate to AWS and optimize their workloads. He specializes in containers, serverless technologies, and microservices, focusing on building scalable and efficient cloud solutions. Outside of work, Claudiu enjoys reading, traveling, and playing chess.

Oleg Yurchenko is the DevOps Director at Automat-it, where he spearheads the company’s expertise in DevOps best practices and solutions. His focus areas include containers, Kubernetes, serverless, Infrastructure as Code, and CI/CD. With over 20 years of hands-on experience in system administration, DevOps, and cloud technologies, Oleg is a passionate advocate for his customers, guiding them in building modern, scalable, and cost-effective cloud solutions.

Vladyslav Melnyk is a Senior MLOps Engineer at Automat-it. He is a seasoned deep learning enthusiast with a passion for artificial intelligence, taking care of AI products through their lifecycle, from experimentation to production. With over 9 years of experience in AI within AWS environments, he is also a big fan of leveraging cool open-source tools. Result-oriented and ambitious, with a strong focus on MLOps, Vladyslav ensures smooth transitions and efficient model deployment. He is skilled in delivering deep learning models, always learning and adapting to stay ahead in the field.

The end of an era: the final AWS DeepRacer League Championship at re:Invent 2024

AWS DeepRacer League 2024 Championship finalists at re:Invent 2024

The AWS DeepRacer League is the world’s first global autonomous racing league powered by machine learning (ML). Over the past 6 years, a diverse community of over 560,000 builders from more than 150 countries worldwide have participated in the League to learn ML fundamentals hands-on through the fun of friendly autonomous racing. After an 8-month season of nail-biting virtual qualifiers, finalists convened in person at re:Invent in Las Vegas for one final showdown to compete for prizes and glory in the high-stakes, winner-take-all AWS DeepRacer League Championship. It wasn’t just the largest prize purse in DeepRacer history or the bragging rights of being the fastest builder in the world that was on the line—2024 marked the final year of the beloved League, and the winner of the 2024 Championship would also have the honor of being the very last AWS DeepRacer League Champion.

Tens of thousands raced in Virtual Circuit monthly qualifiers from March to October in an attempt to earn one of only four spots from six regions (24 spots in all), making qualifying for the 2024 Championship one of the most difficult to date. Finalists were tested on consistent performance over consecutive laps rather than “one lucky lap,” making the championship a true test of skill and dashing the championship dreams of some of the most prolific racers, community leaders, and former championship finalists during the regular season. When the dust settled, the best of the best from around the world secured their spots in the championship alongside returning 2023 champion FiatLux (Youssef Addioui) and 2023 Open Winner PolishThunder (Daryl Jezierski). For the rest of the hopefuls, there was only one more opportunity—the WildCard Last Chance qualifier on the Expo floor at re:Invent 2024—to gain entry into the final championship and etch their names in the annals of DeepRacer history.

In a tribute to the League’s origins, the Forever Raceway was chosen as the final championship track. This adaptation of the RL Speedway, which made its initial debut at the DeepRacer League Summit Circuit in 2022, presented a unique challenge: it’s nearly 30% narrower than tracks from the last five seasons, returning to a width only seen in the inaugural League season. The 76-centimeter-wide track leaves little room for error because it exaggerates the curves of the course and forces the vehicle to follow the track contours closely to stay within the borders.

The Wildcard Last Chance

As the Wildcard began on Monday night, the sidelines were packed with a mix of DeepRacer royalty and fresh-faced contenders, all vying for one of the coveted six golden tickets to round 1 of the championship. Twenty elite racers from all corners of the globe unleashed their best models onto the track, each looking for three clean laps (no off-tracks) in a mere 120 seconds to take their place alongside the Virtual Circuit champions.

The following image displays the 2024 qualifiers.

Newcomer and JP Morgan Chase (JPMC) DeepRacer Women’s League winner geervani (Geervani Kolukuluri) made her championship debut, dominating the competition early and showing the crowd exactly why she was the best of the league’s 829 participants and one of the top racers of the 4,400 that participated from JP Morgan Chase in 2024. She was nipping at the heels of community leader, DeepRacer for Cloud contributor, and Wildcard pack leader MattCamp, and less than a second behind SimonHill, another championship rookie and winner of the Eviden internal league.

But the old guard wasn’t about to go down without a fight. DeepRacer legend LeadingAI (Jacky Chen), an educator and mentor to Virtual Circuit winner AIDeepRacer (Nathan Liang), landed the fourth best time of 9.527, slightly ahead of Jerec (Daniel Morgan), who returned to the finals after taking third place in the 2023 Championship. Duckworth (Lars Ludvigson), AWS Community Builder and solutions developer, took the sixth and final Wildcard spot, less than three-tenths of a second above the rest of the pack.

The field of 32 begins

The first 16 of 32 code warriors hit the track on Tuesday, each one going for glory in two heart-pounding, 2-minute sprints to secure their spot in the bracket of eight over 2 days of competition. MattCamp proved he was no one-hit wonder by blazing across the finish line with Tuesday’s fastest time of 8.223 seconds, geervani still just milliseconds behind at 8.985. Asian Pacific regional stars TonyJ (Tony Jenness) and CodeMachine (Igor Maksaev Brockway) came out swinging, claiming their place in the top four just ahead of 2023 champion FiatLux of the EMEA region, who dug deep to stay in the game, finishing fifth with a time of 9.223. Liao (Wei-Chen Liao), Nevertari (Neveen Ahmed), and BSi (Bobby Stenly) also edged above the pack as the day of racing ended, their positions precarious as they hovered just inside the top eight for the day.

The tension was high and the stands full on Wednesday as the second day of Round 1 racing began, the second 16 ready to burn rubber and leave the Tuesday competitors in the dust. AIDeepRacer (Nathan Liang) quickly dethroned MattCamp with a blistering time of 8.102, looking to redeem himself from barely missing the bracket of eight in the 2022 Championship. 2023 Global points winner rosscomp1 (Ross Williams) slid into the top four right behind the indomitable SimonHill, while geervani, TonyJ, and CodeMachine managed to stay in the game as the Wednesday racers tried to best their times.

A mechanical failure in his second attempt gave community leader MarkRoss one last attempt to break into the top eight. His 8.635-second run looked like it might just be his ticket to Round 2, but a challenge from the sidelines had everyone holding their breath. The tension was palpable as the DeepRacer Pit Crew reviewed video footage to determine if Mark’s car had stayed within track limits, knowing that a spot in the semifinals—and a shot at the $25,000 grand prize—hung in the balance. Mark’s hopes were dashed when the replay revealed all four wheels left the track, and PolishThunder snuck into Round 2 by the skin of their teeth, claiming the coveted eighth-place spot.

The following images show Blaine Sundrud and Ryan Myrehn commentating on Round 2 (left) and the off-track footage (right).

And then there were eight

On Thursday morning, racers and spectators returned to the Forever Raceway for the bracket of eight, a series of head-to-head battles pitting the top racers against each other in a grueling double elimination format. Each matchup was another contest of speed and accuracy, another chance for the undefeatable to be toppled from their thrones and underdogs to scrap their way back into contention in the last chance bracket, as shown in the following image.

First seed AIDeepRacer dominated their first two matchups, sending rosscomp1 and PolishThunder to early defeat, definitively earning a place in the final three. Sixth seed TonyJ couldn’t get his average lap under 12 seconds against third seed SimonHill, while fellow regional heavyweight CodeMachine was unable to close the 1-second gap between him and second seed MattCamp, the strong favorite coming out of the Wildcard and round 1. That left MattCamp and SimonHill to one last battle for a place in the top three, with SimonHill finally gaining the upper hand in a 9.026- to 9.478-second shootout, sending MattCamp to the Last Chance bracket to try to fight his way back to the top.

With one final spot in the top three up for grabs, five-time championship finalist and eighth-seed PolishThunder, who was stunned into defeat in the very first matchup of the day, mounted an equally stunning comeback as he bested fifth seed geervani in match five and then knocked out fellow DeepRacer veteran, friend, and fourth-seed rosscomp1 with a remarkable 8.389-second time. MattCamp regained his mojo to beat TonyJ in match 10, setting him and PolishThunder up for a showdown to see who would go up against SimonHill and AIDeepRacer in the finale. Undaunted, PolishThunder toppled the Wildcard winner once and for all, taking him one step closer to being the DeepRacer Champion.

The final three

After almost 4 days of triumph and heartbreak, only three competitors remained for one final winner-take-all time trial that would determine, for the last time, the fastest builder in the world. Commentators Ryan Myrehn and Blaine Sundrud, who have been with the DeepRacer program since the very beginning, took to the mic together as the sidelines filled with finalists, spectators, and Pit Crew eager to see who would take home the championship trophy.

Prior to their final lap, each racer was able to choose a car and calibrate it to their liking, giving them an opportunity to optimize for the best performance with their chosen model. A coin toss decided the racing direction, counterclockwise, and each was given 1 minute to test the newly calibrated cars to make sure they were happy with their adjustments. With three cars tested, calibrated, and loaded with their best models, it was down to just 6 minutes of racing that would determine the 2024 Champion.

PolishThunder took to the track first, starting the finale strong as his model averaged a mere 8.436 seconds in laps five, six, and seven, setting a high bar for his competitors. SimonHill answered with a blazing 8.353 seconds, the fastest counterclockwise average of the competition, edging out the comeback king with less than a pixel’s width between them. Then, with just one race left, the crowd watched with bated breath as the competition’s top seed AIDeepRacer approached the starting line, hoping to replicate his enviable lap times from previous rounds. After a tense 2 minutes, first-time competitor SimonHill held onto first place, winning the $25,000 grand prize, the coveted DeepRacer trophy, and legendary status as the final DeepRacer champion. PolishThunder, who previously hadn’t broken the top six in his DeepRacer career, finished second, and AIDeepRacer third for his second, and best, championship appearance since his debut in 2022.

As the final chapter came to a close and trophies were handed out, racers and Pit Crew who have worked, competed, and built lifelong friendships around a 1/18th scale autonomous car celebrated together one more time on the very last championship track. Although the League may have concluded, the legacy of DeepRacer and its unique ability to teach ML and the foundations of generative AI through gamified learning continues. Support for DeepRacer in the AWS console will continue through the end of 2025, and DeepRacer will enter a new era as organizations around the world will also be able to launch their own leagues and competitions at a fraction of the cost with the AWS DeepRacer Solution. Featuring the same functionality as the console and deployable anywhere, the Solution will also contain new workshops and resources bridging fundamental concepts of ML using AWS DeepRacer with foundation model (FM) training and fine-tuning techniques, using AWS services such as Amazon SageMaker and Amazon Bedrock for popular industry use cases. Look for the Solution to kickstart your company’s ML transformation starting in Q2 of 2025.

The following images show championship finalists and pit crew at re:Invent 2024 (left) and 2024 League Champion Simon Hill with his first-place trophy (right).

Join the DeepRacer community at deepracing.io.


About the Author

Jackie Moffett is a Senior Project Manager in the AWS AI Builder Programs Product Marketing team. She believes hands-on is the best way to learn and is passionate about building better systems to create exceptional customer experiences. Outside of work she loves to travel, is addicted to learning new things, and definitely wants to say hi to your dog.

CUDA Accelerated: How CUDA Libraries Bolster Cybersecurity With AI

Editor’s note: This is the next topic in our new CUDA Accelerated news series, which showcases the latest software libraries, NVIDIA NIM microservices and tools that help developers, software makers and enterprises use GPUs to accelerate their applications.

Traditional cybersecurity measures are proving insufficient for addressing emerging cyber threats such as malware, ransomware, phishing and data access attacks. Moreover, future quantum computers pose a security risk to today’s data through ‘harvest now, decrypt later’ attack strategies.

Cybersecurity technology powered by NVIDIA accelerated computing and high-speed networking is transforming the way organizations protect their data, systems and operations. These advanced technologies not only enhance security but also drive operational efficiency, scalability and business growth.

Accelerated AI-Powered Cybersecurity

Modern cybersecurity relies heavily on AI for predictive analytics and automated threat mitigation. NVIDIA GPUs are essential for training and deploying AI models due to their exceptional computational power. They offer:

  • Faster AI model training: GPUs reduce the time required to train machine learning models for tasks like fraud detection or phishing prevention.
  • Real-time inference: AI models running on GPUs can analyze network traffic in real time to identify zero-day vulnerabilities or advanced persistent threats.
  • Automation at scale: Businesses can automate repetitive security tasks such as log analysis or vulnerability scanning, freeing up human resources for strategic initiatives.

For example, AI-driven intrusion detection systems powered by NVIDIA GPUs can analyze billions of events per second to detect anomalies that traditional systems might miss. Learn more about NVIDIA AI cybersecurity solutions.

Real-Time Threat Detection and Response

GPUs excel at parallel processing, making them ideal for handling the massive computational demands of real-time cybersecurity tasks such as intrusion detection, malware analysis and anomaly detection. By combining them with high-performance networking software frameworks like NVIDIA DOCA and NVIDIA Morpheus, businesses can:

  • Detect threats faster: GPUs process large datasets in real time, enabling immediate identification of suspicious activities.
  • Respond proactively: High-speed networking ensures rapid communication between systems, allowing for swift containment of threats.
  • Minimize downtime: Faster response times reduce the impact of cyberattacks on business operations.

This capability is particularly beneficial for industries like finance and healthcare, where even a few seconds of downtime can result in significant losses or risks to public safety. Read the NVIDIA AI Enterprise security white paper to learn more.

Scalability for Growing Infrastructure Cybersecurity Needs

As businesses grow and adopt more connected devices and cloud-based services, the volume of network traffic increases exponentially. Traditional CPU-based systems often struggle to keep up with these demands. GPUs and high-speed networking software provide massive scalability, capable of handling large-scale data processing effortlessly, either on premises or in the cloud.

For example, NVIDIA’s cybersecurity solutions can help future-proof cybersecurity technologies and improve cost efficiency via centralized control.

Enhanced Data Security Across Distributed Environments

With remote work becoming the norm, businesses must secure sensitive data across a growing number of distributed locations. Distributed computing systems enhance the overall resilience of cybersecurity infrastructure by providing redundancy and fault tolerance, reduced downtime and data protection for continuous operation and minimum interruption, even during cyber attacks.

NVIDIA’s high-speed data management and networking software paired with GPU-powered cybersecurity solutions offers consistent protection with automated updates, improved encryption and isolated threat zones. This is especially crucial for industries handling sensitive customer data, such as retail or e-commerce, where breaches can severely damage brand reputation. Learn more about NVIDIA’s GPU cloud computing technologies.

Improved Regulatory Compliance 

Regulatory frameworks such as GDPR, HIPAA, PCI DSS and SOC 2 require businesses to implement stringent security measures. GPU-powered cybersecurity solutions and high-speed networking software make compliance easier by ensuring data integrity, providing audit trails and reducing risk exposure.

Accelerating Post-Quantum Cryptography

Sufficiently large quantum computers can crack the Rivest-Shamir-Adleman (RSA) encryption algorithm underpinning today’s data security solutions. Even though such devices have not yet been built, governing agencies around the world are recommending the use of post-quantum cryptography (PQC) algorithms to protect against attackers that might hoard sensitive data for decryption in the future.

PQC algorithms are based on mathematical operations more sophisticated than RSA, which are expected to be secure against attacks even by future quantum computers. The National Institute of Standards and Technology (NIST) has standardized a number of PQC algorithms and recommended that organizations begin phasing out existing encryption methods by 2030 — and transition entirely to PQC by 2035.

Widespread adoption of PQC requires ready access to highly performant and flexible implementations of these complex algorithms. NVIDIA cuPQC accelerates the most popular PQC algorithms, granting enterprises high throughputs of sensitive data to remain secure now and in the future.

Essentiality of Investing in Modern Cybersecurity Infrastructure

The integration of GPU-powered cybersecurity technology with high-speed networking software represents a paradigm shift in how businesses approach digital protection. By adopting these advanced solutions, businesses can stay ahead of evolving cyber threats while unlocking new opportunities for growth in an increasingly digital economy. Whether for safeguarding sensitive customer data or ensuring uninterrupted operations across global networks, investing in modern cybersecurity infrastructure is no longer optional but essential.

NVIDIA provides over 400 libraries for a variety of use cases, including building cybersecurity infrastructure. New updates continue to be added to the CUDA platform roadmap.

GPUs can’t simply accelerate software written for general-purpose CPUs. Specialized algorithm software libraries, solvers and tools are needed to accelerate specific workloads, especially on computationally intensive distributed computing architectures. Strategically tighter integration between CPUs, GPUs and networking helps provide the right platform focus for future applications and business benefits.

Learn more about NVIDIA CUDA libraries and microservices for AI.

dMel: Speech Tokenization Made Simple

Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated complicated speech tokenization methods to discretize continuous speech signals so that language modeling techniques can be applied to speech data. However, existing approaches either model semantic (content) tokens, potentially losing acoustic information, or model acoustic tokens, risking the loss of semantic (content) information. Having multiple token types also complicates the architecture and requires… (Apple Machine Learning Research)

Novel View Synthesis with Pixel-Space Diffusion Models

Synthesizing a novel view from a single input image is a challenging task. Traditionally, this task was approached by estimating scene depth, warping, and inpainting, with machine learning models enabling parts of the pipeline. More recently, generative models are being increasingly employed in novel view synthesis (NVS), often encompassing the entire end-to-end system. In this work, we adapt a modern diffusion model architecture for end-to-end NVS in the pixel space, substantially outperforming previous state-of-the-art (SOTA) techniques. We explore different ways to encode geometric…Apple Machine Learning Research

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

Evaluate healthcare generative AI applications using LLM-as-a-judge on AWS

In our previous blog posts, we explored various techniques such as fine-tuning large language models (LLMs), prompt engineering, and Retrieval Augmented Generation (RAG) with Amazon Bedrock to generate impressions from the findings section of radiology reports using generative AI. Part 1 focused on model fine-tuning. Part 2 introduced RAG, which combines LLMs with external knowledge bases to reduce hallucinations and improve accuracy in medical applications. Through real-time retrieval of relevant medical information, RAG systems can provide more reliable and contextually appropriate responses, making them particularly valuable for healthcare applications where precision is crucial. In both previous posts, we used traditional metrics like ROUGE scores for performance evaluation. Such metrics are suitable for evaluating general summarization tasks, but can’t effectively assess whether a RAG system successfully integrates retrieved medical knowledge or maintains clinical accuracy.

In Part 3, we’re introducing an approach to evaluate healthcare RAG applications using LLM-as-a-judge with Amazon Bedrock. This innovative evaluation framework addresses the unique challenges of medical RAG systems, where both the accuracy of retrieved medical knowledge and the quality of generated medical content must align with stringent standards such as clear and concise communication, clinical accuracy, and grammatical accuracy. By using the latest models from Amazon and the newly released RAG evaluation feature for Amazon Bedrock Knowledge Bases, we can now comprehensively assess how well these systems retrieve and use medical information to generate accurate, contextually appropriate responses.

This advancement in evaluation methodology is particularly crucial as healthcare RAG applications become more prevalent in clinical settings. The LLM-as-a-judge approach provides a more nuanced evaluation framework that considers both the quality of information retrieval and the clinical accuracy of generated content, aligning with the rigorous standards required in healthcare.

In this post, we demonstrate how to implement this evaluation framework using Amazon Bedrock, compare the performance of different generator models, including Anthropic’s Claude and Amazon Nova on Amazon Bedrock, and showcase how to use the new RAG evaluation feature to optimize knowledge base parameters and assess retrieval quality. This approach not only establishes new benchmarks for medical RAG evaluation, but also provides practitioners with practical tools to build more reliable and accurate healthcare AI applications that can be trusted in clinical settings.

Overview of the solution

The solution uses Amazon Bedrock Knowledge Bases evaluation capabilities to assess and optimize RAG applications specifically for radiology findings and impressions. Let’s examine the key components of this architecture in the following figure, tracing the data flow from left to right.

Pipeline for Amazon Bedrock LLM-as-a-Judge

The workflow consists of the following phases:

  • Data preparation – Our evaluation process begins with a prompt dataset containing paired radiology findings and impressions. This clinical data undergoes a transformation process where it’s converted into a structured JSONL format, which is essential for compatibility with the knowledge base evaluation system. After it’s prepared, this formatted data is securely uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, providing accessibility and data security throughout the evaluation process.
  • Evaluation processing – At the heart of our solution lies an Amazon Bedrock Knowledge Bases evaluation job. This component processes the prepared data while seamlessly integrating with Amazon Bedrock Knowledge Bases. This integration is crucial because it enables the system to create specialized medical RAG capabilities specifically tailored for radiology findings and impressions, making sure that the evaluation considers both medical context and accuracy.
  • Analysis – The final stage empowers healthcare data scientists with detailed analytical capabilities. Through an advanced automated report generation system, professionals can access detailed analysis of performance metrics of the summarization task for impression generation. This comprehensive reporting system enables thorough assessment of both retrieval quality and generation accuracy, providing valuable insights for system optimization and quality assurance.

This architecture provides a systematic and thorough approach to evaluating medical RAG applications, providing both accuracy and reliability in healthcare contexts where precision and dependability are paramount.

Dataset and background

The MIMIC Chest X-ray (MIMIC-CXR) database v2.0.0 is a large, publicly available dataset of chest radiographs in DICOM format with free-text radiology reports. We used the MIMIC-CXR dataset, consisting of 91,544 reports, which can be accessed through a data use agreement that requires user registration and completion of a credentialing process.

During routine clinical care, clinicians trained in interpreting imaging studies (radiologists) summarize their findings for a particular study in a free-text note. The reports were de-identified using a rule-based approach to remove protected health information. Because we used only the radiology report text data, we downloaded just one compressed report file (mimic-cxr-reports.zip) from the MIMIC-CXR website. For evaluation, 1,000 of the 2,000 reports in a subset of the MIMIC-CXR dataset were used; this is referred to as the dev1 dataset. Another 1,000 of the 2,000 radiology reports in the chest X-ray collection from the Indiana University hospital network (referred to as dev2) were also used.

RAG with Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases helps take advantage of RAG, a popular technique that involves drawing information from a data store to augment the responses generated by LLMs. We used Amazon Bedrock Knowledge Bases to generate impressions from the findings section of the radiology reports by enriching the query with context that is received from querying the knowledge base. The knowledge base is set up to contain findings and corresponding impression sections of 91,544 MIMIC-CXR radiology reports as {prompt, completion} pairs.
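Outside of an evaluation job, you can issue a single query against the same knowledge base with the Retrieve and Generate API to see this enrichment in action. The following is a minimal sketch; the knowledge base ID and model ARN are placeholders, and the findings text is an abbreviated example rather than a record from the dataset.

import boto3

# Runtime client for querying an existing knowledge base
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

findings = "Heart size is normal. Lungs are clear. No pleural effusion or pneumothorax."

response = bedrock_agent_runtime.retrieve_and_generate(
    input={
        'text': "Generate a concise radiology impression for these findings:\n" + findings
    },
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': '<KNOWLEDGE_BASE_ID>',
            'modelArn': '<GENERATOR_MODEL_ARN>'
        }
    }
)

# The generated impression, grounded in the retrieved {prompt, completion} pairs
print(response['output']['text'])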

LLM-as-a-judge and quality metrics

LLM-as-a-judge represents an innovative approach to evaluating AI-generated medical content by using LLMs as automated evaluators. This method is particularly valuable in healthcare applications where traditional metrics might fail to capture the nuanced requirements of medical accuracy and clinical relevance. By using specialized prompts and evaluation criteria, LLM-as-a-judge can assess multiple dimensions of generated medical content, providing a more comprehensive evaluation framework that aligns with healthcare professionals’ standards.

Our evaluation framework encompasses five critical metrics, each designed to assess specific aspects of the generated medical content:

  • Correctness – Evaluated on a 3-point Likert scale, this metric measures the factual accuracy of generated responses by comparing them against ground truth responses. In the medical context, this makes sure that the clinical interpretations and findings align with the source material and accepted medical knowledge.
  • Completeness – Using a 5-point Likert scale, this metric assesses whether the generated response comprehensively addresses the prompt, taking the ground truth response into account. It makes sure that critical medical findings or interpretations are not omitted from the response.
  • Helpfulness – Measured on a 7-point Likert scale, this metric evaluates the practical utility of the response in clinical contexts, considering factors such as clarity, relevance, and actionability of the medical information provided.
  • Logical coherence – Assessed on a 5-point Likert scale, this metric examines the response for logical gaps, inconsistencies, or contradictions, making sure that medical reasoning flows naturally and maintains clinical validity throughout the response.
  • Faithfulness – Scored on a 5-point Likert scale, this metric specifically evaluates whether the response contains information that is neither found in nor readily inferable from the prompt, helping identify potential hallucinations or fabricated medical information that could be dangerous in clinical settings.

These metrics are normalized in the final output and job report card, providing standardized scores that enable consistent comparison across different models and evaluation scenarios. This comprehensive evaluation framework not only helps maintain the reliability and accuracy of medical RAG systems, but also provides detailed insights for continuous improvement and optimization. For details about the metric and evaluation prompts, see Evaluator prompts used in a knowledge base evaluation job.
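As a rough illustration of that normalization step, a raw Likert rating can be rescaled linearly to the 0–1 range shown in the report card. The helper below is an assumption for intuition only; the exact mapping applied by the evaluation job isn’t reproduced here.

# Illustrative only: linearly rescale a Likert rating to [0, 1].
# The actual normalization used by the Amazon Bedrock evaluation job may differ.
def normalize_likert(score, scale_points):
    """Map a rating on a 1..scale_points Likert scale to the 0..1 range."""
    return (score - 1) / (scale_points - 1)

print(normalize_likert(3, 3))  # correctness, 3-point scale -> 1.0
print(normalize_likert(6, 7))  # helpfulness, 7-point scale -> ~0.83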

Prerequisites

Before proceeding with the evaluation setup, make sure you have the following:

The solution code can be found at the following GitHub repo.

Make sure that your knowledge base is fully synced and ready before initiating an evaluation job.

Convert the test dataset into JSONL for RAG evaluation

In preparation for evaluating our RAG system’s performance on radiology reports, we implemented a data transformation pipeline to convert our test dataset into the required JSONL format. The following code shows the format of the original dev1 and dev2 datasets:

{
    "prompt": "value of prompt key",
    "completion": "value of completion key"
}
The required output format for the evaluation job is as follows:

{
    "conversationTurns": [{
        "referenceResponses": [{
            "content": [{
                "text": "value from completion key"
            }]
        }],
        "prompt": {
            "content": [{
                "text": "value from prompt key"
            }]
        }
    }]
}

Drawing from Wilcox’s seminal paper The Written Radiology Report, we carefully structured our prompt to include comprehensive guidelines for generating high-quality impressions:

import json
import random
import boto3

# Initialize the S3 client
s3 = boto3.client('s3')

# S3 bucket name
bucket_name = "<BUCKET_NAME>"

# Function to transform a single record
def transform_record(record):
    return {
        "conversationTurns": [
            {
                "referenceResponses": [
                    {
                        "content": [
                            {
                                "text": record["completion"]
                            }
                        ]
                    }
                ],
                "prompt": {
                    "content": [
                        {
                            "text": """You're given a radiology report findings to generate a concise radiology impression from it.

A Radiology Impression is the radiologist's final concise interpretation and conclusion of medical imaging findings, typically appearing at the end of a radiology report.
\n Follow these guidelines when writing the impression:
\n- Use clear, understandable language avoiding obscure terms.
\n- Number each impression.
\n- Order impressions by importance.
\n- Keep impressions concise and shorter than the findings section.
\n- Write for the intended reader's understanding.\n
Findings: \n""" + record["prompt"]
                        }
                    ]
                }
            }
        ]
    }

The script processes individual records, restructuring them to include conversation turns with both the original radiology findings and their corresponding impressions, making sure each report maintains the professional standards outlined in the literature. To keep the dataset used by this feature at a manageable size, we randomly sampled 1,000 records from the original dev1 and dev2 datasets, using a fixed random seed for reproducibility:

# Read from input file and write to output file
def convert_file(input_file_path, output_file_path, sample_size=1000):
    # First, read all records into a list
    records = []
    with open(input_file_path, 'r', encoding='utf-8') as input_file:
        for line in input_file:
            records.append(json.loads(line.strip()))
    
    # Randomly sample 1000 records
    random.seed(42)  # Set the seed first
    sampled_records = random.sample(records, sample_size)
    
    # Write the sampled and transformed records to the output file
    with open(output_file_path, 'w', encoding='utf-8') as output_file:
        for record in sampled_records:
            transformed_record = transform_record(record)
            output_file.write(json.dumps(transformed_record) + '\n')
            
# Usage
input_file_path = '<INPUT_FILE_NAME>.jsonl'  # Replace with your input file path
output_file_path = '<OUTPUT_FILE_NAME>.jsonl'  # Replace with your desired output file path
convert_file(input_file_path, output_file_path)

# File paths and S3 keys for the transformed files
transformed_files = [
    {'local_file': '<OUTPUT_FILE_NAME>.jsonl', 'key': '<FOLDER_NAME>/<OUTPUT_FILE_NAME>.jsonl'},
    {'local_file': '<OUTPUT_FILE_NAME>.jsonl', 'key': '<FOLDER_NAME>/<OUTPUT_FILE_NAME>.jsonl'}
]

# Upload files to S3
for file in transformed_files:
    s3.upload_file(file['local_file'], bucket_name, file['key'])
    print(f"Uploaded {file['local_file']} to s3://{bucket_name}/{file['key']}")

Set up a RAG evaluation job

Our RAG evaluation setup begins with establishing core configurations for the Amazon Bedrock evaluation job, including the selection of evaluation and generation models (Anthropic’s Claude 3 Haiku and Amazon Nova Micro, respectively). The implementation incorporates a hybrid search strategy with a retrieval depth of 10 results, providing comprehensive coverage of the knowledge base during evaluation. To maintain organization and traceability, each evaluation job is assigned a unique identifier with timestamp information, and input data and results are systematically managed through designated S3 paths. See the following code:

import boto3
from datetime import datetime

# Generate unique name for the job
job_name = f"rag-eval-{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}"

# Configure knowledge base and model settings
knowledge_base_id = "<KNOWLEDGE_BASE_ID>"
evaluator_model = "anthropic.claude-3-haiku-20240307-v1:0"
generator_model = "amazon.nova-micro-v1:0"
role_arn = "<IAM_ROLE_ARN>"

# Specify S3 locations
input_data = "<INPUT_S3_PATH>"
output_path = "<OUTPUT_S3_PATH>"

# Configure retrieval settings
num_results = 10
search_type = "HYBRID"

# Create Bedrock client
bedrock_client = boto3.client('bedrock')

With the core configurations in place, we initiate the evaluation job using the Amazon Bedrock create_evaluation_job API, which orchestrates a comprehensive assessment of our RAG system’s performance. The evaluation configuration specifies five key metrics—correctness, completeness, helpfulness, logical coherence, and faithfulness—providing a multi-dimensional analysis of the generated radiology impressions. The job uses the knowledge base for retrieval and generation, with Amazon Nova Micro handling generation and Anthropic’s Claude 3 Haiku handling evaluation, and the results are stored in the designated S3 output location for subsequent analysis. See the following code:

retrieve_generate_job = bedrock_client.create_evaluation_job(
    jobName=job_name,
    jobDescription="Evaluate retrieval and generation",
    roleArn=role_arn,
    applicationType="RagEvaluation",
    inferenceConfig={
        "ragConfigs": [{
            "knowledgeBaseConfig": {
                "retrieveAndGenerateConfig": {
                    "type": "KNOWLEDGE_BASE",
                    "knowledgeBaseConfiguration": {
                        "knowledgeBaseId": knowledge_base_id,
                        "modelArn": generator_model,
                        "retrievalConfiguration": {
                            "vectorSearchConfiguration": {
                                "numberOfResults": num_results,
                                "overrideSearchType": search_type
                            }
                        }
                    }
                }
            }
        }]
    },
    outputDataConfig={
        "s3Uri": output_path
    },
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [{
                "taskType": "Custom",
                "dataset": {
                    "name": "RagDataset",
                    "datasetLocation": {
                        "s3Uri": input_data
                    }
                },
                "metricNames": [
                    "Builtin.Correctness",
                    "Builtin.Completeness",
                    "Builtin.Helpfulness",
                    "Builtin.LogicalCoherence",
                    "Builtin.Faithfulness"
                ]
            }],
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [{
                    "modelIdentifier": evaluator_model
                }]
            }
        }
    }
)
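The create_evaluation_job call returns a job ARN, which you can poll until the job reaches a terminal state before looking for results in the S3 output location. The following is a minimal sketch of that loop:

import time

job_arn = retrieve_generate_job['jobArn']

# Poll the evaluation job until it completes, fails, or is stopped
while True:
    job = bedrock_client.get_evaluation_job(jobIdentifier=job_arn)
    status = job['status']
    print(f"Evaluation job status: {status}")
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)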

Evaluation results and metrics comparisons

The evaluation results for the healthcare RAG applications, using datasets dev1 and dev2, demonstrate strong performance across the specified metrics. For the dev1 dataset, the scores were as follows: correctness at 0.98, completeness at 0.95, helpfulness at 0.83, logical coherence at 0.99, and faithfulness at 0.79. Similarly, the dev2 dataset yielded scores of 0.97 for correctness, 0.95 for completeness, 0.83 for helpfulness, 0.98 for logical coherence, and 0.82 for faithfulness. These results indicate that the RAG system effectively retrieves and uses medical information to generate accurate and contextually appropriate responses, with particularly high scores in correctness and logical coherence, suggesting robust factual accuracy and logical consistency in the generated content.
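To reproduce aggregate numbers like these outside the console, you can average the per-record results that the job writes to the S3 output path. The sketch below assumes each output record is a JSON line carrying a list of metric scores; the field names used here ("automatedEvaluationResult", "scores", "metricName", "result") are assumptions and should be checked against the actual output files before use.

import json
from collections import defaultdict

# Hypothetical parsing of a downloaded evaluation output JSONL file;
# verify the record structure against your job's actual output.
def average_metrics(output_jsonl_path):
    totals, counts = defaultdict(float), defaultdict(int)
    with open(output_jsonl_path, 'r', encoding='utf-8') as f:
        for line in f:
            record = json.loads(line)
            for score in record.get('automatedEvaluationResult', {}).get('scores', []):
                totals[score['metricName']] += score['result']
                counts[score['metricName']] += 1
    return {name: totals[name] / counts[name] for name in totals}

print(average_metrics('<DOWNLOADED_OUTPUT_FILE>.jsonl'))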

The following screenshot shows the evaluation summary for the dev1 dataset.

Evaluation summary for dev1 dataset

The following screenshot shows the evaluation summary for the dev2 dataset.

Evaluation summary for dev2 dataset

Additionally, as shown in the following screenshot, the LLM-as-a-judge framework allows for the comparison of multiple evaluation jobs across different models, datasets, and prompts, enabling detailed analysis and optimization of the RAG system’s performance.

Comparison of evaluation metrics at a glance

You can also perform a detailed analysis by drilling down into the outlier cases with the lowest scores on metrics such as correctness, as shown in the following screenshot.

Detailed evaluation analysis

Metrics explainability

The following screenshot showcases the detailed metrics explainability interface of the evaluation system, displaying example conversations with their corresponding metrics assessment. Each conversation entry includes four key columns: Conversation input, Generation output, Retrieved sources, and Ground truth, along with a Score column. The system provides a comprehensive view of 1,000 examples, with navigation controls to browse through the dataset. Of particular note is the retrieval depth indicator showing 10 for each conversation, demonstrating consistent knowledge base utilization across examples.

The evaluation framework enables detailed tracking of generation metrics and provides transparency into how the knowledge base arrives at its outputs. Each example conversation presents the complete chain of information, from the initial prompt through to the final assessment. The system displays the retrieved context that informed the generation, the actual generated response, and the ground truth for comparison. A scoring mechanism evaluates each response, with a detailed explanation of the decision-making process visible through an expandable interface (as shown by the pop-up in the screenshot). This granular level of detail allows for thorough analysis of the RAG system’s performance and helps identify areas for optimization in both retrieval and generation processes.

Detailed explanation of the decision-making process

In this specific example from the Indiana University Medical System dataset (dev2), we see a clear assessment of the system’s performance in generating a radiology impression for chest X-ray findings. The knowledge base successfully retrieved relevant context (shown by 10 retrieved sources) to generate an impression stating “Normal heart size and pulmonary vascularity 2. Unremarkable mediastinal contour 3. No focal consolidation, pleural effusion, or pneumothorax 4. No acute bony findings.” The evaluation system scored this response with a perfect correctness score of 1, noting in the detailed explanation that the candidate response accurately summarized the key findings and correctly concluded there was no acute cardiopulmonary process, aligning precisely with the ground truth response.

Correctness with low scores

In the following screenshot, the evaluation system gave this response a low score of 0.5, noting in the detailed explanation that the ground truth response is “Moderate hiatal hernia. No definite pneumonia.” This indicates that the key findings from the radiology report are the presence of a moderate hiatal hernia and the absence of any definite pneumonia. The candidate response covers the key finding of the moderate hiatal hernia, which is correctly identified as one of the impressions. However, the candidate response also includes additional impressions that are not mentioned in the ground truth, such as normal lung fields, normal heart size, unfolded aorta, and degenerative changes in the spine. Although these additional impressions might be accurate based on the provided findings, they are not explicitly stated in the ground truth response. Therefore, the candidate response is partially correct and partially incorrect based on the ground truth.

Clean up

To avoid incurring future charges, delete the S3 bucket, knowledge base, and other resources that were deployed as part of the post.

Conclusion

The implementation of LLM-as-a-judge for evaluating healthcare RAG applications represents a significant advancement in maintaining the reliability and accuracy of AI-generated medical content. Through this comprehensive evaluation framework using Amazon Bedrock Knowledge Bases, we’ve demonstrated how automated assessment can provide detailed insights into the performance of medical RAG systems across multiple critical dimensions. The high-performance scores across both datasets indicate the robustness of this approach, though these metrics are just the beginning.

Looking ahead, this evaluation framework can be expanded to encompass broader healthcare applications while maintaining the rigorous standards essential for medical applications. The dynamic nature of medical knowledge and clinical practices necessitates an ongoing commitment to evaluation, making continuous assessment a cornerstone of successful implementation.

Through this series, we’ve demonstrated how you can use Amazon Bedrock to create and evaluate healthcare generative AI applications with the precision and reliability required in clinical settings. As organizations continue to refine these tools and methodologies, prioritizing accuracy, safety, and clinical utility in healthcare AI applications remains paramount.


About the Authors

Adewale Akinfaderin is a Sr. Data Scientist–Generative AI, Amazon Bedrock, where he contributes to cutting edge innovations in foundational models and generative AI applications at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in physics and a doctorate in engineering.

Priya Padate is a Senior Partner Solution Architect supporting healthcare and life sciences worldwide at Amazon Web Services. She has over 20 years of healthcare industry experience leading architectural solutions in medical imaging, healthcare-related AI/ML solutions, and cloud migration strategies. She is passionate about using technology to transform the healthcare industry to drive better patient care outcomes.

Dr. Ekta Walia Bhullar is a principal AI/ML/GenAI consultant with the AWS Healthcare and Life Sciences business unit. She has extensive experience in developing AI/ML applications for healthcare, especially in radiology. During her tenure at AWS, she has actively contributed to applications of AI/ML and generative AI within the life sciences domain, including the clinical, drug development, and commercial lines of business.

Read More

AWS DeepRacer: Closing time at AWS re:Invent 2024 – How did that physical racing go?

AWS DeepRacer: Closing time at AWS re:Invent 2024 – How did that physical racing go?

Having spent the last few years studying the art of AWS DeepRacer in the physical world, the author went to AWS re:Invent 2024. How did it go?

In AWS DeepRacer: How to master physical racing?, I wrote in detail about some aspects relevant to racing AWS DeepRacer in the physical world. We looked at the differences between the virtual and the physical world and how we could adapt the simulator and the training approach to overcome the differences. The previous post was left open-ended—with one last Championship Final left, it was too early to share all my secrets.

Now that AWS re:Invent is over, it’s time to share my strategy, how I prepared, and how it went in the end.

Strategy

Going into the 2024 season, I was reflecting on my performance from 2022 and 2023. In 2022, I had unstable models that were unable to do fast laps on the new re:Invent 2022 Championship track, not even making the last 32. In 2023, things went slightly better, but it was clear that there was potential to improve.

Specifically, I wanted a model that:

  • Goes straight on the straights and corners with precision
  • Has a survival instinct and avoids going off-track even in a tight spot
  • Can ignore the visual noise seen around the track

Combine that with the ability to test the models before showing up at the Expo, and success seemed possible!

Implementation

In this section, I will explain my thinking about why physical racing is so different than virtual racing, as well as describe my approach to training a model that overcomes those differences.

How hard can it be to go straight?

If you have watched DeepRacer over the years, you have probably seen that most models struggle to go straight on the straights and end up oscillating left and right. The question has always been: why is it like that? This behavior causes two issues: the distance driven increases (result: slower lap time) and the car potentially enters the next turn in a way it can’t handle (result: off-track).

A few theories emerged:

  • Sim-to-real issues – The steering response isn’t matching the simulator, both with regards to the steering geometry and latency (time from picture to servo command, as well as the time it takes the servo to actually actuate). Therefore, when the car tries to adjust the direction on the straight, it doesn’t get the response it expects.
  • Model issues – A combination of the model not actually using the straight action, and not having access to angles needed to dampen oscillations (2.5–5.0 degrees).
  • Calibration issues – If the car isn’t calibrated to go straight when given a 0-degree action, and the left/right max values are either too high (tendency to oversteer) or too low (tendency to understeer), you are likely to get control issues and unstable behavior.

My approach:

  • Use the Ackermann steering geometry patch. With it, the car will behave more realistically, and the turning radius will decrease for a given angle. As a result, the action space can be limited to angles up to about 20 degrees. This roughly matches with the real car’s steering angle.
  • Include stabilizing steering angles (2.5 and 5.0) in the action space, allowing for minor corrections on the straights.
  • Use relatively slow speeds (0.8–1.3 m/s) to avoid slipping in the simulator. My theory is that the mismatch between the 15 fps simulator and the 30 fps car effectively turns 1.2 m/s in the simulator into roughly 2.4 m/s in the real world.
  • Use an inverted chevron action space that assigns higher speeds to the straight actions, nudging the car toward going straight rather than oscillating between left and right actions (see the sketch after this list).
  • Try out v3, v4, and v5 physical models—test on a real track to see what works best.
  • Otherwise, the reward function was the same progress-based reward function I also use in virtual racing.
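To make the inverted chevron idea concrete, the following is an illustrative action space in the model_metadata.json style (written as a Python list): the straight and near-straight actions get the highest speeds, and speed tapers off as the steering angle grows. The specific angles and speeds are examples consistent with the ranges above, not necessarily the exact configuration I raced with.

# Illustrative "inverted chevron" action space: fastest when going straight,
# progressively slower as the steering angle increases. Example values only.
action_space = [
    {"steering_angle": 0.0,   "speed": 1.3},
    {"steering_angle": 2.5,   "speed": 1.2},
    {"steering_angle": -2.5,  "speed": 1.2},
    {"steering_angle": 5.0,   "speed": 1.2},
    {"steering_angle": -5.0,  "speed": 1.2},
    {"steering_angle": 10.0,  "speed": 1.0},
    {"steering_angle": -10.0, "speed": 1.0},
    {"steering_angle": 20.0,  "speed": 0.8},
    {"steering_angle": -20.0, "speed": 0.8},
]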

The following figure illustrates the view from testing in the garage, with the car going straight for at least one frame.

Be flexible

Virtual racing is (almost) deterministic, and over time, the model will converge and the car will take a narrow path, reducing the variety in the situations it sees. Early in training, it will frequently be in odd positions, almost going off-track, and it remembers how to get out of these situations. As it converges, the frequency at which it must handle these reduces, and the theory is that the memory fades, and at some point, it forgets how to get out of a tight spot.

My approach:

  • Diversify training to teach the car to handle a variety of corners, in both directions:
    • Consistently train models going both clockwise and counterclockwise.
    • Use tracks—primarily the 2022 Championship track—that are significantly more complex than the Forever Raceway.
    • Do final optimization on the Forever Raceway—again in both directions.
  • Take several snapshots during training; don’t go below 0.5 in entropy.
  • Test on tracks the car has never seen. The simulator has many suitable, narrow tracks—the hallmark of a generalized model is one that can handle tracks it has never seen during training.

Stay focused on the track

In my last post, I looked at the visual differences between the virtual and real worlds. The question is what to do about it. The goal is to trick the model into ignoring the noise and focus on what is important: the track.

My approach:

  • Train in an environment with significantly more visual noise. The tracks in the custom track repository have added noise through additional lights, buildings, and different walls (and some even come with shadows).
  • Alter the environment during training to avoid overfitting to the added noise. The custom tracks were made in such a way that different objects (buildings, walls, and lines) could be made invisible at runtime. I had a cron job randomizing the environment every 5 minutes.

The following figure illustrates the varied training environment.

What I didn’t consider this year was simulating blurring during training. I attempted this previously by averaging the current camera frame with the previous one before inferencing. It didn’t seem to help.

Lens distortion is a topic I have observed, but not fully investigated. The original camera has a distinct fish-eye distortion, and Gazebo would be able to replicate it, but it would require some work to actually determine the coefficients. Equally, I have never tried to replicate the rolling motions of the real car.

Testing

Testing took place in the garage on the Trapezoid Narrow track. The track is obviously basic, but with two straights and two 180-degree turns with different radii, it had to do the job. The garage track also had enough visual noise to see if the models were robust enough.

The method was straightforward: try all models both clockwise and counterclockwise. Using the logs captured by the custom car stack, I spent the evening looking through the video of each run to determine which model I liked the best—looking at stability, handling (straight on straights plus precision cornering), and speed.

re:Invent 2024

The track for re:Invent 2024 was the Forever Raceway. The shape of the track isn’t new; it shares the centerline with the 2022 Summit Speedway, but being only 76 cm wide (the original was 1.07 m), the turns become more pronounced, making it a significantly more difficult track.

The environment

The environment is classic re:Invent: a smooth track with very little shine combined with smooth, fairly tall walls surrounding the track. The background is what often causes trouble. This year, a large lit display hung under the ceiling at the far end of the track, and as the following figure shows, it drew quite a bit of the model’s attention in the Grad-CAM visualization.

Similarly, the pit crew cage, where cars are maintained, attracted attention.

The results

So where did I end up, and why? In Round 1, I finished in 14th place, with a best average of 10.072 seconds and a best lap time of 9.335 seconds. Not great, but also not bad: almost 1 second outside the top 8.

Using the overhead camera provided by AWS through the Twitch stream, it’s possible to create a graphical view showing the path the car took, as shown in the following figure.
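There is no official tooling for this; one rough way to recover the driven line from the overhead footage is background subtraction and centroid tracking with OpenCV. The sketch below assumes a saved clip named overhead.mp4 in which the car is the dominant moving object, and the thresholds would need tuning for the actual stream quality.

import cv2

# Rough sketch: track the car's centroid across an overhead recording.
cap = cv2.VideoCapture('overhead.mp4')
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)
path = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)     # foreground mask of moving pixels
    mask = cv2.medianBlur(mask, 5)     # suppress salt-and-pepper noise
    moments = cv2.moments(mask, True)  # treat the mask as a binary image
    if moments['m00'] > 0:
        path.append((moments['m10'] / moments['m00'], moments['m01'] / moments['m00']))

cap.release()
print(f"Captured {len(path)} centroid samples along the driven path")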

If we compare this with how the same model liked to drive in training, we see a bit of a difference.

What becomes obvious quite quickly is that although I succeeded in going straight on the (upper) straight, the car didn’t corner as tightly as during training, making the bottom half of the track a bit of a mess. Nevertheless, the car demonstrated the desired survival instinct and stayed on track even when faced with unexpectedly sharp corners.

Why did this happen:

  • 20 degrees of turning using Ackermann steering is too much; the real car isn’t capable of doing it in the real world
  • The turning radius is increasing as the speed goes up due to slipping, caused both by low friction and lack of grip due to rolling
  • The reaction time plays more of a role as the speed increases, and my model acted too late, overshooting into the corner

The combined turning radius and reaction time effect also caused issues at the start. If the car goes slowly, it turns much faster—and ends up going off-track on the inside—causing issues for me and others.

My takeaways:

  • Overall, the training approach seemed to work well. Well-calibrated cars went straight on the straights, and background noise didn’t seem to bother my models much.
  • I need to get closer to the car’s actual handling characteristics at speed during training by increasing the max speed and reducing the max angle in the action space.
  • Physical racing is still not well understood—and it’s a lot about model-meets-car. Some models thrive on objectively perfectly calibrated cars, whereas others work great when matched with a particular one.
  • Track is king: those who had access to a track, either through their employer or by having built one at home, had a massive advantage, even if almost everyone said they were surprised by which model worked in the end.

Now enjoy the inside view of a car at re:Invent, and see if you can detect any of the issues that I have discussed. The video was recorded with a car running the custom car software, after I had been knocked out of the competition.

Closing time: Where do we go from here?

This section is best enjoyed with Semisonic’s Closing Time as a soundtrack.

As we all wrapped up at the Expo after an intense week of racing, re:Invent literally being dismantled around us, the question was: what comes next?

This was the last DeepRacer Championship, but the general sentiment was that whereas nobody will really miss virtual racing (it is a problem solved), physical racing is still a whole lot of fun, and the community is not yet ready to move on. Since re:Invent, several initiatives have gained traction with the common goal of making DeepRacer more accessible:

  • By enrolling cars running the DeepRacer Custom Car software stack into DeepRacer Event Manager, you can capture car logs and generate the analytics videos shown in this article directly during your event!
  • Combine off-the-shelf components with a custom circuit board to build the 1:28 scale DeepRacer Pi Mini. Both options are compatible with already trained models, including integration with DeepRacer Event Manager.

DeepRacer Custom Console will be a drop-in replacement for the current car UI, with a new interface built in Cloudscape that aligns its design with DREM and the AWS Console.

Prototype DeepRacer Pi Mini – 1:28 scale

Closing words

DeepRacer is a fantastic way to teach AI in a very physical and visual way, and is suitable for older kids, students, and adults in the corporate setting alike. It will be interesting to see how AWS, its corporate partners, and the community will continue the journey in the years ahead.

A big thank you goes to all of those who have been involved in DeepRacer from its inception to today—too many to be named—it has been a wonderful experience. A big congratulations goes out to this year’s winners!

Closing time, every new beginning comes from some other beginning’s end…


About the Author

Lars Lorentz Ludvigsen is a technology enthusiast who was introduced to AWS DeepRacer in late 2019 and was instantly hooked. Lars works as a Managing Director at Accenture, where he helps clients build the next generation of smart connected products. In addition to his role at Accenture, he’s an AWS Community Builder who focuses on developing and maintaining the AWS DeepRacer community’s software solutions.

Read More