Voice input in Google Lens lets you make your voice heard as you search. Here’s how to use this new feature.
Ask questions in new ways with AI in Search
Introducing new ways to ask questions on Google and Lens to help you find what you need more effortlessly using AI.
No Tricks, Just Games: GeForce NOW Thrills With 22 Games in October
The air is crisp, the pumpkins are waiting to be carved, and GFN Thursday is ready to deliver some gaming thrills.
GeForce NOW is unleashing a monster mash of gaming goodness this October with 22 titles joining the cloud, with five available for members to stream this week. From pulse-pounding action to immersive role-playing games, members’ cloud gaming cauldrons are about to bubble over with excitement. Plus, a new account portal update lets members take a look at their playtime details and history on GeForce NOW.
October Treats in Store
GeForce NOW is offering plenty of treats for members this month, starting with the launch of THRONE AND LIBERTY this week.
THRONE AND LIBERTY is a free-to-play massively multiplayer online role-playing game that takes place in the vast open world of Solisium. Scale expansive mountain ranges for new vantage points, scan open skies, traverse sprawling plains and explore a land full of depth and opportunity.
Adapt to survive and thrive through strategic decisions in player vs. player or player vs. environment combat modes while navigating evolving battlefields impacted by weather, time of day and other players. There’s no single path to victory in the quest to defeat Kazar and claim the throne while keeping rival guilds at bay.
Look for the following games available to stream in the cloud this week:
- THRONE AND LIBERTY (New release on Steam, Oct. 1)
- Sifu (Available on PC Game Pass, Oct. 2)
- Bear and Breakfast (Free on Epic Games Store, Oct. 3)
- Monster Jam Showdown (Steam)
- TerraTech Worlds (Steam)
Here’s what members can expect for the rest of October:
- Europa (New release on Steam, Oct. 11)
- Neva (New release on Steam, Oct. 15)
- MechWarrior 5: Clans (New release on Steam and Xbox, Oct. 16)
- A Quiet Place: The Road Ahead (New release on Steam, Oct. 17)
- Worshippers of Cthulhu (New release on Steam, Oct. 21)
- No More Room in Hell 2 (New release on Steam, Oct. 22)
- Romancing SaGa 2: Revenge of the Seven (New release on Steam, Oct. 24)
- Call of Duty: Black Ops 6 (New release on Steam and Battle.net, Oct. 25)
- Life Is Strange: Double Exposure (New release on Steam and Xbox, available in the Microsoft store, Oct. 29)
- Artisan TD (Steam)
- ASKA (Steam)
- DUCKSIDE (Steam)
- Dwarven Realms (Steam)
- Selaco (Steam)
- Spirit City: Lofi Sessions (Steam)
- Starcom: Unknown Space (Steam)
- Star Trek Timelines (Steam)
Surprises in September
In addition to the 18 games announced last month, 12 more joined the GeForce NOW library:
- Warhammer 40,000: Space Marine 2 (New release on Steam, Sept. 9)
- Dead Rising Deluxe Remaster (New release on Steam, Sept. 18)
- Witchfire (New release on Steam, Sept. 23)
- Monopoly (New release on Ubisoft Connect, Sept. 26)
- Dawn of Defiance (Steam)
- Flintlock: The Siege of Dawn (Xbox, available on PC Game Pass)
- Fort Solis (Epic Games Store)
- King Arthur: Legion IX (Steam)
- The Legend of Heroes: Trails Through Daybreak (Steam)
- Squirrel With a Gun (Steam)
- Tyranny – Gold Edition (Xbox, available on Microsoft Store)
- XIII (Xbox, available on Microsoft Store)
Blacksmith Simulator didn’t make it in September as the game’s launch was moved to next year.
What are you planning to play this weekend? Let us know on X or in the comments below.
What’s your favorite horror game?
— NVIDIA GeForce NOW (@NVIDIAGFN) October 2, 2024
How AI and Accelerated Computing Drive Energy Efficiency
AI isn’t just about building smarter machines. It’s about building a greener world.
From optimizing energy use to reducing emissions, artificial intelligence and accelerated computing are helping industries tackle some of the world’s toughest environmental challenges.
As Joshua Parker, NVIDIA’s Senior Director of Corporate Sustainability, explains on the latest edition of NVIDIA’s AI Podcast, these technologies are powering a new era of energy efficiency.
Can AI Help Reduce Energy Consumption?
Yes. And it’s doing it in ways that might surprise you.
AI systems themselves use energy—sure—but the big story is how AI and accelerated computing are helping other systems save energy.
Take data centers, for instance.
They’re the backbone of AI, housing the powerful systems that crunch the data needed for AI to work.
Globally, data centers account for about 2% of total energy consumption, and AI-specific centers represent only a tiny fraction of that, Parker explains.
Even so, AI’s real superpower lies in its ability to optimize.
How? By using accelerated computing platforms that combine GPUs and CPUs.
GPUs (Graphics Processing Units) are designed to handle complex computations quickly and efficiently.
In fact, these systems can be up to 20 times more energy-efficient than traditional CPU-only systems, Parker notes.
That’s not just good for tech companies—it’s good for the environment, too.
What is Accelerated Computing?
At its core, accelerated computing is about doing more with less.
It involves using specialized hardware—like GPUs—to perform tasks faster and with less energy.
This isn’t just theoretical. Over the last eight years, AI systems running on accelerated computing platforms have become 45,000 times more energy-efficient, Parker said.
That’s a staggering leap in performance, driven by improvements in both hardware and software.
So why does this matter? It matters because, as AI becomes more widespread, the demand for computing power grows.
Accelerated computing helps companies scale their AI operations without consuming massive amounts of energy. This energy efficiency is key to AI’s ability to tackle some of today’s biggest sustainability challenges.
AI in Action: Tackling Climate Change
AI isn’t just saving energy—it’s helping to fight climate change.
For instance, AI-enhanced weather forecasting is becoming more accurate, allowing industries and governments to prepare for climate-related events like hurricanes or floods, Parker explains.
The better we can predict these events, the better we can prepare for them, which means fewer resources wasted and less damage done.
Another key area is the rise of digital twins—virtual models of physical environments.
These AI-powered simulations allow companies to optimize energy consumption in real-time, without having to make costly changes in the physical world.
In one case, using a digital twin helped a company achieve a 10% reduction in energy use, Parker said. That may sound small, but scale it across industries and the impact is huge.
AI is also playing a crucial role in developing new materials for renewable energy technologies like solar panels and electric vehicles, accelerating the transition to clean energy.
Can AI Make Data Centers More Sustainable?
Here’s the thing: AI needs data centers to operate, and as AI grows, so does the demand for computing power. But data centers don’t have to be energy hogs.
In fact, they can be part of the sustainability solution.
One major innovation is direct-to-chip liquid cooling. This technology allows data centers to cool their systems much more efficiently than traditional air conditioning methods, which are often energy-intensive.
By cooling directly at the chip level, this method saves energy, helping data centers stay cool without guzzling power, Parker explains.
As AI scales up, the future of data centers will depend on designing for energy efficiency from the ground up. That means integrating renewable energy, using energy storage solutions, and continuing to innovate with cooling technologies.
The goal is to create green data centers that can meet the world’s growing demand for compute power without increasing their carbon footprint, Parker says.
The Role of AI in Building a Sustainable Future
AI is not just a tool for optimizing systems—it’s a driver of sustainable innovation. From improving the efficiency of energy grids to enhancing supply chain logistics, AI is leading the charge in reducing waste and emissions.
Let’s look at energy grids. AI can monitor and adjust energy distribution in real-time, ensuring that resources are allocated where they’re needed most, reducing waste.
This is particularly important as the world moves toward renewable energy, which can be less predictable than traditional sources like coal or natural gas, Parker said.
AI is also helping industries reduce their carbon footprints. By optimizing routes and predicting demand more accurately, AI can cut down on fuel use and emissions in logistics and transportation sectors.
Looking to the future, AI’s role in promoting sustainability is only going to grow.
As technologies become more energy-efficient and AI applications expand, we can expect AI to play a crucial role in helping industries meet their sustainability goals, Parker said.
It’s not just about making AI greener—it’s about using AI to make the world greener.
AI and accelerated computing are reshaping how we think about energy and sustainability.
With their ability to optimize processes, reduce energy waste, and drive innovations in clean technology, these technologies are essential tools for creating a sustainable future.
As Parker explains on NVIDIA’s AI Podcast, AI’s potential to save energy and combat climate change is vast—and we’re only just beginning to tap into it.
As AI continues to revolutionize industries and drive sustainability, there’s no better time to dive deeper into its transformative potential. If you’re eager to explore how AI and accelerated computing are shaping the future of energy efficiency and climate solutions, join us at the NVIDIA AI Summit.
Event Date: October 9, 2024
Register here and gain exclusive insights into the innovations that are powering a sustainable world.
Don’t miss your chance to learn from the leading minds in AI and sustainability. Let’s create a greener future together.
Introducing the first Google Academic Research Award winners
Google launches GARA program to fund and support groundbreaking research in computing and technology, addressing global challenges.
Experience and learn about culture with the help of AI
These three new AI features from Google Arts & Culture let you explore art through a personalized lens.
How Schneider Electric uses Amazon Bedrock to identify high-potential business opportunities
This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Adrian Boeh, Senior Data Scientist – NAM AI, from Schneider Electric.
Schneider Electric is a global leader in the digital transformation of energy management and automation. The company specializes in providing integrated solutions that make energy safe, reliable, efficient, and sustainable. Schneider Electric serves a wide range of industries, including smart manufacturing, resilient infrastructure, future-proof data centers, intelligent buildings, and intuitive homes. They offer products and services that encompass electrical distribution, industrial automation, and energy management. Their innovative technologies, extensive range of products, and commitment to sustainability position Schneider Electric as a key player in advancing smart and green solutions for the modern world.
As demand for renewable energy continues to rise, Schneider Electric faces high demand for sustainable microgrid infrastructure. This demand comes in the form of requests for proposals (RFPs), each of which needs to be manually reviewed by a microgrid subject matter expert (SME) at Schneider. Manual review of each RFP was proving too costly and couldn’t be scaled to meet the industry needs. To solve the problem, Schneider turned to Amazon Bedrock and generative artificial intelligence (AI). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
In this post, we show how the team at Schneider collaborated with the AWS Generative AI Innovation Center (GenAIIC) to build a generative AI solution on Amazon Bedrock to solve this problem. The solution processes and evaluates each RFP and then routes high-value RFPs to the microgrid SME for approval and recommendation.
Problem Statement
Microgrid infrastructure is a critical element of the growing renewable energy market. A microgrid includes on-site power generation and storage that allow a system to disconnect from the main grid. Schneider Electric offers several important products that allow customers to build microgrid solutions to make their residential buildings, schools, or manufacturing centers more sustainable. Growing public and private investment in this sector has led to an exponential increase in the number of RFPs for microgrid systems.
The RFP documents contain technically complex textual and visual information such as scope of work, parts lists, and electrical diagrams. Moreover, they can be hundreds of pages long. The following figure provides several examples of RFP documents. The RFP size and complexity makes reviewing them costly and labor intensive. An experienced SME is usually required to review an entire RFP and provide an assessment for its applicability to the business and potential for conversion.
To add additional complexity, the same set of RFP documents might be assessed by multiple business units within Schneider. Each unit might be looking for different requirements that make the opportunity relevant to that sales team.
Given the size and complexity of the RFP documents, the Schneider team needed a way to quickly and accurately identify opportunities where Schneider products offer a competitive advantage and a high potential for conversion. Failure to respond to viable opportunities could result in potential revenue loss, while devoting resources to proposals where the company lacks a distinct competitive edge would lead to an inefficient use of time and effort.
They also needed a solution that could be repurposed for other business units, allowing the impact to extend to the entire enterprise. Successfully handling the influx of RFPs would not only allow the Schneider team to expand their microgrid business, but help businesses and industries adopt a new renewable energy paradigm.
Amazon Bedrock and Generative AI
To help solve this problem, the Schneider team turned to generative AI and Amazon Bedrock. Large language models (LLMs) are now enabling more efficient business processes through their ability to identify and summarize specific categories of information with human-like precision. The volume and complexity of the RFP documents made them an ideal candidate to use generative AI for document processing.
You can use Amazon Bedrock to build and scale generative AI applications with a broad range of FMs. Amazon Bedrock is a fully managed service that includes FMs from Amazon and third-party models supporting a range of use cases. For more details about the FMs available, see Supported foundation models on Amazon Bedrock. Amazon Bedrock enables developers to create unique experiences with generative AI capabilities supporting a broad range of programming languages and frameworks.
The solution uses Anthropic Claude on Amazon Bedrock, specifically the Anthropic Claude Sonnet model. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence.
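As a point of reference, a minimal call to a Claude Sonnet model through the Amazon Bedrock Runtime API might look like the following sketch. The region, prompt, and token limit are illustrative assumptions; the actual Schneider prompts are proprietary.

```python
import json
import boto3

# Bedrock Runtime client; region is an assumption for this sketch
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Summarize the scope of work in this RFP excerpt: ..."}
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # Claude 3 Sonnet on Bedrock
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```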
Solution Overview
Traditional Retrieval Augmented Generation (RAG) systems can’t identify the relevancy of RFP documents to a given sales team because of the extensive list of one-off business requirements and the large taxonomy of electrical components or services, which might or might not be present in the documents.
Other existing approaches require either expensive domain-specific fine-tuning to the LLM or the use of filtering for noise and data elements, which leads to suboptimal performance and scalability impacts.
Instead, the AWS GenAIC team worked with Schneider Electric to package business objectives onto the LLM through multiple prisms of semantic transformations: concepts, functions, and components. For example, in the domain of smart grids, the underlying business objectives might be defined as resiliency, isolation, and sustainability. Accordingly, the corresponding functions would involve energy generation, consumption, and storage. The following figure illustrates these components.
The approach of concept-driven information extraction resembles ontology-based prompting. It allows engineering teams to customize the initial list of concepts and scale onto different domains of interest. The decomposition of complex concepts into specific functions incentivizes the LLM to detect, interpret, and extract the associated data elements.
The LLM was prompted to read RFPs and retrieve quotes pertinent to the defined concepts and functions. These quotes materialize the presence of electrical equipment satisfying the high-level objectives and were used as evidence of the downstream relevancy of an RFP to the originating sales team.
For example, the term BESS stands for battery energy storage system and materializes evidence for power storage, while the term EPC indicates the presence of a solar plant.
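A rough, hypothetical sketch of what such extracted quote evidence might look like follows; the field names and quotes are invented for illustration and are not taken from Schneider’s actual pipeline.

```python
# Illustrative structure for quotes retrieved by the LLM (hypothetical field names and text)
retrieved_quotes = [
    {
        "concept": "resiliency",
        "function": "energy storage",
        "quote": "The site shall include a 2 MWh BESS to provide backup power during outages.",
    },
    {
        "concept": "sustainability",
        "function": "energy generation",
        "quote": "The EPC contractor will deliver a 5 MW ground-mounted solar plant.",
    },
]
```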
The overall solution encompasses three phases:
- Document chunking and preprocessing
- LLM-based quote retrieval
- LLM-based quote summarization and evaluation
The first step uses standard document chunking as well as Schneider’s proprietary document processing pipelines to group similar text elements into a single chunk. Each chunk is processed by the quote retrieval LLM, which identifies relevant quotes within each chunk if they’re available. This brings relevant information to the forefront and filters out irrelevant content. Finally, the relevant quotes are compiled and fed to a final LLM that summarizes the RFP and determines its overall relevance to the microgrid family of RFPs. The following diagram illustrates this pipeline.
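In code form, a simplified sketch of this three-phase flow might look like the following. It assumes a hypothetical ask_claude helper wrapping the Bedrock call shown earlier, a simple "no relevant quotes" sentinel, and it omits Schneider’s proprietary chunking logic.

```python
def ask_claude(prompt: str) -> str:
    """Wrapper around the Bedrock invoke_model call shown earlier (assumed helper)."""
    ...

def process_rfp(chunks: list[str]) -> str:
    # Phase 2: retrieve relevant quotes from each chunk, skipping chunks with none
    quotes = []
    for chunk in chunks:
        answer = ask_claude(f"Extract quotes about microgrid concepts from:\n{chunk}")
        if answer.strip().lower() != "no relevant quotes":
            quotes.append(answer)

    # Phase 3: summarize the compiled quotes and classify the RFP
    return ask_claude(
        "Given these quotes, summarize the RFP and rate its relevance "
        "to the microgrid business:\n" + "\n".join(quotes)
    )
```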
The final determination about the RFP is made using the following prompt structure. The details of the actual prompt are proprietary, but the structure includes the following:
- We first provide the LLM with a brief description of the business unit in question.
- We then define a persona and tell the LLM where to locate evidence.
- We provide criteria for RFP categorization.
- We specify the output format, which includes:
  - A single label: yes, no, or maybe
  - A relevance score from 1 to 10
  - A short explanation of the decision
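A hedged sketch of a prompt following this structure is shown below; the wording and placeholders are illustrative, since the actual prompt is proprietary.

```python
# Illustrative prompt template following the structure above (not the actual Schneider prompt)
PROMPT_TEMPLATE = """
You are a sales engineer for the {business_unit} business unit, which sells {description}.
Evidence quotes extracted from the RFP are provided between <quotes> tags.

<quotes>
{quotes}
</quotes>

Classify the RFP as relevant or not to the business unit using these criteria: {criteria}

Respond with:
- decision: yes, no, or maybe
- relevance_score: an integer from 1 to 10
- explanation: a short justification citing the quotes
"""
```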
The result compresses a relatively large corpus of RFP documents into a focused, concise, and informative representation by precisely capturing and returning the most important aspects. The structure allows the SME to quickly filter for specific LLM labels, and the summary quotes allow them to better understand which quotes are driving the LLM’s decision-making process. In this way, the Schneider SME team can spend less time reading through pages of RFP proposals and can instead focus their attention on the content that matters most to their business. For each RFP, the output pairs a classification result with qualitative feedback.
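As a rough, hypothetical illustration (the fields and values below are invented for this post, not taken from an actual Schneider RFP), such an output might look like:

```python
# Hypothetical classification result for a single RFP (illustrative only)
rfp_assessment = {
    "decision": "yes",
    "relevance_score": 8,
    "explanation": (
        "The RFP requests a 2 MWh BESS and islanding controls, which map to the "
        "power storage and resiliency concepts tracked by the microgrid team."
    ),
    "supporting_quotes": [
        "The site shall include a 2 MWh BESS to provide backup power during outages.",
    ],
}
```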
Internal teams are already experiencing the advantages of our new AI-driven RFP Assistant:
“At Schneider Electric, we are committed to solving real-world problems by creating a sustainable, digitized, and new electric future. We leverage AI and LLMs to further enhance and accelerate our own digital transformation, unlocking efficiency and sustainability in the energy sector.”
– Anthony Medeiros, Manager of Solutions Engineering and Architecture, Schneider Electric.
Conclusion
In this post, the AWS GenAIIC team, working with Schneider Electric, demonstrated the remarkable general capability of LLMs available on Amazon Bedrock to assist sales teams and optimize their workloads.
The RFP assistant solution allowed Schneider Electric to achieve 94% accuracy in the task of identifying microgrid opportunities. By making small adjustments to the prompts, the solution can be scaled and adopted to other lines of business.
By precisely guiding the prompts, the team can derive distinct and objective perspectives from identical sets of documents. The proposed solution enables RFPs to be viewed through the interchangeable lenses of various business units, each pursuing a diverse range of objectives. These previously obscured insights have the potential to unveil novel business prospects and generate supplementary revenue streams.
These capabilities will allow Schneider Electric to seamlessly integrate AI-powered insights and recommendations into its day-to-day operations. This integration will facilitate well-informed and data-driven decision-making processes, streamline operational workflows for heightened efficiency, and elevate the quality of customer interactions, ultimately delivering superior experiences.
About the Authors
Anthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.
Adrian Boeh is a Senior Data Scientist working on advanced data tasks for Schneider Electric’s North American Customer Transformation Organization. Adrian has 13 years of experience at Schneider Electric and is AWS Machine Learning Certified with a proven ability to innovate and improve organizations using data science methods and technology.
Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.
Dan Volk is a Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master’s in Data Science from UC Berkeley. He is passionate about transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.
Negin Sokhandan is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is statistical inference, computer vision, and multimodal systems.
Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock
Large enterprises are building strategies to harness the power of generative AI across their organizations. However, scaling up generative AI and making adoption easier for different lines of businesses (LOBs) comes with challenges around making sure data privacy and security, legal, compliance, and operational complexities are governed on an organizational level. In this post, we discuss how to address these challenges holistically.
Managing bias, intellectual property, prompt safety, and data integrity are critical considerations when deploying generative AI solutions at scale. Because this is an emerging area, best practices, practical guidance, and design patterns are difficult to find in an easily consumable form. In this post, we share AWS guidance that we have learned and developed as part of real-world projects, distilled into practical guides oriented towards the AWS Well-Architected Framework, which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.
Amazon Bedrock plays a pivotal role in this endeavor. It’s a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. You can securely integrate and deploy generative AI capabilities into your applications using services such as AWS Lambda, enabling seamless data management, monitoring, and compliance (for more details, see Monitoring and observability). This integration makes sure enterprises can take advantage of the full power of generative AI while adhering to best practices in operational excellence.
With Amazon Bedrock, enterprises can achieve the following:
- Scalability – Scale generative AI applications across different LOBs without compromising performance
- Security and compliance – Enforce data privacy, security, and compliance with industry standards and regulations
- Operational efficiency – Streamline operations with built-in tools for monitoring, logging, and automation, aligned with the AWS Well-Architected Framework
- Innovation – Access cutting-edge AI models and continually improve them with real-time data and feedback
This approach enables enterprises to deploy generative AI at scale while maintaining operational excellence, ultimately driving innovation and efficiency across their organizations.
What’s different about operating generative AI workloads and solutions?
The operational excellence pillar of the Well-Architected Framework is mainly focused on supporting the development and running of workloads effectively, gaining insight into their operations, and continuously improving supporting processes and procedures to deliver business value. However, if we were to apply a generative AI lens, we would need to address the intricate challenges and opportunities arising from its innovative nature, encompassing the following aspects:
- Complexity can be unpredictable due to the ability of large language models (LLMs) to generate new content
- Potential intellectual property infringement is a concern due to the lack of transparency in the model training data
- Low accuracy in generative AI can create incorrect or controversial content
- Resource utilization requires a specific operating model to meet the substantial computational resources required for training and prompt and token sizes
- Continuous learning necessitates additional data annotation and curation strategies
- Compliance is also a rapidly evolving area, where data governance becomes more nuanced and complex, and poses challenges
- Integration with legacy systems requires careful consideration of compatibility, data flow between systems, and potential performance impacts
Any generative AI lens therefore needs to combine the following elements, each with varying levels of prescription and enforcement, to address these challenges and provide the basis for responsible AI usage:
- Policy – The system of principles to guide decisions
- Guardrails – The rules that create boundaries to keep you within the policy
- Mechanisms – The process and tools
AWS has advanced responsible AI by introducing Amazon Bedrock Guardrails as a protection to prevent harmful responses from the LLMs, providing an additional layer of safeguards regardless of the underlying FM. However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.
As cloud adoption has matured for more traditional IT workloads and applications, the need to help developers select the right cloud solution that minimizes corporate risk and simplifies the developer experience has emerged. This is often referred to as platform engineering and can be neatly summarized by the mantra “You (the developer) build and test, and we (the platform engineering team) do all the rest!”
This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value. This is illustrated in the following diagram.
Where to start?
We start this post by reviewing the foundational operational elements a generative AI platform team needs to initially focus on as they transition generative solutions from a proof of concept or prototype phase to a production-ready solution.
Specifically, we cover how you can safely develop, deploy, and monitor models, mitigating operational and compliance risks, thereby reducing the friction in adopting AI at scale and for production use. We focus on the following four design principles:
- Establish control through promoting transparency of model details, setting up guardrails or safeguards, and providing visibility into costs, metrics, logs, and traces
- Automate model fine-tuning, training, validation, and deployment using large language model operations (LLMOps) or foundation model operations (FMOps)
- Manage data through standard methods for ingestion, governance, and indexing
- Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines
In the following sections, we explain this using an architecture diagram while diving into the best practices of the control pillar.
Provide control through transparency of models, guardrails, and costs using metrics, logs, and traces
The control pillar of the generative AI framework focuses on observability, cost management, and governance, making sure enterprises can deploy and operate their generative AI solutions securely and efficiently. The following diagram illustrates the key components of this pillar:
Observability
Setting up observability measures lays the foundations for the other two components, namely FinOps and Governance. Observability is crucial for monitoring the performance, reliability, and cost-efficiency of generative AI solutions. By using AWS services such as Amazon CloudWatch, AWS CloudTrail, and Amazon OpenSearch Service, enterprises can gain visibility into model metrics, usage patterns, and potential issues, enabling proactive management and optimization.
Amazon Bedrock is compatible with robust observability features to monitor and manage ML models and applications. Key metrics integrated with CloudWatch include invocation counts, latency, client and server errors, throttles, input and output token counts, and more (for more details, see Monitor Amazon Bedrock with Amazon CloudWatch). You can also use Amazon EventBridge to monitor events related to Amazon Bedrock. This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). CloudTrail can log all API calls made to Amazon Bedrock by a user, role, or AWS service in an AWS environment. This is particularly useful for tracking access to sensitive resources such as personally identifiable information (PII), model updates, and other critical activities, enabling enterprises to maintain a robust audit trail and compliance. To learn more, see Log Amazon Bedrock API calls using AWS CloudTrail.
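For example, a sketch of pulling one of these Bedrock metrics from CloudWatch with boto3 is shown below; the metric and dimension names follow the AWS/Bedrock namespace, and the model ID and time window are illustrative.

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Daily invocation counts for a specific model over the last week
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
    Period=86400,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```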
Amazon Bedrock supports the metrics and telemetry needed for implementing an observability maturity model for LLMs, which includes the following:
- Capturing and analyzing LLM-specific metrics such as model performance, prompt properties, and cost metrics through CloudWatch
- Implementing alerts and incident management tailored to LLM-related issues
- Providing security compliance and robust monitoring mechanisms, because Amazon Bedrock is in scope for common compliance standards and offers automated abuse detection mechanisms
- Using CloudWatch and CloudTrail for anomaly detection, usage and costs forecasting, optimizing performance, and resource utilization
- Using AWS forecasting services for better resource planning and cost management
CloudWatch provides a unified monitoring and observability service that collects logs, metrics, and events from various AWS services and on-premises sources. This allows enterprises to track key performance indicators (KPIs) for their generative AI models, such as I/O volumes, latency, and error rates. You can use CloudWatch dashboards to create custom visualizations and alerts, so teams are quickly notified of any anomalies or performance degradation.
For more advanced observability requirements, enterprises can use OpenSearch Service, a fully managed service for deploying, operating, and scaling OpenSearch clusters. OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics.
Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Before you can enable invocation logging, you need to set up an Amazon Simple Storage Service (Amazon S3) or CloudWatch Logs destination. You can enable invocation logging through either the AWS Management Console or the API. By default, logging is disabled.
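A minimal sketch of enabling invocation logging with boto3 follows, assuming a CloudWatch Logs group and a delivery role already exist; the names and account ID are placeholders.

```python
import boto3

bedrock = boto3.client("bedrock")  # control-plane client (not bedrock-runtime)

bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/invocation-logs",                       # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",   # placeholder
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)
```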
Cost management and optimization (FinOps)
Generative AI solutions can quickly scale and consume significant cloud resources, and a robust FinOps practice is essential. With services like AWS Cost Explorer and AWS Budgets, enterprises can track their usage and optimize their generative AI spending, achieving cost-effective deployment and scaling.
Cost Explorer provides detailed cost analysis and forecasting capabilities, enabling you to understand your tenant-related expenditures, identify cost drivers, and plan for future growth. Teams can create custom cost allocation reports, set custom budgets using AWS Budgets and alerts, and explore cost trends over time.
Analyzing the cost and performance of generative AI models is crucial for making informed decisions about model deployment and optimization. EventBridge, CloudTrail, and CloudWatch provide the necessary tools to track and analyze these metrics, helping enterprises make data-driven decisions. With this information, you can identify optimization opportunities, such as scaling down under-utilized resources.
With EventBridge, you can configure Amazon Bedrock to respond automatically to status change events in Amazon Bedrock. This enables you to handle API rate limit issues, API updates, and reduction in additional compute resources. For more details, see Monitor Amazon Bedrock events in Amazon EventBridge.
As discussed in the previous section, CloudWatch can monitor Amazon Bedrock to collect raw data and process it into readable, near real-time cost metrics. You can graph the metrics using the CloudWatch console. You can also set alarms that watch for certain thresholds and send notifications or take actions when values exceed those thresholds. For more information, see Monitor Amazon Bedrock with Amazon CloudWatch.
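As one hedged example, an hourly alarm on output token volume for Bedrock could be created as follows; the alarm name, threshold, and SNS topic are placeholders you would tune to your own cost model.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-output-tokens-hourly",        # placeholder name
    Namespace="AWS/Bedrock",
    MetricName="OutputTokenCount",
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=1_000_000,                              # placeholder threshold (tokens per hour)
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[
        "arn:aws:sns:us-east-1:123456789012:bedrock-cost-alerts"  # placeholder SNS topic
    ],
)
```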
Governance
Implementation of robust governance measures, including continuous evaluation and multi-layered guardrails, is fundamental for the responsible and effective deployment of generative AI solutions in enterprise environments. Let’s look at them one by one:
- Performance monitoring and evaluation – Continuously evaluating the performance, safety, and compliance of generative AI models is critical. You can achieve this in several ways:
- Enterprises can use AWS services like Amazon SageMaker Model Monitor, Amazon Bedrock Guardrails, or Amazon Comprehend to monitor model behavior, detect drift, and make sure generative AI solutions are performing as expected (or better) and adhering to organizational policies.
- You can deploy open-source evaluation metrics like RAGAS as custom metrics to make sure LLM responses are grounded, mitigate bias, and prevent hallucinations.
- Model evaluation jobs allow you to compare model outputs and choose the best-suited model for your use case. The job could be automated based on a ground truth, or you could use humans to bring in expertise on the matter. You can also use FMs from Amazon Bedrock to evaluate your applications. To learn more about this approach, refer to Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.
- Guardrails – Generative AI solutions should include robust, multi-level guardrails to enforce responsible AI and oversight:
- First, you need guardrails around the LLM itself to mitigate risks around bias and safeguard the application with responsible AI policies. You can use Amazon Bedrock Guardrails to set up custom guardrails around a model (FM or fine-tuned), configuring denied topics, content filters, and blocked messaging (a minimal sketch appears after this list).
- The second level is to set guardrails around the framework for each use case. This includes implementing access controls, data governance policies, and proactive monitoring and alerting to make sure sensitive information is properly secured and monitored. For example, you can use AWS data analytics services such as Amazon Redshift for data warehousing, AWS Glue for data integration, and Amazon QuickSight for business intelligence (BI).
- Compliance measures – Enterprises need to set up a robust compliance framework to meet regulatory requirements and industry standards such as GDPR, CCPA, or industry-specific standards. This helps make sure generative AI solutions remain secure, compliant, and efficient in handling sensitive information across different use cases. This approach minimizes the risk of data breaches or unauthorized data access, thereby protecting the integrity and confidentiality of critical data assets. Enterprises can take the following organization-level actions to create a comprehensive governance structure:
- Establish a clear incident response plan for addressing compliance breaches or AI system malfunctions.
- Conduct periodic compliance assessments and third-party audits to identify and address potential risks or violations.
- Provide ongoing training to employees on compliance requirements and best practices in AI governance.
- Model transparency – Although achieving full transparency in generative AI models remains challenging, organizations can take several steps to enhance model transparency and explainability:
- Provide model cards on the model’s intended use, performance, capabilities, and potential biases.
- Ask the model to self-explain, meaning provide explanations for its own decisions. This can also be built into a more complex system—for example, agents could perform multi-step planning and improve through self-explanation.
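Picking up the guardrails item from the list above, a minimal sketch of creating and applying a guardrail with boto3 might look like the following. The topic, messages, model ID, and example question are placeholders, and the configuration shown is far from a complete policy.

```python
import json
import boto3

bedrock = boto3.client("bedrock")
runtime = boto3.client("bedrock-runtime")

# Create a guardrail that denies a sample topic (placeholder content)
guardrail = bedrock.create_guardrail(
    name="demo-guardrail",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "investment-advice",
                "definition": "Recommendations about specific financial investments.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that topic.",
    blockedOutputsMessaging="Sorry, I can't help with that topic.",
)

# Apply the guardrail on an invocation
response = runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    guardrailIdentifier=guardrail["guardrailId"],
    guardrailVersion="DRAFT",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Which stock should I buy?"}],
    }),
)
```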
Automate model lifecycle management with LLMOps or FMOps
Implementing LLMOps is crucial for efficiently managing the lifecycle of generative AI models at scale. To grasp the concept of LLMOps, a subset of FMOps, and the key differentiators compared to MLOps, see FMOps/LLMOps: Operationalize generative AI and differences with MLOps. In that post, you can learn more about the developmental lifecycle of a generative AI application and the additional skills, processes, and technologies needed to operationalize generative AI applications.
Manage data through standard methods of data ingestion and use
Enriching LLMs with new data is imperative for LLMs to provide more contextual answers without the need for extensive fine-tuning or the overhead of building a specific corporate LLM. Managing data ingestion, extraction, transformation, cataloging, and governance is a complex, time-consuming process that needs to align with corporate data policies and governance frameworks.
AWS provides several services to support this; the following diagram illustrates these at a high level. For a more detailed description, see Scaling AI and Machine Learning Workloads with Ray on AWS and Build a RAG data ingestion pipeline for large scale ML workloads.
This workflow includes the following steps:
- Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer Family. You can use AWS Identity and Access Management (IAM) and AWS PrivateLink to control and secure access to data and generative AI resources, making sure data remains within the organization’s boundaries and complies with the relevant regulations.
- When the data is in Amazon S3, you can use AWS Glue to extract and transform data (for example, into Parquet format) and store metadata about the ingested data, facilitating data governance and cataloging.
- The third component is the GPU cluster, which could potentially be a Ray cluster. You can employ various orchestration engines, such as AWS Step Functions, Amazon SageMaker Pipelines, or AWS Batch, to run the jobs (or create pipelines) to create embeddings and ingest the data into a data store or vector store.
- Embeddings can be stored in a vector store such as OpenSearch, enabling efficient retrieval and querying. Alternatively, you can use a solution such as Amazon Bedrock Knowledge Bases to ingest data from Amazon S3 or other data sources, enabling seamless integration with generative AI solutions.
- You can use Amazon DataZone to manage access control to the raw data stored in Amazon S3 and the vector store, enforcing role-based or fine-grained access control for data governance.
- For cases where you need a semantic understanding of your data, you can use Amazon Kendra for intelligent enterprise search. Amazon Kendra has inbuilt ML capabilities and is easy to integrate with various data sources like S3, making it adaptable for different organizational needs.
The choice of which components to use will depend on the specific requirements of the solution, but a consistent solution should exist for all data management to be codified into blueprints (discussed in the following section).
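As a rough sketch of the embedding and ingestion steps above, the following assumes an Amazon Titan Text Embeddings model on Bedrock and an OpenSearch index that already exists with a k-NN vector mapping; the endpoint, index name, and sample chunks are placeholders, and authentication is omitted for brevity.

```python
import json

import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")
# Placeholder endpoint; add authentication (e.g., SigV4 or basic auth) for a real domain
opensearch = OpenSearch(hosts=["https://my-domain.us-east-1.es.amazonaws.com"])

def embed(text: str) -> list[float]:
    # Titan Text Embeddings model on Bedrock
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

for i, chunk in enumerate(["First document chunk...", "Second document chunk..."]):
    opensearch.index(
        index="rag-chunks",  # placeholder index with a knn_vector mapping
        id=str(i),
        body={"text": chunk, "embedding": embed(chunk)},
    )
```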
Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines
There are a number of ways to build and deploy a generative AI solution. AWS offers key services such as Amazon Bedrock, Amazon Kendra, OpenSearch Service, and more, which can be configured to support multiple generative AI use cases, such as text summarization, Retrieval Augmented Generation (RAG), and others.
The simplest way is to allow each team who needs to use generative AI to build their own custom solution on AWS, but this will inevitably increase costs and cause organization-wide irregularities. A more scalable option is to have a centralized team build standard generative AI solutions codified into blueprints or constructs and allow teams to deploy and use them. This team can provide a platform that abstracts away these constructs with a user-friendly and integrated API and provide additional services such as LLMOps, data management, FinOps, and more. The following diagram illustrates these options.
Establishing blueprints and constructs for generative AI runtimes, APIs, prompts, and orchestration such as LangChain, LiteLLM, and so on will simplify adoption of generative AI and increase overall safe usage. Offering standard APIs with access controls, consistent AI, and data and cost management makes usage straightforward, cost-efficient, and secure.
For more information about how to enforce isolation of resources in a multi-tenant architecture and key patterns in isolation strategies while building solutions on AWS, refer to the whitepaper SaaS Tenant Isolation Strategies.
Conclusion
By focusing on the operational excellence pillar of the Well-Architected Framework from a generative AI lens, enterprises can scale their generative AI initiatives with confidence, building solutions that are secure, cost-effective, and compliant. Introducing a standardized skeleton framework for generative AI runtimes, prompts, and orchestration will empower your organization to seamlessly integrate generative AI capabilities into your existing workflows.
As a next step, you can establish proactive monitoring and alerting, helping your enterprise swiftly detect and mitigate potential issues, such as the generation of biased or harmful output.
Don’t wait—take this proactive stance towards adopting the best practices. Conduct regular audits of your generative AI systems to maintain ethical AI practices. Invest in training your team on the generative AI operational excellence techniques. By taking these actions now, you’ll be well positioned to harness the transformative potential of generative AI while navigating the complexities of this technology wisely.
About the Authors
Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based services and products. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions. Her diverse interests span development, entrepreneurship, and research.
Malcolm Orr is a principal engineer at AWS and has a long history of building platforms and distributed systems using AWS services. He brings a structured, systems view to generative AI and is helping define how customers can adopt generative AI safely, securely, and cost-effectively across their organization.
Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions in the cloud. Prior to joining AWS, she was a consultant in industries such as transportation networking, retail, and financial services. She is passionate about enabling customers on their data/AI journey to the cloud.
Zorina Alliata is a Principal AI Strategist, working with global customers to find solutions that speed up operations and enhance processes using Artificial Intelligence and Machine Learning. Zorina helps companies across several industries identify strategies and tactical execution plans for their AI use cases, platforms, and AI at scale implementations.
Elevate workforce productivity through seamless personalization in Amazon Q Business
Personalization can improve the user experience of shopping, entertainment, and news sites by using our past behavior to recommend the products and content that best match our interests. You can also apply personalization to conversational interactions with an AI-powered assistant. For example, an AI assistant for employee onboarding could use what it knows about an employee’s work location, department, or job title to provide information that is more relevant to the employee. In this post, we explore how Amazon Q Business uses personalization to improve the relevance of responses and how you can align your use cases and end-user data to take full advantage of this capability.
Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and complete tasks based on the data and information that is spread across your enterprise systems. Amazon Q Business provides more than 40 built-in connectors that make it effortless to connect the most popular enterprise data sources and systems into a unified and powerful search index that the AI assistant can use to help answer natural language questions from your workforce. This allows end-users to find the information and answers they’re looking for quickly, which leads to increased productivity and job satisfaction. Amazon Q Business preserves the access permissions in the source systems so that users are only able to access the information through Amazon Q Business that they have access to directly within these systems.
Solution overview
Amazon Q Business personalizes responses by determining whether the user’s query could be enhanced by augmenting it with known attributes of the user, and then transparently using the personalized query to retrieve documents from its search index. User attributes, such as work location, department, and job title, are made available to Amazon Q Business by the identity system configured with the Amazon Q Business application to authenticate users. Depending on the documents available in the index, the personalized query should improve the relevancy of the returned documents, which in turn can improve the relevancy of the generated response based on those documents. The process by which user attributes flow to an Amazon Q Business application varies based on the identity federation mechanism used to authenticate your workforce for the application:
- Federation with AWS IAM Identity Center – Your workforce users, their attributes, and group membership are synchronized from your identity provider (IdP) to IAM Identity Center, where their access to Amazon Q Business applications and other AWS managed applications can be managed from a single location. This is the recommended approach. For more details on how to configure Amazon Q Business with IAM Identity Center, see Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center.
- Federation with IAM – Your workforce federates from your SAML 2.0 or OIDC compliant IdP with AWS Identity and Access Management (IAM) to access your Amazon Q Business application. For more details on how to configure Amazon Q Business with IAM federation, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.
The following diagram illustrates the process by which user attributes flow to Amazon Q Business for both identity federation mechanisms.
The steps of the process are as follows:
- When a user accesses the Amazon Q Business web experience or a custom client that integrates with the Amazon Q Business API, they must be authenticated. If not already authenticated, the user is redirected to the IdP configured for the Amazon Q Business application.
- After the user authenticates with the IdP, they’re redirected back to the client with an authorization code. Then the Amazon Q Business web experience or custom client makes an API call to the IdP with the client secret to exchange the authorization code for an ID token. When an IAM IdP is configured for the Amazon Q Business application, the ID token includes the user attributes that are configured in the IdP. Otherwise, with IAM Identity Center, the user attributes are synchronized from the IdP to IAM Identity Center. This process only has to be done one time during the user’s session or when the user’s session expires.
- The user is now able to interact with the AI assistant by submitting a question.
- Before the Amazon Q Business web experience or custom client can send the user’s question to the Amazon Q Business ChatSync API, it must exchange the ID token for AWS credentials. If the Amazon Q Business application is configured with IAM Identity Center, the Amazon Q Business application or custom client calls the CreateTokenWithIAM API to exchange the ID token for an IAM Identity Center token. This token includes the user attributes synchronized from the IdP to IAM Identity Center as described earlier. If the Amazon Q Business application is configured with an IAM IdP, this step is skipped.
- The last step to obtain AWS credentials is to call AWS Security Token Service (AWS STS). If the Amazon Q Business application is configured with IAM Identity Center, the AssumeRole API is called passing the IAM Identity Center token. For an Amazon Q Business application configured with an IAM IdP, the AssumeRoleWithSAML or AssumeRoleWithWebIdentity API is called depending on whether SAML 2.0 or OIDC is used for the provider. The credentials returned from AWS STS can be cached and reused until they expire.
- The Amazon Q Business web experience or custom client can now call the ChatSync API with the credentials obtained in the previous step using AWS Signature Version 4. Because the credentials include the user attributes configured in the IdP, they’re available to Amazon Q Business to personalize the user’s query.
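For the IAM Identity Center case, a condensed sketch of steps 4 through 6 with boto3 might look like the following. All ARNs, IDs, and the claim handling are assumptions for illustration (in particular, the identity-context claim name and the role and application identifiers are placeholders you would replace with your own values), and error handling and credential caching are omitted.

```python
import base64
import json

import boto3

idp_id_token = "<ID token returned by the IdP in step 2>"  # placeholder

# Step 4: exchange the IdP ID token for an IAM Identity Center token
oidc = boto3.client("sso-oidc")
idc_token = oidc.create_token_with_iam(
    clientId="arn:aws:sso::123456789012:application/ssoins-xxxx/apl-xxxx",  # placeholder
    grantType="urn:ietf:params:oauth:grant-type:jwt-bearer",
    assertion=idp_id_token,
)

# The user's identity context is carried as a claim inside the returned ID token (assumed claim name)
payload = idc_token["idToken"].split(".")[1]
claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
identity_context = claims["sts:identity_context"]

# Step 5: obtain AWS credentials that carry the user's identity context
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/QBusinessWebExperienceRole",  # placeholder
    RoleSessionName="qbusiness-user-session",
    ProvidedContexts=[{
        "ProviderArn": "arn:aws:iam::aws:contextProvider/IdentityCenter",
        "ContextAssertion": identity_context,
    }],
)["Credentials"]

# Step 6: call the ChatSync API with the user-scoped credentials
qbusiness = boto3.client(
    "qbusiness",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
answer = qbusiness.chat_sync(
    applicationId="11111111-2222-3333-4444-555555555555",  # placeholder application ID
    userMessage="What training is available?",
)
print(answer["systemMessage"])
```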
Amazon Q Business personalization use case
To demonstrate how personalization works in practice, let’s take an example of internal training made available to employees of a multi-national company. Imagine you lead the training department for an enterprise company and you’re tasked with improving the access to training opportunities offered to employees. You’ve done a great job documenting this information for all locations where training is provided and published it on your company’s Microsoft SharePoint site, but the feedback from employees is that they don’t know where to find the information. The confusion stems from the fact that your company also publishes internal company information and documentation on Confluence, Box, and a wiki. Additionally, your department uses ServiceNow for training support, which has developed into another source of valuable but under-utilized information.
The first challenge to solve is discoverability of the information spread across these disparate and disconnected systems. Through the connectors described earlier, Amazon Q Business can bring together the information in these systems and provide a conversational user interface that allows employees to ask questions in natural language, such as, “What training is available?”
With the discoverability challenge solved, there is still an opportunity to further optimize the user experience. This is where personalization comes in. Consider the basic question, “What training is available?” from a user who works out of the San Francisco, CA, office. Based on this question, Amazon Q Business can find documents that describe the training classes available across all corporate locations, but lacks the knowledge of the user’s home office location to be more precise in its answer. Providing an answer based on the location, or even a blend of multiple locations, isn’t as accurate as if the answer were based on where the employee worked. The employee could be more explicit in their question by including their location, but the goal of AI assistants is to better understand the user’s intent and context to be able to provide the most accurate information possible for even the most basic questions. Knowing key information about the user allows Amazon Q Business to seamlessly personalize the retrieval of documents and therefore lead to a more accurate response. Let’s see how it works in more detail.
At the core of Amazon Q Business is a technique called Retrieval Augmented Generation (RAG). At a high level, RAG involves taking a user’s request and finding passages from a set of documents in a searchable index that are most similar to the request and then asking a large language model (LLM) to generate a response that provides an answer using the retrieved passages. Given the question, “What training is available?” and the number of locations for the company, the top document passages returned from the index and provided to the LLM may not even include the user’s location. Therefore, the more precise the query to the retrieval layer, the more accurate and relevant the ultimate response will be. For example, modifying the query to include details on the user’s location should result in document passages specific to the user being returned at or near the top of the list rather than buried further down the list.
Configure user attributes in your IdP
Let’s look at how you would configure your IdP to pass along the attributes of your users to your Amazon Q Business application. Regardless of the identity federation mechanism configured for your Amazon Q Business application, attributes for your users need to be maintained in your IdP’s directory. The following is a partial screenshot of some of the location-related fields available in the profile editor for the Okta IdP.
Besides the administrative UI for editing individual profiles, Okta also provides mechanisms for updating profiles in bulk or through APIs. These tools make it straightforward to keep your user profiles synchronized with source systems such as employee directories.
After your user profiles are updated in your IdP, the process for making user attributes available to your Amazon Q Business application varies based on the identity federation configuration.
Federation with IAM Identity Center
If you configure your Amazon Q Business application with IAM Identity Center (recommended) and you use an external IdP such as Okta or Entra ID to manage your workforce, you simply need to maintain user attributes in your IdP. Because IAM Identity Center supports the SCIM standard, you can set up user profiles and their attributes to be automatically synchronized with IAM Identity Center. After the users and attributes are synchronized to IAM Identity Center, they can be accessed by Amazon Q Business from either the web experience or through a custom client integration as described earlier.
A less common variation of using IAM Identity Center with Amazon Q Business that is suitable for basic testing is to use IAM Identity Center as the identity source (without an external IdP). In this case, you would add users and manage their attributes directly in IAM Identity Center through the AWS Management Console or the CreateUser and UpdateUser APIs.
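For example, a minimal sketch using the AWS SDK for Python (Boto3) to create such a test user with the location and title attributes used later in this post might look like the following; the identity store ID and user details are illustrative.

```python
import boto3

identitystore = boto3.client("identitystore")

# Create a test user directly in IAM Identity Center with the attributes
# that Amazon Q Business can use for personalization.
response = identitystore.create_user(
    IdentityStoreId="d-1234567890",  # your Identity Center identity store ID
    UserName="jdoe",
    DisplayName="Jane Doe",
    Name={"GivenName": "Jane", "FamilyName": "Doe"},
    Emails=[{"Value": "jdoe@example.com", "Type": "work", "Primary": True}],
    Title="Software Programmer",
    Addresses=[{"Locality": "San Francisco", "Country": "US", "Type": "work", "Primary": True}],
)
print(response["UserId"])
```

You can later adjust individual attributes with the UpdateUser API instead of recreating the user.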
Federation with IAM
If you configure your Amazon Q Business application to use IAM federation, user attributes are also maintained in your IdP. However, the attributes are passed to your Amazon Q Business application from your IdP using either a SAML 2.0 assertion or an OIDC claim depending on the provider type that you set up as your IAM IdP. Your IdP must be configured to pass the specific attributes that you intend to expose for personalization. How this configuration is done depends again on whether you’re using SAML 2.0 or OIDC. For this post, we describe how this is done in Okta. The process should be similar with other IdPs.
SAML 2.0 provider type
When you create a SAML 2.0 application in Okta for authenticating your users, you have the option to create attribute statements. The attribute statements are included in the SAML 2.0 assertion that is provided by Okta when a user authenticates. The first three attribute statements shown in the following table are required for SAML 2.0 authentication to work with Amazon Q Business. The others are examples of how you would pass optional attributes that can be used for personalization.
| Name | Name format | Value |
| --- | --- | --- |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email | Unspecified | user.email |
| https://aws.amazon.com/SAML/Attributes/Role | Unspecified | [WebExpRoleArn],[IdentityProviderArn] |
| https://aws.amazon.com/SAML/Attributes/RoleSessionName | Unspecified | user.email |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:countryCode | Unspecified | user.countryCode != null ? user.countryCode : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:city | Unspecified | user.city != null ? user.city : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:title | Unspecified | user.title != null ? user.title : "" |
| https://aws.amazon.com/SAML/Attributes/PrincipalTag:department | Unspecified | user.department != null ? user.department : "" |
Where the attribute statement value uses the Okta Expression Language, Okta resolves the expression to the actual value for the user. For example, user.email resolves to the user's email address, and user.city != null ? user.city : "" resolves to the user's city (as specified in their user profile) or an empty string if not specified. Because these values are passed in the SAML assertion, you can also include any custom attributes for your users that are specific to your business or domain and relevant to personalization.
For [WebExpRoleArn],[IdentityProviderArn], replace [WebExpRoleArn] with the web experience role ARN for your Amazon Q Business application and [IdentityProviderArn] with the ARN of the IAM IdP that you created in IAM for this SAML provider.
OIDC provider type
When you create an OIDC application in Okta for authenticating your users, the location where you configure the user attributes to include in the OIDC claim is a bit different. For OIDC, you must add the user attributes you want to expose for personalization to the claim for the authorization server. AWS STS supports an access token or ID token type. In this post, we demonstrate the ID token type. For more details, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.
Complete the following steps:
- In Okta, choose Security, API in the navigation pane.
- Choose the authorization server (which may be default) and then Claims.
- If you don’t see a claim type of ID, choose Add Claim to create one.
- For Claim name, enter https://aws.amazon.com/tags.
- For Include in token type, choose Access Token or ID Token (we use ID Token in this post).
- For Value type, choose Expression.
- For Value, enter a JSON document that uses the Okta Expression Language to resolve attributes for the user (an illustrative sketch of the resulting claim follows this list).
- Choose Create.
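The value that Okta ultimately emits for this claim follows the session tags format that AWS STS expects. The following is an illustrative sketch of the resolved claim value for a San Francisco-based employee; the literal values shown here are the ones your Okta expression would resolve from the user's profile.

```json
{
  "principal_tags": {
    "Email": ["jdoe@example.com"],
    "countryCode": ["US"],
    "city": ["San Francisco"],
    "title": ["Software Programmer"],
    "department": ["Engineering"]
  }
}
```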
Again, you are not limited to just these fields. You can also include custom fields that apply to your use case and documents in the expression.
Enable personalization in Amazon Q Business
After you have your preferred authentication mechanism configured in your IdP, IAM, and Amazon Q Business, you’re ready to see how it impacts responses in your Amazon Q Business application. Although personalization is enabled by default for Amazon Q Business applications, you can control whether personalization is enabled on the Update Global Controls settings page for your Amazon Q Business application. If necessary, select Enable response personalization and choose Save.
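Personalization can also be toggled programmatically. The following Boto3 sketch assumes the personalizationConfiguration parameter of the UpdateApplication API; the application ID is illustrative.

```python
import boto3

qbusiness = boto3.client("qbusiness")

# Enable response personalization for an existing Amazon Q Business application.
qbusiness.update_application(
    applicationId="a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
    personalizationConfiguration={"personalizationControlMode": "ENABLED"},
)
```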
Amazon Q Business personalization in action
Now you’re ready to see how Amazon Q Business personalizes responses for each user. We continue with the same use case of asking Amazon Q Business “What training is available?” The documents added to the Amazon Q Business index include internal training schedules available to all employees as Word documents for two corporate offices: San Francisco and London. In addition, two users were created in the IdP, where one user is based in the San Francisco office and the other is based in the London office. The city and country fields were populated as well as each user’s title. The San Francisco employee is a software programmer and the London employee is the Director of Marketing.
When signed in to the application using an incognito (private) window as the San Francisco employee, the question “What training is available?” produces the following response.
The response includes content on the training classes being held at the San Francisco office. The citation in the Sources section also confirms that the “September Training Curriculum at San Francisco” document was used to generate the response.
We can close the incognito window, open a new incognito window, sign in as the London employee, and ask the same question: “What training is available?” This time, the response provides information on the training classes being held at the London office and the citation refers to the London curriculum document.
For one final test, we disable personalization for the Amazon Q Business application on the Update Global Controls settings page for the Amazon Q Business application, wait a few minutes for the change to take effect, and then ask the same question in a new conversation.
This time, Amazon Q Business includes information on classes being held at both offices, which is confirmed by the citations pulling in both documents. Although the question is still answered, the user must parse through the response to pick out the portions that are most relevant to them based on their location.
Use cases for Amazon Q Business personalization
Amazon Q Business can be very effective in supporting a wide variety of use cases. However, not all of these use cases can be enhanced with personalization. For example, asking Amazon Q Business to summarize a request for proposal (RFP) submission or to compare credit card offers in a customer support use case is not likely to be improved by knowing attributes of the user. Fortunately, Amazon Q Business automatically determines whether a given user's question would benefit from personalizing the retrieval query based on the attributes known for the user. When thinking about enabling and optimizing personalization for your use case, consider the availability of user attributes and the composition of data in your Amazon Q Business index.
Working backward from the personalization effect you want to implement, you first need to determine if the required user attributes for your use case exist in your IdP. This may require importing and synchronizing this data into your IdP from another system, such as an employee directory or payroll system. Then you should consider the documents and data in your Amazon Q Business index to determine if they are optimized for personalized retrieval. That is, determine whether the documents in your index have content that will be readily found by the retrieval step given the user attributes in your IdP. For example, the documents used for the training class example in this post have the city mentioned in the document title as well as the document body. Because Amazon Q Business boosts matches against the document title by default, we are taking advantage of built-in relevance tuning to further influence the documents that match the user’s city.
In this post, we focused on the user’s work location and information that was location-specific to add value through personalization. In other words, we used the user’s work location to transparently find what’s most relevant to them nearby. Another useful area of use cases to explore may use the user’s job title or job level and find content that is specific to their role. As you explore the possibilities, the intersection of user information and the composition of the data in the corpus of documents in your enterprise data stores are the best place to start.
Conclusion
In this post, we demonstrated how to use personalization to improve the relevancy and usefulness of the responses provided by an AI-powered assistant. Personalization is not going to dramatically improve every interaction with Amazon Q Business, but when it’s thoughtfully applied to use cases and data sources where it can deliver value, it can build trust with end-users by providing responses that are more relevant and meaningful.
What use cases do you have where attributes for your users and the information in your data sources can allow Amazon Q Business to deliver a more personalized user experience? Try out the solution for yourself, and leave your feedback and questions in the comments.
About the Authors
James Jory is a Principal Solutions Architect for Amazon Q Business. He has interests in generative AI, personalization, and recommender systems and has a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and motor sports.
Nihal Harish is a Software Development Engineer at AWS AI. He is passionate about generative AI and reinforcement learning. Outside of work, he enjoys playing tennis, tending to his garden, and exploring new culinary recipes.
Pranesh Anubhav is a Software Development Manager for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.
Gaurush Hiranandani is an Applied Scientist at AWS AI, where his research spans the fields of statistical machine learning, with a particular focus on preference elicitation and recommender systems. He is deeply passionate about advancing the personalization of generative AI services at AWS AI, aiming to enhance user experiences through tailored, data-driven insights.
Harsh Singh is a Principal Product Manager Technical at AWS AI. Harsh enjoys building products that bring AI to software developers and everyday users to improve their productivity.
Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1
Building intelligent agents that can accurately understand and respond to user queries is a complex undertaking that requires careful planning and execution across multiple stages. Whether you are developing a customer service chatbot or a virtual assistant, there are numerous considerations to keep in mind, from defining the agent’s scope and capabilities to architecting a robust and scalable infrastructure.
This two-part series explores best practices for building generative AI applications using Amazon Bedrock Agents. Amazon Bedrock Agents helps you accelerate generative AI application development by orchestrating multistep tasks. Agents use the reasoning capability of foundation models (FMs) to break down user-requested tasks into multiple steps. In addition, they use developer-provided instructions to create an orchestration plan and then carry out the plan by invoking company APIs and accessing knowledge bases using Retrieval Augmented Generation (RAG) to provide a final answer to the user's request.
In Part 1, we focus on creating accurate and reliable agents. Part 2 discusses architectural considerations and development lifecycle practices.
Laying the groundwork: Collecting ground truth data
The foundation of any successful agent is high-quality ground truth data—the accurate, real-world observations used as reference for benchmarks and evaluating the performance of a model, algorithm, or system. For an agent application, before you start building, it’s crucial to collect a set of ground truth interactions or conversations that will drive the entire agent lifecycle. This data provides a benchmark for expected agent behavior, including the interaction with existing APIs, knowledge bases, and guardrails connected with the agent. This enables accurate testing and evaluation and helps identify edge cases and potential pitfalls.
To build a robust ground truth dataset, focus on gathering diverse examples that cover various user intents and scenarios. Your dataset should include the input and expected output for both simple and complex interactions. It’s important to regularly update and expand your dataset as you learn more about user behavior. Ground your data in real customer interactions that reflect actual use cases but be sure to de-identify and anonymize the data.
The following table shows a subset of ground truth collected for a banking assistant agent that knows the check_account_balance and book_appointments APIs, uses the knowledge base bankingFAQ, and is connected with the guardrail BlockInvestmentAdvice. This example presents only a few of the queries used to test the agent, including session information from other systems that is passed to the agent using sessionAttributes. It's also a best practice to collect any extra information that would be shared with the agent in a production system. That includes information shared only with the action group execution function (session attributes), information used to augment the prompt (prompt session attributes), and information used by the knowledge base (knowledge base configuration).
| User Query | Session Attributes | Prompt Session Attributes | Expected Response | APIs, Knowledge Bases, and Guardrails Invoked |
| --- | --- | --- | --- | --- |
| What is my account balance? | None | None | Could you please provide the number of the account that you would like to check the balance for? | None |
| What is the balance for the account 1234? | user id 111 | None | Your balance is X | Action Group: check_account_balance(111, 1234) |
| How can I open an account? | None | None | To open a new banking account with ANY_BANK Corporation, you need to gather the following information and documentation: 1. Valid government-issued ID (driver's license, passport, etc.) 2. Proof of address (utility bill, lease agreement, etc.) 3. Social Security number or Individual Taxpayer Identification Number (ITIN) 4. Initial deposit amount (if required). You can then fill out an application online or in person at an ANY_BANK Corporation branch. Once approved, you can make your initial deposit and start using your new account. | Knowledge Base: bankingFAQ |
| Should I invest in bitcoins? | None | None | Sorry, your query violates our usage policies. We do not provide investment advice. To discuss the best investment advice for your current situation, please contact us at (XXX) XXX-XXXX and we will be happy to support you. | Guardrail: BlockInvestmentAdvice |
| Could you make an appointment for tomorrow at 2pm? | user id 111 | Today: 09/03/2024 | Certainly! We've booked an appointment for you tomorrow, September 4th, 2024, at 2pm. Your appointment ID is XXXX. | Action Group: book_appointment(111, 09/04/2024) |
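For automated testing, a record such as the second row of this table might be captured in a structure like the following; the schema is a sketch, not a required format.

```python
# Illustrative ground truth record for automated agent evaluation.
ground_truth_example = {
    "user_query": "What is the balance for the account 1234?",
    "session_attributes": {"user_id": "111"},
    "prompt_session_attributes": {},
    "expected_response": "Your balance is X",
    "expected_invocations": ["check_account_balance(111, 1234)"],
}
```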
Defining scope and sample interactions
Now that you have your ground truth data, the next step is to clearly define the scope of each agent, including tasks it should and shouldn’t handle, and outline clear expected sample user interactions. This process involves identifying primary functions and capabilities, limitations and out-of-scope tasks, expected input formats and types, and desired output formats and styles.
For instance, when considering an HR assistant agent, a possible scope would be the following:
Primary functions:
- Provide information on company HR policies
- Assist with vacation requests and time-off management
- Answer basic payroll questions
Out of scope:
- Handling sensitive employee data
- Making hiring or firing decisions
- Providing legal advice
Expected inputs:
- Natural language queries about HR policies
- Requests for time-off or vacation information
- Basic payroll inquiries
Desired outputs:
- Clear and concise responses to policy questions
- Step-by-step guidance for vacation requests
- Completion of tasks to book a new vacation and to retrieve, edit, or delete an existing request
- Referrals to appropriate HR personnel for complex issues
- Creation of an HR ticket for questions the agent is not able to answer
By clearly defining your agent’s scope, you set clear boundaries and expectations, which will guide your development process and help create a focused, reliable AI agent.
Architecting your solution: Building small and focused agents that interact with each other
When it comes to agent architecture, the principle “divide and conquer” holds true. In our experience, it has proven to be more effective to build small, focused agents that interact with each other rather than a single large monolithic agent. This approach offers improved modularity and maintainability, straightforward testing and debugging, flexibility to use different FMs for specific tasks, and enhanced scalability and extensibility.
For example, consider an HR assistant that helps internal employees in an organization and a payroll team assistant that supports the employees of the payroll team. Both agents have common functionality, such as answering payroll policy questions and scheduling meetings between employees. Although the functionalities are similar, they differ in scope and permissions. For instance, the HR assistant can only answer questions based on internally available knowledge, whereas the payroll assistant can also handle confidential information available only to payroll employees. Additionally, the HR assistant can schedule meetings between employees and their assigned HR representative, whereas the payroll assistant schedules meetings between the employees on its team. In a single-agent approach, those functionalities are handled in each agent itself, resulting in the duplication of the action groups available to each agent, as shown in the following figure.
In this scenario, when something changes in the meetings action group, the change needs to be propagated to each agent. When applying the multi-agent collaboration best practice, the HR and payroll agents orchestrate smaller, task-focused agents, each with its own scope and instructions. Meetings are now handled by a dedicated meeting assistant agent that is reused by the two supervisor agents, as shown in the following figure.
When new functionality is added to the meeting assistant agent, the HR and payroll agents only need to be updated to take advantage of it. This approach can also be automated in your applications to increase the scalability of your agentic solutions. The supervisor agents (HR and payroll agents) can set the tone of your application and define how each capability (knowledge base or sub-agent) should be used, including enforcing knowledge base filters and parameter constraints as part of the agentic application.
Crafting the user experience: Planning agent tone and greetings
The personality of your agent sets the tone for the entire user interaction. Carefully planning the tone and greetings of your agent is crucial for creating a consistent and engaging user experience. Consider factors such as brand voice and personality, target audience preferences, formality level, and cultural sensitivity.
For instance, a formal HR assistant might be instructed to address users formally, using titles and last names, while maintaining a professional and courteous tone throughout the conversation. In contrast, a friendly IT support agent could use a casual, upbeat tone, addressing users by their first names and even incorporating appropriate emojis and tech-related jokes to keep the conversation light and engaging.
The following is an example prompt for a formal HR assistant:
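The wording below is an illustrative sketch rather than a prescribed instruction; adapt the company name and details to your own organization.

```
You are an HR assistant for AnyCompany. Always address employees formally,
using their title and last name, and maintain a professional and courteous
tone. Answer questions about HR policies, vacation requests, and basic
payroll topics. If a request is out of scope, politely refer the employee
to their HR representative.
```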
The following is an example prompt for a friendly IT support agent:
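Again, the wording is an illustrative sketch:

```
You are the IT support buddy for AnyCompany. Keep the tone casual and upbeat,
address users by their first name, and feel free to use light humor and the
occasional emoji. Help with common IT issues such as password resets, VPN
access, and software installs, and open a support ticket when a problem needs
a technician.
```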
Make sure your agent's tone aligns with your brand identity and remains consistent across interactions. When multiple agents collaborate, set the tone at the application level and enforce it across the different sub-agents.
Maintaining clarity: Providing unambiguous instructions and definitions
Clear communication is the cornerstone of effective AI agents. When defining instructions, functions, and knowledge base interactions, strive for unambiguous language that leaves no room for misinterpretation. Use simple, direct language and provide specific examples for complex concepts. Define clear boundaries between similar functions and implement confirmation mechanisms for critical actions. Consider the following example of clear vs. ambiguous instructions.
The following is an example of an ambiguous prompt:
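This sketch is illustrative of an instruction that leaves too much open to interpretation:

```
You are an HR agent. Help employees with their requests and use the functions
and knowledge you have whenever you think it is useful.
```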
The following is a clearer prompt:
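The wording is again an illustrative sketch; the book_vacation function, the hrPolicyKB knowledge base, and the company name are hypothetical:

```
You are an HR assistant that helps employees of AnyCompany with vacation
requests. Use the book_vacation function only after you have collected the
employee ID, the start date, and the end date, and always confirm the dates
with the employee before booking. Use the hrPolicyKB knowledge base to answer
questions about HR policies. If the question is about payroll, legal advice,
or personal data changes, do not attempt to answer; create an HR ticket
instead.
```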
By providing clear instructions, you reduce the chances of errors and make sure your agent behaves predictably and reliably.
The same advice is valid when defining the functions of your action groups. Avoid ambiguous function names and definitions, and set clear descriptions for their parameters. The following figure shows how to change the names, descriptions, and parameters of two functions in an action group so that they reflect what the functions actually return and the expected value format for the user ID.
Finally, the knowledge base instructions should clearly state what is available in the knowledge base and when to use it to answer user queries.
The following is an ambiguous prompt:
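The following sketch is illustrative of an instruction that doesn't tell the agent what the knowledge base actually contains:

```
Use this knowledge base to answer questions.
```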
The following is a clearer prompt:
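This illustrative sketch names the content and the boundaries; the company name and topics are hypothetical:

```
Use this knowledge base to answer questions about AnyCompany's HR policies,
including vacation, parental leave, remote work, and benefits. Do not use it
for payroll amounts, personal employee records, or legal advice.
```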
Using organizational knowledge: Integrating knowledge bases
To make sure you provide your agents with enterprise knowledge, integrate them with your organization’s existing knowledge bases. This allows your agents to use vast amounts of information and provide more accurate, context-aware responses. By accessing up-to-date organizational data, your agents can improve response accuracy and relevance, cite authoritative sources, and reduce the need for frequent model updates.
Complete the following steps when integrating a knowledge base with Amazon Bedrock:
- Index your documents into a vector database using Amazon Bedrock Knowledge Bases.
- Configure your agent to access the knowledge base during interactions.
- Implement citation mechanisms to reference source documents in responses.
Regularly update your knowledge base to make sure your agent has consistent access to the most current information. This can be achieved by implementing event-based synchronization of your knowledge base data sources using the StartIngestionJob API and an Amazon EventBridge rule that is invoked periodically or based on updates of files in the knowledge base Amazon Simple Storage Service (Amazon S3) bucket.
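For example, an AWS Lambda function targeted by such an EventBridge rule might look like the following sketch; the knowledge base and data source IDs are illustrative.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

def handler(event, context):
    # Invoked by an EventBridge rule (on a schedule or on S3 object updates)
    # to re-sync the knowledge base data source.
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId="KB12345678",
        dataSourceId="DS12345678",
        description="Sync triggered by EventBridge",
    )
    return response["ingestionJob"]["ingestionJobId"]
```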
Integrating Amazon Bedrock Knowledge Bases with your agent adds semantic search capabilities to your application. By using the knowledgeBaseConfigurations field in your agent's sessionState during the InvokeAgent request, you can control how your agent interacts with your knowledge base by setting the desired number of results and any necessary filters.
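The following Boto3 sketch shows an InvokeAgent request that limits retrieval to five results and applies a metadata filter; the agent, alias, and knowledge base IDs, as well as the filter key, are illustrative.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENT12345",
    agentAliasId="ALIAS12345",
    sessionId="session-001",
    inputText="What is our parental leave policy?",
    sessionState={
        "knowledgeBaseConfigurations": [
            {
                "knowledgeBaseId": "KB12345678",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5,
                        # Only retrieve documents tagged for the HR department.
                        "filter": {"equals": {"key": "department", "value": "HR"}},
                    }
                },
            }
        ]
    },
)
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode())
```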
Defining success: Establishing evaluation criteria
To measure the effectiveness of your AI agent, it’s essential to define specific evaluation criteria. These metrics will help you assess performance, identify areas for improvement, and track progress over time.
Consider the following key evaluation metrics:
- Response accuracy – This metric measures how the agent's responses compare to your ground truth data, indicating whether the answers are correct and whether the agent delivers good performance and high quality.
- Task completion rate – This measures the success rate of the agent. The core idea of this metric is to measure the percentage or proportion of the conversations or user interactions where the agent was able to successfully complete the requested tasks and fulfill the user’s intent.
- Latency or response time – This metric measures how long a task took to run and the response time. Essentially, it measures how quickly the agent can provide a response or output after receiving an input or query. You can also set intermediate metrics that measure how long each step of the agent trace takes to run to identify the steps that need to be optimized in your system.
- Conversation efficiency – This measures how efficiently the conversation collects the required information.
- Engagement – This measures how well the agent understands the user's intent, provides relevant and natural responses, and maintains an engaging back-and-forth conversational flow.
- Conversation coherence – This metric measures the logical progression and continuity between the responses. It checks if the context and relevance are kept during the session and if the appropriate pronouns and references are used.
Furthermore, you should define your use case-specific evaluation metrics that determine how well the agent is fulfilling the tasks for your use case. For instance, for the HR use case, a possible custom metric could be the number of tickets created, because those are created when the agent can’t answer the question by itself.
Implementing a robust evaluation process involves creating a comprehensive test dataset based on your ground truth data, developing automated evaluation scripts to measure quantitative metrics, implementing A/B testing to compare different agent versions or configurations, and establishing a regular cadence for human evaluation of qualitative factors. Evaluation is an ongoing process, so you should continuously refine your criteria and measurement methods as you learn more about your agent’s performance and user needs.
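As a simple illustration of what an automated evaluation script might compute, the following sketch derives the task completion rate from a list of test results; the record schema is hypothetical.

```python
def task_completion_rate(test_results: list[dict]) -> float:
    # Each record marks whether the agent fulfilled the user's intent, for
    # example by comparing its invocations and final answer with the ground
    # truth record for that query.
    completed = sum(1 for result in test_results if result["task_completed"])
    return completed / len(test_results)

print(task_completion_rate([
    {"query": "Book a vacation for 09/04", "task_completed": True},
    {"query": "What is my PTO balance?", "task_completed": True},
    {"query": "Change my tax withholding", "task_completed": False},
]))  # 0.67
```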
Using human evaluation
Although automated metrics are valuable, human evaluation plays a crucial role in assessing and improving your AI agent’s performance. Human evaluators can provide nuanced feedback on aspects that are difficult to quantify automatically, such as assessing natural language understanding and generation, evaluating the appropriateness of responses in context, identifying potential biases or ethical concerns, and providing insights into user experience and satisfaction.
To effectively use human evaluation, consider the following best practices:
- Create a diverse panel of evaluators representing different perspectives
- Develop clear evaluation guidelines and rubrics
- Use a mix of expert evaluators (such as subject matter experts) and representative end-users
- Collect quantitative ratings and qualitative feedback
- Regularly analyze evaluation results to identify trends and areas for improvement
Continuous improvement: Testing, iterating, and refining
Building an effective AI agent is an iterative process. Now that you have a working prototype, it’s crucial to test extensively, gather feedback, and continuously refine your agent’s performance. This process should include comprehensive testing using your ground truth dataset; real-world user testing with a beta group; analysis of agent logs and conversation traces; regular updates to instructions, function definitions, and prompts; and performance comparison across different FMs.
To achieve thorough testing, consider using AI to generate diverse test cases. The following is an example prompt for generating HR assistant test scenarios:
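Such a prompt might look like the following; the wording is illustrative only.

```
Generate 20 test scenarios for an HR assistant that answers policy questions
and manages vacation requests. Cover simple requests, multi-step requests,
ambiguous or incomplete questions, and out-of-scope requests such as legal
advice. For each scenario, provide the user query, any session attributes,
the expected agent behavior, and the APIs or knowledge bases that should be
invoked.
```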
One of the best tools for the testing phase is the agent trace. The trace shows the prompts the agent uses at each step of its orchestration and gives insight into the agent's chain of thought and reasoning process. You can enable the trace in your InvokeAgent call during testing and disable it after your agent has been validated.
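For example, the following Boto3 sketch enables the trace on an InvokeAgent call and prints each trace event as it streams back; the IDs are illustrative.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENT12345",
    agentAliasId="ALIAS12345",
    sessionId="session-002",
    inputText="Book a vacation from 10/21 to 10/25",
    enableTrace=True,  # stream trace events alongside the response chunks
)
for event in response["completion"]:
    if "trace" in event:
        print(event["trace"]["trace"])   # orchestration steps and prompts
    elif "chunk" in event:
        print(event["chunk"]["bytes"].decode())
```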
The next step after collecting a ground truth dataset is to evaluate the agent’s behavior. You first need to define evaluation criteria for assessing the agent’s behavior. For the HR assistant example, you can create a test dataset that compares the results provided by your agent with the results obtained by directly querying the vacations database. You can then manually evaluate the agent behavior using human evaluation, or you can automate the evaluation using agent evaluation frameworks such as Agent Evaluation. If model invocation logging is enabled, Amazon Bedrock Agents will also give you Amazon CloudWatch logs. You can use those logs to validate your agent’s behavior, debug unexpected outputs, and adjust the agent accordingly.
The last step of the agent testing phase is to plan for A/B testing groups during the deployment stage. You should define different aspects of agent behavior, such as formal or informal HR assistant tone, that can be tested with a smaller set of your user group. You can then make different agent versions available for each group during initial deployments and evaluate the agent behavior for each group. Amazon Bedrock Agents has built-in versioning capabilities to help you with this key part of testing.
Conclusions
Following these best practices and continuously refining your approach can significantly contribute to your success in developing powerful, accurate, and user-oriented AI agents using Amazon Bedrock. In Part 2 of this series, we explore architectural considerations, security best practices, and strategies for scaling your AI agents in production environments.
By following these best practices, you can build secure, accurate, scalable, and responsible generative AI applications using Amazon Bedrock. For examples to get started, check out the Amazon Bedrock Agents GitHub repository.
To learn more about Amazon Bedrock Agents, you can get started with the Amazon Bedrock Workshop and the standalone Amazon Bedrock Agents Workshop, which provides a deeper dive. Additionally, check out the service introduction video from AWS re:Invent 2023.
About the Authors
Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.
Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.
Navneet Sabbineni is a Software Development Manager at AWS Bedrock. With over 9 years of industry experience as a software developer and manager, he has worked on building and maintaining scalable distributed services for AWS, including generative AI services like Amazon Bedrock Agents and conversational AI services like Amazon Lex. Outside of work, he enjoys traveling and exploring the Pacific Northwest with his family and friends.
Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.