Enhanced diagnostics flow with LLM and Amazon Bedrock agent integration


Noodoe is a global leader in EV charging innovation, offering advanced solutions that empower operators to optimize their charging station operations and provide exceptional user experiences. Their universal charging stations are compatible with all EV brands and feature intuitive payment options, including credit cards and Apple Pay. Powered by the Noodoe EV OS cloud management system, the company delivers round-the-clock automated monitoring, diagnostics, and maintenance, achieving a market-leading uptime of 99.83%. With operations in over 15 countries and a robust commitment to sustainability, Noodoe is transforming the EV charging industry through cutting-edge technology and a user-first approach.

Despite its technological strengths, Noodoe has encountered key challenges in helping station operators optimize performance and select the most cost-effective electricity pricing strategies across diverse markets. Traditional systems lack the capability to efficiently process vast amounts of real-time and historical data or provide personalized, station-level recommendations. This limits operators’ ability to make timely, informed decisions—resulting in higher electricity costs, underutilized assets, and a subpar customer experience. These inefficiencies not only reduce profitability but also hinder the ability to scale in a competitive and fast-evolving EV charging landscape.

To solve this, Noodoe has integrated large language models (LLMs) through Amazon Bedrock and Amazon Bedrock Agents to deliver intelligent automation, real-time data access, and multilingual support. These AI-powered tools analyze usage patterns, station diagnostics, and external variables like weather or grid conditions to generate highly tailored pricing recommendations. By using the structured orchestration and prompt-based reasoning of Amazon Bedrock, Noodoe equips operators with actionable insights that improve margins, enhance station utilization, and allow them to offer more competitive rates to users—ultimately boosting customer satisfaction. This service is delivered through a subscription model, creating a new, scalable revenue stream for Noodoe while reinforcing its leadership and innovation in the EV infrastructure space.

In this post, we explore how Noodoe uses AI and Amazon Bedrock to optimize EV charging operations. By integrating LLMs, Noodoe enhances station diagnostics, enables dynamic pricing, and delivers multilingual support. These innovations reduce downtime, maximize efficiency, and improve sustainability. Read on to discover how AI is transforming EV charging management.

Solution overview

The Noodoe AI-enhanced diagnostics flow is built on a multi-step process that combines data collection, AI-powered analytics, and seamless translation for global accessibility, as illustrated in the following figure.

Solution overview

The physical charging station network currently operates over 1,000 sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. As illustrated in the following image, it uses the EV Cloud and LLMs to generate relevant recommendations following backend processing.

Photo of an EV charging station

The following screenshot shows an example of the results in the UI.


The following diagram illustrates the solution data flow.

Overview of Noodoe AI-enhanced diagnostics

To meet the feature requirements, the system operation process includes the following steps:

  1. Charging data is processed through the EV service before entering the database.
  2. The charging history data and pricing data are stored in the EV database.
  3. Amazon EventBridge Scheduler periodically triggers the EV service to perform analysis.
  4. The EV service calls the AI service to analyze historical data and provide pricing recommendations.
  5. The AI service collects the organized historical data to prepare the prompt template.
  6. This information, combined with appropriate prompts, is used in conjunction with Amazon Bedrock Agents as an AI-pricing agent to extract relevant information. The AI-pricing agent analyzes this combined data to identify daily peak and off-peak periods and provide recommendations for user pricing plans.
  7. Optionally, if translation is needed for non-English users, these results from the AI-pricing agent are further processed through another Amazon Bedrock agent for translation.
  8. Optionally, the translation agent uses Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock to produce the result in the corresponding language (a code sketch of the agent invocation follows these steps).
  9. Finally, the AI service collects the results in the user’s language for formatting and other processing, then inserts them into a template to create a comprehensive report that is pushed to the user’s end.
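Steps 6 through 8 can be implemented with the Amazon Bedrock Agents runtime API. The following is a minimal sketch using the AWS SDK for Python (Boto3); the agent IDs, alias IDs, prompt text, and target language are placeholders for illustration, not Noodoe’s actual configuration.

import uuid
import boto3

# Placeholders: substitute your own agent and alias IDs.
PRICING_AGENT_ID = "PRICING_AGENT_ID"
PRICING_AGENT_ALIAS_ID = "PRICING_ALIAS_ID"
TRANSLATION_AGENT_ID = "TRANSLATION_AGENT_ID"
TRANSLATION_AGENT_ALIAS_ID = "TRANSLATION_ALIAS_ID"

agent_runtime = boto3.client("bedrock-agent-runtime")


def invoke_agent(agent_id: str, alias_id: str, prompt: str) -> str:
    """Invoke a Bedrock agent and concatenate the streamed completion chunks."""
    response = agent_runtime.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=str(uuid.uuid4()),
        inputText=prompt,
    )
    completion = ""
    for event in response["completion"]:
        chunk = event.get("chunk")
        if chunk:
            completion += chunk["bytes"].decode("utf-8")
    return completion


# Step 6: ask the AI-pricing agent for peak/off-peak analysis and pricing recommendations.
pricing_prompt = (
    "Analyze the following hourly charging history and identify daily peak and "
    "off-peak periods, then recommend a pricing plan for this station:\n"
    "<charging_history>...</charging_history>"  # illustrative placeholder data
)
recommendation = invoke_agent(PRICING_AGENT_ID, PRICING_AGENT_ALIAS_ID, pricing_prompt)

# Steps 7-8 (optional): pass the result to the translation agent for non-English users.
translated = invoke_agent(
    TRANSLATION_AGENT_ID,
    TRANSLATION_AGENT_ALIAS_ID,
    f"Translate the following report into the operator's preferred language:\n{recommendation}",
)
print(translated)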

In the following section, we dive deep into these steps and the AWS services used.

Architecture of Noodoe AI-enhanced diagnostics

Noodoe faced key challenges in building a globally scalable, reliable, and cost-efficient architecture. They needed a solution that could support rapid expansion, handle high data volumes, and deliver consistent performance across AWS Regions. Addressing these requirements required careful architectural planning to provide flexibility and resilience.

The following diagram illustrates the solution architecture Noodoe built to overcome these challenges and support global growth.

The EV charging optimization platform structures the data flow across multiple AWS services, providing efficient data ingestion, processing, and AI-driven decision-making. Amazon Elastic Kubernetes Service (Amazon EKS) retrieves data from Amazon DocumentDB, processes it, and invokes Amazon Bedrock Agents for reasoning and analysis. This structured data pipeline enables optimized pricing strategies and multilingual customer interactions. By using containerized applications, event-driven workflows, and AI capabilities, the system provides scalable and flexible insights to EV station operators.

Data ingestion and processing

EV charging stations send real-time charging data to AWS IoT Core, which acts as the initial entry point for data processing. The data is then transmitted to Amazon Managed Streaming for Apache Kafka (Amazon MSK) to facilitate high-throughput, reliable streaming. From Amazon MSK, data flows into Amazon EKS, where the EV service processes it before storing the charging history and trend records in DocumentDB. This structured storage provides efficient retrieval for analysis and prediction.
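As an illustration of this ingestion path, the following minimal sketch consumes charging events from an MSK topic and writes them to a DocumentDB collection. The topic name, broker endpoint, connection string, and field names are assumptions for illustration only; in production this logic runs inside the containerized EV service on Amazon EKS.

import json
from kafka import KafkaConsumer   # pip install kafka-python
from pymongo import MongoClient   # pip install pymongo

# Assumptions: topic name, broker endpoint, and connection details are illustrative.
consumer = KafkaConsumer(
    "ev-charging-events",
    bootstrap_servers=["b-1.msk-cluster.example.amazonaws.com:9094"],
    security_protocol="SSL",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

# Amazon DocumentDB is MongoDB-compatible, so the standard MongoDB driver is used.
docdb = MongoClient(
    "mongodb://user:password@docdb-cluster.example.amazonaws.com:27017/?tls=true"
)
history = docdb["ev"]["charging_history"]

for message in consumer:
    event = message.value
    # Persist each charging record for later trend analysis and pricing prediction.
    history.insert_one(
        {
            "station_id": event["stationId"],
            "session_start": event["sessionStart"],
            "session_end": event["sessionEnd"],
            "energy_kwh": event["energyKwh"],
        }
    )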

AI-powered pricing analysis

To optimize pricing strategies, Amazon EventBridge triggers a pricing prediction function at regular intervals. This function retrieves historical charging data from DocumentDB and sends it, along with predefined prompts, to the Amazon Bedrock AI-pricing agent. The AI agent, powered by Anthropic’s Claude on Amazon Bedrock, evaluates station usage trends, peak and off-peak periods, and pricing inefficiencies to generate optimal pricing recommendations. Although the pricing agent doesn’t access an Amazon Bedrock knowledge base or trigger action groups, it uses preprocessing and postprocessing features to refine predictions and improve decision-making.

Multilingual support and report generation

If translation is required, the pricing analysis results are forwarded to the Amazon Bedrock translate agent, which converts the insights into the operator’s preferred language. The translated and structured data is then formatted into a predefined report template and stored in a designated database for later retrieval. This provides seamless access to actionable insights across diverse markets.

UI, monitoring, and performance optimization

Operators access the system through a web-based UI, with Amazon Route 53 and Amazon CloudFront providing fast and efficient content delivery. An Application Load Balancer distributes incoming requests across multiple EKS instances, providing high availability. To optimize performance, Amazon ElastiCache accelerates data retrieval while reducing database load. Amazon CloudWatch provides monitoring and observability, and Noodoe administrators use Amazon Managed Service for Prometheus and Amazon Managed Grafana for system metrics and visualization.

This architecture empowers Noodoe with an AI-driven, scalable, and intelligent EV charging management solution, enhancing station utilization, revenue optimization, and customer experience worldwide.

Summary

The Noodoe AI-enhanced diagnostics flow transforms EV charging operations by integrating Amazon Bedrock Agents, blending rule-based automation, real-time user input, and LLM-powered insights for smarter decision-making. Backed by a comprehensive knowledge base and streamlined APIs, the solution empowers operators to automate workflows, optimize pricing, and boost station performance at scale. Ongoing expansion of the knowledge base, workflow refinement, and real-world testing further improve efficiency and reliability. This approach has delivered a 15% increase in revenue and reduced implementation time by 10%. Continuous feedback and clear documentation equip users to effectively use AI-driven diagnostics for more intelligent charging management.

Roman Kleinerman, Vice President of Products at Noodoe, shares: “We’ve seen revenue increases of 10–25% depending on the location and number of stations, as customers use our AI solution to optimize pricing strategies.”

Noodoe is dedicated to delivering smarter, more intelligent EV charging services that benefit both end-users and operators. Currently, Noodoe operates over 1,000 charging sites across more than 20 countries, with plans to expand by more than 50 additional sites by the end of 2025. Looking ahead, the system is being enhanced to support near real-time dynamic pricing optimization by incorporating factors such as demand, grid conditions, time of day, and weather. Amazon Bedrock Agents help enable these intelligent capabilities, powering dynamic pricing, load balancing, and grid-aware routing to optimize energy distribution and guide users to the most efficient stations. Future enhancements will offer personalized charging recommendations and incentives based on user preferences, maximizing value for both customers and operators. Start building intelligent, AI-powered EV charging solutions with Amazon Bedrock.


About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 12 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and generative AI. As a hungry go-getter, he has earned all 14 AWS certifications, giving his technical expertise both depth and breadth. He loves to read and watch sci-fi movies in his spare time.

Howard Su is a Solutions Architect at AWS. With many years of experience in software development and system operations, Howard has served in various roles including R&D, QA, and SRE, and has been responsible for the architectural design of numerous large-scale systems as well as participating in several cloud migrations. After accumulating years of development and operations experience, Howard is dedicated to promoting cloud-native service technologies and advocating for DevOps.

Tony Trinh is a Senior AI/ML Specialist Architect at AWS. With 13+ years of experience in the IT industry, Tony specializes in architecting scalable, compliance-driven AI and ML solutions—particularly in generative AI, MLOps, and cloud-native data platforms. As a perpetual learner, he does research in visual language models, responsible AI, and computer vision, and is authoring a book on ML engineering. In his spare time, Tony enjoys outdoor activities, experimenting with home improvement, and exploring Melbourne’s vibrant coffee scene.

Read More

Build GraphRAG applications using Amazon Bedrock Knowledge Bases


These days, it is increasingly common for companies to adopt an AI-first strategy to stay competitive and operate more efficiently. As generative AI adoption grows, the technology’s ability to solve problems is also improving (one example is generating a comprehensive market report). One way to simplify the growing complexity of the problems to be solved is through graphs, which excel at modeling relationships and extracting meaningful insights from interconnected data and entities.

In this post, we explore how to use Graph-based Retrieval-Augmented Generation (GraphRAG) in Amazon Bedrock Knowledge Bases to build intelligent applications. Unlike traditional vector search, which retrieves documents based on similarity scores, knowledge graphs encode relationships between entities, allowing large language models (LLMs) to retrieve information with context-aware reasoning. This means that instead of only finding the most relevant document, the system can infer connections between entities and concepts, improving response accuracy and reducing hallucinations. Graph Explorer is a useful tool for inspecting the graph that is built.

Introduction to GraphRAG

Traditional Retrieval-Augmented Generation (RAG) approaches improve generative AI by fetching relevant documents from a knowledge source, but they often struggle with context fragmentation, where relevant information is spread across multiple documents or sources.

This is where GraphRAG comes in. GraphRAG was created to enhance knowledge retrieval and reasoning by leveraging knowledge graphs, which structure information as entities and their relationships. Unlike traditional RAG methods that rely solely on vector search or keyword matching, GraphRAG enables multi-hop reasoning (logical connections between different pieces of context), better entity linking, and contextual retrieval. This makes it particularly valuable for complex document interpretation, such as legal contracts, research papers, compliance guidelines, and technical documentation.

Amazon Bedrock Knowledge Bases GraphRAG

Amazon Bedrock Knowledge Bases is a managed service for storing, retrieving, and structuring enterprise knowledge. It seamlessly integrates with the foundation models available through Amazon Bedrock, enabling AI applications to generate more informed and trustworthy responses. Amazon Bedrock Knowledge Bases now supports GraphRAG, an advanced feature that enhances traditional RAG by integrating graph-based retrieval. This allows LLMs to understand relationships between entities, facts, and concepts, making responses more contextually relevant and explainable.

How Amazon Bedrock Knowledge Bases GraphRAG works

Graphs are generated by creating a structured representation of data as nodes (entities) and edges (relationships) between those nodes. The process typically involves identifying key entities within the data, determining how these entities relate to each other, and then modeling these relationships as connections in the graph. After the traditional RAG process, Amazon Bedrock Knowledge Bases GraphRAG performs additional steps to improve the quality of the generated response:

  1. It identifies and retrieves related graph nodes or chunk identifiers that are linked to the initially retrieved document chunks.
  2. The system then expands on this information by traversing the graph structure, retrieving additional details about these related chunks from the vector store.
  3. By using this enriched context, which includes relevant entities and their key connections, GraphRAG can generate more comprehensive responses.
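Once a GraphRAG-enabled knowledge base is in place, it can be queried with the standard RetrieveAndGenerate API. The following is a minimal sketch using Boto3; the knowledge base ID and model ARN are placeholders.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholders: substitute your own knowledge base ID and model ARN.
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "Which companies has Amazon invested in or acquired in recent years?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KNOWLEDGE_BASE_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0",
        },
    },
)

print(response["output"]["text"])
# Each citation lists the retrieved (and graph-expanded) chunks that support the answer.
for citation in response.get("citations", []):
    for ref in citation.get("retrievedReferences", []):
        print(ref.get("location"))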

How graphs are constructed

Imagine extracting information from unstructured data such as PDF files. In Amazon Bedrock Knowledge Bases, graphs are constructed through a process that extends traditional PDF ingestion. The system creates three types of nodes: chunk, document, and entity. The ingestion pipeline begins by splitting documents from an Amazon Simple Storage Service (Amazon S3) folder into chunks using customizable methods (ranging from basic fixed-size chunking to more complex LLM-based chunking mechanisms). Each chunk is then embedded, and an ExtractChunkEntity step uses an LLM to identify key entities within the chunk. This information, along with the chunk’s embedding, text, and document ID, is sent to Amazon Neptune Analytics for storage. The insertion process creates interconnected nodes and edges, linking chunks to their source documents and extracted entities using the bulk load API in Amazon Neptune. The following figure illustrates this process.

Document processing workflow showing chunking and embedding steps leading to Neptune Analytics database

Use case

Consider a company that needs to analyze a wide range of documents and correlate entities that are spread across those documents to answer questions (for example, Which companies has Amazon invested in or acquired in recent years?). Extracting meaningful insights from this unstructured data and connecting it with other internal and external information poses a significant challenge. To address this, the company decides to build a GraphRAG application using Amazon Bedrock Knowledge Bases, using graph databases to represent complex relationships within the data.

One business requirement for the company is to generate a comprehensive market report that provides a detailed analysis of how internal and external information are correlated with industry trends, the company’s actions, and performance metrics. By using Amazon Bedrock Knowledge Bases, the company can create a knowledge graph that represents the intricate connections between press releases, products, companies, people, financial data, external documents and industry events. The Graph Explorer tool becomes invaluable in this process, helping data scientists and analysts to visualize those connections, export relevant subgraphs, and seamlessly integrate them with the LLMs in Amazon Bedrock. After the graph is well structured, anyone in the company can ask questions in natural language using Amazon Bedrock LLMs and generate deeper insights from a knowledge base with correlated information across multiple documents and entities.

Solution overview

In this GraphRAG application using Amazon Bedrock Knowledge Bases, we’ve designed a streamlined process to transform raw documents into a rich, interconnected graph of knowledge. Here’s how it works:

  1. Document ingestion: Users can upload documents manually to Amazon S3 or set up automatic ingestion pipelines.
  2. Chunk, entity extraction, and embeddings generation: In the knowledge base, documents are first split into chunks using fixed size chunking or customizable methods, then embeddings are computed for each chunk. Finally, an LLM is prompted to extract key entities from each chunk, creating a GraphDocument that includes the entity list, chunk embedding, chunked text, and document ID.
  3. Graph construction: The embeddings, along with the extracted entities and their relationships, are used to construct a knowledge graph. The constructed graph data, including nodes (entities) and edges (relationships), is automatically inserted into Amazon Neptune.
  4. Data exploration: With the graph database populated, users can quickly explore the data using Graph Explorer. This intuitive interface allows for visual navigation of the knowledge graph, helping users understand relationships and connections within the data.
  5. LLM-powered application: Finally, users can leverage LLMs through Amazon Bedrock to query the graph and retrieve correlated information across documents. This enables powerful, context-aware responses that draw insights from the entire corpus of ingested documents.

The following figure illustrates this solution.

AWS architecture showing data flow from S3 through Bedrock to Neptune with user query interaction

Prerequisites

The example solution in this post uses datasets from the following websites:

Also, you need to:

  • Create an S3 bucket to store the files on AWS. In this example, we named this bucket: blog-graphrag-s3.
  • Download and upload the PDF and XLS files from the websites into the S3 bucket.

Build the GraphRAG application

  1. Open the AWS Management Console for Amazon Bedrock.
  2. In the navigation pane, under Knowledge Bases, choose Create.
  3. Select Knowledge Base with vector store, and choose Create.
    Amazon Bedrock Knowledge Bases interface showing creation and testing options
  4. Enter a Knowledge Base name (for example, knowledge-base-graphrag-demo) and an optional description.
  5. Select Create and use a new service role.
  6. Select Data source as Amazon S3.
  7. Leave everything else as default and choose Next to continue.
  8. Enter a Data source name (for example: knowledge-base-graphrag-data-source).
  9. Select an S3 bucket by choosing Browse S3. (If you don’t have an S3 bucket in your account, create one. Make sure to upload all the necessary files.)
  10. After the S3 bucket is created and files are uploaded, choose blog-graphrag-s3 bucket.
  11. Leave everything else as default and choose Next.
  12. Choose Select model and then select an embeddings model (in this example, we chose the Titan Text Embeddings V2 model).
  13. In the Vector database section, under Vector store creation method, select Quick create a new vector store, select Amazon Neptune Analytics (GraphRAG) as the vector store, and choose Next to continue.
  14. Review all the details.
  15. Choose Create Knowledge Base after reviewing all the details.
  16. Creating a knowledge base on Amazon Bedrock might take several minutes to complete depending on the size of the data present in the data source. You should see the status of the knowledge base as Available after it is created successfully.

provide kb details and source configuration

Update and sync the graph with your data

  1. Select the Data source name (in this example, knowledge-base-graphrag-data-source ) to view the synchronization history.
  2. Choose Sync to update the data source.

Amazon Bedrock Knowledge Bases interface showing one active vector store implementation
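You can also trigger the same sync programmatically with the StartIngestionJob API, which is useful for automating ingestion after new documents land in the S3 bucket. The IDs below are placeholders.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Placeholders: use the IDs shown on the knowledge base and data source detail pages.
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KNOWLEDGE_BASE_ID",
    dataSourceId="DATA_SOURCE_ID",
    description="Sync newly uploaded documents from the S3 bucket",
)

status = job["ingestionJob"]["status"]
print(f"Ingestion job {job['ingestionJob']['ingestionJobId']} is {status}")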

Visualize the graph using Graph Explorer

Let’s look at the graph created by the knowledge base by navigating to the Amazon Neptune console. Make sure that you’re in the same AWS Region where you created the knowledge base.

  1. Open the Amazon Neptune console.
  2. In the navigation pane, choose Analytics and then Graphs.
  3. You should see the graph created by the knowledge base.

Neptune database graphs interface showing single active graph instance

To view the graph in Graph Explorer, you need to create a notebook by going to the Notebooks section.

You can create the notebook instance manually or by using an AWS CloudFormation template. In this post, we show how to create it manually using the Amazon Neptune console.

To create a notebook instance:

  1. Choose Notebooks.
  2. Choose Create notebook.
    • Select Analytics as the Neptune service.
    • Associate the notebook with the graph you just created (in this case: bedrock-knowledge-base-imwhqu).
    • Select the notebook instance type.
    • Enter a name for the notebook instance in the Notebook name field.
    • Create an AWS Identity and Access Management (IAM) role and use the Neptune default configuration.
    • Select VPC, Subnet, and Security group.
  3. Leave Internet access as default and choose Create notebook.

Neptune Notebooks interface showing two active Jupyter notebook instances

Notebook instance creation might take a few minutes. After the Notebook is created, you should see the status as Ready.

To see the Graph Explorer:

  • Go to Actions and choose Open Graph Explorer.

Neptune notebook details page showing aws-neptune-graphraq configuration

By default, public connectivity is disabled for the graph database. To connect to the graph, you must either have a private graph endpoint or enable public connectivity. For this post, you will enable public connectivity for this graph.

To set up a public connection to view the graph (optional):

  1. Go back to the graph you created earlier (under Analytics, Graphs).
  2. Select your graph by choosing the round button to the left of the Graph Identifier.
  3. Choose Modify.
  4. Select the Enable public connectivity check box in the Network section.
  5. Choose Next.
  6. Review changes and choose Submit.

page showing how to set up public connectivity

To open the Graph Explorer:

  1. Go back to Notebooks.
  2. After the notebook instance is created, choose the instance name (in this case: aws-neptune-analytics-neptune-analytics-demo-notebook).
  3. Choose Actions, and then choose Open Graph Explorer.

image showing how to open graph explorer

  4. You should now see Graph Explorer. To see the graph, add a node to the canvas, then explore and navigate the graph.

image showing graph explorer
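If you prefer to inspect the graph programmatically rather than through Graph Explorer, you can run openCypher queries against the Neptune Analytics graph with the ExecuteQuery API. The following sketch assumes the neptune-graph Boto3 client and a placeholder graph identifier; adjust the query to match the node labels the knowledge base actually created.

import json
import boto3

neptune_graph = boto3.client("neptune-graph")

# Placeholder graph identifier; the query simply samples relationships in the graph.
result = neptune_graph.execute_query(
    graphIdentifier="g-xxxxxxxxxx",
    queryString=(
        "MATCH (c)-[r]->(d) "
        "RETURN labels(c) AS from_labels, type(r) AS relationship, labels(d) AS to_labels "
        "LIMIT 25"
    ),
    language="OPEN_CYPHER",
)

# The query results are returned as a JSON payload.
print(json.loads(result["payload"].read()))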

Playground: Working with LLMs to extract insights from the knowledge base using GraphRAG

You’re ready to test the knowledge base.

  1. Choose the knowledge base, select a model, and choose Apply.
  2. Choose Run after adding the prompt. In the example shown in the following screenshot, we asked: How is AWS increasing energy efficiency?

example prompt

  3. Choose Show details to see the Source chunk.
  4. Choose Metadata associated with this chunk to view the chunk ID, data source ID, and source URI.

image showing chunk id, resource id and resource uri

  5. In the next example, we asked a more complex question: Which companies has Amazon invested in or acquired in recent years?

example with a complex prompt

Another way to improve the relevance of query responses is to use a reranker model. Using the reranker model in GraphRAG involves providing a query and a list of documents to be reordered based on relevance. The reranker calculates relevance scores for each document in relation to the query, improving the accuracy and pertinence of retrieved results for subsequent use in generating responses or prompts. In the Amazon Bedrock Playgrounds, you can see the results generated by the reranking model in two ways: the data ranked by the reranking model alone (the following figure), or a combination of the reranking model and the LLM to generate new insights.

rerank model example

To use the reranker model:

  1. Check the availability of the reranker model.
  2. Go to the AWS Management Console for Amazon Bedrock.
  3. In the navigation pane, under Builder tools, choose Knowledge Bases.
  4. Choose the knowledge base created in the earlier steps, knowledge-base-graphrag-demo.
  5. Choose Test Knowledge Base.
  6. Choose Configurations, expand the Reranking section, choose Select model, and select a reranker model (in this post, we chose Cohere Rerank 3.5).

Clean up

To clean up your resources, complete the following tasks:

  1. Delete the Neptune notebooks: aws-neptune-graphrag.
  2. Delete the Amazon Bedrock Knowledge Bases: knowledge-base-graphrag-demo.
  3. Delete content from the Amazon S3 bucket blog-graphrag-s3.

Conclusion

Using Graph Explorer in combination with Amazon Neptune and Amazon Bedrock LLMs provides a solution for building sophisticated GraphRAG applications. Graph Explorer offers intuitive visualization and exploration of complex relationships within data, making it straightforward to understand and analyze company connections and investments. You can use Amazon Neptune graph database capabilities to set up efficient querying of interconnected data, allowing for rapid correlation of information across various entities and relationships.

By using this approach to analyze Amazon’s investment and acquisition history, we can quickly identify patterns and insights that might otherwise be overlooked. For instance, when examining the questions “Which companies has Amazon invested in or acquired in recent years?” or “How is AWS increasing energy efficiency?”, the GraphRAG application can traverse the knowledge graph, correlating press releases, investor relations information, entities, and financial data to provide a comprehensive overview of Amazon’s strategic moves.

The integration of Amazon Bedrock LLMs further enhances the accuracy and relevance of generated results. These models can contextualize the graph data, helping you understand the nuances in company relationships and investment trends, and supporting the generation of comprehensive market reports. This combination of graph-based knowledge and natural language processing enables more precise answers and data interpretation, going beyond basic fact retrieval to offer analysis of Amazon’s investment strategy.

In summary, the synergy between Graph Explorer, Amazon Neptune, and Amazon Bedrock LLMs creates a framework for building GraphRAG applications that can extract meaningful insights from complex datasets. This approach streamlines the process of analyzing corporate investments and creates new ways to analyze unstructured data across various industries and use cases.


About the authors

Ruan Roloff is a ProServe Cloud Architect specializing in Data & AI at AWS. During his time at AWS, he was responsible for the data journey and data product strategy of customers across a range of industries, including finance, oil and gas, manufacturing, digital natives and public sector — helping these organizations achieve multi-million dollar use cases. Outside of work, Ruan likes to assemble and disassemble things, fish on the beach with friends, play SFII, and go hiking in the woods with his family.

Sai Devisetty is a Technical Account Manager at AWS. He helps customers in the Financial Services industry with their operations in AWS. Outside of work, Sai cherishes family time and enjoys exploring new destinations.

Madhur Prashant is a Generative AI Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Read More

Streamline personalization development: How automated ML workflows accelerate Amazon Personalize implementation


Crafting unique, customized experiences that resonate with customers is a potent strategy for boosting engagement and fostering brand loyalty. However, creating dynamic personalized content is challenging and time-consuming because of the need for real-time data processing, complex algorithms for customer segmentation, and continuous optimization to adapt to shifting behaviors and preferences—all while providing scalability and accuracy. Despite these challenges, the potential rewards make personalization a worthwhile pursuit for many businesses. Amazon Personalize is a fully managed machine learning (ML) service that uses your data to generate product and content recommendations for your users. Amazon Personalize helps accelerate time-to-value with custom models that are trained on data you provide, such as your users, catalog items, and the interactions between users and items, to generate personalized content and product recommendations. You can choose from various recipes—algorithms for specific use cases—to find the ones that fit your needs, such as recommending items that a user is most likely to engage with next given their past interactions, or the next best action that a user is most likely to take.

To maintain a personalized user experience, it’s crucial to implement machine learning operations (MLOps) practices, including continuous integration, deployment, and training of your ML models. MLOps facilitates seamless integration across various ML tools and frameworks, streamlining the development process. A robust machine learning solution for maintaining personalized experiences typically includes automated pipeline construction, as well as automated configuration, training, retraining, and deployment of personalization models. While services like Amazon Personalize offer a ready-to-use recommendation engine, establishing a comprehensive MLOps lifecycle for a personalization solution remains a complex undertaking. This process involves intricate steps to make sure that models remain accurate and relevant as user behaviors and preferences evolve over time.

This blog post presents an MLOps solution that uses AWS Cloud Development Kit (AWS CDK) and services like AWS Step Functions, Amazon EventBridge and Amazon Personalize to automate provisioning resources for data preparation, model training, deployment, and monitoring for Amazon Personalize.

Features and benefits

Deploying this solution offers improved scalability and traceability and allows you to quickly set up a production-ready environment to seamlessly deliver tailored recommendations to users using Amazon Personalize. This solution:

  • Streamlines the creation and management of Amazon Personalize resources.
  • Provides greater flexibility in resource management and selective feature activation.
  • Enhances readability and comprehensibility of complex workflows.
  • Enables event-driven architecture by publishing key Amazon Personalize events, allowing real-time monitoring, and enabling automated responses and integrations with other systems.
  • Includes automated creation of Amazon Personalize resources, including recommenders, solutions, and solution versions.
  • Facilitates end-to-end workflow automation for dataset import, model training, and deployment in Amazon Personalize.
  • Improves organization and modularity of complex processes through nested step functions.
  • Provides flexible activation of specific solution components using AWS CDK.

Solution overview

This solution uses AWS CDK layer 3 constructs. Constructs are the basic building blocks of AWS CDK applications. A construct is a component within your application that represents one or more AWS CloudFormation resources and their configuration.

Solution overview

The solution architecture is shown in the preceding figure and includes:

  1. An Amazon Simple Storage Service (Amazon S3) bucket is used to store interactions, users, and items datasets. In this step, you need to configure your bucket permissions so that Amazon Personalize and AWS Glue can access the datasets and input files.
  2. AWS Glue is used to preprocess the interactions, users, and item datasets. This step helps ensure that the datasets comply with the training data requirements of Amazon Personalize. For more information, see Preparing training data for Amazon Personalize.
  3. EventBridge is used to schedule regular updates by triggering the workflow and to publish events related to resource provisioning. Because the Step Functions workflow orchestrates the process based on the input configuration file, you use that configuration when setting up the scheduled start of Step Functions.
  4. Step Functions workflow manages all resource provisioning of the Amazon Personalize dataset group (including datasets, schemas, event tracker, filters, solutions, campaigns, and batch inference jobs). Step Functions provides monitoring across the solution through event logs. You can also visually track the stages of your workflow in the Step Functions console. You can adjust the input configuration file to better fit your use case, by defining schemas, recipes, and inference options. The solution workflow will have the following steps:
    1. A preprocessing job that runs an AWS Glue job, if provided. This step facilitates any preprocessing of the data that might be required.
    2. Create a dataset group, which is a container for Amazon Personalize resources.
    3. Create a dataset import job for the datasets based on the defined S3 bucket.
    4. Create filters that define any filtering that you want to apply on top of the recommendations.
    5. Create an event tracker for ingesting real-time events, such as user interactions, which in turn influence the recommendations provided.
    6. Create solutions and recommenders for creating custom resources and domain recommenders.
    7. Create a campaign, batch inference job, or batch segment job for generating inferences for real-time, batch, and segmentation use cases, respectively.
  5. If you have a batch inference use case, then recommendations that match your inputs will be output into the S3 bucket that you defined in the input configuration file.
  6. An Amazon EventBridge event bus, where resource status notification updates are posted throughout the AWS Step Functions workflow.
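The status notifications in step 6 are standard EventBridge custom events. The following sketch shows how a workflow step could publish one; the event bus name and detail payload are illustrative, and the source and detail-type values mirror the event pattern used later in the Monitoring and notification section.

import json
import boto3

events = boto3.client("events")

# Illustrative event: the bus name and detail payload are placeholders.
events.put_events(
    Entries=[
        {
            "EventBusName": "personalize-mlops-bus",
            "Source": "solutions.aws.personalize",
            "DetailType": "Personalize BatchSegmentJob status change",
            "Detail": json.dumps(
                {
                    "jobArn": "arn:aws:personalize:us-east-1:123456789012:batch-segment-job/example",
                    "status": "ACTIVE",
                }
            ),
        }
    ]
)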

Prerequisites

Before you deploy the AWS CDK stack, make sure that you have the following prerequisites in place:

  1. Install and configure the AWS Command Line Interface (AWS CLI).
  2. Install Python 3.12 or newer.
  3. Install Node.js 20.16.0 or newer.
  4. Install AWS CDK 2.88.0 or newer.
  5. Install Docker 27.5.1 or newer (required for AWS Lambda function bundling).

Although newer versions of the AWS CLI, Python, Node.js, and the AWS CDK are generally compatible, this solution has been tested with the versions listed.

Deploy the solution

With the prerequisites in place, use the following steps to deploy the solution:

  1. Clone the repository to a new folder on your desktop using the following command:
git clone https://github.com/aws-samples/automate-mlops-personalize-cdk-pipeline.git
  2. Create a Python virtual environment for development:
python3 -m venv .venv
source .venv/bin/activate 
pip install -r requirements.txt
  3. Define an Amazon Personalize MLOps pipeline instance PersonalizeMlOpsPipeline (see personalize_pipeline_stack.py for the complete example, which also includes different inference options). In this walkthrough, you create a custom solution with an associated campaign and batch inference job:
PersonalizeMlOpsPipeline(
    self, 'PersonalizePipelineSolution',
    pre_processing_config={
        "job_class": PreprocessingGlueJobFlow
    },
    enable_filters=True,
    enable_event_tracker=True,
    recommendation_config=[
        {
            "type": "solutions",
            "inference_options": ["campaigns", "batchInferenceJobs"]
        },
        {
            "type": "recommenders"
        }
    ]

)

Where:

  • PersonalizePipelineSolution – The name of the pipeline solution stack
  • pre_processing_config – Configuration for the pre-processing job to transform raw data into a format usable by Amazon Personalize. To use AWS Glue jobs for preprocessing, specify the AWS Glue job class (PreprocessingGlueJobFlow) as the value of the job_class parameter. Currently, only AWS Glue jobs are supported. You can pass the name of the AWS Glue job that you need to run as part of the input config. This doesn’t deploy the actual AWS Glue job responsible for pre-processing the files; the actual AWS Glue job must be created outside of this solution and its name passed as an input to the state machine. A sample AWS Glue job is supplied in the accompanying repo, which shows how pre-processing can be done.
  • enable_filters – A Boolean value to enable dataset filters for pre-processing. When set to true, the pipeline will create the state machines needed to create filters. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • enable_event_tracker – A Boolean value to enable the Amazon Personalize event tracker. When set to true, the pipeline will create the state machines needed to create an event tracker. Supported options are true or false. If you specify this value as false, the corresponding state machine is not deployed.
  • recommendation_config – Configuration options for recommendations. The two types currently supported are solutions and recommenders. Within the solutions type, you can have multiple options such as campaigns, batchInferenceJobs, and batchSegmentJobs. Based on the selected options, the corresponding state machine and components are created. In the earlier example, we used campaigns and batchInferenceJobs as the option, which means that only the campaigns and batch inference job state machines will be deployed with the AWS CDK.

After the infrastructure is deployed you can also enable and disable certain options through the state machine input configuration file. You can use this AWS CDK code to control what components are deployed in your AWS environment and with the input config, you can select what components run.

Preprocessing: As an optional step, you can use an existing AWS Glue job for preprocessing your data before feeding it into Amazon Personalize, which uses this data to generate recommendations for your end users. While this post demonstrates the process using the MovieLens dataset, you can adapt it for your own datasets or custom processing needs. To do so, navigate to the glue_job folder and modify the movie_script.py file accordingly, or create an entirely new AWS Glue job tailored to your specific requirements. This preprocessing step, though optional, can be crucial in making sure that your data is optimally formatted for Amazon Personalize to generate accurate recommendations.

  1. Make sure that the AWS Glue job is configured to write its output to an S3 bucket. This bucket should then be specified as an input source in the Step Functions input configuration file.
  2. Verify that the AWS Glue service has the necessary permissions to access the S3 bucket mentioned in your script.
  3. In the input configuration, you’ll need to provide the name of the AWS Glue job that will be executed by the main state machine workflow. It’s crucial that this specified AWS Glue job runs without any errors, because any failures could potentially cause the entire state machine execution to fail.

Package and deploy the solution with AWS CDK, allowing for the most flexibility in development:

Before you can deploy the pipeline using AWS CDK, you need to set up AWS credentials on your local machine. Refer to Set up AWS temporary credentials for more details.

# bootstrap CDK (required once - deploys a CDK bootstrap CloudFormation stack for assets)
cdk bootstrap
# build the solution
cdk synth
# build and deploy the solution
cdk deploy

Run the pipeline

Before initiating the pipeline, create the following resources and note the resource names for future reference.

  1. Set up an S3 bucket for dataset storage. If you plan to use the preprocessing step, this should be the same bucket as the output destination.
  2. Update the S3 bucket policy to grant Amazon Personalize the necessary access permissions. See Giving Amazon Personalize access to Amazon S3 resources for policy examples.
  3. Create an AWS Identity and Access Management (IAM) role to be used by the state machine for accessing Amazon Personalize resources.

You can find detailed instructions and policy examples in the GitHub repository.

After you’ve set up these resources, you can create the input configuration file for the Step Functions state machine. If you configure the optional AWS Glue job, it creates the input files that are required as input to the pipeline; refer to Configure the Glue Job to create the output files for more details.

Create input configuration

This input file is crucial because it contains all the essential information needed to create and manage your Amazon Personalize resources; this input configuration JSON acts as the input to the Step Functions state machine. The file can contain the following top-level objects:

  • datasetGroup
  • datasets
  • eventTracker
  • filters
  • solutions (can contain campaigns, batchInferenceJobs and batchSegmentJobs)
  • recommenders

Customize the configuration file according to your specific requirements and include or exclude sections based on the Amazon Personalize artifacts that you want to create. For the dataset import jobs in the datasets section, replace AWS_ACCOUNT_ID, S3_BUCKET_NAME, and IAM_ROLE_NAME with the appropriate values. The following is a snippet of the input configuration file. For a complete sample, see input_media.json.

"datasetImportJob": {
    "serviceConfig": {
        "dataSource": {
            "dataLocation": "s3://S3_BUCKET_NAME/users.csv"
        },
        "roleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/IAM_ROLE_NAME",
        "tags": [
            {
                "tagKey": "string",
                "tagValue": "string"
            }
        ],
        "importMode": "FULL",
        "jobName": "UsersDatasetImportJob"
    },
    "createNewJob": true
}

Likewise, if you’re using batch inference or batch segment jobs, remember to also update the BUCKET_NAME and IAM ROLE ARN in those sections. It’s important to verify that you have the required input files for batch inference stored in your S3 bucket. Adjust the file paths in your configuration to accurately reflect the location of these files within your bucket structure. This helps ensure that Amazon Personalize can access the correct data when executing these batch processes.

Adjust the AWS Glue Job name in the configuration file if you have configured it as a part of the AWS CDK stack.

See the property table for a deep dive into each property and identify whether it’s optional or required.

Execute the pipeline

You can run the pipeline using the main state machine, named PersonalizePipelineSolution, from the Step Functions console, or set up a schedule in EventBridge (find the step-by-step process in the Schedule the workflow for continued maintenance of the solution section of this post).

  1. In the AWS Management Console for Step Functions, navigate to State machines and select the PersonalizePipelineSolution.

Personalize Pipeline Solution

  2. Choose Start Execution and enter the configuration file that you created for your use case based on the steps in the Create input configuration section.

Start Execution

  3. Choose Start execution and monitor the state machine execution. In the Step Functions console, you will find a visual representation of the workflow and can track what stage the execution is at. Event logs give you insight into the progress of the stages and surface information about any errors. The following figure is an example of a completed workflow:

Graph view of AWS Step Functions

  4. After the workflow finishes, you can view the resources in the Amazon Personalize console. For batch inference jobs specifically, you can locate the corresponding step under the Inference tasks section of the graph, and within the Custom Resources area of the Amazon Personalize console.

Get recommendations (real-time inference)

After your pipeline has completed its run successfully, you can obtain recommendations. In the example configuration, we chose to deploy campaigns as the inference option. As a result, you’ll have access to a campaign that can provide real-time recommendations.

We use the Amazon Personalize console to get recommendations. Choose Dataset groups and select your dataset group name. Choose Campaigns and select your campaign name. Enter a user ID and item IDs of your choice to test personalized ranking; you can get the user ID and item IDs from the input file in the Amazon S3 bucket that you configured.

Test Campaign Results
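You can retrieve the same results from code through the Amazon Personalize Runtime API. The following is a minimal sketch; the campaign ARNs, user ID, and item IDs are placeholders, and the API you call must match the recipe behind the campaign (GetRecommendations for user personalization, GetPersonalizedRanking for personalized ranking).

import boto3

personalize_runtime = boto3.client("personalize-runtime")

# Placeholder ARNs: one campaign per recipe type.
RECOMMENDATION_CAMPAIGN_ARN = "arn:aws:personalize:us-east-1:123456789012:campaign/user-personalization-campaign"
RANKING_CAMPAIGN_ARN = "arn:aws:personalize:us-east-1:123456789012:campaign/personalized-ranking-campaign"

# Real-time item recommendations for a user.
recommendations = personalize_runtime.get_recommendations(
    campaignArn=RECOMMENDATION_CAMPAIGN_ARN,
    userId="123",
    numResults=10,
)
for item in recommendations["itemList"]:
    print(item["itemId"], item.get("score"))

# Personalized ranking: reorder a candidate list of item IDs for the same user.
ranking = personalize_runtime.get_personalized_ranking(
    campaignArn=RANKING_CAMPAIGN_ARN,
    userId="123",
    inputList=["item-1", "item-2", "item-3"],
)
print([ranked["itemId"] for ranked in ranking["personalizedRanking"]])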

 

Get recommendations (batch inference)

If you have configured batch inference to run, start by verifying that the batch inference step has successfully completed in the Step Functions workflow. Then, use the Amazon S3 console to navigate to the destination S3 bucket for your batch inference job. If you don’t see an output file there, verify that you’ve provided the correct path for the input file in your input configuration.

Media Inference Output - S3 Bucket

 

Schedule the workflow for continued maintenance of the solution

While Amazon Personalize offers automatic training for solutions through its console or SDK, allowing users to set retraining frequencies such as every three days, this MLOps workflow provides an enhanced approach. By using EventBridge schedules you gain more precise control over the timing of retraining processes. Using this method, you can specify exact dates and times for retraining executions. To implement this advanced scheduling, you can configure an EventBridge schedule to trigger the Step Functions execution, giving you finer granularity in managing your machine learning model updates.
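The schedule can also be created with the EventBridge Scheduler API instead of the console, as the following sketch shows. The state machine ARN, IAM role, cron expression, and input snippet are placeholders; the role must be allowed to call states:StartExecution on the target state machine.

import json
import boto3

scheduler = boto3.client("scheduler")

# Placeholders: substitute your state machine ARN, execution role, and full input configuration.
scheduler.create_schedule(
    Name="personalize-mlops-retraining",
    ScheduleExpression="cron(0 2 ? * MON *)",  # every Monday at 02:00 UTC
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:PersonalizePipelineSolution",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-start-execution-role",
        # Truncated, illustrative input; pass your complete input configuration JSON here.
        "Input": json.dumps({"datasetGroup": {"serviceConfig": {"name": "personalize-dataset-group"}}}),
    },
)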

  1. Navigate to the Amazon EventBridge Console and select EventBridge Schedule and then choose Create schedule.

Amazon EventBridge

  2. You can establish a recurring schedule for executing your entire workflow. A key benefit of this solution is the enhanced control it offers over the specific date and time you want your workflow to run. This allows for precise timing of your processes, which you can use to align the workflow execution with your operational needs or optimal data processing windows.

Specify schedule detail

  3. Select AWS Step Functions (as shown below) as your target.

Select Target - Event Bridge

  4. Insert the input configuration file that you prepared previously as the input and choose Next.

Start Execution

An additional step you can take is to set up a dead-letter queue with Amazon Simple Queue Service (Amazon SQS) to handle failed Step Functions executions.

Monitoring and notification

To maintain the reliability, availability, and performance of Step Functions and your solution, set up monitoring and logging. You can set up an EventBridge rule to receive notifications about events that are of interest, such as batch inference being ready in the S3 bucket. Here is how you can set that up:

  1. Navigate to Amazon Simple Notification Service (Amazon SNS) console and create an SNS topic that will be the target for your event.

Create Topic

  2. Amazon SNS supports subscriptions for different endpoint types such as HTTP/HTTPS, email, Lambda, SMS, and so on. For this example, use an email endpoint.

Create subscription

  3. After you create the topic and the subscription, navigate to the EventBridge console and select Create Rule. Define the details associated with the event such as the name, description, and the event bus.

Define rule detail

  4. To set up the event rule, you’ll use the pattern form. You use this form to define the specific events that will trigger notifications. For the batch segment job completion step, you should configure the source and detail-type fields as follows:
{
  "source": ["solutions.aws.personalize"],
  "detail-type": ["Personalize BatchSegmentJob status change"]
}

Event pattern

  5. Select the SNS topic as your target and proceed.

Select target(s)

With this procedure, you have set up an EventBridge rule that sends an email notification when the batch segment job completes. You can also set up logic based on your use case to trigger downstream processes, such as creating email campaigns with the results of your inference, by choosing different targets such as Lambda.
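The same rule and target can be provisioned programmatically, as in the following sketch. The event bus name, rule name, and SNS topic ARN are placeholders, and the topic’s access policy must allow events.amazonaws.com to publish to it.

import json
import boto3

events = boto3.client("events")

# Placeholders: event bus, rule name, and SNS topic ARN.
events.put_rule(
    Name="personalize-batch-segment-complete",
    EventBusName="personalize-mlops-bus",
    EventPattern=json.dumps(
        {
            "source": ["solutions.aws.personalize"],
            "detail-type": ["Personalize BatchSegmentJob status change"],
        }
    ),
)

events.put_targets(
    Rule="personalize-batch-segment-complete",
    EventBusName="personalize-mlops-bus",
    Targets=[
        {
            "Id": "notify-operations",
            "Arn": "arn:aws:sns:us-east-1:123456789012:personalize-notifications",
        }
    ],
)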

Additionally, you can use Step Functions and Amazon Personalize monitoring through Amazon CloudWatch metrics. See Logging and Monitoring AWS Step Functions and Monitoring Amazon Personalize for more information.

Handling schema updates

Schema updates are available in Amazon Personalize for adding columns to the existing schema. Note that deleting columns from existing schemas isn’t currently supported. To update the schema, make sure that you’re modifying the schema in the input configuration passed to Step Functions. See Replacing a dataset’s schema to add new columns for more information.

Clean up

To avoid incurring additional costs, delete the resources you created during this solution walkthrough. You can clean up the solution by deleting the CloudFormation stack you deployed as part of the setup.

Using the console

  1. Sign in to the AWS CloudFormation console.
  2. On the Stacks page, select this solution’s installation stack.
  3. Choose Delete.

Using the AWS CLI

$ aws cloudformation delete-stack --stack-name <installation-stack-name>

Conclusion

This MLOps solution for Amazon Personalize offers a powerful, automated approach to creating and maintaining personalized user experiences at scale. By using AWS services like AWS CDK, Step Functions, and EventBridge, the solution streamlines the entire process from data preparation through model deployment and monitoring. The flexibility of this solution means that you can customize it to fit various use cases, and integration with EventBridge keeps models up to date. Delivering exceptional personalized experiences is critical for business growth, and this solution provides an efficient way to harness the power of Amazon Personalize to improve user engagement, customer loyalty, and business results. We encourage you to explore and adapt this solution to enhance your personalization efforts and stay ahead in the competitive digital landscape.

To learn more about the capabilities discussed in this post, check out Amazon Personalize features and the Amazon Personalize Developer Guide.

Additional resources:


About the Authors

Reagan Rosario brings over a decade of technical expertise to his role as a Sr. Specialist Solutions Architect in Generative AI at AWS. Reagan transforms enterprise systems through strategic implementation of AI-powered cloud solutions, automated workflows, and innovative architecture design. His specialty lies in guiding organizations through digital evolution—preserving core business value while implementing cutting-edge generative AI capabilities that dramatically enhance operations and create new possibilities.

Nensi Hakobjanyan is a Solutions Architect at Amazon Web Services, where she supports enterprise Retail and CPG customers in designing and implementing cloud solutions. In addition to her deep expertise in cloud architecture, Nensi brings extensive experience in Machine Learning and Artificial Intelligence, helping organizations unlock the full potential of data-driven innovation. She is passionate about helping customers through digital transformation and building scalable, future-ready solutions in the cloud.

Read More

Fast-track SOP processing using Amazon Bedrock


Standard operating procedures (SOPs) are essential documents in the context of regulations and compliance. SOPs outline specific steps for various processes, making sure practices are consistent, efficient, and compliant with regulatory standards.

SOP documents typically include key sections such as the title, scope, purpose, responsibilities, procedures, documentation, citations (references), and a detailed approval and revision history. In FDA-regulated industries such as healthcare and life sciences, SOPs play a crucial role in defining manufacturing, clinical, laboratory, quality control, quality assurance, and regulatory compliance practices.

When a regulatory body like the US Food and Drug Administration (FDA) introduces changes to regulations, organizations are required to evaluate the changes against their internal SOPs. When necessary, they must update their SOPs to align with the regulation changes and maintain compliance.

In this post, we show different approaches using Amazon Bedrock to identify relationships between regulation changes and SOPs.

Challenge

In the healthcare and life sciences industry, regulatory authorities like the FDA and the European Medicines Agency (EMA) frequently update regulations across various areas, such as clinical trials, medical devices, drug development and approvals, quality risk management, systems and data management, and technology adoption. These regulatory updates often require organizations to correspondingly update their internal SOPs to align with the changes. This process is typically manual, requiring a team of subject matter experts to review the regulatory changes, screen the SOPs to identify relevance, determine the impact, and specify what needs to be updated. This manual approach adds significant overhead for companies and can result in review cycles lasting several days to months.

To address this challenge, we explore approaches that can help automate the identification of relationships between regulatory changes and SOPs. These approaches can also be extended to assess the impact of regulatory changes on an organization’s internal processes and documentation. By using automation, companies can streamline the SOP update process, reducing the time and resources required to maintain alignment with evolving regulatory requirements.

Sample data

For this post, we used SOPs published by the FDA’s Center for Biologics Evaluation and Research. These publicly available SOPs are used by the FDA staff to guide their duties.

Specifically, we focused on the following SOPs related to biologics procedures. This narrow scope allowed us to dive deeper into a specific regulatory domain within the larger healthcare and life sciences industry.

In addition to the SOPs, we also used three of the FDA’s Biologics Guidance Documents to test the relationship between the regulatory documents and the SOPs.

These guidance documents describe the FDA’s policy interpretations on regulatory issues related to the biologics domain. They cover a wide range of topics, including processing, content, evaluation, approval, inspection, and enforcement of policies. The guidance documents also discuss specific products or issues relating to the design, production, labeling, promotion, manufacturing, and testing of regulated products.

We used the following specific FDA Biologics Guidance Documents for this analysis:

Approaches

A key step in assessing the impact of regulatory changes is to identify if a regulatory guidance is related to an organization’s SOPs. We used Amazon Bedrock for the analysis, along with Amazon Simple Storage Service (Amazon S3) to store the input dataset.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Our experiments used Anthropic’s Claude 3 Opus large language model (LLM) on Amazon Bedrock. However, you can use the broad selection of models available on Amazon Bedrock to experiment with alternative models and choose the one that best suits your specific requirements. Amazon Bedrock frequently releases updated versions of existing AI models that can be accessed and used by simply applying a configuration change, making it a highly flexible choice for deploying the latest AI capabilities.
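As a concrete illustration, the following minimal Python sketch shows what such a configuration-driven invocation might look like using the AWS SDK for Python (Boto3) and the Amazon Bedrock Converse API. The model ID, AWS Region, and inference parameters shown here are illustrative assumptions, not values from our experiments; check the Amazon Bedrock console for the model IDs available in your account:

import boto3

# The model is just a configuration value; swapping models is a one-line change.
# This Claude 3 Opus ID is an assumed example value.
MODEL_ID = "anthropic.claude-3-opus-20240229-v1:0"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask_model(prompt: str, model_id: str = MODEL_ID) -> str:
    """Send a single-turn prompt to Amazon Bedrock and return the text reply."""
    response = bedrock_runtime.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 2048, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]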

We focused on the following approaches:

  1. Full document match – Comparing the full text of the regulatory guidance and SOP documents
  2. Text similarity – This approach consists of two options:
    • Vector embeddings – Measuring the semantic similarity between the guidance and SOP texts
    • Keyword search – Identifying relevant keywords and their occurrences in the documents
  3. Taxonomy topic match – Mapping the guidance and SOP content to a taxonomic structure to identify topical relationships

This post details the approaches we explored and the learnings from our experiments.

Full document match

The following diagram illustrates the full document match architecture.

In this approach, we compared each regulatory change to every SOP by passing the full contents of the SOP and the regulatory change to the model. The goal was to identify the relationship between the regulatory change and the SOP.

The following is a sample prompt to check if an SOP is related to a regulation change:

Human:

Act as an expert who understands the Food and Drug Administration's Center for Biologics Evaluation and Research (CBER) internal Biologics procedures, which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies).

Given a regulatory guidance change, and a document describing a CBER Biologics procedure of evaluation, research or review, tell me if the regulatory change is related to the procedure.
<regulatory_change>
{regchange}
</regulatory_change>

<biologics_procedure>
{procedure}
</biologics_procedure>
To answer this question, first, determine what the subject of the regulatory change is. For example, the subject might be something like "regulatory considerations for human cells and tissues" or "processing of clinical holds for new drug applications".

In order for the regulatory change to be related to the biologics procedure, the procedure description must include information about the topic. In other words, you should have a specific section from the procedure that is clearly related to the regulatory change in order to say that it is related.

For example, if a procedure description mentions cells or tissues, it could be related to how they need to be handled or manipulated. Similarly, if a procedure mentions dispute resolution, it could be related to how a biologics procedure handles dispute resolution. Tell me whether the regulatory change is related to the biologics procedure. Don't guess, and explain your thinking. If you can't find specific elements in the procedure that are clearly related to the regulations, then the answer is that the procedure is not related to the regulatory change.

Your answer should end with the word "RELATED" or "UNRELATED", reflecting your overall evaluation of whether the procedure description is related to the regulation change or not.
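The following Python sketch shows one way this prompt could be driven programmatically: the two placeholders are filled for each regulation/SOP pair, and the trailing verdict word is parsed from the model’s reply. It reuses the hypothetical ask_model helper from the earlier sketch, and the abbreviated template stands in for the full prompt shown above:

# Abbreviated template; in practice this holds the full prompt shown above.
PROMPT_TEMPLATE = (
    "Act as an expert who understands CBER's internal Biologics procedures...\n\n"
    "<regulatory_change>\n{regchange}\n</regulatory_change>\n\n"
    "<biologics_procedure>\n{procedure}\n</biologics_procedure>\n\n"
    'Your answer should end with the word "RELATED" or "UNRELATED".'
)

def is_related(regchange_text: str, procedure_text: str) -> bool:
    """Run the full document match prompt for one regulation/SOP pair."""
    prompt = PROMPT_TEMPLATE.format(regchange=regchange_text, procedure=procedure_text)
    answer = ask_model(prompt)
    # The prompt instructs the model to end its reply with a single verdict word.
    verdict = answer.strip().split()[-1].strip('."').upper()
    return verdict == "RELATED"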

When we ran the full document matching approach using Amazon Bedrock across all the SOPs and the regulatory guidance documents in the dataset, the results showed accurate identification of related SOPs. For example, SOPP 9151 was correctly identified as the only SOP related to the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation change, with others being identified as unrelated:

Assistant:

SOPP9151-061306.txt:

The regulatory change is about the regulation of human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides guidance on the criteria for determining how HCT/Ps will be regulated, registration and listing requirements, donor eligibility requirements, current good tissue practice requirements, and FDA inspection and enforcement of HCT/P establishments.

The biologics procedure is specifically about handling requests for exemptions and alternative procedures for HCT/Ps under 21 CFR 1271.155. The procedure outlines the responsibilities and steps for CBER staff to process and respond to these exemption requests.

There is a clear relationship between the regulatory change and the biologics procedure. The regulatory change provides the overall framework and requirements for regulating HCT/Ps, while the biologics procedure focuses on one specific aspect within that framework - handling exemption requests from certain HCT/P requirements.

The procedure directly references and provides instructions for implementing 21 CFR 1271.155, which is part of the HCT/P regulations covered in the regulatory change document. So the biologics procedure is a detailed process that enables the agency to grant exemptions and alternatives to the HCT/P requirements established in the regulations.

In summary, the regulatory change establishes the rules for HCT/Ps, and the biologics procedure describes how the agency will process requests for exceptions to those rules. Therefore, the two documents are closely related.

RELATED

Similarly, SOPP 8005 was correctly identified as the only SOP related to the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation change, and the other SOPs were identified as unrelated.

Finally, SOP 8201 was also correctly identified as the only SOP related to the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation change.

These results demonstrate the effectiveness of the full document matching approach in accurately linking the relevant SOPs to their corresponding regulatory guidance documents.

Text similarity

The following diagram illustrates the text similarity match workflow.

In our second approach, we indexed the SOPs using either vector embeddings for semantic similarity or a keyword-based similarity approach. This allowed us to submit the contents of a regulatory change as a query and return the most similar SOP documents.

The steps involved in this text similarity approach are:

  1. Index the SOPs:
    • For a vector embeddings approach, we generated vector representations of the SOP contents using an LLM to capture semantic similarities.
    • For a keyword-based approach, we identified the most relevant keywords in each SOP and built an index based on their occurrences.
  2. Query the index:
    • For a given regulatory change, we submitted the text as a query to the SOP index.
    • The index then returned the most similar SOPs based on the chosen similarity metric (semantic or keyword-based).

Vector search

For the text similarity approach, we used the open source in-memory database ChromaDB to generate the vector embeddings and perform the search.

We created a collection within ChromaDB containing all the SOP documents. We then independently queried each regulation guidance document text against this SOP collection. We used the default L2 distance algorithm, where a lower distance score indicates a closer match between the query and the indexed SOP documents.
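The following sketch shows roughly how this indexing and querying step can be done with ChromaDB. The document contents are placeholders to keep the sketch self-contained, and the library’s default embedding function stands in for the embedding model used in our experiments:

import chromadb

# Placeholder contents; in the experiment these were the full SOP texts loaded
# from disk. The short strings below only keep the sketch self-contained.
sop_docs = {
    "SOPP9151-061306.txt": "Exemptions and alternative procedures for HCT/Ps ...",
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": "Formal dispute resolution ...",
}
guidance_text = "Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products ..."

client = chromadb.Client()  # in-memory instance
collection = client.create_collection(name="sops")  # default distance metric is L2

collection.add(ids=list(sop_docs.keys()), documents=list(sop_docs.values()))

# Query the SOP collection with the text of one regulatory guidance document.
results = collection.query(query_texts=[guidance_text], n_results=len(sop_docs))
for doc_id, distance in zip(results["ids"][0], results["distances"][0]):
    print(doc_id, round(distance, 3))  # lower distance means a closer match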

Although the vector embedding-based text similarity approach identified the top matching SOP document in some cases, it also produced some inaccurate results.

For example, when querying with the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, SOPP 9151 was correctly identified as the top match. However, a few other unrelated SOP documents also had low distance scores, which could potentially lead to them being misidentified as relevant:

[
    [
        "SOPP9151-061306.txt",
        0.640560507774353
    ],
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8971723914146423
    ],
    [
        "SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt",
        0.9069873690605164
    ],
    [
        "SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt",
        1.069345235824585
    ],
    [
        "SOPP-8507V4-IllegitimateProds-Final.txt",
        1.1143898963928223
    ],
    [
        "SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt",
        1.157564640045166
    ],
    [
        "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt",
        1.2106068134307861
    ],
    [
        "SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt",
        1.3174282312393188
    ]
]

Similarly, when querying with the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation, the vector search incorrectly identified SOPP 8717 as the best match, whereas SOPP 8005, which is more directly related to formal dispute resolution, had a higher distance score:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.848071277141571
    ],
       …
       …
]

Finally, for the regulation Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry, the vector search again identified SOPP 8717 as the top match, rather than the more relevant SOP 8201:

[
    [
        "SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt",
        0.8028254508972168
    ],
    …
    …
]

Keyword search

We also explored a keyword-based similarity method using the Whoosh Python search library. We first created an index of all the SOP documents using the Whoosh library. Then, for each regulation guidance document, we separately searched the index using a Whoosh query parser.

The Whoosh library returns a search score for each matched SOP document, where a higher score indicates a better match to the query.
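The following sketch outlines this keyword-based indexing and search with Whoosh. The schema and field names are simplified assumptions, and it reuses the placeholder sop_docs and guidance_text from the earlier ChromaDB sketch; in practice the full SOP and guidance texts are used:

import os
from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in
from whoosh.qparser import OrGroup, QueryParser

schema = Schema(fname=ID(stored=True), content=TEXT)
os.makedirs("sop_index", exist_ok=True)
ix = create_in("sop_index", schema)

writer = ix.writer()
for fname, text in sop_docs.items():  # placeholder dict from the earlier sketch
    writer.add_document(fname=fname, content=text)
writer.commit()

# Search the index with the text of one regulatory guidance document.
parser = QueryParser("content", ix.schema, group=OrGroup)  # match any query term
query = parser.parse(guidance_text)
with ix.searcher() as searcher:
    for hit in searcher.search(query, limit=None):
        print(hit["fname"], "with score of", hit.score)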

When searching for the Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry regulation, the top matching document was incorrectly identified as SOPP 8301 instead of the more relevant SOPP 9151:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 744.420492560645
<Hit {'fname': 'SOPP-8507V4-IllegitimateProds-Final.txt'}> with score of 609.3431135327576
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 588.5899332740212
<Hit {'fname': 'SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt'}> with score of 582.278310231856
<Hit {'fname': 'SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt'}> with score of 449.11608593886564
<Hit {'fname': 'SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt'}> with score of 377.9723456561558
<Hit {'fname': 'SOPP9151-061306.txt'}> with score of 328.67972016789844
<Hit {'fname': 'SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt'}> with score of 160.6985781375766

Similarly, when searching for the Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulation using the keyword-based similarity approach, the Whoosh library again incorrectly identified SOPP 8301 as the top matching document, instead of the more relevant SOPP 8005:

8 results found
<Hit {'fname': 'SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt'}> with score of 721.9535225922815
…
…

In contrast with the previous examples, when searching for the Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulation, SOP 8201 was correctly identified as the top matching document:

8 results found
<Hit {'fname': 'SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt'}> with score of 503.3284407067554
…
…

These results indicate that although the keyword-based similarity approach can be effective in some cases, it might also struggle to accurately identify the most relevant SOPs, similar to the challenges faced with the vector embedding-based method.

Taxonomy-based topic matching

The following diagram illustrates the taxonomy-based topic matching architecture.

In our third approach, we focused on building a hierarchical taxonomy of topics associated with the subject of biologics procedures. This taxonomy-based approach aimed to capture the topical relationships between the regulatory guidance documents and the SOPs. It also has the potential to be more cost-effective, because the entire SOP or input document doesn’t need to be passed with the prompt for every query to the LLM, a consideration that matters more as the input documents grow in size.

The key steps in this approach were:

Step 1: We constructed a multi-level taxonomy that organized the topics related to biologics procedures. The taxonomy included a hierarchy of main topics, each of which could include subtopics.

The following is an example prompt for building the taxonomy:

Human:

Act as an expert in Biologics procedures for the Food & Drug Administration's Center for Biologics Evaluation and Research (CBER), which help their staff in performing their duties in assuring the safety, purity, potency, and effectiveness of biologics and related products (such as vaccines, live biotherapeutics (probiotics), blood products, and cell, tissue, and gene therapies). Biologics procedures help CBER staff in regulating administration and management of biologics evaluations and reviews including but not limited to clinical studies, electronic submissions, dispute resolutions and management of biologics and related products

I want you to create a hierarchy or taxonomy of topics relating to the biologics procedures. For example, some of the topics may be related to

    - Administrative Guidances

    - Adverse Events and Product Deviation Guidances

    - Application Submission Guidances

    - Biosimilars Guidances

    - Clinical Guidances

    - CMC & GMP Guidances

    - Devices Guidances

    - Generics Guidances

    - Labeling and Promotion Guidances

    - Allergenics Guidances

    - Blood Guidances

    - Cellular and Gene Therapy Guidances

    - Tissue Guidances

    - Vaccine and Related Biological Product Guidances

    - Xenotransplantation Guidances

    - Communication

    - Marketing Applications

    - Compliance

    - Medical Devices

    - Master Files

    - Investigational New Drugs

    - Communication

    - General Information

    - Therapeutic Products

Here's an example of how a topic might be organized:

<topic>
  <name>Master Files</name>
  <subtopic>
      <name>Drug Substance</name>
  </subtopic>

  <subtopic>
      <name>Packaging Material</name>
  </subtopic>

  <subtopic>
      <name>Preparation Information</name>
  </subtopic>

  <subtopic>
      <name>Clinical Study Data</name>
  </subtopic>

  <subtopic>
      <name>Non-Clinical Study Data</name>
  </subtopic>
</topic>

Your hierarchy should include all the topics covering the CBER's biologics procedures, and should be sufficiently exhaustive that it covers many different types of activities and duties in different areas of biologics administration, evaluation and review.

Think step by step, using these steps:

Think about all of the important topics that enable CBER's staff to perform their duties in administering, evaluating, reviewing and managing of biologics and related products.
Organize those topics into a hierarchy or taxonomy. There should be a single top-level item called <biologics_topics>.
Output that hierarchy in an XML format. Make sure that each item in the XML is in plain English and doesn't use special characters like the underscore.

The following is a sample taxonomy hierarchy returned by Amazon Bedrock when using Anthropic’s Claude 3 Opus LLM:

Assistant:

<biologics_topics>

  <topic>
    <name>Administrative Guidances</name>
    <subtopic>
      <name>Forms</name>
    </subtopic>
    <subtopic> 
      <name>Meetings</name>
    </subtopic>
    <subtopic>
      <name>Dispute Resolution</name> 
    </subtopic>
  </topic>

  <topic>
    <name>Submission Guidances</name>
    <subtopic>
      <name>Investigational New Drug Applications</name>
    </subtopic>
    <subtopic>
      <name>Biologics License Applications</name>
    </subtopic>
    <subtopic> 
      <name>Supplements and Other Changes to an Approved Application</name>
    </subtopic>
    <subtopic>
      <name>Biosimilars</name>
    </subtopic>
  </topic>

  <topic> 
    <name>Clinical Guidances</name>
    <subtopic>
      <name>Clinical Trial Design</name>
    </subtopic>
    <subtopic>
      <name>Human Subject Protection</name> 
    </subtopic>
    <subtopic>
      <name>Pediatric Studies</name>
    </subtopic>
  </topic>

  <topic>
    <name>Safety Guidances</name> 
    <subtopic>
      <name>Adverse Event Reporting</name>
    </subtopic>
    <subtopic>
      <name>Safety Assessments</name>
    </subtopic>
    <subtopic>
      <name>Pharmacovigilance</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Quality Guidances</name>
    <subtopic> 
      <name>Chemistry Manufacturing and Controls</name>
    </subtopic>
    <subtopic>
      <name>Good Manufacturing Practices</name>
    </subtopic>
    <subtopic>
      <name>Analytical Methods</name>
    </subtopic>
  </topic>

  <topic>
    <name>Labeling Guidances</name>
    <subtopic>
      <name>Prescribing Information</name> 
    </subtopic>
    <subtopic>
      <name>Medication Guides</name>
    </subtopic>
    <subtopic>
      <name>Promotional Labeling and Advertising</name>
    </subtopic>
  </topic>

  <topic>
    <name>Product Specific Guidances</name>
    <subtopic>
      <name>Vaccines</name>
    </subtopic> 
    <subtopic>
      <name>Blood and Blood Components</name>
    </subtopic>
    <subtopic>
      <name>Cellular and Gene Therapies</name> 
    </subtopic>
    <subtopic>
      <name>Allergenics</name>
    </subtopic>
    <subtopic> 
      <name>Tissues</name>
    </subtopic>
    <subtopic>
      <name>Xenotransplantation Products</name>
    </subtopic>
  </topic>

  <topic>
    <name>Compliance and Enforcement</name>
    <subtopic>
      <name>Inspections</name>
    </subtopic>
    <subtopic>
      <name>Recalls</name> 
    </subtopic>
    <subtopic>
      <name>Warning Letters</name>
    </subtopic>
  </topic>

</biologics_topics>

This hierarchy covers the key areas involved in CBER’s regulation of biologics, from administrative processes, to investigational products, marketing applications, clinical development, safety monitoring, quality control, labeling, and compliance for the major categories of biological products. The structure aims to capture the breadth of guidances and procedures that CBER staff rely on to ensure the safety, purity, potency and effectiveness of biologics and related products.

Step 2: Using the taxonomy we built, we labeled each SOP with the topics that are covered in the SOP. We instructed the model to select the most specific ones from the list of topics and subtopics.

The following is an example prompt to label SOPs with taxonomy topics using Anthropic’s Claude 3 Opus model:

Human:

Act as an expert who understands Biologics procedures.

The following information describes the details of a Biologics procedure. Choose the topics covered in the description only from the following list of possible topics:

{possible_topics}

<procedure_description>

{procedure}

</procedure_description>

Explain your thinking, and end the explanation by generating a JSON string array containing all of the matching topic names.

All topic names must come from the list of possible topics. Do not return a topic unless it is found within the list of valid topics, subtopics, and sub-subtopics. 

If you generate a topic name that is not found in the list of possible topics, it is an error.

Make sure to select the most specific leaf level sub-subtopic or subtopic available from the list of possible topics, subtopics, and sub-subtopics. Be sure your list covers all of the main points of the Biologics procedure description.

The following is the example SOP topic labeling output from Amazon Bedrock:

Assistant:

SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt: ['Administrative Guidances', 'Dispute Resolution']

SOPP-8117-Issuing-Tracking-Numbers-eCTD-Format-V9.txt: ['Administrative Guidances', 'Submission Guidances', 'Investigational New Drug Applications', 'Biologics License Applications']

SOPP-8201-Administrative-Processing-Clinical-Holds-INDs_V9.txt: ['Investigational New Drug Applications', 'Meetings', 'Clinical Trial Design', 'Adverse Event Reporting']

SOPP-8301-Receipt-and-Processing-of-Master-Files_V4.txt: ['Administrative Guidances', 'Submission Guidances', 'Compliance and Enforcement']

SOPP-8403-Issuance-Reissuance-and-Voluntary-Revocation-of-Biological-Product-Licenses-V6.txt: ['Submission Guidances', 'Compliance and Enforcement', 'Labeling Guidances']

SOPP-8507V4-IllegitimateProds-Final.txt: ['Compliance and Enforcement', 'Inspections', 'Recalls']

SOPP-8717--Required-Biocompatibility-Training-and-Toxicology-Profiles-for-Evaluation-of-Medical-Devices.txt: ['Product Quality Guidances', 'Submission Guidances', 'Administrative Guidances']

SOPP9151-061306.txt: ['Cellular and Gene Therapies', 'Inspections', 'Tissues']
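Because the labeling prompt asks the model to end its explanation with a string array of matching topics, the labels can be pulled out with a small amount of post-processing. The helper below is an illustrative assumption of how that parsing might look; it tolerates either JSON-style or Python-style quoting and drops any topic that isn’t in the taxonomy, as the prompt requires:

import ast
import re

def extract_topics(model_answer: str, valid_topics: set) -> list:
    """Return the taxonomy topics listed at the end of the model's answer."""
    arrays = re.findall(r"\[[^\[\]]*\]", model_answer, re.DOTALL)
    if not arrays:
        return []
    try:
        topics = ast.literal_eval(arrays[-1])  # last bracketed list in the reply
    except (ValueError, SyntaxError):
        return []
    return [topic for topic in topics if topic in valid_topics]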

Step 3: To find the relationships between the regulatory guidance documents and the SOPs, we followed a similar approach to label the regulatory changes with the most specific topics or subtopics from the built taxonomy.

The following is an example prompt to label regulatory guidance documents with taxonomy topics:

Human: 
Act as an expert who understands Biologics procedures. The following information describes a regulatory guidance or change that affects certain Biologics procedures. Please choose the main topic covered in the change description from the following list of possible topics:

{topics_list}

<regulatory_guidance_description>
{regulatory_guidance}
</regulatory_guidance_description>

Explain your thinking, and end the explanation by generating an XML item called <topic> with the relevant topic string in it. Make sure the topic is the most specific one available from the list of possible topics, subtopics, and descendants of those.

You must choose a single leaf-level XML node from the list of possible topics. The correct answer may be in a <topic> node, or a <subtopic> or <subsubtopic>. Choose only one, and return the <name> of the selected node inside of an XML item called <topic>.
Assistant:

Step 4: If the topic assigned to a regulatory change document matches one of the topics of an SOP, we can conclude the regulatory change is related to the SOP.
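With both sides labeled, the matching itself reduces to a simple membership check. The following sketch uses two of the topic labels shown earlier; the function and variable names are illustrative:

def related_sops(reg_topic: str, sop_topics: dict) -> list:
    """Return the SOPs whose assigned topics include the regulation's topic."""
    return [fname for fname, topics in sop_topics.items() if reg_topic in topics]

sop_topics = {
    "SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt": ["Administrative Guidances", "Dispute Resolution"],
    "SOPP9151-061306.txt": ["Cellular and Gene Therapies", "Inspections", "Tissues"],
}
print(related_sops("Dispute Resolution", sop_topics))
# ['SOPP-8005-Formal-Dispute-Resolution-Process-V6.txt']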

The following are example outputs of regulatory guidance topic alignment using Amazon Bedrock with the Anthropic Claude 3 Opus model.

The first example uses regulatory guidance Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry. The aligned taxonomy topic is Cellular and Gene Therapies. This topic aligns with SOPP 9151.

For reg change Regs/Regulation_Of_HCT_Ps_Small_Entity_Compliance_Guide.txt:

The regulatory guidance is focused on human cells, tissues, and cellular and tissue-based products (HCT/Ps). It provides an overview of the regulations in 21 CFR Part 1271 that apply to HCT/Ps.

The key topics covered include:

Criteria for regulation solely under section 361 of the PHS Act vs as drugs, devices and/or biological products
Registration and listing requirements for HCT/P establishments
Donor eligibility requirements
Current Good Tissue Practice (CGTP) requirements
FDA inspection and enforcement related to HCT/P establishments

Looking at the list of possible topics, the most relevant high-level topic is:

<topic>

  <name>Product Specific Guidances</name>

Within that, the most applicable subtopic is:

<subtopic>

  <name>Cellular and Gene Therapies</name>

Therefore, the most specific and relevant topic for this regulatory guidance is:

<topic>Cellular and Gene Therapies</topic>

The Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry regulatory guidance was aligned with the Dispute Resolution topic, which was also identified as a key topic covered in SOPP 8005.

For reg change Regs/Formal_Dispute_Resolution.txt:

The regulatory guidance description is about formal dispute resolution procedures for sponsors to appeal scientific and/or medical disputes to management above the division level at CDER and CBER. This falls under the broad topic of "Administrative Guidances", and the specific subtopic of "Dispute Resolution".

Therefore, the most specific and appropriate topic for this guidance is:

<topic>Dispute Resolution</topic>

The Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry regulatory guidance was mapped to the Investigational New Drug Applications topic, which was found to be aligned with the content covered in SOP 8201, indicating a strong relationship between the guidance and this particular SOP:

For reg change Regs/Submitting_And_Reviewing_Complete_Responses_To_Clinical_Holds.txt:

The regulatory guidance is about the process for submitting and reviewing responses to clinical holds on INDs. The key points are:

  • When FDA imposes a clinical hold on an IND, the study cannot proceed until the sponsor submits a complete response addressing all the clinical hold issues, and FDA notifies the sponsor they can proceed.
  • The guidance describes what the sponsor should include in the complete response, how to submit it, and how FDA will review and respond to it within 30 days.
  • It also covers procedural details like how FDA will track and measure the 30-day response timeline for PDUFA goals.
Looking at the list of possible topics, this guidance falls under:

<topic>

  <name>Submission Guidances</name>

  <subtopic>

    <name>Investigational New Drug Applications</name>

  </subtopic>

</topic>

Since it is specifically about the process for responding to clinical holds on INDs, the most relevant leaf-level topic is:

<topic>Investigational New Drug Applications</topic>

The taxonomic alignment approach was effective in accurately identifying the relationships between the regulatory changes and the SOPs in the test dataset.

Learnings

The following table summarizes our observations. Misidentified SOPs are marked with an asterisk (*).

| Regulation | Full text match | Taxonomy topic match | Text similarity: vector embeddings | Text similarity: keyword search |
| --- | --- | --- | --- | --- |
| Regulation of Human Cells, Tissues, and Cellular and Tissue-Based Products (HCT/Ps) – Small Entity Compliance Guide; Guidance for Industry | SOPP 9151 | SOPP 9151 | SOPP 9151 | SOPP 8301* |
| Formal Dispute Resolution: Appeals Above the Division Level; Guidance for Industry | SOPP 8005 | SOPP 8005 | SOPP 8717* | SOPP 8201* |
| Submitting and Reviewing Complete Responses to Clinical Holds (Revised); Guidance for Industry | SOP 8201 | SOP 8201 | SOPP 8717* | SOP 8201 |

The combination of the full text matching and taxonomy-based topic matching approaches, using Amazon Bedrock and the Anthropic Claude 3 Opus model, enabled accurate identification of the SOPs most closely related to the regulation guidance documents in the dataset. In contrast, the text similarity methods using vector embeddings and keyword search were less successful in correctly matching the SOPs to the relevant regulatory guidance documents.

Conclusion

In this post, we explored various approaches to quickly identify the relationships between regulatory changes and an organization’s SOPs, using Amazon Bedrock and Anthropic’s Claude 3 Opus model. The methods we evaluated included full text matching, text similarity using vector embeddings and keyword search, and a taxonomy-based topic alignment approach.

Our findings indicate that the full text matching and taxonomy-based topic matching were the most effective in accurately identifying the SOPs most closely related to the regulation guidance documents in the test dataset. In contrast, the text similarity techniques using vector embeddings and keyword search were less reliable in consistently matching the SOPs to the relevant regulatory documents.

Both the full text matching and taxonomy-based approaches can be viable options for organizations to assess the relationships between regulatory changes and their internal SOPs. The full text matching might provide more accurate results but requires providing the complete text of the SOP or input document, which could have cost implications.

The taxonomy-based approach, on the other hand, offers a structured way to map the content of the documents to a customizable topic hierarchy. Although the initial taxonomy might not be complete or fully accurate, it can be further enriched and tailored to an organization’s specific needs. If you choose a taxonomy-based approach, you can use a machine-generated starting point and then refine it to better suit your domain and use case requirements.

By adopting the taxonomy-based approach and adapting it to their specific needs, organizations can not only identify the relationships between regulatory changes and SOPs, they can also assess the potential impact of regulatory changes on their internal procedures. This can help streamline the process of screening SOPs against new regulations and fast-track the impact assessment in regulated industries like life sciences and others.

If you want to implement a similar solution in your AWS environment, reach out to your AWS account team for assistance.

I would like to acknowledge Greg Sommerville, Thomaz Silva and Murtuza Bootwala for their contributions to this blog. It couldn’t have been done without them.


About the Author

Ganesh Raam Ramadurai is a Senior Technical Program Manager at Amazon Web Services (AWS), where he leads the PACE (Prototyping and Cloud Engineering) team. He specializes in delivering innovative, AI/ML and Generative AI-driven prototypes that help AWS customers explore emerging technologies and unlock real-world business value. With a strong focus on experimentation, scalability, and impact, Ganesh works at the intersection of strategy and engineering—accelerating customer innovation and enabling transformative outcomes across industries.


Deploy Amazon SageMaker Projects with Terraform Cloud

Amazon SageMaker Projects empower data scientists to self-serve Amazon Web Services (AWS) tooling and infrastructure to organize all entities of the machine learning (ML) lifecycle, and further enable organizations to standardize and constrain the resources available to their data science teams in pre-packaged templates.

For AWS customers using Terraform to define and manage their infrastructure-as-code (IaC), the current best practice for enabling Amazon SageMaker Projects carries a dependency on AWS CloudFormation to facilitate integration between AWS Service Catalog and Terraform. This blocks enterprise customers whose IT governance prohibits the use of vendor-specific IaC, such as CloudFormation, from enabling SageMaker Projects with Terraform Cloud.

This post outlines how you can enable SageMaker Projects with Terraform Cloud, removing the CloudFormation dependency.

AWS Service Catalog engine for Terraform Cloud

SageMaker Projects are directly mapped to AWS Service Catalog products. To obviate the use of CloudFormation, these products must be designated as Terraform products that use the AWS Service Catalog Engine (SCE) for Terraform Cloud. This module, actively maintained by HashiCorp, contains AWS-native infrastructure for integrating Service Catalog with Terraform Cloud so that your Service Catalog products are deployed using the Terraform Cloud platform.

By following the steps in this post, you can use the Service Catalog engine to deploy SageMaker Projects directly from Terraform Cloud.

Prerequisites

To successfully deploy the example, you must have the following:

  1. An AWS account with the necessary permissions to create and manage SageMaker Projects and Service Catalog products. See the Service Catalog documentation for more information on Service Catalog permissions.
  2. An existing Amazon SageMaker Studio domain with an associated Amazon SageMaker user profile. The SageMaker Studio domain must have SageMaker Projects enabled. See Use quick setup for Amazon SageMaker AI.
  3. A Unix terminal with the AWS Command Line Interface (AWS CLI) and Terraform installed. See Installing or updating to the latest version of the AWS CLI and Install Terraform for more information about installation.
  4. An existing Terraform Cloud account with the necessary permissions to create and manage workspaces. See the following tutorials to quickly create your own account:
    1. HCP Terraform – intro and sign Up
    2. Log In to HCP Terraform from the CLI

See Terraform teams and organizations documentation for more information about Terraform Cloud permissions.

Deployment steps

  1. Clone the sagemaker-custom-project-templates repository from the AWS Samples GitHub to your local machine, update the submodules, and navigate to the mlops-terraform-cloud directory.
    $ git clone https://github.com/aws-samples/sagemaker-custom-project-templates.git
    $ cd sagemaker-custom-project-templates
    $ git submodule update --init --recursive
    $ cd mlops-terraform-cloud

The Terraform in this repository creates a Service Catalog portfolio, adds the SageMaker Project template to the portfolio as a Service Catalog product, grants the SageMaker Studio role access to the product, and adds the tags needed to make the product visible in SageMaker Studio. See Create Custom Project Templates in the SageMaker Projects Documentation for more information about this process.

  2. Log in to your Terraform Cloud account
    $ terraform login

This prompts your browser to sign into your HCP account and generates a security token. Copy this security token and paste it back into your terminal.

  3. Navigate to your AWS account and retrieve the SageMaker user role Amazon Resource Name (ARN) for the SageMaker user profile associated with your SageMaker Studio domain. This role is used to grant SageMaker Studio users permissions to create and manage SageMaker Projects.
    • In the AWS Management Console for Amazon SageMaker, choose Domains from the navigation pane
      Amazon SageMaker home screen highlighting machine learning workflow options and quick-start configurations for users and organizations
    • Select your studio domain
      Amazon SageMaker Domains management screen with one InService domain, emphasizing shared environment for team collaboration
    • Under User Profiles, select your user profile
      Amazon SageMaker Domain management interface showing user profiles tab with configuration options and launch controls
    • In the User Details, copy the ARN
      SageMaker lead-data-scientist profile configuration with IAM role and creation details
  4. Create a tfvars file with the necessary variables for the Terraform Cloud workspace
    $ cp terraform.tfvars.example terraform.tfvars

  5. Set the appropriate values in the newly created tfvars file. The following variables are required:
    tfc_organization = "my-tfc-organization"
    tfc_team = "aws-service-catalog"
    token_rotation_interval_in_days = 30
    sagemaker_user_role_arns = ["arn:aws:iam::XXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole"]

Make sure that your desired Terraform Cloud (TFC) organization has the proper entitlements and that your tfc_team is unique for this deployment. See the Terraform Organizations Overview for more information on creating organizations.

  6. Initialize the Terraform Cloud workspace
    $ terraform init

  7. Apply the Terraform Cloud workspace
    $ terraform apply

  8. Go back to the SageMaker console using the user profile associated with the SageMaker user role ARN that you copied previously and choose Open Studio application
    SageMaker Studio welcome screen highlighting integrated ML development environment with login options
  9. In the navigation pane, choose Deployments and then choose Projects
    SageMaker Studio home interface highlighting ML workflow options, including JupyterLab and Code Editor, with Projects section emphasized for model deployment
  10. Choose Create project, select the mlops-tf-cloud-example product and then choose Next
    SageMaker Studio project creation workflow showing template selection step with Organization templates tab and MLOps workflow automation option
  11. In Project details, enter a unique name for the template and (optional) enter a project description, then choose Create
    SageMaker project setup interface on Project details step, showcasing naming conventions, description field, and tagging options for MLOps workflow
  12. In a separate tab or window, go back to your Terraform Cloud account’s Workspaces and you’ll see a workspace being provisioned directly from your SageMaker Project deployment. The naming convention of the Workspace will be <ACCOUNT_ID>-<SAGEMAKER_PROJECT_ID>
    Terraform workspaces dashboard showing status counts and one workspace with Applied status

Further customization

This example can be modified to include custom Terraform in your SageMaker Project template. To do so, define your Terraform in the mlops-product/product directory. When ready to deploy, be sure to archive and compress this Terraform using the following command:

$ cd mlops-product
$ tar -czf product.tar.gz product

Cleanup

To remove the resources deployed by this example, run the following from the project directory:

$ terraform destroy

Conclusion

In this post, you defined, deployed, and provisioned a SageMaker Project custom template purely in Terraform. With no dependencies on other IaC tools, you can now enable SageMaker Projects strictly within your Terraform Enterprise infrastructure.


About the Author

Max Copeland is a Machine Learning Engineer for AWS, leading customer engagements spanning ML-Ops, data science, data engineering, and generative AI.


How ZURU improved the accuracy of floor plan generation by 109% using Amazon Bedrock and Amazon SageMaker

ZURU Tech is on a mission to change the way we build, from town houses and hospitals to office towers, schools, apartment blocks, and more. Dreamcatcher is a user-friendly platform developed by ZURU that allows users with any level of experience to collaborate in the building design and construction process. With the simple click of a button, an entire building can be ordered, manufactured and delivered to the construction site for assembly.

ZURU collaborated with AWS Generative AI Innovation Center and AWS Professional Services to implement a more accurate text-to-floor plan generator using generative AI. With it, users can specify a description of the building they want to design using natural language. For example, instead of designing the foundation, walls, and key aspects of a building from scratch, a user could enter, “Create a house with three bedrooms, two bathrooms, and an outdoor space for entertainment.” The solution would generate a unique floor plan within the 3D design space, allowing users with a non-technical understanding of architecture and construction to create a well-designed house.

In this post, we show you why a solution using a large language model (LLM) was chosen. We explore how model selection, prompt engineering, and fine-tuning can be used to improve results. And we explain how the team made sure they could iterate quickly through an evaluation framework using key services such as Amazon Bedrock and Amazon SageMaker.

Understanding the challenge

The foundation for generating a house within Dreamcatcher’s 3D building system is to first confirm we can generate a 2D floor plan based on the user’s prompt. The ZURU team found that generating 2D floor plans, such as the one in the following image, using different machine learning (ML) techniques requires success across two key criteria.

First, the model must understand rooms, the purpose of each room, and their orientation to one another within a two-dimensional vector system. This can also be described as how well the model can adhere to the features described from a user’s prompt. Second, there is also a mathematical component to making sure rooms adhere to criteria such as specific dimensions and floor space. To be certain that they were on the right track and to allow for fast R&D iteration cycles, the ZURU team created a novel evaluation framework that would measure the output of different models based on showing the level of accuracy across these two key metrics.

The ZURU team initially looked at using generative adversarial networks (GAN) for floor plan generation, but experimentation with a GPT2 LLM had positive results based on the test framework. This reinforced the idea that an LLM-based approach could provide the required accuracy for a text-to–floor plan generator.

Improving the results

To improve on the results of the GPT2 model, we worked together and defined two further experiments. The first was a prompt engineering approach: using Anthropic’s Claude 3.5 Sonnet in Amazon Bedrock, the team was able to evaluate the impact of a leading proprietary model with contextual examples included in the prompts. The second approach focused on fine-tuning Llama 3 8B variants to evaluate the improvement in accuracy when the model weights are directly influenced using high-quality examples.

Dataset preparation and analysis

To create the initial dataset, floor plans from thousands of houses were gathered from publicly available sources and reviewed by a team of in-house architects. To streamline the review process, the ZURU team built a custom application with a simple yes/no decision mechanism similar to those found in popular social matching applications, allowing architects to quickly approve plans compatible with the ZURU building system or reject those with disqualifying features. This intuitive approach significantly accelerated ZURU’s evaluation process while maintaining clear decision criteria for each floor plan.

To further enhance this dataset, we began with careful dataset preparation, filtering out low-quality data (30%) by evaluating the metric score of the ground truth dataset. Following this filtering mechanism, data points not achieving 100% accuracy on instruction adherence were removed from the training dataset. This data preparation technique helped improve the efficiency and quality of the fine-tuning and prompt engineering by more than 20%.

During our exploratory data analysis we found that the dataset contained prompts that can match multiple floor plans as well as floor plans that could match multiple prompts. By moving all related prompt and floor plan combinations to the same data split (either training, validation, or testing) we were able to prevent data leakage and promote robust evaluation.

Prompt engineering approach

As part of our approach, we implemented dynamic matching for few-shot prompting that is different from traditional static sampling methods. Combining this with the implementation of prompt decomposition, we could increase the overall accuracy of the generated floor plan content.

With a dynamic few-shot prompting methodology, we retrieve the most relevant examples at run time based on the details of the input prompt from a high-quality dataset and provide this as part of the prompt to the generative AI model.

The dynamic few-shot prompting approach is further enhanced by prompt decomposition, where we break down complex tasks into smaller, more manageable components to achieve better results from language models. By decomposing queries, each component can be optimized for its specific purpose. We found that combining these methods resulted in improved relevancy in example selection and lower latency in retrieving the example data, leading to better performance and higher quality results.

Prompt engineering architecture

The workflow and architecture implemented for prototyping shown in the following figure demonstrates a systematic approach to AI model optimization. When a user query such as “Build me a house with three bedrooms and two bathrooms” is entered, the workflow follows these steps:

  1. We use prompt decomposition to execute three smaller tasks that retrieve highly relevant examples that match the same features for a house that the user has requested
  2. We use the relevant examples and inject it into the prompt to perform dynamic few-shot prompting to generate a floor plan
  3. We use the reflection technique to ask the generative AI model to self-reflect and assess whether the generated content adheres to our requirements (a simplified sketch of this flow follows the list)
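The following Python sketch shows how these three steps could fit together. Every function name here (extract_features, retrieve_examples, generate, reflect) is a hypothetical placeholder for the components described in the next section, and the prompt wording is illustrative rather than ZURU’s production prompt:

def build_floorplan_prompt(user_request: str, examples: list) -> str:
    """Dynamic few-shot prompt: examples are retrieved at run time for this
    specific request instead of being sampled statically."""
    shots = "\n\n".join(
        "<example>\n<request>{}</request>\n<floor_plan>{}</floor_plan>\n</example>".format(
            ex["request"], ex["floor_plan"]
        )
        for ex in examples
    )
    return (
        "You generate 2D floor plans from natural-language briefs.\n\n"
        "Here are floor plans for similar requests:\n\n" + shots + "\n\n"
        "Now generate a floor plan for this request:\n<request>" + user_request + "</request>"
    )

# user_request = "Build me a house with three bedrooms and two bathrooms"
# features = extract_features(user_request)                         # step 1: prompt decomposition
# examples = retrieve_examples(features, top_k=3)                   # step 1: fetch relevant examples
# draft = generate(build_floorplan_prompt(user_request, examples))  # step 2: few-shot generation
# final = reflect(user_request, draft)                              # step 3: self-reflection pass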

Deep dive on workflow and architecture

The first step in our workflow is to understand the unique features of the house, which we can use as search criteria to find the most relevant examples in the subsequent steps. For this step, we use Amazon Bedrock, which provides a serverless API-driven endpoint for inference. From the wide range of generative AI models offered by Amazon Bedrock, we choose Mistral 7B, which provides the right balance between cost, latency, and accuracy required for this small decomposed step.

The second step is to search for the most relevant examples using the unique features we found. We use Amazon Bedrock Knowledge Bases backed by Amazon OpenSearch Serverless as a vector database to implement metadata filtering and hybrid search to retrieve the most relevant record identifiers. Amazon Simple Storage Service (Amazon S3) is used for storage of the data set, and Amazon Bedrock Knowledge Bases provides a managed solution for vectorizing and indexing the metadata into the vector database.

In the third step, we retrieve the actual floor plan data by record identifier using Amazon DynamoDB. By splitting the search and retrieval of floor plan examples into two steps, we were able to use purpose-built services: Amazon OpenSearch Serverless for low-latency search and DynamoDB for low-latency retrieval by key, leading to optimized performance.
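The following sketch illustrates how steps two and three could look with Boto3: the Amazon Bedrock Knowledge Bases Retrieve API returns record identifiers from the hybrid search, and each identifier is then resolved to the full floor plan in DynamoDB. The knowledge base ID, table name, metadata attribute, and key names are assumptions for illustration only:

import boto3

kb_client = boto3.client("bedrock-agent-runtime")
table = boto3.resource("dynamodb").Table("floor-plan-examples")  # assumed table name

def fetch_similar_floorplans(features_query: str, kb_id: str, top_k: int = 3) -> list:
    """Hybrid search for record IDs, then key-value lookups for the floor plans."""
    found = kb_client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": features_query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                "overrideSearchType": "HYBRID",
            }
        },
    )
    examples = []
    for result in found["retrievalResults"]:
        record_id = result["metadata"]["record_id"]  # assumed metadata attribute
        item = table.get_item(Key={"record_id": record_id})["Item"]
        examples.append(item)
    return examples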

After retrieving the most relevant examples for the user’s prompt, in step four we use Amazon Bedrock and Anthropic’s Claude 3.5 Sonnet as a model with leading benchmarks in deep reasoning and mathematics to generate our new floor plan.

Finally, in step five, we implement reflection. We use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock again and pass the original prompt, instructions, examples, and newly generated floor plan back with a final instruction for the model to reflect, double-check its generated floor plan, and correct mistakes.

Fine-tuning approach

We explored two methods for optimizing LLMs for automated floorplan generation: full parameter fine-tuning and Low-Rank Adaptation (LoRA)–based fine-tuning. Full fine-tuning adjusts all LLM parameters, which requires significant memory and training time. In contrast, LoRA tunes only a small subset of parameters, reducing memory requirements and training time.
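To make the contrast concrete, the following sketch shows how a LoRA configuration might be set up with the Hugging Face PEFT library. The rank, alpha, target modules, and model identifier are illustrative assumptions rather than the exact settings used in our experiments:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype="bfloat16"
)

lora_config = LoraConfig(
    r=16,                     # low-rank dimension: the small trainable subset
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 8B parameters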

Workflow and architecture

We implemented our workflow containing data processing, fine-tuning, and inference and testing steps, shown in the following figure, all within a SageMaker JupyterLab notebook provisioned with an ml.p4d.24xlarge instance, giving us access to NVIDIA A100 GPUs. Because we used a Jupyter notebook and ran all parts of our workflow interactively, we were able to iterate quickly and debug our experiments while maturing the training and testing scripts.

Deep dive on fine tuning workflow

One key insight from our experiments was the critical importance of dataset quality and diversity. Further to our initial dataset preparation, when fine-tuning a model, we found that carefully selecting training samples with larger diversity helped the model learn more robust representations. Additionally, although larger batch sizes generally improved performance (within memory constraints), we had to carefully balance this against computational resources (320 GB of GPU memory in an ml.p4d.24xlarge instance) and training time (ideally within 1–2 days).

We conducted several iterations to optimize performance, experimenting with various approaches including initial few-sample quick instruction fine-tuning, larger dataset fine-tuning, fine-tuning with early stopping, comparing Llama 3.1 8B and Llama 3 8B models, and varying instruction length in fine-tuning samples. Through these iterations, we found that full fine-tuning of the Llama 3.1 8B model using a curated dataset of 200,000 samples produced the best results.

The training process for full fine-tuning Llama 3.1 8B with BF16 and a microbatch size of three involved eight epochs with 30,000 steps, taking 25 hours to complete. In contrast, the LoRA approach showed significant computational efficiency, requiring only 2 hours of training time and producing an 89 MB checkpoint.

Evaluation framework

The testing framework implements an efficient evaluation methodology that optimizes resource utilization and time while maintaining statistical validity. Key components include the following (a simplified sketch follows the list):

  1. A prompt deduplication system that identifies and consolidates duplicate instructions in the test dataset, reducing computational overhead and enabling faster iteration cycles for model improvement
  2. A distribution-based performance assessment that filters unique test cases, promotes representative sampling through statistical analysis, and projects results across the full dataset
  3. A metric-based evaluation that implements scoring across key criteria enabling comparative analysis against both the baseline GPT2 model and other approaches.
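The following sketch is one possible interpretation of the deduplication and projection idea: each unique prompt is scored once, and the per-prompt scores are then weighted by how often that prompt appears in the full test set. The function and variable names are illustrative:

from collections import Counter

def project_score(test_prompts: list, score_fn) -> float:
    """Score each unique prompt once, then project across the full dataset."""
    counts = Counter(test_prompts)                    # deduplicate the test set
    unique_scores = {p: score_fn(p) for p in counts}  # evaluate unique cases only
    total = sum(counts.values())
    return sum(unique_scores[p] * n for p, n in counts.items()) / total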

Results and business impact

To understand how well each approach in our experiment performed, we used the evaluation framework and compared several key metrics. For the purposes of this post, we focus on two of these key metrics. The first measures how well the model was able to follow users’ instructions to include the features required in the house. The second looks at how well the features of the house adhered to mathematical, positioning, and orientation constraints. The following image shows these results in a graph.

We found that the prompt engineering approach with Anthropic’s Claude 3.5 Sonnet as well as the full fine-tuning approach with Llama 3.1 8B increased the instruction adherence quality over the baseline GPT2 model by 109%, showing that, depending on a team’s skillsets, both approaches could be used to improve an LLM’s understanding when generating content such as floor plans.

When looking at mathematical correctness, our prompt engineering approach wasn’t able to create significant improvements over the baseline, but full fine-tuning was a clear winner with a 54% increase over the baseline GPT2 results.

The LoRA-based tuning approach achieved slightly lower performance, scoring 20% lower on instruction adherence and 50% lower on mathematical correctness compared to full fine-tuning, demonstrating the tradeoffs that can be made between time, cost, and hardware on one hand and model accuracy on the other.

Conclusion

ZURU Tech has set its vision on fundamentally transforming the way we design and construct buildings. In this post, we highlighted the approach to building and improving a text-to-floor plan generator based on LLMs to create a highly usable and streamlined workflow within a 3D-modeling system. We dived into advanced concepts of prompt engineering using Amazon Bedrock and detailed approaches to fine-tuning LLMs using Amazon SageMaker, showing the different tradeoffs you can make to significantly improve the accuracy of the generated content.

To learn more about the Generative AI Innovation Center program, get in touch with your account team.


About the Authors

Federico Di Mattia is the team leader and Product Owner of ZURU AI at ZURU Tech in Modena, Italy. With a focus on AI-driven innovation, he leads the development of Generative AI solutions that enhance business processes and drive ZURU’s growth.

Niro Amerasinghe is a Senior Solutions Architect based out of Auckland, New Zealand. With experience in architecture, product development, and engineering, he helps customers in using Amazon Web Services (AWS) to grow their businesses.

Haofei Feng is a Senior Cloud Architect at AWS with over 18 years of expertise in DevOps, IT Infrastructure, Data Analytics, and AI. He specializes in guiding organizations through cloud transformation and generative AI initiatives, designing scalable and secure GenAI solutions on AWS. Based in Sydney, Australia, when not architecting solutions for clients, he cherishes time with his family and Border Collies.

Sheldon Liu is an applied scientist, ANZ Tech Lead at the AWS Generative AI Innovation Center. He partners with enterprise customers across diverse industries to develop and implement innovative generative AI solutions, accelerating their AI adoption journey while driving significant business outcomes.

Xuefeng Liu leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.

Simone Bartoli is a Machine Learning Software Engineer at ZURU Tech, in Modena, Italy. With a background in computer vision, machine learning, and full-stack web development, Simone specializes in creating innovative solutions that leverage cutting-edge technologies to enhance business processes and drive growth.

Marco Venturelli is a Senior Machine Learning Engineer at ZURU Tech in Modena, Italy. With a background in computer vision and AI, he leverages his experience to innovate with generative AI, enriching the Dreamcatcher software with smart features.

Stefano Pellegrini is a Generative AI Software Engineer at ZURU Tech in Italy. Specializing in GAN and diffusion-based image generation, he creates tailored image-generation solutions for various departments across ZURU.

Enrico Petrucci is a Machine Learning Software Engineer at ZURU Tech, based in Modena, Italy. With a strong background in machine learning and NLP tasks, he currently focuses on leveraging Generative AI and Large Language Models to develop innovative agentic systems that provide tailored solutions for specific business cases.

Read More

Going beyond AI assistants: Examples from Amazon.com reinventing industries with generative AI

Going beyond AI assistants: Examples from Amazon.com reinventing industries with generative AI

Generative AI revolutionizes business operations through various applications, including conversational assistants such as Amazon’s Rufus and Amazon Seller Assistant. Additionally, some of the most impactful generative AI applications operate autonomously behind the scenes, an essential capability that empowers enterprises to transform their operations, data processing, and content creation at scale. These non-conversational implementations, often in the form of agentic workflows powered by large language models (LLMs), execute specific business objectives across industries without direct user interaction.

Non-conversational applications offer unique advantages such as higher latency tolerance, batch processing, and caching, but their autonomous nature requires stronger guardrails and exhaustive quality assurance compared to conversational applications, which benefit from real-time user feedback and supervision.

This post examines four diverse Amazon.com examples of such generative AI applications:

  • Creating high-quality product listings on Amazon.com
  • Generative AI-powered prescription processing in Amazon Pharmacy
  • Generative AI-powered customer review highlights
  • Amazon Ads AI-powered creative image and video generation

Each case study reveals different aspects of implementing non-conversational generative AI applications, from technical architecture to operational considerations. Throughout these examples, you will learn how the comprehensive suite of AWS services, including Amazon Bedrock and Amazon SageMaker, is key to success. Finally, we list key learnings commonly shared across these use cases.

Creating high-quality product listings on Amazon.com

Creating high-quality product listings with comprehensive details helps customers make informed purchase decisions. Traditionally, selling partners manually entered dozens of attributes per product. The new generative AI solution, launched in 2024, transforms this process by proactively acquiring product information from brand websites and other sources to improve the customer experience across numerous product categories.

Generative AI simplifies the selling partner experience by enabling information input in various formats such as URLs, product images, or spreadsheets and automatically translating this into the required structure and format. Over 900,000 selling partners have used it, with nearly 80% of generated listing drafts accepted with minimal edits. AI-generated content provides comprehensive product details that help with clarity and accuracy, which can contribute to product discoverability in customer searches.

For new listings, the workflow begins with selling partners providing initial information. The system then generates comprehensive listings using multiple information sources, including titles, descriptions, and detailed attributes. Generated listings are shared with selling partners for approval or editing.

For existing listings, the system identifies products that can be enriched with additional data.

Data integration and processing for a large variety of outputs

The Amazon team built robust connectors for internal and external sources with LLM-friendly APIs using Amazon Bedrock and other AWS services to seamlessly integrate into Amazon.com backend systems.

A key challenge is synthesizing diverse data into cohesive listings across more than 50 attributes, both textual and numerical. LLMs require specific control mechanisms and instructions to accurately interpret ecommerce concepts because they might not perform optimally with such complex, varied data. For example, LLMs might misinterpret “capacity” in a knife block as dimensions rather than number of slots, or mistake “Fit Wear” as a style description instead of a brand name. Prompt engineering and fine-tuning were extensively used to address these cases.
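To illustrate the kind of control mechanism described above, the following minimal sketch injects a category-specific attribute glossary into the prompt so the model interprets terms such as “capacity” correctly. It uses the Amazon Bedrock Converse API; the model choice, glossary wording, and example product text are illustrative assumptions, not Amazon’s production prompts.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Hypothetical glossary entries that pin down how ambiguous attributes must be read
glossary = (
    "Attribute definitions for this category:\n"
    "- capacity: the number of knife slots in the block (an integer), not its dimensions\n"
    "- brand: the brand name exactly as written, even if it reads like a style (e.g., 'Fit Wear')"
)

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model choice
    system=[{"text": "You extract ecommerce attributes as JSON. Follow the glossary exactly.\n" + glossary}],
    messages=[{"role": "user", "content": [{"text": "Fit Wear bamboo knife block, holds 12 knives, 9.5 x 4 x 13 inches."}]}],
    inferenceConfig={"temperature": 0, "maxTokens": 300},
)
print(response["output"]["message"]["content"][0]["text"])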

Generation and validation with LLMs

The generated product listings should be complete and correct. To help achieve this, the solution implements a multistep workflow using LLMs for both generation and validation of attributes. This dual-LLM approach helps prevent hallucinations, which is critical when dealing with safety hazards or technical specifications. The team developed advanced self-reflection techniques to make sure the generation and validation processes complement each other effectively.
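A minimal sketch of this dual-LLM pattern follows: one call drafts attributes from source material, and a second, independent call checks the draft against the source and returns a pass/fail verdict. The helper function, model ID, and prompt wording are assumptions for illustration, not the production prompts or validation criteria.

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # illustrative model choice

def invoke(prompt: str) -> str:
    # Thin wrapper around the Bedrock Converse API
    resp = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0},
    )
    return resp["output"]["message"]["content"][0]["text"]

source = "Stainless steel kettle, 1.7 L, auto shut-off, boils in under 5 minutes."

# Generator LLM: draft structured attributes from the source material
draft = invoke(f"Extract capacity, material, and safety features as JSON from:\n{source}")

# Validator LLM: independently verify every drafted attribute against the source
verdict = invoke(
    "Answer PASS if every attribute in the draft is supported by the source text, "
    f"otherwise answer FAIL with a reason.\n\nSource:\n{source}\n\nDraft:\n{draft}"
)
print(draft, verdict, sep="\n")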

The following figure illustrates the generation process with validation both performed by LLMs.

Product Listing creation workflow

Figure 1. Product Listing creation workflow

Multi-layer quality assurance with human feedback

Human feedback is central to the solution’s quality assurance. The process includes Amazon.com experts for initial evaluation and selling partner input for acceptance or edits. This provides high-quality output and enables ongoing enhancement of AI models.

The quality assurance process includes automated testing methods combining ML-, algorithm-, or LLM-based evaluations. Failed listings undergo regeneration, and successful listings proceed to further testing. Using causal inference models, we identify underlying features affecting listing performance and opportunities for enrichment. Ultimately, listings that pass quality checks and receive selling partner acceptance are published, making sure customers receive accurate and comprehensive product information.

The following figure illustrates the workflow of going to production with testing, evaluation, and monitoring of product listing generation.

Product Listing testing and human in the loop workflow

Figure 2. Product Listing testing and human in the loop workflow

Application-level system optimization for accuracy and cost

Given the high standards for accuracy and completeness, the team adopted a comprehensive experimentation approach with an automated optimization system. This system explores various combinations of LLMs, prompts, playbooks, workflows, and AI tools to iterate toward better business metrics, including cost. Through continuous evaluation and automated testing, the product listing generator effectively balances performance, cost, and efficiency while staying adaptable to new AI developments. This approach means customers benefit from high-quality product information, and selling partners have access to cutting-edge tools for creating listings efficiently.
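As a rough illustration of this kind of automated experimentation, the sketch below scores a small grid of model and prompt-template combinations and picks the best tradeoff between quality and cost. The evaluate_listing function, model IDs, template names, and the scoring formula are all hypothetical placeholders.

from itertools import product

models = ["anthropic.claude-3-5-sonnet-20240620-v1:0", "amazon.nova-pro-v1:0"]  # illustrative
prompt_templates = ["concise_v1", "detailed_v2"]  # hypothetical template names

def evaluate_listing(model_id: str, template: str) -> dict:
    # Stub: in practice this would run ML-, algorithm-, and LLM-based evaluations
    # on generated listings and track inference cost per listing.
    return {"accuracy": 0.90, "cost_per_listing": 0.002}

results = {(m, t): evaluate_listing(m, t) for m, t in product(models, prompt_templates)}

# Hypothetical objective: reward accuracy, penalize cost
best = max(results, key=lambda k: results[k]["accuracy"] - 100 * results[k]["cost_per_listing"])
print("Best configuration:", best)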

Generative AI-powered prescription processing in Amazon Pharmacy

Building upon the human-AI hybrid workflows previously discussed in the seller listing example, Amazon Pharmacy demonstrates how these principles can be applied in a Health Insurance Portability and Accountability Act (HIPAA)-regulated industry. Having shared a conversational assistant for patient care specialists in the post Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker, we now focus on automated prescription processing, which you can read about in The life of a prescription at Amazon Pharmacy and the following research paper in Nature Magazine.

At Amazon Pharmacy, we developed an AI system built on Amazon Bedrock and SageMaker to help pharmacy technicians process medication directions more accurately and efficiently. This solution integrates human experts with LLMs in creation and validation roles to enhance precision in medication instructions for our patients.

Agentic workflow design for healthcare accuracy

The prescription processing system combines human expertise (data entry technicians and pharmacists) with AI support for direction suggestions and feedback. The workflow, shown in the following diagram, begins with a pharmacy knowledge-based preprocessor standardizing raw prescription text in Amazon DynamoDB, followed by fine-tuned small language models (SLMs) on SageMaker identifying critical components (dosage, frequency).

Data entry technician and pharmacist workflow with two GenAI modules

(a)

Data entry technician and pharmacist workflow with two GenAI modules

(b)

Flagging module workflow

(c)

Figure 3. (a) Data entry technician and pharmacist workflow with two GenAI modules, (b) Suggestion module workflow and (c) Flagging module workflow

The system seamlessly integrates experts such as data entry technicians and pharmacists, with generative AI complementing the overall workflow to improve agility and accuracy and better serve our patients. A direction assembly system with safety guardrails then generates instructions that data entry technicians use to create their typed directions through the suggestion module. The flagging module flags or corrects errors and enforces further safety measures as feedback provided to the data entry technician. The technician finalizes highly accurate, safe typed directions for pharmacists, who can either provide feedback or pass the directions to the downstream service.

One highlight of the solution is the use of task decomposition, which empowers engineers and scientists to break the overall process into a multitude of steps, with individual modules made of substeps. The team extensively used fine-tuned SLMs. In addition, the process employs traditional ML procedures such as named entity recognition (NER) and estimation of final confidence with regression models. Using SLMs and traditional ML in such contained, well-defined procedures significantly improved processing speed while maintaining rigorous safety standards, thanks to appropriate guardrails incorporated at specific steps.

The system comprises multiple well-defined substeps, with each subprocess operating as a specialized component working semi-autonomously yet collaboratively within the workflow toward the overall objective. This decomposed approach, with specific validations at each stage, proved more effective than end-to-end solutions while enabling the use of fine-tuned SLMs. The team used AWS Fargate to orchestrate the workflow given its current integration into existing backend systems.
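The sketch below illustrates the general idea of such task decomposition with stubbed-out modules: normalization, component extraction (which in production would call a fine-tuned SLM or NER model hosted on SageMaker), deterministic assembly, and a confidence estimate. The function names, example prescription text, and outputs are hypothetical and are not the Amazon Pharmacy implementation.

from dataclasses import dataclass

@dataclass
class DirectionComponents:
    dose: str
    frequency: str

def preprocess(raw_sig: str) -> str:
    # Step 1: pharmacy knowledge-based normalization of the raw prescription text
    return raw_sig.lower().strip()

def extract_components(sig: str) -> DirectionComponents:
    # Step 2: stub for a fine-tuned SLM / NER model identifying dosage and frequency
    return DirectionComponents(dose="1 tablet", frequency="twice daily")

def assemble_direction(c: DirectionComponents) -> str:
    # Step 3: deterministic assembly, with guardrails applied to the output
    return f"Take {c.dose} by mouth {c.frequency}."

def confidence(direction: str) -> float:
    # Step 4: stub for a traditional regression model estimating final confidence
    return 0.97

suggested = assemble_direction(extract_components(preprocess("TAKE 1 TAB PO BID")))
print(suggested, confidence(suggested))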

In their product development journey, the team turned to Amazon Bedrock, which provided high-performing LLMs with ease-of-use features tailored to generative AI applications. SageMaker enabled further LLM selections, deeper customizability, and traditional ML methods. To learn more about this technique, see How task decomposition and smaller LLMs can make AI more affordable and read about the Amazon Pharmacy business case study.

Building a reliable application with guardrails and HITL

To comply with HIPAA standards and protect patient privacy, we implemented strict data governance practices alongside a hybrid approach that combines fine-tuned LLMs using Amazon Bedrock APIs with Retrieval Augmented Generation (RAG) using Amazon OpenSearch Service. This combination enables efficient knowledge retrieval while maintaining high accuracy for specific subtasks.

Managing LLM hallucinations—which is critical in healthcare—required more than just fine-tuning on large datasets. Our solution implements domain-specific guardrails built on Amazon Bedrock Guardrails, complemented by human-in-the-loop (HITL) oversight to promote system reliability.

The Amazon Pharmacy team continues to enhance this system through real-time pharmacist feedback and expanded prescription format capabilities. This balanced approach of innovation, domain expertise, advanced AI services, and human oversight not only improves operational efficiency, but means that the AI system properly augments healthcare professionals in delivering optimal patient care.

Generative AI-powered customer review highlights

Whereas our previous example showcased how Amazon Pharmacy integrates LLMs into real-time workflows for prescription processing, this next use case demonstrates how similar techniques—SLMs, traditional ML, and thoughtful workflow design—can be applied to offline batch inferencing at massive scale.

Amazon has introduced AI-generated customer review highlights to process over 200 million annual product reviews and ratings. This feature distills shared customer opinions into concise paragraphs highlighting positive, neutral, and negative feedback about products and their features. Shoppers can quickly grasp the consensus, while transparency is maintained by providing access to related customer reviews and keeping the original reviews available.

The system enhances shopping decisions through an interface where customers can explore review highlights by selecting specific features (such as picture quality, remote functionality, or ease of installation for a Fire TV). Features are visually coded with green check marks for positive sentiment, orange minus signs for negative, and gray for neutral—which means shoppers can quickly identify product strengths and weaknesses based on verified purchase reviews. The following screenshot shows review highlights regarding noise level for a product.

An example product review highlights for a product.

Figure 4. An example product review highlights for a product.

A recipe for cost-effective use of LLMs for offline use cases

The team developed a cost-effective hybrid architecture combining traditional ML methods with specialized SLMs. This approach assigns sentiment analysis and keyword extraction to traditional ML while using optimized SLMs for complex text generation tasks, improving both accuracy and processing efficiency. The following diagram shows traditional ML and LLMs working together in the overall workflow.

Use of traditional ML and LLMs in a workflow.

Figure 5. Use of traditional ML and LLMs in a workflow.

The feature employs SageMaker batch transform for asynchronous processing, significantly reducing costs compared to real-time endpoints. To deliver a near zero-latency experience, the solution caches extracted insights alongside existing reviews, reducing wait times and enabling simultaneous access by multiple customers without additional computation. The system processes new reviews incrementally, updating insights without reprocessing the complete dataset. For optimal performance and cost-effectiveness, the feature uses Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances for batch transform jobs, providing up to 40% better price-performance compared to alternatives.
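A minimal sketch of launching such a batch transform job with the SageMaker Python SDK is shown below. The model name, S3 paths, and instance size are hypothetical; the actual review highlights pipeline, prompts, and models are not public.

from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="review-highlights-slm",                 # hypothetical SageMaker model
    instance_count=1,
    instance_type="ml.inf2.xlarge",                     # AWS Inferentia2 instance
    output_path="s3://example-bucket/review-highlights/output/",
    strategy="MultiRecord",
    assemble_with="Line",
)

# Process new reviews incrementally as JSON Lines, one review per line
transformer.transform(
    data="s3://example-bucket/review-highlights/input/new-reviews.jsonl",
    content_type="application/jsonlines",
    split_type="Line",
    wait=False,
)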

By following this comprehensive approach, the team effectively managed costs while handling the massive scale of reviews and products so that the solution remained both efficient and scalable.

Amazon Ads AI-powered creative image and video generation

Having explored mostly text-centric generative AI applications in the previous examples, we now turn to multimodal generative AI with Amazon Ads creative content generation for sponsored ads. The solution has capabilities for both image and video generation, the details of which we share in this section. Both capabilities use Amazon Nova creative content generation models at their core.

Working backward from customer needs, a March 2023 Amazon survey revealed that nearly 75% of advertisers struggling with campaign success cited creative content generation as their primary challenge. Many advertisers—particularly those without in-house capabilities or agency support—face significant barriers due to the expertise and costs of producing quality visuals. The Amazon Ads solution democratizes visual content creation, making it accessible and efficient for advertisers of different sizes. The impact has been substantial: advertisers using AI-generated images in Sponsored Brands campaigns saw click-through rates (CTR) of nearly 8% and submitted 88% more campaigns than non-users.

Last year, the AWS Machine Learning Blog published a post detailing the image generation solution. Since then, Amazon has adopted Amazon Nova Canvas as its foundation for creative image generation, creating professional-grade images from text or image prompts with features for text-based editing and controls for color scheme and layout adjustments.
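The following minimal sketch shows what an Amazon Nova Canvas invocation for creative image generation can look like through the Amazon Bedrock runtime API. The prompt, image size, and output handling are illustrative; verify the exact request fields supported by the model against the current Amazon Bedrock documentation.

import base64
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

request = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {
        "text": "Product photo of a stainless steel water bottle on a mossy rock, soft morning light"
    },
    "imageGenerationConfig": {"numberOfImages": 1, "width": 1024, "height": 1024},
}

response = bedrock.invoke_model(
    modelId="amazon.nova-canvas-v1:0",
    body=json.dumps(request),
)
payload = json.loads(response["body"].read())

# The model returns base64-encoded images (field name per the documented response shape)
with open("ad-creative.png", "wb") as f:
    f.write(base64.b64decode(payload["images"][0]))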

In September 2024, the Amazon Ads team added the ability to create short-form video ads from product images. This feature uses foundation models available on Amazon Bedrock to give customers control over visual style, pacing, camera motion, rotation, and zooming through natural language, using an agentic workflow to first describe video storyboards and then generate the content for the story. The following screenshot shows an example of creative image generation for product backgrounds on Amazon Ads.

Ads image generation example for a product.

Figure 6. Ads image generation example for a product.

As discussed in the original post, responsible AI is at the center of the solution, and Amazon Nova creative models come with built-in controls to support safety and responsible AI use, including watermarking and content moderation.

The solution uses AWS Step Functions with AWS Lambda functions for serverless orchestration of both the image and video generation processes. Generated content is stored in Amazon Simple Storage Service (Amazon S3) with metadata in DynamoDB, and Amazon API Gateway provides customer access to the generation capabilities. The solution now employs Amazon Bedrock Guardrails in addition to maintaining Amazon Rekognition and Amazon Comprehend integration at various steps for additional safety checks. The following screenshot shows creative AI-generated videos on the Amazon Ads campaign builder.

Ads video generation for a product

Figure 7. Ads video generation for a product

Creating high-quality ad creatives at scale presented complex challenges. The generative AI model needed to produce appealing, brand-appropriate images across diverse product categories and advertising contexts while remaining accessible to advertisers regardless of technical expertise. Quality assurance and improvement are fundamental to both image and video generation capabilities. The system undergoes continual enhancement through extensive HITL processes enabled by Amazon SageMaker Ground Truth. This implementation delivers a powerful tool that transforms advertisers’ creative process, making high-quality visual content creation more accessible across diverse product categories and contexts.

This is just the beginning of Amazon Ads using generative AI to empower advertising customers to create the content they need to drive their advertising objectives. The solution demonstrates how reducing creative barriers directly increases advertising activity while maintaining high standards for responsible AI use.

Key technical learnings and discussions

Non-conversational applications benefit from higher latency tolerance, enabling batch processing and caching, but require robust validation mechanisms and stronger guardrails due to their autonomous nature. These insights apply to both non-conversational and conversational AI implementations:

  • Task decomposition and agentic workflows – Breaking complex problems into smaller components has proven valuable across implementations. This deliberate decomposition by domain experts enables specialized models for specific subtasks, as demonstrated in Amazon Pharmacy prescription processing, where fine-tuned SLMs handle discrete tasks such as dosage identification. This strategy allows for specialized agents with clear validation steps, improving reliability and simplifying maintenance. The Amazon seller listing use case exemplifies this through its multistep workflow with separate generation and validation processes. Additionally, the review highlights use case showcased cost-effective and controlled use of LLMs by assigning preprocessing, and other parts of the task that could otherwise fall to an LLM, to traditional ML.
  • Hybrid architectures and model selection – Combining traditional ML with LLMs provides better control and cost-effectiveness than pure LLM approaches. Traditional ML excels at well-defined tasks, as shown in the review highlights system for sentiment analysis and information extraction. Amazon teams have strategically deployed both large and small language models based on requirements, integrating RAG with fine-tuning for effective domain-specific applications like the Amazon Pharmacy implementation.
  • Cost optimization strategies – Amazon teams achieved efficiency through batch processing, caching mechanisms for high-volume operations, specialized instance types such as AWS Inferentia and AWS Trainium, and optimized model selection. Review highlights demonstrates how incremental processing reduces computational needs, and Amazon Ads used Amazon Nova foundation models (FMs) to cost-effectively create creative content.
  • Quality assurance and control mechanisms – Quality control relies on domain-specific guardrails through Amazon Bedrock Guardrails and multilayered validation combining automated testing with human evaluation. Dual-LLM approaches for generation and validation help prevent hallucinations in Amazon seller listings, and self-reflection techniques improve accuracy. Amazon Nova creative FMs provide inherent responsible AI controls, complemented by continual A/B testing and performance measurement.
  • HITL implementation – The HITL approach spans multiple layers, from expert evaluation by pharmacists to end-user feedback from selling partners. Amazon teams established structured improvement workflows, balancing automation and human oversight based on specific domain requirements and risk profiles.
  • Responsible AI and compliance – Responsible AI practices include content ingestion guardrails for regulated environments and adherence to regulations such as HIPAA. Amazon teams integrated content moderation for user-facing applications, maintained transparency in review highlights by providing access to source information, and implemented data governance with monitoring to promote quality and compliance.

These patterns enable scalable, reliable, and cost-effective generative AI solutions while maintaining quality and responsibility standards. The implementations demonstrate that effective solutions require not just sophisticated models, but careful attention to architecture, operations, and governance, supported by AWS services and established practices.

Next steps

The examples from Amazon.com shared in this post illustrate how generative AI can create value beyond traditional conversational assistants. We invite you to follow these examples or create your own solution to discover how generative AI can reinvent your business or even your industry. You can visit the AWS generative AI use cases page to start the ideation process.

These examples showed that effective generative AI implementations often benefit from combining different types of models and workflows. To learn what FMs are supported by AWS services, refer to Supported foundation models in Amazon Bedrock and Amazon SageMaker JumpStart Foundation Models. We also suggest you explore Amazon Bedrock Flows, which can ease the path towards building workflows. Additionally, we remind you that Trainium and Inferentia accelerators provide important cost savings in these applications.

Agentic workflows, as illustrated in our examples, have proven particularly valuable. We recommend exploring Amazon Bedrock Agents for quickly building agentic workflows.

Successful generative AI implementation extends beyond model selection—it represents a comprehensive software development process from experimentation to application monitoring. To begin building your foundation across these essential services, we invite you to explore Amazon QuickStart.

Conclusion

These examples demonstrate how generative AI extends beyond conversational assistants to drive innovation and efficiency across industries. Success comes from combining AWS services with strong engineering practices and business understanding. Ultimately, effective generative AI solutions focus on solving real business problems while maintaining high standards of quality and responsibility.

To learn more about how Amazon uses AI, refer to Artificial Intelligence in Amazon News.


About the Authors

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect and lead GenAI Scientist Architect for Amazon.com on AWS, based in Boston, MA. He helps strategic customers adopt AWS technologies and specifically Generative AI solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. He maintains his connection to academia as a research affiliate at MIT. Outside of work, Burak is an enthusiast of yoga.

Emilio Maldonado is a Senior leader at Amazon responsible for Product Knowledge, oriented at building systems to scale the e-commerce Catalog metadata, organize all product attributes, and leverage GenAI to infer precise information that guides Sellers and Shoppers to interact with products. He’s passionate about developing dynamic teams and forming partnerships. He holds a Bachelor of Science in C.S. from Tecnologico de Monterrey (ITESM) and an MBA from Wharton, University of Pennsylvania.

Wenchao Tong is a Sr. Principal Technologist at Amazon Ads in Palo Alto, CA, where he spearheads the development of GenAI applications for creative building and performance optimization. His work empowers customers to enhance product and brand awareness and drive sales by leveraging innovative AI technologies to improve creative performance and quality. Wenchao holds a Master’s degree in Computer Science from Tongji University. Outside of work, he enjoys hiking, board games, and spending time with his family.

Alexandre Alves is a Sr. Principal Engineer at Amazon Health Services, specializing in ML, optimization, and distributed systems. He helps deliver wellness-forward health experiences.

Puneet Sahni is a Sr. Principal Engineer at Amazon. He works on improving the data quality of all products available in the Amazon catalog. He is passionate about leveraging product data to improve our customer experiences. He has a Master’s degree in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay. Outside of work he enjoys spending time with his young kids and traveling.

Vaughn Schermerhorn is a Director at Amazon, where he leads Shopping Discovery and Evaluation—spanning Customer Reviews, content moderation, and site navigation across Amazon’s global marketplaces. He manages a multidisciplinary organization of applied scientists, engineers, and product leaders focused on surfacing trustworthy customer insights through scalable ML models, multimodal information retrieval, and real-time system architecture. His team develops and operates large-scale distributed systems that power billions of shopping decisions daily. Vaughn holds degrees from Georgetown University and San Diego State University and has lived and worked in the U.S., Germany, and Argentina. Outside of work, he enjoys reading, travel, and time with his family.

Tarik Arici is a Principal Applied Scientist at Amazon Selection and Catalog Systems (ASCS), working on Catalog Quality Enhancement using GenAI workflows. He has a PhD in Electrical and Computer Engineering from Georgia Tech. Outside of work, Tarik enjoys swimming and biking.

Read More

Architect a mature generative AI foundation on AWS

Architect a mature generative AI foundation on AWS

Generative AI applications seem simple—invoke a foundation model (FM) with the right context to generate a response. In reality, it’s a much more complex system involving workflows that invoke FMs, tools, and APIs and that use domain-specific data to ground responses with patterns such as Retrieval Augmented Generation (RAG) and workflows involving agents. Safety controls need to be applied to input and output to prevent harmful content, and foundational elements have to be established such as monitoring, automation, and continuous integration and delivery (CI/CD), which are needed to operationalize these systems in production.

Many organizations have siloed generative AI initiatives, with development managed independently by various departments and lines of businesses (LOBs). This often results in fragmented efforts, redundant processes, and the emergence of inconsistent governance frameworks and policies. Inefficiencies in resource allocation and utilization drive up costs.

To address these challenges, organizations are increasingly adopting a unified approach to build applications where foundational building blocks are offered as services to LOBs and teams for developing generative AI applications. This approach facilitates centralized governance and operations. Some organizations use the term “generative AI platform” to describe this approach. This can be adapted to different operating models of an organization: centralized, decentralized, and federated. A generative AI foundation offers core services, reusable components, and blueprints, while applying standardized security and governance policies.

This approach gives organizations many key benefits, such as streamlined development, the ability to scale generative AI development and operations across the organization, mitigated risk because central management simplifies implementation of governance frameworks, optimized costs through reuse, and accelerated innovation because teams can quickly build and ship use cases.

In this post, we give an overview of a well-established generative AI foundation, dive into its components, and present an end-to-end perspective. We look at different operating models and explore how such a foundation can operate within those boundaries. Lastly, we present a maturity model that helps enterprises assess their evolution path.

Overview

Laying out a strong generative AI foundation includes offering a comprehensive set of components to support the end-to-end generative AI application lifecycle. The following diagram illustrates these components.

Mature Generative AI Platform

In this section, we discuss the key components in more detail.

Hub

At the core of the foundation are multiple hubs that include:

  • Model hub – Provides access to enterprise FMs. As a system matures, a broad range of off-the-shelf or customized models can be supported. Most organizations conduct thorough security and legal reviews before models are approved for use. The model hub acts as a central place to access approved models.
  • Tool/Agent hub – Enables discovery of and connectivity to a tool catalog and agents, for example via protocols such as the Model Context Protocol (MCP) and Agent2Agent (A2A).

Gateway

A model gateway offers secure access to the model hub through standardized APIs. The gateway is built as a multi-tenant component to provide isolation across the teams and business units that are onboarded. Key features of a gateway include:

  • Access and authorization – The gateway facilitates authentication, authorization, and secure communication between users and the system. It helps verify that only authorized users can use specific models, and can also enforce fine-grained access control.
  • Unified API – The gateway provides unified APIs to models and features such as guardrails and evaluation. It can also support automated prompt translation to different prompt templates across different models.
  • Rate limiting and throttling – It handles API requests efficiently by controlling the number of requests allowed in a given time period, preventing overload and managing traffic spikes.
  • Cost attribution – The gateway monitors usage across the organization and allocates costs to the teams. Because these models can be resource-intensive, tracking model usage helps allocate costs properly, optimize resources, and avoid overspending.
  • Scaling and load balancing – The gateway can handle load balancing across different servers, model instances, or AWS Regions so that applications remain responsive.
  • Guardrails – The gateway applies content filters to requests and responses through guardrails and helps adhere to organizational security and compliance standards.
  • Caching – The cache layer stores prompts and responses that can help improve performance and reduce costs.

The AWS Solutions Library offers solution guidance to set up a multi-provider generative AI gateway. The solution uses an open source LiteLLM proxy wrapped in a container that can be deployed on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). This offers organizations a building block to develop an enterprise-wide model hub and gateway. The generative AI foundation can start with the gateway and offer additional features as it matures.
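Once such a gateway is in place, consuming teams typically call it through the OpenAI-compatible endpoint that the LiteLLM proxy exposes. The sketch below assumes a hypothetical internal gateway URL and a per-team virtual API key used for authentication and cost attribution.

import requests

GATEWAY_URL = "https://genai-gateway.example.internal"   # hypothetical internal endpoint
TEAM_API_KEY = "sk-team-alpha-xxxx"                      # hypothetical per-team virtual key

response = requests.post(
    f"{GATEWAY_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {TEAM_API_KEY}"},
    json={
        "model": "anthropic.claude-3-5-sonnet-20240620-v1:0",
        "messages": [{"role": "user", "content": "Summarize our onboarding guide in two sentences."}],
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])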

Gateway patterns for the tool/agent hub are still evolving. The model gateway can serve as a universal gateway to all the hubs, or individual hubs can have their own purpose-built gateways.

Orchestration

Orchestration encapsulates generative AI workflows, which are usually a multi-step process. The steps could involve invocation of models, integrating data sources, using tools, or calling APIs. Workflows can be deterministic, where they are created as predefined templates. An example of a deterministic flow is a RAG pattern. In this pattern, a search engine is used to retrieve relevant sources and augment the data into the prompt context, before the model attempts to generate the response for the user prompt. This aims to reduce hallucination and encourage the generation of responses grounded in verified content.

Alternatively, complex workflows can be designed using agents where a large language model (LLM) decides the flow by planning and reasoning. During reasoning, the agent can decide when to continue thinking, call external tools (such as APIs or search engines), or submit its final response. Multi-agent orchestration is used to tackle even more complex problem domains by defining multiple specialized subagents, which can interact with each other to decompose and complete a task requiring different knowledge or skills. A generative AI foundation can provide primitives such as models, vector databases, and guardrails as a service and higher-level services for defining AI workflows, agents and multi-agents, tools, and also a catalog to encourage reuse.
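As a concrete example of a deterministic RAG workflow built on these primitives, the sketch below retrieves context (stubbed here, where a real implementation would query a vector database offered by the foundation), augments the prompt, and invokes a model through Amazon Bedrock. The retriever, model ID, and prompt wording are assumptions.

import boto3

bedrock = boto3.client("bedrock-runtime")

def retrieve(query: str, top_k: int = 3) -> list:
    # Stub retriever: a real implementation would search a vector database
    # (for example, an index provisioned by the generative AI foundation).
    return ["(stub) relevant passage 1", "(stub) relevant passage 2"]

def answer_with_rag(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer_with_rag("What is our leave policy for new hires?"))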

Model customization

A key foundational capability that can be offered is model customization, including the following techniques:

  • Continued pre-training – Domain-adaptive pre-training, where existing models are further trained on domain-specific data. This approach can offer a balance between customization depth and resource requirements, necessitating fewer resources than training from scratch.
  • Fine-tuning – Model adaptation techniques such as instruction fine-tuning and supervised fine-tuning to learn task-specific capabilities. Though less intensive than pre-training, this approach still requires significant computational resources.
  • Alignment – Training models with user-generated data using techniques such as Reinforcement Learning with Human Feedback (RLHF) and Direct Preference Optimization (DPO).

For the preceding techniques, the foundation should provide scalable infrastructure for data storage and training, a mechanism to orchestrate tuning and training pipelines, a model registry to centrally register and govern the model, and infrastructure to host the model.

Data management

Organizations typically have multiple data sources, and data from these sources is mostly aggregated in data lakes and data warehouses. Common datasets can be made available as a foundational offering to different teams. The following are additional foundational components that can be offered:

  • Integration with enterprise data sources and external sources to bring in the data needed for patterns such as RAG or model customization
  • Fully managed or pre-built templates and blueprints for RAG that include a choice of vector databases, chunking data, converting data into embeddings, and indexing them in vector databases
  • Data processing pipelines for model customization, including tools to create labeled and synthetic datasets
  • Tools to catalog data, making it quick to search, discover, access, and govern data

GenAIOps

Generative AI operations (GenAIOps) encompasses overarching practices of managing and automating operations of generative AI systems. The following diagram illustrates its components.

Generative AI Ops

Fundamentally, GenAIOps falls into two broad categories:

  • Operationalizing applications that consume FMs – Although operationalizing RAG or agentic applications shares core principles with DevOps, it requires additional, AI-specific considerations and practices. RAGOps addresses operational practices for managing the lifecycle of RAG systems, which combine generative models with information retrieval mechanisms. Considerations here are choice of vector database, optimizing indexing pipelines, and retrieval strategies. AgentOps helps facilitate efficient operation of autonomous agentic systems. The key concerns here are tool management, agent coordination using state machines, and short-term and long-term memory management.
  • Operationalizing FM training and tuning – ModelOps is a category under GenAIOps, which is focused on governance and lifecycle management of models, including model selection, continuous tuning and training of models, experiments tracking, central model registry, prompt management and evaluation, model deployment, and model governance. FMOps, which is operationalizing FMs, and LLMOps, which is specifically operationalizing LLMs, fall under this category.

In addition, operationalization involves implementing CI/CD processes for automating deployments, integrating evaluation and prompt management systems, and collecting logs, traces, and metrics to optimize operations.

Observability

Observability for generative AI needs to account for the probabilistic nature of these systems—models might hallucinate, responses can be subjective, and troubleshooting is harder. Like other software systems, logs, metrics, and traces should be collected and centrally aggregated. There should be tools to generate insights out of this data that can be used to optimize the applications even further. In addition to component-level monitoring, as generative AI applications mature, deeper observability should be implemented, such as instrumenting traces, collecting real-world feedback, and looping it back to improve models and systems. Evaluation should be offered as a core foundational component, and this includes automated and human evaluation and LLM-as-a-judge pipelines along with storage of ground truth data.

Responsible AI

To balance the benefits of generative AI with the challenges that arise from it, it’s important to incorporate tools, techniques, and mechanisms that align to a broad set of responsible AI dimensions. At AWS, these Responsible AI dimensions include privacy and security, safety, transparency, explainability, veracity and robustness, fairness, controllability, and governance. Each organization would have its own governing set of responsible AI dimensions that can be centrally incorporated as best practices through the generative AI foundation.

Security and privacy

Communication should be over TLS, and private network access should be supported. User access should be secure, and the system should support fine-grained access control. Rate limiting and throttling should be in place to help prevent abuse. For data security, data should be encrypted at rest and in transit, and tenant data isolation patterns should be implemented. Embeddings stored in vector stores should be encrypted. For model security, custom model weights should be encrypted and isolated for different tenants. Guardrails should be applied to input and output to filter topics and harmful content. Telemetry should be collected for actions that users take on the central system. Data quality is the responsibility of the consuming applications or data producers. The consuming applications should integrate observability into their applications.

Governance

The two key areas of governance are model and data:

  • Model governance – Monitor model for performance, robustness, and fairness. Model versions should be managed centrally in a model registry. Appropriate permissions and policies should be in place for model deployments. Access controls to models should be established.
  • Data governance – Apply fine-grained access control to data managed by the system, including training data, vector stores, evaluation data, prompt templates, workflow, and agent definitions. Establish data privacy policies such as managing sensitive data (for example, personally identifiable information (PII) redaction), for the data managed by the system, protecting prompts and data and not using them to improve models.

Tools landscape

A variety of AWS services, AWS partner solutions, and third-party tools and frameworks are available to architect a comprehensive generative AI foundation. The following figure might not cover the entire gamut of tools, but we have created a landscape based on our experience with these tools.

Generative AI platform heatmap

Operational boundaries

One of the challenges to solve for is who owns the foundational components and how do they operate within the organization’s operating model. Let’s look at three common operating models:

  • Centralized – Operations are centralized to one team. Some organizations refer to this team as the platform team or platform engineering team. In this model, foundational components are managed by a central team and offered to LOBs and enterprise teams.

Centralized operating model

  • Decentralized – LOBs and teams build their respective systems and operate independently. The central team takes on a role of a Center of Excellence (COE) that defines best practices, standards, and governance frameworks. Logs and metrics can be aggregated in a central place.

Decentralized operating model

  • Federated – A more flexible model is a hybrid of the two. A central team manages the foundation that offers building blocks for model access, evaluation, guardrails, central logs, and metrics aggregation to teams. LOBs and teams use the foundational components but also build and manage their own components as necessary.

Federated operating model

Multi-tenant architecture

Irrespective of the operating model, it’s important to define how tenants are isolated and managed within the system. The multi-tenant pattern depends on a number of factors:

  • Tenant and data isolation – Data ownership is critical for building generative AI systems. A system should establish clear policies on data ownership and access rights, making sure data is accessible only to authorized users. Tenant data should be securely isolated from others to maintain privacy and confidentiality. This can be through physical isolation of data, for example, setting up isolated vector databases for each tenant for a RAG application, or by logical separation, for example, using different indexes within a shared database. Role-based access control should be set up to make sure users within a tenant can access resources and data specific to their organization.
  • Scalability and performance – Noisy neighbors can be a real problem, where one tenant is extremely chatty compared to others. Proper resource allocation according to tenant needs should be established. Containerization of workloads can be a good strategy to isolate and scale tenants individually. This also ties into the deployment strategy described later in this section, by means of which a chatty tenant can be completely isolated from others.
  • Deployment strategy – If strict isolation is required for use cases, then each tenant can have dedicated instances of compute, storage, and model access. This means gateway, data pipelines, data storage, training infrastructure, and other components are deployed on an isolated infrastructure per tenant. For tenants who don’t need strict isolation, shared infrastructure can be used and partitioning of resources can be achieved by a tenant identifier. A hybrid model can also be used, where the core foundation is deployed on shared infrastructure and specific components are isolated by tenant. The following diagram illustrates an example architecture.
  • Observability – A mature generative AI system should provide detailed visibility into operations at both the central and tenant-specific level. The foundation offers a central place for collecting logs, metrics, and traces, so you can set up reporting based on tenant needs.
  • Cost management – A metered billing system should be set up based on usage. This requires establishing cost tracking based on resource usage of different components plus model inference costs. Model inference costs vary by model and by provider, but there should be a common mechanism for allocating costs per tenant. System administrators should be able to track and monitor usage across teams.

Multi tenant generative AI Platform federated architecture

Let’s break this down by taking a RAG application as an example. In the hybrid model, the tenant deployment contains instances of a vector database that stores the embeddings, which supports strict data isolation requirements. The deployment will additionally include the application layer that contains the frontend code and orchestration logic to take the user query, augment the prompt with context from the vector database, and invoke FMs on the central system. The foundational components that offer services such as evaluation and guardrails for applications to consume to build a production-ready application are in a separate shared deployment. Logs, metrics, and traces from the applications can be fed into a central aggregation place.
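To make the isolation concrete, the following sketch shows one way to implement logical separation for the RAG example: each tenant writes embeddings to its own index, and fine-grained access control policies (not shown) restrict which indexes a tenant’s role can reach. The host, index naming convention, and document shape are illustrative assumptions.

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "vectors.example.internal", "port": 443}],  # hypothetical endpoint
    use_ssl=True,
)

def tenant_index(tenant_id: str) -> str:
    # Logical separation: one index per tenant within a shared cluster
    return f"rag-embeddings-{tenant_id}"

def index_chunk(tenant_id: str, doc_id: str, text: str, embedding: list):
    client.index(
        index=tenant_index(tenant_id),
        id=doc_id,
        body={"text": text, "embedding": embedding},
    )

index_chunk("tenant-a", "doc-001", "Refund policy ...", [0.01, 0.42, 0.13])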

Generative AI foundation maturity model

We have defined a maturity model to track the evolution of the generative AI foundation across different stages of adoption. The maturity model can be used to assess where you are in the development journey and plan for expansion. We define the curve along four stages of adoption: emerging, advanced, mature, and established.

Generative AI platform maturity stages

The details for each stage are as follows:

  • Emerging – The foundation offers a playground for model exploration and assessment. Teams are able to develop proofs of concept using enterprise approved models.
  • Advanced – The foundation facilitates first production use cases. Multiple environments exist for development, testing, and production deployment. Monitoring and alerts are established.
  • Mature – Multiple teams are using the foundation and are able to develop complex use cases. CI/CD and infrastructure as code (IaC) practices accelerate the rollout of reusable components. Deeper observability such as tracing is established.
  • Established – A best-in-class foundation, fully automated and operating at scale with governance and responsible AI practices, is established. It enables diverse use cases, and most enterprise teams are onboarded onto it.

The evolution might not be exactly linear along the curve in terms of specific capabilities, but certain key performance indicators can be used to evaluate the adoption and growth.

Generative AI platform maturity KPIs

Conclusion

Establishing a comprehensive generative AI foundation can be a critical step in harnessing the power of AI at scale. Enterprise AI development brings unique challenges around agility, reliability, governance, scale, and collaboration. Therefore, a well-constructed foundation, with the right components and adapted to the operating model of the business, aids in building and scaling generative AI applications across the enterprise.

The rapidly evolving generative AI landscape means there might be cutting-edge tools we haven’t covered under the tools landscape. If you’re using or aware of state-of-the art solutions that align with the foundational components, we encourage you to share them in the comments section.

Our team is dedicated to helping customers solve challenges in generative AI development at scale—whether it’s architecting a generative AI foundation, setting up operational best practices, or implementing responsible AI practices. Leave us a comment and we will be glad to collaborate.


About the authors

Chaitra Mathur is a GenAI Specialist Solutions Architect at AWS. She works with customers across industries in building scalable generative AI platforms and operationalizing them. Throughout her career, she has shared her expertise at numerous conferences and has authored several blogs in the Machine Learning and Generative AI domains.

Dr. Alessandro Cerè is a GenAI Evaluation Specialist and Solutions Architect at AWS. He assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations. Bringing a unique perspective to the field of AI, Alessandro has a background in quantum physics and research experience in quantum communications and quantum memories. In his spare time, he pursues his passion for landscape and underwater photography.

Aamna Najmi is a GenAI and Data Specialist at AWS. She assists customers across industries and regions in operationalizing and governing their generative AI systems at scale, ensuring they meet the highest standards of performance, safety, and ethical considerations, bringing a unique perspective of modern data strategies to complement the field of AI. In her spare time, she pursues her passion of experimenting with food and discovering new places.

Dr. Andrew Kane is the WW Tech Leader for Security and Compliance for AWS Generative AI Services, leading the delivery of under-the-hood technical assets for customers around security, as well as working with CISOs around the adoption of generative AI services within their organizations. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors. He was the legal licensee in his ancient (AD 1468) English countryside village pub until early 2020.

Bharathi Srinivasan is a Generative AI Data Scientist at the AWS Worldwide Specialist Organization. She works on developing solutions for Responsible AI, focusing on algorithmic fairness, veracity of large language models, and explainability. Bharathi guides internal teams and AWS customers on their responsible AI journey. She has presented her work at various learning conferences.

Denis V. Batalov is a 17-year Amazon veteran with a PhD in Machine Learning. Denis worked on such exciting projects as Search Inside the Book, Amazon Mobile apps, and Kindle Direct Publishing. Since 2013 he has helped AWS customers adopt AI/ML technology as a Solutions Architect. Currently, Denis is a Worldwide Tech Leader for AI/ML responsible for the functioning of AWS ML Specialist Solutions Architects globally. Denis is a frequent public speaker; you can follow him on Twitter @dbatalov.

Nick McCarthy is a Generative AI Specialist at AWS. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time traveling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.

Alex Thewsey is a Generative AI Specialist Solutions Architect at AWS, based in Singapore. Alex helps customers across Southeast Asia to design and implement solutions with ML and Generative AI. He also enjoys karting, working with open source projects, and trying to keep up with new ML research.

Willie Lee is a Senior Tech PM for the AWS worldwide specialists team focusing on GenAI. He is passionate about machine learning and the many ways it can impact our lives, especially in the area of language comprehension.

Read More

Using Amazon OpenSearch ML connector APIs

Using Amazon OpenSearch ML connector APIs

When ingesting data into Amazon OpenSearch, customers often need to augment data before putting it into their indexes. For instance, you might be ingesting log files with an IP address and want to get a geographic location for the IP address, or you might be ingesting customer comments and want to identify the language they are in. Traditionally, this requires an external process that complicates data ingest pipelines and can cause a pipeline to fail. OpenSearch offers a wide range of third-party machine learning (ML) connectors to support this augmentation.

This post highlights two of these third-party ML connectors. The first connector we demonstrate is the Amazon Comprehend connector. In this post, we show you how to use this connector to invoke the LangDetect API to detect the languages of ingested documents.

The second connector we demonstrate is the Amazon Bedrock connector to invoke the Amazon Titan Text Embeddings v2 model so that you can create embeddings from ingested documents and perform semantic search.

Solution overview

We use Amazon OpenSearch with Amazon Comprehend to demonstrate the language detection feature. To help you replicate this setup, we’ve provided the necessary source code, an Amazon SageMaker notebook, and an AWS CloudFormation template. You can find these resources in the sample-opensearch-ml-rest-api GitHub repo.

End-to-end document processing workflow using OpenSearch Service integrating with SageMaker notebooks and AWS AI services

The reference architecture in the preceding figure shows the components used in this solution. A SageMaker notebook provides a convenient way to execute the code in the GitHub repository mentioned above.

Prerequisites

To run the full demo using the sample-opensearch-ml-rest-api, make sure you have an AWS account with access to:

Part 1: The Amazon Comprehend ML connector

Set up OpenSearch to access Amazon Comprehend

Before you can use Amazon Comprehend, you need to make sure that OpenSearch can call it. You do this by supplying OpenSearch with an IAM role that has access to invoke the DetectDominantLanguage API. This requires the OpenSearch cluster to have fine-grained access control enabled. The CloudFormation template creates a role for this called <Your Region>-<Your Account Id>-SageMaker-OpenSearch-demo-role. Use the following steps to attach this role to the OpenSearch cluster.

  1. Open the OpenSearch Dashboards console—you can find the URL in the output of the CloudFormation template—and sign in using the username and password you provided.
  2. Choose Security in the left-hand menu (if you don’t see the menu, choose the three horizontal lines icon at the top left of the dashboard).
  3. From the security menu, select Roles to manage the OpenSearch roles.
  4. In the search box, enter ml_full_access role.
  5. Select the Mapped users link to map the IAM role to this OpenSearch role.
  6. On the Mapped users screen, choose Manage mapping to edit the current mappings.
  7. Add the IAM role mentioned previously to map it to the ml_full_access role; this allows OpenSearch to access the needed AWS resources from the ml-commons plugin. Enter your IAM role Amazon Resource Name (ARN) (arn:aws:iam::<your account id>:role/<your region>-<your account id>-SageMaker-OpenSearch-demo-role) in the backend roles field and choose Map.

Set up the OpenSearch ML connector to Amazon Comprehend

In this step, you set up the ML connector to connect Amazon Comprehend to OpenSearch.

  1. Get an authorization token to use when making the call to OpenSearch from the SageMaker notebook. The token uses an IAM role attached to the notebook by the CloudFormation template that has permissions to call OpenSearch. That same role is mapped to the OpenSearch admin role in the same way you just mapped the role to access Amazon Comprehend. Use the following code to set this up:
import boto3
from requests_aws4auth import AWS4Auth

# The notebook's execution role supplies the credentials used to sign OpenSearch requests
region = 'us-east-1'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key,
                   credentials.secret_key,
                   region,
                   'es',
                   session_token=credentials.token)
  2. Create the connector. It needs a few pieces of information:
    1. It needs a protocol. For this example, use aws_sigv4, which allows OpenSearch to use an IAM role to call Amazon Comprehend.
    2. Provide the ARN for this role, which is the same role you used to set up permissions for the ml_full_access role.
    3. Provide comprehend as the service_name, and DetectDominantLanguage as the api_name.
    4. Provide the URL to Amazon Comprehend and set up how to call the API and what data to pass to it.

The final call looks like:

comprehend = boto3.client('comprehend', region_name='us-east-1')
path = '/_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "Comprehend lang identification",
  "description": "comprehend model",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": sageMakerOpenSearchRoleArn
  },
  "parameters": {
    "region": "us-east-1",
    "service_name": "comprehend",
    "api_version": "20171127",
    "api_name": "DetectDominantLanguage",
    "api": "Comprehend_${parameters.api_version}.${parameters.api_name}",
    "response_filter": "$"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.service_name}.${parameters.region}.amazonaws.com",
      "headers": {
        "content-type": "application/x-amz-json-1.1",
        "X-Amz-Target": "${parameters.api}"
      },
      "request_body": "{\"Text\": \"${parameters.Text}\"}"
    }
  ]
}

comprehend_connector_response = requests.post(url, auth=awsauth, json=payload)
comprehend_connector = comprehend_connector_response.json()["connector_id"]

Register the Amazon Comprehend API connector

The next step is to register the Amazon Comprehend API connector with OpenSearch using the Register Model API from OpenSearch.

  • Use the comprehend_connector that you saved from the last step.
path = '/_plugins/_ml/models/_register'
url = host + path

payload = {
    "name": "comprehend lang id API",
    "function_name": "remote",
    "description": "API to detect the language of text",
    "connector_id": comprehend_connector
}
headers = {"Content-Type": "application/json"}

response = requests.post(url, auth=awsauth, json=payload, headers=headers)
comprehend_model_id = response.json()['model_id']

As of OpenSearch 2.13, the model is deployed automatically the first time it’s invoked. Prior to 2.13, you had to deploy the model manually within OpenSearch.
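
If you’re running an older version, the deploy step is a single REST request. The following is a minimal sketch using the same host and awsauth values as the rest of the notebook (releases before 2.7 named this endpoint _load instead of _deploy):

path = '/_plugins/_ml/models/' + comprehend_model_id + '/_deploy'
url = host + path

response = requests.post(url, auth=awsauth, headers={"Content-Type": "application/json"})
print(response.json())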

Test the Amazon Comprehend API in OpenSearch

With the connector in place, you need to test the API to make sure it was set up and configured correctly.

  1. Make the following call to OpenSearch. The sample text is a Chinese sentence meaning "Do you know where the bathroom is?"
path = '/_plugins/_ml/models/'+ comprehend_model_id + '/_predict'
url = host + path

headers = {"Content-Type": "application/json"}
payload = {
    "parameters": {
        "Text": "你知道厕所在哪里吗"
    }
}

response = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(response.json())
  2. You should get the following result from the call, showing the language code as zh (Chinese) with a score of 1.0:
{
   "inference_results":[
      {
         "output":[
            {
               "name":"response",
               "dataAsMap":{
                  "response":{
                     "Languages":[
                        {
                           "LanguageCode":"zh",
                           "Score":1.0
                        }
                     ]
                  }
               }
            }
         ],
         "status_code":200
      }
   ]
}

Create an ingest pipeline that uses the Amazon Comprehend API to annotate the language

The next step is to create a pipeline in OpenSearch that calls the Amazon Comprehend API and adds the results of the call to the document being indexed. To do this, you provide both an input_map and an output_map. You use these to tell OpenSearch what to send to the API and how to handle what comes back from the call.

path = '/_ingest/pipeline/comprehend_language_identification_pipeline'
url = host + path

payload = {
  "description": "ingest identify lang with the comprehend API",
  "processors":[
    {
      "ml_inference": {
        "model_id": comprehend_model_id,
        "input_map": [
            {
               "Text": "Text"
            }
        ],
        "output_map": [
            {  
               "detected_language": "response.Languages[0].LanguageCode",
               "language_score": "response.Languages[0].Score"
            }
        ]
      }
    }
  ]
}
headers = {"Content-Type": "application/json"}
response = requests.put(url, auth=awsauth, json=payload, headers=headers)

You can see from the preceding code that you are pulling back both the top language result and its score from Amazon Comprehend and adding those fields to the document.
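
To see the pipeline in action, you can index a single document through it and read back the fields the processor added. The following is a minimal sketch; the language_demo index name and the sample sentence are illustrative and not part of the notebook:

index_url = f"{host}/language_demo/_doc/1?pipeline=comprehend_language_identification_pipeline"
doc = {"Text": "Bonjour, comment allez-vous?"}

requests.put(index_url, auth=awsauth, json=doc, headers={"Content-Type": "application/json"})

# The ml_inference processor adds the mapped output fields to the stored document
result = requests.get(f"{host}/language_demo/_doc/1", auth=awsauth).json()
print(result['_source'].get('detected_language'), result['_source'].get('language_score'))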

Part 2: The Amazon Bedrock ML connector

In this section, you use Amazon OpenSearch with Amazon Bedrock through the ml-commons plugin to perform a multilingual semantic search. Make sure that you have the solution prerequisites in place before attempting this section.

In the SageMaker instance that was deployed for you, you can see the following files: english.json, french.json, german.json.

These documents have sentences in their respective languages that talk about the term spring in different contexts. These contexts include spring as a verb meaning to move suddenly, as a noun meaning the season of spring, and finally spring as a noun meaning a mechanical part. In this section, you deploy Amazon Titan Text Embeddings model v2 using the ml connector for Amazon Bedrock. You then use this embeddings model to create vectors of text in three languages by ingesting the different language JSON files. Finally, these vectors are stored in Amazon OpenSearch to enable semantic searches to be used across the language sets.

Amazon Bedrock provides streamlined access to various powerful AI foundation models through a single API interface. This managed service includes models from Amazon and other leading AI companies. You can test different models to find the ideal match for your specific needs, while maintaining security, privacy, and responsible AI practices. The service enables you to customize these models with your own data through methods such as fine-tuning and Retrieval Augmented Generation (RAG). Additionally, you can use Amazon Bedrock to create AI agents that can interact with enterprise systems and data, making it a comprehensive solution for developing generative AI applications.

AWS architecture diagram showing document ingestion and processing flow between OpenSearch, SageMaker Notebook, and Bedrock ML

The reference architecture in the preceding figure shows the components used in this solution.

(1) First, we must create the OpenSearch ML connector by running code within the Amazon SageMaker notebook. The connector essentially defines a REST API call to an external model; here, we specifically want a connector that calls the Titan Embeddings model in Amazon Bedrock.

(2) Next, we must create an index to later index our language documents into. When creating an index, you can specify its mappings, settings, and aliases.

(3) After creating an index within Amazon OpenSearch, we want to create an OpenSearch ingest pipeline that streamlines data processing and preparation for indexing, making it easier to manage and use the data. (4) Now that we have created an index and set up a pipeline, we can start indexing our documents through the pipeline.

(5–6) We use the pipeline in OpenSearch that calls the Titan Embeddings model API. We send our language documents to the Titan Embeddings model, and the model returns vector embeddings of the sentences.

(7) We store the vector embeddings within our index and perform vector semantic search.

While this post highlights only specific areas of the overall solution, the SageMaker notebook has the code and instructions to run the full demo yourself.

Before you can use Amazon Bedrock, you need to make sure that OpenSearch can call Amazon Bedrock. As with Amazon Comprehend, you do this through the IAM role mapped to the ml_full_access role; the same role created by the CloudFormation template is also used as the credential for the Amazon Bedrock connector.

Load sentences from the JSON documents into dataframes

Start by loading the JSON document sentences into dataframes for more structured organization. Each row can contain the text, embeddings, and additional contextual information:

import json
import pandas as pd

def load_sentences(file_name):
    sentences = []
    with open(file_name, 'r', encoding='utf-8') as file:
        for line in file:
            try:
                data = json.loads(line)
                if 'sentence' in data and 'sentence_english' in data:
                    sentences.append({
                        'sentence': data['sentence'],
                        'sentence_english': data['sentence_english']
                    })
            except json.JSONDecodeError:
                # Skip lines that are not valid JSON (like the index lines)
                continue
    
    return pd.DataFrame(sentences)

# Usage
german_df = load_sentences('german.json')
english_df = load_sentences('english.json')
french_df = load_sentences('french.json')
# print(french_df.head())

Create the OpenSearch ML connector to Amazon Bedrock

After loading the JSON documents into dataframes, you’re ready to set up the OpenSearch ML connector to connect Amazon Bedrock to OpenSearch.

  1. The connector needs the following information.
    1. It needs a protocol. For this solution, use aws_sigv4, which allows OpenSearch to use an IAM role to call Amazon Bedrock.
    2. Provide the same role used earlier to set up permissions for the ml_full_access role.
    3. Provide the service_name, model, dimensions of the model, and embedding type.

The final call looks like the following:

path = '/_plugins/_ml/connectors/_create'
url = host + path

payload = {
  "name": "Amazon Bedrock Connector: embedding",
  "description": "The connector to bedrock Titan embedding model",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock",
    "model": "amazon.titan-embed-text-v2:0",
    "dimensions": 1024,
    "normalize": True,
    "embeddingTypes": ["float"]
  },
  "credential": {
    "roleArn": sageMakerOpenSearchRoleArn
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.${parameters.region}.amazonaws.com/model/${parameters.model}/invoke",
      "headers": {
        "content-type": "application/json",
        "x-amz-content-sha256": "required"
      },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\", \"dimensions\": ${parameters.dimensions}, \"normalize\": ${parameters.normalize}, \"embeddingTypes\": ${parameters.embeddingTypes} }",
      "pre_process_function": "connector.pre_process.bedrock.embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}

bedrock_connector_response = requests.post(url, auth=awsauth, json=payload, headers=headers)

bedrock_connector_3 = bedrock_connector_response.json()["connector_id"]
print('Connector id: ' + bedrock_connector_3)
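
As with the Amazon Comprehend connector, register the connector as a remote model before you call it. The following is a minimal sketch that mirrors the earlier registration step; the name and description are illustrative, and the returned bedrock_model_id is used by the test call and the ingest pipeline that follow:

path = '/_plugins/_ml/models/_register'
url = host + path

payload = {
    "name": "Bedrock Titan Text Embeddings v2",
    "function_name": "remote",
    "description": "API to create text embeddings with Amazon Bedrock",
    "connector_id": bedrock_connector_3
}
headers = {"Content-Type": "application/json"}

response = requests.post(url, auth=awsauth, json=payload, headers=headers)
bedrock_model_id = response.json()['model_id']
print('Model id: ' + bedrock_model_id)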

Test the Amazon Titan Embeddings model in OpenSearch

After registering and deploying the Amazon Titan Embeddings model using the Amazon Bedrock connector, you can test the API to verify that it was set up and configured correctly. To do this, make the following call to OpenSearch:

path = '/_plugins/_ml/models/' + bedrock_model_id + '/_predict'
url = host + path

headers = {"Content-Type": "application/json"}
payload = {
  "parameters": {
    "inputText": "It's nice to see the flowers bloom and hear the birds sing in the spring"
  }
}
response = requests.post(url, auth=awsauth, json=payload, headers=headers)
print(response.json())

You should get a formatted result, similar to the following, from the call that shows the generated embedding from the Amazon Titan Embeddings model:

{'inference_results': [{'output': [{'name': 'sentence_embedding', 'data_type': 'FLOAT32', 'shape': [1024], 'data': [-0.04092199727892876, 0.052057236433029175, -0.03354490175843239, 0.04398418962955475, -0.001235315459780395, -0.03284895047545433, -0.014197427779436111, 0.0098129278048…

The preceding result is significantly shortened compared to the actual embedding result you might receive. The purpose of this snippet is to show you the format.

Create the index pipeline that uses the Amazon Titan Embeddings model

Create a pipeline in OpenSearch. You use this pipeline to tell OpenSearch to send the fields you want embeddings for to the embeddings model.

pipeline_name = "titan_embedding_pipeline_v2"
url = f"{host}/_ingest/pipeline/{pipeline_name}"

pipeline_body = {
    "description": "Titan embedding pipeline",
    "processors": [
        {
            "text_embedding": {
                "model_id": bedrock_model_id,
                "field_map": {
                    "sentence": "sentence_vector"
                }
            }
        }
    ]
}

response = requests.put(url, auth=awsauth, json=pipeline_body, headers={"Content-Type": "application/json"})
print(response.text)

Create an index

With the pipeline in place, the next step is to create an index that will use the pipeline. There are three fields in the index:

  • sentence_vector – This is where the vector embedding will be stored when returned from Amazon Bedrock.
  • sentence – This is the non-English language sentence.
  • sentence_english – This is the English translation of the sentence. Include this so you can read the results and judge how well semantically similar sentences match across languages.

index_name = 'bedrock-knn-index-v2'
url = f'{host}/{index_name}'
mapping = {
    "mappings": {
        "properties": {
            "sentence_vector": {
                "type": "knn_vector",
                "dimension": 1024,  
                "method": {
                    "name": "hnsw",
                    "space_type": "l2",
                    "engine": "nmslib"
                },
                "store":True
            },
            "sentence":{
                "type": "text",
                "store": True
            },
            "sentence_english":{
                "type": "text",
                "store": True
            }
        }
    },
    "settings": {
        "index": {
            "knn": True,
            "knn.space_type": "cosinesimil",
            "default_pipeline": pipeline_name
        }
    }
}

response = requests.put(url, auth=awsauth, json=mapping, headers={"Content-Type": "application/json"})
print(f"Index creation response: {response.text}")

Load dataframes into the index

Earlier in this section, you loaded the sentences from the JSON documents into dataframes. Now, you can index the documents and generate embeddings for them using the Amazon Titan Text Embeddings Model v2. The embeddings will be stored in the sentence_vector field.

import time

index_name = "bedrock-knn-index-v2"

def index_documents(df, batch_size=100):
    total = len(df)
    for start in range(0, total, batch_size):
        end = min(start + batch_size, total)
        batch = df.iloc[start:end]

        bulk_data = []
        for _, row in batch.iterrows():
            # Prepare the action metadata
            action = {
                "index": {
                    "_index": index_name
                }
            }
            # Prepare the document data
            doc = {
                "sentence": row['sentence'],
                "sentence_english": row['sentence_english']
            }
            
            # Add the action and document to the bulk data
            bulk_data.append(json.dumps(action))
            bulk_data.append(json.dumps(doc))

        # Join the bulk data with newlines
        bulk_body = "\n".join(bulk_data) + "\n"

        # Send the bulk request
        bulk_url = f"{host}/_bulk"
        response = requests.post(bulk_url, auth=awsauth, data=bulk_body, headers={"Content-Type": "application/x-ndjson"})

        if response.status_code == 200:
            print(f"Successfully indexed batch {start}-{end} of {total}")
        else:
            print(f"Error indexing batch {start}-{end} of {total}: {response.text}")

        # Optional: add a small delay to avoid overwhelming the cluster
        time.sleep(1)

# Index your documents
print("Indexing German documents:")
index_documents(german_df)
print("\nIndexing English documents:")
index_documents(english_df)
print("\nIndexing French documents:")
index_documents(french_df)
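
As a quick sanity check, you can fetch one indexed document and confirm that the pipeline attached a 1,024-dimension vector. This is a minimal sketch and assumes the index has had a moment to refresh after the bulk requests:

check = requests.post(
    f"{host}/{index_name}/_search",
    auth=awsauth,
    json={"size": 1, "query": {"match_all": {}}},
    headers={"Content-Type": "application/json"},
).json()

# The ingest pipeline stores the embedding in the sentence_vector field
vector = check['hits']['hits'][0]['_source'].get('sentence_vector', [])
print(f"Stored vector length: {len(vector)}")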

Perform semantic k-NN across the documents

The final step is to perform a k-nearest neighbor (k-NN) search across the documents.

# Define your OpenSearch host and index name
index_name = "bedrock-knn-index-v2"
def semantic_search(query_text, k=5):
    search_url = f"{host}/{index_name}/_search"
    # First, index the query to generate its embedding
    index_doc = {
        "sentence": query_text,
        "sentence_english": query_text  # Assuming the query is in English
    }
    index_url = f"{host}/{index_name}/_doc"
    index_response = requests.post(index_url, auth=awsauth, json=index_doc, headers={"Content-Type": "application/json"})
    
    if index_response.status_code != 201:
        print(f"Failed to index query document: {index_response.text}")
        return []
    
    # Retrieve the indexed query document to get its vector
    doc_id = index_response.json()['_id']
    get_url = f"{host}/{index_name}/_doc/{doc_id}"
    get_response = requests.get(get_url, auth=awsauth)
    query_vector = get_response.json()['_source']['sentence_vector']
    
    # Now perform the KNN search
    search_query = {
        "size": 30,
        "query": {
            "knn": {
                "sentence_vector": {
                    "vector": query_vector,
                    "k": 30
                }
            }
        },
        "_source": ["sentence", "sentence_english"]
    }

    search_response = requests.post(search_url, auth=awsauth, json=search_query, headers={"Content-Type": "application/json"})
    
    if search_response.status_code != 200:
        print(f"Search failed with status code {search_response.status_code}")
        print(search_response.text)
        return []

    # Clean up - delete the temporary query document
    delete_url = f"{host}/{index_name}/_doc/{doc_id}"
    requests.delete(delete_url, auth=awsauth)

    return search_response.json()['hits']['hits']

# Example usage
query = "le soleil brille"
results = semantic_search(query)

if results:
    print(f"Search results for: '{query}'")
    for result in results:
        print(f"Score: {result['_score']}")
        print(f"Sentence: {result['_source']['sentence']}")
        print(f"English: {result['_source']['sentence_english']}")
        print()
else:
    print("No results found or search failed.")

The example query is in French and translates to "the sun is shining." Keeping in mind that the JSON documents contain sentences that use spring in different contexts, you’re looking for query results and vector matches of sentences that use spring in the context of the season.

Here are some of the results from this query:

Search results for: 'le soleil brille'
Score: 0.40515712
Sentence: Les premiers rayons de soleil au printemps réchauffent la terre.
English: The first rays of spring sunshine warm the earth.

Score: 0.40117615
Sentence: Die ersten warmen Sonnenstrahlen kitzeln auf der Haut im Frühling.
English: The first warm sun rays tickle the skin in spring.

Score: 0.3999985
Sentence: Die ersten Sonnenstrahlen im Frühling wecken die Lebensgeister.
English: The first rays of sunshine in spring awaken the spirits.

This shows that the model can provide results across all three languages. It is important to note that the confidence scores for these results might be low because you’ve only ingested a couple of documents, each with a handful of sentences, for this demo. To increase confidence scores and accuracy, ingest a robust dataset with multiple languages and plenty of sentences for reference.

Clean up

To avoid incurring future charges, go to the AWS CloudFormation console and delete the stack you deployed. This removes the resources used in this solution.
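
If you prefer to script the cleanup, you can delete the stack with boto3. This is a minimal sketch; the stack name is a placeholder for whatever you named the stack when you deployed the template:

import boto3

cloudformation = boto3.client('cloudformation')
cloudformation.delete_stack(StackName='<your-stack-name>')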

Benefits of using the ML connector for machine learning model integration with OpenSearch

There are many ways you can perform k-NN semantic vector searches; a popular method is to deploy an external Hugging Face sentence transformer model to a SageMaker endpoint. The following are the benefits of using the ML connector approach we showed in this post, and why you should consider it instead of deploying models to a SageMaker endpoint:

  • Simplified architecture
    • Single system to manage
    • Native OpenSearch integration
    • Simpler deployment
    • Unified monitoring
  • Operational benefits
    • Less infrastructure to maintain
    • Built-in scaling with OpenSearch
    • Simplified security model
    • Straightforward updates and maintenance
  • Cost efficiency
    • Single system costs
    • Pay-per-use Amazon Bedrock pricing
    • No endpoint management costs
    • Simplified billing

Conclusion

Now that you’ve seen how you can use the OpenSearch ML connector to augment your data with external REST calls, we recommend that you visit the GitHub repo, if you haven’t already, and walk through the full demo yourself. The full demo shows how you can use Amazon Comprehend for language detection and how to use Amazon Bedrock for multilingual semantic vector search, using the ml-commons plugin for both use cases. It also has sample text and JSON documents to ingest so you can see how the pipeline works.


About the Authors


John Trollinger is a Principal Solutions Architect supporting the World Wide Public Sector with a focus on OpenSearch and Data Analytics. John has been working with public sector customers over the past 25 years helping them deliver mission capabilities. Outside of work, John likes to collect AWS certifications and compete in triathlons.


Shwetha Radhakrishnan is a Solutions Architect for Amazon Web Services (AWS) with a focus in Data Analytics & Machine Learning. She has been building solutions that drive cloud adoption and help empower organizations to make data-driven decisions within the public sector. Outside of work, she loves dancing, spending time with friends and family, and traveling.

Read More

Bridging the gap between development and production: Seamless model lifecycle management with Amazon Bedrock

Bridging the gap between development and production: Seamless model lifecycle management with Amazon Bedrock

In the landscape of generative AI, organizations are increasingly adopting a structured approach to deploy their AI applications, mirroring traditional software development practices. This approach typically involves separate development and production environments, each with its own AWS account, to create logical separation, enhance security, and streamline workflows.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. As organizations scale their AI initiatives, they often face challenges in efficiently managing and deploying custom models across different stages of development and across geographical regions.

To address these challenges, Amazon Bedrock introduces two key features: Model Share and Model Copy. These features are designed to streamline the AI development lifecycle, from initial experimentation to global production deployment. They enable seamless collaboration between development and production teams, facilitate efficient resource utilization, and help organizations maintain control and security throughout the customized model lifecycle.

In this comprehensive blog post, we’ll dive deep into the Model Share and Model Copy features, exploring their functionalities, benefits, and practical applications in a typical development-to-production scenario.

Prerequisites for Model Copy and Model Share

Before you can start using Model Copy and Model Share, the following prerequisites must be fulfilled:

  1. AWS Organizations setup: Both the source account (the account sharing the model) and the target account (the account receiving the model) must be part of the same organization. You’ll need to create an organization if you don’t have one already, enable resource sharing, and invite the relevant accounts.
  2. IAM permissions: The IAM identities you use in the source and target accounts need permissions for the relevant Amazon Bedrock model management actions and for creating and accepting resource shares in AWS RAM.
  3. KMS key policies (Optional): If your models are encrypted with a customer-managed KMS key, you’ll need to set up key policies to allow the target account to decrypt the shared model or to encrypt the copied model with a specific KMS key.
  4. Network configuration: Make sure that the necessary network configurations are in place, especially if you’re using VPC endpoints or have specific network security requirements.
  5. Service quotas: Check and, if necessary, request increases for the number of custom models per account service quotas in both the source and target Regions and accounts.
  6. Provisioned throughput support: Verify that the target Region supports provisioned throughput for the model you intend to copy. This is crucial because the copy job will be rejected if provisioned throughput isn’t supported in the target Region.
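
For the last check, you can list the foundation models that support provisioned throughput in a candidate Region and confirm that the base model behind your custom model appears. The following is a minimal sketch; us-west-2 is an illustrative target Region:

import boto3

bedrock = boto3.client('bedrock', region_name='us-west-2')
response = bedrock.list_foundation_models(byInferenceType='PROVISIONED')

# Foundation models that support provisioned throughput in this Region
for model in response['modelSummaries']:
    print(model['modelId'])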

Model Share: Streamlining development-to-production workflows

The following figure shows the architecture of Model Share and Model Copy. It consists of a source account where the model is fine-tuned. Next, Amazon Bedrock shares it with the recipient account, which accepts the shared model in AWS Resource Access Manager (AWS RAM). Then, the shared model can be copied to the desired AWS Region.

Architecture diagram for Model Share and Model Copy.

When managing Amazon Bedrock custom models in a development-to-production pipeline, it’s essential to securely share these models across different AWS accounts to streamline the promotion process to higher environments. The Amazon Bedrock Model Share feature addresses this need, enabling smooth sharing between development and production environments. Model Share enables the sharing of custom models fine-tuned on Amazon Bedrock between different AWS accounts within the same Region and organization. This feature is particularly useful for organizations that maintain separate development and production environments.

Important considerations: 

  • Both the source and target AWS accounts must be in the same organization.
  • Only models that have been fine-tuned within Amazon Bedrock can be shared.
  • Base models and custom models imported using the custom model import (CMI) feature cannot be shared directly. For these, use the standard model import process in each AWS account.
  • When sharing encrypted models, use a customer-managed KMS key and attach a key policy that allows the recipient account to decrypt the shared model. Specify the recipient account in the Principal field of the key policy.

Key benefits: 

  • Simplified development-to-production transitions: Quickly move fine-tuned models on Amazon Bedrock from development to production environments.
  • Enhanced team collaboration: Share models across different departments or project teams.
  • Resource optimization: Reduce duplicate model customization efforts across your organization.

How it works: 

  1. After a model has been fine-tuned in the source AWS account using Amazon Bedrock, the source AWS account can use the AWS Management Console for Amazon Bedrock to share the model.
  2. The target AWS account accepts the shared model in AWS RAM (scripted in the sketch after this list).
  3. The shared model in the target AWS account needs to be copied to the desired Regions.
  4. After copying, the target AWS account can purchase provisioned throughput and use the model.
  5. If using KMS encryption, make sure the key policy is properly set up for the recipient account.
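
Step 2 of this flow, accepting the shared model in AWS RAM, can be scripted from the target account. The following is a minimal sketch that accepts every pending invitation; in practice you would match on the resource share name or sender account:

import boto3

ram = boto3.client('ram')

# List invitations sent to this account and accept the pending ones
invitations = ram.get_resource_share_invitations()['resourceShareInvitations']
for invitation in invitations:
    if invitation['status'] == 'PENDING':
        ram.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation['resourceShareInvitationArn']
        )
        print(f"Accepted share: {invitation['resourceShareName']}")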

Model Copy: Optimizing model deployment across Regions

The Amazon Bedrock Model Copy feature enables you to replicate custom models across different Regions within your account. This capability serves two primary purposes: it can be used independently for single-account deployments, or it can complement Model Share in multi-account scenarios, where you first share the model across accounts and then copy it. The feature is particularly valuable for organizations that require global model deployment, Regional load balancing, and robust disaster recovery solutions. By allowing flexible model distribution across Regions, Model Copy helps optimize your AI infrastructure for both performance and reliability.

Important considerations: 

  • Make sure the target Region supports provisioned throughput for the model being copied. If provisioned throughput isn’t supported, the copy job will be rejected.
  • Be aware of the costs associated with storing and using copied models in multiple Regions. Consult the Amazon Bedrock pricing page for detailed information.
  • When used after Model Share for cross-account scenarios, first accept the shared model, then initiate the cross-Region copy within your account.
  • Regularly review and optimize your multi-Region deployment strategy to balance performance needs with cost considerations.
  • When copying encrypted models, use a customer-managed KMS key and attach a key policy that allows the role used for copying to encrypt the model. Specify the role in the Principal field of the key policy.

Key benefits of Model Copy: 

  • Reduced latency: Deploy models closer to end-users in different geographical locations to minimize response times.
  • Increased availability: Enhance the overall availability and reliability of your AI applications by having models accessible in multiple Regions.
  • Improved disaster recovery: Facilitate easier implementation of disaster recovery strategies by maintaining model replicas across different Regions.
  • Support for Regional compliance: Align with data residency requirements by deploying models in specific Regions as needed.

How it works: 

  1. Identify the target Region where you want to deploy your model.
  2. Use the Amazon Bedrock console to initiate the Model Copy process from the source Region to the target Region.
  3. After the model has been copied, purchase provisioned throughput for the model in each Region where you want to use it.
  4. If using KMS encryption, make sure the key policy is properly set up for the role performing the copy operation.

Use cases: 

  • Single-account deployment: Use Model Copy to replicate models across Regions within the same AWS account for improved global performance.
  • Multi-account deployment: After using Model Share to transfer a model from a development to a production account, use Model Copy to distribute the model across Regions in the production account.

By using Model Copy, either on its own or in tandem with Model Share, you can create a robust, globally distributed AI infrastructure. This flexibility offers low-latency access to your custom models across different geographical locations, enhancing the performance and reliability of your AI-powered applications regardless of your account structure.

Aligning Model Share and Model Copy with AWS best practices

When implementing Model Share and Model Copy, it’s crucial to align these features with AWS best practices for multi-account environments. AWS recommends setting up separate accounts for development and production, which makes Model Share particularly valuable for transitioning models between these environments. Consider how these features interact with your organizational structure, especially if you have separate organizational units (OUs) for security, infrastructure, and workloads. Key considerations include:

  • Maintaining compliance with policies set at the OU level.
  • Using Model Share and Model Copy in the continuous integration and delivery (CI/CD) pipeline of your organization.
  • Using AWS billing features for cost management across accounts.
  • For disaster recovery within the same AWS account, use Model Copy. When implementing disaster recovery across multiple AWS accounts, use both Model Share and Model Copy.

By aligning Model Share and Model Copy with these best practices, you can enhance security, compliance, and operational efficiency in your AI model lifecycle management. For more detailed guidance, see the AWS Organizations documentation.

From development to production: A practical use case

Let’s walk through a typical scenario where Model Copy and Model Share can be used to streamline the process of moving a custom model from development to production.

Workflow for Amazon Bedrock Model Copy and Model Share.

Step 1: Model development (development account)

In the development account, data scientists fine-tune a model on Amazon Bedrock. The process typically involves:

  1. Experimenting with different FMs
  2. Performing prompt engineering
  3. Fine-tuning the selected model with domain-specific data
  4. Evaluating model performance on the specific task
  5. Applying Amazon Bedrock Guardrails to make sure that the model meets ethical and regulatory standards

The following example fine-tunes an Amazon Titan Text Express model in the US East (N. Virginia) Region (us-east-1).

# Example: Fine-tuning a model in the development account
import boto3
bedrock = boto3.client(service_name='bedrock')
    
# Set parameters
customizationType = "FINE_TUNING"
baseModelIdentifier = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1"
roleArn = "${your-customization-role-arn}"
jobName = "MyFineTuningJob"
customModelName = "MyCustomModel"
hyperParameters = {
    "epochCount": "1",
    "batchSize": "1",
    "learningRate": ".0005",
    "learningRateWarmupSteps": "0"
}
trainingDataConfig = {"s3Uri": "s3://${training-bucket}/myInputData/train.jsonl"}
outputDataConfig = {"s3Uri": "s3://${output-bucket}/myOutputData"}

# Create job
response = bedrock.create_model_customization_job(
    jobName=jobName,
    customModelName=customModelName,
    roleArn=roleArn,
    customizationType=customizationType,
    baseModelIdentifier=baseModelIdentifier,
    hyperParameters=hyperParameters,
    trainingDataConfig=trainingDataConfig,
    outputDataConfig=outputDataConfig
)
 
job_arn = response['jobArn']
print(f"Model customization job created: {job_arn}")
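
Fine-tuning runs asynchronously, so before moving on to evaluation you can poll the job until it finishes. The following is a minimal sketch using the job ARN returned above:

# Example: Polling the fine-tuning job until it reaches a terminal state
import time

while True:
    status = bedrock.get_model_customization_job(jobIdentifier=job_arn)['status']
    print(f"Job status: {status}")
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)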

Step 2: Model evaluation and selection

After the model is fine-tuned, the development team evaluates its performance and decides if it’s ready for production use.

# Example: Evaluating the fine-tuned model
import json

bedrock_runtime = boto3.client('bedrock-runtime')
 
# The fine-tuned base model is Amazon Titan Text Express, so use the Titan request format
response = bedrock_runtime.invoke_model(
    modelId=customModelName,
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": "Your LLM-as-judge prompt goes here",
        "textGenerationConfig": {"maxTokenCount": 500}
    })
)
 
result = json.loads(response['body'].read())
print(f"Model output: {result['results'][0]['outputText']}")

Step 3: Model sharing (development to production account)

After the model is approved for production use, the development team uses Model Share to make it available to the production account. Remember, this step is only applicable for fine-tuned models created within Amazon Bedrock, not for custom models imported using custom model import.

# Example: Sharing the model with the production account
ram = boto3.client('ram')

response = ram.create_resource_share(
    name='financial-analyst-model-share',
    resourceArns=['arn:aws:bedrock:us-east-1:{dev-account-id}:model/custom-financial-analyst-model'],
    principals=['production-account-id'],
    allowExternalPrincipals=False
)
 
share_arn = response['resourceShare']['resourceShareArn']
print(f"Resource share created: {share_arn}")

Step 4: Model Copy (production account)

The production team, now with access to the shared model, must first copy the model to their desired Region before they can use it. This step is necessary even for shared models, because sharing alone doesn’t make the model usable in the target account.

# Example: Copying the model to the production account's desired region
bedrock = boto3.client('bedrock', region_name='us-west-2')
 
# Check if the target region supports provisioned throughput for this model
# This check is not provided by the API and would need to be implemented separately
 
# The target Region is determined by the Region of the bedrock client issuing the call (us-west-2 here)
response = bedrock.create_model_copy_job(
    sourceModelArn='arn:aws:bedrock:us-east-1:{dev-account-id}:model/custom-financial-analyst-model',
    targetModelName='financial-analyst-model-us-west-2'
)
 
job_arn = response['jobArn']
print(f"Model copy job created: {job_arn}")
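
The copy job is also asynchronous; the following is a minimal sketch of checking its progress with the returned job ARN:

# Example: Checking the status of the copy job
status = bedrock.get_model_copy_job(jobArn=job_arn)['status']
print(f"Copy job status: {status}")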

Step 5: Production deployment

Finally, after the model has been successfully copied, the production team can purchase provisioned throughput for the model and begin invoking it for inference.

# Example: Setting up provisioned throughput and invoking the model in production
import json
import boto3

bedrock = boto3.client('bedrock', region_name='us-west-2')
 
# Purchase provisioned throughput for the copied model
# (provisionedModelName is an illustrative placeholder)
response = bedrock.create_provisioned_model_throughput(
    provisionedModelName='financial-analyst-provisioned',
    modelId='financial-analyst-model-us-west-2',
    modelUnits=1
)
provisioned_model_arn = response['provisionedModelArn']
print(f"Provisioned throughput created: {provisioned_model_arn}")
 
# Invoke the copied model through the provisioned throughput using the Bedrock runtime
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock_runtime.invoke_model(
    modelId=provisioned_model_arn,
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": "Summarize the key revenue drivers from the latest quarterly report.",
        "textGenerationConfig": {"maxTokenCount": 500}
    })
)
print(json.loads(response['body'].read()))

Conclusion

Amazon Bedrock Model Copy and Model Share features provide a powerful option for managing the lifecycle of an AI application from development to production. These features enable organizations to:

  • Streamline the transition from experimentation to deployment
  • Enhance collaboration between development and production teams
  • Optimize model performance and availability on a global scale
  • Maintain security and compliance throughout the model lifecycle

As the field of AI continues to evolve, these tools are crucial for organizations to stay agile, efficient, and competitive. Remember, the journey from development to production is iterative, requiring continuous monitoring, evaluation, and refinement of models to maintain ongoing effectiveness and alignment with business needs.

By implementing the best practices and considerations outlined in this post, you can create a robust, secure, and efficient workflow for managing your AI models across different environments and Regions. This approach will accelerate your AI development process and maximize the value of your investments in model customization and fine tuning. With the features provided by Amazon Bedrock, you’re well-equipped to navigate the complexities of AI model management and deployment successfully.


About the Authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Neeraj Lamba is a Cloud Infrastructure Architect with Amazon Web Services (AWS) Worldwide Public Sector Professional Services. He helps customers transform their business by helping design their cloud solutions and offering technical guidance. Outside of work, he likes to travel, play tennis, and experiment with new technologies.

Read More