Governing ML lifecycle at scale: Best practices to set up cost and usage visibility of ML workloads in multi-account environments
Cloud costs can significantly impact your business operations. Gaining real-time visibility into infrastructure expenses, usage patterns, and cost drivers is essential. This insight enables agile decision-making, optimized scalability, and maximizes the value derived from cloud investments, providing cost-effective and efficient cloud utilization for your organization’s future growth. What makes cost visibility even more important for the cloud is that cloud usage is dynamic. This requires continuous cost reporting and monitoring to make sure costs don’t exceed expectations and you only pay for the usage you need. Additionally, you can measure the value the cloud delivers to your organization by quantifying the associated cloud costs.
For a multi-account environment, you can track costs at an AWS account level to associate expenses. However, to allocate costs to cloud resources, a tagging strategy is essential. A combination of an AWS account and tags provides the best results. Implementing a cost allocation strategy early is critical for managing your expenses and future optimization activities that will reduce your spend.
This post outlines steps you can take to implement a comprehensive tagging governance strategy across accounts, using AWS tools and services that provide visibility and control. By setting up automated policy enforcement and checks, you can achieve cost optimization across your machine learning (ML) environment.
Implement a tagging strategy
A tag is a label you assign to an AWS resource. Tags consist of a customer-defined key and an optional value to help manage, search for, and filter resources. Both tag keys and tag values (for example, `Production`) are case sensitive.
It’s important to define a tagging strategy for your resources as soon as possible when establishing your cloud foundation. Tagging is an effective scaling mechanism for implementing cloud management and governance strategies. When defining your tagging strategy, you need to determine the right tags that will gather all the necessary information in your environment. You can remove tags when they’re no longer needed and apply new tags whenever required.
Categories for designing tags
Some of the common categories used for designing tags are as follows:
- Cost allocation tags – These help track costs by different attributes like department, environment, or application. This allows reporting and filtering costs in billing consoles based on tags.
- Automation tags – These are used during resource creation or management workflows. For example, tagging resources with their environment allows automating tasks like stopping non-production instances after hours.
- Access control tags – These enable restricting access and permissions based on tags. AWS Identity and Access Management (IAM) roles and policies can reference tags to control which users or services can access specific tagged resources.
- Technical tags – These provide metadata about resources. For example, tags like `environment` or `owner` help identify technical attributes. Tags with the AWS reserved prefix `aws:` provide additional metadata tracked by AWS.
- Compliance tags – These may be needed to adhere to regulatory requirements, such as tagging with classification levels or whether data is encrypted or not.
- Business tags – These represent business-related attributes, not technical metadata, such as cost centers, business lines, and products. This helps track spending for cost allocation purposes.
A tagging strategy also defines a standardized convention and implementation of tags across all resource types.
When defining tags, use the following conventions:
- Use all lowercase for consistency and to avoid confusion
- Separate words with hyphens
- Use a prefix to identify and separate AWS generated tags from third-party tool generated tags
Tagging dictionary
When defining a tagging dictionary, delineate between mandatory and discretionary tags. Mandatory tags help identify resources and their metadata, regardless of purpose. Discretionary tags are the tags that your tagging strategy defines, and they should be made available to assign to resources as needed. The following table provides examples of a tagging dictionary used for tagging ML resources.
| Tag Type | Tag Key | Purpose | Cost Allocation | Mandatory |
| --- | --- | --- | --- | --- |
| Workload | `anycompany:workload:application-id` | Identifies disparate resources that are related to a specific application | Y | Y |
| Workload | `anycompany:workload:environment` | Distinguishes between `dev`, `test`, and `production` | Y | Y |
| Financial | `anycompany:finance:owner` | Indicates who is responsible for the resource, for example `SecurityLead`, `SecOps`, `Workload-1-Development-team` | Y | Y |
| Financial | `anycompany:finance:business-unit` | Identifies the business unit the resource belongs to, for example `Finance`, `Retail`, `Sales`, `DevOps`, `Shared` | Y | Y |
| Financial | `anycompany:finance:cost-center` | Indicates cost allocation and tracking, for example `5045`, `Sales-5045`, `HR-2045` | Y | Y |
| Security | `anycompany:security:data-classification` | Indicates the data confidentiality that the resource supports | N | Y |
| Automation | `anycompany:automation:encryption` | Indicates if the resource needs to store encrypted data | N | N |
| Workload | `anycompany:workload:name` | Identifies an individual resource | N | N |
| Workload | `anycompany:workload:cluster` | Identifies resources that share a common configuration or perform a specific function for the application | N | N |
| Workload | `anycompany:workload:version` | Distinguishes between different versions of a resource or application component | N | N |
| Operations | `anycompany:operations:backup` | Identifies if the resource needs to be backed up based on the type of workload and the data that it manages | N | N |
| Regulatory | `anycompany:regulatory:framework` | Indicates requirements for compliance with specific standards and frameworks, for example NIST, HIPAA, or GDPR | N | N |
You need to define what resources require tagging and implement mechanisms to enforce mandatory tags on all necessary resources. For multiple accounts, assign mandatory tags to each one, identifying its purpose and the owner responsible. Avoid personally identifiable information (PII) when labeling resources because tags remain unencrypted and visible.
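As an illustration, the following sketch uses the AWS SDK for Python (Boto3) to apply the mandatory tags from the dictionary above to an existing SageMaker resource. The resource ARN and tag values are placeholders, not part of the original walkthrough.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Mandatory tags from the tagging dictionary; values are illustrative placeholders
mandatory_tags = [
    {"Key": "anycompany:workload:application-id", "Value": "credit-risk-scoring"},
    {"Key": "anycompany:workload:environment", "Value": "dev"},
    {"Key": "anycompany:finance:owner", "Value": "workload-1-development-team"},
    {"Key": "anycompany:finance:business-unit", "Value": "finance"},
    {"Key": "anycompany:finance:cost-center", "Value": "5045"},
    {"Key": "anycompany:security:data-classification", "Value": "confidential"},
]

# Attach the tags to an existing SageMaker resource (for example, an endpoint)
sagemaker.add_tags(
    ResourceArn="arn:aws:sagemaker:us-east-1:111122223333:endpoint/credit-risk-endpoint",
    Tags=mandatory_tags,
)
```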
Tagging ML workloads on AWS
When running ML workloads on AWS, primary costs are incurred from compute resources required, such as Amazon Elastic Compute Cloud (Amazon EC2) instances for hosting notebooks, running training jobs, or deploying hosted models. You also incur storage costs for datasets, notebooks, models, and so on stored in Amazon Simple Storage Service (Amazon S3).
A reference architecture for the ML platform with various AWS services is shown in the following diagram. This framework considers multiple personas and services to govern the ML lifecycle at scale. For more information about the reference architecture, see Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.
The reference architecture includes a landing zone and multi-account landing zone accounts. These should be tagged to track costs for governance and shared services.
The key contributors towards recurring ML cost that should be tagged and tracked are as follows:
- Amazon DataZone – Amazon DataZone allows you to catalog, discover, govern, share, and analyze data across various AWS services. Tags can be added at an Amazon DataZone domain and used for organizing data assets, users, and projects. Usage of data is tracked through the data consumers, such as Amazon Athena, Amazon Redshift, or Amazon SageMaker.
- AWS Lake Formation – AWS Lake Formation helps manage data lakes and integrate them with other AWS analytics services. You can define metadata tags and assign them to resources like databases and tables. This identifies teams or cost centers responsible for those resources. Automating resource tags when creating databases or tables with the AWS Command Line Interface (AWS CLI) or SDKs provides consistent tagging. This enables accurate tracking of costs incurred by different teams.
- Amazon SageMaker – Amazon SageMaker uses a domain to provide access to an environment and resources. When a domain is created, SageMaker automatically generates tags with a DomainId key, and administrators can add a custom ProjectId tag. Together, these tags can be used for project-level resource isolation. Tags on a SageMaker domain are automatically propagated to any SageMaker resources created in the domain.
- Amazon SageMaker Feature Store – Amazon SageMaker Feature Store allows you to tag your feature groups and search for feature groups using tags. You can add tags when creating a new feature group or edit the tags of an existing feature group.
- Amazon SageMaker resources – When you tag SageMaker resources such as jobs or endpoints, you can track spending based on attributes like project, team, or environment. For example, you can specify tags when creating the SageMaker Estimator that launches a training job.
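As a sketch of that last point, the following snippet (assuming the SageMaker Python SDK; the image URI, role, and tag values are placeholders) passes cost allocation tags to an Estimator so that the training job it launches is tagged at creation.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<training-image-uri>",           # placeholder training image
    role="<sagemaker-execution-role-arn>",      # placeholder IAM execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
    # Tags are applied to the training job that this Estimator launches
    tags=[
        {"Key": "anycompany:workload:application-id", "Value": "credit-risk-scoring"},
        {"Key": "anycompany:workload:environment", "Value": "dev"},
        {"Key": "anycompany:finance:cost-center", "Value": "5045"},
    ],
)

# estimator.fit({"training": "s3://<bucket>/train/"})  # starts the tagged training job
```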
Using tags allows you to allocate costs in a way that aligns with business needs. Monitoring expenses this way gives insight into how budgets are consumed.
Enforce a tagging strategy
An effective tagging strategy uses mandatory tags and applies them consistently and programmatically across AWS resources. You can use both reactive and proactive approaches for governing tags in your AWS environment.
Proactive governance uses tools such as AWS CloudFormation, AWS Service Catalog, tag policies in AWS Organizations, or IAM resource-level permissions to make sure you apply mandatory tags consistently at resource creation. For example, you can use the CloudFormation Resource Tags property to apply tags to resource types. In Service Catalog, you can add tags that automatically apply when you launch the service.
Reactive governance is for finding resources that lack proper tags using tools such as the AWS Resource Groups tagging API, AWS Config rules, and custom scripts. To find resources manually, you can use Tag Editor and detailed billing reports.
Proactive governance
Proactive governance uses the following tools:
- Service Catalog – You can apply tags to all resources created when a product launches from Service Catalog. Service Catalog provides a TagOptions library; use it to define the tag key-value pairs to associate with the product.
- CloudFormation Resource Tags – You can apply tags to resources using the AWS CloudFormation Resource Tags property. Tag only those resources that support tagging through AWS CloudFormation.
- Tag policies – Tag policies standardize tags across your organization’s account resources. Define tagging rules in a tag policy that apply when resources are tagged. For example, specify that a CostCenter tag attached to a resource must match the case and values the policy defines. You can also enforce the policy for specific resource types so that noncompliant tagging requests are prevented from completing. The policy doesn’t evaluate untagged resources or undefined tags for compliance. Tag policies involve working with multiple AWS services:
- To enable the tag policies feature, use AWS Organizations. You can create tag policies and then attach those policies to organization entities to put the tagging rules into effect.
- Use AWS Resource Groups to find noncompliant tags on account resources. Correct the noncompliant tags in the AWS service where you created the resource.
- Service Control Policies – You can restrict the creation of an AWS resource without proper tags. Use Service Control Policies (SCPs) to set guardrails around requests to create resources. SCPs allow you to enforce tagging policies on resource creation. To create an SCP, navigate to the AWS Organizations console, choose Policies in the navigation pane, then choose Service Control Policies.
Reactive governance
Reactive governance uses the following tools:
- AWS Config rules – Check resources regularly for improper tagging. The AWS Config rule required-tags examines resources to make sure they contain specified tags. You should take action when resources lack necessary tags.
- AWS Resource Groups tagging API – The AWS Resource Groups Tagging API lets you tag or untag resources. It also enables searching for resources in a specified AWS Region or account using tag-based filters. Additionally, you can search for existing tags in a Region or account, or find existing values for a key within a specific Region or account. To create a resource tag group, refer to Creating query-based groups in AWS Resource Groups.
- Tag Editor – With Tag Editor, you build a query to find resources in one or more Regions that are available for tagging. To find resources to tag, see Finding resources to tag.
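As an illustration of the reactive approach, the following Boto3 sketch uses the Resource Groups Tagging API to list SageMaker resources in the current Region and flag those missing a mandatory tag. The required tag key is an assumption taken from the tagging dictionary above.

```python
import boto3

tagging = boto3.client("resourcegroupstaggingapi")

REQUIRED_KEY = "anycompany:finance:cost-center"  # mandatory tag to check for

paginator = tagging.get_paginator("get_resources")
noncompliant = []

# Scan SageMaker resources and collect those missing the required tag
for page in paginator.paginate(ResourceTypeFilters=["sagemaker"]):
    for resource in page["ResourceTagMappingList"]:
        keys = {tag["Key"] for tag in resource.get("Tags", [])}
        if REQUIRED_KEY not in keys:
            noncompliant.append(resource["ResourceARN"])

print(f"Resources missing {REQUIRED_KEY}:")
for arn in noncompliant:
    print(f" - {arn}")
```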
SageMaker tag propagation
Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. SageMaker Studio automatically copies and assigns tags to the SageMaker Studio notebooks created by users, so you can track and categorize the cost of those notebooks.
Amazon SageMaker Pipelines allows you to create end-to-end workflows for managing and deploying SageMaker jobs. Each pipeline is composed of a sequence of steps that transform data into a trained model. Tags can be applied to pipelines similarly to how they are used for other SageMaker resources. When a pipeline is run, its tags can potentially propagate to the underlying jobs launched as part of the pipeline steps.
When models are registered in Amazon SageMaker Model Registry, tags can be propagated from model packages to other related resources like endpoints. Model packages in the registry can be tagged when registering a model version. These tags become associated with the model package. Tags on model packages can potentially propagate to other resources that reference the model, such as endpoints created using the model.
Tag policy quotas
The number of policies that you can attach to an entity (root, OU, or account) is subject to quotas for AWS Organizations. See Quotas and service limits for AWS Organizations for the number of tag policies that you can attach.
Monitor resources
To achieve financial success and accelerate business value realization in the cloud, you need complete, near real-time visibility of cost and usage information to make informed decisions.
Cost organization
You can apply meaningful metadata to your AWS usage with AWS cost allocation tags. Use AWS Cost Categories to create rules that logically group cost and usage information by account, tags, service, charge type, or other categories. Access the metadata and groupings in services like AWS Cost Explorer, AWS Cost and Usage Reports, and AWS Budgets to trace costs and usage back to specific teams, projects, and business initiatives.
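For example, a sketch like the following (Boto3 Cost Explorer API; the tag key must first be activated as a cost allocation tag, and the dates are placeholders) groups monthly costs by one of the cost allocation tags defined earlier.

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-06-01", "End": "2024-07-01"},  # placeholder dates
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # Group costs by the cost allocation tag from the tagging dictionary
    GroupBy=[{"Type": "TAG", "Key": "anycompany:workload:application-id"}],
)

for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        tag_value = group["Keys"][0]
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(f"{tag_value}: {amount} USD")
```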
Cost visualization
You can view and analyze your AWS costs and usage over the past 13 months using Cost Explorer. You can also forecast your likely spending for the next 12 months and receive recommendations for Reserved Instance purchases that may reduce your costs. Using Cost Explorer enables you to identify areas needing further inquiry and to view trends to understand your costs. For more detailed cost and usage data, use AWS Data Exports to create exports of your billing and cost management data by selecting SQL columns and rows to filter the data you want to receive. Data exports get delivered on a recurring basis to your S3 bucket for you to use with your business intelligence (BI) or data analytics solutions.
You can use AWS Budgets to set custom budgets that track cost and usage for simple or complex use cases. AWS Budgets also lets you enable email or Amazon Simple Notification Service (Amazon SNS) notifications when actual or forecasted cost and usage exceed your set budget threshold. In addition, AWS Budgets integrates with Cost Explorer.
Cost allocation
Cost Explorer enables you to view and analyze your costs and usage data over time, up to 13 months, through the AWS Management Console. It provides premade views that display quick information about your cost trends, and you can customize views to suit your needs. You can apply various available filters to view specific costs. You can also save any view as a report.
Monitoring in a multi-account setup
SageMaker supports cross-account lineage tracking. This allows you to associate and query lineage entities, like models and training jobs, owned by different accounts. It helps you track related resources and costs across accounts. Use the AWS Cost and Usage Report to track costs for SageMaker and other services across accounts. The report aggregates usage and costs based on tags, resources, and more so you can analyze spending per team, project, or other criteria spanning multiple accounts.
Cost Explorer allows you to visualize and analyze SageMaker costs from different accounts. You can filter costs by tags, resources, or other dimensions. You can also export the data to third-party BI tools for customized reporting.
Conclusion
In this post, we discussed how to implement a comprehensive tagging strategy to track costs for ML workloads across multiple accounts. We discussed implementing tagging best practices by logically grouping resources and tracking costs by dimensions like environment, application, team, and more. We also looked at enforcing the tagging strategy using proactive and reactive approaches. Additionally, we explored the capabilities within SageMaker to apply tags. Lastly, we examined approaches to provide visibility of cost and usage for your ML workloads.
For more information about how to govern your ML lifecycle, see Part 1 and Part 2 of this series.
About the authors
Gunjan Jain, an AWS Solutions Architect based in Southern California, specializes in guiding large financial services companies through their cloud transformation journeys. He expertly facilitates cloud adoption, optimization, and implementation of Well-Architected best practices. Gunjan’s professional focus extends to machine learning and cloud resilience, areas where he demonstrates particular enthusiasm. Outside of his professional commitments, he finds balance by spending time in nature.
Ram Vittal is a Principal Generative AI Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, reliable, and scalable GenAI/ML systems to help enterprise customers improve their business outcomes. In his spare time, he rides his motorcycle and enjoys walking with his dogs!
Automate invoice processing with Streamlit and Amazon Bedrock
Invoice processing is a critical yet often cumbersome task for businesses of all sizes, especially for large enterprises dealing with invoices from multiple vendors with varying formats. The sheer volume of data, coupled with the need for accuracy and efficiency, can make invoice processing a significant challenge. Invoices can vary widely in format, structure, and content, making efficient processing at scale difficult. Traditional methods relying on manual data entry or custom scripts for each vendor’s format can not only lead to inefficiencies, but can also increase the potential for errors, resulting in financial discrepancies, operational bottlenecks, and backlogs.
To extract key details such as invoice numbers, dates, and amounts, we use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
In this post, we provide a step-by-step guide with the building blocks needed for creating a Streamlit application to process and review invoices from multiple vendors. Streamlit is an open source framework for data scientists to efficiently create interactive web-based data applications in pure Python. We use Anthropic’s Claude 3 Sonnet model in Amazon Bedrock and Streamlit for building the application front-end.
Solution overview
This solution uses the Amazon Bedrock Knowledge Bases chat with document feature to analyze and extract key details from your invoices, without needing a knowledge base. The results are shown in a Streamlit app, with the invoices and extracted information displayed side-by-side for quick review. Importantly, your document and data are not stored after processing.
The storage layer uses Amazon Simple Storage Service (Amazon S3) to hold the invoices that business users upload. After uploading, you can set up a regular batch job to process these invoices, extract key information, and save the results in a JSON file. In this post, we save the data in JSON format, but you can also choose to store it in your preferred SQL or NoSQL database.
The application layer uses Streamlit to display the PDF invoices alongside the extracted data from Amazon Bedrock. For simplicity, we deploy the app locally, but you can also run it on Amazon SageMaker Studio, Amazon Elastic Compute Cloud (Amazon EC2), or Amazon Elastic Container Service (Amazon ECS) if needed.
Prerequisites
To set up this solution, complete the following prerequisites:
- Create and activate an AWS account. Make sure your AWS credentials are configured correctly.
- This tutorial assumes you have the necessary AWS Identity and Access Management (IAM) permissions.
- Install AWS Command Line Interface (AWS CLI)
- Configure AWS CLI and set the AWS Region to where you would like to run this invoice processor by following the Set up AWS temporary credentials and AWS Region for development documentation. The Region you choose must have Amazon Bedrock and Anthropic’s Claude 3 Sonnet model available.
- Install Python 3.7 or later on your local machine.
- Access to Anthropic’s Claude 3 Sonnet in Amazon Bedrock.
Install dependencies and clone the example
To get started, install the necessary packages on your local machine or on an EC2 instance. If you’re new to Amazon EC2, refer to the Amazon EC2 User Guide. In this tutorial, we use the local machine for project setup.
To install dependencies and clone the example, follow these steps:
- Clone the repository into a local folder:
- Install Python dependencies
- Navigate to the project directory:
- Upgrade pip
- (Optional) Create a virtual environment to isolate dependencies:
- Activate the virtual environment:
- Mac/Linux:
- Windows:
- In the cloned directory, invoke the following to install the necessary Python packages:
This will install the necessary packages, including Boto3 (AWS SDK for Python), Streamlit, and other dependencies.
- Update the `region` in the `config.yaml` file to the same Region set for your AWS CLI where Amazon Bedrock and Anthropic’s Claude 3 Sonnet model are available.
After completing these steps, the invoice processor code will be set up in your local environment and will be ready for the next stages to process invoices using Amazon Bedrock.
Process invoices using Amazon Bedrock
Now that the environment setup is done, you’re ready to start processing invoices and deploying the Streamlit app. To process invoices using Amazon Bedrock, follow these steps:
Store invoices in Amazon S3
Store invoices from different vendors in an S3 bucket. You can upload them directly using the console, API, or as part of your regular business process. Follow these steps to upload using the CLI:
- Create an S3 bucket:
Replace `your-bucket-name` with the name of the bucket you created and `your-region` with the Region set for your AWS CLI and in `config.yaml` (for example, `us-east-1`).
- Upload invoices to the S3 bucket. Use one of the following commands to upload the invoices to Amazon S3:
  - To upload invoices to the root of the bucket:
  - To upload invoices to a specific folder (for example, `invoices`):
- Validate the upload:
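If you prefer to script these steps, the following Boto3 sketch covers the same flow (the original post uses equivalent AWS CLI commands). The bucket name, Region, and file paths are placeholders.

```python
import boto3

region = "us-east-1"          # placeholder; match the Region in config.yaml
bucket = "your-bucket-name"   # placeholder bucket name

s3 = boto3.client("s3", region_name=region)

# Create the bucket (us-east-1 does not accept a LocationConstraint)
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# Upload an invoice to the root of the bucket, and another to an "invoices/" folder
s3.upload_file("sample-invoice.pdf", bucket, "sample-invoice.pdf")
s3.upload_file("sample-invoice.pdf", bucket, "invoices/sample-invoice.pdf")

# Validate the upload by listing the objects
for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"])
```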
Process invoices with Amazon Bedrock
In this section, you will process the invoices in Amazon S3 and store the results in a JSON file (`processed_invoice_output.json`). You will extract the key details from the invoices (such as invoice numbers, dates, and amounts) and generate summaries.
You can trigger the processing of these invoices using the AWS CLI or automate the process with an Amazon EventBridge rule or AWS Lambda trigger. For this walkthrough, we will use the AWS CLI to trigger the processing.
We packaged the processing logic in the Python script `invoices_processor.py`, which can be run as follows:
The `--prefix` argument is optional. If omitted, all of the PDFs in the bucket will be processed. For example:
or
Use the solution
This section examines the `invoices_processor.py` code. You can chat with your document either on the Amazon Bedrock console or by using the Amazon Bedrock RetrieveAndGenerate API (SDK). In this tutorial, we use the API approach.
- Initialize the environment: The script imports the necessary libraries and initializes the Amazon Bedrock and Amazon S3 clients.
- Configure: The `config.yaml` file specifies the model ID, Region, prompts for entity extraction, and the output file location for processing.
- Set up API calls: The `RetrieveAndGenerate` API fetches the invoice from Amazon S3 and processes it using the FM. It takes several parameters, such as the prompt, source type (S3), model ID, AWS Region, and S3 URI of the invoice.
- Batch processing: The `batch_process_s3_bucket_invoices` function processes the invoices in the specified S3 bucket in parallel and writes the results to the output file (`processed_invoice_output.json`, as specified by `output_file` in `config.yaml`). It relies on the `process_invoice` function, which calls the Amazon Bedrock RetrieveAndGenerate API for each invoice and prompt.
- Post-processing: The extracted data in `processed_invoice_output.json` can be further structured or customized to suit your needs.
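To make the API flow concrete, here is a minimal sketch of a single chat-with-document call against one invoice in Amazon S3, using the Boto3 `bedrock-agent-runtime` client. The model ARN, bucket, prompt, and key names are placeholders, and the request the packaged script builds may differ in its details.

```python
import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"
INVOICE_URI = "s3://your-bucket-name/invoices/sample-invoice.pdf"  # placeholder

prompt = "Extract the invoice number, invoice date, and total amount as JSON."

# Chat with the document directly from S3; no knowledge base is required
response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": prompt},
    retrieveAndGenerateConfiguration={
        "type": "EXTERNAL_SOURCES",
        "externalSourcesConfiguration": {
            "modelArn": MODEL_ARN,
            "sources": [
                {
                    "sourceType": "S3",
                    "s3Location": {"uri": INVOICE_URI},
                }
            ],
        },
    },
)

print(response["output"]["text"])
```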
This approach allows invoice handling from multiple vendors, each with its own unique format and structure. By using large language models (LLMs), it extracts important details such as invoice numbers, dates, amounts, and vendor information without requiring custom scripts for each vendor format.
Run the Streamlit demo
Now that you have the components in place and the invoices processed using Amazon Bedrock, it’s time to deploy the Streamlit application. You can launch the app by invoking the following command:
When the app is up, it will open in your default web browser. From there, you can review the invoices and the extracted data side-by-side. Use the Previous and Next arrows to seamlessly navigate through the processed invoices so you can interact with and analyze the results efficiently. The following screenshot shows the UI.
There are quotas for Amazon Bedrock (some of which are adjustable) that you need to consider when building at scale.
Cleanup
To clean up after running the demo, follow these steps:
- Delete the S3 bucket containing your invoices using the command
- If you set up a virtual environment, deactivate it by invoking `deactivate`
- Remove any local files created during the process, including the cloned repository and output files
- If you used any AWS resources such as an EC2 instance, terminate them to avoid unnecessary charges
Conclusion
In this post, we walked through a step-by-step guide to automating invoice processing using Streamlit and Amazon Bedrock, addressing the challenge of handling invoices from multiple vendors with different formats. We showed how to set up the environment, process invoices stored in Amazon S3, and deploy a user-friendly Streamlit application to review and interact with the processed data.
If you are looking to further enhance this solution, consider integrating additional features or deploying the app on scalable AWS services such as Amazon SageMaker, Amazon EC2, or Amazon ECS. Due to this flexibility, your invoice processing solution can evolve with your business, providing long-term value and efficiency.
We encourage you to learn more by exploring Amazon Bedrock, Access Amazon Bedrock foundation models, RetrieveAndGenerate API, and Quotas for Amazon Bedrock and building a solution using the sample implementation provided in this post and a dataset relevant to your business. If you have questions or suggestions, leave a comment.
About the Authors
Deepika Kumar is a Solution Architect at AWS. She has over 13 years of experience in the technology industry and has helped enterprises and SaaS organizations build and securely deploy their workloads on the cloud. She is passionate about using generative AI in a responsible manner, whether that is driving product innovation, boosting productivity, or enhancing customer experiences.
Jobandeep Singh is an Associate Solution Architect at AWS specializing in Machine Learning. He supports customers across a wide range of industries to leverage AWS, driving innovation and efficiency in their operations. In his free time, he enjoys playing sports, with a particular love for hockey.
Ratan Kumar is a solutions architect based out of Auckland, New Zealand. He works with large enterprise customers helping them design and build secure, cost-effective, and reliable internet scale applications using the AWS cloud. He is passionate about technology and likes sharing knowledge through blog posts and Twitch sessions.
Centralize model governance with SageMaker Model Registry Resource Access Manager sharing
We recently announced the general availability of cross-account sharing of Amazon SageMaker Model Registry using AWS Resource Access Manager (AWS RAM), making it easier to securely share and discover machine learning (ML) models across your AWS accounts.
Customers find it challenging to share and access ML models across AWS accounts because they have to set up complex AWS Identity and Access Management (IAM) policies and create custom integrations. With this launch, customers can now seamlessly share and access ML models registered in SageMaker Model Registry between different AWS accounts.
Customers can use the SageMaker Studio UI or APIs to specify the SageMaker Model Registry model to be shared and grant access to specific AWS accounts or to everyone in the organization. Authorized users can then quickly discover and use those shared models in their own AWS accounts. This streamlines the ML workflows, enables better visibility and governance, and accelerates the adoption of ML models across the organization.
In this post, we will show you how to use this new cross-account model sharing feature to build your own centralized model governance capability, which is often needed for centralized model approval, deployment, auditing, and monitoring workflows. Before we dive into the details of the architecture for sharing models, let’s review what use case and model governance is and why it’s needed.
Use case governance is essential to help ensure that AI systems are developed and used in ways that respect values, rights, and regulations. According to the EU AI Act, use case governance refers to the process of overseeing and managing the development, deployment, and use of AI systems in specific contexts or applications. This includes:
- Risk assessment: Identifying and evaluating potential risks associated with AI systems.
- Mitigation strategies: Implementing measures to minimize or eliminate risks.
- Transparency and explainability: Making sure that AI systems are transparent, explainable, and accountable.
- Human oversight: Including human involvement in AI decision-making processes.
- Monitoring and evaluation: Continuously monitoring and evaluating AI systems to help ensure compliance with regulations and ethical standards.
Model governance involves overseeing the development, deployment, and maintenance of ML models to help ensure that they meet business objectives and are accurate, fair, and compliant with regulations. It includes processes for monitoring model performance, managing risks, ensuring data quality, and maintaining transparency and accountability throughout the model’s lifecycle. In AWS, these model lifecycle activities can be performed over multiple AWS accounts (for example, development, test, and production accounts) at the use case or business unit level. However, model governance functions in an organization are centralized and to perform those functions, teams need access to metadata about model lifecycle activities across those accounts for validation, approval, auditing, and monitoring to manage risk and compliance.
Use case and model governance plays a crucial role in implementing responsible AI and helps with the reliability, fairness, compliance, and risk management of ML models across use cases in the organization. It helps prevent biases, manage risks, protect against misuse, and maintain transparency. By establishing robust oversight, organizations can build trust, meet regulatory requirements, and help ensure ethical use of AI technologies.
Use case and model lifecycle governance overview
In the context of regulations such as the European Union’s Artificial Intelligence Act (EU AI Act), a use case refers to a specific application or scenario where AI is used to achieve a particular goal or solve a problem. The EU AI Act proposes to regulate AI systems based on their intended use cases, which are categorized into four levels of risk:
- Unacceptable risk: Significant threat to safety, livelihoods, or rights
- High risk: Significant impacts on lives (for example, use of AI in healthcare and transportation)
- Limited risk: Minimal impacts (for example, chatbots and virtual assistants)
- Minimal risk: Negligible risks (for example, entertainment and gaming)
An AI system is built to satisfy a use case such as credit risk, which can be comprised of workflows orchestrated with one or more ML models, such as credit risk and fraud detection models. You can build a use case (or AI system) using existing models, newly built models, or a combination of both. Regardless of how the AI system is built, governance will be applied at the AI system level, where use case decisions (for example, denying a loan application) are being made. However, explaining why that decision was made requires next-level detailed reports from each affected model component of that AI system. Therefore, governance applies both at the use case and model level and is driven by each of their lifecycle stages.
Use case lifecycle stages
A use case has its own set of lifecycle stages from development through deployment to production, shown in the following figure. A use case typically starts with an experimentation or proof-of-concept (POC) stage where the idea is explored for feasibility. When the use case is determined to be feasible, it’s approved and moves to the next stage for development. The use case is then developed using various components including ML models and unit testing, and then moved to the next stage—quality assurance (QA)—after approval. Next, the use case is tested, validated, and approved to be moved to the pre-production stage where it’s A/B tested with production-like settings and approved for the next stage. Now, the use case is deployed and operational in production. When the use case is no longer needed for business, it’s retired and decommissioned. Even though these stages are depicted as linear in the diagram, they are frequently iterative.
Model lifecycle stages
When an ML model is developed it goes through a similar set of lifecycle stages as a use case. In the case of an ML model, shown in the following figure, the lifecycle starts with the development or candidate model. Prior to that stage, there would be several experiments performed to build the candidate model. From a governance perspective, tracking starts from the candidate or dev model stage. After approval in dev, the model moves into the QA stage where it’s validated and integration tested to make sure that it meets the use case requirements and then is approved for promotion to the next stage. The model is then A/B tested along with the use case in pre-production with production-like data settings and approved for deployment to the next stage. The model is finally deployed to production. When the model is no longer needed, it’s retired and removed from deployed endpoints.
Stage status types
In the preceding use case and model stages discussion, we mentioned approving the model to go to the next stage. However, there are two other possible states—pending and rejected, as depicted in the following figure. These stages are applicable to both use case and model stages. For example, a use case that’s been moved from the QA stage to pre-production could be rejected and sent back to the development stage for rework because of missing documentation related to meeting certain regulatory controls.
Multi-account architecture for sharing models
A multi-account strategy improves security, scalability, and reliability of your systems. It also helps achieve data, project, and team isolation while supporting software development lifecycle best practices. Cross-account model sharing supports a multi-account strategy, removing the overhead of assuming roles into multiple accounts. Furthermore, sharing model resources directly across multiple accounts helps improve ML model approval, deployment, and auditing.
The following diagram depicts an architecture for centralizing model governance using AWS RAM to share models through a SageMaker Model Group, a core construct within SageMaker Model Registry where you register your model versions.
In the architecture presented in the preceding figure, the use case stakeholder, data scientist (DS) and ML engineer (MLE) perform the following steps:
- The use case stakeholder, that is the DS team lead, receives the request to build an AI use case such as credit risk from their line of business lead.
- The DS team lead records the credit risk use case in the POC stage in the stage governance table.
- The MLE is notified to set up a model group for new model development. The MLE creates the necessary infrastructure pipeline to set up a new model group.
- The MLE sets up the pipeline to share the model group with the necessary permissions (create and update the model version) to the ML project team’s development account. Optionally, this model group can also be shared with their test and production accounts if local account access to model versions is needed.
- The DS uses SageMaker Training jobs to generate metrics captured by SageMaker with MLflow, selects a candidate model, and registers the model version inside the shared model group in their local model registry.
- Because this is a shared model group, the actual model version will be recorded in the shared services account model registry and a link will be maintained in the development account. The Amazon S3 model artifacts associated with the model will be copied to the shared services account when the model is registered in the shared services model registry.
- The model group and associated model version will be synced into the model stage governance Amazon DynamoDB table with attributes such as model group, model version, model stage (development, test, production, and so on), model status (pending, approved, or rejected), and model metrics (in JSON format). The ML admin sets up this table with the necessary attributes based on their central governance requirements.
- The model version is approved for deployment into the test stage and is deployed into the test account along with the necessary infrastructure for invoking the model, such as Amazon API Gateway and AWS Lambda functions.
- The model is integration tested in the test environment and model test metrics are updated in the model stage governance table.
- Model test results are validated, and the model version is approved for deployment into the production stage and is deployed into the production account along with the necessary infrastructure for invoking the model, such as API Gateway and Lambda functions.
- The model is A/B tested or optionally shadow tested in the production environment and model production metrics are updated in the model stage governance table. When satisfactory production results are attained, the model version is rolled out in the production environment.
- The model governance (compliance) officer uses the governance dashboard to act on model governance functions such as reviewing the model to validate compliance and monitoring for risk mitigation.
Building a central model registry using model group resource sharing
Model group resource sharing makes it possible to build a central model registry with a few clicks or API calls, without needing to write complex IAM policies. We will demonstrate how to set up a central model registry based on the architecture we described in the previous sections. We will start by using the SageMaker Studio UI and then by using APIs. In both cases, we will demonstrate how to create a model package group in the ML Shared Services account (Account A) and share it with the ML Dev account (Account B) so that any updates to model versions in Account B automatically update the corresponding model versions in Account A.
Prerequisites
You need to have the following prerequisites in place to implement the solution in this post.
- Two AWS accounts: one for development and another for shared services
- An IAM role with access to create, update, and delete SageMaker resources
- Optional: enable Resource Sharing within AWS Organizations
After you have the prerequisites set up, start by creating and sharing a model group across accounts. The basic steps are:
- In Account A, create a model group.
- In Account A, create a resource share for the model group, and then attach permissions and specify the target account to share the resource. Permissions can be standard or custom.
- Account B should accept the resource sharing invitation to start using the shared resource from Account A.
- Optionally, if Account A and Account B are part of the same organization in AWS Organizations, and resource sharing is enabled within AWS Organizations, the resource sharing invitation is accepted automatically without any manual intervention.
Create and share a model group across accounts using SageMaker Studio
The following section shows how to use SageMaker Studio to share models in a multi-account environment to build a central model registry. The following instructions use the AWS Management Console for SageMaker Studio to create a model package group in the shared services account and share it, with the necessary permissions, with the ML Dev account.
To use the console to create and share a model package:
- In the SageMaker Studio console, sign in to Account A and navigate to the model registry, select the model package group (in this example, `credit-risk-package-group-1724904598`), and choose Share.
- In Account A, select the appropriate permissions to share the model package group with Account B. If you need to allow a custom policy, navigate to the AWS RAM console and create the policy.
- After selecting the permission policy, specify Account B (and any other accounts) to share the resource, then choose Share.
- In Account B, navigate to the model registry, choose Shared with me, and then choose View pending approvals to see the model shared from Account A.
- Accept the model invitation from Account A to access the shared model package group and its versions. When accounts are set up in the same organization, invitations will be accepted without requiring user intervention.
Create and share the model group across accounts using APIs
The following section shows how to use APIs to share models in a multi-account environment to build a central model registry. Create a model package group in the ML Shared Services account (Account A) and share it with the ML Dev account (Account B).
The following steps use APIs to create and share a model package group across accounts.
- In Account A, create a model package group.
- In Account A, if needed, create custom sharing permissions; otherwise use standard sharing permissions.
- In Account A, create a resource share for the model package group, attach permissions, and specify the target account to share the resource.
- In Account B, accept the resource sharing invitation to start using the resource.
- If Account A and B are part of the same organization, then the resource sharing invitation can be accepted without any manual intervention.
Run the following code in the ML Shared Services account (Account A).
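The exact code from the sample is not reproduced here; the following Boto3 sketch illustrates the Account A steps, creating the model package group and sharing it through AWS RAM. The account ID and resource names are placeholders.

```python
import boto3

sagemaker = boto3.client("sagemaker")
ram = boto3.client("ram")

ACCOUNT_B_ID = "222233334444"  # placeholder ML Dev account ID

# 1. Create the model package group in the ML Shared Services account
group = sagemaker.create_model_package_group(
    ModelPackageGroupName="credit-risk-package-group",
    ModelPackageGroupDescription="Central model group for the credit risk use case",
)
group_arn = group["ModelPackageGroupArn"]

# 2. Create a resource share for the model package group and grant Account B access.
#    Without permissionArns, AWS RAM attaches the default permission for the resource type.
share = ram.create_resource_share(
    name="credit-risk-model-group-share",
    resourceArns=[group_arn],
    principals=[ACCOUNT_B_ID],
)
print(share["resourceShare"]["resourceShareArn"])
```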
Run the following code in the ML Dev account (Account B).
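Similarly, a minimal sketch for Account B (only needed when the accounts are not in the same organization with resource sharing enabled) lists pending AWS RAM invitations and accepts them.

```python
import boto3

ram = boto3.client("ram")

# Find resource share invitations sent to this account and accept any pending ones
invitations = ram.get_resource_share_invitations()["resourceShareInvitations"]

for invitation in invitations:
    if invitation["status"] == "PENDING":
        ram.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation["resourceShareInvitationArn"]
        )
        print(f"Accepted: {invitation['resourceShareName']}")
```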
MLflow experimentation with the shared model group
The following section shows you how to use Amazon SageMaker with MLflow to track your experiments in the development account and save candidate models in the shared model group while developing a credit risk model. It’s a binary classification problem where the goal is to predict whether a customer is a credit risk. If you want to run the code in your own environment, check out the notebook in this GitHub repository.
SageMaker with MLflow is a capability of SageMaker that you can use to create, manage, analyze, and compare your ML experiments. To get started with MLflow, you need to set up an MLflow tracking server to monitor your experiments and runs. You can set up the server programmatically or by using the SageMaker Studio UI. It can take up to 20 minutes for the setup to complete. The following code snippet shows how to create a tracking server.
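A sketch of that call using Boto3 is shown below; the server name, artifact store URI, and role ARN are placeholders.

```python
import boto3

sagemaker = boto3.client("sagemaker")

response = sagemaker.create_mlflow_tracking_server(
    TrackingServerName="credit-risk-mlflow-server",              # placeholder name
    ArtifactStoreUri="s3://your-mlflow-artifacts-bucket/",        # placeholder S3 URI
    RoleArn="arn:aws:iam::111122223333:role/MlflowTrackingRole",  # placeholder role
    TrackingServerSize="Small",
)
print(response["TrackingServerArn"])
```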
To set up an MLflow tracking server in SageMaker Studio, choose the MLflow application icon. When your server is running, click the ellipsis button and then click the Open MLflow button to open the MLflow UI.
Now that your MLflow tracking server is running, you can start tracking your experiments. MLflow tracking allows you to programmatically track the inputs, parameters, configurations, and models of your iterations as `experiments` and `runs`.
- Runs are executions of some piece of data science code and record metadata and generated artifacts.
- An experiment collects multiple runs with the same objective.
The following code shows you how to set up an experiment and track your executions while developing the credit risk model.
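A minimal sketch of that setup is shown below, assuming the MLflow client with the tracking server ARN created earlier (and the sagemaker-mlflow plugin for ARN-based authentication). The names and values are placeholders.

```python
import mlflow

# Point the MLflow client at the SageMaker managed tracking server
tracking_server_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333:mlflow-tracking-server/credit-risk-mlflow-server"
)
mlflow.set_tracking_uri(tracking_server_arn)

experiment_name = "credit-risk-model-experiment"
mlflow.set_experiment(experiment_name)

# Each run records parameters, metrics, and artifacts for one iteration
with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model_type", "xgboost")
    mlflow.log_metric("validation_auc", 0.82)  # illustrative value
```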
Data preparation
For this example, you will use the open source South German Credit dataset. To use the dataset to train the model, you need to first do some pre-processing. You can run the pre-processing code in your JupyterLab application or on a SageMaker ephemeral cluster as a SageMaker Training job using the `@remote` decorator. In both cases, you can track your experiments using MLflow.
The following code demonstrates how to track your experiments when executing your code on a SageMaker ephemeral cluster using the `@remote` decorator. To get started, set up a name for your experiment.
The processing script creates a new MLflow active experiment by calling the `mlflow.set_experiment()` method with the experiment name above. After that, it invokes `mlflow.start_run()` to launch an MLflow run under that experiment.
You can also log the input dataset and the sklearn model used to fit the training set during pre-processing as part of the same script.
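A condensed sketch of such a preprocessing function is shown below, assuming the SageMaker Python SDK `@remote` decorator and scikit-learn. The dataset path, label column name, instance type, and tracking server ARN are all placeholders, and the sample notebook may structure this step differently.

```python
import mlflow
import pandas as pd
from sagemaker.remote_function import remote
from sklearn.preprocessing import StandardScaler

experiment_name = "credit-risk-model-experiment"

# Dependencies such as mlflow must be available in the remote environment
@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def preprocess(raw_data_uri: str):
    mlflow.set_tracking_uri("<tracking-server-arn>")  # placeholder
    mlflow.set_experiment(experiment_name)

    with mlflow.start_run(run_name="remote-processing"):
        # Reading an s3:// path with pandas requires s3fs in the remote environment
        df = pd.read_csv(raw_data_uri)

        # Log the input dataset used for preprocessing
        mlflow.log_input(mlflow.data.from_pandas(df, source=raw_data_uri))

        scaler = StandardScaler()
        features = scaler.fit_transform(df.drop(columns=["credit_risk"]))  # placeholder label column

        # Log the fitted sklearn transformer as an MLflow model
        mlflow.sklearn.log_model(scaler, artifact_path="scaler")

        return features, df["credit_risk"].values

features, labels = preprocess("s3://your-bucket/south-german-credit.csv")
```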
In the MLflow UI, use the Experiments view to locate your experiment. Its name should start with “credit-risk-model-experiment”.
Click on the experiment name to reveal the table with the associated Runs, and then click on the Run whose name starts with “remote-processing”. You will see its details as shown in the following figure.
Click on the Artifacts tab to see the generated MLflow model.
Model training
You can continue experimenting with different feature engineering techniques in your JupyterLab environment and track your experiments in MLflow. After you have completed the data preparation step, it’s time to train the classification model. You can use the xgboost algorithm for this purpose and run your code either in your JupyterLab environment or as a SageMaker Training job. Again, you can track your experiments using MLflow in both cases. The following example shows how to use MLflow with a SageMaker Training job in your code. You can use the `mlflow.autolog()` method to log metrics, parameters, and models without the need for explicit log statements. In addition, you can use the `mlflow.log_artifact()` method to save the `model.tar.gz` file in MLflow so that you can directly use it later when you register the model to the model registry.
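A sketch of how those two calls could be combined in a training function is shown below, assuming xgboost and the `@remote` decorator from the previous step. The hyperparameters, paths, and tracking server ARN are placeholders.

```python
import tarfile

import mlflow
import xgboost as xgb
from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def train(features, labels):
    mlflow.set_tracking_uri("<tracking-server-arn>")  # placeholder
    mlflow.set_experiment("credit-risk-model-experiment")
    mlflow.autolog()  # logs xgboost parameters, metrics, and the model automatically

    with mlflow.start_run(run_name="remote-training"):
        dtrain = xgb.DMatrix(features, label=labels)
        booster = xgb.train(
            {"objective": "binary:logistic", "max_depth": 5},  # placeholder hyperparameters
            dtrain,
            num_boost_round=100,
        )

        # Package the model the way SageMaker expects and keep it with the run
        booster.save_model("xgboost-model")
        with tarfile.open("model.tar.gz", "w:gz") as tar:
            tar.add("xgboost-model")
        mlflow.log_artifact("model.tar.gz")

train(features, labels)
```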
Navigate back to the MLflow UI. Click on the name of your experiment at the top of your screen starting with “credit-risk-model-experiment” to see the updated Runs table. Click on the name of your remote-training Run to see the overview of the training run including the associated hyperparameters, model metrics, and generated model artifacts.
The following figure shows the overview of a training run.
Click on the Model metrics tab to view the metrics tracked during the training run. The figure below shows the metrics of a training run.
Click on the Artifacts tab to view the artifacts generated during the training run. The following figure shows an example of the generated artifacts.
Registering the model to the model registry
ML experimentation is an iterative process and you typically end up with a number of candidate models. With MLflow, you can compare these models to identify the one that you want to move to quality assurance for approval. The following is an example of how to retrieve the best candidate using the MLflow API based on a specific metric.
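For instance, a sketch like the following uses `mlflow.search_runs` to pick the run with the best validation AUC; the experiment name and metric key are assumptions carried over from the earlier steps.

```python
import mlflow

mlflow.set_tracking_uri("<tracking-server-arn>")  # placeholder

# Return runs sorted by the chosen metric, best first
best_run = mlflow.search_runs(
    experiment_names=["credit-risk-model-experiment"],
    order_by=["metrics.validation_auc DESC"],
    max_results=1,
).iloc[0]

print(best_run["run_id"], best_run["metrics.validation_auc"])
```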
After you have selected a model, you can register it to the shared model group in the shared services account. You can discover the model groups that are available to you either through the SageMaker Studio UI or programmatically.
The final step is to register the candidate model to the model group as a new model version.
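A sketch of that registration call with Boto3 could look like the following. The shared group ARN, inference image URI, and model data location are placeholders, and the sample notebook may register the model in a slightly different way.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# ARN of the model package group shared from the ML Shared Services account
SHARED_GROUP_ARN = (
    "arn:aws:sagemaker:us-east-1:111122223333:model-package-group/credit-risk-package-group"
)

response = sagemaker.create_model_package(
    ModelPackageGroupName=SHARED_GROUP_ARN,
    ModelPackageDescription="Candidate credit risk model from MLflow run",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<xgboost-inference-image-uri>",               # placeholder
                "ModelDataUrl": "s3://your-bucket/model/model.tar.gz",  # placeholder
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)
print(response["ModelPackageArn"])
```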
Design considerations for use case and model stage governance
Use case and model stage governance is a construct to track governance information of a use case or model across various stages in its journey to production. Also, periodic tracking of key model performance and drift metrics is used to surface those metrics for governance functions.
There are several use case and model stage governance attributes that need to be tracked, such as the following:
- Use case ID: Unique ID of the use case.
- Use case name: Name of the use case.
- Use case stage: Current stage of the use case. For example, proof of concept, development, QA, and so on.
- Model group: SageMaker model group name.
- Model version: SageMaker model version name.
- Model owner: Person or entity who owns the model.
- Model LoB: Model owner’s line of business.
- Model project: Project or use case that the model is part of.
- Model stage: Stage where the model version is deployed. For example, development, test, or production.
- Model status: Status of the model version in a given stage. For example, pending or approved.
- Model risk: Risk categorization of the model version. For example, high, medium, or low.
- Model validation metrics: Model validation metrics in JSON format.
- Model monitoring metrics: Model monitoring metrics in JSON format. This needs to include the endpoint from which these metrics were captured.
- Model audit timestamp: Timestamp when this record was updated.
- Model audit user: User who updated this record.
Create a use case or model stage governance construct with the preceding set of attributes and drive your deployment and governance workflows using this table. Next, we will describe the design considerations for deployment and governance workflows.
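As a sketch of how such a record could be written to the Amazon DynamoDB table described in the architecture, consider the following; the table name, key schema, and attribute values are assumptions, not part of the original walkthrough.

```python
import json
from datetime import datetime, timezone

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("model-stage-governance")  # placeholder table name

# One governance record per model version and stage
table.put_item(
    Item={
        "use_case_id": "uc-credit-risk-001",
        "use_case_name": "credit-risk",
        "use_case_stage": "development",
        "model_group": "credit-risk-package-group",
        "model_version": "1",
        "model_owner": "workload-1-development-team",
        "model_lob": "finance",
        "model_project": "credit-risk",
        "model_stage": "development",
        "model_status": "pending",
        "model_risk": "high",
        "model_validation_metrics": json.dumps({"auc": 0.82}),  # illustrative
        "model_audit_timestamp": datetime.now(timezone.utc).isoformat(),
        "model_audit_user": "ml-admin",
    }
)
```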
Design considerations for deployment and governance workflows
Following are the design considerations for the deployment and governance workflows:
- The model version is built in the development account and registered with pending status in the central model registry or model group.
- A sync process is triggered to capture the key model attributes, derive additional governance attributes, and create a development stage record in the model governance stage table. Model artifacts from the development account are synced into the central model registry account.
- The model owner approves the model version in the development stage for deployment to the test stage in the central model registry.
- A deployment pipeline is triggered and the model is deployed to the test environment and a new test stage record is created for that model version.
- The model version is tested and validated in the test environment and validation metrics are captured in the test stage record in the model governance stage construct.
- The governance officer verifies the model validation results and approves the model version for deployment to production. The production stage record is created for the model version in the model governance stage table.
- A deployment pipeline is triggered and the model is deployed to the production environment and the production stage record model status is updated to deployed for that model version.
- After the model monitoring jobs are set up, model inference metrics are periodically captured and aggregated, and model metrics are updated in the model stage governance table.
- The use case stage value is updated to the next stage when all models for that use case are approved in the previous stage.
Conclusion
In this post, we have discussed how to centralize your use case and model governance function in a multi-account environment using the new model group sharing feature of SageMaker Model Registry. We shared an architecture for setting up central use case and model governance and walked through the steps involved in building that architecture. We provided practical guidance for setting up cross-account model group sharing using SageMaker Studio and APIs. Finally, we discussed key design considerations for building the centralized use case and model governance functions to extend the native SageMaker capabilities. We encourage you to try this model-sharing feature along with centralizing your use case and model governance functions. You can leave feedback in the comments section.
About the authors
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.
Anastasia Tzeveleka is a Senior Generative AI/ML Specialist Solutions Architect at AWS. As part of her work, she helps customers across EMEA build foundation models and create scalable generative AI and machine learning solutions using AWS services.
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.
Madhubalasri B. is a Software Development Engineer at Amazon Web Services (AWS), focusing on the SageMaker Model Registry and machine learning governance domain. She has expertise in cross-account access and model sharing, ensuring secure, scalable, and compliant deployment of machine learning models. Madhubalasri is dedicated to driving innovation in ML governance and optimizing model management processes
Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.
Keshav Chandak is a Software Engineer at AWS with a focus on the SageMaker Repository Service. He specializes in developing capabilities to enhance governance and management of ML models.
Revolutionize trip planning with Amazon Bedrock and Amazon Location Service
Have you ever stumbled upon a breathtaking travel photo and instantly wondered where it was and how to get there? With 1.3 billion international arrivals in 2023, international travel is poised to exceed pre-pandemic levels and break tourism records in the coming years. Each of these millions of travelers needs to plan where they’ll stay, what they’ll see, and how they’ll get from place to place. This is where AWS and generative AI can revolutionize the way we plan and prepare for our next adventure. With the significant developments in the field of generative AI, intelligent applications powered by foundation models (FMs) can help users map out an itinerary through an intuitive natural conversation interface. It’s like having your own personal travel agent whenever you need it.
Amazon Bedrock is the place to start when building applications that will amaze and inspire your users. Amazon Bedrock is a fully managed service that empowers developers with an uncomplicated solution to build and scale generative AI applications by offering a choice of high-performing FMs from leading companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities that you need to build generative AI applications with security, privacy, and responsible AI. It enables you to privately customize the FM of your choice with your data using techniques such as fine-tuning, prompt engineering, and retrieval augmented generation (RAG) and build agents that run tasks using your enterprise systems and data sources while adhering to security and privacy requirements.
In this post, we show you how to build a generative AI-powered trip-planning service that revolutionizes the way travelers discover and explore destinations. By using advanced AI technology and Amazon Location Service, the trip planner lets users translate inspiration into personalized travel itineraries. This innovative service goes beyond traditional trip planning methods, offering real-time interaction through a chat-based interface and maintaining scalability, reliability, and data security through AWS native services.
Architecture
The following figure shows the architecture of the solution.
The workflow of the solution uses the following steps.
- A user interacts with an AWS Amplify frontend to initiate a trip planning request, either through text or by uploading an image. The user can access and interact with the generated trip itinerary through the frontend application, which includes visualizations on maps powered by Amazon Location Service and Amplify.
- If an image is uploaded, it is stored in Amazon Simple Storage Service (Amazon S3), and a custom AWS Lambda function uses a machine learning model deployed on Amazon SageMaker to analyze the image, extract a list of place names with a similarity score for each, and return the place name with the highest similarity score. The user’s request is sent to Amazon API Gateway, which triggers a Lambda function to interact with Amazon Bedrock using Anthropic’s Claude Instant V1 FM to process the request and generate a natural language response describing the place location.
- If the user interacts using text, the request triggers the Amazon Bedrock FM directly, providing the natural language response of the place location.
- Amazon Location Service is integrated to provide precise location data (location coordinates) based on the place name. If the user prompt includes suggestions such as searching for places of interest (POIs), it also pinpoints these POIs on the map within the chat interface.
- A Lambda function combines the generative AI response from Amazon Bedrock with the location data from Amazon Location Service to create a personalized and context-aware trip itinerary.
- The conversation history of the user is stored in Amazon DynamoDB.
Core benefits of Amazon Bedrock and Amazon Location Service
Amazon Bedrock provides capabilities to build generative AI applications with security, privacy, and responsible AI practices. Being serverless, it allows secure integration and deployment of generative AI capabilities without managing infrastructure.
Amazon Location Service offers cost-effective, high-quality location-based services. It provides geospatial data based on coordinates, enabling accurate mapping, geofencing, and tracking capabilities for various applications. With a single API across multiple providers, it offers seamless integration, flexibility, and efficient application development with built-in health monitoring and AWS service integration.
By integrating Amazon Bedrock with Amazon Location Service, the virtual trip planning application uses the strengths of both services. Amazon Bedrock enables the use of top FMs for specific use cases and customization for generating contextual responses, while Amazon Location Service provides location data and mapping capabilities. This integration offers tailored trip recommendations through engaging responses powered by generative AI and intuitive visualization on maps.
Key features
Other currently available search engines often require multiple customer touch points and actions to gather information; this virtual trip planner streamlines the process into a seamless, intuitive experience. With a few clicks, users can access location coordinates, personalized itineraries, and real-time assistance, eliminating the need for cumbersome navigation across various sites and internet tabs. These features are presented in a web UI that was designed as a one-stop solution for our users. The following figure shows the start of a trip-planning chat.
Within this innovative generative AI solution, there’s a key feature of chat-based natural language interaction that enhances the solution by introducing a user-friendly conversational interface. This capability enables users to engage in dynamic conversations, articulating their preferences, interests, and constraints in a conversational manner. Notably, this functionality eliminates the need for navigating through complex tasks solely to plan a trip, fostering a more personalized and human-like interaction. The following figure shows the first question and response in the solution.
This application uses Anthropic’s Claude Instant V1 on Amazon Bedrock, where it’s designed to respond with context-specific insights, dynamically adapting to the ongoing conversation. Through natural language processing algorithms and machine learning techniques, the large language model (LLM) analyzes the user’s queries in real time, extracting relevant context and intent to deliver tailored responses. Whether the user is seeking recommendations for accommodations, exploring POIs, or inquiring about transportation options, the model will use contextual understanding to provide accurate and personalized assistance to the user. This responsiveness makes sure that each interaction feels intuitive and fluid, mimicking the experience of conversing with a knowledgeable travel expert who anticipates and addresses the user’s needs seamlessly throughout the conversation. The following figure shows the continuation of the interaction and depicts the user’s question, the response, and a map that reflects the information in the response.
This innovative feature harmonizes with the evolving needs of users, providing a comprehensive solution that significantly enhances the overall travel experience. The user-centric approach of the solution reflects a commitment to simplifying the trip planning process, allowing travelers to seamlessly translate inspiration into personalized and enjoyable travel itineraries.
Project conceptual walkthrough: Virtual trip planner
LangChain is a framework for developing applications powered by LLMs. It helps you build context-aware applications by connecting an LLM to sources of context (prompt instructions, few-shot examples, and other content that grounds its responses to users), and it relies on the LLM to reason, for example to decide how to answer based on the provided context and which actions to take.
LangChain enables you to create your own customized agent. The core idea of agents is to use a language model to choose a sequence of actions to take. These actions can invoke functions ranging from a simple calculation to a complex internet search or API call. You can write a prompt template, provide a list of tool names the agent can use, and ask the agent to make a decision based on certain inputs. The agent uses the power of an LLM to determine which function to execute and outputs the result based on the prompt’s guidance. Here is an example from LangChain.
The following code snippet imports the necessary libraries, including the Amazon Bedrock integration from LangChain’s LLM modules. It then initializes the Amazon Bedrock client with the necessary parameters, including the model_id, region_name, and model keyword arguments.
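A minimal sketch of that setup follows (the Region and keyword argument values shown here are illustrative, and the exact import path depends on your LangChain version):

```python
import boto3
from langchain.llms.bedrock import Bedrock  # newer versions: from langchain_community.llms import Bedrock

# Create a Bedrock runtime client in the Region where the model is enabled
bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Initialize the LangChain wrapper around Anthropic's Claude Instant on Amazon Bedrock
llm = Bedrock(
    model_id="anthropic.claude-instant-v1",
    client=bedrock_client,
    model_kwargs={
        "max_tokens_to_sample": 1024,  # cap on generated tokens
        "temperature": 0.2,            # keep itinerary answers focused
    },
)
```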
Amazon Location Service offers cost-effective location-based services (LBS) with high-quality data from trusted providers like Esri, HERE, and GrabMaps. This enables developers to build advanced location-enabled applications with location data and functionality such as maps, POIs, geocoding, routing, geofences, tracking, and health monitoring metrics. This virtual trip planner highlights the integration of Amazon Location Service with Amazon Bedrock to build location-enabled applications.
The tool SearchPlaceIndexForText from Amazon Location Service enables users to geocode free-form text, such as addresses, names, cities, or regions, facilitating the search for places or POIs. By using optional parameters, such as bounding box or country filters, and biasing searches towards specific positions globally, users can refine their search results. Notably, the tool allows users to search for places near a given position using BiasPosition or filter results within a bounding box using FilterBBox. The search results are presented in descending order of relevance, providing users with a list of POIs along with their coordinates for visualization on maps in the user interface.
To use this functionality, the user input needs to be translated into the appropriate Action and parameters required by Amazon Location Service. For instance, if a user enters “Find coffee shops near Central Park, New York City,” the application would parse this input and convert it into the corresponding Action and parameters for the SearchPlaceIndexForText tool. This could involve setting the search text to “coffee shops”, the BiasPosition to the coordinates of Central Park, and potentially applying filters or bounding boxes to narrow down the search area.
After the user input is translated into the required Action and parameters, Amazon Location Service processes the request and provides the relevant location coordinates and names of coffee shops near Central Park. This information is then passed to the generative AI component of the application, which uses it to generate human-friendly responses or visualizations for the user interface.
By seamlessly integrating Amazon Location Service with generative AI, the application delivers a natural and intuitive experience for users, allowing them to search for places using conversational language while using its powerful geocoding capabilities.
The following code sample is a function that queries locations using Amazon Location Service and takes parameters such as the index name, text, country, maximum results, categories, and region. The function initializes the Amazon Location Service client, sets search parameters based on the input, performs the search using search_place_index_for_text, and returns the results. In case of a ResourceNotFoundException, it prints an error message and returns an empty list.
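A minimal sketch of such a function might look like the following (the Python-side parameter names and default values are assumptions; the underlying boto3 call is search_place_index_for_text):

```python
import boto3

def query_locations(index_name, text, country=None, max_results=5,
                    categories=None, region="us-east-1"):
    """Search an Amazon Location Service place index for free-form text."""
    location = boto3.client("location", region_name=region)

    params = {
        "IndexName": index_name,
        "Text": text,
        "MaxResults": max_results,
    }
    if country:
        params["FilterCountries"] = [country]    # ISO 3166-1 alpha-3 code, for example "USA"
    if categories:
        params["FilterCategories"] = categories  # category names from the Amazon Location category list

    try:
        response = location.search_place_index_for_text(**params)
        return response.get("Results", [])
    except location.exceptions.ResourceNotFoundException as err:
        print(f"Place index {index_name} was not found: {err}")
        return []
```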
In a LangChain agent, an LLM is used as a reasoning engine to determine which actions to take and in which order. You can also customize the prompt template (known as prompt engineering) to make the model generate the desired contents. Building an agent requires you to customize the agent methods to form an appropriate prompt. LangChain provides some prebuilt classes, such as ConversationalChatAgent. In the following code snippet, a ConversationalChatAgent class that inherits the Agent class from LangChain is defined. The class definition is similar to the LangChain ConversationalChatAgent class.
Note the from_llm_and_tools method. This method creates a prompt template and initializes an LLM that uses the prompt and the provided functionalities (tools) to generate the content. This is where you can customize the prompt template. cls.create_prompt creates a BasePromptTemplate object, which is the object LangChain uses to create the actual prompt for the LLM. Instead of relying on the provided BasePromptTemplate object, you can modify the human_message and system_message values in the from_llm_and_tools method, as shown in the preceding example.
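As a rough sketch of how that call can be wired up in this solution (the system message text and memory configuration here are illustrative; llm is the Bedrock LLM initialized earlier and tools is the list of tools defined in the following sections):

```python
from langchain.agents import AgentExecutor, ConversationalChatAgent
from langchain.memory import ConversationBufferMemory

# `llm` is the Bedrock LLM created earlier; `tools` is a list of BaseTool instances
# (for example, the place-search tools described next).
agent = ConversationalChatAgent.from_llm_and_tools(
    llm=llm,
    tools=tools,
    # Override the default system message to steer the agent toward trip planning.
    # The human_message template can be overridden in the same way if needed.
    system_message=(
        "You are a helpful virtual travel planner. Use the available tools to find "
        "places and points of interest, and answer with concise itinerary suggestions."
    ),
)

# Keep the conversation history so follow-up questions stay in context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent_executor = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, memory=memory, verbose=True
)
```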
As you progress through the code, it’s important to understand how the different components work together to create an agent capable of finding POIs based on user queries.
The following code defines a class named FindPOIsByCountry, which is a subclass of BaseTool. This class is designed to find POIs in a specific country and includes a description of when to use the tool and examples of queries that it can handle. The _run method within this class takes a query and attempts to identify the country mentioned in the query using the pycountry library. It then calls the query_locations function, passing parameters such as the Amazon Location Service index name, text (query), maximum results, categories of interest (for example, amusement park or museum), identified country code, and region.
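A simplified version of such a tool could look like the following (the tool name, place index name, and category filters are placeholders, and query_locations is the helper sketched earlier):

```python
import pycountry
from langchain.tools import BaseTool

class FindPOIsByCountry(BaseTool):
    """Find points of interest in the country mentioned in the user's query."""

    name = "find_pois_by_country"
    description = (
        "Useful for finding attractions or points of interest in a specific country, "
        "for example: 'What are some museums to visit in Japan?'"
    )

    def _run(self, query: str) -> list:
        # Try to identify the country mentioned in the query
        country_code = None
        for country in pycountry.countries:
            if country.name.lower() in query.lower():
                country_code = country.alpha_3  # Amazon Location expects ISO 3166-1 alpha-3 codes
                break

        return query_locations(
            index_name="TripPlannerPlaceIndex",              # hypothetical place index name
            text=query,
            country=country_code,
            max_results=5,
            categories=["AmusementParkType", "MuseumType"],  # illustrative category filters
        )

    async def _arun(self, query: str):
        raise NotImplementedError("Async execution is not supported in this sketch.")
```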
We take this further by implementing a query_nearby_locations feature that uses BiasPosition and FilterBBox. BiasPosition is an optional parameter from Amazon Location Service that indicates a preference for places that are closer to a specified position. FilterBBox is an optional parameter that limits the search results by returning only places that are within the provided bounding box. By setting a 10 km radius around the bias position, it narrows the search to locations within this range. The key difference lies in how locations are filtered based on proximity in query_nearby_locations.
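One way to sketch this behavior is to derive a roughly 10 km bounding box from the bias position and pass it as FilterBBox (the helper below and its defaults are illustrative, not the solution's exact implementation):

```python
import boto3

def query_nearby_locations(index_name, text, bias_position, radius_km=10,
                           max_results=5, region="us-east-1"):
    """Search for places near a position by limiting results to a bounding box
    of roughly radius_km around the bias position (1 degree is about 111 km)."""
    location = boto3.client("location", region_name=region)

    lon, lat = bias_position  # Amazon Location uses [longitude, latitude] ordering
    delta = radius_km / 111.0
    bbox = [lon - delta, lat - delta, lon + delta, lat + delta]  # [min lon, min lat, max lon, max lat]

    try:
        response = location.search_place_index_for_text(
            IndexName=index_name,
            Text=text,
            FilterBBox=bbox,
            MaxResults=max_results,
        )
        return response.get("Results", [])
    except location.exceptions.ResourceNotFoundException as err:
        print(f"Place index {index_name} was not found: {err}")
        return []
```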
The functions discussed in this section can be integrated into the backend of your project, tightly coupled with the generative AI component. With the implementation details and guidance provided, you can use the power of Amazon Location Service and LangChain’s generative AI capabilities to build a conversational application that allows users to search for nearby points of interest using natural language queries. By integrating the query_nearby_locations function, parsing user input, customizing the LangChain agent’s prompt template, and developing a user-friendly interface, you can create an intuitive experience where users can discover relevant locations within specified proximities or bounding boxes. As you build your application, focus on implementing robust error handling, considering edge cases, and thoroughly testing the application before deploying it to a production environment. With this foundation, you can create innovative location-based applications that seamlessly blend the power of Amazon Location Service and Amazon Bedrock using Anthropic’s Claude Instant V1.
Conclusion
Harnessing the power of generative AI enables this web solution to interpret user queries and dynamically generate personalized travel itineraries. This application offers a user-friendly experience, where users can interact with the system through a chat-based interface that provides relevant responses based on context. This application serves as a transformative tool that seamlessly guides users to discover more information about locations and explore additional points of interest. To get started on building your own innovative solutions, explore Amazon Bedrock now and start your journey today.
About the Authors
Yao Cong (YC) Yeo is a Solutions Architect at Amazon Web Services, empowering Singapore’s ISVs and SMBs in their cloud transformation journeys, guiding customers to optimize workloads and maximize their AWS cloud potential. YC specializes in the Application Security domain in Cloud Security, ensuring robust and secure cloud implementations. In the Generative AI space, YC delivers thought leadership content to bridge the gap between technical possibilities and business objectives in the evolving digital landscape.
Loke Jun Kai is an AI/ML Specialist Solutions Architect in AWS. He works on go-to-market motions and strategic opportunities in the ASEAN Region. Jun Kai has provided technical and visionary guidance for customers across industries and segments, from large enterprises to startups. Outside of work, he enjoys all things related to venture capital and playing tennis.
Abhi Fabhian is a Solutions Architect at Amazon Web Services based in Indonesia, providing expert technical guidance on cloud technologies to clients across various sectors in Indonesia, helping them optimize their cloud experience. Outside of work he enjoys sports, cars, music and playing games.
Tung Cao is a Solutions Architect at Amazon Web Services based in Vietnam, covering Vietnam’s SMBs and ISVs on their journey to the cloud, helping them optimize and innovate their business processes. Tung specializes in AI/ML, which helps in providing cutting-edge solutions to enhance customer experiences, streamline operations, and drive data-driven decision-making, enabling businesses to leverage advanced technologies like machine learning and deep learning to gain competitive advantages.
Siraphop (Fufu) Thaisangsa-nga is a Solutions Architect at Amazon Web Services based in Thailand, dedicated to guiding local businesses through their cloud transformation journeys. With a deep understanding of the Thai market, Fufu helps companies leverage AWS services to innovate, scale, and improve their operational efficiency, excelling in tailoring cloud solutions to meet the unique needs of Thai businesses across various industries.
Simplify automotive damage processing with Amazon Bedrock and vector databases
In the automotive industry, the ability to quickly and accurately assess and address vehicle damage is crucial for efficient operations, customer satisfaction, and cost management. However, manual inspection and damage detection can be a time-consuming and error-prone process, especially when dealing with large volumes of vehicle data, the complexity of assessing vehicle damage, and the potential for human error in the assessment.
This post explores a solution that uses the power of AWS generative AI capabilities like Amazon Bedrock and OpenSearch vector search to perform damage appraisals for insurers, repair shops, and fleet managers.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon OpenSearch Service is a powerful, highly flexible search engine that allows you to retrieve data based on a variety of lexical and semantic retrieval approaches.
By combining these powerful tools, we have developed a comprehensive solution that streamlines the process of identifying and categorizing automotive damage. This approach not only enhances efficiency, but also provides valuable insights that can help automotive businesses make more informed decisions.
The traditional way to solve these problems is to use computer vision machine learning (ML) models to classify the damage and its severity and complement with regression models that predict numerical outcomes based on input features like the make and model of the car, damage severity, damaged part, and more.
This approach creates challenges in maintaining multiple models for classifying damage severity and creating estimates. Although these models can provide precise estimates based on historical data, they can’t be generalized to quickly provide a range of estimates or to accommodate changes to the damage dataset (such as updated makes and models) or varying repair estimates based on parts, labor, and facility. Any generalization to provide such estimates using traditional models will lead to feature engineering complexity.
This is where large language models (LLMs) come into play to look at the features both visually and based on text descriptions and find the closest match semantically.
Solution overview
Automotive companies have large datasets covering damage to their vehicle assets, including images of the vehicles and the damage, along with detailed information about that damage. This metadata includes details such as make, model, year, area of the damage, severity of the damage, parts replacement cost, and labor required to repair.
The information contained in these datasets—the images and the corresponding metadata—is converted to numerical vectors using a process called multimodal embedding. These embedding vectors contain the necessary information of the image and the text metadata encoded in numerical representation. We query against these embedding vectors to find the closest match to the incoming damaged vehicle image. This technique is called semantic search. In this solution, we use OpenSearch Service, a powerful, highly flexible search engine that allows you to retrieve data based on a variety of lexical and semantic retrieval approaches, including vector search. We generate the embeddings using the Amazon Titan Multimodal Embeddings model, available on Amazon Bedrock.
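As a rough sketch, generating one of these multimodal embeddings with the Amazon Bedrock runtime API looks like the following (the file name and metadata variable are placeholders):

```python
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder inputs: a damage photo and its JSON metadata (make, model, repair costs, and so on)
with open("damaged_vehicle.jpg", "rb") as image_file:
    image_b64 = base64.b64encode(image_file.read()).decode("utf-8")
damage_metadata = {"make": "ExampleMake", "model": "ExampleModel", "total_repair_cost": 1850}

request_body = json.dumps({
    "inputText": json.dumps(damage_metadata),
    "inputImage": image_b64,
    "embeddingConfig": {"outputEmbeddingLength": 1024},
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=request_body,
    accept="application/json",
    contentType="application/json",
)

embedding = json.loads(response["body"].read())["embedding"]  # 1,024-dimension vector
```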
This solution is available in our GitHub repo, including detailed instructions about its deployment and testing.
The following architecture diagram illustrates the proposed solution. It contains two flows:
- Data ingestion – The data ingestion flow converts the damage datasets (images and metadata) into vector embeddings and stores them in the OpenSearch vector store. We need to initially invoke this flow to load all the historic data into OpenSearch. We can also schedule it to load the updated dataset on a regular basis, or invoke it in near real time whenever new data flows in.
- Damage assessment inference – The inference processing flow runs every time there is a new damage image to find the closest match from the current dataset stored in OpenSearch.
The data ingestion flow consists of the following steps:
- The ingestion process starts with the ingestion processor taking each damaged image from the existing damage repair cost dataset and passing it to Anthropic’s Claude 3 on Amazon Bedrock. The invoice details of the repair costs could be in various formats, like PDF, images, tables, and so on. These images are passed to Anthropic’s Claude 3 Haiku to be analyzed and output into a standardized JSON format. The step of creating the metadata during the ingestion process is optional if the repair invoices are already present in a standardized format.
In this solution, Anthropic’s Claude 3 creates the JSON metadata for each image. The dataset provided in this example only contains images. In a production scenario, the metadata would ideally contain relevant data from existing invoices, where Amazon Bedrock could be used to extract the relevant information and create the standardized metadata, if it doesn’t exist yet.
The following is an example image.
The following code shows an example of the ingested metadata:
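The exact schema is defined in the GitHub repository; the record below is an illustrative approximation with hypothetical field names and values:

```json
{
  "make": "ExampleMake",
  "model": "ExampleModel",
  "year": 2022,
  "damaged_part": "front bumper",
  "damage_severity": "moderate",
  "parts_cost": 650,
  "labor_hours": 6,
  "labor_cost": 540,
  "total_repair_cost": 1190
}
```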
- The JSON output from the previous step along with the actual damage image are sent to the Amazon Titan Multimodal Embeddings model to generate embedding vectors. Each vector is of 1,024 dimensions, and it encodes both the image and the repair cost JSON data.
- The outputs generated in the previous steps (the text representation and vector embeddings of the damage data) are stored in an Amazon OpenSearch Serverless vector search collection. By storing both the text representation and vector embeddings, you can use the power of hybrid search (text search and semantic search) to optimize the search results.
- Finally, the ingestion processor stores the raw images in Amazon Simple Storage Service (Amazon S3), which we use later in the inference flow to show the closest matches to the user.
The user performing the damage assessment interacts with the UI by providing the image of the damaged vehicle and some basic information needed for the assessment. The inference processing flow includes the following steps:
- The inference processor takes each damaged image provided by the user and passes it to Anthropic’s Claude 3 to be analyzed and output into a standardized JSON format.
- The JSON output from the previous step along with the damage image are sent to the Amazon Titan Multimodal Embeddings model to generate embedding vectors.
- The embeddings are queried against all the embeddings of the existing damage data inside the OpenSearch Serverless collection to find the closest matches. For the top k (k=3 in our sample application) closest matches, it returns the JSON data that contains the repair costs and other damage expenses. With that information, several statistics, such as the median expense and the upper and lower bounds of the repair costs, are calculated (a query sketch is shown after these steps).
- In our scenario, the solution takes the metadata from each of the matches and sends that metadata to Anthropic’s Claude 3 Haiku hosted on Amazon Bedrock. The prompt is engineered to get the LLM to consider the total repair cost of each match and calculate an average. Production implementations of this solution could vary in how this final step is done. Calculation of the repair costs could be done in different ways, in this case using generative AI, or by retrieving further information from other datasets, such as current parts and labor costs, to calculate a new repair cost average.
- The UI displays the repair expense estimates along with the accuracy. The front end also pulls the images from Amazon S3 that are the closest matches to the queried image.
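The following sketch shows what the k-NN query in the third step can look like with the opensearch-py client (the collection endpoint, index name, and field names are placeholders):

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

region = "us-east-1"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region, "aoss")  # "aoss" is the OpenSearch Serverless service name

client = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],  # placeholder endpoint
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# `query_embedding` is the 1,024-dimension vector produced for the new damage image
knn_query = {
    "size": 3,  # top k = 3 closest matches
    "query": {"knn": {"damage_vector": {"vector": query_embedding, "k": 3}}},
    "_source": ["repair_metadata", "image_s3_key"],  # placeholder field names
}

response = client.search(index="damage-assessments", body=knn_query)
matches = [hit["_source"] for hit in response["hits"]["hits"]]
```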
Prompts and datasets
Our solution includes a set of automotive damage images, provided as part of our repository, along with code that handles the ingestion of images and the UI that users can interact with. Our sample dataset contains images from different vehicles (for this post, we use three fictitious car brands and models). We use the following prompt to create the JSON metadata that is ingested with the image:
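The exact prompt ships with the repository code; the following is an abridged illustration of its shape, where <model> and <criteria> stand for the values injected by the code:

```
You are an automotive damage appraiser. Analyze the attached vehicle image and
return metadata describing the damage as JSON only, with no additional text.
Follow the structure of this example: <model>
Assess the severity of the damage and estimate the repair cost according to
these criteria: <criteria>
```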
This prompt instructs the model to create the metadata as JSON output, and an example of that JSON metadata is provided within the <model> tag. The prompt also adds instructions for the model to assess the damage and estimate the cost following the <criteria> tag. The model and criteria are parameters that are created within the code and passed to the model. They are defined in the code from lines 85–106.
For each fictitious vehicle make and model, we have a dataset with 200 images. These images are stored within the /containers/ingestion/data_set path of the repository.
During the inference flow, the first steps that are run by the UI are capturing the image from the user and creating new metadata based on this new image and some basic information that the user provides. The following prompt is part of the inference code, which is used to create the initial metadata:
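The actual prompt is included in the repository; an abridged illustration of its intent looks like the following:

```
Using the attached image and the basic details provided by the user (such as
vehicle make, model, and year), create metadata in the same JSON structure used
during ingestion. Return only valid JSON.
```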
These prompts are examples provided with the solution to create basic metadata, which is then used to increase the accuracy of the vector search. There might be different use cases where more detailed prompts are required, and for that, this solution can serve as a base.
Prerequisites
To deploy the proposed sample solution, some prerequisites are needed:
- Model access in Amazon Bedrock. For instructions to request model access, see Access Amazon Bedrock foundation models. Enable the following models at a minimum:
- Amazon Titan Multimodal Embeddings
- Anthropic’s Claude 3 Haiku
- An AWS Identity and Access Management (IAM) user or role that can run AWS CloudFormation templates and has access to create resources for the following:
- Application Load Balancers
- Amazon CloudFront
- Amazon Elastic Container Service (Amazon ECS)
- Amazon Elastic Container Registry (Amazon ECR)
- IAM
- OpenSearch Serverless
- Amazon S3
- AWS Systems Manager
Deploy the solution
Complete the following steps to deploy this solution:
- Run the provided CloudFormation template.
- Download the dataset from the public dataset repository. Specific instructions can be found on the AWS Samples repository.
- Upload the dataset to the S3 source bucket. Specific instructions can be found on the AWS Samples repository.
- Run the ECS task, which runs the image ingestion process following the steps mentioned on the GitHub repo.
- To access the inference code, open the AWS CloudFormation console, navigate to the stack’s Outputs tab, and choose the CloudFront distribution link for the InferenceUIURL key to go to the inference UI.
- Test the solution by following the testing procedures in our GitHub repo.
Clean up
To clean up the resources you created, complete the following steps:
- On the AWS CloudFormation console, navigate to the Outputs tab of the stack you deployed.
- Note the name of your ECR repository and S3 bucket.
- On the Amazon S3 console, delete the contents of the bucket.
- On the Amazon ECR console, delete the images in the repository.
- On the AWS CloudFormation console, delete the stack.
Deleting the stack removes all other related resources from your AWS account. The bucket and repository must be empty in order to delete them.
Conclusion
The integration of Amazon Bedrock and vector databases like OpenSearch presents a powerful solution for simplifying automotive damage processing. This innovative approach offers several key benefits:
- Efficiency – By using generative AI and semantic search capabilities, the system can quickly process and analyze damage reports, significantly reducing the time required for assessments
- Accuracy – The use of multimodal embeddings and vector search makes sure damage assessments are based on comprehensive data, including both visual and textual information, leading to more accurate results
- Scalability – As the dataset grows, the system’s performance improves, allowing it to handle increasing volumes of data without compromising speed or accuracy
- Adaptability – The system can be updated with new data, so it remains current with the latest repair costs and damage types without the need to fully train using a traditional ML model
As the automotive industry continues to evolve, solutions like this will play a crucial role in streamlining operations, improving customer satisfaction, and optimizing resource allocation. By embracing AI-driven technologies, automotive businesses can stay ahead of the curve and deliver more efficient, accurate, and cost-effective damage assessment services. The combination of powerful AI models available in Amazon Bedrock and vector search capabilities of OpenSearch Service demonstrates the potential for transformative solutions in the automotive industry. As these technologies continue to advance, we can expect even more innovative applications that will reshape how we approach vehicle damage assessment and repair.
For detailed instructions and deployment steps, refer to our GitHub repo. Let us know in the comments section your thoughts about this solution and potential improvements we can add.
About the Authors
Vinicius Pedroni is a Senior Solutions Architect at AWS for the Travel and Hospitality Industry, with focus on Edge Services and Generative AI. Vinicius is also passionate about assisting customers on their Cloud Journey, allowing them to adopt the right strategies at the right moment.
Manikanth Pasumarti is a Solutions Architect based out of New York City. He works with enterprise customers to architect and design solutions for their business needs. He is passionate about math and loves to teach kids in his free time.
Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS
In the rapidly evolving world of generative AI image modeling, prompt engineering has become a crucial skill for developers, designers, and content creators. By crafting effective prompts, you can harness the full potential of advanced diffusion transformer text-to-image models, enabling you to produce high-quality images that align closely with your creative vision. Amazon Bedrock offers access to powerful models such as Stable Image Ultra and Stable Diffusion 3 Large, which are designed to transform text descriptions into stunning visual outputs. Stability AI’s newest launch of Stable Diffusion 3.5 Large (SD3.5L) on Amazon SageMaker JumpStart enhances image generation, human anatomy rendering, and typography by producing more diverse outputs and adhering closely to user prompts, making it a significant upgrade over its predecessor.
In this post, we explore advanced prompt engineering techniques that can enhance the performance of these models and facilitate the creation of compelling imagery through text-to-image transformations.
Understanding the prompt structure
Prompt engineering is a valuable technique for effectively using generative AI image models. The structure of a prompt directly affects the generated images’ quality, creativity, and accuracy. Stability AI’s latest models enhance productivity by helping users achieve quality results. This guide offers practical prompting tips for the Stable Diffusion 3 family of models, allowing you to refine image concepts quickly and precisely. A well-structured Stable Diffusion prompt typically consists of the following key components:
- Subject – This is the main focus of your image. You can provide extensive details, such as the gender of a character, their clothing, and the setting. For example, “A corgi dog sitting on the front porch.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Medium – This refers to the material or technique used in creating the artwork. Examples include “oil paint,” “digital art,” “voxel art,” or “watercolor.” A complete prompt might read: “3D Voxel Art; wide angle shot of a bright and colorful world.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Style – You can specify an art style (such as impressionism, realism, or surrealism). A more detailed prompt could be: “Impressionist painting of a lady in a sun hat in a blooming garden.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Composition and framing – You can describe the desired composition and framing of the image. This could include specifying close-up shots, wide-angle views, or particular compositional techniques. Consider the images generated by the following prompt: “Wide-shot of two friends lying on a hilltop, stargazing against an open sky filled with stars.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Lighting and color – You can describe the lighting or shadows in the scene. Terms like “backlight,” “hard rim light,” and “dynamic shadows” can enhance the feel of the image. Consider the following prompt and images generated with it: “A yellow umbrella left open on a rainy street, surrounded by neon reflections, with hard rim light outlining its shape against the wet pavement, adding a moody glow.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Resolution – Specifying resolution helps control image sharpness. For example: “A winding river through a snowy forest in 4K, illuminated by soft winter sunlight, with tree shadows across the snow and icy reflections.”
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
Treat the SD3 generation of models as a creative partner. By expressing your ideas clearly in natural language, you give the model the best opportunity to generate an image that aligns with your vision.
Prompting techniques
The following are key prompting techniques to employ:
- Descriptive language – Unlike previous models that required concise prompts, SD3.5 allows for detailed descriptions. For instance, instead of simply stating “a man and woman,” you can specify intricate details such as clothing styles and background settings. This clarity helps in achieving better adherence to the desired output.
- Negative prompts – Negative prompting offers enhanced control over colors and content by removing unwanted elements, textures, or hues from the image. Whereas the main prompt establishes the image’s broad composition, negative prompts allow for honing in on specific elements, yielding a cleaner, more polished result. This added refinement helps keep distractions to a minimum, aligning the final output closely with your intended vision.
- Using multiple text encoders – The SD3 generation of models features three text encoders that can accept varied prompts. This allows you to experiment with assigning general themes or styles to one encoder while detailing specific subjects in another.
- Tokenization – Perfecting the art of prompt engineering for the Stable Diffusion 3 model family requires a deep understanding of several key concepts and techniques. At the core of effective prompting lies the process of tokenization and token analysis. It’s crucial to comprehend how the SD3 family breaks down your prompt text into individual tokens, because this directly impacts the model’s interpretation and subsequent image generation. By analyzing these tokens, you can identify potential issues such as out-of-vocabulary words that might split into sub-word tokens, multi-word phrases that don’t tokenize together as expected, or ambiguous tokens like “3D” that could be interpreted in multiple ways. For instance, in the prompt “A realistic 3D render of a red apple,” the clarity of tokenization can significantly affect the quality of the output image.
Images generated by SD3 Large, SD Ultra, and SD3.5 Large.
- Prompt weighting – Prompt weighting and emphasis techniques allow you to fine-tune the importance of specific elements within your prompt. By using syntax like “A photo of a (red:1.2) apple,” you can increase the significance of the color “red” in the generated image. Similarly, emphasizing multiple aspects, as in “A (photorealistic:1.4) (3D render:1.2) of a red apple,” can help achieve a more nuanced result that balances photorealism with 3D rendering qualities. “(photorealistic:1.4)” indicates that the image should be photorealistic, with a weight of 1.4. The higher weight (>1.0) emphasizes that the photorealistic quality is more important than usual. Although you can technically set weights higher than 5.0, it’s advisable to stay within the range of 1.5–2.0 for effective results. This level of control enables you to guide the model’s focus more precisely, resulting in outputs that more closely align with your creative vision.
Images generated with the prompts “A photo of a (red:1.2) apple” and “A (photorealistic:1.4) (3D render:1.2) of a red apple.”
Practical settings for optimal results
To optimize the performance for these models, several key settings should be adjusted based on user preferences and hardware capabilities. Start with 28 denoising steps to balance image quality and generation time. For the Guidance Scale (CFG), set it between 3.5–4.5 to maintain fidelity to the prompt without creating overly contrasted images. ComfyUI is an open source, node-based application that empowers users to generate images, videos, and audio using advanced AI models, offering a highly customizable workflow for creative projects. In ComfyUI, using the dpmpp_2m sampler along with the sgm_uniform scheduler yields effective results. Additionally, aim for a resolution of approximately 1 megapixel (for example, 1024×1024 for square images) while making sure that dimensions are divisible by 64 for optimal output quality. These settings provide a solid foundation for generating high-quality images while efficiently utilizing your hardware resources, allowing for further adjustments based on specific requirements.
Prompt programming
Treating prompts as a form of programming language can also yield powerful results. By structuring your prompts with components like subjects, styles, and scenes, you create a modular system that’s simple to adjust and extend. For example, using syntax like “A red apple [SUBJ], photorealistic [STYLE], on a wooden table [SCENE]” allows for systematic modifications and experimentation with different elements of the prompt.
Prompt augmentation and tuning
Lastly, prompt augmentation and tuning can significantly enhance the effectiveness of your prompts. This might involve incorporating additional data such as reference images or rough sketches as conditioning inputs alongside your text prompts. Furthermore, fine-tuning models on carefully curated datasets of prompt-image pairs can improve the associations between textual descriptions and visual outputs, leading to more accurate and refined results. With these advanced techniques, you can push the boundaries of what’s possible with SD3.5, creating increasingly sophisticated and tailored images that truly bring your ideas to life.
Responsible and ethical AI with Amazon Bedrock
When working with Stable Diffusion models through Amazon Bedrock, Amazon Bedrock Guardrails can intercept and evaluate user prompts before they reach the image generation pipeline. This allows for filtering and moderation of input text to prevent the creation of harmful, offensive, or inappropriate images. The system offers configurable content filters that can be adjusted to different strength levels, giving fine-tuned control over what types of image content are permitted to be generated. Organizations can define denied topics specific to image generation, such as blocking requests for violent imagery or explicit content. Word filters can be set up to detect and block specific phrases or terms that may lead to undesirable image outputs. Additionally, sensitive information filters can be applied to protect personally identifiable information (PII) from being incorporated into generated images. This multi-layered approach helps prevent misuse of Stable Diffusion models, maintain compliance with regulations around AI-generated imagery, and provide a consistently safe user experience when using these powerful image generation capabilities. By implementing Amazon Bedrock Guardrails, organizations can confidently deploy Stable Diffusion models while mitigating risks and adhering to ethical AI principles.
Conclusion
In the dynamic realm of generative AI image modeling, understanding prompt engineering is essential for developers, designers, and content creators looking to unlock the full potential of models like Stable Diffusion 3.5 Large. This advanced model, available in Amazon SageMaker JumpStart, enhances image generation by producing diverse outputs that closely align with user prompts. Effective prompting involves understanding the structure of prompts, which typically includes key components such as the subject, medium, style, and resolution. By clearly defining these elements and employing techniques like prompt weighting and chaining, you can refine your creative vision and achieve high-quality results.
Additionally, the process of tokenization plays a crucial role in how prompts are interpreted by the model. Analyzing tokens can help identify potential issues that may affect output quality. You can also enhance your prompts through modular programming approaches and by incorporating additional data like reference images. By fine-tuning models on datasets of prompt-image pairs, creators can improve the associations between text and visuals, leading to more accurate results.
This post provided practical tips and techniques to optimize performance and elevate the creative possibilities within Stable Diffusion 3.5 Large, empowering you to produce compelling imagery that resonates with your artistic intent. To get started, see Stability AI in Amazon Bedrock. To explore what’s available on SageMaker JumpStart, see Stability AI builds foundation models on Amazon SageMaker.
About the Authors
Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area working with GenAI model providers and helping customers optimize their GenAI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.
Sanwal Yousaf is a Solutions Engineer at Stability AI. His work at Stability AI focuses on working with enterprises to architect solutions using Stability AI’s Generative models to solve pressing business problems. He is passionate about creating accessible resources for people to learn and develop proficiency with AI.
Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart
We are excited to announce the availability of Stability AI’s latest and most advanced text-to-image model, Stable Diffusion 3.5 Large, in Amazon SageMaker JumpStart. This new cutting-edge image generation model, which was trained on Amazon SageMaker HyperPod, empowers AWS customers to generate high-quality images from text descriptions with unprecedented ease, flexibility, and creative potential. By adding Stable Diffusion 3.5 Large to SageMaker JumpStart, we’re taking another significant step towards democratizing access to advanced AI technologies and enabling businesses of all sizes to harness the power of generative AI.
In this post, we provide an implementation guide for subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in Amazon SageMaker Studio, and generating images using text-to-image prompts.
Stable Diffusion 3.5 Large capabilities and use cases
At 8.1 billion parameters, with superior quality and prompt adherence, Stable Diffusion 3.5 Large is the most powerful model in the Stable Diffusion family. The model excels at creating diverse, high-quality images across a wide range of styles, making it an excellent tool for media, gaming, advertising, ecommerce, corporate training, retail, and education. For ideation, Stable Diffusion 3.5 Large can accelerate storyboarding, concept art creation, and rapid prototyping of visual effects. For production, you can quickly generate high-quality 1-megapixel images for campaigns, social media posts, and advertisements, saving time and resources while maintaining creative control.
Stable Diffusion 3.5 Large offers users nearly endless creative possibilities, including:
- Enhanced creativity and photorealism – You can generate exceptional visuals with highly detailed 3D imagery that includes fine details like lighting and textures.
- Exceptional multi-subject proficiency – It offers unrivaled capabilities in generating images with multiple subjects, which is ideal for creating complex scenes.
- Increased efficiency – Fast, accurate, and quality content production streamlines operations, saving time and money. Despite its power and complexity, Stable Diffusion 3.5 Large is optimized for efficiency, providing accessibility and ease of use across a broad audience.
Solution overview
With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models (FMs). ML practitioners can deploy FMs to dedicated SageMaker instances from a network isolated environment and customize models using Amazon SageMaker for model training and deployment. You can now discover and deploy the Stable Diffusion 3.5 large model with a few clicks in SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security.
The Stable Diffusion 3.5 Large model is available today in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Osaka, Hong Kong), China (Beijing), Middle East (Bahrain), Africa (Cape Town), and Europe (Milan, Stockholm).
SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.
Prerequisites
Make sure that your AWS Identity and Access Management (IAM) role has AmazonSageMakerFullAccess. To successfully deploy the model, confirm that your IAM role has the following three permissions and that you have authority to make AWS Marketplace subscriptions in the AWS account used:
- aws-marketplace:ViewSubscriptions
- aws-marketplace:Unsubscribe
- aws-marketplace:Subscribe
Subscribe to the Stable Diffusion 3.5 Large model package
You can access SageMaker JumpStart through the SageMaker Studio Home page by selecting JumpStart in the Prebuilt and automated solutions section. The JumpStart landing page allows you to explore various resources, including solutions, models, and notebooks. You can search for a particular provider. In the following screenshot, we are looking at all the models by Stability AI on SageMaker JumpStart.
Each model is presented with a model card containing key information such as the model name, fine-tuning capability, provider, and a brief description. To find the Stable Diffusion 3.5L model, you can either browse the Foundation Model: Image Generation carousel or use the search function. Select Stable Diffusion 3.5 Large.
Next, to subscribe to Stable Diffusion 3.5 Large, follow these steps:
- Open the model listing page in AWS Marketplace using the link available from the example notebook in SageMaker JumpStart.
- On the listing, choose Continue to subscribe.
- On the Subscribe to this software page, review and choose Accept Offer if you and your organization accept the EULA, pricing, and support terms.
- Choose Continue to configuration to start configuring your model.
- Choose a supported Region, and you will see the model package Amazon Resource Name (ARN) that you need to specify when creating an endpoint.
Note: If you don’t have the necessary permissions to view or subscribe to the model, reach out to your AWS administrator or procurement point of contact. Many enterprises may limit AWS Marketplace permissions to control the actions that someone can take in the AWS Marketplace Management Portal.
Deploy the model in SageMaker Studio
Now you’re prepared to follow the notebook example from Stability AI’s GitHub repository to create an endpoint (with the model package ARN from AWS Marketplace) and create a deployable ModelPackage.
For Stable Diffusion 3.5 Large, you’ll need to deploy on an Amazon Elastic Compute Cloud (Amazon EC2) ml.p5.48xlarge instance.
Generate images with a text prompt
Refer to the Stable Diffusion 3.5 Large documentation for more details. From the example notebook, the code to generate an image is as follows:
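The general pattern is to send a JSON request to the deployed endpoint and decode the returned image; the sketch below is illustrative only, and the payload and response field names are assumptions, so check Stability AI's example notebook for the authoritative request schema:

```python
import base64
import json
import boto3

endpoint_name = "sd3-5-large-endpoint"  # placeholder: the endpoint created from the model package ARN
runtime = boto3.client("sagemaker-runtime")

# Assumed payload shape; the exact schema is defined in Stability AI's example notebook
payload = {
    "prompt": "Photography, pink rose flowers in the twilight, glowing, tile houses in the background.",
    "seed": 42,
    "output_format": "png",
}

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
# Assumes the endpoint returns the generated image as a base64-encoded string
with open("generated_image.png", "wb") as f:
    f.write(base64.b64decode(result["image"]))
```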
The following are examples of images generated from different prompts.
Prompt: Photography, pink rose flowers in the twilight, glowing, tile houses in the background.
Prompt: The word “AWS x Stability” in a thick, blocky script surrounded by roots and vines against a solid white background. The scene is lit by flat light, creating a reflective scene with a minimal color palette. Quilling style.
Prompt: Expressionist painting, side profile of a silhouette of a student seated at a desk, absorbed in reading a book. Her thoughts artistically connect to the stars and the vast universe, symbolizing the expansion of knowledge and a boundless mind.
Prompt: High-energy street scene in a neon-lit Tokyo alley at night, where steam rises from food carts, and colorful neon signs illuminate the rain-slicked pavement.
Prompt: 3D animation scene of an adventurer traveling the world with his pet dog.
Clean up
When you’ve finished working, you can delete the endpoint to release the EC2 instances associated with it and stop billing.
Get your list of SageMaker endpoints using the AWS Command Line Interface (AWS CLI) as follows:
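For example (this lists the endpoints in the active Region):

```bash
aws sagemaker list-endpoints
```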
Then delete the endpoints:
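Replace the placeholder with the endpoint name returned by the previous command:

```bash
aws sagemaker delete-endpoint --endpoint-name <endpoint-name>
```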
Conclusion
In this post, we walked through subscribing to Stable Diffusion 3.5 Large in SageMaker JumpStart, deploying the model in SageMaker Studio, and generating a variety of images with Stability AI’s latest text-to-image model.
Start creating amazing images today with Stable Diffusion 3.5 Large on SageMaker JumpStart. To learn more about SageMaker JumpStart, see SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.
If you’d like to explore advanced prompt engineering techniques that can enhance the performance of text-to-image models from Stability AI and facilitate the creation of compelling imagery, see Understanding prompt engineering: Unlock the creative potential of Stability AI models on AWS.
About the Authors
Tom Yemington is a Senior GenAI Models Specialist focused on helping model providers and customers scale generative AI solutions in AWS. Tom is a Certified Information Systems Security Professional (CISSP). Outside of work, you can find Tom racing vintage cars or teaching people how to race as an instructor at track-day events.
Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area working with GenAI model providers and helping customers optimize their GenAI workloads on AWS. She helps enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.
Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.
Improve governance of models with Amazon SageMaker unified Model Cards and Model Registry
You can now register machine learning (ML) models in Amazon SageMaker Model Registry with Amazon SageMaker Model Cards, making it straightforward to manage governance information for specific model versions directly in SageMaker Model Registry in just a few clicks.
Model cards are an essential component for registered ML models, providing a standardized way to document and communicate key model metadata, including intended use, performance, risks, and business information. This transparency is particularly important for registered models, which are often deployed in high-stakes or regulated industries, such as financial services and healthcare. By including detailed model cards, organizations can establish the responsible development of their ML systems, enabling better-informed decisions by the governance team.
When solving a business problem with an ML model, customers want to refine their approach and register multiple versions of the model in SageMaker Model Registry to find the best candidate model. To effectively operationalize and govern these various model versions, customers want the ability to clearly associate model cards with a particular model version. Previously, model cards and registered model versions were managed separately, and this lack of a unified user experience posed challenges for customers, who needed a more streamlined way to register and govern their models.
Because SageMaker Model Cards and SageMaker Model Registry were built on separate APIs, it was challenging to associate the model information and gain a comprehensive view of the model development lifecycle. Integrating model information and then sharing it across different stages became increasingly difficult. This required custom integration efforts, along with complex AWS Identity and Access Management (IAM) policy management, further complicating the model governance process.
With the unification of SageMaker Model Cards and SageMaker Model Registry, architects, data scientists, ML engineers, or platform engineers (depending on the organization’s hierarchy) can now seamlessly register ML model versions early in the development lifecycle, including essential business details and technical metadata. This unification allows you to review and govern models across your lifecycle from a single place in SageMaker Model Registry. By consolidating model governance workflows in SageMaker Model Registry, you can improve transparency and streamline the deployment of models to production environments upon governance officers’ approval.
In this post, we discuss a new feature that supports the integration of model cards with the model registry. We discuss the solution architecture and best practices for managing model cards with a registered model version, and walk through how to set up, operationalize, and govern your models using the integration in the model registry.
Solution overview
In this section, we discuss the solution to address the aforementioned challenges with model governance. First, we introduce the unified model governance solution architecture for addressing the model governance challenges for an end-to-end ML lifecycle in a scalable, well-architected environment. Then we dive deep into the details of the unified model registry and discuss how it helps with governance and deployment workflows.
Unified model governance architecture
ML governance enforces the ethical, legal, and efficient use of ML systems by addressing concerns like bias, transparency, explainability, and accountability. It helps organizations comply with regulations, manage risks, and maintain operational efficiency through robust model lifecycles and data quality management. Ultimately, ML governance builds stakeholder trust and aligns ML initiatives with strategic business goals, maximizing their value and impact. ML governance starts when you want to solve a business use case or problem with ML and is part of every step of your ML lifecycle, from use case inception, model building, training, evaluation, deployment, and monitoring of your production ML system.
Let’s delve into the architecture details of how you can use a unified model registry along with other AWS services to govern your ML use case and models throughout the entire ML lifecycle.
SageMaker Model Registry catalogs your models along with their versions and associated metadata and metrics for training and evaluation. It also maintains audit and inference metadata to help drive governance and deployment workflows.
The following are key concepts used in the model registry (a minimal API sketch follows the list):
- Model package group – A model package group, or model group, corresponds to an ML model built to solve a business problem (for this example, we use the model CustomerChurn). The model group contains all the model versions associated with that ML model.
- Model package version – A model package version or model version is a registered model version that includes the model artifacts and inference code for the model.
- Registered model – This is the model group that is registered in SageMaker Model Registry.
- Deployable model – This is the model version that is deployable to an inference endpoint.
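To make these concepts concrete, here is a minimal sketch using the boto3 SageMaker client; the group name, image URI, and artifact location are hypothetical placeholders, not values from this post.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Create the model package group (model group) for the business problem
sm_client.create_model_package_group(
    ModelPackageGroupName="CustomerChurn",
    ModelPackageGroupDescription="All model versions for the customer churn use case",
)

# Register a model package version (model version) inside that group
response = sm_client.create_model_package(
    ModelPackageGroupName="CustomerChurn",
    ModelPackageDescription="Candidate churn model",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<inference-image-uri>",              # hypothetical ECR image URI
                "ModelDataUrl": "s3://<bucket>/model.tar.gz",  # hypothetical model artifact
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)
print(response["ModelPackageArn"])
```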
Additionally, this solution uses Amazon DataZone. The integration of SageMaker and Amazon DataZone enables collaboration between ML builders and data engineers for building ML use cases. ML builders can request access to data published by data engineers. Upon receiving approval, ML builders can then consume the accessed data to engineer features, create models, and publish features and models to the Amazon DataZone catalog for sharing across the enterprise. As part of the SageMaker Model Cards and SageMaker Model Registry unification, ML builders can now share technical and business information about their models, including training and evaluation details, as well as business metadata such as model risk, for ML use cases.
The following diagram depicts the architecture for unified governance across your ML lifecycle.
There are several steps for implementing secure and scalable end-to-end governance for your ML lifecycle:
- Define your ML use case metadata (name, description, risk, and so on) for the business problem you’re trying to solve (for example, automate a loan application process).
- Set up and invoke your use case approval workflow for building the ML model (for example, fraud detection) for the use case.
- Create an ML project to create a model for the ML use case.
- Create a SageMaker model package group to start building the model. Associate the model to the ML project and record qualitative information about the model, such as purpose, assumptions, and owner.
- Prepare the data to build your model training pipeline.
- Evaluate your training data for data quality, including feature importance and bias, and update the model package version with relevant evaluation metrics.
- Train your ML model with the prepared data and register the candidate model package version with training metrics.
- Evaluate your trained model for model bias and model drift, and update the model package version with relevant evaluation metrics.
- Validate that the candidate model experimentation results meet your model governance criteria based on your use case risk profile and compliance requirements.
- After you receive the governance team’s approval on the candidate model, record the approval on the model package version (see the API sketch after this list) and invoke an automated test deployment pipeline to deploy the model to a test environment.
- Run model validation tests in a test environment and make sure the model integrates and works with upstream and downstream dependencies similar to a production environment.
- After you validate the model in the test environment and make sure the model complies with use case requirements, approve the model for production deployment.
- After you deploy the model to the production environment, continuously monitor model performance metrics (such as quality and bias) to make sure the model stays in compliance and meets your business use case key performance indicators (KPIs).
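As a minimal illustration of recording the governance approval on a model version, the following hedged sketch uses the boto3 SageMaker client; the model package ARN is a placeholder.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Record the governance team's decision on the candidate model version.
# Downstream automation can react to this approval status change to start
# the test deployment pipeline.
sm_client.update_model_package(
    ModelPackageArn="arn:aws:sagemaker:<region>:<account-id>:model-package/CustomerChurn/1",
    ModelApprovalStatus="Approved",
    ApprovalDescription="Meets governance criteria for the use case risk profile",
)
```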
Architecture tools, components, and environments
You need to set up several components and environments for orchestrating the solution workflow:
- AI governance tooling – This tooling should be hosted in an isolated environment (a separate AWS account) where your key AI/ML governance stakeholders can set up and operate approval workflows for governing AI/ML use cases across your organization, lines of business, and teams.
- Data governance – This tooling should be hosted in an isolated environment to centralize data governance functions such as setting up data access policies and governing data access for AI/ML use cases across your organization, lines of business, and teams.
- ML shared services – ML shared services components should be hosted in an isolated environment to centralize model governance functions such as accountability through workflows and approvals, transparency through centralized model metadata, and reproducibility through centralized model lineage for AI/ML use cases across your organization, lines of business, and teams.
- ML development – This phase of the ML lifecycle should be hosted in an isolated environment for model experimentation and building the candidate model. Several activities are performed in this phase, such as creating the model, data preparation, model training, evaluation, and model registration.
- ML pre-production – This phase of the ML lifecycle should be hosted in an isolated environment for integration testing of the candidate model with the ML system and for validating that the results comply with the model and use case requirements. The candidate model that was built in the ML development phase is deployed to an endpoint for integration testing and validation.
- ML production – This phase of the ML lifecycle should be hosted in an isolated environment for deploying the model to a production endpoint for shadow testing and A/B testing, and for gradually rolling out the model for operations in a production environment.
Integrate a model version in the model registry with model cards
In this section, we provide API implementation details for testing this in your own environment. We walk through an example notebook to demonstrate how you can use this unification during the model development lifecycle.
We have two example notebooks in the GitHub repository: AbaloneExample and DirectMarketing.
Complete the following steps in the Abalone example notebook:
- Install or update the necessary packages and libraries.
- Import the necessary libraries and instantiate the required variables, such as the SageMaker client and Amazon Simple Storage Service (Amazon S3) buckets.
- Create an Amazon DataZone domain and a project within the domain.
You can use an existing project if you already have one. This step is optional; we reference the Amazon DataZone project ID while creating the SageMaker model package. For overall governance across your data and model lifecycle, this helps correlate the business unit or domain, the data, and the corresponding model.
The following screenshot shows the Amazon DataZone welcome page for a test domain.
In Amazon DataZone, projects enable a group of users to collaborate on business use cases: members create assets in project inventories, making them discoverable by all project members, and then publish, discover, subscribe to, and consume assets in the Amazon DataZone catalog. Project members consume assets from the Amazon DataZone catalog and produce new assets using one or more analytical workflows. Project members can be owners or contributors.
You can gather the project ID on the project details page, as shown in the following screenshot.
In the notebook, we store this project ID in a variable and refer to it in the steps that follow.
- Prepare a SageMaker model package group.
A model group contains a group of versioned models. We refer to the Amazon DataZone project ID when we create the model package group, as shown in the following screenshot. It’s mapped to the custom_details field.
- Update the details for the model card, including the intended use and owner:
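The exact fields vary by use case; the following is a minimal sketch, assuming the model card classes of the SageMaker Python SDK, that captures the model owner, intended use, business details, and the Amazon DataZone project ID (stored earlier in the hypothetical project_id variable) as custom details.

```python
from sagemaker.model_card import (
    ModelCard,
    ModelCardStatusEnum,
    ModelOverview,
    IntendedUses,
    BusinessDetails,
    AdditionalInformation,
    RiskRatingEnum,
)

model_overview = ModelOverview(
    model_description="XGBoost regression model trained on the Abalone dataset",
    model_owner="ml-team@example.com",  # hypothetical owner
)

intended_uses = IntendedUses(
    purpose_of_model="Predict the age of abalone from physical measurements",
    intended_uses="Internal experimentation for the Abalone example",
    risk_rating=RiskRatingEnum.LOW,
)

business_details = BusinessDetails(
    business_problem="Estimate abalone age without manual shell inspection",
    business_stakeholders="Data science and research teams",
    line_of_business="Research",
)

additional_information = AdditionalInformation(
    # Hypothetical key mapping the Amazon DataZone project ID into custom_details
    custom_details={"DataZoneProjectId": project_id},
)

my_card = ModelCard(
    name="abalone-model-card",  # hypothetical card name
    status=ModelCardStatusEnum.DRAFT,
    model_overview=model_overview,
    intended_uses=intended_uses,
    business_details=business_details,
    additional_information=additional_information,
)
```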
This data is used to update the created model package. The SageMaker model package helps create a deployable model that you can use to get real-time inferences by creating a hosted endpoint or to run batch transform jobs.
The model card information, shown as model_card=my_card in the following code snippet, can be passed to the pipeline during the model register step:
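For example, in a SageMaker Pipelines model registration step, the card can be attached when the model is registered. This is a hedged sketch: the model object, instance types, and model package group name are assumptions, not values from this post.

```python
from sagemaker.workflow.model_step import ModelStep

# 'model' is the sagemaker.model.Model built from the trained artifacts
# earlier in the pipeline definition.
register_args = model.register(
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.m5.large"],
    transform_instances=["ml.m5.large"],
    model_package_group_name="AbaloneModelPackageGroup",  # hypothetical group name
    approval_status="PendingManualApproval",
    model_card=my_card,  # attach the model card to this model version
)

step_register = ModelStep(name="RegisterAbaloneModel", step_args=register_args)
```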
Alternatively, you can pass it as follows:
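One possible alternative, sketched here under the assumption that you register the version directly with the low-level boto3 API rather than through the pipeline step, is to attach the card content in the CreateModelPackage call:

```python
import json
import boto3

sm_client = boto3.client("sagemaker")

card_content = {
    "intended_uses": {
        "purpose_of_model": "Predict the age of abalone from physical measurements",
        "risk_rating": "Low",
    },
    "additional_information": {
        # Hypothetical key mapping the Amazon DataZone project ID into custom_details
        "custom_details": {"DataZoneProjectId": project_id}
    },
}

sm_client.create_model_package(
    ModelPackageGroupName="AbaloneModelPackageGroup",  # hypothetical group name
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<inference-image-uri>",              # hypothetical ECR image URI
                "ModelDataUrl": "s3://<bucket>/model.tar.gz",  # hypothetical model artifact
            }
        ],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
    ModelCard={
        "ModelCardContent": json.dumps(card_content),
        "ModelCardStatus": "Draft",
    },
)
```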
The notebook will invoke a run of the SageMaker pipeline (which can also be invoked from an event or from the pipelines UI), which includes preprocessing, training, and evaluation.
After the pipeline is complete, you can navigate to Amazon SageMaker Studio, where you can see a model package on the Models page.
You can view details such as the business details and intended use on the Overview tab under Audit, as shown in the following screenshots.
The Amazon DataZone project ID is captured in the Documentation section.
You can view performance metrics under Train as well.
Evaluation details like model quality, bias pre-training, bias post-training, and explainability can be reviewed on the Evaluate tab.
Optionally, you can view the model card details from the model package itself.
Additionally, you can update the audit details of the model by choosing Edit in the top right corner. Once you are done with your changes, choose Save to keep the changes in the model card.
Also, you can update the model’s deploy status.
You can track the different statuses and activity as well.
Lineage
ML lineage is crucial for tracking the origin, evolution, and dependencies of data, models, and code used in ML workflows, providing transparency and traceability. It helps with reproducibility and debugging, making it straightforward to understand and address issues.
Model lineage tracking captures and retains information about the stages of an ML workflow, from data preparation and training to model registration and deployment. You can view the lineage details of a registered model version in SageMaker Model Registry using SageMaker ML lineage tracking, as shown in the following screenshot. ML model lineage tracks the metadata associated with your model training and deployment workflows, including training jobs, datasets used, pipelines, endpoints, and the actual models. You can also use the graph node to view more details, such as dataset and images used in that step.
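As a hedged illustration (the artifact ARN is a placeholder), you can also explore lineage programmatically with the boto3 SageMaker lineage APIs, for example by listing the associations that lead to a given artifact:

```python
import boto3

sm_client = boto3.client("sagemaker")

# List lineage associations whose destination is a given artifact, such as a
# registered model or dataset artifact; the ARN below is a placeholder.
response = sm_client.list_associations(
    DestinationArn="arn:aws:sagemaker:<region>:<account-id>:artifact/<artifact-id>",
)

for assoc in response["AssociationSummaries"]:
    print(assoc["SourceArn"], "->", assoc["DestinationArn"], assoc.get("AssociationType"))
```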
Clean up
If you created resources while using the notebook in this post, follow the instructions in the notebook to clean up those resources.
Conclusion
In this post, we discussed a solution to use a unified model registry with other AWS services to govern your ML use case and models throughout the entire ML lifecycle in your organization. We walked through an end-to-end architecture for developing an AI use case that embeds governance controls, from use case inception to model building, model validation, and model deployment in production. We demonstrated through code how to register a model and update it with governance, technical, and business metadata in SageMaker Model Registry.
We encourage you to try out this solution and share your feedback in the comments section.
About the authors
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 3-year-old Sheepadoodle.
Neelam Koshiya is a Principal Solutions Architect (generative AI specialist) at AWS. With a background in software engineering, she moved organically into an architecture role. Her current focus is to help enterprise customers with their ML and generative AI journeys for strategic business outcomes. Her area of depth is machine learning. In her spare time, she enjoys reading and being outdoors.
Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies.
Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.
Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services
Live streaming has been gaining immense popularity in recent years, attracting an ever-growing number of viewers and content creators across various platforms. From gaming and entertainment to education and corporate events, live streams have become a powerful medium for real-time engagement and content consumption. However, as the reach of live streams expands globally, language barriers and accessibility challenges have emerged, limiting the ability of viewers to fully comprehend and participate in these immersive experiences.
Recognizing this need, we have developed a Chrome extension that harnesses the power of AWS AI and generative AI services, including Amazon Bedrock, an AWS managed service to build and scale generative AI applications with foundation models (FMs). This extension aims to revolutionize the live streaming experience by providing real-time transcription, translation, and summarization capabilities directly within your browser.
With this extension, viewers can seamlessly transcribe live streams into text, enabling them to follow along with the content even in noisy environments or when listening to audio is not feasible. Moreover, the extension’s translation capabilities open up live streams to a global audience, breaking down language barriers and fostering more inclusive participation. By offering real-time translations into multiple languages, viewers from around the world can engage with live content as if it were delivered in their first language.
In addition, the extension’s capabilities extend beyond mere transcription and translation. Using the advanced natural language processing and summarization capabilities of FMs available through Amazon Bedrock, the extension can generate concise summaries of the content being transcribed in real time. This innovative feature empowers viewers to catch up with what is being presented, making it simpler to grasp key points and highlights, even if they have missed portions of the live stream or find it challenging to follow complex discussions.
In this post, we explore the approach behind building this powerful extension and provide step-by-step instructions to deploy and use it in your browser.
Solution overview
The solution is powered by two AWS AI services, Amazon Transcribe and Amazon Translate, along with Amazon Bedrock, a fully managed service that allows you to build generative AI applications. The solution also uses Amazon Cognito user pools and identity pools for managing authentication and authorization of users, Amazon API Gateway REST APIs, AWS Lambda functions, and an Amazon Simple Storage Service (Amazon S3) bucket.
After deploying the solution, you can access the following features:
- Live transcription and translation – The Chrome extension transcribes and translates audio streams for you in real time using Amazon Transcribe, an automatic speech recognition service. This feature also integrates with Amazon Transcribe automatic language identification for streaming transcriptions—with a minimum of 3 seconds of audio, the service can automatically detect the dominant language and generate a transcript without you having to specify the spoken language.
- Summarization – The Chrome extension uses FMs such as Anthropic’s Claude 3 models on Amazon Bedrock to summarize content being transcribed, so you can grasp key ideas of your live stream by reading the summary.
Live transcription is available in the more than 50 languages currently supported by Amazon Transcribe streaming (including Chinese, English, French, German, Hindi, Italian, Japanese, Korean, Brazilian Portuguese, Spanish, and Thai), while translation is available in the more than 75 languages currently supported by Amazon Translate.
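The extension itself calls these services through the AWS SDK for JavaScript; as a hedged, illustrative Python equivalent of the translation request it relies on, the boto3 call looks like the following (the sample text and target language are arbitrary examples):

```python
import boto3

translate = boto3.client("translate")

# Translate a finalized transcript segment; "auto" lets Amazon Translate
# detect the source language of the text.
result = translate.translate_text(
    Text="Welcome to today's live stream!",
    SourceLanguageCode="auto",
    TargetLanguageCode="es",
)
print(result["TranslatedText"])
```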
The following diagram illustrates the architecture of the application.
The solution workflow includes the following steps:
- A Chrome browser is used to access the desired live streamed content, and the extension is activated and displayed as a side panel. The extension delivers a web application implemented using the AWS SDK for JavaScript and the AWS Amplify JavaScript library.
- The user signs in by entering a user name and a password. Authentication is performed against the Amazon Cognito user pool. After a successful login, the Amazon Cognito identity pool is used to provide the user with the temporary AWS credentials required to access application features. For more details about the authentication and authorization flows, refer to Accessing AWS services using an identity pool after sign-in.
- The extension interacts with Amazon Transcribe (StartStreamTranscription operation), Amazon Translate (TranslateText operation), and Amazon Bedrock (InvokeModel operation). Interactions with Amazon Bedrock are handled by a Lambda function, which implements the application logic underlying an API made available using API Gateway (a sketch of this call follows these steps).
- The user is provided with the transcription, translation, and summary of the content playing inside the browser tab. The summary is stored inside an S3 bucket, which can be emptied using the extension’s Clean Up feature.
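The repository implements the summarization logic in its own Lambda function; the following minimal Python sketch (the prompt, token limit, and model ID are assumptions, with Anthropic's Claude 3 Sonnet used as an example) shows the kind of InvokeModel call that step performs:

```python
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def summarize(transcript: str, model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0") -> str:
    """Ask a Claude 3 model on Amazon Bedrock for a concise summary of a transcript."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Summarize the following live stream transcript:\n\n{transcript}",
                    }
                ],
            }
        ],
    }
    response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps(body))
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```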
In the following sections, we walk through how to deploy the Chrome extension and the underlying backend resources and set up the extension, then we demonstrate using the extension in a sample use case.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- A computer with Google Chrome installed on it
- An AWS account
- Access to one or more Amazon Bedrock models (for more information, see Access Amazon Bedrock foundation models)
- An AWS Identity and Access Management (IAM) user with the AdministratorAccess policy granted (for production, we recommend restricting access as needed)
- The AWS Command Line Interface (AWS CLI) installed and configured to use with your AWS account
- The AWS CDK CLI installed
- Node.js and npm installed
Deploy the backend
The first step consists of deploying an AWS Cloud Development Kit (AWS CDK) application that automatically provisions and configures the required AWS resources, including:
- An Amazon Cognito user pool and identity pool that allow user authentication
- An S3 bucket, where transcription summaries are stored
- Lambda functions that interact with Amazon Bedrock to perform content summarization
- IAM roles that are associated with the identity pool and have permissions required to access AWS services
Complete the following steps to deploy the AWS CDK application:
- Using a command line interface (Linux shell, macOS Terminal, Windows Command Prompt, or PowerShell), clone the GitHub repository to a local directory, then open the directory.
- Open the cdk/bin/config.json file and populate the configuration variables described in the following paragraphs.
The template launches in the us-east-2 AWS Region by default. To launch the solution in a different Region, change the aws_region parameter accordingly. Make sure to select a Region in which all the AWS services in scope (Amazon Transcribe, Amazon Translate, Amazon Bedrock, Amazon Cognito, API Gateway, Lambda, Amazon S3) are available.
The Region used for bedrock_region can be different from aws_region because you might have access to Amazon Bedrock models in a Region different from the Region where you want to deploy the project.
By default, the project uses Anthropic’s Claude 3 Sonnet as a summarization model; however, you can use a different model by changing the bedrock_model_id in the configuration file. For the complete list of model IDs, see Amazon Bedrock model IDs. When selecting a model for your deployment, don’t forget to check that the desired model is available in your preferred Region; for more details about model availability, see Model support by AWS Region.
- If you have never used the AWS CDK on this account and Region combination, bootstrap the AWS CDK on the target account and Region by running cdk bootstrap (otherwise, you can skip this step).
- Navigate to the cdk sub-directory, install the dependencies, and deploy the stack with the AWS CDK CLI.
- Confirm the deployment of the listed resources by entering y.
Wait for AWS CloudFormation to finish the stack creation.
You need to use the CloudFormation stack outputs to connect the frontend to the backend. After the deployment is complete, you have two options.
The preferred option is to run the provided postdeploy.sh script from the /cdk folder to automatically copy the CDK configuration parameters to a configuration file.
Alternatively, you can copy the configuration manually:
- Open the AWS CloudFormation console in the same Region where you deployed the resources.
- Find the stack named AwsStreamAnalysisStack.
- On the Outputs tab, take note of the output values to complete the next steps.
Set up the extension
Complete the following steps to get the extension ready for transcribing, translating, and summarizing live streams:
- Open the src/config.js file. Based on how you chose to collect the CloudFormation stack outputs, follow the appropriate step:
  - If you used the provided automation, check whether the values inside the src/config.js file have been automatically updated with the corresponding values.
  - If you copied the configuration manually, populate the src/config.js file with the values you noted.
Take note of the CognitoUserPoolId, which will be needed in a later step to create a new user.
- In the command line interface, move back to the aws-transcribe-translate-summarize-live-streams-in-browser directory.
- Install the dependencies and build the package using npm.
- Open your Chrome browser and navigate to chrome://extensions/.
Make sure that developer mode is enabled by toggling the icon on the top right corner of the page.
- Choose Load unpacked and select the build directory, which can be found inside the local project folder aws-transcribe-translate-summarize-live-streams-in-browser.
. - Grant permissions to your browser to record your screen and audio:
- Identify the newly added Transcribe, translate and summarize live streams (powered by AWS) extension.
- Choose Details and then Site Settings.
- In the Microphone section, choose Allow.
- Create a new Amazon Cognito user:
- On the Amazon Cognito console, choose User pools in the navigation pane.
- Choose the user pool with the CognitoUserPoolId value noted from the CloudFormation stack outputs.
- On the Users tab, choose Create user and configure this user’s verification and sign-in options.
See a walkthrough of Steps 4-6 in the animated image below. For additional details, refer to Creating a new user in the AWS Management Console.
Use the extension
Now that the extension is set up, you can interact with it by completing these steps:
- On the browser tab, choose the Extensions icon.
- Choose (right-click) on the Transcribe, translate and summarize live streams (powered by AWS) extension and choose Open side panel.
- Log in using the credentials created in the Amazon Cognito user pool from the previous step.
- Close the side panel.
You’re now ready to experiment with the extension.
- Open a new tab in the browser, navigate to a website featuring an audio/video stream, and open the extension (choose the Extensions icon, then choose the option menu (three dots) next to AWS transcribe, translate, and summarize, and choose Open side panel).
- Use the Settings pane to update the settings of the application:
- Mic in use – The Mic not in use setting records only the audio of the browser tab, for a live video stream. Mic in use is for a real-time meeting where your microphone is recorded as well.
- Transcription language – This is the language of the live stream to be recorded (set to auto to allow automatic identification of the language).
- Translation language – This is the language in which the live stream will be translated and the summary will be printed. After you choose the translation language and start the recording, you can’t change your choice for the ongoing live stream. To change the translation language for the transcript and summary, you will have to record it from scratch.
- Choose Start recording to start recording, and start exploring the Transcription and Translation tabs.
Content on the Translation tab will appear with a few seconds of delay compared to what you see on the Transcription tab. When transcribing speech in real time, Amazon Transcribe incrementally returns a stream of partial results until it generates the final transcription for a speech segment. This Chrome extension has been implemented to translate text only after a final transcription result is returned.
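As a simplified sketch of that design choice (illustrative Python, not the extension’s actual JavaScript), the handler forwards a segment for translation only once its partial flag is cleared:

```python
def translate_final_segments(results, translate_fn):
    """Translate only finalized transcription segments.

    `results` is assumed to be a list of dicts shaped like Amazon Transcribe
    streaming results, each carrying an "IsPartial" flag and "Alternatives".
    """
    translations = []
    for result in results:
        if result.get("IsPartial"):
            continue  # skip partial hypotheses; wait for the final segment
        transcript = result["Alternatives"][0]["Transcript"]
        translations.append(translate_fn(transcript))
    return translations
```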
- Expand the Summary section and choose Get summary to generate a summary. The operation will take a few seconds.
- Choose Stop recording to stop recording.
- Choose Clear all conversations in the Clean Up section to delete the summary of the live stream from the S3 bucket.
See the extension in action in the video below.
Troubleshooting
If you receive the error “Extension has not been invoked for the current page (see activeTab permission). Chrome pages cannot be captured.”, check the following:
- Make sure you’re using the extension on the tab where you first opened the side panel. If you want to use it on a different tab, stop the extension, close the side panel, and choose the extension icon again to run it.
- Make sure you have given permissions for audio recording in the web browser.
If you can’t get the summary of the live stream, make sure you have stopped the recording and then request the summary. You can’t change the language of the transcript and summary after the recording has started, so remember to choose it appropriately before you start the recording.
Clean up
When you’re done with your tests, to avoid incurring future charges, delete the resources created during this walkthrough by deleting the CloudFormation stack:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Choose the stack AwsStreamAnalysisStack.
- Take note of the CognitoUserPoolId and CognitoIdentityPoolId values among the CloudFormation stack outputs, which will be needed in the following step.
- Choose Delete stack and confirm deletion when prompted.
Because the Amazon Cognito resources won’t be automatically deleted, delete them manually:
- On the Amazon Cognito console, locate the CognitoUserPoolId and CognitoIdentityPoolId values previously retrieved in the CloudFormation stack outputs.
- Select both resources and choose Delete.
Conclusion
In this post, we showed you how to deploy a code sample that uses AWS AI and generative AI services to access features such as live transcription, translation, and summarization. You can follow the steps we provided to start experimenting with the browser extension.
To learn more about how to build and scale generative AI applications, refer to Transform your business with generative AI.
About the Authors
Luca Guida is a Senior Solutions Architect at AWS; he is based in Milan and supports independent software vendors in their cloud journey. With an academic background in computer science and engineering, he developed his passion for AI/ML at university; as a member of the natural language processing and generative AI community within AWS, Luca helps customers be successful while adopting AI/ML services.
Chiara Relandini is an Associate Solutions Architect at AWS. She collaborates with customers from diverse sectors, including digital native businesses and independent software vendors. After focusing on ML during her studies, Chiara supports customers in using generative AI and ML technologies effectively, helping them extract maximum value from these powerful tools.
Arian Rezai Tabrizi is an Associate Solutions Architect based in Milan. She supports enterprises across various industries, including retail, fashion, and manufacturing, on their cloud journey. Drawing from her background in data science, Arian assists customers in effectively using generative AI and other AI technologies.