Privacy-Computation Trade-offs in Private Repetition and Metaselection

A Private Repetition algorithm takes as input a differentially private algorithm with constant success probability and boosts it to one that succeeds with high probability. These algorithms are closely related to private metaselection algorithms that compete with the best of many private algorithms, and private hyperparameter tuning algorithms that compete with the best hyperparameter settings for a private learning algorithm. Existing algorithms for these tasks pay either a large overhead in privacy cost, or a large overhead in computational cost. In this work, we show strong lower bounds for… (Apple Machine Learning Research)

SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting

User-defined keyword spotting on a resource-constrained edge device is challenging. However, keywords are often bounded by a maximum keyword length, which has been largely under-leveraged in prior works. Our analysis of keyword-length distribution shows that user-defined keyword spotting can be treated as a length-constrained problem, eliminating the need for aggregation over variable text length. This leads to our proposed method for efficient keyword spotting, SLiCK (exploiting Subsequences for Length-Constrained Keyword spotting). We further introduce a subsequence-level matching scheme to… (Apple Machine Learning Research)

Unlock cost-effective AI inference using Amazon Bedrock serverless capabilities with an Amazon SageMaker trained model

In this post, I’ll show you how to use Amazon Bedrock—with its fully managed, on-demand API—with your Amazon SageMaker trained or fine-tuned model.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Previously, if you wanted to use your own custom fine-tuned models in Amazon Bedrock, you either had to self-manage your inference infrastructure in SageMaker or train the models directly within Amazon Bedrock, which requires costly provisioned throughput.

With Amazon Bedrock Custom Model Import, you can use new or existing models that have been trained or fine-tuned within SageMaker using Amazon SageMaker JumpStart. You can import the supported architectures into Amazon Bedrock, allowing you to access them on demand through the Amazon Bedrock fully managed invoke model API.

Solution overview

At the time of writing, Amazon Bedrock supports importing custom models from the following architectures:

  • Mistral
  • Flan
  • Meta Llama 2 and Llama 3

For this post, we use a Hugging Face Flan-T5 Base model.

In the following sections, I walk you through the steps to train a model in SageMaker JumpStart and import it into Amazon Bedrock. Then you can interact with your custom model through the Amazon Bedrock playgrounds.

Prerequisites

Before you begin, verify that you have an AWS account with Amazon SageMaker Studio and Amazon Bedrock access.

If you don’t already have an instance of SageMaker Studio, see Launch Amazon SageMaker Studio for instructions to create one.

Train a model in SageMaker JumpStart

Complete the following steps to train a Flan model in SageMaker JumpStart:

  1. Open the AWS Management Console and go to SageMaker Studio.

Amazon SageMaker Console

  2. In SageMaker Studio, choose JumpStart in the navigation pane.

With SageMaker JumpStart, machine learning (ML) practitioners can choose from a broad selection of publicly available FMs and pre-built ML solutions that can be deployed in a few clicks.

  3. Search for and choose the Hugging Face Flan-T5 Base model.

Amazon SageMaker JumpStart Page

On the model details page, you can review a short description of the model, how to deploy it, how to fine-tune it, and what format your training data needs to be in to customize the model.

  4. Choose Train to begin fine-tuning the model on your training data.

Flan-T5 Base Model Card

Create the training job using the default settings, which populate the job with recommended values.

  5. The example in this post uses a prepopulated example dataset. When using your own data, enter its location in the Data section, making sure it meets the format requirements.

Fine-tune model page

  6. Configure the security settings such as AWS Identity and Access Management (IAM) role, virtual private cloud (VPC), and encryption.
  7. Note the value for Output artifact location (S3 URI) to use later.
  8. Submit the job to start training.

You can monitor your job by selecting Training on the Jobs dropdown menu. When the training job status shows as Completed, the job has finished. With default settings, training takes about 10 minutes.

Training Jobs
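If you'd rather work in a SageMaker notebook than the Studio console, the same fine-tuning job can be launched with the SageMaker Python SDK's JumpStart estimator. The following is a minimal sketch; the model ID, instance type, S3 paths, and training channel name are illustrative assumptions rather than values from this post.

# Minimal sketch: fine-tune the JumpStart Flan-T5 Base model with the SageMaker
# Python SDK instead of the Studio console. All identifiers below are placeholders.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="huggingface-text2text-flan-t5-base",   # assumed JumpStart model ID
    instance_type="ml.g5.2xlarge",                   # illustrative training instance
    output_path="s3://your-bucket/flan-t5-output/",  # becomes the output artifact location
)

# The channel name and data format must match the model details page.
estimator.fit({"training": "s3://your-bucket/flan-t5-training-data/"})

# Note the output artifact S3 URI; you reference it when importing into Amazon Bedrock.
print(estimator.model_data)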

Import the model into Amazon Bedrock

After the model has completed training, you can import it into Amazon Bedrock. Complete the following steps:

  1. On the Amazon Bedrock console, choose Imported models under Foundation models in the navigation pane.
  2. Choose Import model.

Amazon Bedrock - Custom Model Import

  3. For Model name, enter a recognizable name for your model.
  4. Under Model import settings, select Amazon SageMaker model, then select the radio button next to your model.

Importing a model from Amazon SageMaker

  5. Under Service access, select Create and use a new service role and enter a name for the role.
  6. Choose Import model.

Creating a new service role

  7. Wait for the model import to complete, which takes about 15 minutes.

Successful model import

  8. Under Playgrounds in the navigation pane, choose Text.
  9. Choose Select model.

Using the model in Amazon Bedrock text playground

  10. For Category, choose Imported models.
  11. For Model, choose flan-t5-fine-tuned.
  12. For Throughput, choose On-demand.
  13. Choose Apply.

Selecting the fine-tuned model for use

You can now interact with your custom model. In the following screenshot, we use our example custom model to summarize a description about Amazon Bedrock.

Using the fine-tuned model
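You can also call the imported model programmatically through the Amazon Bedrock Runtime InvokeModel API instead of the playground. The following boto3 sketch is illustrative: the model ARN is a placeholder, and the request body shape depends on the model you imported.

import json
import boto3

# Bedrock Runtime client for on-demand inference against the imported model.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# For an imported model, the model ID is its ARN from the Imported models page (placeholder here).
imported_model_arn = "arn:aws:bedrock:us-east-1:111122223333:imported-model/EXAMPLE1234"

# The body format depends on your model; a simple text prompt payload is assumed.
body = json.dumps({"inputText": "Summarize: Amazon Bedrock is a fully managed service that ..."})

response = bedrock_runtime.invoke_model(
    modelId=imported_model_arn,
    body=body,
    contentType="application/json",
    accept="application/json",
)

print(json.loads(response["body"].read()))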

Clean up

Complete the following steps to clean up your resources:

  1. If you’re not going to continue using SageMaker, delete your SageMaker domain.
  2. If you no longer want to maintain your model artifacts, delete the Amazon Simple Storage Service (Amazon S3) bucket where your model artifacts are stored.
  3. To delete your imported model from Amazon Bedrock, on the Imported models page on the Amazon Bedrock console, select your model, and then choose the options menu (three dots) and select Delete.

Clean-Up

Conclusion

In this post, we explored how the Custom Model Import feature in Amazon Bedrock enables you to use your own custom trained or fine-tuned models for on-demand, cost-efficient inference. By integrating SageMaker model training capabilities with the fully managed, scalable infrastructure of Amazon Bedrock, you now have a seamless way to deploy your specialized models and make them accessible through a simple API.

Whether you prefer the user-friendly SageMaker Studio console or the flexibility of SageMaker notebooks, you can train and import your models into Amazon Bedrock. This allows you to focus on developing innovative applications and solutions, without the burden of managing complex ML infrastructure.

As the capabilities of large language models continue to evolve, the ability to integrate custom models into your applications becomes increasingly valuable. With the Amazon Bedrock Custom Model Import feature, you can now unlock the full potential of your specialized models and deliver tailored experiences to your customers, all while benefiting from the scalability and cost-efficiency of a fully managed service.

To dive deeper into fine-tuning on SageMaker, see Instruction fine-tuning for FLAN T5 XL with Amazon SageMaker Jumpstart. To get more hands-on experience with Amazon Bedrock, check out our Building with Amazon Bedrock workshop.


About the Author

Joseph Sadler is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, specializing in cybersecurity and machine learning. With public and private sector experience, he has expertise in cloud security, artificial intelligence, threat detection, and incident response. His diverse background helps him architect robust, secure solutions that use cutting-edge technologies to safeguard mission-critical systems.

Unveiling a New Era of Local AI With NVIDIA NIM Microservices and AI Blueprints

Over the past year, generative AI has transformed the way people live, work and play, enhancing everything from writing and content creation to gaming, learning and productivity. PC enthusiasts and developers are leading the charge in pushing the boundaries of this groundbreaking technology.

Countless times, industry-defining technological breakthroughs have been invented in one place — a garage. This week marks the start of the RTX AI Garage series, which will offer routine content for developers and enthusiasts looking to learn more about NVIDIA NIM microservices and AI Blueprints, and how to build AI agents, creative workflows, digital humans, productivity apps and more on AI PCs. Welcome to the RTX AI Garage.

This first installment spotlights announcements made earlier this week at CES, including new AI foundation models available on NVIDIA RTX AI PCs that take digital humans, content creation, productivity and development to the next level.

These models — offered as NVIDIA NIM microservices — are powered by new GeForce RTX 50 Series GPUs. Built on the NVIDIA Blackwell architecture, RTX 50 Series GPUs deliver up to 3,352 trillion AI operations per second, offer 32GB of VRAM and feature FP4 compute, doubling AI inference performance and enabling generative AI to run locally with a smaller memory footprint.

NVIDIA also introduced NVIDIA AI Blueprints — ready-to-use, preconfigured workflows, built on NIM microservices, for applications like digital humans and content creation.

NIM microservices and AI Blueprints empower enthusiasts and developers to build, iterate and deliver AI-powered experiences to the PC faster than ever. The result is a new wave of compelling, practical capabilities for PC users.

Fast-Track AI With NVIDIA NIM

There are two key challenges to bringing AI advancements to PCs. First, the pace of AI research is breakneck, with new models appearing daily on platforms like Hugging Face, which now hosts over a million models. As a result, breakthroughs quickly become outdated.

Second, adapting these models for PC use is a complex, resource-intensive process. Optimizing them for PC hardware, integrating them with AI software and connecting them to applications requires significant engineering effort.

NVIDIA NIM helps address these challenges by offering prepackaged, state-of-the-art AI models optimized for PCs. These NIM microservices span model domains, can be installed with a single click, feature application programming interfaces (APIs) for easy integration, and harness NVIDIA AI software and RTX GPUs for accelerated performance.

At CES, NVIDIA announced a pipeline of NIM microservices for RTX AI PCs, supporting use cases spanning large language models (LLMs), vision-language models, image generation, speech, retrieval-augmented generation (RAG), PDF extraction and computer vision.

The new Llama Nemotron family of open models provides high accuracy on a wide range of agentic tasks. The Llama Nemotron Nano model, which will be offered as a NIM microservice for RTX AI PCs and workstations, excels at agentic AI tasks like instruction following, function calling, chat, coding and math.

Soon, developers will be able to quickly download and run these microservices on Windows 11 PCs using Windows Subsystem for Linux (WSL).

To demonstrate how enthusiasts and developers can use NIM to build AI agents and assistants, NVIDIA previewed Project R2X, a vision-enabled PC avatar that can put information at a user’s fingertips, assist with desktop apps and video conference calls, read and summarize documents, and more. Sign up for Project R2X updates.

By using NIM microservices, AI enthusiasts can skip the complexities of model curation, optimization and backend integration and focus on creating and innovating with cutting-edge AI models.

What’s in an API?

An API is the way in which an application communicates with a software library. An API defines a set of “calls” that the application can make to the library and what the application can expect in return. Traditional AI APIs require a lot of setup and configuration, making AI capabilities harder to use and hampering innovation.

NIM microservices expose easy-to-use, intuitive APIs that an application can simply send requests to and get a response. In addition, they’re designed around the input and output media for different model types. For example, LLMs take text as input and produce text as output, image generators convert text to image, speech recognizers turn speech to text and so on.
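As a concrete illustration, a locally running LLM NIM microservice typically exposes an OpenAI-compatible HTTP endpoint. The sketch below assumes a NIM listening on localhost port 8000 and a hypothetical model name; check the microservice's documentation for the actual endpoint and model ID.

import requests

# Sketch: send a chat request to a locally running LLM NIM microservice.
# Endpoint, port, and model name are assumptions for illustration only.
NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # hypothetical model ID served by the NIM
    "messages": [
        {"role": "user", "content": "Draft a short description of an RTX AI PC."}
    ],
    "max_tokens": 256,
}

response = requests.post(NIM_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()

# OpenAI-compatible responses return the generated text under choices[0].message.content.
print(response.json()["choices"][0]["message"]["content"])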

The microservices are designed to integrate seamlessly with leading AI development and agent frameworks such as AI Toolkit for VSCode, AnythingLLM, ComfyUI, Flowise AI, LangChain, Langflow and LM Studio. Developers can easily download and deploy them from build.nvidia.com.

By bringing these APIs to RTX, NVIDIA NIM will accelerate AI innovation on PCs.

Enthusiasts are expected to be able to experience a range of NIM microservices using an upcoming release of the NVIDIA ChatRTX tech demo.

A Blueprint for Innovation

By using state-of-the-art models, prepackaged and optimized for PCs, developers and enthusiasts can quickly create AI-powered projects. Taking things a step further, they can combine multiple AI models and other functionality to build complex applications like digital humans, podcast generators and application assistants.

NVIDIA AI Blueprints, built on NIM microservices, are reference implementations for complex AI workflows. They help developers connect several components, including libraries, software development kits and AI models, together in a single application.

AI Blueprints include everything that a developer needs to build, run, customize and extend the reference workflow, which includes the reference application and source code, sample data, and documentation for customization and orchestration of the different components.

At CES, NVIDIA announced two AI Blueprints for RTX: one for PDF to podcast, which lets users generate a podcast from any PDF, and another for 3D-guided generative AI, which is based on FLUX.1 [dev] (expected to be offered as a NIM microservice) and gives artists greater control over text-based image generation.

With AI Blueprints, developers can quickly go from AI experimentation to AI development for cutting-edge workflows on RTX PCs and workstations.

Built for Generative AI

The new GeForce RTX 50 Series GPUs are purpose-built to tackle complex generative AI challenges, featuring fifth-generation Tensor Cores with FP4 support, faster G7 memory and an AI-management processor for efficient multitasking between AI and creative workflows.

The GeForce RTX 50 Series adds FP4 support to help bring better performance and more models to PCs. FP4 is a lower-precision quantization method, similar to file compression, that decreases model sizes. Compared with FP16 — the default precision most models use — FP4 uses less than half the memory, and 50 Series GPUs deliver over 2x the performance of the previous generation. This can be done with virtually no loss in quality thanks to advanced quantization methods offered by NVIDIA TensorRT Model Optimizer.

For example, Black Forest Labs’ FLUX.1 [dev] model at FP16 requires over 23GB of VRAM, meaning it can only be supported by the GeForce RTX 4090 and professional GPUs. With FP4, FLUX.1 [dev] requires less than 10GB, so it can run locally on more GeForce RTX GPUs.
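The memory savings follow directly from the number of bits per weight. As a rough back-of-the-envelope check (the parameter count is an approximation, and real VRAM use also includes activations and other overhead):

# Rough weight-memory estimate for a model of roughly FLUX.1 [dev] scale (~12B parameters).
params = 12e9

gb_fp16 = params * 2 / 1e9   # 16 bits = 2 bytes per weight  -> ~24 GB ("over 23GB")
gb_fp4 = params * 0.5 / 1e9  # 4 bits = 0.5 bytes per weight -> ~6 GB (well under 10GB)

print(f"FP16 weights: ~{gb_fp16:.0f} GB")
print(f"FP4 weights:  ~{gb_fp4:.0f} GB")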

With a GeForce RTX 4090 with FP16, the FLUX.1 [dev] model can generate images in 15 seconds with 30 steps. With a GeForce RTX 5090 with FP4, images can be generated in just over five seconds.

Get Started With the New AI APIs for PCs

NVIDIA NIM microservices and AI Blueprints are expected to be available starting next month, with initial hardware support for GeForce RTX 50 Series, GeForce RTX 4090 and 4080, and NVIDIA RTX 6000 and 5000 professional GPUs. Additional GPUs will be supported in the future.

NIM-ready RTX AI PCs are expected to be available from Acer, ASUS, Dell, GIGABYTE, HP, Lenovo, MSI, Razer and Samsung, and from local system builders Corsair, Falcon Northwest, LDLC, Maingear, Mifcon, Origin PC, PCS and Scan.

GeForce RTX 50 Series GPUs and laptops deliver game-changing performance, power transformative AI experiences, and enable creators to complete workflows in record time. Rewatch NVIDIA CEO Jensen Huang’s keynote to learn more about NVIDIA’s AI news unveiled at CES.

See notice regarding software product information.

Align and monitor your Amazon Bedrock powered insurance assistance chatbot to responsible AI principles with AWS Audit Manager

Generative AI applications are gaining widespread adoption across various industries, including regulated industries such as financial services and healthcare. As these advanced systems play an increasingly critical role in decision-making processes and customer interactions, customers should work toward ensuring the reliability, fairness, and compliance of generative AI applications with industry regulations. To address this need, the AWS generative AI best practices framework was launched within AWS Audit Manager, enabling auditing and monitoring of generative AI applications. This framework provides step-by-step guidance on approaching generative AI risk assessment, collecting and monitoring evidence from Amazon Bedrock and Amazon SageMaker environments to assess your risk posture, and preparing to meet future compliance requirements.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Agents can be used to configure specialized agents that run actions seamlessly based on user input and your organization’s data. These managed agents play conductor, orchestrating interactions between FMs, API integrations, user conversations, and knowledge bases loaded with your data.

Insurance claim lifecycle processes typically involve several manual tasks that are painstakingly managed by human agents. An Amazon Bedrock-powered insurance agent can assist human agents and improve existing workflows by automating repetitive actions, as demonstrated in the example in this post: the agent can create new claims, send pending document reminders for open claims, gather claims evidence, and search for information across existing claims and customer knowledge repositories.

Generative AI applications should be developed with adequate controls for steering the behavior of FMs. Responsible AI considerations such as privacy, security, safety, controllability, fairness, explainability, transparency and governance help ensure that AI systems are trustworthy. In this post, we demonstrate how to use the AWS generative AI best practices framework on AWS Audit Manager to evaluate this insurance claim agent from a responsible AI lens.

Use case

In this example of an insurance assistance chatbot, the customer’s generative AI application is designed with Amazon Bedrock Agents to automate tasks related to the processing of insurance claims and Amazon Bedrock Knowledge Bases to provide relevant documents. This allows users to directly interact with the chatbot when creating new claims and receiving assistance in an automated and scalable manner.

User interacts with Amazon Bedrock Agents, which in turn retrieves context from the Amazon Bedrock Knowledge Base or can make various API calls for defined functions

The user can interact with the chatbot using natural language queries to create a new claim, retrieve an open claim using a specific claim ID, receive a reminder for documents that are pending, and gather evidence about specific claims.

The agent then interprets the user’s request and determines if actions need to be invoked or information needs to be retrieved from a knowledge base. If the user request invokes an action, action groups configured for the agent will invoke different API calls, which produce results that are summarized as the response to the user. Figure 1 depicts the system’s functionalities and AWS services. The code sample for this use case is available in GitHub and can be expanded to add new functionality to the insurance claims chatbot.

How to create your own assessment of the AWS generative AI best practices framework

  1. To create an assessment using the generative AI best practices framework on Audit Manager, go to the AWS Management Console and navigate to AWS Audit Manager.
  2. Choose Create assessment.

Choose Create Assessment on the AWS Audit Manager dashboard

  3. Specify the assessment details, such as the name and an Amazon Simple Storage Service (Amazon S3) bucket to save assessment reports to. Select AWS Generative AI Best Practices Framework for assessment.

Specify assessment details and choose the AWS Generative AI Best Practices Framework v2

  4. Select the AWS accounts in scope for assessment. If you’re using AWS Organizations and you have enabled it in Audit Manager, you will be able to select multiple accounts at once in this step. One of the key features of AWS Organizations is the ability to perform various operations across multiple AWS accounts simultaneously.

Add the AWS accounts in scope for the assessment

  5. Next, select the audit owners to manage the preparation for your organization. When it comes to auditing activities within AWS accounts, it’s considered a best practice to create a dedicated role specifically for auditors or auditing purposes. This role should be assigned only the permissions required to perform auditing tasks, such as reading logs, accessing relevant resources, or running compliance checks.

Specify audit owners

  6. Finally, review the details and choose Create assessment.

Review and create assessment

Principles of AWS generative AI best practices framework

Generative AI implementations can be evaluated based on eight principles in the AWS generative AI best practices framework. For each, we will define the principle and explain how Audit Manager conducts an evaluation.

Accuracy

A core principle of trustworthy AI systems is the accuracy of the application and underlying model. Measures of accuracy should consider computational measures and human-AI teaming. It is also important that AI systems are well tested and demonstrate adequate performance in the production setting. Accuracy measurements should always be paired with clearly defined and realistic test sets that are representative of the conditions of expected use.

For the use case of an insurance claims chatbot built with Amazon Bedrock Agents, you will use the large language model (LLM) Claude Instant from Anthropic, which you won’t need to further pre-train or fine-tune. Hence, for this use case it is relevant to demonstrate the chatbot’s performance on its tasks through the following:

  • A prompt benchmark
  • Source verification of documents ingested in knowledge bases or databases that the agent has access to
  • Integrity checks of the connected datasets as well as the agent
  • Error analysis to detect the edge cases where the application is erroneous
  • Schema compatibility of the APIs
  • Human-in-the-loop validation

To measure the efficacy of the assistance chatbot, you will use promptfoo—a command line interface (CLI) and library for evaluating LLM apps. This involves three steps:

  1. Create a test dataset containing prompts with which you test the different features.
  2. Invoke the insurance claims assistant on these prompts and collect the responses. Additionally, the traces of these responses are also helpful in debugging unexpected behavior.
  3. Set up evaluation metrics that can be derived in an automated manner or using human evaluation to measure the quality of the assistant.

In the example of an insurance assistance chatbot, designed with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, there are four tasks:

  • getAllOpenClaims: Gets the list of all open insurance claims. Returns all claim IDs that are open.
  • getOutstandingPaperwork: Gets the list of pending documents that need to be uploaded by the policy holder before the claim can be processed. The API takes in only one claim ID and returns the list of documents that are pending to be uploaded. This API should be called for each claim ID.
  • getClaimDetail: Gets all details about a specific claim given a claim ID.
  • sendReminder: Send a reminder to the policy holder about pending documents for the open claim. The API takes in only one claim ID and its pending documents at a time, sends the reminder, and returns the tracking details for the reminder. This API should be called for each claim ID you want to send reminders for.

For each of these tasks, you will create sample prompts to create a synthetic test dataset. The idea is to generate sample prompts with expected outcomes for each task. For the purposes of demonstrating the ideas in this post, you will create only a few samples in the synthetic test dataset. In practice, the test dataset should reflect the complexity of the task and possible failure modes for which you would want to test the application. Here are the sample prompts that you will use for each task:

  • getAllOpenClaims
    • What are the open claims?
    • List open claims.
  • getOutstandingPaperwork
    • What are the missing documents from {{claim}}?
    • What is missing from {{claim}}?
  • getClaimDetail
    • Explain the details to {{claim}}
    • What are the details of {{claim}}
  • sendReminder
    • Send reminder to {{claim}}
    • Send reminder to {{claim}}. Include the missing documents and their requirements.
  • Also include sample prompts for a set of unwanted results to make sure that the agent only performs the tasks that are predefined and doesn’t provide out of context or restricted information.
    • List all claims, including closed claims
    • What is 2+2?

Set up

You can start with the example of an insurance claims agent by cloning the Amazon Bedrock-powered insurance agent use case. After you create the agent, set up promptfoo. Next, you will need to create a custom script that can be used for testing. This script should be able to invoke your application for a prompt from the synthetic test dataset. We created a Python script, invoke_bedrock_agent.py, with which we invoke the agent for a given prompt.

python invoke_bedrock_agent.py "What are the open claims?"
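For reference, the following is a minimal sketch of what such a script could look like using the Bedrock Agents runtime API in boto3; the agent ID, alias ID, and region are placeholders you would replace with your own values.

# invoke_bedrock_agent.py (sketch): send a single prompt to an Amazon Bedrock agent.
# The agent ID, alias ID, and region below are placeholders.
import sys
import uuid

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def invoke_agent(prompt: str) -> str:
    response = agent_runtime.invoke_agent(
        agentId="AGENT_ID_PLACEHOLDER",
        agentAliasId="AGENT_ALIAS_ID_PLACEHOLDER",
        sessionId=str(uuid.uuid4()),
        inputText=prompt,
        enableTrace=True,  # traces help debug unexpected behavior, as noted above
    )
    # The response is an event stream; concatenate the completion chunks.
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            completion += event["chunk"]["bytes"].decode("utf-8")
    return completion

if __name__ == "__main__":
    print(invoke_agent(sys.argv[1]))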

Step 1: Save your prompts

Create a text file of the sample prompts to be tested. As seen in the following, a claim can be a parameter that is inserted into the prompt during testing.

%%writefile prompts_getClaimDetail.txt
Explain the details to {{claim}}.
---
What are the details of {{claim}}.

Step 2: Create your prompt configuration with tests

For prompt testing, we defined test prompts per task. The YAML configuration file uses a format that defines test cases and assertions for validating prompts. Each prompt is processed through a series of sample inputs defined in the test cases. Assertions check whether the prompt responses meet the specified requirements. In this example, you use the prompts for task getClaimDetail and define the rules. There are different types of tests that can be used in promptfoo. This example uses keywords and similarity to assess the contents of the output. Keywords are checked using a list of values that are present in the output. Similarity is checked through the embedding of the FM’s output to determine if it’s semantically similar to the expected value.

%%writefile promptfooconfig.yaml
prompts: [prompts_getClaimDetail.txt] # text file that has the prompts
providers: ['bedrock_agent_as_provider.js'] # custom provider setting
defaultTest:
  options:
    provider:
      embedding:
        id: huggingface:sentence-similarity:sentence-transformers/all-MiniLM-L6-v2
tests:
  - description: 'Test via keywords'
    vars:
      claim: claim-008 # a claim that is open
    assert:
      - type: contains-any
        value:
          - 'claim'
          - 'open'
  - description: 'Test via similarity score'
    vars: 
      claim: claim-008 # a claim that is open
    assert:
      - type: similar
        value: 'Providing the details for claim with id xxx: it is created on xx-xx-xxxx, last activity date on xx-xx-xxxx, status is x, the policy type is x.'
        threshold: 0.6

Step 3: Run the tests

Run the following commands to test the prompts against the set rules.

npx promptfoo@latest eval -c promptfooconfig.yaml
npx promptfoo@latest share

The promptfoo library generates a user interface where you can view the exact set of rules and the outcomes. The user interface for the tests that were run using the test prompts is shown in the following figure.

Promptfoo user interface for the tests that were run using the test prompts

For each test, you can view the details: the prompt, the output, the test that was performed, and the reason for the result. The following figure shows the prompt test result for getClaimDetail, using the similarity score against the expected result, given as a sentence.

promptfoo user interface showing prompt test result for getClaimDetail

Similarly, using the similarity score against the expected result, you get the test result for getOpenClaims as shown in the following figure.

Promptfoo user interface showing test result for getOpenClaims

Step 4: Save the output

For the final step, you want to attach evidence for both the FM and the application as a whole to the control ACCUAI 3.1: Model Evaluation Metrics. To do so, save the output of your prompt testing into an S3 bucket. In addition, the performance metrics of the FM can be found in the model card, which should also first be saved to an S3 bucket. Within Audit Manager, navigate to the corresponding control, ACCUAI 3.1: Model Evaluation Metrics, select Add manual evidence and Import file from S3 to provide both model performance metrics and application performance, as shown in the following figure.

Add manual evidence and Import file from S3 to provide both model performance metrics and application performance

In this section, we showed you how to test a chatbot and attach the relevant evidence. In the insurance claims chatbot, we did not customize the FM and thus the other controls—including ACCUAI 3.2: Regular Retraining for Accuracy, ACCUAI 3.11: Null Values, ACCUAI 3.12: Noise and Outliers, and ACCUAI 3.15: Update Frequency—are not applicable. Hence, we will not include these controls in the assessment performed for the use case of an insurance claims assistant.

We showed you how to test a RAG-based chatbot for controls using a synthetic test benchmark of prompts and add the results to the evaluation control. Based on your application, one or more controls in this section might apply and be relevant to demonstrate the trustworthiness of your application.

Fair

Fairness in AI includes concerns for equality and equity by addressing issues such as harmful bias and discrimination.

Fairness of the insurance claims assistant can be tested through the model responses when user-specific information is presented to the chatbot. For this application, it’s desirable to see no deviations in the behavior of the application when the chatbot is exposed to user-specific characteristics. To test this, you can create prompts containing user characteristics and then test the application using a process similar to the one described in the previous section. This evaluation can then be added as evidence to the control for FAIRAI 3.1: Bias Assessment.

An important element of fairness is having diversity in the teams that develop and test the application. This helps ensure that different perspectives are addressed in the AI development and deployment lifecycle so that the final behavior of the application addresses the needs of diverse users. The details of the team structure can be added as manual evidence for the control FAIRAI 3.5: Diverse Teams. Organizations might also already have ethics committees that review AI applications. The structure of the ethics committee and the assessment of the application can be included as manual evidence for the control FAIRAI 3.6: Ethics Committees.

Moreover, the organization can also improve fairness by incorporating features to improve accessibility of the chatbot for individuals with disabilities. By using Amazon Transcribe to stream transcription of user speech to text and Amazon Polly to play back speech audio to the user, voice can be used with an application built with Amazon Bedrock as detailed in Amazon Bedrock voice conversation architecture.

Privacy

NIST defines privacy as the norms and practices that help to safeguard human autonomy, identity, and dignity. Privacy values such as anonymity, confidentiality, and control should guide choices for AI system design, development, and deployment. The insurance claims assistant example doesn’t include any knowledge bases or connections to databases that contain customer data. If it did, additional access controls and authentication mechanisms would be required to make sure that customers can only access data they are authorized to retrieve.

Additionally, to discourage users from providing personally identifiable information (PII) in their interactions with the chatbot, you can use Amazon Bedrock Guardrails. By using the PII filter and adding the guardrail to the agent, PII entities in user queries or model responses will be redacted and pre-configured messaging will be provided instead. After guardrails are implemented, you can test them by invoking the chatbot with prompts that contain dummy PII. These model invocations are logged in Amazon CloudWatch; the logs can then be appended as automated evidence for privacy-related controls, including PRIAI 3.10: Personal Identifier Anonymization or Pseudonymization and PRIAI 3.9: PII Anonymization.
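The guardrail can be created in the Amazon Bedrock console or programmatically. The following boto3 sketch is illustrative only: the guardrail name, messages, and entity choices are assumptions, the same call can also define denied topics (used later in the Safe section), and the guardrail still needs to be attached to the agent.

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Sketch: create a guardrail that blocks email addresses (PII) and denies an
# unsupported topic. Names and messaging below are illustrative placeholders.
response = bedrock.create_guardrail(
    name="insurance-claims-guardrail",
    description="Blocks PII and unsupported topics for the claims assistant.",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "EMAIL", "action": "BLOCK"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ]
    },
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "InvestmentAdvice",
                "definition": "Requests for guidance on investments or financial products.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that information.",
)

print(response["guardrailId"], response["version"])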

In the following figure, a guardrail was created to filter PII and unsupported topics. The user can test and view the trace of the guardrail within the Amazon Bedrock console using natural language. For this use case, the user asked a question whose answer would require the FM to provide PII. The trace shows that sensitive information has been blocked because the guardrail detected PII in the prompt.

Testing the guardrail: the trace shows that PII in the prompt was detected and blocked

As a next step, under the Guardrail details section of the agent builder, the user adds the PII guardrail, as shown in the figure below.

Under the Guardrail details section of the agent builder, add the PII guardrail

Amazon Bedrock is integrated with CloudWatch, which allows you to track usage metrics for audit purposes. As described in Monitoring generative AI applications using Amazon Bedrock and Amazon CloudWatch integration, you can enable model invocation logging. When analyzing insights with Amazon Bedrock, you can query model invocations. The logs provide detailed information about each model invocation, including the input prompt, the generated output, and any intermediate steps or reasoning. You can use these logs to demonstrate transparency and accountability.

Model invocation logging can be used to collect invocation logs, including full request data, response data, and metadata for all calls performed in your account. This can be enabled by following the steps described in Monitor model invocation using CloudWatch Logs.

You can then export the relevant CloudWatch logs from Logs Insights for this model invocation as evidence for the relevant controls. You can filter for bedrock-logs and choose to download them as a table, as shown in the figure below, so the results can be uploaded as manual evidence for AWS Audit Manager.

filter for bedrock-logs and choose to download them
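If you prefer to retrieve these logs programmatically instead of downloading them from the console, you can run a CloudWatch Logs Insights query with boto3. The log group name below is an assumption and should match your model invocation logging configuration.

import time

import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Sketch: query model invocation logs for entries where a guardrail intervened.
# The log group name is a placeholder for your invocation logging destination.
query = """
fields @timestamp, @message
| filter @message like /INTERVENED/
| sort @timestamp desc
| limit 20
"""

now = int(time.time())
query_id = logs.start_query(
    logGroupName="/aws/bedrock/modelinvocations",
    startTime=now - 24 * 3600,  # last 24 hours
    endTime=now,
    queryString=query,
)["queryId"]

# Poll until the query completes, then print the matching log lines.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})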

For the guardrail example, the specific model invocation will be shown in the logs, as in the following figure. Here, the prompt and the user who ran it are captured. For the guardrail action, the result is INTERVENED because the PII entity email was blocked. For AWS Audit Manager, you can export the result and upload it as manual evidence under PRIAI 3.9: PII Anonymization.

Add the Guardrail intervened behavior as evidence to the AWS Audit Manager assessment

Furthermore, organizations can establish monitoring of their AI applications—particularly when they deal with customer data and PII—and establish an escalation procedure for when a privacy breach might occur. Documentation related to the escalation procedure can be added as manual evidence for the control PRIAI 3.6: Escalation Procedures – Privacy Breach.

These are some of the most relevant controls to include in your assessment of a chatbot application from the dimension of Privacy.

Resilience

In this section, we show you how to improve the resilience of an application and add the corresponding evidence to the controls defined in the Resilience section of the AWS generative AI best practices framework.

AI systems, as well as the infrastructure in which they are deployed, are said to be resilient if they can withstand unexpected adverse events or unexpected changes in their environment or use. The resilience of a generative AI workload plays an important role in the development process and needs special considerations.

The various components of the insurance claims chatbot require resilient design considerations. Agents should be designed with appropriate timeouts and latency requirements to ensure a good customer experience. Data pipelines that ingest data to the knowledge base should account for throttling and use backoff techniques. It’s a good idea to consider parallelism to reduce bottlenecks when using embedding models, account for latency, and keep in mind the time required for ingestion. Considerations and best practices should be implemented for vector databases, the application tier, and monitoring the use of resources through an observability layer. Having a business continuity plan with a disaster recovery strategy is a must for any workload. Guidance for these considerations and best practices can be found in Designing generative AI workloads for resilience. Details of these architectural elements should be added as manual evidence in the assessment.

Responsible

Key principles of responsible design are explainability and interpretability. Explainability refers to the mechanisms that drive the functionality of the AI system, while interpretability refers to the meaning of the AI system’s output within the context of its designed functional purpose. Together, explainability and interpretability assist in the governance of an AI system and help maintain its trustworthiness. The trace of the agent for critical prompts and various requests that users can send to the insurance claims chatbot can be added as evidence for the reasoning used by the agent to complete a user request.

The logs gathered from Amazon Bedrock offer comprehensive insights into the model’s handling of user prompts and the generation of corresponding answers. The figure below shows a typical model invocation log. By analyzing these logs, you can gain visibility into the model’s decision-making process. This logging functionality can serve as a manual audit trail, fulfilling RESPAI3.4: Auditable Model Decisions.

typical model invocation log

Another important aspect of maintaining responsible design, development, and deployment of generative AI applications is risk management. This involves risk assessment where risks are identified across broad categories for the applications to identify harmful events and assign risk scores. This process also identifies mitigations that can reduce an inherent risk of a harmful event occurring to a lower residual risk. For more details on how to perform risk assessment of your Generative AI application, see Learn how to assess the risk of AI systems. Risk assessment is a recommended practice, especially for safety critical or regulated applications where identifying the necessary mitigations can lead to responsible design choices and a safer application for the users. The risk assessment reports are good evidence to be included under this section of the assessment and can be uploaded as manual evidence. The risk assessment should also be periodically reviewed to update changes to the application that can introduce the possibility of new harmful events and consider new mitigations for reducing the impact of these events.

Safe

AI systems should “not under defined conditions, lead to a state in which human life, health, property, or the environment is endangered.” (Source: ISO/IEC TS 5723:2022) For the insurance claims chatbot, safety principles should be followed to prevent interactions with users outside the limits of the defined functions. Amazon Bedrock Guardrails can be used to define topics that are not supported by the chatbot. The intended use of the chatbot should also be transparent to users to guide them in the best use of the AI application. An unsupported topic could include providing investment advice, which can be blocked by creating a guardrail with investment advice defined as a denied topic, as described in Guardrails for Amazon Bedrock helps implement safeguards customized to your use case and responsible AI policies.

After this functionality is enabled as a guardrail, the model will prohibit unsupported actions. The following figure depicts a scenario where requesting investment advice is a restricted behavior, leading the model to decline providing a response.

Guardrail can help to enforce restricted behavior

After the model is invoked, the user can navigate to CloudWatch to view the relevant logs. In cases where the model denies or intervenes in certain actions, such as providing investment advice, the logs will reflect the specific reasons for the intervention, as shown in the following figure. By examining the logs, you can gain insights into the model’s behavior, understand why certain actions were denied or restricted, and verify that the model is operating within the intended guidelines and boundaries. For the controls defined under the safety section of the assessment, you might want to design more experiments by considering various risks that arise from your application. The logs and documentation collected from the experiments can be attached as evidence to demonstrate the safety of the application.

Log insights from Amazon Bedrock shows the details of how Amazon Bedrock Guardrails intervened

Secure

NIST defines AI systems to be secure when they maintain confidentiality, integrity, and availability through protection mechanisms that prevent unauthorized access and use. Applications developed using generative AI should build defenses for adversarial threats including but not limited to prompt injection, data poisoning if a model is being fine-tuned or pre-trained, and model and data extraction exploits through AI endpoints.

Your information security teams should conduct standard security assessments that have been adapted to address the new challenges with generative AI models and applications—such as adversarial threats—and consider mitigations such as red-teaming. To learn more on various security considerations for generative AI applications, see Securing generative AI: An introduction to the Generative AI Security Scoping Matrix. The resulting documentation of the security assessments can be attached as evidence to this section of the assessment.

Sustainable

Sustainability refers to the “state of the global system, including environmental, social, and economic aspects, in which the needs of the present are met without compromising the ability of future generations to meet their own needs.”

Some actions that contribute to a more sustainable design of generative AI applications include considering and testing smaller models to achieve the same functionality, optimizing hardware and data storage, and using efficient training algorithms. To learn more about how you can do this, see Optimize generative AI workloads for environmental sustainability. Considerations implemented for achieving more sustainable applications can be added as evidence for the controls related to this part of the assessment.

Conclusion

In this post, we used the example of an insurance claims assistant powered by Amazon Bedrock Agents and looked at various principles that you need to consider when getting this application audit ready using the AWS generative AI best practices framework on Audit Manager. We defined each principle of safeguarding applications for trustworthy AI and provided some best practices for achieving the key objectives of the principles. Finally, we showed you how these development and design choices can be added to the assessment as evidence to help you prepare for an audit.

The AWS generative AI best practices framework provides a purpose-built tool that you can use for monitoring and governance of your generative AI projects on Amazon Bedrock and Amazon SageMaker. To learn more, see:


About the Authors

Bharathi Srinivasan is a Generative AI Data Scientist at the AWS Worldwide Specialist Organisation. She works on developing solutions for Responsible AI, focusing on algorithmic fairness, veracity of large language models, and explainability. Bharathi guides internal teams and AWS customers on their responsible AI journey. She has presented her work at various learning conferences.

Irem Gokcek is a Data Architect in the AWS Professional Services team, with expertise spanning both Analytics and AI/ML. She has worked with customers from various industries such as retail, automotive, manufacturing and finance to build scalable data architectures and generate valuable insights from the data. In her free time, she is passionate about swimming and painting.

Fiona McCann is a Solutions Architect at Amazon Web Services in the public sector. She specializes in AI/ML with a focus on Responsible AI. Fiona has a passion for helping nonprofit customers achieve their missions with cloud solutions. Outside of building on AWS, she loves baking, traveling, and running half marathons in cities she visits.

London Stock Exchange Group uses Amazon Q Business to enhance post-trade client services

This post was co-written with Ben Doughton, Head of Product Operations – LCH, Iulia Midus, Site Reliability Engineer – LCH, and Maurizio Morabito, Software and AI specialist – LCH (part of London Stock Exchange Group, LSEG).

In the financial industry, quick and reliable access to information is essential, but searching for data or facing unclear communication can slow things down. An AI-powered assistant can change that. By instantly providing answers and helping to navigate complex systems, such assistants can make sure that key information is always within reach, improving efficiency and reducing the risk of miscommunication. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business enables employees to become more creative, data-driven, efficient, organized, and productive.

In this blog post, we explore a client services agent assistant application developed by the London Stock Exchange Group (LSEG) using Amazon Q Business. We will discuss how Amazon Q Business saved time in generating answers, including summarizing documents, retrieving answers to complex Member enquiries, and combining information from different data sources (while providing in-text citations to the data sources used for each answer).

The challenge

The London Clearing House (LCH) Group of companies includes leading multi-asset class clearing houses and are part of the Markets division of LSEG PLC (LSEG Markets). LCH provides proven risk management capabilities across a range of asset classes, including over-the-counter (OTC) and listed interest rates, fixed income, foreign exchange (FX), credit default swap (CDS), equities, and commodities.

As the LCH business continues to grow, the LCH team has been continuously exploring ways to improve their support to customers (members) and to increase LSEG’s impact on customer success. As part of LSEG’s multi-stage AI strategy, LCH has been exploring the role that generative AI services can have in this space. One of the key capabilities that LCH is interested in is a managed conversational assistant that requires minimal technical knowledge to build and maintain. In addition, LCH has been looking for a solution that is focused on its knowledge base and that can be quickly kept up to date. For this reason, LCH was keen to explore techniques such as Retrieval Augmented Generation (RAG). Following a review of available solutions, the LCH team decided to build a proof-of-concept around Amazon Q Business.

Business use case

Realizing value from generative AI relies on a solid business use case. LCH has a broad base of customers raising queries to their client services (CS) team across a diverse and complex range of asset classes and products. Example queries include: “What is the eligible collateral at LCH?” and “Can members clear NIBOR IRS at LCH?” This requires CS team members to refer to detailed service and policy documentation sources to provide accurate advice to their members.

Historically, the CS team has relied on producing product FAQs for LCH members to refer to and, where required, an in-house knowledge center for CS team members to refer to when answering complex customer queries. To improve the customer experience and boost employee productivity, the CS team set out to investigate whether generative AI could help answer questions from individual members, thus reducing the number of customer queries. The goal was to increase the speed and accuracy of information retrieval within the CS workflows when responding to the queries that inevitably come through from customers.

Project workflow

The CS use case was developed through close collaboration between LCH and Amazon Web Services (AWS) and involved the following steps:

  1. Ideation: The LCH team carried out a series of cross-functional workshops to examine different large language model (LLM) approaches including prompt engineering, RAG, and custom model fine-tuning and pre-training. They considered different technologies such as Amazon SageMaker and Amazon SageMaker JumpStart and evaluated trade-offs between development effort and model customization. Amazon Q Business was selected because of its built-in enterprise search web crawler capability and ease of deployment without the need for LLM deployment. Another attractive feature was the ability to clearly provide source attribution and citations. This enhanced the reliability of the responses, allowing users to verify facts and explore topics in greater depth (important aspects to increase their overall trust in the responses received).
  2. Knowledge base creation: The CS team built data sources connectors for the LCH website, FAQs, customer relationship management (CRM) software, and internal knowledge repositories and included the Amazon Q Business built-in index and retriever in the build.
  3. Integration and testing: The application was secured using a third-party identity provider (IdP) for identity and access management, so users are managed through their enterprise IdP, and AWS Identity and Access Management (IAM) was used to authenticate users when they signed in to Amazon Q Business. Testing was carried out to evaluate the performance and quality of the AI-generated answers, which demonstrated that the system achieved a high level of factual accuracy. Wider improvements in business performance were also demonstrated, including response times of a few seconds. Tests were undertaken with both unstructured and structured data within the documents.
  4. Phased rollout: The CS AI assistant was rolled out in a phased approach to provide thorough, high-quality answers. In the future, there are plans to integrate their Amazon Q Business application with existing email and CRM interfaces, and to expand its use to additional use cases and functions within LSEG. 

Solution overview

In this solution overview, we’ll explore the LCH-built Amazon Q Business application.

The LCH admin team developed a web-based interface that serves as a gateway for their internal client services team to interact with the Amazon Q Business API and other AWS services, including Amazon Elastic Container Service (Amazon ECS), Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), and Amazon Bedrock. The interface is secured using SAML 2.0 IAM federation to maintain secure access to the chat interface, retrieves answers from a pre-indexed knowledge base, and validates the responses using Anthropic’s Claude v2 LLM.

The following figure illustrates the architecture for the LCH client services application.

Architectural Design of the Solution

The workflow consists of the following steps:

  1. The LCH team set up the Amazon Q Business application using a SAML 2.0 IAM IdP. (The example in the blog post shows connecting with Okta as the IdP for Amazon Q Business. However, the LCH team built the application using a third-party solution as the IdP instead of Okta). This architecture allows LCH users to sign in using their existing identity credentials from their enterprise IdP, while they maintain control over which users have access to their Amazon Q Business application.
  2. The application had two data sources as part of the configuration for their Amazon Q Business application:
    1. An S3 bucket to store and index their internal LCH documents. This allows the Amazon Q Business application to access and search through their internal product FAQ PDF documents as part of providing responses to user queries. Indexing the documents in Amazon S3 makes them readily available for the application to retrieve relevant information.
    2. In addition to internal documents, the team has also set up their public-facing LCH website as a data source using a web crawler that can index and extract information from their rulebooks.
  3. The LCH team opted for a custom user interface (UI) instead of the built-in web experience provided by Amazon Q Business to have more control over the frontend by directly accessing the Amazon Q Business API. The application’s frontend was developed using an open source application framework and hosted on Amazon ECS. The frontend application accesses an Amazon API Gateway REST API endpoint to interact with the business logic written in AWS Lambda functions.
  4. The architecture consists of two Lambda functions:
    1. An authorizer Lambda function is responsible for authorizing the frontend application to access the Amazon Q Business API by generating temporary AWS credentials.
    2. A ChatSync Lambda function is responsible for accessing the Amazon Q Business ChatSync API to start an Amazon Q Business conversation (a minimal sketch of this call is shown after this list).
  5. The architecture includes a Validator Lambda function, which is used by the admin to validate the accuracy of the responses generated by the Amazon Q Business application.
    1. The LCH team has stored a golden answer knowledge base in an S3 bucket, consisting of approximately 100 questions and answers about their product FAQs and rulebooks collected from their live agents. This knowledge base serves as a benchmark for the accuracy and reliability of the AI-generated responses.
    2. By comparing the Amazon Q Business chat responses against their golden answers, LCH can verify that the AI-powered assistant is providing accurate and consistent information to their customers.
    3. The Validator Lambda function retrieves data from a DynamoDB table and sends it to Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs). With Amazon Bedrock, you can quickly experiment with and evaluate top FMs for a given use case, privately customize them with existing data using techniques such as fine-tuning and RAG, and build agents that execute tasks using enterprise systems and data sources.
    4. The Amazon Bedrock service uses Anthropic’s Claude v2 model to validate the Amazon Q Business application queries and responses against the golden answers stored in the S3 bucket.
    5. Anthropic’s Claude v2 model returns a score for each question and answer, in addition to a total score, which is then provided to the application admin for review.
    6. The Amazon Q Business application returned answers within a few seconds for each question. The overall expectation is that Amazon Q Business saves time for each live agent on each question by providing quick and correct responses.
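
The following is a minimal sketch of what the ChatSync Lambda function's core call might look like with the AWS SDK for Python (Boto3); the application ID, message, and helper name are illustrative placeholders rather than LCH's actual implementation.

import boto3

# Illustrative sketch: call the Amazon Q Business ChatSync API from a Lambda
# function. The application ID and message are placeholders.
qbusiness = boto3.client("qbusiness")

def ask_q_business(application_id, user_message, conversation_id=None):
    params = {"applicationId": application_id, "userMessage": user_message}
    if conversation_id:
        # Continue an existing conversation instead of starting a new one
        params["conversationId"] = conversation_id
    response = qbusiness.chat_sync(**params)
    # systemMessage carries the generated answer; conversationId lets the
    # frontend keep the conversation context across turns
    return response["systemMessage"], response["conversationId"]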

This validation process helped LCH to build trust and confidence in the capabilities of Amazon Q Business, enhancing the overall customer experience.

Conclusion

This post provides an overview of LSEG’s experience in adopting Amazon Q Business to support LCH client services agents for B2B query handling. This specific use case was built by working backward from a business goal to improve customer experience and staff productivity in a complex, highly technical area of the trading life cycle (post-trade). The variety and large size of enterprise data sources and the regulated environment that LSEG operates in makes this post particularly relevant to customer service operations dealing with complex query handling. Managed, straightforward-to-use RAG is a key capability within a wider vision of providing technical and business users with an environment, tools, and services to use generative AI across providers and LLMs. You can get started with this tool by creating a sample Amazon Q Business application.


About the Authors

Ben Doughton is a Senior Product Manager at LSEG with over 20 years of experience in Financial Services. He leads product operations, focusing on product discovery initiatives, data-informed decision-making and innovation. He is passionate about machine learning and generative AI as well as agile, lean and continuous delivery practices.

Maurizio Morabito is a Software and AI specialist at LCH and one of the early adopters of neural networks in the years 1990–1992. After a long hiatus working in technology and finance companies in Asia and Europe, he returned to machine learning in 2021. Maurizio is now leading the way to implement AI in LSEG Markets, following the motto “Tackling the Long and the Boring.”

Iulia Midus is a recent IT Management graduate currently working in Post-trade. Her work so far has focused on data analysis and AI, and on finding ways to implement these across the business.

Magnus Schoeman is a Principal Customer Solutions Manager at AWS. He has 25 years of experience across private and public sectors where he has held leadership roles in transformation programs, business development, and strategic alliances. Over the last 10 years, Magnus has led technology-driven transformations in regulated financial services operations (across Payments, Wealth Management, Capital Markets, and Life & Pensions).

Sudha Arumugam is an Enterprise Solutions Architect at AWS, advising large Financial Services organizations. She has over 13 years of experience in creating reliable software solutions to complex problems. She has extensive experience in serverless, event-driven architectures and is passionate about machine learning and AI. She enjoys developing mobile and web applications.

Elias Bedmar is a Senior Customer Solutions Manager at AWS. He is a technical and business program manager helping customers be successful on AWS. He supports large migration and modernization programs, cloud maturity initiatives, and adoption of new services. Elias has experience in migration delivery, DevOps engineering and cloud infrastructure.

Marcin Czelej is a Machine Learning Engineer at AWS Generative AI Innovation and Delivery. He combines over 7 years of experience in C/C++ and assembler programming with extensive knowledge in machine learning and data science. This unique skill set allows him to deliver optimized and customized solutions across various industries. Marcin has successfully implemented AI advancements in sectors such as e-commerce, telecommunications, automotive, and the public sector, consistently creating value for customers.

Zmnako Awrahman, Ph.D., is a generative AI Practice Manager at AWS Generative AI Innovation and Delivery with extensive experience in helping enterprise customers build data, ML, and generative AI strategies. With a strong background in technology-driven transformations, particularly in regulated industries, Zmnako has a deep understanding of the challenges and opportunities that come with implementing cutting-edge solutions in complex environments.

Read More

Evaluate large language models for your machine translation tasks on AWS


Large language models (LLMs) have demonstrated promising capabilities in machine translation (MT) tasks. Depending on the use case, they can compete with neural translation models such as Amazon Translate. LLMs particularly stand out for their inherent ability to learn from the context of the input text, which allows them to pick up on cultural cues and produce more natural-sounding translations. For instance, the sentence “Did you perform well?” might be translated into French as “Avez-vous bien performé?” The target translation can vary widely depending on the context: if the question is asked in the context of sport, such as “Did you perform well at the soccer tournament?”, the natural French translation would be very different. It is critical for AI models to capture not only the context, but also the cultural specificities, to produce a more natural-sounding translation.

A number of our global customers are looking to take advantage of this capability to improve the quality of their translated content. Localization relies on both automation and humans-in-the-loop in a process called Machine Translation Post Editing (MTPE). Building solutions that help enhance translated content quality presents multiple benefits:

  • Potential cost savings on MTPE activities
  • Faster turnaround for localization projects
  • Better experience for content consumers and readers overall with enhanced quality

LLMs have also shown gaps with regards to MT tasks, such as:

  • Inconsistent quality over certain language pairs
  • No standard pattern to integrate past translation knowledge, also known as translation memory (TM)
  • Inherent risk of hallucination

Switching MT workloads to LLM-driven translation should be considered on a case-by-case basis. However, the industry is seeing enough potential to consider LLMs as a valuable option.

This blog post with accompanying code presents a solution to experiment with real-time machine translation using foundation models (FMs) available in Amazon Bedrock. It can help collect more data on the value of LLMs for your content translation use cases.

Steering the LLMs’ output

Translation memory and TMX files are important concepts and file formats used in the field of computer-assisted translation (CAT) tools and translation management systems (TMSs).

Translation memory

A translation memory is a database that stores previously translated text segments (typically sentences or phrases) along with their corresponding translations. The main purpose of a TM is to aid human or machine translators by providing them with suggestions for segments that have already been translated before. This can significantly improve translation efficiency and consistency, especially for projects involving repetitive content or similar subject matter.

Translation Memory eXchange (TMX) is a widely used open standard for representing and exchanging TM data. It is an XML-based file format that allows for the exchange of TMs between different CAT tools and TMSs. A typical TMX file contains a structured representation of translation units, which are groupings of the same text translated into multiple languages.
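
To make this structure concrete, the following short Python sketch parses a small, hypothetical TMX fragment with the standard library; the segment text, languages, and identifiers are purely illustrative.

import xml.etree.ElementTree as ET

# A minimal, hypothetical TMX fragment: one translation unit (<tu>) holding the
# same sentence in English and French, each variant (<tuv>) wrapping a <seg> tag.
TMX_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"/>
  <body>
    <tu tuid="tu-0001">
      <tuv xml:lang="en"><seg>Did you perform well at the soccer tournament?</seg></tuv>
      <tuv xml:lang="fr"><seg>As-tu bien joué au tournoi de foot ?</seg></tuv>
    </tu>
  </body>
</tmx>"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # namespaced xml:lang attribute
root = ET.fromstring(TMX_SAMPLE)

for tu in root.iter("tu"):
    # Each <tu> groups the same text translated into multiple languages
    variants = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
    print(tu.get("tuid"), variants)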

Integrating TM with LLMs

The use of TMs in combination with LLMs can be a powerful approach for improving the quality and efficiency of machine translation. The following are a few potential benefits:

  • Improved accuracy and consistency – LLMs can benefit from the high-quality translations stored in TMs, which can help improve the overall accuracy and consistency of the translations produced by the LLM. The TM can provide the LLM with reliable reference translations for specific segments, reducing the chances of errors or inconsistencies.
  • Domain adaptation – TMs often contain translations specific to a particular domain or subject matter. By using a domain-specific TM, the LLM can better adapt to the terminology, style, and context of that domain, leading to more accurate and natural translations.
  • Efficient reuse of human translations – TMs store human-translated segments, which are typically of higher quality than machine-translated segments. By incorporating these human translations into the LLM’s training or inference process, the LLM can learn from and reuse these high-quality translations, potentially improving its overall performance.
  • Reduced post-editing effort – When the LLM can accurately use the translations stored in the TM, the need for human post-editing can be reduced, leading to increased productivity and cost savings.

Another approach to integrating TM data with LLMs is to use fine-tuning in the same way you would fine-tune a model for business domain content generation, for instance. For customers operating in global industries, potentially translating to and from over 10 languages, this approach can prove to be operationally complex and costly. The solution proposed in this post relies on LLMs’ context learning capabilities and prompt engineering. It enables you to use an off-the-shelf model as is without involving machine learning operations (MLOps) activity.

Solution overview

The LLM translation playground is a sample application providing the following capabilities:

  • Experiment with LLM translation capabilities using models available in Amazon Bedrock
  • Create and compare various inference configurations
  • Evaluate the impact of prompt engineering and Retrieval Augmented Generation (RAG) on translation with LLMs
  • Configure supported language pairs
  • Import, process, and test translation using your existing TMX file with multiple LLMs
  • Custom terminology conversion
  • Performance, quality, and usage metrics, including BLEU, BERT score, METEOR, and CHRF

The following diagram illustrates the translation playground architecture. The numbers are color-coded to represent two flows: the translation memory ingestion flow (orange) and the text translation flow (gray). The solution offers two TM retrieval modes for users to choose from: vector and document search. This is covered in detail later in the post.

Streamlit Application Architecture

The TM ingestion flow (orange) consists of the following steps:

  1. The user uploads a TMX file to the playground UI.
  2. Depending on which retrieval mode is being used, the appropriate adapter is invoked.
  3. When using the Amazon OpenSearch Service adapter (document search), translation unit groupings are parsed and stored into an index dedicated to the uploaded file. When using the FAISS adapter (vector search), translation unit groupings are parsed and turned into vectors using the selected embedding model from Amazon Bedrock.
  4. When using the FAISS adapter, translation units are stored into a local FAISS index along with the metadata.
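
The following is a rough sketch of what steps 3 and 4 might look like in vector mode, using an Amazon Bedrock embedding model and a local FAISS index; the model ID, data structures, and helper names are assumptions for illustration and not the playground's actual code.

import json

import boto3
import faiss
import numpy as np

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text, model_id="amazon.titan-embed-text-v2:0"):
    # Call the selected Bedrock embedding model; Titan Text Embeddings V2
    # returns a JSON body containing an "embedding" list of floats.
    response = bedrock_runtime.invoke_model(modelId=model_id, body=json.dumps({"inputText": text}))
    payload = json.loads(response["body"].read())
    return np.array(payload["embedding"], dtype="float32")

# Hypothetical translation units parsed from the uploaded TMX file
translation_units = [
    {"tuid": "tu-0001", "lang": "en", "source": "Did you perform well?", "target": "As-tu bien joué ?"},
]

vectors = np.vstack([embed(tu["source"]) for tu in translation_units])
faiss.normalize_L2(vectors)                  # normalize so inner product behaves like cosine similarity
index = faiss.IndexFlatIP(vectors.shape[1])  # flat inner-product index, stored locally
index.add(vectors)

# Keep metadata (segment language, <tu> identifier, raw text) alongside the
# vectors so a search hit can be mapped back to its translation unit.
metadata = {i: tu for i, tu in enumerate(translation_units)}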

The text translation flow (gray) consists of the following steps:

  1. The user enters the text they want to translate along with source and target language.
  2. The request is sent to the prompt generator.
  3. The prompt generator invokes the appropriate knowledge base according to the selected mode.
  4. The prompt generator receives the relevant translation units.
  5. Amazon Bedrock is invoked using the generated prompt as input along with customization parameters.

The translation playground could be adapted into a scalable serverless solution as represented by the following diagram using AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon API Gateway.

Serverless Solution Architecture Diagram

Strategy for TM knowledge base

The LLM translation playground offers two options to incorporate the translation memory into the prompt. Each option is available through its own page within the application:

  • Vector store using FAISS – In this mode, the application processes the .tmx file the user uploaded, indexes it, and stores it locally into a vector store (FAISS).
  • Document store using Amazon OpenSearch Serverless – Only standard document search using Amazon OpenSearch Serverless is supported. To test vector search, use the vector store option (using FAISS).

In vector store mode, the translation segments are processed as follows:

  1. Embed the source segment.
  2. Extract metadata:
    • Segment language
    • System generated <tu> segment unique identifier
  3. Store source segment vectors along with metadata and the segment itself in plain text as a document.

The translation customization section allows you to select the embedding model. You can choose either Amazon Titan Text Embeddings V2 or Cohere Embed Multilingual v3. Amazon Titan Text Embeddings V2 includes multilingual support for over 100 languages in pre-training. Cohere Embed supports 108 languages.

In document store mode, the language segments are not embedded and are stored following a flat structure. Two metadata attributes are maintained across the documents:

  • Segment Language
  • System generated <tu> segment unique identifier

Translation Memory Chunking

Prompt engineering

The application uses prompt engineering techniques to incorporate several types of inputs for the inference. The following sample XML illustrates the prompt’s template structure:

<prompt>
<system_prompt>…</system_prompt>
<source_language>EN</source_language>
<target_language>FR</target_language>
<translation_memory_pairs>
<source_language>…</source_language>
<target_language>…</target_language>
</translation_memory_pairs>
<custom_terminology_pairs>
<source_language>…</source_language>
<target_language>…</target_language>
</custom_terminology_pairs>
<user_prompt>…</user_prompt>
</prompt>
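
As a rough sketch (not the playground's actual code), the template above could be filled in and sent to a model through the Amazon Bedrock Converse API as follows; the model ID, system prompt wording, and helper names are illustrative assumptions.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def build_prompt(source_lang, target_lang, source_text, tm_pairs, terminology_pairs):
    # tm_pairs and terminology_pairs are lists of (source, target) tuples
    # retrieved from the knowledge base or entered by the user.
    def pairs_to_xml(pairs):
        return "".join(
            f"<source_language>{src}</source_language><target_language>{tgt}</target_language>"
            for src, tgt in pairs
        )
    return (
        "<prompt>"
        "<system_prompt>You are a professional translator. Use the translation memory "
        "pairs as style and terminology references.</system_prompt>"
        f"<source_language>{source_lang}</source_language>"
        f"<target_language>{target_lang}</target_language>"
        f"<translation_memory_pairs>{pairs_to_xml(tm_pairs)}</translation_memory_pairs>"
        f"<custom_terminology_pairs>{pairs_to_xml(terminology_pairs)}</custom_terminology_pairs>"
        f"<user_prompt>{source_text}</user_prompt>"
        "</prompt>"
    )

prompt = build_prompt(
    "EN", "FR", "Did you perform well at the soccer tournament?",
    tm_pairs=[("Did you perform well?", "As-tu bien joué ?")],
    terminology_pairs=[("Gen AI", "Gen AI")],
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # any Bedrock text model available in your account
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2, "topP": 0.9},
)
print(response["output"]["message"]["content"][0]["text"])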

Prerequisites

The project code uses the Python version of the AWS Cloud Development Kit (AWS CDK). To run the project code, make sure that you have fulfilled the AWS CDK prerequisites for Python.

The project also requires that the AWS account is bootstrapped to allow the deployment of the AWS CDK stack.

Install the UI

To deploy the solution, first install the UI (Streamlit application):

  1. Clone the GitHub repository using the following command:
git clone https://github.com/aws-samples/llm-translation-playground.git
  2. Navigate to the deployment directory:
cd llm-translation-playground
  3. Install and activate a Python virtual environment:
python3 -m venv .venv
source .venv/bin/activate
  4. Install Python libraries:
python -m pip install -r requirements.txt

Deploy the AWS CDK stack

Complete the following steps to deploy the AWS CDK stack:

  1. Move into the deployment folder:
cd deployment/cdk
  2. Configure the AWS CDK context parameters file context.json. For collection_name, use the OpenSearch Serverless collection name. For example:

"collection_name": "search-subtitles"

  3. Deploy the AWS CDK stack:
cdk deploy
  4. Validate successful deployment by reviewing the OpsServerlessSearchStack stack on the AWS CloudFormation console. The status should read CREATE_COMPLETE.
  5. On the Outputs tab, make note of the OpenSearchEndpoint attribute value.

Cloudformation Stack Output

Configure the solution

The stack creates an AWS Identity and Access Management (IAM) role with the right level of permission needed to run the application. The LLM translation playground assumes this role automatically on your behalf. To achieve this, modify the role or principal under which you are planning to run the application so you are allowed to assume the newly created role. You can use the pre-created policy and attach it to your role. The policy Amazon Resource Name (ARN) can be retrieved as a stack output under the key LLMTranslationPlaygroundAppRoleAssumePolicyArn, as illustrated in the preceding screenshot. You can do so from the IAM console after selecting your role and choosing Add permissions. If you prefer to use the AWS Command Line Interface (AWS CLI), refer to the following sample command line:

aws iam attach-role-policy --role-name <role-name> --policy-arn <policy-arn>

Finally, configure the .env file in the utils folder as follows:

  • APP_ROLE_ARN – The ARN of the role created by the stack (stack output LLMTranslationPlaygroundAppRoleArn)
  • HOST – OpenSearch Serverless collection endpoint (without https)
  • REGION – AWS Region the collection was deployed into
  • INGESTION_LIMIT – Maximum number of translation units (<tu> tags) indexed per TMX file you upload

Run the solution

To start the translation playground, run the following commands:

cd llm-translation-playground/source
streamlit run LLM_Translation_Home.py

Your default browser should open a new tab or window displaying the Home page.

LLM Translation Playground Home

Simple test case

Let’s run a simple translation test using the phrase mentioned earlier: “Did you perform well?”

Because we’re not using a knowledge base for this test case, we can use either a vector store or document store. For this post, we use a document store.

  1. Choose With Document Store.
  2. For Source Text, enter the text to be translated.
  3. Choose your source and target languages (for this post, English and French, respectively).
  4. You can experiment with other parameters, such as model, maximum tokens, temperature, and top-p.
  5. Choose Translate.

Translation Configuration Page

The translated text appears in the bottom section. For this example, the translated text, although accurate, is close to a literal translation, which is not a common phrasing in French.

English-French Translation Test 1

  6. We can rerun the same test after slightly modifying the initial text: “Did you perform well at the soccer tournament?”

We’re now introducing some situational context in the input. The translated text should be different and closer to a more natural translation. The new output literally means “Did you play well at the soccer tournament?”, which is consistent with the initial intent of the question.

English-French Translation Test 2

Also note the completion metrics on the left pane, displaying latency, input/output tokens, and quality scores.

This example highlights the ability of LLMs to naturally adapt the translation to the context.

Adding translation memory

Let’s test the impact of using a translation memory TMX file on the translation quality.

  1. Copy the text contained within test/source_text.txt and paste it into the Source Text field.
  2. Choose French as the target language and run the translation.
  3. Copy the text contained within test/target_text.txt and paste it into the reference translation field.

Translation Memory Configuration

  4. Choose Evaluate and notice the quality scores on the left.
  5. In the Translation Customization section, choose Browse files and choose the file test/subtitles_memory.tmx.

This will index the translation memory into the OpenSearch Service collection previously created. The indexing process can take a few minutes.

  6. When the indexing is complete, select the created index from the index dropdown.
  7. Rerun the translation.

You should see a noticeable increase in the quality score. For instance, we’ve seen up to 20 percentage points improvement in BLEU score with the preceding test case. Using prompt engineering, we were able to steer the model’s output by providing sample phrases directly pulled from the TMX file. Feel free to explore the generated prompt for more details on how the translation pairs were introduced.

You can replicate a similar test case with Amazon Translate by launching an asynchronous job customized using parallel data.

Prompt Engineering

Here we took a simplistic retrieval approach, which consists of loading all of the samples as part of the same TMX file, matching the source and target language. You can enhance this technique by using metadata-driven filtering to collect the relevant pairs according to the source text. For example, you can classify the documents by theme or business domain, and use category tags to select language pairs relevant to the text and desired output.

Semantic similarity for translation memory selection

In vector store mode, the application allows you to upload a TMX and create a local index that uses semantic similarity to select the translation memory segments. First, we retrieve the segment with the highest similarity score based on the text to be translated and the source language. Then we retrieve the corresponding segment matching the target language and parent translation unit ID.
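
A rough illustration of this two-step lookup follows, reusing the hypothetical embed helper, FAISS index, and metadata mapping from the ingestion sketch earlier; it is a sketch of the approach, not the application's actual retrieval code.

import faiss

def retrieve_tm_pairs(query_text, index, metadata, k=3):
    # Embed the text to be translated and search the local FAISS index for the
    # k most similar source segments (embed(), index, and metadata come from
    # the earlier ingestion sketch).
    query_vec = embed(query_text).reshape(1, -1)
    faiss.normalize_L2(query_vec)
    scores, ids = index.search(query_vec, k)

    pairs = []
    for score, idx in zip(scores[0], ids[0]):
        if idx == -1:
            continue  # fewer than k segments were indexed
        tu = metadata[int(idx)]
        # The target segment is resolved through the parent translation unit,
        # so the source and target always come from the same <tu> grouping.
        pairs.append({"tuid": tu["tuid"], "source": tu["source"],
                      "target": tu["target"], "score": float(score)})
    return pairs

print(retrieve_tm_pairs("Did you do well at the tournament?", index, metadata))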

To try it out, upload the file in the same way as shown earlier. Depending on the size of the file, this can take a few minutes. There is a maximum limit of 200 MB. You can use the sample file as in the previous example or one of the other samples provided in the code repository.

This approach differs from the static index search because it’s assumed that the source text is semantically close to segments representative enough of the expected style and tone.

TMX File Upload Widget

Adding custom terminology

Custom terminology allows you to make sure that your brand names, character names, model names, and other unique content get translated to the desired result. Given that LLMs are pre-trained on massive amounts of data, they can likely already identify unique names and render them accurately in the output. If there are names for which you want to enforce a strict and literal translation, you can try the custom terminology feature of this translation playground. Simply provide the source and target language pairs separated by a semicolon in the Translation Customization section. For instance, if you want to keep the phrase “Gen AI” untranslated regardless of the language, you can configure the custom terminology as illustrated in the following screenshot.

Custom Terminology
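
Under the hood, semicolon-separated pairs like these could be parsed into the prompt's custom terminology section with a small helper such as the following; this is an illustrative sketch assuming one pair per line, not the playground's actual parsing logic.

def parse_custom_terminology(raw_text):
    # Each line is expected to hold "source;target", for example "Gen AI;Gen AI"
    # to keep the phrase untranslated regardless of the target language.
    pairs = []
    for line in raw_text.strip().splitlines():
        if ";" not in line:
            continue  # skip blank or malformed lines
        source, target = (part.strip() for part in line.split(";", 1))
        pairs.append((source, target))
    return pairs

print(parse_custom_terminology("Gen AI;Gen AI"))
# [('Gen AI', 'Gen AI')]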

Clean up

To delete the stack, navigate to the deployment folder and run cdk destroy.

Further considerations

Using existing TMX files with generative AI-based translation systems can potentially improve the quality and consistency of translations. The following are some steps to use TMX files for generative AI translations:

  • TMX data pipeline – TMX files contain structured translation units, but the format might need to be preprocessed to extract the source and target text segments in a format that can be consumed by the generative AI model. This involves extract, transform, and load (ETL) pipelines able to parse the XML structure, handle encoding issues, and add metadata.
  • Incorporate quality estimation and human review – Although generative AI models can produce high-quality translations, it is recommended to incorporate quality estimation techniques and human review processes. You can use automated quality estimation models to flag potentially low-quality translations, which can then be reviewed and corrected by human translators.
  • Iterate and refine – Translation projects often involve iterative cycles of translation, review, and improvement. You can periodically retrain or fine-tune the generative AI model with the updated TMX file, creating a virtuous cycle of continuous improvement.

Conclusion

The LLM translation playground presented in this post enables you to evaluate the use of LLMs for your machine translation needs. The key features of this solution include:

  • Ability to use translation memory – The solution allows you to integrate your existing TM data, stored in the industry-standard TMX format, directly into the LLM translation process. This helps improve the accuracy and consistency of the translations by using high-quality human-translated content.
  • Prompt engineering capabilities – The solution showcases the power of prompt engineering, demonstrating how LLMs can be steered to produce more natural and contextual translations by carefully crafting the input prompts. This includes the ability to incorporate custom terminology and domain-specific knowledge.
  • Evaluation metrics – The solution includes standard translation quality evaluation metrics, such as BLEU, BERT score, METEOR, and CHRF, to help you assess the quality and effectiveness of the LLM-powered translations compared to your existing machine translation workflows.

As the industry continues to explore the use of LLMs, this solution can help you gain valuable insights and data to determine if LLMs can become a viable and valuable option for your content translation and localization workloads.

To dive deeper into the fast-moving field of LLM-based machine translation on AWS, check out the following resources:


About the Authors

Narcisse Zekpa is a Sr. Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their business transformation through innovative and scalable solutions on the AWS Cloud. He is passionate about enabling organizations to transform their business using advanced analytics and AI. When Narcisse is not building, he enjoys spending time with his family, traveling, running, cooking, and playing basketball.

Ajeeb Peter is a Principal Solutions Architect with Amazon Web Services based in Charlotte, North Carolina, where he guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. He brings over 20 years of technology experience in software development, architecture, and analytics from industries like finance and telecom.

Read More

Parameta accelerates client email resolution with Amazon Bedrock Flows


This blog post is co-written with Siokhan Kouassi and Martin Gregory at Parameta. 

When financial industry professionals need reliable over-the-counter (OTC) data solutions and advanced analytics, they can turn to Parameta Solutions, the data powerhouse behind TP ICAP. With a focus on data-led solutions, Parameta Solutions makes sure that these professionals have the insights they need to make informed decisions. Managing thousands of client service requests efficiently while maintaining accuracy is crucial for Parameta’s reputation as a trusted data provider. Through a simple yet effective application of Amazon Bedrock Flows, Parameta transformed their client service operations from a manual, time-consuming process into a streamlined workflow in just two weeks.

Parameta empowers clients with comprehensive industry insights, from price discovery to risk management, and pre- to post-trade analytics. Their services are fundamental to clients navigating the complexities of OTC transactions and workflow effectively. Accurate and timely responses to technical support queries are essential for maintaining service quality.

However, Parameta’s support team faced a common challenge in the financial services industry: managing an increasing volume of email-based client requests efficiently. The traditional process involved multiple manual steps—reading emails, understanding technical issues, gathering relevant data, determining the correct routing path, and verifying information in databases. This labor-intensive approach not only consumed valuable time, but also introduced risks of human error that could potentially impact client trust.

Recognizing the need for modernization, Parameta sought a solution that could maintain their high standards of service while significantly reducing resolution times. The answer lay in using generative AI through Amazon Bedrock Flows, enabling them to build an automated, intelligent request handling system that would transform their client service operations. Amazon Bedrock Flows provide a powerful, low-code solution for creating complex generative AI workflows with an intuitive visual interface and with a set of APIs in the Amazon Bedrock SDK. By seamlessly integrating foundation models (FMs), prompts, agents, and knowledge bases, organizations can rapidly develop flexible, efficient AI-driven processes tailored to their specific business needs.

In this post, we show you how Parameta used Amazon Bedrock Flows to transform their manual client email processing into an automated, intelligent workflow that reduced resolution times from weeks to days while maintaining high accuracy and operational control.

Client email triage

For Parameta, every client email represents a critical touchpoint that demands both speed and accuracy. The challenge of email triage extends beyond simple categorization—it requires understanding technical queries, extracting precise information, and providing contextually appropriate responses.

The email triage workflow involves multiple critical steps:

  • Accurately classifying incoming technical support emails
  • Extracting relevant entities like data products or time periods
  • Validating if all required information is present for the query type
  • Consulting internal knowledge bases and databases for context
  • Generating either complete responses or specific requests for additional information

The manual handling of this process led to time-consuming back-and-forth communications, the risk of overlooking critical details, and inconsistent response quality. With that in mind, Parameta identified this as an opportunity to develop an intelligent system that could automate this entire workflow while maintaining their high standard of accuracy and professionalism.

Path to the solution

When evaluating solutions for email triage automation, several approaches appeared viable, each with its own pros and cons. However, not all of them were effective for Parameta.

Traditional NLP pipelines and ML classification models

Traditional natural language processing pipelines struggle with email complexity due to their reliance on rigid rules and poor handling of language variations, making them impractical for dynamic client communications. The inconsistency in email structures and terminology, which varies significantly between clients, further complicates their effectiveness. These systems depend on predefined patterns, which are difficult to maintain and adapt when faced with such diverse inputs, leading to inefficiencies and brittleness in handling real-world communication scenarios. Machine learning (ML) classification models offer improved categorization, but introduce complexity by requiring separate, specialized models for classification, entity extraction, and response generation, each with its own training data and contextual limitations.

Deterministic LLM-based workflows

Parameta’s solution demanded more than just raw large language model (LLM) capabilities—it required a structured approach while maintaining operational control. Amazon Bedrock Flows provided this critical balance through the following capabilities:

  • Orchestrated prompt chaining – Multiple specialized prompts work together in a deterministic sequence, each optimized for specific tasks like classification, entity extraction, or response generation.
  • Multi-conditional workflows – Support for complex business logic with the ability to branch flows based on validation results or extracted information completeness.
  • Version management – Simple switching between different prompt versions while maintaining workflow integrity, enabling rapid iteration without disrupting the production pipeline.
  • Component integration – Seamless incorporation of other generative AI capabilities like Amazon Bedrock Agents or Amazon Bedrock Knowledge Bases, creating a comprehensive solution.
  • Experimentation framework – The ability to test and compare different prompt variations while maintaining version control. This is crucial for optimizing the email triage process.
  • Rapid iteration and tight feedback loop – The system allows for quick testing of new prompts and immediate feedback, facilitating continuous improvement and adaptation.

This structured approach to generative AI through Amazon Bedrock Flows enabled Parameta to build a reliable, production-grade email triage system that maintains both flexibility and control.

Solution overview

Parameta’s solution demonstrates how Amazon Bedrock Flows can transform complex email processing into a structured, intelligent workflow. The architecture comprises three key components, as shown in the following diagram: orchestration, structured data extraction, and intelligent response generation.

Orchestration

Amazon Bedrock Flows serves as the central orchestrator, managing the entire email processing pipeline. When a client email arrives through Microsoft Teams, the workflow invokes the following stages:

  • The workflow initiates through Amazon API Gateway, which takes the email and uses an AWS Lambda function to extract the text contained in the email and store it in Amazon Simple Storage Service (Amazon S3).
  • Amazon Bedrock Flows coordinates the sequence of operations, starting with the email from Amazon S3.
  • Version management streamlines controlled testing of prompt variations.
  • Built-in conditional logic handles different processing paths.
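
The following sketch shows what invoking such a flow from the Lambda layer could look like with the Amazon Bedrock Flows runtime API; the flow identifiers, node names, and input document are placeholders rather than Parameta's actual configuration.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

def triage_email(email_text):
    # Invoke a published Amazon Bedrock flow; the identifiers are placeholders.
    response = bedrock_agent_runtime.invoke_flow(
        flowIdentifier="FLOW_ID",
        flowAliasIdentifier="FLOW_ALIAS_ID",
        inputs=[{
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {"document": email_text},
        }],
    )
    # The API returns an event stream; collect the flow's output document(s).
    outputs = []
    for event in response["responseStream"]:
        if "flowOutputEvent" in event:
            outputs.append(str(event["flowOutputEvent"]["content"]["document"]))
    return "\n".join(outputs)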

Structured data extraction

A sequence of specialized prompts within the flow handles the critical task of information processing:

  • The classification prompt identifies the type of technical inquiry
  • The entity extraction prompt discovers key data points
  • The validation prompt verifies completeness of required information

These prompts work in concert to transform unstructured emails into actionable data, with each prompt optimized for its specific task.
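
As a simplified stand-in for this prompt chaining (the prompt wording, model ID, and helper are illustrative, not Parameta's production prompts), the sequence might look like the following:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # a lighter model suits short classification tasks

def run_prompt(instruction, email_text):
    # One specialized prompt per task, invoked in a fixed order to mirror the flow.
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": f"{instruction}\n\nEmail:\n{email_text}"}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.0},
    )
    return response["output"]["message"]["content"][0]["text"]

email = "Could you please verify the closing price for USD_2Y_1Y as of March 15, 2024?"
category = run_prompt("Classify this support email. Reply with only the category name.", email)
entities = run_prompt("Extract ticker, date, and request type as JSON. Reply with only the JSON.", email)
verdict = run_prompt(f"Entities: {entities}. Is this {category} request complete? Reply yes or no.", email)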

Intelligent response generation

The final stage uses advanced AI capabilities for response creation:

  • An Amazon Bedrock agent synthesizes information from multiple sources, such as internal knowledge bases and databases
  • Response generation adapts based on validation results:
    • Specific information requests for incomplete queries
    • Comprehensive solutions for complete inquiries
  • Delivery back to clients using Microsoft Teams

The following diagram illustrates the flow for the email triaging system.

This structured approach allows Parameta to maintain consistent, high-quality responses while significantly reducing processing time for client inquiries.

Solution walkthrough

Let’s walk through how Parameta’s email triage system processes a typical client inquiry. We start with the following sample client email:

Dear Support Team,

Could you please verify the closing price for the Dollar ATM swaption (USD_2Y_1Y) as of March 15, 2024? We need this for our end-of-day reconciliation.

Best regards,

John Smith

Portfolio Manager, ABC Investments

The classification prompt classifies this as a price verification request based on the content and intent. It uses the email as the input, and the output is type: price_verification_request.

The entity extraction prompt uses the preceding email, and provides the following output:

{
  "product_type": "Interest Rate Option",
  "ticker": "USD_2Y_1Y",
  "date_requested": "2024-03-15",
  "data_source": "ICAP",
  "request_type": "closing_price"
}

The workflow then performs validation using Amazon Bedrock Flows. This requires the following checks for price verification:

  • Product identifier (USD_2Y_1Y present)
  • Date (March 15, 2024 present)
  • Price type (closing specified)

When all required entities are found, the workflow proceeds to the Amazon Bedrock agent.

The agent submits the following query to the knowledge base: “Product specifications and market context for Interest Rate option USD_2Y_1Y.”

The agent submits the following Snowflake query (using Athena):

SELECT closing_price, settlement_date, currency, market_status
FROM otc_derivatives.daily_prices
WHERE instrument_id = 'USD_2Y_1Y'
  AND trade_date = '2024-03-15'

The generated response is as follows:

Dear John,

Regarding your price verification request for USD_2Y_1Y (USD ATM Swaption Implied Volatility Mid 2Y vs 1Y) as of March 15, 2024:
The closing price has been verified in our records. The market was active on the requested date with normal trading volumes.

Please note this information is provided under your existing data service agreement.

If you need any further clarification, please don’t hesitate to ask.

Best regards,

Parameta Support

Benefits

Parameta quickly transitioned from implementation to achieving impactful results, thanks to the substantial benefits provided by Amazon Bedrock Flows across various areas:

  • Operational efficiency
    • Development teams accelerated prompt optimization by quickly testing different variations for email classification and entity extraction
    • Time-to-insight reduced from weeks to days through rapid prompt iteration and immediate feedback on performance
    • Quick adjustments to validation rules without rebuilding the entire workflow
  • Team collaboration
    • Modification of prompts through a simplified interface without deep AWS knowledge
    • Support teams gained the ability to understand and adjust the response process
    • Cross-functional teams collaborated on prompt improvements using familiar interfaces
  • Model transparency
    • Clear visibility into why emails were classified into specific categories
    • Understanding of entity extraction decisions helped refine prompts for better accuracy
    • Ability to trace decisions through the workflow enhanced trust in automated responses
  • Observability and governance
    • Comprehensive observability provided stakeholders with a holistic view of the end-to-end process
    • Built-in controls provided appropriate oversight of the automated system, aligning with governance and compliance requirements
    • Transparent workflows enabled stakeholders to monitor, audit, and refine the system effectively, providing accountability and reliability

These benefits directly translated to Parameta’s business objectives: faster response times to client queries, more accurate classifications, and improved ability to maintain and enhance the system across teams. The structured yet flexible nature of Amazon Bedrock Flows enabled Parameta to achieve these gains while maintaining control over their critical client communications.

Key takeaways and best practices

When implementing Amazon Bedrock Flows, consider these essential learnings:

  • Prompt design principles
    • Design modular prompts that handle specific tasks for better maintainability of the system
    • Keep prompts focused and concise to optimize token usage
    • Include clear input and output specifications for better maintainability and robustness
    • Diversify model selection for different tasks within the flow:
      • Use lighter models for simple classifications
      • Reserve advanced models for complex reasoning
      • Create resilience through model redundancy
  • Flow architecture
    • Start with a clear validation strategy early in the flow
    • Include error handling in prompt design
    • Consider breaking complex flows into smaller, manageable segments
  • Version management
    • Implement proper continuous integration and delivery (CI/CD) pipelines for flow deployment
    • Establish approval workflows for flow changes
    • Document flow changes and their impact including metrics
  • Testing and implementation
    • Create comprehensive test cases covering a diverse set of scenarios
    • Validate flow behavior with sample datasets
    • Constantly monitor flow performance and token usage in production
    • Start with smaller workflows and scale gradually
  • Cost optimization
    • Review and optimize prompt lengths regularly
    • Monitor token usage patterns
    • Balance between model capability and cost when selecting models

Consider these practices derived from real-world implementation experience to help successfully deploy Amazon Bedrock Flows while maintaining efficiency and reliability.

Testimonials

“As the CIO of our company, I am thoroughly impressed by how rapidly our team was able to leverage Amazon Bedrock Flows to create an innovative solution to a complex business problem. The low barrier to entry of Amazon Bedrock Flows allowed our team to quickly get up to speed and start delivering results. This tool is democratizing generative AI, making it easier for everyone in the business to get hands-on with Amazon Bedrock, regardless of their technical skill level. I can see this tool being incredibly useful across multiple parts of our business, enabling seamless integration and efficient problem-solving.”

– Roland Anderson, CIO at Parameta Solutions

“As someone with a tech background, using Amazon Bedrock Flows for the first time was a great experience. I found it incredibly intuitive and user-friendly. The ability to refine prompts based on feedback made the process seamless and efficient. What impressed me the most was how quickly I could get started without needing to invest time in creating code or setting up infrastructure. The power of generative AI applied to business problems is truly transformative, and Amazon Bedrock has made it accessible for tech professionals like myself to drive innovation and solve complex challenges with ease.”

– Martin Gregory, Market Data Support Engineer, Team Lead at Parameta Solutions

Conclusion

In this post, we showed how Parameta uses Amazon Bedrock Flows to build an intelligent client email processing workflow that reduces resolution times from days to minutes while maintaining high accuracy and control. As organizations increasingly adopt generative AI, Amazon Bedrock Flows offers a balanced approach, combining the flexibility of LLMs with the structure and control that enterprises require.

For more information, refer to Build an end-to-end generative AI workflow with Amazon Bedrock Flows. For code samples, see Run Amazon Bedrock Flows code samples. Visit the Amazon Bedrock console to start building your first flow, and explore our AWS Blog for more customer success stories and implementation patterns.


About the Authors

Siokhan Kouassi is a Data Scientist at Parameta Solutions with expertise in statistical machine learning, deep learning, and generative AI. His work focuses on implementing efficient ETL data analytics pipelines and solving business problems through automation, experimenting and innovating with AWS services using a code-first approach with the AWS CDK.

Martin Gregory is a Senior Market Data Technician at Parameta Solutions with over 25 years of experience. He has recently played a key role in transitioning Market Data systems to the cloud, leveraging his deep expertise to deliver seamless, efficient, and innovative solutions for clients.

Talha Chattha is a Senior Generative AI Specialist SA at AWS, based in Stockholm. With 10+ years of experience working with AI, Talha now helps establish practices to ease the path to production for Gen AI workloads. Talha is an expert in Amazon Bedrock and supports customers across EMEA. He is passionate about meta-agents, scalable on-demand inference, advanced RAG solutions, and optimized prompt engineering with LLMs. When not shaping the future of AI, he explores the scenic European landscapes and delicious cuisines.

Jumana Nagaria is a Prototyping Architect at AWS, based in London. She builds innovative prototypes with customers to solve their business challenges. She is passionate about cloud computing and believes in giving back to the community by inspiring women to join tech and encouraging young girls to explore STEM fields. Outside of work, Jumana enjoys travelling, reading, painting, and spending quality time with friends and family.

Hin Yee Liu is a prototype Engagement Manager at AWS, based in London. She helps AWS customers to bring their big ideas to life and accelerate the adoption of emerging technologies. Hin Yee works closely with customer stakeholders to identify, shape and deliver impactful use cases leveraging Generative AI, AI/ML, Big Data, and Serverless technologies using agile methodologies. In her free time, she enjoys knitting, travelling and strength training.

Read More

Why World Foundation Models Will Be Key to Advancing Physical AI


In the fast-evolving landscape of AI, it’s becoming increasingly important to develop models that can accurately simulate and predict outcomes in physical, real-world environments to enable the next generation of physical AI systems.

Ming-Yu Liu, vice president of research at NVIDIA and an IEEE Fellow, joined the NVIDIA AI Podcast to discuss the significance of world foundation models (WFM) — powerful neural networks that can simulate physical environments. WFMs can generate detailed videos from text or image input data and predict how a scene evolves by combining its current state (image or video) with actions (such as prompts or control signals).

“World foundation models are important to physical AI developers,” said Liu. “They can imagine many different environments and can simulate the future, so we can make good decisions based on this simulation.”

This is particularly valuable for physical AI systems, such as robots and self-driving cars, which must interact safely and efficiently with the real world.

Why Are World Foundation Models Important?

Building world models often requires vast amounts of data, which can be difficult and expensive to collect. WFMs can generate synthetic data, providing a rich, varied dataset that enhances the training process.

In addition, training and testing physical AI systems in the real world can be resource-intensive. WFMs provide virtual, 3D environments where developers can simulate and test these systems in a controlled setting without the risks and costs associated with real-world trials.

Open Access to World Foundation Models

At the CES trade show, NVIDIA announced NVIDIA Cosmos, a platform of generative WFMs that accelerate the development of physical AI systems such as robots and self-driving cars.

The platform is designed to be open and accessible, and includes pretrained WFMs based on diffusion and auto-regressive architectures, along with tokenizers that can compress videos into tokens for transformer models.

Liu explained that with these open models, enterprises and developers have all the ingredients they need to build large-scale models. The open platform also provides teams with the flexibility to explore various options for training and fine-tuning models, or build their own based on specific needs.

Enhancing AI Workflows Across Industries

WFMs are expected to enhance AI workflows and development in various industries. Liu sees particularly significant impacts in two areas:

“The self-driving car industry and the humanoid [robot] industry will benefit a lot from world model development,” said Liu. “[WFMs] can simulate different environments that will be difficult to have in the real world, to make sure the agent behaves respectively.”

For self-driving cars, these models can simulate environments that allow for comprehensive testing and optimization. For example, a self-driving car can be tested in various simulated weather conditions and traffic scenarios to help ensure it performs safely and efficiently before deployment on roads.

In robotics, WFMs can simulate and verify the behavior of robotic systems in different environments to make sure they perform tasks safely and efficiently before deployment.

NVIDIA is collaborating with companies like 1X, Huobi and XPENG to help address challenges in physical AI development and advance their systems.

“We are still in the infancy of world foundation model development — it’s useful, but we need to make it more useful,” Liu said. “We also need to study how to best integrate these world models into the physical AI systems in a way that can really benefit them.”

Listen to the podcast with Ming-Yu Liu, or read the transcript.

Learn more about NVIDIA Cosmos and the latest announcements in generative AI and robotics by watching the CES opening keynote by NVIDIA founder and CEO Jensen Huang, as well as joining NVIDIA sessions at the show.

Read More