Enhance speech synthesis and video generation models with RLHF using audio and video segmentation in Amazon SageMaker

As generative AI models advance in creating multimedia content, the difference between good and great output often lies in the details that only human feedback can capture. Audio and video segmentation provides a structured way to gather this detailed feedback, allowing models to learn through reinforcement learning from human feedback (RLHF) and supervised fine-tuning (SFT). Annotators can precisely mark and evaluate specific moments in audio or video content, helping models understand what makes content feel authentic to human viewers and listeners.

Take, for instance, text-to-video generation, where models need to learn not just what to generate but how to maintain consistency and natural flow across time. When creating a scene of a person performing a sequence of actions, factors like the timing of movements, visual consistency, and smoothness of transitions contribute to the quality. Through precise segmentation and annotation, human annotators can provide detailed feedback on each of these aspects, helping models learn what makes a generated video sequence feel natural rather than artificial. Similarly, in text-to-speech applications, understanding the subtle nuances of human speech—from the length of pauses between phrases to changes in emotional tone—requires detailed human feedback at a segment level. This granular input helps models learn how to produce speech that sounds natural, with appropriate pacing and emotional consistency. As large language models (LLMs) increasingly integrate more multimedia capabilities, human feedback becomes even more critical in training them to generate rich, multi-modal content that aligns with human quality standards.

The path to creating effective AI models for audio and video generation presents several distinct challenges. Annotators need to identify precise moments where generated content matches or deviates from natural human expectations. For speech generation, this means marking exact points where intonation changes, where pauses feel unnatural, or where emotional tone shifts unexpectedly. In video generation, annotators must pinpoint frames where motion becomes jerky, where object consistency breaks, or where lighting changes appear artificial. Traditional annotation tools, with basic playback and marking capabilities, often fall short in capturing these nuanced details.

Amazon SageMaker Ground Truth enables RLHF by allowing teams to integrate detailed human feedback directly into model training. Through custom human annotation workflows, organizations can equip annotators with tools for high-precision segmentation. This setup enables the model to learn from human-labeled data, refining its ability to produce content that aligns with natural human expectations.

In this post, we show you how to implement an audio and video segmentation solution in the accompanying GitHub repository using SageMaker Ground Truth. We guide you through deploying the necessary infrastructure using AWS CloudFormation, creating an internal labeling workforce, and setting up your first labeling job. We demonstrate how to use Wavesurfer.js for precise audio visualization and segmentation, configure both segment-level and full-content annotations, and build the interface for your specific needs. We cover both console-based and programmatic approaches to creating labeling jobs, and provide guidance on extending the solution with your own annotation needs. By the end of this post, you will have a fully functional audio/video segmentation workflow that you can adapt for various use cases, from training speech synthesis models to improving video generation capabilities.

Feature overview

The integration of Wavesurfer.js in our UI provides a detailed waveform visualization where annotators can instantly see patterns in speech, silence, and audio intensity. For instance, when working on speech synthesis, annotators can visually identify unnatural gaps between words or abrupt changes in volume that might make generated speech sound robotic. The ability to zoom into these waveform patterns means they can work with millisecond precision—marking exactly where a pause is too long or where an emotional transition happens too abruptly.

In this snapshot of audio segmentation, we are capturing a customer-representative conversation, annotating speaker segments, emotions, and transcribing the dialogue. The UI allows for playback speed adjustment and zoom functionality for precise audio analysis.

The multi-track feature lets annotators create separate tracks for evaluating different aspects of the content. In a text-to-speech task, one track might focus on pronunciation accuracy, another on emotional consistency, and a third on natural pacing. For video generation tasks, annotators can mark segments where motion flows naturally, where object consistency is maintained, and where scene transitions work well. They can adjust playback speed to catch subtle details and use the visual timeline to set precise start and end points for each marked segment.

In this snapshot of video segmentation, we’re annotating a scene with dogs, tracking individual animals, their colors, emotions, and gaits. The UI also enables overall video quality assessment, scene change detection, and object presence classification.

Annotation process

Annotators begin by choosing Add New Track and selecting appropriate categories and tags for their annotation task. After creating a track, they choose Begin Recording at the point where they want a segment to start. As the content plays, they monitor the audio waveform or video frames until they reach the desired end point, then choose Stop Recording. The newly created segment appears in the right pane, where they can add classifications, transcriptions, or other relevant labels. This process can be repeated for as many segments as needed, with the ability to adjust segment boundaries, delete incorrect segments, or create new tracks for different annotation purposes.

Importance of high-quality data and reducing labeling errors

High-quality data is essential for training generative AI models that can produce natural, human-like audio and video content. The performance of these models depends directly on the accuracy and detail of human feedback, which stems from the precision and completeness of the annotation process. For audio and video content, this means capturing not just what sounds or looks unnatural, but exactly when and how these issues occur.

Our purpose-built UI in SageMaker Ground Truth addresses common challenges in audio and video annotation that often lead to inconsistent or imprecise feedback. When annotators work with long audio or video files, they need to mark precise moments where generated content deviates from natural human expectations. For example, in speech generation, an unnatural pause might last only a fraction of a second, but its impact on perceived quality is significant. The tool’s zoom functionality allows annotators to expand these brief moments across their screen, making it possible to mark the exact start and end points of these subtle issues. This precision helps models learn the fine details that separate natural from artificial-sounding speech.

Solution overview

This audio/video segmentation solution combines several AWS services to create a robust annotation workflow. At its core, Amazon Simple Storage Service (Amazon S3) serves as the secure storage for input files, manifest files, annotation outputs, and the web UI components. SageMaker Ground Truth provides annotators with a web portal to access their labeling jobs and manages the overall annotation workflow. The following diagram illustrates the solution architecture.

The UI template, which includes our specialized audio/video segmentation interface built with Wavesurfer.js, requires specific JavaScript and CSS files. These files are hosted through Amazon CloudFront distribution, providing reliable and efficient delivery to annotators’ browsers. By using CloudFront with an origin access identity and appropriate bucket policies, we allow the UI components to be served to annotators. This setup follows AWS best practices for least-privilege access, making sure CloudFront can only access the specific UI files needed for the annotation interface.

Pre-annotation and post-annotation AWS Lambda functions are optional components that can enhance the workflow. The pre-annotation Lambda function can process the input manifest file before data is presented to annotators, enabling any necessary formatting or modifications. Similarly, the post-annotation Lambda function can transform the annotation outputs into specific formats required for model training. These functions provide flexibility to adapt the workflow to specific needs without requiring changes to the core annotation process.

The solution uses AWS Identity and Access Management (IAM) roles to manage permissions (a minimal policy sketch follows the list):

  • A SageMaker Ground Truth IAM role enables access to Amazon S3 for reading input files and writing annotation outputs
  • If used, Lambda function roles provide the necessary permissions for preprocessing and postprocessing tasks
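
As a rough illustration of these permissions, the following minimal sketch attaches an inline S3 policy to a hypothetical Ground Truth execution role with boto3. The role name, bucket names, and policy name are placeholders, and your actual role also needs the standard SageMaker trust relationship plus any permissions your Lambda functions require.

import json
import boto3

iam = boto3.client("iam")

# Placeholder names; substitute your own role and buckets.
role_name = "MyGroundTruthExecutionRole"
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read input media files and manifests
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::YOUR-INPUT-BUCKET",
                "arn:aws:s3:::YOUR-INPUT-BUCKET/*",
            ],
        },
        {
            # Write annotation outputs
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": ["arn:aws:s3:::YOUR-OUTPUT-BUCKET/*"],
        },
    ],
}

iam.put_role_policy(
    RoleName=role_name,
    PolicyName="GroundTruthS3Access",
    PolicyDocument=json.dumps(policy),
)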

Let’s walk through the process of setting up your annotation workflow. We start with a simple scenario: you have an audio file stored in Amazon S3, along with some metadata like a call ID and its transcription. By the end of this walkthrough, you will have a fully functional annotation system where your team can segment and classify this audio content.

Prerequisites

For this walkthrough, make sure you have the following:

Create your internal workforce

Before we dive into the technical setup, let’s create a private workforce in SageMaker Ground Truth. This allows you to test the annotation workflow with your internal team before scaling to a larger operation.

  1. On the SageMaker console, choose Labeling workforces.
  2. Choose Private for the workforce type and create a new private team.
  3. Add team members using their email addresses—they will receive instructions to set up their accounts.

Deploy the infrastructure

Although this post demonstrates using a CloudFormation template for quick deployment, you can also set up the components manually. The assets (JavaScript and CSS files) are available in our GitHub repository. Complete the following steps for manual deployment:

  1. Download these assets directly from the GitHub repository.
  2. Host them in your own S3 bucket.
  3. Set up your own CloudFront distribution to serve these files.
  4. Configure the necessary permissions and CORS settings.

This manual approach gives you more control over infrastructure setup and might be preferred if you have existing CloudFront distributions or a need to customize security controls and assets.
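
For step 4, the CORS configuration on the asset bucket can be applied with boto3, as shown in the following minimal sketch. The bucket name is a placeholder, and in practice you may want to restrict AllowedOrigins to your labeling portal domain rather than using a wildcard.

import boto3

s3 = boto3.client("s3")

# Placeholder bucket hosting the Wavesurfer.js UI assets.
s3.put_bucket_cors(
    Bucket="your-ui-assets-bucket",
    CORSConfiguration={
        "CORSRules": [
            {
                # Allow the annotation portal to fetch the JS and CSS files.
                "AllowedMethods": ["GET"],
                "AllowedOrigins": ["*"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)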

The rest of this post will focus on the CloudFormation deployment approach, but the labeling job configuration steps remain the same regardless of how you choose to host the UI assets.

The CloudFormation template creates and configures the following AWS resources:

  • S3 bucket for UI components:
    • Stores the UI JavaScript and CSS files
    • Configured with CORS settings required for SageMaker Ground Truth
    • Accessible only through CloudFront, not directly public
    • Permissions are set using a bucket policy that grants read access only to the CloudFront Origin Access Identity (OAI)
  • CloudFront distribution:
    • Provides secure and efficient delivery of UI components
    • Uses an OAI to securely access the S3 bucket
    • Is configured with appropriate cache settings for optimal performance
    • Access logging is enabled, with logs being stored in a dedicated S3 bucket
  • S3 bucket for CloudFront logs:
    • Stores access logs generated by CloudFront
    • Is configured with the required bucket policies and ACLs to allow CloudFront to write logs
    • Object ownership is set to ObjectWriter to enable ACL usage for CloudFront logging
    • Lifecycle configuration is set to automatically delete logs older than 90 days to manage storage
  • Lambda function:
    • Downloads UI files from our GitHub repository
    • Stores them in the S3 bucket for UI components
    • Runs only during initial setup and uses least privilege permissions
    • Permissions include Amazon CloudWatch Logs for monitoring and specific S3 actions (read/write) limited to the created bucket

After the CloudFormation stack deployment is complete, you can find the CloudFront URLs for accessing the JavaScript and CSS files on the AWS CloudFormation console. Note these values; you will use them to update the UI template when you create the labeling job.
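
If you prefer to retrieve these values programmatically instead of copying them from the console, a minimal sketch using boto3 follows; the stack name is a placeholder and the output keys depend on how the template names them.

import boto3

cfn = boto3.client("cloudformation")

# Placeholder stack name; use the name you gave the stack at deployment.
stack = cfn.describe_stacks(StackName="audio-video-segmentation-stack")["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

# Look for the CloudFront URLs of the JavaScript and CSS files in the outputs.
print(outputs)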

Prepare your input manifest

Before you create the labeling job, you need to prepare an input manifest file that tells SageMaker Ground Truth what data to present to annotators. The manifest structure is flexible and can be customized based on your needs. For this post, we use a simple structure:

{
  "source": "s3://YOUR-BUCKET/audio/sample1.mp3",
  "call-id": "call-123",
  "transcription": "Customer: I'm really happy with your smart home security system. However, I have a feature request that would make it better.\nRepresentative: We're always eager to hear from our customers. What feature would you like to see added?"
}

You can adapt this structure to include additional metadata that your annotation workflow requires. For example, you might want to add speaker information, timestamps, or other contextual data. The key is making sure your UI template is designed to process and display these attributes appropriately.
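
As an illustration, the following minimal sketch builds a JSONL manifest with a couple of extra metadata fields and uploads it to Amazon S3. The fields beyond source, call-id, and transcription are hypothetical examples; use whatever attributes your UI template expects.

import json
import boto3

# Illustrative records; adapt the extra fields to your own metadata.
records = [
    {
        "source": "s3://YOUR-BUCKET/audio/sample1.mp3",
        "call-id": "call-123",
        "transcription": "Customer: ... Representative: ...",
        "speaker-count": 2,           # hypothetical extra field
        "recorded-at": "2024-11-04",  # hypothetical extra field
    },
]

# SageMaker Ground Truth expects one JSON object per line (JSONL).
manifest_body = "\n".join(json.dumps(r) for r in records)

s3 = boto3.client("s3")
s3.put_object(
    Bucket="YOUR-BUCKET",
    Key="manifests/input.manifest",
    Body=manifest_body.encode("utf-8"),
)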

Create your labeling job

With the infrastructure deployed, let’s create the labeling job in SageMaker Ground Truth. For full instructions, refer to Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda.

  1. On the SageMaker console, choose Create labeling job.
  2. Give your job a name.
  3. Specify your input data location in Amazon S3.
  4. Specify an output bucket where annotations will be stored.
  5. For the task type, select Custom labeling task.
  6. In the UI template field, locate the placeholder values for the JavaScript and CSS files and update as follows:
    1. Replace audiovideo-wavesufer.js with your CloudFront JavaScript URL from the CloudFormation stack outputs.
    2. Replace audiovideo-stylesheet.css with your CloudFront CSS URL from the CloudFormation stack outputs.
<!-- Custom Javascript and Stylesheet -->
<script src="audiovideo-wavesufer.js"></script>
<link rel="stylesheet" href="audiovideo-stylesheet.css">
  7. Before you launch the job, use the Preview feature to verify your interface.

You should see the Wavesurfer.js interface load correctly with all controls working properly. This preview step is crucial—it confirms that your CloudFront URLs are correctly specified and the interface is properly configured.

Programmatic setup

Alternatively, you can create your labeling job programmatically using the CreateLabelingJob API. This is particularly useful for automation or when you need to create multiple jobs. See the following code:

import boto3

sagemaker = boto3.client("sagemaker")

response = sagemaker.create_labeling_job(
    LabelingJobName="audio-segmentation-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://your-bucket-name/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://your-bucket-name/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/SagemakerExecutionRole",

    # Optionally add PreHumanTaskLambdaArn or AnnotationConsolidationConfig
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:012345678910:workteam/private-crowd/work-team-name",
        "TaskDescription": "Segment and annotate the provided audio or video content.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": "Audio/Video Segmentation and Annotation",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://your-bucket-name/path-to-ui-template"
        },
    },
)
The API approach offers the same functionality as the SageMaker console, but allows for automation and integration with existing workflows. Whether you choose the SageMaker console or API approach, the result is the same: a fully configured labeling job ready for your annotation team.
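
After the job is created, you can track its progress with the DescribeLabelingJob API. The following minimal sketch assumes the job name used in the preceding example.

import boto3

sagemaker = boto3.client("sagemaker")

# Use the same name you passed as LabelingJobName when creating the job.
job = sagemaker.describe_labeling_job(LabelingJobName="audio-segmentation-job-demo")

print(job["LabelingJobStatus"])  # e.g. InProgress, Completed, Failed
print(job["LabelCounters"])      # counts of labeled, failed, and remaining objects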

Understanding the output

After your annotators complete their work, SageMaker Ground Truth will generate an output manifest in your specified S3 bucket. This manifest contains rich information at two levels:

  • Segment-level classifications – Details about each marked segment, including start and end times and assigned categories
  • Full-content classifications – Overall ratings and classifications for the entire file

Let’s look at a sample output to understand its structure:

{
  "answers": [
    {
      "acceptanceTime": "2024-11-04T18:33:38.658Z",
      "answerContent": {
        "annotations": {
          "categories": {
            "language": [
              "English",
              "Hindi",
              "Spanish",
              "French",
              "German",
              "Dutch"
            ],
            "speaker": [
              "Customer",
              "Representative"
            ]
          },
          "startTimestamp": 1730745219028,
          "startUTCTime": "Mon, 04 Nov 2024 18:33:39 GMT",
          "streams": {
            "language": [
              {
                "id": "English",
                "start": 0,
                "end": 334.808635,
                "text": "Sample text in English",
                "emotion": "happy"
              },
              {
                "id": "Spanish",
                "start": 334.808635,
                "end": 550.348471,
                "text": "Texto de ejemplo en español",
                "emotion": "neutral"
              }
            ]
          },
          "endTimestamp": 1730745269602,
          "endUTCTime": "Mon, 04 Nov 2024 18:34:29 GMT",
          "elapsedTime": 50574
        },
        "backgroundNoise": {
          "ambient": false,
          "music": true,
          "traffic": false
        },
        "emotiontag": "Neutral",
        "environmentalSounds": {
          "birdsChirping": false,
          "doorbell": true,
          "footsteps": false
        },
        "rate": {
          "1": false,
          "2": false,
          "3": false,
          "4": false,
          "5": true
        },
        "textTranslationFinal": "sample text for transcription"
      }
    }
  ]
} 

This two-level annotation structure provides valuable training data for your AI models, capturing both fine-grained details and overall content assessment.
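
To turn these annotations into training data, you typically parse the worker responses and separate the segment-level streams from the full-content classifications. The following minimal sketch walks the sample structure shown above; the exact keys depend on how you customize the UI template, and the file name is a placeholder.

import json

# Placeholder path to one worker-response file of the form shown above.
with open("worker-response.json") as f:
    response = json.load(f)

for answer in response["answers"]:
    content = answer["answerContent"]

    # Segment-level annotations: one list of segments per track ("stream").
    for track_name, segments in content["annotations"]["streams"].items():
        for seg in segments:
            print(track_name, seg["id"], seg["start"], seg["end"], seg.get("emotion"))

    # Full-content classifications apply to the entire file.
    print("Overall emotion:", content.get("emotiontag"))
    print("Background noise:", content.get("backgroundNoise"))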

Customizing the solution

Our audio/video segmentation solution is designed to be highly customizable. Let’s walk through how you can adapt the interface to match your specific annotation requirements.

Customize segment-level annotations

The segment-level annotations are controlled in the report() function of the JavaScript code. The following code snippet shows how you can modify the annotation options for each segment:

ranges.forEach(function (r) {
   // ... existing code ...
   
   // Example: Adding a custom dropdown for speaker identification
   var speakerDropdown = $('<select>').attr({
       name: 'speaker',
       class: 'custom-dropdown-width'
   });
   var speakerOptions = ['Speaker A', 'Speaker B', 'Multiple Speakers', 'Background Noise'];
   speakerOptions.forEach(function(option) {
       speakerDropdown.append($('<option>').val(option).text(option));
   });
   
   // Example: Adding a checkbox for quality issues
   var qualityCheck = $('<input>').attr({
       type: 'checkbox',
       name: 'quality_issue'
   });
   var qualityLabel = $('<label>').text('Contains Quality Issues');

   tr.append($('<TD>').append(speakerDropdown));
   tr.append($('<TD>').append(qualityCheck).append(qualityLabel));
   
   // Add event listeners for your new fields
   speakerDropdown.on('change', function() {
       r.speaker = $(this).val();
       updateTrackListData(r);
   });
   
   qualityCheck.on('change', function() {
       r.hasQualityIssues = $(this).is(':checked');
       updateTrackListData(r);
   });
});

You can remove existing fields or add new ones based on your needs. Make sure you update the data model (the updateTrackListData function) to handle your custom fields.

Modify full-content classifications

For classifications that apply to the entire audio/video file, you can modify the HTML template. The following code is an example of adding custom classification options:

<div class="row">
    <div class="col-6">
        <p><strong>Audio Quality Assessment:</strong></p>
        <label class="radio">
            <input type="radio" name="audioQuality" value="excellent" style="width: 20px;">
            Excellent
        </label>
        <label class="radio">
            <input type="radio" name="audioQuality" value="good" style="width: 20px;">
            Good
        </label>
        <label class="radio">
            <input type="radio" name="audioQuality" value="poor" style="width: 20px;">
            Poor
        </label>
    </div>
    <div class="col-6">
        <p><strong>Content Type:</strong></p>
        <label class="checkbox">
            <input type="checkbox" name="contentType" value="interview" style="width: 20px;">
            Interview
        </label>
        <label class="checkbox">
            <input type="checkbox" name="contentType" value="presentation" style="width: 20px;">
            Presentation
        </label>
    </div>
</div>

The classifications you add here will be included in your output manifest, allowing you to capture both segment-level and full-content annotations.

Extending Wavesurfer.js functionality

Our solution uses Wavesurfer.js, an open source audio visualization library. Although we’ve implemented core functionality for segmentation and annotation, you can extend this further using Wavesurfer.js’s rich feature set. For example, you might want to:

  • Add spectrogram visualization
  • Implement additional playback controls
  • Enhance zoom functionality
  • Add timeline markers

For these customizations, we recommend consulting the Wavesurfer.js documentation. When implementing additional Wavesurfer.js features, remember to test thoroughly in the SageMaker Ground Truth preview to verify compatibility with the labeling workflow.

Wavesurfer.js is distributed under the BSD-3-Clause license. Although we’ve tested the integration thoroughly, modifications you make to the Wavesurfer.js implementation should be tested in your environment. The Wavesurfer.js community provides excellent documentation and support for implementing additional features.

Clean up

To clean up the resources created during this tutorial, follow these steps:

  1. Stop the SageMaker Ground Truth labeling job if it’s still running and you no longer need it. This will halt ongoing labeling tasks and stop additional charges from accruing.
  2. Empty the S3 buckets by deleting all objects within them. S3 buckets must be emptied before they can be deleted, so remove all stored files before deleting the stack in the next step.
  3. Delete the CloudFormation stack to remove all the AWS resources provisioned by the template. This action will automatically delete associated services like the S3 buckets, CloudFront distribution, Lambda function, and related IAM roles.

Conclusion

In this post, we walked through implementing an audio and video segmentation solution using SageMaker Ground Truth. We saw how to deploy the necessary infrastructure, configure the annotation interface, and create labeling jobs both through the SageMaker console and programmatically. The solution’s ability to capture precise segment-level annotations along with overall content classifications makes it particularly valuable for generating high-quality training data for generative AI models, whether you’re working on speech synthesis, video generation, or other multimedia AI applications. As you develop your AI models for audio and video generation, remember that the quality of human feedback directly impacts your model’s performance—whether you’re training models to generate more natural-sounding speech, create coherent video sequences, or understand complex audio patterns.

We encourage you to visit our GitHub repository to explore the solution further and adapt it to your specific needs. You can enhance your annotation workflows by customizing the interface, adding new classification categories, or implementing additional Wavesurfer.js features. To learn more about creating custom labeling workflows in SageMaker Ground Truth, visit Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda and Custom labeling workflows.

If you’re looking for a turnkey data labeling solution, consider Amazon SageMaker Ground Truth Plus, which provides access to an expert workforce trained in various machine learning tasks. With SageMaker Ground Truth Plus, you can quickly receive high-quality annotations without the need to build and manage your own labeling workflows, reducing costs by up to 40% and accelerating the delivery of labeled data at scale.

Start building your annotation workflow today and contribute to the next generation of AI models that push the boundaries of what’s possible in audio and video generation.


About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Vineet Agarwal is a Senior Manager of Customer Delivery in the Amazon Bedrock team responsible for Human in the Loop services. He has been at AWS for over 2 years, managing go-to-market activities and business and technical operations. Prior to AWS, he worked in the SaaS, fintech, and telecommunications industries in services leadership roles. He has an MBA from the Indian School of Business and a B.Tech in Electronics and Communications Engineering from the National Institute of Technology, Calicut (India). In his free time, Vineet loves playing racquetball and enjoying outdoor activities with his family.

Using responsible AI principles with Amazon Bedrock Batch Inference

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

The recent announcement of batch inference in Amazon Bedrock enables organizations to process large volumes of data efficiently at 50% less cost compared to On-Demand pricing. It’s especially useful when the use case is not latency sensitive and you don’t need real-time inference. However, as we embrace these powerful capabilities, we must also address a critical challenge: implementing responsible AI practices in batch processing scenarios.

In this post, we explore a practical, cost-effective approach for incorporating responsible AI guardrails into Amazon Bedrock Batch Inference workflows. Although we use a call center’s transcript summarization as our primary example, the methods we discuss are broadly applicable to a variety of batch inference use cases where ethical considerations and data protection are a top priority.

Our approach combines two key elements:

  • Ethical prompting – We demonstrate how to embed responsible AI principles directly into the prompts used for batch inference, setting the stage for ethical outputs from the start
  • Postprocessing guardrails – We show how to apply additional safeguards to the batch inference output, making sure that the remaining sensitive information is properly handled

This two-step process offers several advantages:

  • Cost-effectiveness – By applying heavy-duty guardrails to only the typically shorter output text, we minimize processing costs without compromising on ethics
  • Flexibility – The technique can be adapted to various use cases beyond transcript summarization, making it valuable across industries
  • Quality assurance – By incorporating ethical considerations at both the input and output stages, we maintain high standards of responsible AI throughout the process

Throughout this post, we address several key challenges in responsible AI implementation for batch inference. These include safeguarding sensitive information, providing accuracy and relevance of AI-generated content, mitigating biases, maintaining transparency, and adhering to data protection regulations. By tackling these challenges, we aim to provide a comprehensive approach to ethical AI use in batch processing.

To illustrate these concepts, we provide practical step-by-step guidance on implementing this technique.

Solution overview

This solution uses Amazon Bedrock for batch inference to summarize call center transcripts, coupled with the following two-step approach to maintain responsible AI practices. The method is designed to be cost-effective and flexible while maintaining high ethical standards.

  • Ethical data preparation and batch inference:
    • Use ethical prompting to prepare data for batch processing
    • Store the prepared JSONL file in an Amazon Simple Storage Service (Amazon S3) bucket
    • Use Amazon Bedrock batch inference for efficient and cost-effective call center transcript summarization
  • Postprocessing with Amazon Bedrock Guardrails:
    • After the completion of initial summarization, apply Amazon Bedrock Guardrails to detect and redact sensitive information, filter inappropriate content, and maintain compliance with responsible AI policies
    • By applying guardrails to the shorter output text, you optimize for both cost and ethical compliance

This two-step approach combines the efficiency of batch processing with robust ethical safeguards, providing a comprehensive solution for responsible AI implementation in scenarios involving sensitive data at scale.

In the following sections, we walk you through the key components of implementing responsible AI practices in batch inference workflows using Amazon Bedrock, with a focus on ethical prompting techniques and guardrails.

Prerequisites

To implement the proposed solution, make sure you have satisfied the following requirements:

Ethical prompting techniques

When setting up your batch inference job, it’s crucial to incorporate ethical guidelines into your prompts. The following is a concise example of how you might structure your prompt:

prompt = f"""
Summarize the following customer service transcript:

{transcript}

Instructions:
1. Focus on the main issue, steps taken, and resolution.
2. Maintain a professional and empathetic tone.
3. Do not include any personally identifiable information (PII) in the summary.
4. Use gender-neutral language even if gender is explicitly mentioned.
5. Reflect the emotional context accurately without exaggeration.
6. Highlight actionable insights for improving customer service.
7. If any part is unclear or ambiguous, indicate this in the summary.
8. Replace specific identifiers with generic terms like 'the customer' or '{{MASKED}}'.
"""

This prompt sets the stage for ethical summarization by explicitly instructing the model to protect privacy, minimize bias, and focus on relevant information.

Set up a batch inference job

For detailed instructions on how to set up and run a batch inference job using Amazon Bedrock, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock. It provides detailed instructions for the following steps:

  • Preparing your data in the required JSONL format
  • Understanding the quotas and limitations for batch inference jobs
  • Starting a batch inference job using either the Amazon Bedrock console or API
  • Collecting and analyzing the output from your batch job

By following the instructions in our previous post and incorporating the ethical prompt provided in the preceding section, you’ll be well-equipped to set up batch inference jobs.
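
As a rough sketch of that data preparation step, the following code wraps each transcript and the ethical prompt into one JSONL record per line, using the recordId and modelInput fields expected by batch inference and the Anthropic Claude messages format. The model parameters and file name are illustrative.

import json

def build_record(record_id: str, transcript: str) -> str:
    """Build one JSONL line for an Amazon Bedrock batch inference job."""
    # Reuse the full ethical prompt from the previous section here.
    prompt = (
        "Summarize the following customer service transcript:\n\n"
        f"{transcript}\n\n"
        "Instructions:\n"
        "1. Focus on the main issue, steps taken, and resolution.\n"
        "2. Do not include any personally identifiable information (PII).\n"
        # ... remaining ethical instructions from the previous section
    )
    model_input = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    }
    return json.dumps({"recordId": record_id, "modelInput": model_input})

# Illustrative transcripts; in practice, read these from your data store.
transcripts = ["<transcript 1>", "<transcript 2>"]

with open("batch_input.jsonl", "w") as f:
    for i, transcript in enumerate(transcripts):
        f.write(build_record(f"CALL-{i:05d}", transcript) + "\n")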

Amazon Bedrock Guardrails

After the batch inference job has run successfully, apply Amazon Bedrock Guardrails as a postprocessing step. This provides an additional layer of protection against potential ethical violations or sensitive information disclosure. The following is a simple implementation, but you can update this based on your data volume and SLA requirements:

import boto3, os, json, time

# Initialize Bedrock client and set guardrail details
bedrock_runtime = boto3.client('bedrock-runtime')
guardrail_id = "<Your Guardrail ID>"
guardrail_version = "<Your Guardrail Version>"

# S3 bucket and file details i.e. output of batch inference job
bucket_name = '<S3 bucket with batch inference output>'
prefix = "<prefix>"
filename = '<filename>'

# Set up AWS session and S3 client
session = boto3.Session(
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    region_name=os.environ.get('AWS_REGION')
)
s3 = session.client('s3')

# Read and process batch inference output from S3
output_data = []
try:
    object_key = f"{prefix}{filename}"
    json_data = s3.get_object(Bucket=bucket_name, Key=object_key)['Body'].read().decode('utf-8')
    
    for line in json_data.splitlines():
        data = json.loads(line)
        output_entry = {
            'request_id': data['recordId'],
            'output_text': data['modelOutput']['content'][0]['text']
        }
        output_data.append(output_entry)
except Exception as e:
    print(f"Error reading JSON file from S3: {e}")

# Function to apply guardrails and mask PII data
def mask_pii_data(batch_output: str):
    try:
        pii_data = [{"text": {"text": batch_output}}]
        response = bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source='OUTPUT',
            content=pii_data
        )
        return response['outputs'][0]['text'] if response['action'] == 'GUARDRAIL_INTERVENED' else batch_output
    except Exception as e:
        print(f"An error occurred: {str(e)}")

# Set up rate limiting: # 20 requests per minute, 3 seconds interval
rpm = 20
interval = 3

# Apply guardrails to each record
masked_data = []
for record in output_data:
    iteration_start = time.time()
    
    record['masked_data'] = mask_pii_data(record['output_text'])
    masked_data.append(record)
    
    # Implement rate limiting
    time.sleep(max(0, interval - (time.time() - iteration_start)))

Key points about this implementation:

  • We use the apply_guardrail method from the Amazon Bedrock runtime to process each output
  • The guardrail is applied to the ‘OUTPUT’ source, focusing on postprocessing
  • We handle rate limiting by introducing a delay between API calls so that we don’t exceed the requests-per-minute quota used in this example (20 requests per minute)
  • The function mask_pii_data applies the guardrail and returns the processed text if the guardrail intervened
  • We store the masked version for comparison and analysis
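
The guardrail referenced above (guardrail_id and guardrail_version) is assumed to already exist. If you want to create one programmatically rather than on the console, the following minimal sketch creates a guardrail that anonymizes a few common PII types and publishes a version; the name, description, and messages are illustrative.

import boto3

# Note: guardrails are created with the control-plane client ("bedrock"),
# while apply_guardrail is called on "bedrock-runtime".
bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="transcript-summarization-guardrail",
    description="Masks PII in batch inference summaries",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
            {"type": "ADDRESS", "action": "ANONYMIZE"},
        ]
    },
    blockedInputMessaging="This input cannot be processed.",
    blockedOutputsMessaging="This response was blocked by the guardrail.",
)

version = bedrock.create_guardrail_version(guardrailIdentifier=guardrail["guardrailId"])
print(guardrail["guardrailId"], version["version"])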

This approach allows you to benefit from the efficiency of batch processing while still maintaining strict control over the AI’s outputs and protecting sensitive information. By addressing ethical considerations at both the input (prompting) and output (guardrails) stages, you’ll have a comprehensive approach to responsible AI in batch inference workflows.

Although this example focuses on call center transcript summarization, you can adapt the principles and methods discussed in this post to various batch inference scenarios across different industries, always prioritizing ethical AI practices and data protection.

Ethical considerations for responsible AI

Although the prompt in the previous section provides a basic framework, there are many ethical considerations you can incorporate depending on your specific use case. The following is a more comprehensive list of ethical guidelines:

  • Privacy protection – Avoid including any personally identifiable information in the summary. This protects customer privacy and aligns with data protection regulations, making sure that sensitive personal data is not exposed or misused.
  • Factual accuracy – Focus on facts explicitly stated in the transcript, avoiding speculation. This makes sure that the summary remains factual and reliable, providing an accurate representation of the interaction without introducing unfounded assumptions.
  • Bias mitigation – Be mindful of potential biases related to gender, ethnicity, location, accent, or perceived socioeconomic status. This helps prevent discrimination and maintains fair treatment for your customers, promoting equality and inclusivity in AI-generated summaries.
  • Cultural sensitivity – Summarize cultural references or idioms neutrally, without interpretation. This respects cultural diversity and minimizes misinterpretation, making sure that cultural nuances are acknowledged without imposing subjective judgments.
  • Gender neutrality – Use gender-neutral language unless gender is explicitly mentioned. This promotes gender equality and minimizes stereotyping, creating summaries that are inclusive and respectful of all gender identities.
  • Location neutrality – Include location only if relevant to the customer’s issue. This minimizes regional stereotyping and focuses on the actual issue rather than unnecessary generalizations based on geographic information.
  • Accent awareness – If accent or language proficiency is relevant, mention it factually without judgment. This acknowledges linguistic diversity without discrimination, respecting the varied ways in which people communicate.
  • Socioeconomic neutrality – Focus on the issue and resolution, regardless of the product or service tier discussed. This promotes fair treatment regardless of a customer’s economic background and equal consideration of all customers’ concerns.
  • Emotional context – Use neutral language to describe emotions accurately. This provides insight into customer sentiment without escalating emotions, allowing for a balanced representation of the interaction’s emotional tone.
  • Empathy reflection – Note instances of the agent demonstrating empathy. This highlights positive customer service practices, encouraging the recognition and replication of compassionate interactions.
  • Accessibility awareness – Include information about any accessibility needs or accommodations factually. This promotes inclusivity and highlights efforts to accommodate diverse needs, fostering a more accessible and equitable customer service environment.
  • Ethical behavior flagging – Identify potentially unethical behavior without repeating problematic content. This helps identify issues for review while minimizing the propagation of inappropriate content, maintaining ethical standards in the summarization process.
  • Transparency – Indicate unclear or ambiguous information in the summary. This promotes transparency and helps identify areas where further clarification might be needed, making sure that limitations in understanding are clearly communicated.
  • Continuous improvement – Highlight actionable insights for improving customer service. This turns the summarization process into a tool for ongoing enhancement of service quality, contributing to the overall improvement of customer experiences.

When implementing ethical AI practices in your batch inference workflows, consider which of these guidelines are most relevant to your specific use case. You may need to add, remove, or modify instructions based on your industry, target audience, and specific ethical considerations. Remember to regularly review and update your ethical guidelines as new challenges and considerations emerge in the field of AI ethics.

Clean up

To delete the guardrail you created, follow the steps in Delete a guardrail.

Conclusion

Implementing responsible AI practices, regardless of the specific feature or method, requires a thoughtful balance of privacy protection, cost-effectiveness, and ethical considerations. In our exploration of batch inference with Amazon Bedrock, we’ve demonstrated how these principles can be applied to create a system that not only efficiently processes large volumes of data, but does so in a manner that respects privacy, avoids bias, and provides actionable insights.

We encourage you to adopt this approach in your own generative AI implementations. Start by incorporating ethical guidelines into your prompts and applying guardrails to your outputs. Responsible AI is an ongoing commitment—continuously monitor, gather feedback, and adapt your approach to align with the highest standards of ethical AI use. By prioritizing ethics alongside technological advancement, we can create AI systems that not only meet business needs, but also contribute positively to society.


About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Revolutionizing knowledge management: VW’s AI prototype journey with AWS

Today, we’re excited to share the journey of Volkswagen (VW)—an innovator in the automotive industry and Europe’s largest car maker—to enhance knowledge management by using generative AI, Amazon Bedrock, and Amazon Kendra to devise a solution based on Retrieval Augmented Generation (RAG) that makes internal information more easily accessible to its users. This solution efficiently handles documents that include both text and images, significantly enhancing VW’s knowledge management capabilities within their production domain.

The challenge

VW engaged with the AWS Industries Prototyping & Customer Engineering Team (AWSI-PACE) to explore ways to improve knowledge management in the production domain by building a prototype that uses advanced features of Amazon Bedrock, specifically Anthropic’s Claude 3 models, to extract and analyze information from private documents, such as PDFs containing text and images. The main technical challenge was to efficiently retrieve and process data in a multi-modal setup to provide comprehensive and accurate information from Chemical Compliance private documents.

PACE, a multi-disciplinary rapid prototyping team, focuses on delivering feature-complete initial products that enable business evaluation, determining feasibility, business value, and path to production. Using the PACE-Way (an Amazon-based development approach), the team developed a time-boxed prototype over a maximum of 6 weeks, which included a full stack solution with frontend and UX, backed by specialist expertise, such as data science, tailored for VW’s needs.

The choice of Anthropic’s Claude 3 models within Amazon Bedrock was driven by Claude’s advanced vision capabilities, enabling it to understand and analyze images alongside text. This multimodal interaction is crucial for applications that require extracting insights from complex documents containing both textual content and images. These features open up exciting possibilities for multimodal interactions, making it ideal for querying private PDF documents that include both text and images.

The integrated approach and ease of use of Amazon Bedrock in deploying large language models (LLMs), along with built-in features that facilitate seamless integration with other AWS services like Amazon Kendra, made it the preferred choice. By using Claude 3’s vision capabilities, we could upload image-rich PDF documents. Claude analyzes each image contained within these documents to extract text and understand the contextual details embedded in these visual elements. The extracted text and context from the images are then added to Amazon Kendra, enhancing the search-ability and accessibility of information within the system. This integration ensures that users can perform detailed and accurate searches across the indexed content, using the full depth of information extracted by Claude 3.

Architecture overview

Because of the need to provide access to proprietary information, it was decided early that the prototype would use RAG. The RAG approach, now an established solution for enhancing LLMs with private knowledge, is implemented using a blend of AWS services that enable us to streamline the processing, searching, and querying of documents while at the same time meeting non-functional requirements related to efficiency, scalability, and reliability. The architecture is centered around a native AWS serverless backend, which ensures minimal maintenance and high availability together with fast development.

Core components of the RAG system

  1. Amazon Simple Storage Service (Amazon S3): Amazon S3 serves as the primary storage for source data. It’s also used for hosting static website components, ensuring high durability and availability.
  2. Amazon Kendra: Amazon Kendra provides semantic search capabilities for ranking documents and passages; it also handles the overhead of text extraction, embeddings, and managing the vector datastore.
  3. Amazon Bedrock: This component is critical for processing and inference. It uses machine learning models to analyze and interpret the text and image data extracted from documents, integrating these insights to generate context-aware responses to queries.
  4. Amazon CloudFront: Distributes the web application globally to reduce latency, offering users fast and reliable access to the RAG system’s interface.
  5. AWS Lambda: Provides the serverless compute environment for running backend operations without provisioning or managing servers, which scales automatically with the application’s demands.
  6. Amazon DynamoDB: Used for storing metadata and other necessary information for quick retrieval during search operations. Its fast and flexible NoSQL database service accommodates high-performance needs.
  7. AWS AppSync: Manages real-time data synchronization and communication between the users’ interfaces and the serverless backend, enhancing the interactive experience.
  8. Amazon Cognito: Manages user authentication and authorization, providing secure and scalable user access control. It supports integration with various identity providers to facilitate easy and secure user sign-in and registration processes.
  9. Amazon API Gateway: Acts as the entry point for all RESTful API requests to the backend services, offering features such as throttling, monitoring, and API version management.
  10. AWS Step Functions: Orchestrates the various AWS services involved in the RAG system, ensuring coordinated execution of the workflow.

Solution walkthrough

The process flow handles complex documents efficiently from the moment a user uploads a PDF. These documents are often large and contain numerous images. This workflow integrates AWS services to extract, process, and make content available for querying. This section details the steps involved in processing uploaded documents and ensuring that extracted data is searchable and contextually relevant to user queries (shown in the following figure).

Initiation and initial processing:

  1. User access: A user accesses the web interface through CloudFront, which allows users to upload PDFs as shown in Image A in Results. These PDFs are stored in Amazon S3.
  2. Text extraction: With the Amazon Kendra S3 connector, the solution indexes the S3 bucket repository of documents that the user has uploaded in Step 1. Amazon Kendra supports popular document types or formats such as PDF, HTML, Word, PowerPoint, and more. An index can contain multiple document formats. Amazon Kendra extracts the content inside the documents to make the documents searchable. The documents are parsed to optimize search on the extracted text within the documents. This means structuring the documents into fields or attributes that are used for search.
  3. Step function activation: When an object is created in S3, such as a user uploading a file in Step 1, the solution will launch a step function that orchestrates the document processing workflow for adding image context to the Kendra index.

Image extraction and analysis:

  1. Extract images: While Kendra indexes the text from the uploaded file, the step function extracts the images from the document. Extracting the images from the uploaded file allows the solution to process the images using Amazon Bedrock to extract text and contextual information. The code snippet that follows provides a sample of the code used to extract the images from the PDF file and save them back to S3.
import json
import fitz  # PyMuPDF
import os
import boto3

# Initialize the S3 client
s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket_name = event['bucket_name']
    pdf_key = event['pdf_key']
    
    # Define the local paths
    local_pdf_path = '/tmp/' + os.path.basename(pdf_key)
    local_image_dir = '/tmp/images'
    
    # Ensure the image directory exists
    if not os.path.exists(local_image_dir):
        os.makedirs(local_image_dir)
    
    # Download the PDF from S3
    s3.download_file(bucket_name, pdf_key, local_pdf_path)
    
    # Open the PDF file using PyMuPDF
    pdf_file = fitz.open(local_pdf_path)
    pdf_name = os.path.splitext(os.path.basename(local_pdf_path))[0]  # Extract PDF base name for labeling
    
    total_images_extracted = 0  # Counter for all images extracted from this PDF
    image_filenames = []  # List to store the filenames of extracted images
    
    # Iterate through each page of the PDF
    for current_page_index in range(len(pdf_file)):
        # Extract images from the current page
        for img_index, img in enumerate(pdf_file.get_page_images(current_page_index)):
            xref = img[0]
            image = fitz.Pixmap(pdf_file, xref)
            
            # Construct image filename with a global counter
            image_filename = f"{pdf_name}_image_{total_images_extracted}.png"
            image_path = os.path.join(local_image_dir, image_filename)
            total_images_extracted += 1
            
            # Save the image appropriately
            if image.n < 5:  # GRAY or RGB
                image.save(image_path)
            else:  # CMYK, requiring conversion to RGB
                new_image = fitz.Pixmap(fitz.csRGB, image)
                new_image.save(image_path)
                new_image = None
            
            image = None
            
            # Upload the image back to S3
            s3.upload_file(image_path, bucket_name, f'images/{image_filename}')
            
            # Add the image filename to the list
            image_filenames.append(image_filename)
    
    # Return the response with the list of image filenames and total images extracted
    return {
        'statusCode': 200,
        'image_filenames': image_filenames,
        'total_images_extracted': total_images_extracted
    }
    1. Lambda function code:
      1. Initialization: The function initializes the S3 client.
      2. Event extraction: Extracts the bucket name and PDF key from the incoming event payload.
      3. Local path set up: Defines local paths for storing the PDF and extracted images.
      4. Directory creation: Ensures the directory for images exists.
      5. PDF download: Downloads the PDF file from S3.
      6. Image extraction: Opens the PDF and iterates through its pages to extract images.
      7. Image processing: Saves the images locally and uploads them back to S3.
      8. Filename collection: Collects the filenames of the uploaded images.
      9. Return statement: Returns the list of image filenames and the total number of images extracted.
  2. Text extraction from images: The image files processed from the previous step are then sent to Amazon Bedrock, where advanced models extract textual content and contextual details from the images. The step function uses a map state to iterate over the list of images, processing each one individually. Claude 3 offers image-to-text vision capabilities that can process images and return text outputs. It excels at analyzing and understanding charts, graphs, technical diagrams, reports, and other visual assets. Claude 3 Sonnet achieves comparable performance to other best-in-class models with image processing capabilities while maintaining a significant speed advantage. The following is a sample snippet that extracts the contextual information from each image in the map state.
import json
import base64
import boto3
from botocore.exceptions import ClientError

# Initialize the boto3 client for BedrockRuntime and S3
s3 = boto3.client('s3', region_name='us-west-2')
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')

def lambda_handler(event, context):
    source_bucket = event['bucket_name']
    destination_bucket = event['destination_bucket']
    image_filename = event['image_filename']
    
    try:
        # Get the image from S3
        image_file = s3.get_object(Bucket=source_bucket, Key=image_filename)
        contents = image_file['Body'].read()

        # Encode the image to base64
        encoded_string = base64.b64encode(contents).decode('utf-8')

        # Prepare the payload for Bedrock
        payload = {
            "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
            "contentType": "application/json",
            "accept": "application/json",
            "body": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "temperature": 0.7,
                "top_p": 0.999,
                "top_k": 250,
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "image",
                                "source": {
                                    "type": "base64",
                                    "media_type": "image/png",
                                    "data": encoded_string
                                }
                            },
                            {
                                "type": "text",
                                "text": "Extract all text."
                            }
                        ]
                    }
                ]
            }
        }

        # Call Bedrock to extract text from the image
        body_bytes = json.dumps(payload['body']).encode('utf-8')
        response = bedrock_runtime.invoke_model(
            body=body_bytes,
            contentType=payload['contentType'],
            accept=payload['accept'],
            modelId=payload['modelId']
        )

        response = json.loads(response['body'].read().decode('utf-8'))
        response_content = response['content'][0]
        response_text = response_content['text']

        # Save the extracted text to S3
        text_file_key = image_filename.replace('.png', '.txt')
        s3.put_object(Bucket=destination_bucket, Key=text_file_key, Body=str(response_text))

        return {
            'statusCode': 200,
            'text_file_key': text_file_key,
            'message': f"Processed and saved text for {image_filename}"
        }

    except Exception as e:
        return {
            'statusCode': 500,
            'error': str(e),
            'message': f"An error occurred processing {image_filename}"
        }
    1. Lambda function code:
      1. Initialization: The script initializes the boto3 clients for BedrockRuntime and S3 services to interact with AWS resources.
      2. Lambda handler: The main function (lambda_handler) is invoked when the Lambda function is run. It receives the event and context parameters.
      3. Retrieve image: The image file is retrieved from the specified S3 bucket using the get_object method.
      4. Base64 encoding: The image is read and encoded to a base64 string, which is required for sending the image data to Bedrock.
      5. Payload preparation: A payload is constructed with the base64 encoded image and a request to extract text.
      6. Invoke Amazon Bedrock: The Amazon Bedrock model is invoked using the prepared payload to extract text from the image.
      7. Process response: The response from Amazon Bedrock is parsed to extract the textual content.
      8. Save text to S3: The extracted text is saved back to the specified S3 bucket with a filename derived from the original image filename.
      9. Return statement: The function returns a success message and the key of the saved text file. If an error occurs, it returns an error message.

Data storage and indexing:

  1. Save to S3: The extracted text from the images is saved back to Amazon S3 as text files.
  2. Indexing by Amazon Kendra: After being saved in S3, the data is indexed by Amazon Kendra, making it searchable and accessible for queries. This indexing adds the image context to the similarity searches performed by the RAG system, as sketched below.
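Once the text files land in the bucket, the Kendra index needs to pick them up. The following is a minimal sketch, assuming an existing Amazon Kendra index with an S3 data source (the index and data source IDs are placeholders), of how a sync could be triggered programmatically instead of waiting for a scheduled sync:

import boto3

# Placeholder identifiers: replace with your Kendra index and S3 data source IDs
KENDRA_INDEX_ID = "<kendra-index-id>"
S3_DATA_SOURCE_ID = "<s3-data-source-id>"

kendra = boto3.client("kendra", region_name="us-west-2")

def trigger_kendra_sync():
    """Start a sync job so newly written text files become searchable."""
    response = kendra.start_data_source_sync_job(
        Id=S3_DATA_SOURCE_ID,
        IndexId=KENDRA_INDEX_ID,
    )
    return response["ExecutionId"]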

User query with semantic search and inference

The semantic search and inference process of our solution plays a critical role in providing users with accurate and contextually relevant information based on their queries.

Semantic search focuses on understanding the intent and contextual meaning behind a user’s query instead of relying solely on keyword matching. Amazon Kendra, an advanced enterprise search service, uses semantic search to deliver more accurate and relevant results. By using natural language processing (NLP) and machine learning algorithms, Amazon Kendra can interpret the nuances of a query, ensuring that the retrieved documents and data align closely with the user’s actual intent.

User query handling:

  1. User interaction: Users submit their queries through a user-friendly interface.

Semantic search with Amazon Kendra:

  1. Context retrieval: Upon receiving a query, Amazon Kendra performs a semantic search to identify the most relevant documents and data. The advanced NLP capabilities of Amazon Kendra allow it to understand the intent and contextual nuances of the query.
  2. Provision of relevant context: Amazon Kendra provides a list of documents that are ranked based on their relevance to the user’s query. This ensures that the response is not only based on keyword matches but also on the semantic relevance of the content. Note that Amazon Kendra also uses the text extracted from images, which was processed with Amazon Bedrock, to enhance the search results.
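To make the retrieval step concrete, the following is a minimal sketch of a semantic search call with boto3; the index ID, result fields, and wrapper function are illustrative assumptions rather than the exact code used in the solution:

import boto3

kendra = boto3.client("kendra", region_name="us-west-2")

def retrieve_context(query: str, index_id: str, top_k: int = 5):
    """Return the passages Amazon Kendra ranks as most relevant to the query."""
    response = kendra.retrieve(IndexId=index_id, QueryText=query)
    passages = []
    for item in response.get("ResultItems", [])[:top_k]:
        passages.append({
            "title": item.get("DocumentTitle"),
            "content": item.get("Content"),
            "uri": item.get("DocumentURI"),
        })
    return passages

These passages, together with the extracted image text indexed earlier, form the context that is handed to Amazon Bedrock in the next step.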

Inference with Amazon Bedrock:

  1. Contextual analysis and inference: The relevant documents and data retrieved by Amazon Kendra are then passed to Amazon Bedrock. The inference models available in Amazon Bedrock consider both the context provided by Kendra and the specific details of the user query. This dual consideration allows Amazon Bedrock to formulate responses that are not only accurate but also finely tuned to the specifics of the query. The following are the snippets for generating prompts that help Bedrock provide accurate and contextually relevant responses:
from langchain.prompts import PromptTemplate  # assumed import; these methods are part of a larger class in the solution code

def get_qa_prompt(self):
    template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}"""
    return PromptTemplate(template=template, input_variables=["context", "question"])

def get_prompt(self):
    template = """The following is a friendly conversation between a human and an AI. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{chat_history}

Question: {input}"""
    return PromptTemplate(input_variables=["input", "chat_history"], template=template)

def get_condense_question_prompt(self):
    template = """<conv>
{chat_history}
</conv>

<followup>
{question}
</followup>

Given the conversation inside the tags <conv></conv>, rephrase the follow up question you find inside <followup></followup> to be a standalone question, in the same language as the follow up question.
"""
    return PromptTemplate(input_variables=["chat_history", "question"], template=template)
    1. QA prompt explanation:
      1. This prompt is designed to use the context provided by Amazon Kendra to answer a question accurately. The context comes from the most relevant documents and data returned by the semantic search for the user query.
      2. It instructs the AI to use the given context and only provide an answer if it is certain; otherwise, it should say that it doesn't know.

Response delivery:

  1. Delivery to user: This response is then delivered back to the user, completing the cycle of query and response.

Results

Our evaluation of the system revealed significant multilingual capabilities, enhancing user interaction with documents in multiple languages:

  • Multilingual support: The model showed strong performance across different languages. Despite the documents being primarily in German, the system handled queries in English effectively. It translated the extracted text from the PDFs or images from German to English, providing responses in English. This feature was crucial for English-speaking users.
  • Seamless language transition: The system also supports transitions between languages. Users could ask questions in German and receive responses in German, maintaining context and accuracy. This dual-language functionality significantly enhanced efficiency, catering to documents containing both German and English.
  • Enhanced user experience: This multilingual capability broadened the system’s accessibility and ensured users could receive information in their preferred language, making interactions more intuitive.

Image A demonstrates a user querying their private data. The solution successfully answers the query using the private data. The answer isn’t derived from the extracted text within the files, but from an image embedded in the uploaded file.

Image B shows the specific image from which Amazon Bedrock extracted the text and added it to the index, enabling the system to provide the correct answer.

Image C also shows a scenario where, without the image context, the question cannot be answered.

Following the successful prototype development, Stefan Krawinkel from VW shared his thoughts:

“We are thrilled by the AWS team’s joy of innovation and the constant questioning of solutions for the requirements we brought to the prototype. The solutions developed give us a good overview of what is possible with generative AI, and what limits still exist today. We are confident that we will continue to push existing boundaries together with AWS to be able to offer attractive products to our customers.”

This testimonial highlights how the collaborative effort addressed the complex challenges and underscores the ongoing potential for innovation in future projects.

Additional thanks to Fabrizio Avantaggiato, Verena Koutsovagelis and Jon Reed for their work on this prototype.


About the Authors

Rui Costa specializes in Software Engineering and currently holds the position of Principal Solutions Developer within the AWS Industries Prototyping and Customer Engineering (PACE) Team based out of Jersey City, New Jersey.

Mahendra Bairagi is a Generative AI specialist who currently holds a position of Principal Solutions Architect – Generative AI within the AWS Industries Prototyping and Customer Engineering (PACE) team. Throughout his more than 9 years at AWS, Mahendra has held a variety of pivotal roles, including Principal AI/ML Specialist, IoT Specialist, Principal Product Manager and head of Sports Innovations Lab. In these capacities, he has consistently led innovative solutions, driving significant advancements for both customers and partners.

Read More

Fine-tune large language models with Amazon SageMaker Autopilot

Fine-tune large language models with Amazon SageMaker Autopilot

Fine-tuning foundation models (FMs) is a process that involves exposing a pre-trained FM to task-specific data and adjusting its parameters. The model can then develop a deeper understanding and produce more accurate and relevant outputs for that particular domain.

In this post, we show how to use an Amazon SageMaker Autopilot training job with the AutoMLV2 SDK to fine-tune a Meta Llama2-7B model on question answering tasks. Specifically, we train the model on multiple-choice science exam questions covering physics, chemistry, and biology. This fine-tuning approach can be extended to other tasks, such as summarization or text generation, in domains like healthcare, education, or financial services.

AutoMLV2 supports the instruction-based fine-tuning of a selection of general-purpose FMs powered by Amazon SageMaker JumpStart. We use Amazon SageMaker Pipelines, which helps automate the different steps, including data preparation, fine-tuning, and creating the model. We use the open source library fmeval to evaluate the model and register it in the Amazon SageMaker Model Registry based on its performance.

Solution overview

The following architecture diagram shows the various steps involved to create an automated and scalable process to fine-tune large language models (LLMs) using AutoMLV2. The AutoMLV2 SDK simplifies the process of creating and managing AutoML jobs by providing high-level functions and abstractions, making it straightforward for developers who may not be familiar with AutoML concepts. The CreateAutoMLJobV2 API offers a low-level interface that allows for more control and customization. Using the SDK offers benefits like faster prototyping, better usability, and pre-built functions, and the API is better for advanced customizations.

Architecture diagram: an automated, scalable fine-tuning workflow built with the AutoMLV2 SDK and SageMaker Pipelines.

To implement the solution, we use SageMaker Pipelines in Amazon SageMaker Studio to orchestrate the different steps. The solution consists of two pipelines: training and inference.

To create the training pipeline, you complete the following steps:

  1. Load and prepare the dataset.
  2. Create a SageMaker Autopilot CreateAutoMLJobV2 training job.
  3. Check the training job status.
  4. Deploy the best candidate model.

The following steps configure the inference pipeline:

  1. Preprocess data for evaluation.
  2. Evaluate the model using the fmeval library.
  3. Register the model if it meets the required performance.

To deploy the solution, refer to the GitHub repo, which provides step-by-step instructions for fine-tuning Meta Llama2-7B using SageMaker Autopilot and SageMaker Pipelines.

Prerequisites

For this walkthrough, complete the following prerequisite steps:

  1. Set up an AWS account.
  2. Create a SageMaker Studio environment.
  3. Create two AWS Identity and Access Management (IAM) roles: LambdaExecutionRole and SageMakerExecutionRole, with permissions as outlined in the SageMaker notebook. The managed policies should be scoped down further for improved security. For instructions, refer to Create a role to delegate permissions to an IAM user.
  4. On the SageMaker Studio console, upload the code from the GitHub repo.
  5. Open the SageMaker notebook (.ipynb file) and run the cells.

Training pipeline

The following training pipeline shows a streamlined way to automate the fine-tuning of a pre-trained LLM and the deployment of the model to a real-time inference endpoint.

Training pipeline diagram.

Prepare the data

For this project, we used the SciQ dataset, which contains science exam questions about physics, chemistry, biology, and other subjects. SageMaker Autopilot supports instruction-based fine-tuning datasets formatted as CSV files (default) or as Parquet files.

When you prepare your CSV file, make sure that it contains exactly two columns:

  • The input column must be in string format and contain the prompt
  • The output column must be in string format and contain the ground truth answer

In this project, we start by removing the irrelevant columns. Next, we combine the question and support columns to create a comprehensive prompt, which is then placed in the input column. SageMaker Autopilot sets a maximum limit on the number of rows in the dataset and the context length based on the type of model being used. We select 10,000 rows from the dataset.

Finally, we divide the data into training and validation sets:

import pandas as pd
from datasets import Dataset, load_dataset

# Load and split dataset. Change this to your own dataset
dataset = load_dataset("allenai/sciq", split="train")
dataset = dataset.train_test_split(test_size=0.1, shuffle=True)
dataset_training_df = pd.DataFrame(dataset['train'])
dataset_validation_df = pd.DataFrame(dataset['test'])
dataset_training_df = dataset_training_df.sample(n=10000, random_state=42, ignore_index=True)
# Prepare the training dataset to fit the Autopilot job
fields = ['question', 'correct_answer', 'support']
dataset_train_ist_df = dataset_training_df[fields].copy()
dataset_fine_tune_ist = Dataset.from_pandas(dataset_train_ist_df)
dataset_fine_tune_ist_cpy = dataset_train_ist_df.copy()
# Build the instruction-style prompt expected in the input column
dataset_fine_tune_ist_cpy["input"] = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n### Instruction:\n"
    + dataset_fine_tune_ist_cpy["question"]
    + "\n\n### Input:\n" + dataset_fine_tune_ist_cpy["support"])
dataset_fine_tune_ist_cpy["output"] = dataset_fine_tune_ist_cpy["correct_answer"]
autopilot_fields = ['input', 'output']
dataset_fine_tune = Dataset.from_pandas(dataset_fine_tune_ist_cpy[autopilot_fields])
dataset_fine_tune.to_csv(train_dataset_s3_path, index=False)

Create a CreateAutoMLJobV2 training job

AutoMLV2 makes it straightforward to train, optimize, and deploy machine learning (ML) models by automating the tasks involved in the ML development lifecycle. It provides a simple approach to create highly accurate models tailored to your specific problem type, whether it’s classification, regression, forecasting, or others. In this section, we go through the steps to train a model with AutoMLV2, using an LLM fine-tuning job as an example. For this project, we used the Meta Llama2-7B model. You can change the model by choosing from the supported LLMs for fine-tuning.

Define the text generation configuration

AutoMLV2 automates the entire ML process, from data preprocessing to model training and deployment. However, for AutoMLV2 to work effectively, it’s crucial to provide the right problem configuration. This configuration acts as a guide, helping SageMaker Autopilot understand the nature of your problem and select the most appropriate algorithm or approach. By specifying details such as the problem type (such as classification, regression, forecasting, or fine-tuning), you give AutoMLV2 the necessary information to tailor its solution to your specific requirements.

For a fine-tuning job, the configuration consists of determining the model to be used and its access configuration, in addition to the hyperparameters that optimize the model learning process. See the following code:

from sagemaker.automl.automlv2 import AutoMLTextGenerationConfig  # AutoMLV2 SDK config class

text_generation_config = AutoMLTextGenerationConfig(
    base_model_name="Llama2-7B",
    accept_eula=True,
    text_generation_hyper_params={"epochCount": "3", "learningRate": "0.00001",
                                  "batchSize": "1", "learningRateWarmupSteps": "1"},
)

The definitions of each parameter used in text_generation_config are:

  • base_model_name – The name of the base model to fine-tune. SageMaker Autopilot supports fine-tuning a variety of LLMs. If no value is provided, the default model used is Falcon7BInstruct.
  • accept_eula – The access configuration file to control access to the ML model. The value is set to True to accept the model end-user license agreement (EULA). This setting is necessary for models like Meta Llama2-7B, which require accepting the license terms before they can be used.

  • epochCount – The number of times the model goes through the entire training dataset. Its value should be a string containing an integer value within the range of 1–10. One epoch means the Meta Llama2-7B model has been exposed to the 10,000 samples and had a chance to learn from them. You can set it to 3, meaning the model will make three complete passes, or increase the number if the model doesn’t converge after three epochs.
  • learningRate – The step size at which a model’s parameters are updated during training. Its value should be a string containing a floating-point value within the range of 0–1. A learning rate of 0.00001 or 0.00002 is a good standard when fine-tuning LLMs like Meta Llama2-7B.
  • batchSize – The number of data samples used in each iteration of training. Its value should be a string containing an integer value within the range of 1–64. Start with 1 to avoid out-of-memory errors.
  • learningRateWarmupSteps – The number of training steps during which the learning rate gradually increases before reaching its target or maximum value. Its value should be a string containing an integer value within the range of 0–250. Start with 1.

The configuration settings can be adjusted to align with your specific requirements and the chosen FM.

Start the AutoMLV2 job

Next, set up the AutoMLV2 job by providing the problem configuration details, the AWS role with the necessary permissions, a base name for job identification, and the output path where the model artifacts will be saved. To initiate the training process in a pipeline step, we invoke the create_auto_ml_job_v2 method. In the following code snippet, the create_auto_ml_job_v2 method is called to create an AutoML job object with specific inputs. The AutoMLJobInputDataConfig parameter takes a list that includes an AutoMLDataChannel, which specifies the type of data (in this case, ‘S3Prefix’) and the location of the training dataset (given by train_dataset_s3_path.default_value) in an S3 bucket. The channel_type is set to ‘training’, indicating that this dataset is used for training the model.

sagemaker_client.create_auto_ml_job_v2(
    AutoMLJobName=event["AutopilotJobName"],
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "CompressionType": "None",
        "ContentType": "text/csv;header=present",
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": event["TrainDatasetS3Path"]}}}],
    DataSplitConfig={"ValidationFraction": 0.1},
    OutputDataConfig={"S3OutputPath": event["TrainingOutputS3Path"]},
    AutoMLProblemTypeConfig={
        "TextGenerationJobConfig": {
            "BaseModelName": event["BaseModelName"],
            "TextGenerationHyperParameters": {
                "epochCount": event["epochCount"],
                "learningRate": event["learningRate"],
                "batchSize": event["batchSize"],
                "learningRateWarmupSteps": event["learningRateWarmupSteps"]},
            "ModelAccessConfig": {"AcceptEula": True}}},
    RoleArn=event["AutopilotExecutionRoleArn"],
)

Check SageMaker Autopilot job status

This step tracks the status of the Autopilot training job. In the script check_autopilot_job_status.py, we repeatedly check the status of the training job until it’s complete.

The callback step sends a token in an Amazon Simple Queue Service (Amazon SQS) queue, which invokes the AWS Lambda function to check the training job status. If the job is complete, the Lambda function sends a success message back to the callback step and the pipeline continues with the next step.
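The following is a minimal sketch of what that Lambda function might look like; the event field names and control flow are simplified assumptions, and the actual check_autopilot_job_status.py script in the repository may differ:

import boto3

sagemaker_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Token and job name forwarded from the callback step (assumed field names)
    token = event["token"]
    job_name = event["autopilot_job_name"]

    status = sagemaker_client.describe_auto_ml_job_v2(
        AutoMLJobName=job_name)["AutoMLJobStatus"]

    if status == "Completed":
        # Resume the pipeline at the next step
        sagemaker_client.send_pipeline_execution_step_success(
            CallbackToken=token,
            OutputParameters=[{"Name": "autopilot_job_status", "Value": status}],
        )
    elif status in ("Failed", "Stopped"):
        sagemaker_client.send_pipeline_execution_step_failure(
            CallbackToken=token,
            FailureReason=f"Autopilot job ended with status {status}")
    # Otherwise the job is still running and the status check is retried later
    return {"status": status}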

Deploy a model with AutoMLV2 using real-time inference

AutoMLV2 simplifies the deployment of models by automating the entire process, from model training to deployment. It takes care of the heavy lifting involved in selecting the best-performing model and preparing it for production use.

Furthermore, AutoMLV2 simplifies the deployment process. It can directly create a SageMaker model from the best candidate model and deploy it to a SageMaker endpoint with just a few lines of code.

In this section, we look at the code that deploys the best-performing model to a real-time SageMaker endpoint.

This pipeline step uses a Lambda step, which runs a serverless Lambda function. We use a Lambda step because the API call to create and deploy the SageMaker model is lightweight.

The first stage after the completion of the AutoMLV2 training process is to select the best candidate, making sure that the most accurate and efficient solution is chosen for deployment. We use the method describe_auto_ml_job_v2 to retrieve detailed information about a specific AutoMLV2 job. This method provides insights into the current status, configuration, and output of your AutoMLV2 job, allowing you to monitor its progress and access relevant information. See the following code:

autopilot_job = sagemaker_client.describe_auto_ml_job_v2(
    AutoMLJobName=event['autopilot_job_name'])
best_candidate = autopilot_job['BestCandidate']

In SageMaker Autopilot, the best candidate model is selected based on minimizing cross-entropy loss, a default metric that measures the dissimilarity between predicted and actual word distributions during fine-tuning. Additionally, the model’s quality is evaluated using metrics like ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum), which measure the similarity between machine-generated text and human-written reference text, along with perplexity, which assesses how well the model predicts the next word in a sequence. The model with the lowest cross-entropy and perplexity, combined with strong ROUGE scores, is considered the best candidate.

With the best candidate model identified, you can create a SageMaker model object, encapsulating the trained model artifacts and necessary dependencies. For that, we call the create_model method of the SageMaker client with the best candidate’s container and model artifacts:

best_candidate_name = best_candidate['CandidateName']
response = sagemaker_client.create_model(
    ModelName=best_candidate_name,
    PrimaryContainer={
        'Image': autopilot_job["BestCandidate"]["InferenceContainers"][0].pop("Image"),
        'ModelDataUrl': autopilot_job["BestCandidate"]["InferenceContainers"][0].pop("ModelDataUrl"),
        'ImageConfig': {'RepositoryAccessMode': 'Platform'},
        'Environment': {"HUGGINGFACE_HUB_CACHE": "/tmp",
                        "TRANSFORMERS_CACHE": "/tmp",
                        "HF_MODEL_ID": "/opt/ml/model"}},
    ExecutionRoleArn=event["AutopilotExecutionRoleArn"])

Next, we create a SageMaker endpoint configuration and deploy a SageMaker endpoint for real-time inference using the best candidate model. We use the instance type ml.g5.12xlarge to deploy the model. You may need to increase your quota to use this instance.

endpoint_name = f"ep-{model_name}-automl"
endpoint_config_name = f"{model_name}-endpoint-config"
endpoint_configuration = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{'VariantName': "Variant-1",
                         'ModelName': model_name,
                         'InstanceType': "ml.g5.12xlarge",
                         'InitialInstanceCount': 1}],
)
response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
endpoint_arn = response["EndpointArn"]
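Once the endpoint is in service, it can be invoked for real-time inference. The following is a minimal sketch; the JSON request format ("inputs" and "parameters") is an assumption based on common text generation containers and may differ for the container Autopilot selects:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Reuse the endpoint_name created above (shown here as a placeholder)
endpoint_name = "ep-<model-name>-automl"

# The instruction-style prompt mirrors the format used for fine-tuning
prompt = ("Below is an instruction that describes a task, paired with an input that provides further context. "
          "Write a response that appropriately completes the request.\n\n### Instruction:\n"
          "What gas do plants absorb during photosynthesis?\n\n### Input:\n")

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": 64, "temperature": 0.1},  # assumed parameter names
    }),
)
print(response["Body"].read().decode("utf-8"))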

Inference pipeline

The inference pipeline is used for batch inference. It demonstrates a way to deploy and evaluate an FM and register it in SageMaker Model Registry. The following diagram shows the workflow starting with a preprocess data step, through model inference, to post-inference evaluation and conditional model registration.

Inference pipeline diagram.

Preprocess data for evaluation

The first crucial step in evaluating the performance of the fine-tuned LLM is to preprocess the data for evaluation. This preprocessing stage involves transforming the data into a format suitable for the evaluation process and verifying the compatibility with the chosen evaluation library.

In this particular case, we use a pipeline step to prepare the data for evaluation. The preprocessing script (preprocess_evaluation.py) creates a .jsonl (JSON Lines) file, which serves as the test dataset for the evaluation phase. The JSON Lines format is a convenient way to store structured data, where each line represents a single JSON object.

This test dataset is crucial for obtaining an unbiased evaluation of the model’s generalization capabilities and its ability to handle new, previously unseen inputs. After the evaluation_dataset.jsonl file is created, it’s saved in the appropriate path in an Amazon Simple Storage Service (Amazon S3) bucket.
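As an illustration, the preprocessing could resemble the following sketch, assuming the validation split has already been given the same input and output columns as the training data; the field names model_input and target_output are assumptions that must match the evaluation configuration, and the actual preprocess_evaluation.py may differ:

import json
import boto3
import pandas as pd

def build_evaluation_dataset(validation_df: pd.DataFrame, bucket: str, key: str) -> str:
    """Write the validation split as JSON Lines and upload it to Amazon S3."""
    lines = [
        json.dumps({
            "model_input": row["input"],     # prompt built from question and support
            "target_output": row["output"],  # ground truth answer
        })
        for _, row in validation_df.iterrows()
    ]
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body="\n".join(lines).encode("utf-8"))
    return f"s3://{bucket}/{key}"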

Evaluate the model using the fmeval library

SageMaker Autopilot streamlines the entire ML workflow, automating steps from data preprocessing to model evaluation. After training multiple models, SageMaker Autopilot automatically ranks them based on selected performance metrics, such as cross-entropy loss for text generation tasks, and identifies the best-performing model.

However, when deeper, more granular insights are required, particularly during post-training evaluation with a testing dataset, we use fmeval, an open source library tailored for fine-tuning and evaluating FMs. Fmeval provides enhanced flexibility and control, allowing for a comprehensive assessment of model performance using custom metrics tailored to the specific use case. This makes sure the model behaves as expected in real-world applications. Fmeval facilitates the evaluation of LLMs across a broad range of tasks, including open-ended text generation, summarization, question answering, and classification. Additionally, fmeval assesses models on metrics such as accuracy, toxicity, semantic robustness, and prompt stereotyping, helping identify the optimal model for diverse use cases while maintaining ethical and robust performance.

To start using the library, follow these steps:

  1. Create a ModelRunner that can perform invocation on your LLM. ModelRunner encapsulates the logic for invoking different types of LLMs, exposing a predict method to simplify interactions with LLMs within the eval algorithm code. For this project, we use SageMakerModelRunner from fmeval.
  2. In the Python file used by our pipeline, create a DataConfig object to use the evaluation_dataset created in the previous step.
  3. Next, use an evaluation algorithm with the custom dataset. For this project, we use the QAAccuracy algorithm, which measures how well the model performs in question answering tasks. The model is queried for a range of facts, and we evaluate the accuracy of its response by comparing the model’s output to target answers under different metrics:
    1. Exact match (EM) – Binary score. 1 if model output and target answer match exactly.
    2. Quasi-exact match – Binary score. Similar to exact match, but both model output and target answer are normalized first by removing articles and punctuation.
    3. Precision over words – The fraction of words in the prediction that are also found in the target answer. The text is normalized as before.
    4. Recall over words – The fraction of words in the target answer that are also found in the prediction.
    5. F1 over words – The harmonic mean of precision and recall over words (normalized).

As an output, the evaluation step produces a file (evaluation_metrics.json) that contains the computed metrics. This file is stored in Amazon S3 and is registered as a property file for later access in the pipeline.
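Putting these pieces together, a minimal evaluation sketch might look like the following; it assumes fmeval’s documented DataConfig, SageMakerModelRunner, and QAAccuracy interfaces (the module paths, request template, and output JMESPath are assumptions that must be adapted to the deployed container):

from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.sm_model_runner import SageMakerModelRunner

endpoint_name = "ep-<model-name>-automl"                                # endpoint deployed by the training pipeline
evaluation_dataset_s3_path = "s3://<bucket>/evaluation_dataset.jsonl"   # .jsonl file from the preprocessing step

# Wrap the deployed endpoint so fmeval can invoke it; content_template and output
# must match the container's request and response schema (assumptions here)
model_runner = SageMakerModelRunner(
    endpoint_name=endpoint_name,
    content_template='{"inputs": $prompt, "parameters": {"max_new_tokens": 64}}',
    output="[0].generated_text",
)

data_config = DataConfig(
    dataset_name="sciq_evaluation",
    dataset_uri=evaluation_dataset_s3_path,
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="model_input",
    target_output_location="target_output",
)

eval_output = QAAccuracy().evaluate(
    model=model_runner,
    dataset_config=data_config,
    prompt_template="$model_input",
    save=True,
)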

Register the model

Before registering the fine-tuned model, we introduce a quality control step by implementing a condition based on the evaluation metrics obtained from the previous step. Specifically, we focus on the F1 score metric, which measures the harmonic mean of precision and recall between the normalized response and reference.

To make sure that only high-performing models are registered and deployed, we set a predetermined threshold for the F1 score metric. If the model’s performance meets or exceeds this threshold, it is suitable for registration and deployment. However, if the model fails to meet the specified threshold, the pipeline concludes without registering the model, stopping the deployment of suboptimal models.
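This threshold check can be expressed with a SageMaker Pipelines condition step. The following is a minimal sketch; the property file name, output name, JSON path, and threshold value are illustrative assumptions rather than the repository’s exact configuration:

from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.properties import PropertyFile

# Property file registered by the evaluation step (names are assumptions)
evaluation_metrics_file = PropertyFile(
    name="EvaluationMetrics",
    output_name="evaluation_metrics",
    path="evaluation_metrics.json",
)

# step_evaluate_autopilot_model and step_register_autopilot_model are the
# evaluation and registration steps defined elsewhere in the pipeline code
f1_score = JsonGet(
    step_name=step_evaluate_autopilot_model.name,
    property_file=evaluation_metrics_file,
    json_path="f1_score",   # JSON path into evaluation_metrics.json (assumption)
)

step_condition = ConditionStep(
    name="CheckF1Threshold",
    conditions=[ConditionGreaterThanOrEqualTo(left=f1_score, right=0.7)],  # example threshold
    if_steps=[step_register_autopilot_model],
    else_steps=[],  # the pipeline ends without registering a low-performing model
)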

Create and run the pipeline

After we define the pipeline steps, we combine them into a SageMaker pipeline. The steps are run sequentially. The pipeline runs the steps for an AutoML job, using SageMaker Autopilot for training, model evaluation, and model registration. See the following code:

pipeline = Pipeline(
    name="training-pipeline",
    parameters=[evaluation_dataset_s3_path,
                model_name,
                metrics_report_s3_path,
                output_s3_path,
                model_package_name,
                model_approval_status],
    steps=[step_preprocess_evaluation_data,
           step_evaluate_autopilot_model,
           step_condition,
           step_register_autopilot_model],
    sagemaker_session=sagemaker_session,
)
pipeline.upsert(role_arn=SAGEMAKER_EXECUTION_ROLE_ARN)
pipeline_execution = pipeline.start()
pipeline_execution.wait(delay=20, max_attempts=24 * 60 * 3)  # max wait: 24 hours

Clean up

To avoid unnecessary charges and maintain a clean environment after running the demos outlined in this post, it’s important to delete all deployed resources. Follow these steps to properly clean up:

  1. To delete deployed endpoints, use the SageMaker console or the AWS SDK. This step is essential because endpoints can accrue significant charges if left running.
  2. Delete both SageMaker pipelines created during this walkthrough. This will help prevent residual executions that might generate additional costs.
  3. Remove all artifacts stored in your S3 buckets that were used for training, storing model artifacts, or logging. Make sure you delete only the resources related to this project to help avoid data loss.
  4. Clean up any additional resources. Depending on your implementation and any additional configurations, there may be other resources to consider, such as IAM roles, Amazon CloudWatch logs, or other AWS services. Identify and delete any resources that are no longer needed.

Conclusion

In this post, we explored how AutoMLV2 streamlines the process of fine-tuning FMs by automating the heavy lifting involved in model development. We demonstrated an end-to-end solution that uses SageMaker Pipelines to orchestrate the steps of data preparation, model training, evaluation, and deployment. The fmeval library played a crucial role in assessing the fine-tuned LLM’s performance, enabling us to select the best-performing model based on relevant metrics. By seamlessly integrating with the SageMaker infrastructure, AutoMLV2 simplified the deployment process, allowing us to create a SageMaker endpoint for real-time inference with just a few lines of code.

Get started by accessing the code on the GitHub repo to train and deploy your own custom AutoML models.

For more information on SageMaker Pipelines and SageMaker Autopilot, refer to Amazon SageMaker Pipelines and SageMaker Autopilot, respectively.


About the Author

Hajer Mkacher is a Solutions Architect at AWS, specializing in the Healthcare and Life Sciences industries. With over a decade in software engineering, she leverages generative AI to create innovative solutions, acting as a trusted advisor to her customers. In her free time, Hajer enjoys painting or working on creative robotics projects with her family.

Read More

Efficiency Meets Personalization: How AI Agents Improve Customer Service

Efficiency Meets Personalization: How AI Agents Improve Customer Service

Editor’s note: This post is the first in the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copilots. The series will also highlight the NVIDIA software and hardware powering advanced AI agents, which form the foundation of AI query engines that gather insights and perform tasks to transform everyday experiences and reshape industries.

Whether it’s getting a complex service claim resolved or having a simple purchase inquiry answered, customers expect timely, accurate responses to their requests.

AI agents can help organizations meet this need. And they can grow in scope and scale as businesses grow, helping keep customers from taking their business elsewhere.

AI agents can be used as virtual assistants, which use artificial intelligence and natural language processing to handle high volumes of customer service requests. By automating routine tasks, AI agents ease the workload on human agents, allowing them to focus on tasks requiring a more personal touch.

AI-powered customer service tools like chatbots have become table stakes across every industry looking to increase efficiency and keep buyers happy. According to a recent IDC study on conversational AI, 41% of organizations use AI-powered copilots for customer service and 60% have implemented them for IT help desks.

Now, many of those same industries are looking to adopt agentic AI, semi-autonomous tools that have the ability to perceive, reason and act on more complex problems.

How AI Agents Enhance Customer Service

A primary value of AI-powered systems is the time they free up by automating routine tasks. AI agents can perform specific tasks, or agentic operations, essentially becoming part of an organization’s workforce — working alongside humans who can focus on more complex customer issues.

AI agents can handle predictive tasks and problem-solve, can be trained to understand industry-specific terms and can pull relevant information from an organization’s knowledge bases, wherever that data resides.

With AI agents, companies can:

  • Boost efficiency: AI agents handle common questions and repetitive tasks, allowing support teams to prioritize more complicated cases. This is especially useful during high-demand periods.
  • Increase customer satisfaction: Faster, more personalized interactions result in happier and more loyal customers. Consistent and accurate support improves customer sentiment and experience.
  • Scale easily: Equipped to handle high volumes of customer support requests, AI agents scale effortlessly with growing businesses, reducing customer wait times and resolving issues faster.

AI Agents for Customer Service Across Industries

AI agents are transforming customer service across sectors, helping companies enhance customer conversations, achieve high-resolution rates and improve human representative productivity.

For instance, ServiceNow recently introduced IT and customer service management AI agents to boost productivity by autonomously solving many employee and customer issues. Its agents can understand context, create step-by-step resolutions and get live agent approvals when needed.

To improve patient care and reduce preprocedure anxiety, The Ottawa Hospital is using AI agents that have consistent, accurate and continuous access to information. The agent has the potential to improve patient care and reduce administrative tasks for doctors and nurses.

The city of Amarillo, Texas, uses a multilingual digital assistant named Emma to provide its residents with 24/7 support. Emma brings more effective and efficient disbursement of important information to all residents, including the one-quarter who don’t speak English.

AI agents meet current customer service demands while preparing organizations for the future.

Key Steps for Designing AI Virtual Assistants for Customer Support

AI agents for customer service come in a wide range of designs, from simple text-based virtual assistants that resolve customer issues, to animated avatars that can provide a more human-like experience.

Digital human interfaces can add warmth and personality to the customer experience. These agents respond with spoken language and even animated avatars, enhancing service interactions with a touch of real-world flair. A digital human interface lets companies customize the assistant’s appearance and tone, aligning it with the brand’s identity.

There are three key building blocks to creating an effective AI agent for customer service:

  • Collect and organize customer data: AI agents need a solid base of customer data (such as profiles, past interactions, and transaction histories) to provide accurate, context-aware responses.
  • Use memory functions for personalization: Advanced AI systems remember past interactions, allowing agents to deliver personalized support that feels human.
  • Build an operations pipeline: Customer service teams should regularly review feedback and update the AI agent’s responses to ensure it’s always improving and aligned with business goals.

Powering AI Agents With NVIDIA NIM Microservices

NVIDIA NIM microservices power AI agents by enabling natural language processing, contextual retrieval and multilingual communication. This allows AI agents to deliver fast, personalized and accurate support tailored to diverse customer needs.

Key NVIDIA NIM microservices for customer service agents include:

NVIDIA NIM for Large Language Models — Microservices that bring advanced language models to applications and enable complex reasoning, so AI agents can understand complicated customer queries.

NVIDIA NeMo Retriever NIM — Embedding and reranking microservices that support retrieval-augmented generation pipelines allow virtual assistants to quickly access enterprise knowledge bases and boost retrieval performance by ranking relevant knowledge-base articles and improving context accuracy.

NVIDIA NIM for Digital Humans — Microservices that enable intelligent, interactive avatars to understand speech and respond in a natural way. NVIDIA Riva NIM microservices for text-to-speech, automatic speech recognition (ASR), and translation services enable AI agents to communicate naturally across languages. The recently released Riva NIM microservices for ASR enable additional multilingual enhancements. To build realistic avatars, Audio2Face NIM converts streamed audio to facial movements for real-time lip syncing. 2D and 3D Audio2Face NIM microservices support varying use cases.

Getting Started With AI Agents for Customer Service

NVIDIA AI Blueprints make it easy to start building and setting up virtual assistants by offering ready-made workflows and tools to accelerate deployment. Whether for a simple AI-powered chatbot or a fully animated digital human interface, the blueprints offer resources to create AI assistants that are scalable, aligned with an organization’s brand and deliver a responsive, efficient customer support experience.

Editor’s note: IDC figures are sourced to IDC, Market Analysis Perspective: Worldwide Conversational AI Tools and Technologies, 2024 (doc #US51619524), September 2024.

Read More

Into the Omniverse: How Generative AI Fuels Personalized, Brand-Accurate Visuals With OpenUSD

Into the Omniverse: How Generative AI Fuels Personalized, Brand-Accurate Visuals With OpenUSD

Editor’s note: This post is part of Into the Omniverse, a blog series focused on how developers, 3D artists and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

3D product configurators are changing the way industries like retail and automotive engage with customers by offering interactive, customizable 3D visualizations of products.

Using physically accurate product digital twins, even non-3D artists can streamline content creation and generate stunning marketing visuals.

With the new NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI, developers can start using the NVIDIA Omniverse platform and Universal Scene Description (OpenUSD) to easily build personalized, on-brand and product-accurate marketing content at scale.

By integrating generative AI into product configurators, developers can optimize operations and reduce production costs. With repetitive tasks automated, teams can focus on the creative aspects of their jobs.

Developing Controllable Generative AI for Content Production

The new Omniverse Blueprint introduces a robust framework for integrating generative AI into 3D workflows to enable precise and controlled asset creation.

Example images created using the NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI.

Key highlights of the blueprint include:

  • Model conditioning to ensure that the AI-generated visuals adhere to specific brand requirements like colors and logos.
  • Multimodal approach that combines 3D and 2D techniques to offer developers complete control over final visual outputs while ensuring the product’s digital twin remains accurate.
  • Key components such as an on-brand hero asset, a simple and untextured 3D scene, and a customizable application built with the Omniverse Kit App Template.
  • OpenUSD integration to enhance development of 3D visuals with precise visual generative AI.
  • Integration of NVIDIA NIM, such as the Edify 360 NIM, Edify 3D NIM, USD Code NIM and USD Search NIM microservices, allows the blueprint to be extensible and customizable. The microservices are available to preview on build.nvidia.com.

How Developers Are Building AI-Enabled Content Pipelines

Katana Studio developed a content creation tool with OpenUSD called COATcreate that empowers marketing teams to rapidly produce 3D content for automotive advertising. By using 3D data prepared by creative experts and vetted by product specialists in OpenUSD, even users with limited artistic experience can quickly create customized, high-fidelity, on-brand content for any region or use case without adding to production costs.

Global marketing leader WPP has built a generative AI content engine for brand advertising with OpenUSD. The Omniverse Blueprint for precise visual generative AI helped facilitate the integration of controllable generative AI in its content creation tools. Leading global brands like The Coca-Cola Company are already beginning to adopt tools from WPP to accelerate iteration on its creative campaigns at scale.

Watch the replay of a recent livestream with WPP for more on its generative AI- and OpenUSD-enabled workflow.

The NVIDIA creative team developed a reference workflow called CineBuilder on Omniverse that allows companies to use text prompts to generate ads personalized to consumers based on region, weather, time of day, lifestyle and aesthetic preferences.

Developers at independent software vendors and production services agencies are building content creation solutions infused with controllable generative AI and built on OpenUSD. Accenture Song, Collective World, Grip, Monks and WPP are among those adopting Omniverse Blueprints to accelerate development.

Read the tech blog on developing product configurators with OpenUSD and get started developing solutions using the DENZA N7 3D configurator and CineBuilder reference workflow.

Get Plugged Into the World of OpenUSD

Various resources are available to help developers get started building AI-enabled product configuration solutions.

For more on optimizing OpenUSD workflows, explore the new Learn OpenUSD training curriculum that includes free Deep Learning Institute courses for 3D practitioners and developers. For more resources on OpenUSD, explore the Alliance for OpenUSD forum and visit the AOUSD website.

Don’t miss the CES keynote delivered by NVIDIA founder and CEO Jensen Huang live in Las Vegas on Monday, Jan. 6, at 6:30 p.m. PT for more on the future of AI and graphics.

Stay up to date by subscribing to NVIDIA news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

Read More

First ‘Star Wars Outlaws’ Story Pack Hits GeForce NOW

First ‘Star Wars Outlaws’ Story Pack Hits GeForce NOW

Get ready to dive deeper into the criminal underworld of a galaxy far, far away as GeForce NOW brings the first major story pack for Star Wars Outlaws to the cloud this week.

The season of giving continues — GeForce NOW members can access a new free reward: a special in-game Star Wars Outlaws enhancement.

It’s all part of an exciting GFN Thursday, topped with five new games joining the more than 2,000 titles supported in the GeForce NOW library, including the launch of S.T.A.L.K.E.R. 2: Heart of Chornobyl and Xbox Game Studios fan favorites Fallout 3: Game of the Year Edition and The Elder Scrolls IV: Oblivion.

And make sure not to pass this opportunity up — gamers who want to take the Performance and Ultimate memberships for a spin can do so with 25% off Day Passes, now through Friday, Nov. 22. Day Passes give access to 24 continuous hours of powerful cloud gaming.

A New Saga Begins

The galaxy’s most electrifying escapade gets even more exciting with the new Wild Card story pack for Star Wars Outlaws.

This thrilling story pack invites scoundrels to join forces with the galaxy’s smoothest operator, Lando Calrissian, for a high-stakes Sabacc tournament that’ll keep players on the edge of their seats. As Kay Vess, gamers bluff, charm and blast their way through new challenges, exploring uncharted corners of the Star Wars galaxy. Meanwhile, a free update will scatter fresh Contract missions across the stars, offering members ample opportunities to build their reputations and line their pockets with credits.

To kick off this thrilling underworld adventure, GeForce NOW members are in for a special reward with the Forest Commando Character Pack.

Star Wars Outlaws Wild Card DLC on GeForce NOW
Time to get wild.

The pack gives Kay and Nix, her loyal companion, a complete set of gear that’s perfect for missions in lush forest worlds. Get equipped with tactical trousers, a Bantha leather belt loaded with attachments, a covert poncho to shield against jungle rain and a hood for Nix that’s great for concealment in thick forests.

Members of the GeForce NOW rewards program can check their email for instructions on how to claim the reward. Ultimate and Performance members can start redeeming style packages today. Don’t miss out — this offer is available through Saturday, Dec. 21, on a first-come, first-served basis.

Welcome to the Zone

STALKER 2 on GeForce NOW
Welcome to the zone.

S.T.A.L.K.E.R. 2: Heart of Chornobyl, the highly anticipated sequel in the cult-classic S.T.A.L.K.E.R. series, is a first-person-shooter survival-horror game set in the Chornobyl Exclusion Zone.

In the game — which blends postapocalyptic fiction with Ukrainian folklore and the eerie reality of the Chornobyl disaster — players can explore a vast open world filled with mutated creatures, anomalies and other stalkers while uncovering the zone’s secrets and battling for survival.

The title features advanced graphics and physics powered by Unreal Engine 5 for stunningly realistic and detailed environments. Players’ choices impact the game world and narrative, which comprises a nonlinear storyline with multiple possible endings.

Players will take on challenging survival mechanics to test their skills and decision-making abilities. Members can make their own epic story with a Performance membership for enhanced GeForce RTX-powered streaming at 1440p or an Ultimate membership for up to 4K 120 frames per second streaming, offering the crispest visuals and smoothest gameplay.

Adventures Await

Fallout 3 GOTY on GeForce NOW
Vault 101 has opened.

Members can emerge from Vault 101 into the irradiated ruins of Washington, D.C., in Fallout 3: Game of the Year Edition, which includes all five downloadable content packs released for Fallout 3. Experience the game that redefined the postapocalyptic genre with its morally ambiguous choices, memorable characters and the innovative V.A.T.S. combat system. Whether revisiting the Capital Wasteland, exploring the Mojave Desert or delving into the realm of Cyrodiil, these iconic titles have never looked or played better thanks to the power of GeForce NOW’s cloud streaming technology.

Members can look for the following games available to stream in the cloud this week:

  • Towers of Aghasba (New release on Steam, Nov. 19)
  • S.T.A.L.K.E.R. 2: Heart of Chornobyl (New release on Steam and Xbox, available on PC Game Pass, Nov. 20)
  • Star Wars Outlaws (New release on Steam, Nov. 21)
  • The Elder Scrolls IV: Oblivion Game of the Year Edition (Epic Games Store, Steam and Xbox, available on PC Game Pass)
  • Fallout 3: Game of the Year Edition (Epic Games Store, Steam and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Rebellions Joins the PyTorch Foundation as a General Member

Rebellions logo

The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Rebellions has joined as a general member.

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’.

“We’re thrilled to welcome Rebellions as a new general member of the PyTorch Foundation,” said Matt White, Executive Director of the PyTorch Foundation. “Rebellions brings a unique perspective to the PyTorch ecosystem with their focus on advancing the integration of NPU architectures for AI acceleration with PyTorch. Their expertise will play a vital role in ensuring PyTorch continues to evolve as a versatile framework, accommodating the diverse needs of modern AI workloads. We look forward to collaborating with Rebellions to drive innovation and strengthen the PyTorch ecosystem for developers worldwide.”

Rebellions has introduced native support for PyTorch 2.0 in their RBLN SDK. This integration includes compatibility with torch.compile, a pivotal feature of PyTorch 2.0 that enhances model performance. Through this development, Rebellions has empowered developers to seamlessly harness the full potential of their AI accelerator lineup within the PyTorch environment.
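As a rough illustration of what torch.compile integration looks like from a developer’s perspective, the following minimal sketch compiles a small model with standard PyTorch; vendor hardware such as Rebellions’ NPUs plugs in through a custom compile backend, whose exact name comes from the RBLN SDK and is deliberately not shown here:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

# torch.compile is the PyTorch 2.0 entry point; with no backend argument the default
# inductor backend is used, and hardware vendors can register their own backends
compiled_model = torch.compile(model)

x = torch.randn(8, 16)
with torch.no_grad():
    out = compiled_model(x)
print(out.shape)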

Rebellions is also deeply committed to advancing the PyTorch ecosystem through collaborative innovation starting in Korea. The company has established a Special Interest Group (SIG) focusing on PyTorch Core within the PyTorch Korea community and is actively working with volunteers recruited through MODULABS, an open research institute, to integrate native support for the deep learning framework into their Neural Processing Unit (NPU).

In addition, Rebellions is collaborating with academic institutions, such as Yonsei University, Hanyang University, University of Science & Technology (UST) and national agencies, such as the Electronics and Telecommunications Research Institute (ETRI), to offer undergraduate and graduate courses on PyTorch and enable them to leverage PyTorch as their research platform.

These initiatives highlight Rebellions’ dedication to optimizing the PyTorch experience for developers and researchers alike, while also fostering education and innovation in the field.

“By integrating our hardware innovations with PyTorch, we’re building Native NPU support to accelerate diverse AI workloads.” said Hong-seok Kim, the Chief Software Architect at Rebellions. “We’re excited to contribute to the PyTorch community by community-driven initiatives and partnerships, advancing NPU architecture support for next-generation AI solutions. Together with the PyTorch community, we aim to pioneer new possibilities in AI acceleration and empower developers worldwide with efficient computing solutions.”

To learn more about how your organization can be a part of the PyTorch Foundation, visit our website.

About Rebellions

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’ incorporating a scalable chiplet architecture and high-bandwidth memory.

About PyTorch Foundation

The PyTorch Foundation is a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by its members and leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members and contributors to enable community discussions and collaboration.

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

Read More