Using Amazon Q Business with AWS HealthScribe to gain insights from patient consultations

The advent of generative AI and machine learning has opened new opportunities for enhancement across industries and processes. During re:Invent 2023, we launched AWS HealthScribe, a HIPAA-eligible service that empowers healthcare software vendors to build clinical applications that use speech recognition and generative AI to automatically create preliminary clinician documentation. In addition to AWS HealthScribe, we also launched Amazon Q Business, a generative AI-powered assistant that can perform functions such as answering questions, providing summaries, generating content, and securely completing tasks based on the data and information in your enterprise systems.

AWS HealthScribe combines speech recognition and generative AI trained specifically for healthcare documentation to accelerate clinical documentation and enhance the consultation experience.

Key features of AWS HealthScribe include:

  • Rich consultation transcripts with word-level timestamps.
  • Speaker role identification (clinician or patient).
  • Transcript segmentation into relevant sections such as subjective, objective, assessment, and plan.
  • Summarized clinical notes for sections such as chief complaint, history of present illness, assessment, and plan.
  • Evidence mapping that references the original transcript for each sentence in the AI-generated notes.
  • Extraction of structured medical terms for entries such as conditions, medications, and treatments.

AWS HealthScribe provides a suite of AI-powered features to streamline clinical documentation while maintaining security and privacy. It doesn’t retain audio or output text, and users have control over data storage with encryption in transit and at rest.

With Amazon Q Business, we provide a new generative AI-powered assistant designed specifically for business and workplace use cases. It can be customized and integrated with an organization’s data, systems, and repositories. Amazon Q allows users to have conversations, solve problems, generate content, gain insights, and take actions through its AI capabilities. Amazon Q offers user-based pricing plans tailored to how the product is used. It can adapt interactions based on individual user identities, roles, and permissions within the organization. Importantly, AWS never uses customer content from Amazon Q to train its underlying AI models, so company information remains private and secure.

In this blog post, we’ll show you how AWS HealthScribe and Amazon Q Business together analyze patient consultations to provide summaries and trends from clinician conversations, simplifying documentation workflows. Automating the analysis of clinician-patient interactions with AWS HealthScribe and Amazon Q can help improve patient outcomes by enhancing communication, leading to more personalized care for patients and increased efficiency for clinicians.

Benefits and use cases

Gaining insight from patient-clinician interactions alongside a chatbot can help in a variety of ways such as:

  1. Enhanced communication: By analyzing consultations, clinicians using AWS HealthScribe can more readily identify patterns and trends in large patient datasets, which can help improve communication between clinicians and patients. An example would be a clinician understanding common trends in their patients’ symptoms that they can then consider for new consultations.
  2. Personalized care: Using machine learning, clinicians can tailor their care to individual patients by analyzing the specific needs and concerns of each patient. This can lead to more personalized and effective care.
  3. Streamlined workflows: Clinicians can use machine learning to help streamline their workflows by automating tasks such as appointment scheduling and consultation summarization. This can give clinicians more time to focus on providing high-quality care to their patients. An example would be using clinician summaries together with agentic workflows to perform these tasks on a routine basis.

Architecture diagram

Architecture diagram of the workflow which includes AWS IAM Identity Center, Amazon Q Business, Amazon Simple Storage Service, and AWS HealthScribe

In the architecture diagram we present for this demo, two user workflows are shown. To kick off the process, a clinician uploads the recording of a consultation to Amazon Simple Storage Service (Amazon S3). AWS HealthScribe then ingests this audio file and analyzes the consultation conversation. AWS HealthScribe outputs two files, which are also stored in Amazon S3. In the second workflow, an authenticated user logs in through AWS IAM Identity Center to an Amazon Q web front end hosted by Amazon Q Business. In this scenario, Amazon Q Business is given the output S3 bucket as the data source for use in its web app.
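As a minimal sketch of the first workflow step, the following Python snippet uploads a consultation recording to Amazon S3 with the AWS SDK for Python (Boto3). The bucket and object names are placeholders for illustration.

import boto3

# Placeholder bucket and object names for illustration
INPUT_BUCKET = "my-healthscribe-input-bucket"
AUDIO_KEY = "consultations/fatigue-consult.mp3"

s3 = boto3.client("s3")

# Upload the local consultation recording so AWS HealthScribe can read it from Amazon S3
s3.upload_file(
    Filename="fatigue-consult.mp3",
    Bucket=INPUT_BUCKET,
    Key=AUDIO_KEY,
)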

Prerequisites

Implementation

To get started with AWS HealthScribe, you must first start a transcription job that takes a source audio file and outputs summary and transcription JSON files with the analyzed conversation. You’ll then connect these output files to Amazon Q.

Creating the AWS HealthScribe job

  1. In the AWS HealthScribe console, choose Transcription jobs in the navigation pane, and then choose Create job to get started.
  2. Enter a name for the job (in this example, we use FatigueConsult) and select the S3 bucket where the audio file of the clinician-patient conversation is stored.
  3. Next, use the S3 URI search field to point the transcription job to the S3 bucket where you want the output files to be saved. Maintain the default options for audio settings, customization, and content removal.
  4. Create a new AWS Identity and Access Management (IAM) role for AWS HealthScribe to use for access to the S3 input and output buckets by choosing Create an IAM role. In our example, we entered HealthScribeRole as the role name. To complete the job creation, choose Create job.
  5. The job takes a few minutes to finish. When it’s complete, the status changes from In Progress to Complete, and you can inspect the results by selecting the job name.
  6. AWS HealthScribe creates two files: a word-for-word transcript of the conversation with the suffix /transcript.json and a summary of the conversation with the suffix /summary.json. This summary uses the underlying power of generative AI to highlight key topics in the conversation, extract medical terminology, and more.
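The console steps above can also be scripted. The following sketch starts an AWS HealthScribe job through the Amazon Transcribe API in Boto3; the job name, S3 locations, and IAM role ARN are placeholder assumptions, and the Settings shown enable speaker partitioning for a two-speaker consultation.

import boto3

transcribe = boto3.client("transcribe")

# All names, URIs, and ARNs below are placeholders for illustration
transcribe.start_medical_scribe_job(
    MedicalScribeJobName="FatigueConsult",
    Media={"MediaFileUri": "s3://my-healthscribe-input-bucket/consultations/fatigue-consult.mp3"},
    OutputBucketName="my-healthscribe-output-bucket",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/HealthScribeRole",
    Settings={
        "ShowSpeakerLabels": True,  # identify clinician and patient turns
        "MaxSpeakerLabels": 2,
    },
)

# Check the job status; transcript.json and summary.json are written to the output bucket when complete
status = transcribe.get_medical_scribe_job(MedicalScribeJobName="FatigueConsult")
print(status["MedicalScribeJob"]["MedicalScribeJobStatus"])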

In this workflow, AWS HealthScribe analyzes the patient-clinician conversation audio to:

  1. Transcribe the consultation
  2. Identify speaker roles (for example, clinician and patient)
  3. Segment the transcript (for example, small talk, visit flow management, assessment, and treatment plan)
  4. Extract medical terms (for example, medication name and medical condition name)
  5. Summarize notes for key sections of the clinical document (for example, history of present illness and treatment plan)
  6. Create evidence mapping (linking every sentence in the AI-generated note with corresponding transcript dialogues).
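To give a sense of what the generated notes look like, the following sketch reads a summary.json file from Amazon S3 and prints the summarized text for each clinical section. The bucket, key, and JSON field names (ClinicalDocumentation, Sections, SectionName, SummarizedSegment) are assumptions based on typical AWS HealthScribe output and should be validated against your own job results.

import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key for an AWS HealthScribe summary file
obj = s3.get_object(
    Bucket="my-healthscribe-output-bucket",
    Key="FatigueConsult/summary.json",
)
summary = json.loads(obj["Body"].read())

# Assumed structure: each section (for example, CHIEF_COMPLAINT) carries summarized segments
for section in summary["ClinicalDocumentation"]["Sections"]:
    print(f"== {section['SectionName']} ==")
    for item in section.get("Summary", []):
        print(item.get("SummarizedSegment", ""))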

Connecting an AWS HealthScribe job to Amazon Q

To use Amazon Q with the summarized notes and transcripts from AWS HealthScribe, we need to first create an Amazon Q Business application and set the data source to the S3 bucket where the output files from the AWS HealthScribe jobs workflow were stored. This allows Amazon Q to index the files and give users the ability to ask questions of the data.

  1. In the Amazon Q Business console, choose Get Started, then choose Create Application.
  2. Enter a name for your application and select Create and use a new service-linked role (SLR).
  3. Choose Create when you’re ready to select a data source.
  4. In the Add data source pane, select Amazon S3.
  5. To configure the S3 bucket with Amazon Q, enter a name for the data source. In our example, we use my-s3-bucket.
  6. Next, locate the S3 bucket with the JSON outputs from AWS HealthScribe using the Browse S3 button. Select Full sync for the sync mode and select a cadence of your preference. After you complete these steps, Amazon Q Business runs a full sync of the objects in your S3 bucket and is ready for use.
  7. In the main applications dashboard, navigate to the URL under Web experience URL. This is how you access the Amazon Q web front end to interact with the assistant.

After a user signs in to the web experience, they can start asking questions directly in the chat box, as shown in the sample frontend workflow that follows.

Sample frontend workflow

With the AWS HealthScribe results integrated into Amazon Q Business, users can go to the web experience to gain insights from their patient conversations. For example, you can use Amazon Q to identify trends in patient symptoms, check which medications patients are taking, and more, as shown in the following figures.

The workflow starts with a question and answer about issues patients had, as shown in the following figure, where a clinician asks what the symptoms were of patients who complained of stomach pain. Amazon Q responds with common symptoms, like bloating and bowel problems, from the data it has access to. The generated answers cite the source files from Amazon S3 that led to the summary and can be inspected by choosing Sources.

In the following example, a clinician asks what medications patients with knee pain are taking. Using our sample data of various consultations for knee pain, Amazon Q tells us patients are taking over-the-counter ibuprofen, but that it often doesn’t provide relief.

This application can also help clinicians understand common trends in their patient data, such as asking what the common symptoms are for patients with chest pain.

In the final example for this post, a clinician asks Amazon Q whether there are common symptoms for patients complaining of knee and elbow pain. Amazon Q responds that both sets of patients describe their pain being exacerbated by movement, but that it cannot conclusively point to any common symptoms across both consultation types. In this case, Amazon Q is correctly using source data to prevent a hallucination from occurring.
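The same questions can also be asked programmatically. The sketch below calls the Amazon Q Business ChatSync API with Boto3; the application ID is a placeholder, and depending on how your application handles identity (for example, IAM Identity Center), the call must be made with credentials tied to an authorized user and may require additional parameters.

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder application ID; call this with credentials of an authorized user
response = qbusiness.chat_sync(
    applicationId="11111111-2222-3333-4444-555555555555",
    userMessage="What symptoms did patients who complained of stomach pain report?",
)

print(response["systemMessage"])
# Each answer cites the Amazon S3 source documents it was grounded on
for source in response.get("sourceAttributions", []):
    print("Source:", source.get("title"), source.get("url"))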

Considerations

The UI for Amazon Q has limited customization. At the time of writing this post, the Amazon Q frontend cannot be embedded in other tools. Supported customization of the web experience includes the addition of a title and subtitle, adding a welcome message, and displaying sample prompts. For updates on web experience customizations, see Customizing an Amazon Q Business web experience. If this kind of customization is critical to your application and business needs, you can explore custom large language model chatbot designs using Amazon Bedrock or Amazon SageMaker.

AWS HealthScribe uses conversational and generative AI to transcribe patient-clinician conversations and generate clinical notes. The results produced by AWS HealthScribe are probabilistic and might not always be accurate because of various factors, including audio quality, background noise, speaker clarity, the complexity of medical terminology, and context-specific language nuances. AWS HealthScribe is designed to be used in an assistive role for clinicians and medical scribes rather than as a substitute for their clinical expertise. As such, AWS HealthScribe output should not be employed to fully automate clinical documentation workflows, but rather to provide additional assistance to clinicians or medical scribes in their documentation process. Make sure that your application provides a workflow for reviewing the clinical notes produced by AWS HealthScribe and sets the expectation that human review is needed before clinical notes are finalized.

Amazon Q Business uses machine learning models that generate predictions based on patterns in data, and generate insights and recommendations from your content. Outputs are probabilistic and should be evaluated for accuracy as appropriate for your use case, including by employing human review of the output. You and your users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of these features.

This proof of concept can also be extrapolated to create a patient-facing application, where a patient can review their own conversations with physicians and be given access to their medical records and consultation notes in a way that makes it easy for them to ask questions about the trends and data in their own medical history.

At the time of writing, AWS HealthScribe is only available for US English and in the US East (N. Virginia) Region. Amazon Q Business is only available in US East (N. Virginia) and US West (Oregon).

Clean up

To ensure that you don’t continue to accrue charges from this solution, you must complete the following clean-up steps.

AWS HealthScribe

Navigate to the AWS HealthScribe console and choose Transcription jobs. Select whichever HealthScribe jobs you want to clean up and choose Delete at the top right corner of the console page.

Amazon S3

To clean up your Amazon S3 resources, navigate to the Amazon S3 console and choose the buckets that you used or created while going through this post. To empty the buckets, follow the instructions for Emptying a bucket. After you empty a bucket, you can delete the entire bucket.

Amazon Q Business

To delete your Amazon Q Business application, follow the instructions on Managing Amazon Q Business applications.

Conclusion

In this post, we discussed how you can use AWS HealthScribe with Amazon Q Business to create a chatbot to quickly gain insights into patient-clinician conversations. To learn more, reach out to your AWS account team or check out the links that follow.


About the Authors

Laura Salinas is a Startup Solution Architect supporting customers whose core business involves machine learning. She is passionate about guiding her customers on their cloud journey and finding solutions that help them innovate. Outside of work she loves boxing, watching the latest movie at the theater and playing competitive dodgeball.

Tiffany Chen is a Solutions Architect on the CSC team at AWS. She has supported AWS customers with their deployment workloads and currently works with Enterprise customers to build well-architected and cost-optimized solutions. In her spare time, she enjoys traveling, gardening, baking, and watching basketball.

Art Tuazon is a Partner Solutions Architect focused on enabling AWS Partners through technical best practices and is passionate about helping customers build on AWS. In her free time, she enjoys running and cooking.

Winnie Chen is a Solutions Architect currently on the CSC team at AWS supporting greenfield customers. She supports customers of all industries as well as sizes such as enterprise and small to medium businesses. She has helped customers migrate and build their infrastructure on AWS. In her free time, she enjoys traveling and spending time outdoors through activities like hiking, biking and rock climbing.

Use Amazon SageMaker Studio with a custom file system in Amazon EFS

Amazon SageMaker Studio is the latest web-based experience for running end-to-end machine learning (ML) workflows. SageMaker Studio offers a suite of integrated development environments (IDEs), which includes JupyterLab, Code Editor, and RStudio. Data scientists and ML engineers can spin up SageMaker Studio private and shared spaces, which manage the storage and resource needs of the JupyterLab and Code Editor applications, allow stopping the applications when not in use to save on compute costs, and let users resume their work from where they stopped.

The storage resources for SageMaker Studio spaces are Amazon Elastic Block Store (Amazon EBS) volumes, which offer low-latency access to user data like notebooks, sample data, or Python/Conda virtual environments. However, there are several scenarios where using a distributed file system shared across private JupyterLab and Code Editor spaces is convenient, which is enabled by configuring an Amazon Elastic File System (Amazon EFS) file system in SageMaker Studio. Amazon EFS provides a scalable fully managed elastic NFS file system for AWS compute instances.

Amazon SageMaker supports automatically mounting a folder in an EFS volume for each user in a domain. Using this folder, users can share data between their own private spaces. However, users can’t share data with other users in the domain; they only have access to their own folder user-default-efs in the $HOME directory of the SageMaker Studio application.

In this post, we explore three distinct scenarios that demonstrate the versatility of integrating custom Amazon EFS with SageMaker Studio.

For further information on configuring Amazon EFS in SageMaker Studio, refer to Attaching a custom file system to a domain or user profile.

Solution overview

In the first scenario, an AWS infrastructure admin wants to set up an EFS file system that can be shared across the private spaces of a given user profile in SageMaker Studio. This means that each user within the domain will have their own private space on the EFS file system, allowing them to store and access their own data and files. The automation described in this post enables new team members joining the data science team to quickly set up their private space on the EFS file system and access the necessary resources to start contributing to the ongoing project.

The following diagram illustrates this architecture.

First scenario architecture

This scenario offers the following benefits:

  • Individual data storage and analysis – Users can store their personal datasets, models, and other files in their private spaces, allowing them to work on their own projects independently. Data is segregated by user profile.
  • Centralized data management – The administrator can manage the EFS file system centrally, maintaining data security, backup, and direct access for all users. By setting up an EFS file system with a private space, users can effortlessly track and maintain their work.
  • Cross-instance file sharing – Users can access their files from multiple SageMaker Studio spaces, because the EFS file system provides a persistent storage solution.

The second scenario involves creating a single EFS directory that is shared across all the spaces of a given SageMaker Studio domain. This means that all users within the domain can access and use the same shared directory on the EFS file system, allowing for better collaboration and centralized data management (for example, to share common artifacts). This is a more generic use case, because there is no specific segregated folder for each user profile.

The following diagram illustrates this architecture.

Second scenario architecture

This scenario offers the following benefits:

  • Shared project directories – Suppose the data science team is working on a large-scale project that requires collaboration among multiple team members. By setting up a shared EFS directory at project level, the team can collaborate on the same projects by accessing and working on files in the shared directory. The data science team can, for example, use the shared EFS directory to store their Jupyter notebooks, analysis scripts, and other project-related files.
  • Simplified file management – Users don’t need to manage their own private file storage, because they can rely on the shared directory for their file-related needs.
  • Improved data governance and security – The shared EFS directory, being centrally managed by the AWS infrastructure admin, can provide improved data governance and security. The admin can implement access controls and other data management policies to maintain the integrity and security of the shared resources.

The third scenario explores the configuration of an EFS file system that can be shared across multiple SageMaker Studio domains within the same VPC. This allows users from different domains to access and work with the same set of files and data, enabling cross-domain collaboration and centralized data management.

The following diagram illustrates this architecture.

Third scenario architecture

This scenario offers the following benefits:

  • Enterprise-level data science collaboration – Imagine a large organization with multiple data science teams working on various projects across different departments or business units. By setting up a shared EFS file system accessible across the organization’s SageMaker Studio domains, these teams can collaborate on cross-functional projects, share artifacts, and use a centralized data repository for their work.
  • Shared infrastructure and resources – The EFS file system can be used as a shared resource across multiple SageMaker Studio domains, promoting efficiency and cost-effectiveness.
  • Scalable data storage – As the number of users or domains increases, the EFS file system automatically scales to accommodate the growing storage and access requirements.
  • Data governance – The shared EFS file system, being managed centrally, can be subject to stricter data governance policies, access controls, and compliance requirements. This can help the organization meet regulatory and security standards while still enabling cross-domain collaboration and data sharing.

Prerequisites

This post provides an AWS CloudFormation template to deploy the main resources for the solution. In addition to this, the solution expects that the AWS account in which the template is deployed already has the following configuration and resources:

Refer to Attaching a custom file system to a domain or user profile for additional prerequisites.

Configure an EFS directory shared across private spaces of a given user profile

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, creating a private file system directory for each user. We can distinguish two use cases:

  • Create new SageMaker Studio user profiles – A new team member joins a preexisting SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces
  • Use preexisting SageMaker Studio user profiles – A team member is already working on a specific SageMaker Studio domain and wants to attach a custom EFS file system to the JupyterLab or Code Editor spaces

The solution provided in this post focuses on the first use case. We discuss how to adapt the solution for preexisting SageMaker Studio domain user profiles later in this post.

The following diagram illustrates the high-level architecture of the solution.

AWS Architecture

In this solution, we use AWS CloudTrail, Amazon EventBridge, and AWS Lambda to automatically create a private EFS directory when a new SageMaker Studio user profile is created. The high-level steps to set up this architecture are as follows:

  1. Create an EventBridge rule that invokes the Lambda function when a new SageMaker user profile is created and logged in CloudTrail.
  2. Create an EFS file system with an access point for the Lambda function and with a mount target in every Availability Zone that the SageMaker Studio domain is located in.
  3. Use a Lambda function to create a private EFS directory with the required POSIX permissions for the profile. The function will also update the profile with the new file system configuration.
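For reference, the following sketch shows how such an EventBridge rule could be created with Boto3. It matches the CreateUserProfile API call recorded by CloudTrail and targets the Lambda function; the rule name and function ARN are placeholders, and the CloudFormation template provided later in this post creates the equivalent resources for you (including the permission for EventBridge to invoke the function).

import json
import boto3

events = boto3.client("events")

# Match SageMaker CreateUserProfile calls recorded by CloudTrail
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["sagemaker.amazonaws.com"],
        "eventName": ["CreateUserProfile"],
    },
}

events.put_rule(
    Name="sagemaker-user-profile-created",  # placeholder rule name
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="sagemaker-user-profile-created",
    Targets=[{
        "Id": "CreateEfsDirectoryFunction",
        # Placeholder ARN of the Lambda function that creates the private EFS directory
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:create-efs-user-dir",
    }],
)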

Deploy the solution using AWS CloudFormation

To use the solution, you can deploy the infrastructure using the following CloudFormation template. This template deploys three main resources in your account: Amazon EFS resources (file system, access points, mount targets), an EventBridge rule, and a Lambda function.

Refer to Create a stack from the CloudFormation console for additional information. The input parameters for this template are:

  • SageMakerDomainId – The SageMaker Studio domain ID that will be associated with the EFS file system.
  • SageMakerStudioVpc – The VPC associated with the SageMaker Studio domain.
  • SageMakerStudioSubnetId – One or multiple subnets associated with the SageMaker Studio domain. The template deploys its resources in these subnets.
  • SageMakerStudioSecurityGroupId – The security group associated with the SageMaker Studio domain. The template configures the Lambda function with this security group.

Amazon EFS resources

After you deploy the template, navigate to the Amazon EFS console and confirm that the EFS file system has been created. The file system has a mount target in every Availability Zone that your SageMaker domain connects to.

Note that each mount target uses the EC2 security group that SageMaker created in your AWS account when you first created the domain, which allows NFS traffic at port 2049. The provided template automatically retrieves this security group when it is first deployed, using a Lambda-backed custom resource.
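A minimal sketch of that lookup with Boto3, assuming the naming convention SageMaker uses for the inbound NFS security group (security-group-for-inbound-nfs-<domain-id>), could look like the following; the domain ID is a placeholder.

import boto3

ec2 = boto3.client("ec2")

domain_id = "d-xxxxxxxxxxxx"  # placeholder SageMaker Studio domain ID

# SageMaker creates this security group when the domain is created; it allows NFS traffic on port 2049
response = ec2.describe_security_groups(
    Filters=[{
        "Name": "group-name",
        "Values": [f"security-group-for-inbound-nfs-{domain_id}"],
    }]
)
nfs_security_group_id = response["SecurityGroups"][0]["GroupId"]
print(nfs_security_group_id)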

You can also observe that the file system has an EFS access point. This access point grants root access on the file system for the Lambda function that will create the directories for the SageMaker Studio user profiles.

EventBridge rule

The second main resource is an EventBridge rule invoked when a new SageMaker Studio user profile is created. Its target is the Lambda function that creates the folder in the EFS file system and updates the profile that has just been created. The input of the Lambda function is the matched event, from which you can get the SageMaker Studio domain ID and the SageMaker user profile name.

Lambda function

Lastly, the template creates a Lambda function that creates a directory in the EFS file system with the required POSIX permissions for the user profile and updates the user profile with the new file system configuration.

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

  • UID – The POSIX user ID. The default is 200001. A valid range is a minimum value of 10000 and maximum value of 4000000.
  • GID – The POSIX group ID. The default is 1001. A valid range is a minimum value of 1001 and maximum value of 4000000.

The Lambda function is in the same VPC as the EFS file system and has the previously created file system and access point attached.

Lambda function configuration

Adapt the solution for preexisting SageMaker Studio domain user profiles

We can reuse the previous solution for scenarios in which the domain already has user profiles created. For that, you can create an additional Lambda function in Python that lists all the user profiles for the given SageMaker Studio domain and creates a dedicated EFS directory for each user profile.

The Lambda function should be in the same VPC as the EFS file system and should have the previously created file system and access point attached. You need to add the efs_id and domain_id values as environment variables for the function.

You can include the following code as part of this new Lambda function and run it manually:

import json
import subprocess
import boto3
import os

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # Get EFS and Domain ID
    file_system=os.environ['efs_id']
    domain_id=os.environ['domain_id']    
    
    
    # Get Domain user profiles
    list_user_profiles_response = sm_client.list_user_profiles(
        DomainIdEquals=domain_id
    )
    domain_users = list_user_profiles_response["UserProfiles"]
    
    # Create directories for each user
    for user in domain_users:

        user_profile_name = user["UserProfileName"]

        # Permissions
        repository=f'/mnt/efs/{user_profile_name}'
        subprocess.call(['mkdir', repository])
        subprocess.call(['chown', '200001:1001', repository])
        
        # Update SageMaker user
        response = sm_client.update_user_profile(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            UserSettings={
                'CustomFileSystemConfigs': [
                    {
                        'EFSFileSystemConfig': {
                            'FileSystemId': file_system,
                            'FileSystemPath': f'/{user_profile_name}'
                        }
                    }
                ]
            }
        )

Configure an EFS directory shared across all spaces of a given domain

In this scenario, an administrator wants to provision an EFS file system for all users of a SageMaker Studio domain, using the same file system directory for all the users.

To achieve this, in addition to the prerequisites described earlier in this post, you need to complete the following steps.

Create the EFS file system

The file system needs to be in the same VPC as the SageMaker Studio domain. Refer to Creating EFS file systems for additional information.

Add mount targets to the EFS file system

Before SageMaker Studio can access the new EFS file system, the file system must have a mount target in each of the subnets associated with the domain. For more information about assigning mount targets to subnets, see Managing mount targets. You can get the subnets associated with the domain on the SageMaker Studio console under Network. You need to create a mount target for each subnet.

Networking used

Additionally, for each mount target, you must add the security group that SageMaker created in your AWS account when you created the SageMaker Studio domain. The security group name has the format security-group-for-inbound-nfs-domain-id.

The following screenshot shows an example of an EFS file system with two mount targets for a SageMaker Studio domain associated to two subnets. Note the security group associated to both mount targets.

EFS file system
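As a sketch, assuming you already have the file system ID, the domain's subnet IDs, and the ID of the SageMaker inbound NFS security group, the mount targets could also be created with Boto3 as follows; all identifiers are placeholders.

import boto3

efs = boto3.client("efs")

# Placeholder identifiers for the file system, the domain's subnets, and the NFS security group
file_system_id = "fs-0123456789abcdef0"
subnet_ids = ["subnet-0aaa1111bbb22222c", "subnet-0ddd3333eee44444f"]
nfs_security_group_id = "sg-0123456789abcdef0"

# Create one mount target per subnet associated with the SageMaker Studio domain
for subnet_id in subnet_ids:
    efs.create_mount_target(
        FileSystemId=file_system_id,
        SubnetId=subnet_id,
        SecurityGroups=[nfs_security_group_id],
    )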

Create an EFS access point

The Lambda function accesses the EFS file system as root using this access point. See Creating access points for additional information.

EFS access point
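A Boto3 sketch of creating such an access point follows; the file system ID is a placeholder, and the POSIX user 0/0 grants the Lambda function root access on the file system so it can create directories and change their ownership.

import boto3

efs = boto3.client("efs")

# Placeholder file system ID; UID/GID 0 gives the Lambda function root access
response = efs.create_access_point(
    FileSystemId="fs-0123456789abcdef0",
    PosixUser={"Uid": 0, "Gid": 0},
    RootDirectory={"Path": "/"},
)
print(response["AccessPointArn"])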

Create a new Lambda function

Define a new Lambda function with the name LambdaManageEFSUsers. This function updates the default user settings of the SageMaker Studio domain, configuring the file system settings to use a specific EFS file system shared repository path. This configuration is automatically applied to all spaces within the domain.

The Lambda function is in the same VPC as the EFS file system and has the previously created file system and access point attached. Additionally, you need to add efs_id and domain_id as environment variables for the function.

At a POSIX permissions level, you can control which users can access the file system and which files or data they can access. The POSIX user and group ID for SageMaker apps are:

  • UID – The POSIX user ID. The default is 200001.
  • GID – The POSIX group ID. The default is 1001.

The function updates the default user settings of the SageMaker Studio domain, configuring the EFS file system to be used by all users. See the following code:

import json
import subprocess
import boto3
import os
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # Environment variables
    file_system=os.environ['efs_id']
    domain_id=os.environ['domain_id']
    
    # EFS directory name
    repository_name='shared_repository'
    repository=f'/mnt/efs/{repository_name}'
            
    # Add permissions to the new directory
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except Exception:
        print("Repository already created")
    
    # Update Sagemaker domain to enable access to the new directory
    response = sm_client.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            'CustomFileSystemConfigs': [
                {
                    'EFSFileSystemConfig': {
                        'FileSystemId': file_system,
                        'FileSystemPath': f'/{repository_name}'
                    }
                }
            ]
        }
    )
    logger.info(f"Updated Studio Domain {domain_id} and EFS {file_system}")
    return {
        'statusCode': 200,
        'body': json.dumps(f"Created dir and modified permissions for Studio Domain {domain_id}")
    }

The execution role of the Lambda function needs to have permissions to update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Configure an EFS directory shared across multiple domains under the same VPC

In this scenario, an administrator wants to provision an EFS file system for all users of multiple SageMaker Studio domains, using the same file system directory for all the users. The idea in this case is to assign the same EFS file system to all users of all domains that are within the same VPC. To test the solution, the account should ideally have two SageMaker Studio domains inside the VPC and subnet.

Create the EFS file system, add mount targets, and create an access point

Complete the steps in the previous section to set up your file system, mount targets, and access point.

Create a new Lambda function

Define a Lambda function called LambdaManageEFSUsers. This function is responsible for automating the configuration of SageMaker Studio domains to use a shared EFS file system within a specific VPC. This can be useful for organizations that want to provide a centralized storage solution for their ML projects across multiple SageMaker Studio domains. See the following code:

import json
import subprocess
import boto3
import os
import sys

import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):

    # Environment variables
    file_system = os.environ['efs_id']
    env_vpc_id = os.environ['vpc_id']

    # Shared directory settings
    repository_name = 'shared_repository'
    repository = f'/mnt/efs/{repository_name}'
    domains = []

    # List all SageMaker domains in the specified VPC
    response = sm_client.list_domains()
    all_domains = response['Domains']
    for domain in all_domains:
        domain_id = domain["DomainId"]
        data = sm_client.describe_domain(DomainId=domain_id)
        domain_vpc_id = data['VpcId']
        if domain_vpc_id == env_vpc_id:
            domains.append(domain_id)

    # Create the shared directory (no error if it already exists) and add the permissions
    try:
        subprocess.call(['mkdir', '-p', repository])
        subprocess.call(['chown', '200001:1001', repository])
    except Exception:
        print("Repository already created")

    # Update every SageMaker domain found in the VPC
    if len(domains) > 0:
        for domain_id in domains:
            response = sm_client.update_domain(
                DomainId=domain_id,
                DefaultUserSettings={
                    'CustomFileSystemConfigs': [
                        {
                            'EFSFileSystemConfig': {
                                'FileSystemId': file_system,
                                'FileSystemPath': f'/{repository_name}'
                            }
                        }
                    ]
                }
            )

        logger.info(f"Updated Studio for Domains {domains} and EFS {file_system}")
        return {
            'statusCode': 200,
            'body': json.dumps(f"Created dir and modified permissions for Domains {domains}")
        }

    else:
        return {
            'statusCode': 400,
            'body': json.dumps(f"No SageMaker Studio domains found in the configured VPC {env_vpc_id}")
        }

The execution role of the Lambda function needs to have permissions to describe and update the SageMaker Studio domain:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeDomain",
                "sagemaker:UpdateDomain"
            ],
            "Resource": "*"
        }
    ]
}

Clean up

To clean up the solution you implemented and avoid further costs, delete the CloudFormation stack you deployed in your AWS account. When you delete the stack, you also delete the EFS file system and its storage. For additional information, refer to Delete a stack from the CloudFormation console.

Conclusion

In this post, we have explored three scenarios demonstrating the versatility of integrating Amazon EFS with SageMaker Studio. These scenarios highlight how Amazon EFS can provide a scalable, secure, and collaborative data storage solution for data science teams.

The first scenario focused on configuring an EFS directory with private spaces for individual user profiles, allowing users to store and access their own data while the administrator manages the EFS file system centrally.

The second scenario showcased a shared EFS directory across all spaces within a SageMaker Studio domain, enabling better collaboration and centralized data management.

The third scenario explored an EFS file system shared across multiple SageMaker Studio domains, empowering enterprise-level data science collaboration and promoting efficient use of shared resources.

By implementing these Amazon EFS integration scenarios, organizations can unlock the full potential of their data science teams, improve data governance, and enhance the overall efficiency of their data-driven initiatives. The integration of Amazon EFS with SageMaker Studio provides a versatile platform for data science teams to thrive in the evolving landscape of ML and AI.


About the Authors

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads, to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Itziar Molina Fernandez is an AI/ML Consultant in the AWS Professional Services team. In her role, she works with customers building large-scale machine learning platforms and generative AI use cases on AWS. In her free time, she enjoys exploring new places.

Matteo Amadei is a Data Scientist Consultant in the AWS Professional Services team. He uses his expertise in artificial intelligence and advanced analytics to extract valuable insights and drive meaningful business outcomes for customers. He has worked on a wide range of projects spanning NLP, computer vision, and generative AI. He also has experience with building end-to-end MLOps pipelines to productionize analytical models. In his free time, Matteo enjoys traveling and reading.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Summarize call transcriptions securely with Amazon Transcribe and Amazon Bedrock Guardrails

Summarize call transcriptions securely with Amazon Transcribe and Amazon Bedrock Guardrails

Given the volume of meetings, interviews, and customer interactions in modern business environments, audio recordings play a crucial role in capturing valuable information. Manually transcribing and summarizing these recordings can be a time-consuming and tedious task. Fortunately, advancements in generative AI and automatic speech recognition (ASR) have paved the way for automated solutions that can streamline this process.

Customer service representatives receive a high volume of calls each day. Previously, calls were recorded and manually reviewed later for compliance, regulations, and company policies. Call recordings had to be transcribed, summarized, and then redacted for personal identifiable information (PII) before analyzing calls, resulting in delayed access to insights.

Redacting PII is a critical practice in security for several reasons. Maintaining the privacy and protection of individuals’ personal information is not only a matter of ethical responsibility, but also a legal requirement. In this post, we show you how to use Amazon Transcribe to get near real-time transcriptions of calls sent to Amazon Bedrock for summarization and sensitive data redaction. We’ll walk through an architecture that uses AWS Step Functions to orchestrate the process, providing seamless integration and efficient processing.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading model providers such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, Mistral AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. You can use Amazon Bedrock Guardrails to redact sensitive information such as PII found in the generated call transcription summaries. Clean, summarized transcripts are then sent to analysts. This provides quicker access to call trends while protecting customer privacy.

Solution overview

The architecture of this solution is designed to be scalable, efficient, and compliant with privacy regulations. It includes the following key components:

  1. Recording – An audio file, such as a meeting or support call, to be transcribed and summarized
  2. Step Functions workflow – Coordinates the transcription and summarization process
  3. Amazon Transcribe – Converts audio recordings into text
  4. Amazon Bedrock – Summarizes the transcription and removes PII
  5. Amazon SNS – Delivers the summary to the designated recipient
  6. Recipient – Receives the summarized, PII-redacted transcript

The following diagram shows the architecture overview.

The workflow orchestrated by Step Functions is as follows:

  1. An audio recording is provided as an input to the Step Functions workflow. This could be done manually or automatically depending on the specific use case and integration requirements.
  2. The workflow invokes Amazon Transcribe, which converts the multi-speaker audio recording into a textual, speaker-partitioned transcription. Amazon Transcribe uses advanced speech recognition algorithms and machine learning (ML) models to accurately partition speakers and transcribe the audio, handling various accents, background noise, and other challenges.
  3. The transcription output from Amazon Transcribe is then passed to Anthropic’s Claude 3 Haiku model on Amazon Bedrock through AWS Lambda. This model was chosen because it offers relatively low latency and cost compared to other models. The model first summarizes the transcript according to its summary instructions, and then the summarized output (the model response) is evaluated by Amazon Bedrock Guardrails to redact PII. To learn how it blocks harmful content, refer to How Amazon Bedrock Guardrails works. The instructions and transcript are both passed to the model as context.
  4. The output from Amazon Bedrock is stored in Amazon Simple Storage Service (Amazon S3) and sent to the designated recipient using Amazon Simple Notification Service (Amazon SNS). Amazon SNS supports various delivery channels, including email, SMS, and mobile push notifications, making sure that the summary reaches the intended recipient in a timely and reliable manner.

The recipient can then review the concise summary, quickly grasping the key points and insights from the original audio recording. Additionally, sensitive information has been redacted, maintaining privacy and compliance with relevant regulations.
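To illustrate the transcription step outside of the Step Functions workflow, the following sketch starts an Amazon Transcribe job with speaker partitioning enabled using Boto3. The job name, bucket names, and speaker count are placeholder assumptions; the deployed workflow performs the equivalent call for you.

import boto3

transcribe = boto3.client("transcribe")

# Placeholder names; the Step Functions workflow performs the equivalent call automatically
transcribe.start_transcription_job(
    TranscriptionJobName="team-meeting-summary-demo",
    Media={"MediaFileUri": "s3://my-recordings-bucket/recordings/team-meeting.mp3"},
    LanguageCode="en-US",
    OutputBucketName="my-recordings-bucket",
    OutputKey="transcripts/team-meeting.json",
    Settings={
        "ShowSpeakerLabels": True,  # partition the transcript by speaker
        "MaxSpeakerLabels": 4,
    },
)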

The following diagram shows the Step Functions workflow.

Prerequisites

Follow these steps before starting:

  1. Amazon Bedrock users need to request access to models before they’re available for use. This is a one-time action. For this solution, you need to enable access to Anthropic’s Claude 3 Haiku model on Amazon Bedrock. For more information, refer to Access Amazon Bedrock foundation models. Deployment, as described below, is currently supported only in the US West (Oregon) us-west-2 AWS Region. Users may explore other models if desired. You might need some customizations to deploy to alternative Regions with different model availability (such as us-east-1, which hosts Anthropic’s Claude 3.5 Sonnet). Make sure you consider model quality, speed, and cost tradeoffs before choosing a model.
  2. Create a guardrail for PII redaction. Configure filters to block or mask sensitive information. This option can be found on the Amazon Bedrock console on the Add sensitive information filters page when creating a guardrail. To learn how to configure filters for other use cases, refer to Remove PII from conversations by using sensitive information filters.
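If you prefer to script the guardrail instead of using the console, the following Boto3 sketch creates one that anonymizes a few common PII types. The guardrail name, messages, and entity types are illustrative assumptions; adjust the sensitive information filters to your own compliance requirements.

import boto3

bedrock = boto3.client("bedrock")

# Illustrative guardrail; adjust the PII entity types and actions to your requirements
response = bedrock.create_guardrail(
    name="call-summary-pii-guardrail",
    blockedInputMessaging="Sorry, this request cannot be processed.",
    blockedOutputsMessaging="Sorry, the response was blocked by the configured guardrail.",
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [
            {"type": "NAME", "action": "ANONYMIZE"},
            {"type": "EMAIL", "action": "ANONYMIZE"},
            {"type": "PHONE", "action": "ANONYMIZE"},
        ]
    },
)
print(response["guardrailId"], response["version"])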

Deploy solution resources

To deploy the solution, download an AWS CloudFormation template to automatically provision the necessary resources in your AWS account. The template sets up the following components:

  • A Step Functions workflow
  • Lambda functions
  • An SNS topic
  • An S3 bucket
  • AWS Key Management Service (AWS KMS) keys for data encryption and decryption

By using this template, you can quickly deploy the sample solution with minimal manual configuration. The template requires the following parameters:

  • Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
  • Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary
  • Guardrail ID – This is the ID of your recently created guardrail, which can be found on the Amazon Bedrock Guardrails console in Guardrail overview

The Summary instructions are read into your Lambda function as an environment variable.

 
# Read the summary instructions (provided as a CloudFormation parameter) from an environment variable.
SUMMARY_INSTRUCTIONS = os.getenv('SUMMARY_INSTRUCTIONS')
 
These are then used as part of your payload to Anthropic’s Claude 3 Haiku model. This is shared to give you an understanding of how to pass the instructions and text to the model.
 
# Create the payload to provide to the Anthropic model.
user_message = {"role": "user", "content": f"{SUMMARY_INSTRUCTIONS}{transcript}"}
messages = [user_message]
response = generate_message(bedrock_client, 'anthropic.claude-3-haiku-20240307-v1:0', "", messages, 1000)
 
The generate_message() function contains the invocation to Amazon Bedrock with the guardrail ID and other relevant parameters.
 
def generate_message(bedrock_runtime, model_id, system_prompt, messages, max_tokens):
    body = json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "system": system_prompt,
            "messages": messages
        }
    )
    print(f'Invoking model: {model_id}')

    # Invoke the model with the guardrail applied so PII is redacted from the summary
    response = bedrock_runtime.invoke_model(
        body=body,
        modelId=model_id,
        guardrailIdentifier=BEDROCK_GUARDRAIL_ID,
        guardrailVersion="1",
        trace="ENABLED")
    response_body = json.loads(response.get('body').read())
    print(f'response: {response}')
    return response_body

Deploy the solution

After you deploy the resources using AWS CloudFormation, complete these steps:

  1. Add a Lambda layer.

Although AWS Lambda regularly updates the version of Boto3 it includes, at the time of writing this post, it still provides version 1.34.126. To use Amazon Bedrock Guardrails, you need version 1.34.90 or higher, for which we’ll add a Lambda layer that updates Boto3. You can follow the official developer guide on how to add a Lambda layer.

There are different ways to create a Lambda layer. A simple method is to use the steps outlined in Packaging the layer content, which references a sample application repo. You should be able to replace requests==2.31.0 in the requirements.txt content with boto3, which will install the latest available version, and then create the layer.

To add the layer to Lambda, make sure that the parameters specified in Creating the layer match the deployed Lambda. That is, you need to update compatible-architectures to x86_64.

  2. Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
  3. On the AWS CloudFormation console, find the stack you just created.
  4. On the stack’s Outputs tab, look for the value associated with AssetBucketName. It will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
  5. On the Amazon S3 console, find your S3 assets bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

  6. Upload your recording to the recordings folder in Amazon S3.

Uploading recordings will automatically trigger the AWS Step Functions state machine. For this example, we use a sample team meeting recording.

  7. On the AWS Step Functions console, find the summary-generator state machine. Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording. After it reaches its Success state, you should receive an emailed summary of the recording. Alternatively, you can navigate to the S3 assets bucket and view the transcript there in the transcripts folder.

Expand the solution

Now that you have a working solution, here are some potential ideas to customize the solution for your specific use cases:

  • Try altering the process to fit your available source content and desired outputs:
    • For situations where transcripts are available, create an alternate AWS Step Functions workflow to ingest existing text-based or PDF-based transcriptions
    • Instead of using Amazon SNS to notify recipients through email, you can use it to send the output to a different endpoint, such as a team collaboration site or to the team’s chat channel
  • Try changing the summary instructions for the AWS CloudFormation stack parameter provided to Amazon Bedrock to produce outputs specific to your use case. The following are some examples:
    • When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor
    • If you’re using the model to summarize a course lecture, it could identify upcoming assignments, summarize key concepts, list facts, and filter out small talk from the recording
  • For the same recording, create different summaries for different audiences:
    • Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables
    • Project managers’ summaries focus on timelines, costs, deliverables, and action items
    • Project sponsors get a brief update on project status and escalations
  • For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you might want to adjust the max_tokens parameter to accommodate different content lengths.

Clean up

Clean up the resources you created for this solution to avoid incurring costs. You can use an AWS SDK, the AWS Command Line Interface (AWS CLI), or the console.

  1. Delete Amazon Bedrock Guardrails and the Lambda layer you created
  2. Delete the CloudFormation stack

To use the console, follow these steps:

  1. On the Amazon Bedrock console, in the navigation menu, select Guardrails. Choose your guardrail, then select Delete.
  2. On the AWS Lambda console, in the navigation menu, select Layers. Choose your layer, then select Delete.
  3. On the AWS CloudFormation console, in the navigation menu, select Stacks. Choose the stack you created, then select Delete.

Deleting the stack won’t delete the associated S3 bucket. If you no longer require the recordings or transcripts, you can delete the bucket separately. Amazon Transcribe is designed to automatically delete transcription jobs after 90 days. However, you can opt to manually delete these jobs before the 90-day retention period expires.

Conclusion

As businesses turn to data as a foundation for decision-making, having the ability to efficiently extract insights from audio recordings is invaluable. By using the power of generative AI with Amazon Bedrock and Amazon Transcribe, your organization can create concise summaries of audio recordings while maintaining privacy and compliance. The proposed architecture demonstrates how AWS services can be orchestrated using AWS Step Functions to streamline and automate complex workflows, enabling organizations to focus on their core business activities.

This solution not only saves time and effort, but also makes sure that sensitive information is redacted, mitigating potential risks and promoting compliance with data protection regulations. As organizations continue to generate and process large volumes of audio data, solutions like this will become increasingly important for gaining insights, making informed decisions, and maintaining a competitive edge.


About the authors

Yash Yamsanwar is a Machine Learning Architect at Amazon Web Services (AWS). He is responsible for designing high-performance, scalable machine learning infrastructure that optimizes the full lifecycle of machine learning models, from training to deployment. Yash collaborates closely with ML research teams to push the boundaries of what is possible with LLMs and other cutting-edge machine learning technologies.

Sawyer Hirt is a Solutions Architect at AWS, specializing in AI/ML and cloud architectures, with a passion for helping businesses leverage cutting-edge technologies to overcome complex challenges. His expertise lies in designing and optimizing ML workflows, enhancing system performance, and making advanced AI solutions more accessible and cost-effective, with a particular focus on Generative AI. Outside of work, Sawyer enjoys traveling, spending time with family, and staying current with the latest developments in cloud computing and artificial intelligence.

Waterways Wonder: Clearbot Autonomously Cleans Waters With Energy-Efficient AI

What started as two classmates seeking a free graduation trip to Bali subsidized by a university project ended up as an AI-driven sea-cleaning boat prototype built of empty water bottles, hobbyist helicopter blades and a GoPro camera.

University of Hong Kong grads Sidhant Gupta and Utkarsh Goel have since then made a splash with their Clearbot autonomous trash collection boats enabled by NVIDIA Jetson.

“We came up with the idea to clean the water there because there are a lot of dirty beaches, and the local community depends on them to be clean for their tourism business,” said Gupta, who points out the same is true for touristy regions of Hong Kong and India, where they do business now.

Before launching Clearbot in 2021, the university friends put their proof-of-concept waste collection boat up on a website and then largely forgot about it as they started jobs after graduation, Gupta said. A year later, a marine construction company proposed a water cleanup project, and the pair developed their prototype around the effort to remove three tons of trash daily from a Hong Kong marine construction site.

“They were using a big boat and a crew of three to four people every day, at a cost of about $1,000 per day — that’s when we realized we can build this and do it better and at lower cost,” said Gupta.

Plastic makes up about 85% of ocean litter, with an estimated 11 million metric tons entering oceans every year, according to the United Nations Environment Programme. Clearbot aims to remove waste from waterways before it gets into oceans.

Cleaning Waters With Energy-Efficient Jetson Xavier

Clearbot, based in Hong Kong and India, has 24 employees developing and deploying its water-cleaning electric-powered boats that can self-dock at solar charging stations.

The ocean vessels, ranging in length from 10 to 16 feet, have two cameras: one for navigation and another for identifying the waste the boats have scooped up. The founders trained garbage models on cloud and desktop NVIDIA GPUs from images they took in their early days, and they now have large libraries of images collected on cleanup sites. They've also trained models that enable the Clearbot to autonomously navigate away from obstacles.

The energy-efficient Jetson Xavier NX allows the water-cleaning boats — propelled by battery-driven motors — to collect for eight hours at a time before returning to recharge.   

Harbors and other waterways frequented by tourists and businesses often rely on diesel-powered boats with workers using nets to remove garbage, said Gupta. Traditionally, a crew of 50 people in such scenarios can run about 15 or 20 boats, estimates Gupta. With Clearbot, a crew of 50 people can run about 150 boats, boosting intake, he said.  

“We believe that humanity’s relationship with the ocean is sort of broken — the question is can we make that better and is there a better future outcome?” said Gupta. “We can do it 100% emissions-free, so you’re not creating pollution while you’re cleaning pollution.” 

Customers Harnessing Clearbot for Environmental Benefits

Kingspan, a maker of building materials, is working with Clearbot to clean up trash and oil in rivers and lakes in Nongstoin, India. So far, the work has resulted in the removal of 1.2 tons of waste per month in the area. 

Umiam Lake in Meghalaya, India, has long been a tourist destination and a place for fishing. However, it has become so polluted that parts of the water's surface are no longer visible beneath the floating trash.

The region’s leadership is working with Clearbot in a project with the University of California Berkeley Haas School of Business to help remove the trash from the lake. Since the program began three months ago, Clearbot has collected 15 tons of waste.

Mitigating Environmental Impacts With Clearbot Data 

Clearbot has expanded its services beyond trash collection to address environmental issues more broadly. The company is now assisting in marine pollution control for sewage, oil, gas and other chemical spills as well as undersea inspections for dredging projects, examining algae growth and many other areas where its autonomous boats can capture data.

Clearbot's founders didn't foresee that the data about garbage collection and other environmental pollutants could be used in mitigation strategies. The images Clearbot collects are geotagged, so anyone trying to trace the source of a problem can start by working backward from the findings in Clearbot's software dashboard.

For example, if there’s a concentration of plastic bottle waste in a particular area, and it’s of a particular type, local agencies could track back to where it’s coming from. This could allow local governments to mitigate the waste by reaching out to the polluter to put a stop to the activity that is causing it, said Gupta.

“Let’s say I’m a municipality and I want to ban plastic bags in my area — you need the NGOs, the governments and the change makers to acquire the data to back their justifications for why they want to close down the plastic plant up the stream,” said Gupta. “That data is being generated on board your NVIDIA Jetson Xavier.”

Learn about NVIDIA Jetson Xavier and Earth-2

Read More

Sustainable Manufacturing and Design: How Digital Twins Are Driving Efficiency and Cutting Emissions

Improving the sustainability of manufacturing involves optimizing entire product lifecycles — from material sourcing and transportation to design, production, distribution and end-of-life disposal.

According to the International Energy Agency, reducing the carbon footprint of industrial production by just 1% could save 90 million tons of CO₂ emissions annually. That’s equivalent to taking more than 20 million gasoline-powered cars off the road each year.

Technologies such as digital twins and accelerated computing are enabling manufacturers to reduce emissions, enhance energy efficiency and meet the growing demand for environmentally conscious production.

Siemens and NVIDIA are at the forefront of developing technologies that help customers achieve their sustainability goals and improve production processes.

Key Challenges in Sustainable Manufacturing

Balancing sustainability with business objectives like profitability remains a top concern for manufacturers. A study by Ernst & Young in 2022 found that digital twins can reduce construction costs by up to 35%, underscoring the close link between resource consumption and construction expenses.

Yet, one of the biggest challenges in driving sustainable manufacturing and reducing overhead is the presence of silos between departments, different plants within the same organization and across production teams. These silos arise from a variety of issues, including conflicting priorities and incentives, a lack of common energy-efficiency metrics and language, and the need for new skills and solutions to bridge these gaps.

Data management also presents a hurdle, with many manufacturers struggling to turn vast amounts of data into actionable insights — particularly those that can impact sustainability goals.

According to a case study by The Manufacturer, a quarter of respondents surveyed acknowledged that their data shortcomings negatively impact energy efficiency and environmental sustainability, with nearly a third reporting that data is siloed to local use cases.

Addressing these challenges requires innovative approaches that break down barriers and use data to drive sustainability. Acting as a central hub for information, digital twin technology is proving to be an essential tool in this effort.

The Role of Digital Twins in Sustainable Manufacturing

Industrial-scale digital twins built on the NVIDIA Omniverse development platform and Universal Scene Description (OpenUSD) are transforming how manufacturers approach sustainability and scalability.

These technologies power digital twins that take engineering data from various sources and contextualize it as it would appear in the real world. This breaks down information silos and offers a holistic view that can be shared across teams — from engineering to sales and marketing.

This enhanced visibility enables engineers and designers to simulate and optimize product designs, facility layouts, energy use and manufacturing processes before physical production begins. That allows for deeper insights and collaboration by helping stakeholders make more informed decisions to improve efficiency and reduce costly errors and last-minute changes that can result in significant waste.
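As a rough illustration of that idea (not Siemens' or NVIDIA's actual tooling), a few lines of OpenUSD Python can compose engineering data exported from different tools into one shared stage that every team references. The file paths and prim names below are hypothetical.

```python
# A minimal sketch, assuming the open-source usd-core package and hypothetical asset paths.
from pxr import Usd, UsdGeom

# Create a stage that acts as the single source of truth for the factory twin.
stage = Usd.Stage.CreateNew("factory_twin.usda")
UsdGeom.SetStageMetersPerUnit(stage, 1.0)

# Root prim for the facility.
factory = UsdGeom.Xform.Define(stage, "/Factory")
stage.SetDefaultPrim(factory.GetPrim())

# Reference engineering data exported from different tools into one composed scene.
line = UsdGeom.Xform.Define(stage, "/Factory/AssemblyLine")
line.GetPrim().GetReferences().AddReference("./cad_exports/assembly_line.usd")

robot = UsdGeom.Xform.Define(stage, "/Factory/RobotCell")
robot.GetPrim().GetReferences().AddReference("./robotics/robot_cell.usd")

stage.GetRootLayer().Save()
```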

To further transform how products and experiences are designed and manufactured, Siemens is integrating NVIDIA Omniverse Cloud application programming interfaces into its Siemens Xcelerator platform, starting with Teamcenter X, its cloud-based product lifecycle management software.

These integrations enable Siemens to bring the power of photorealistic visualization to complex engineering data and workflows, allowing companies to create physics-based digital twins that help eliminate workflow waste and errors.

Siemens and NVIDIA have demonstrated how companies like HD Hyundai, a leader in sustainable ship manufacturing, are using these new capabilities to visualize and interact with complex engineering data at new levels of scale and fidelity.

HD Hyundai is unifying and visualizing complex engineering projects directly within Teamcenter X.

Physics-based digital twins are also being utilized to test and validate robotics and physical AI before they’re deployed into real-world manufacturing facilities.

Foxconn, the world’s largest electronics manufacturer, has introduced a virtual plant that pushes the boundaries of industrial automation. Foxconn’s digital twin platform, built on Omniverse and NVIDIA Isaac, replicates a new factory in the Guadalajara, Mexico, electronics hub to allow engineers to optimize processes and train robots for efficient production of NVIDIA Blackwell systems.

By simulating the factory environment, engineers can determine the best placement for heavy robotic arms, optimize movement and maximize safe operations while strategically positioning thousands of sensors and video cameras to monitor the entire production process.

Foxconn’s virtual factory uses a digital twin powered by the NVIDIA Omniverse and NVIDIA Isaac platforms to produce NVIDIA Blackwell systems.

The use of digital twins, like those in Foxconn’s virtual factory, is becoming increasingly common in industrial settings for simulation and testing.

Foxconn’s chairman, Young Liu, highlighted how the digital twin will lead to enhanced automation and efficiency, resulting in significant savings in time, cost and energy. The company expects to increase manufacturing efficiency while reducing energy consumption by over 30% annually.

By connecting data from Siemens Xcelerator software to its platform built on NVIDIA Omniverse and OpenUSD, the virtual plant allows Foxconn to design and train robots in a realistic, simulated environment, revolutionizing its approach to automation and sustainable manufacturing.

Making Every Watt Count

One consideration for industries everywhere is how the rising demand for AI is outpacing the adoption of renewable energy. This means business leaders, particularly manufacturing plant and data center operators, must maximize energy efficiency and ensure every watt is utilized effectively to balance decarbonization efforts alongside AI growth.

The best and simplest means of optimizing energy use is to accelerate every possible workload.

Using accelerated computing platforms that integrate both GPUs and CPUs, manufacturers can significantly enhance computational efficiency.

GPUs, specifically designed for handling complex calculations, can outperform traditional CPU-only systems in AI tasks. These systems can be up to 20x more energy efficient when it comes to AI inference and training.

This leap in efficiency has fueled substantial gains over the past decade, enabling AI to address more complex challenges while maintaining energy-efficient operations.

Building on these advances, businesses can further reduce their environmental impact by adopting key energy management strategies. These include implementing energy demand management and efficiency measures, scaling battery storage for short-duration power outages, securing renewable energy sources for baseload electricity, using renewable fuels for backup generation and exploring innovative ideas like heat reuse.


Join the Siemens and NVIDIA session at the 7X24 Exchange 2024 Fall Conference to discover how digital twins and AI are driving sustainable solutions across data centers.


The Future of Sustainable Manufacturing: Industrial Digitalization

The next frontier in manufacturing is the convergence of the digital and physical worlds in what is known as industrial digitalization, or the “industrial metaverse.” Here, digital twins become even more immersive and interactive, allowing manufacturers to make data-driven decisions faster than ever.

“We will revolutionize how products and experiences are designed, manufactured and serviced,” said Roland Busch, president and CEO of Siemens AG. “On the path to the industrial metaverse, this next generation of industrial software enables customers to experience products as they would in the real world: in context, in stunning realism and — in the future — interact with them through natural language input.”

Leading the Way With Digital Twins and Sustainable Computing

Siemens and NVIDIA’s collaboration showcases the power of digital twins and accelerated computing for reducing the environmental impact caused by the manufacturing industry every year. By leveraging advanced simulations, AI insights and real-time data, manufacturers can reduce waste and increase energy efficiency on their path to decarbonization.

Learn more about how Siemens and NVIDIA are accelerating sustainable manufacturing.

Read about NVIDIA’s sustainable computing efforts and check out the energy-efficiency calculator to discover potential energy and emissions savings.

Read More

Get Ready to Slay: 'Dragon Age: The Veilguard' to Soar Into GeForce NOW at Launch

Bundle up this fall with GeForce NOW and Dragon Age: The Veilguard with a special, limited-time promotion just for members.

The highly anticipated role-playing game (RPG) leads 10 titles joining the ever-growing GeForce NOW library of over 2,000 games.

A Heroic Bundle


Fight for Thedas’ future at Ultimate quality this fall as new and existing members who purchase six months of GeForce NOW Ultimate can get BioWare and Electronic Arts’ epic RPG Dragon Age: The Veilguard for free when it releases on Oct. 31.

Rise as Rook, Dragon Age’s newest hero. Lead a team of seven companions, each with their own unique story, against a new evil rising in Thedas. The latest entry in the legendary Dragon Age franchise lets players customize their characters and engage with new romancable companions whose stories unfold over time. Band together and become the Veilguard.

Ultimate members can experience BioWare’s latest entry at full GeForce quality, with support for NVIDIA DLSS 3, low-latency gameplay with NVIDIA Reflex, and enhanced image quality and immersion with ray-traced ambient occlusion and reflections. Ultimate members can also play popular PC games at up to 4K resolution with extended session lengths, even on low-spec devices.

Move fast — this bundle is only available for a limited time until Oct. 30.

Supernatural Thrills, Super New Games


New World: Aeternum is the latest content for Amazon Games’ hit action RPG. Available for members to stream at launch this week, it offers a thrilling action RPG experience in a vast, perilous world. Explore the mysterious island, encounter diverse creatures, face supernatural dangers and uncover ancient secrets.

The game’s action-oriented combat system and wide variety of weapons allow for diverse playstyles, while the crafting and progression systems offer depth for long-term engagement. Then, grab the gaming squad for intense combat and participate in large-scale battles for territorial control.

Members can look for the following games available to stream in the cloud this week:

  • Neva (New release on Steam, Oct. 15)
  • MechWarrior 5: Clans (New release on Steam and Xbox, available on PC Game Pass, Oct. 16)
  • A Quiet Place: The Road Ahead (New release on Steam, Oct. 17)
  • Assassin’s Creed Mirage (New release on Steam, Oct. 17)
  • Artisan TD (Steam) 
  • ASKA (Steam)
  • Dungeon Tycoon (Steam)
  • South Park: The Fractured But Whole (Available on PC Game Pass, Oct 16. Members will need to activate access)
  • Spirit City: Lofi Sessions (Steam)
  • Star Trucker (Xbox, available on Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

PyTorch 2.5 Release Blog

We are excited to announce the release of PyTorch® 2.5 (release note)! This release features a new CuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. As well, regional compilation of torch.compile offers a way to reduce the cold start up time for torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in LLM) without recompilations. Finally, TorchInductor CPP backend offers solid performance speedup with numerous enhancements like FP16 support, CPP wrapper, AOT-Inductor mode, and max-autotune mode.

This release is composed of 4095 commits from 504 contributors since PyTorch 2.4. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.5. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

As well, please check out our new ecosystem projects releases with TorchRec and TorchFix.

| Beta | Prototype |
| --- | --- |
| CuDNN backend for SDPA | FlexAttention |
| torch.compile regional compilation without recompilations | Compiled Autograd |
| TorchDynamo added support for exception handling & MutableMapping types | Flight Recorder |
| TorchInductor CPU backend optimization | Max-autotune Support on CPU with GEMM Template |
|  | TorchInductor on Windows |
|  | FP16 support on CPU path for both eager mode and TorchInductor CPP backend |
|  | Autoload Device Extension |
|  | Enhanced Intel GPU support |

*To see a full list of public feature submissions click here.

BETA FEATURES

[Beta] CuDNN backend for SDPA

The cuDNN “Fused Flash Attention” backend was landed for torch.nn.functional.scaled_dot_product_attention. On NVIDIA H100 GPUs this can provide up to 75% speed-up over FlashAttentionV2. This speedup is enabled by default for all users of SDPA on H100 or newer GPUs.
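While the backend is picked automatically, a small sketch like the following can pin SDPA to the cuDNN backend for a region of code (shapes and dtype below are arbitrary; an H100-class GPU is assumed per the note above).

```python
# A minimal sketch of restricting SDPA to the cuDNN backend (names from torch.nn.attention).
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.bfloat16) for _ in range(3))

# By default PyTorch selects a backend automatically; this context restricts it to cuDNN.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```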

[Beta] torch.compile regional compilation without recompilations

Regional compilation without recompilations is available via torch._dynamo.config.inline_inbuilt_nn_modules, which defaults to True in 2.5+. This option allows users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Compared to compiling the full model, this option can result in smaller compilation latencies, with 1%-5% performance degradation.
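A minimal sketch of the pattern: compile the repeated block rather than the whole model, so all layer instances reuse one compiled region. The TransformerLayer and Model classes below are illustrative stand-ins, not a real LLM.

```python
# A minimal sketch of regional compilation (model and layer definitions are hypothetical).
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return x + self.mlp(x + attn_out)

class Model(nn.Module):
    def __init__(self, n_layers=12):
        super().__init__()
        self.layers = nn.ModuleList(TransformerLayer() for _ in range(n_layers))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

model = Model()

# With inline_inbuilt_nn_modules enabled (the 2.5 default), compiling each repeated layer
# reuses a single compiled region instead of recompiling per instance.
for layer in model.layers:
    layer.compile()

out = model(torch.randn(2, 128, 512))
```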

See the tutorial for more information.

[Beta] TorchInductor CPU backend optimization

This feature advances Inductor’s CPU backend optimization, including CPP backend code generation and FX fusions with customized CPU kernels. The Inductor CPU backend supports vectorization of common data types and all Inductor IR operations, along with the static and symbolic shapes. It is compatible with both Linux and Windows OS and supports the default Python wrapper, the CPP wrapper, and AOT-Inductor mode.

Additionally, it extends the max-autotune mode of the GEMM template (prototyped in 2.5), offering further performance gains. The backend supports various FX fusions, lowering to customized kernels such as oneDNN for Linear/Conv operations and SDPA. The Inductor CPU backend consistently achieves performance speedups across three benchmark suites—TorchBench, Hugging Face, and timms—outperforming eager mode in 97.5% of the 193 models tested.
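As a small illustration of using the CPU backend, the sketch below compiles a toy model with Inductor and opts into the C++ wrapper via an Inductor config flag; the flag name reflects the Inductor config module and may evolve between releases, so treat it as illustrative.

```python
# A minimal sketch of compiling a model with the Inductor CPU backend.
import torch
import torch._inductor.config as inductor_config

inductor_config.cpp_wrapper = True  # use the C++ wrapper instead of the default Python wrapper

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

compiled = torch.compile(model)  # Inductor is the default torch.compile backend
with torch.no_grad():
    out = compiled(torch.randn(32, 256))
```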

PROTOTYPE FEATURES

[Prototype] FlexAttention

We’ve introduced a flexible API that enables implementing various attention mechanisms such as Sliding Window, Causal Mask, and PrefixLM with just a few lines of idiomatic PyTorch code. This API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations. Additionally, we automatically generate the backwards pass using PyTorch’s autograd machinery. Furthermore, our API can take advantage of sparsity in the attention mask, resulting in significant improvements over standard attention implementations.
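A minimal sketch of the prototype API with a causal mask follows; the shapes and device are arbitrary, and the function names come from torch.nn.attention.flex_attention.

```python
# A minimal FlexAttention sketch: a causal mask expressed as a mask_mod function.
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

B, H, S, D = 4, 8, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    # Each query position may only attend to keys at or before it.
    return q_idx >= kv_idx

block_mask = create_block_mask(causal, B, H, S, S, device="cuda")

# Compiling flex_attention is what generates the fused, FlashAttention-style kernel.
flex_attention_compiled = torch.compile(flex_attention)
out = flex_attention_compiled(q, k, v, block_mask=block_mask)
```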

For more information and examples, please refer to the official blog post and Attention Gym.

[Prototype] Compiled Autograd

Compiled Autograd is an extension to the PT2 stack allowing the capture of the entire backward pass. Unlike the backward graph traced by AOT dispatcher, Compiled Autograd tracing is deferred until backward execution time, which makes it impervious to forward pass graph breaks, and allows it to record backward hooks into the graph.
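A minimal sketch of opting in, based on the compiled autograd tutorial; the config flag below is a prototype setting and may change, so treat it as illustrative rather than a stable API.

```python
# A minimal sketch of enabling Compiled Autograd so the backward pass is captured too.
import torch

torch._dynamo.config.compiled_autograd = True  # prototype flag from the tutorial

model = torch.nn.Linear(16, 16)

@torch.compile
def train_step(x):
    loss = model(x).sum()
    # backward() is traced lazily at execution time, so forward-pass graph breaks
    # and backward hooks don't prevent capture.
    loss.backward()
    return loss

train_step(torch.randn(4, 16))
```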

Please refer to the tutorial for more information.

[Prototype] Flight Recorder

Flight recorder is a new debugging tool that helps debug stuck jobs. The tool works by continuously capturing information about collectives as they run. Upon detecting a stuck job, the information can be used to quickly identify misbehaving ranks/machines along with code stack traces.

For more information please refer to the following tutorial.

[Prototype] Max-autotune Support on CPU with GEMM Template

Max-autotune mode for the Inductor CPU backend in torch.compile profiles multiple implementations of operations at compile time and selects the best-performing one. This is particularly beneficial for GEMM-related operations, using a C++ template-based GEMM implementation as an alternative to the ATen-based approach with oneDNN and MKL libraries. We support FP32, BF16, FP16, and INT8 with epilogue fusions for x86 CPUs. We've seen up to 7% geomean speedup on the dynamo benchmark suites and up to a 20% improvement in next-token latency for LLM inference.
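A minimal sketch of enabling it: pass mode="max-autotune" to torch.compile so Inductor can profile the candidate GEMM implementations at compile time (the toy layer below is just for illustration).

```python
# A minimal max-autotune sketch on CPU.
import torch

linear = torch.nn.Linear(1024, 1024).eval()
compiled = torch.compile(linear, mode="max-autotune")

with torch.no_grad():
    y = compiled(torch.randn(64, 1024))
```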

For more information please refer to the tutorial.

[Prototype] TorchInductor CPU on Windows

The Inductor CPU backend in torch.compile now works on Windows. We currently support MSVC (cl), clang (clang-cl), and the Intel compiler (icx-cl) for Inductor on Windows.

See the tutorial for more details.

[Prototype] FP16 support on CPU path for both eager mode and TorchInductor CPP backend

Float16 is a commonly used reduced-precision floating point type for improving performance in neural network inference and training. As of this release, float16 is supported on the CPU path for both eager mode and TorchInductor.
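A minimal sketch of exercising the FP16 CPU path in both modes, using a toy model purely for illustration:

```python
# A minimal FP16-on-CPU sketch covering eager mode and the TorchInductor path.
import torch

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU()).eval().to(torch.float16)
x = torch.randn(8, 128, dtype=torch.float16)

with torch.no_grad():
    eager_out = model(x)                    # eager-mode FP16 on CPU
    compiled_out = torch.compile(model)(x)  # TorchInductor CPU path
```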

[Prototype] Autoload Device Extension

PyTorch now supports autoloading for out-of-tree device extensions, streamlining integration by eliminating the need for manual imports. This feature, enabled through the torch.backends entrypoint, simplifies usage by ensuring seamless extension loading, while allowing users to disable it via an environment variable if needed.
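As a rough illustration of how an out-of-tree backend might opt in, the sketch below registers the torch.backends entry point named above in its packaging metadata; the package and callable names are hypothetical.

```python
# A minimal setup.py sketch for a hypothetical out-of-tree device extension.
from setuptools import setup

setup(
    name="torch_foo_backend",
    version="0.1.0",
    packages=["torch_foo_backend"],
    entry_points={
        # PyTorch discovers and calls this entry point when torch is imported,
        # so `import torch` alone makes the device extension available.
        "torch.backends": [
            "torch_foo_backend = torch_foo_backend:_autoload",
        ],
    },
)
```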

See the tutorial for more information.

[Prototype] Enhanced Intel GPU support

Enhanced Intel GPU support is now available for both the Intel® Data Center GPU Max Series and Intel® Client GPUs (Intel® Core™ Ultra processors with built-in Intel® Arc™ graphics and Intel® Arc™ Graphics for dGPU parts), making it easier to accelerate machine learning workflows on Intel GPUs with the PyTorch 2.5 release. This release also enables initial support for PyTorch on Windows for Intel® Client GPUs.

  • Expanded the PyTorch hardware backend support matrix to include both Intel Data Center and Client GPUs.
  • Implemented SYCL* kernels to enhance coverage and execution of ATen operators on Intel GPUs, boosting performance in PyTorch eager mode.
  • Enhanced the Intel GPU backend of torch.compile to improve inference and training performance for a wide range of deep learning workloads.

These features are available through PyTorch preview and nightly binary PIP wheels. For more information regarding Intel GPU support, please refer to the documentation.
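A minimal sketch of targeting an Intel GPU through the "xpu" device (it assumes the Intel GPU-enabled preview or nightly wheels mentioned above are installed):

```python
# A minimal sketch of running on an Intel GPU via the "xpu" device.
import torch

device = "xpu" if torch.xpu.is_available() else "cpu"

model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(16, 512, device=device)

out = torch.compile(model)(x)  # torch.compile also targets the Intel GPU backend
```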

Read More

How DPG Media uses Amazon Bedrock and Amazon Transcribe to enhance video metadata with AI-powered pipelines

This post was co-written with Lucas Desard, Tom Lauwers, and Sam Landuydt from DPG Media.

DPG Media is a leading media company in Benelux operating multiple online platforms and TV channels. DPG Media’s VTM GO platform alone offers over 500 days of non-stop content.

With a growing library of long-form video content, DPG Media recognizes the importance of efficiently managing and enhancing video metadata such as actor information, genre, summary of episodes, the mood of the video, and more. Having descriptive metadata is key to providing accurate TV guide descriptions, improving content recommendations, and enhancing the consumer’s ability to explore content that aligns with their interests and current mood.

This post shows how DPG Media introduced AI-powered processes using Amazon Bedrock and Amazon Transcribe into its video publication pipelines in just 4 weeks, as an evolution towards more automated annotation systems.

The challenge: Extracting and generating metadata at scale

DPG Media receives video productions accompanied by a wide range of marketing materials such as visual media and brief descriptions. These materials often lack standardization and vary in quality. As a result, DPG Media producers have to screen the content to understand it well enough to generate the missing metadata, such as brief summaries. For some content, additional screening is performed to generate subtitles and captions.

As DPG Media grows, they need a more scalable way of capturing metadata that enhances the consumer experience on online video services and aids in understanding key content characteristics.

The following were some initial challenges in automation:

  • Language diversity – The services host both Dutch and English shows. Some local shows feature Flemish dialects, which can be difficult for some large language models (LLMs) to understand.
  • Variability in content volume – They offer a range of content volume, from single-episode films to multi-season series.
  • Release frequency – New shows, episodes, and movies are released daily.
  • Data aggregation – Metadata needs to be available at the top-level asset (program or movie) and must be reliably aggregated across different seasons.

Solution overview

To address the challenges of automation, DPG Media decided to implement a combination of AI techniques and existing metadata to generate new, accurate content and category descriptions, mood, and context.

The project focused solely on audio processing due to its cost-efficiency and faster processing time. Video data analysis with AI wasn’t required for generating detailed, accurate, and high-quality metadata.

The following diagram shows the metadata generation pipeline from audio transcription to detailed metadata.

Figure: End-to-end architecture for media transcription and AI-generated metadata.

The general architecture of the metadata pipeline consists of two primary steps:

  1. Generate transcriptions of audio tracks: use speech recognition models to generate accurate transcripts of the audio content.
  2. Generate metadata: use LLMs to extract and generate detailed metadata from the transcriptions.

In the following sections, we discuss the components of the pipeline in more detail.

Step 1. Generate transcriptions of audio tracks

To generate the necessary audio transcripts for metadata extraction, the DPG Media team evaluated two different transcription strategies: Whisper-v3-large, which requires at least 10 GB of vRAM and high operational processing, and Amazon Transcribe, a managed service with the added benefit of automatic model updates from AWS over time and speaker diarization. The evaluation focused on two key factors: price-performance and transcription quality.
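For reference, starting a batch transcription job with Amazon Transcribe looks roughly like the boto3 sketch below; the bucket, key, and job names are hypothetical placeholders, not DPG Media's actual configuration.

```python
# A minimal sketch of a batch transcription job with speaker diarization enabled.
import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="episode-0001-audio",
    Media={"MediaFileUri": "s3://example-media-bucket/audio/episode-0001.mp3"},
    MediaFormat="mp3",
    LanguageCode="nl-NL",  # Dutch content; English shows would use "en-US"
    OutputBucketName="example-transcripts-bucket",
    Settings={
        "ShowSpeakerLabels": True,  # speaker diarization
        "MaxSpeakerLabels": 10,
    },
)
```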

To evaluate transcription quality, the team compared the results against ground truth subtitles on a large test set, using the following metrics (a small scoring sketch follows the list):

  • Word error rate (WER) – This metric measures the percentage of words that are incorrectly transcribed compared to the ground truth. A lower WER indicates a more accurate transcription.
  • Match error rate (MER) – MER assesses the proportion of correct words that were accurately matched in the transcription. A lower MER signifies better accuracy.
  • Word information lost (WIL) – This metric quantifies the amount of information lost due to transcription errors. A lower WIL suggests fewer errors and better retention of the original content.
  • Word information preserved (WIP) – WIP is the opposite of WIL, indicating the amount of information correctly captured. A higher WIP score reflects more accurate transcription.
  • Hits – This metric counts the number of correctly transcribed words, giving a straightforward measure of accuracy.
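As referenced above, here is a minimal scoring sketch that computes these metrics with the open-source jiwer library; the library choice and the sample strings are assumptions for illustration, since the post doesn't name the exact tooling DPG Media used.

```python
# A minimal sketch of computing WER, MER, WIL, WIP, and hits with jiwer.
import jiwer

reference = "de kat zit op de mat"   # ground truth subtitle text
hypothesis = "de kat zat op de mat"  # transcript produced by the ASR system

measures = jiwer.compute_measures(reference, hypothesis)
print(
    f"WER={measures['wer']:.3f}  MER={measures['mer']:.3f}  "
    f"WIL={measures['wil']:.3f}  WIP={measures['wip']:.3f}  hits={measures['hits']}"
)
```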

Both transcription experiments yielded high-quality results without the need to incorporate video or apply further speaker diarization. For further insights into speaker diarization in other use cases, see Streamline diarization using AI as an assistive technology: ZOO Digital's story.

Considering the varying development and maintenance efforts required by different alternatives, DPG Media chose Amazon Transcribe for the transcription component of their system. This managed service offered convenience, allowing them to concentrate their resources on obtaining comprehensive and highly accurate data from their assets, with the goal of achieving 100% qualitative precision.

Step 2. Generate metadata

Now that DPG Media has the transcription of the audio files, they use LLMs through Amazon Bedrock to generate the various categories of metadata (summaries, genre, mood, key events, and so on). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Through Amazon Bedrock, DPG Media selected the Anthropic Claude 3 Sonnet model based on internal testing, and the Hugging Face LMSYS Chatbot Arena Leaderboard for its reasoning and Dutch language performance. Working closely with end-consumers, the DPG Media team tuned the prompts to make sure the generated metadata matched the expected format and style.
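To make the flow concrete, a metadata-generation call via the Amazon Bedrock Converse API looks roughly like the sketch below; the system prompt is an illustrative placeholder rather than DPG Media's tuned prompt, and the model ID reflects the Claude 3 Sonnet identifier current at the time of writing.

```python
# A minimal sketch of generating an episode summary from a transcript with Claude 3 Sonnet on Bedrock.
import boto3

bedrock = boto3.client("bedrock-runtime")

transcript = "..."  # transcript text produced in step 1

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    system=[{"text": "You are an assistant that writes concise Dutch episode summaries."}],
    messages=[{"role": "user", "content": [{"text": f"Summarize this episode transcript:\n\n{transcript}"}]}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

summary = response["output"]["message"]["content"][0]["text"]
```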

After the team had generated metadata at the individual video level, the next step was to aggregate this metadata across an entire series of episodes. This was a critical requirement, because content recommendations on a streaming service are typically made at the series or movie level, rather than the episode level.

To generate summaries and metadata at the series level, the DPG Media team reused the previously generated video-level metadata. They fed the summaries in an ordered and structured manner, along with a specifically tailored system prompt, back through Amazon Bedrock to Anthropic Claude 3 Sonnet.

Using the summaries instead of the full transcriptions of the episodes was sufficient for high-quality aggregated data and was more cost-efficient, because many of DPG Media’s series have extended runs.

The solution also stores the direct association between each type of metadata and its corresponding system prompt, making it straightforward to tune, remove, or add prompts as needed—similar to the adjustments made during the development process. This flexibility allows them to tailor the metadata generation to evolving business requirements.
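A minimal sketch of that prompt-per-metadata-type idea follows: each metadata category maps to its own system prompt, and the ordered episode summaries are assembled into a single series-level request. The prompt text and helper below are illustrative, not DPG Media's actual prompts.

```python
# A minimal sketch of mapping metadata types to system prompts for series-level aggregation.
SYSTEM_PROMPTS = {
    "summary": "Write a concise, spoiler-light summary of the series based on the episode summaries.",
    "genre": "Classify the series into one or more genres from the provided taxonomy.",
    "mood": "Describe the overall mood of the series in a few keywords.",
}

def build_series_request(metadata_type: str, episode_summaries: list[str]) -> dict:
    """Combine ordered episode summaries with the tailored system prompt for one metadata type."""
    ordered = "\n".join(f"Episode {i + 1}: {s}" for i, s in enumerate(episode_summaries))
    return {
        "system": [{"text": SYSTEM_PROMPTS[metadata_type]}],
        "messages": [{"role": "user", "content": [{"text": ordered}]}],
    }
```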

To evaluate the metadata quality, the team used reference-free LLM metrics, inspired by LangSmith. This approach uses a secondary LLM to evaluate the outputs based on tailored metrics, such as whether the summary is simple to understand, whether it contains all important events from the transcription, and whether the generated summary includes any hallucinations. The secondary LLM is used to evaluate the summaries at scale.
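The shape of such a reference-free check is sketched below: a judge prompt pairs the transcription and generated summary with yes/no criteria, which a secondary LLM then answers. The rubric wording is illustrative, not the team's actual evaluation prompt.

```python
# A minimal sketch of an LLM-as-judge prompt for reference-free summary evaluation.
EVALUATION_RUBRIC = """Answer each question with yes or no and a one-sentence justification:
1. Is the summary simple to understand?
2. Does it contain all important events from the transcription?
3. Does it contain any statements not supported by the transcription (hallucinations)?"""

def build_judge_prompt(transcription: str, summary: str) -> str:
    """Assemble the evaluation prompt sent to the secondary (judge) LLM."""
    return f"{EVALUATION_RUBRIC}\n\nTranscription:\n{transcription}\n\nSummary:\n{summary}"
```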

Results and lessons learned

The implementation of the AI-powered metadata pipeline has been a transformative journey for DPG Media. Their approach saves days of work generating metadata for a TV series.

DPG Media chose Amazon Transcribe for its ease of transcription and low maintenance, with the added benefit of incremental improvements by AWS over the years. For metadata generation, DPG Media chose Anthropic Claude 3 Sonnet on Amazon Bedrock, instead of building direct integrations to various model providers. The flexibility to experiment with multiple models was appreciated, and there are plans to try out Anthropic Claude Opus when it becomes available in their desired AWS Region.

DPG Media decided to strike a balance between AI and human expertise by having the results generated by the pipeline validated by humans. This approach was chosen because the results would be exposed to end-customers, and AI systems can sometimes make mistakes. The goal was not to replace people but to enhance their capabilities through a combination of human curation and automation.

Transforming the video viewing experience is not merely about adding more descriptions; it's about creating a richer, more engaging user experience. By implementing AI-driven processes, DPG Media aims to offer better-recommended content to users, foster a deeper understanding of its content library, and progress towards more automated and efficient annotation systems. This evolution promises not only to streamline operations but also to align content delivery with modern consumption habits and technological advancements.

Conclusion

In this post, we shared how DPG Media introduced AI-powered processes using Amazon Bedrock into its video publication pipelines. This solution can help accelerate audio metadata extraction, create a more engaging user experience, and save time.

We encourage you to learn more about how to gain a competitive advantage with powerful generative AI applications by visiting Amazon Bedrock and trying this solution out on a dataset relevant to your business.


About the Authors

Lucas Desard is a GenAI Engineer at DPG Media. He helps DPG Media integrate generative AI efficiently and meaningfully into various company processes.

Tom Lauwers is a machine learning engineer on the video personalization team at DPG Media. He builds and architects the recommendation systems for DPG Media's long-form video platforms, supporting brands like VTM GO, Streamz, and RTL play.

Sam Landuydt is the Area Manager Recommendation & Search at DPG Media. As the manager of the team, he guides ML and software engineers in building recommendation systems and generative AI solutions for the company.

Irina Radu is a Prototyping Engagement Manager, part of AWS EMEA Prototyping and Cloud Engineering. She helps customers get the most out of the latest tech, innovate faster, and think bigger.

Fernanda Machado, AWS Prototyping Architect, helps customers bring ideas to life and use the latest best practices for modern applications.

Andrew Shved, Senior AWS Prototyping Architect, helps customers build business solutions that use innovations in modern applications, big data, and AI.

Read More