Detecting and redacting PII using Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships, such as people, places, sentiments, and topics, in unstructured text. You can now use Amazon Comprehend ML capabilities to detect and redact personally identifiable information (PII) in customer emails, support tickets, product reviews, social media posts, and more, with no ML experience required. For example, you can analyze support tickets and knowledge articles to detect PII entities and redact the text before you index the documents in a search solution, so the indexed documents are free of PII. Redacting PII entities helps you protect privacy and comply with local laws and regulations.

Customer use case: TeraDact Solutions

TeraDact Solutions has already put this new feature to work. TeraDact Solutions’ software offers a robust alternative for secure information sharing in a world of ever-increasing compliance and privacy concerns. With its signature Information Identification & Presentation (IIaP™) capabilities, TeraDact’s tools provide the user with a safe information sharing environment. “Using Amazon Comprehend for PII redaction with our tokenization system not only helps us reach a larger set of our customers but also helps us overcome the shortcomings of rules-based PII detection which can result in false alarms or missed details. PII detection is critical for businesses and with the power of context-aware NLP models from Comprehend we can uphold the trust customers place in us with their information. Amazon is innovating in ways to help push our business forward by adding new features which are critical to our business thereby providing enhanced service to 100% of customers able to access Comprehend in AWS.” said Chris Schrichte, CEO, TeraDact Solutions, Inc.

In this post, I cover how to use Amazon Comprehend to detect PII and redact the PII entities via the AWS Management Console and the AWS Command Line Interface (AWS CLI).

Detecting PII in Amazon Comprehend

When you analyze text using Amazon Comprehend real-time analysis, Amazon Comprehend automatically identifies PII entities in the following categories:

  • Financial: BANK_ACCOUNT_NUMBER, BANK_ROUTING, CREDIT_DEBIT_NUMBER, CREDIT_DEBIT_CVV, CREDIT_DEBIT_EXPIRY, PIN
  • Personal: NAME, ADDRESS, PHONE, EMAIL, AGE
  • Technical security: USERNAME, PASSWORD, URL, AWS_ACCESS_KEY, AWS_SECRET_KEY, IP_ADDRESS, MAC_ADDRESS
  • National: SSN, PASSPORT_NUMBER, DRIVER_ID
  • Other: DATE_TIME

For each detected PII entity, you get the entity type, a confidence score, and begin and end offsets. These offsets help you locate the PII entities in your documents so you can redact them before the text reaches secure storage or downstream solutions.
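
The following minimal Python (boto3) sketch shows one way to apply those offsets; the sample text is illustrative. It calls detect_pii_entities and replaces each detected span with its entity type, working backwards so earlier offsets stay valid while the string changes:

import boto3

comprehend = boto3.client("comprehend")

text = "My name is Jane Doe and my email address is jane.doe@example.com."

# Detect PII entities in the input text
response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

# Replace each detected span with its entity type, iterating from the end of the
# string so the BeginOffset/EndOffset values of earlier entities remain valid
redacted = text
for entity in sorted(response["Entities"], key=lambda e: e["BeginOffset"], reverse=True):
    redacted = (
        redacted[: entity["BeginOffset"]]
        + "[" + entity["Type"] + "]"
        + redacted[entity["EndOffset"]:]
    )

print(redacted)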

Analyzing text on the Amazon Comprehend console

To get started with Amazon Comprehend, all you need is an AWS account. To use the console, complete the following steps:

  1. On the Amazon Comprehend console, in the Input text section, select Built-in.
  2. For Input text, enter your text.
  3. Choose Analyze.

  4. On the Insights page, choose the PII tab.

The PII tab shows color-coded text to indicate different PII entity types, such as name, email, address, phone, and others. The Results section shows more information about the text. Each entry shows the PII entity, its type, and the level of confidence Amazon Comprehend has in this analysis.

Analyzing text via the AWS CLI

To perform real-time analysis using the AWS CLI, enter the following code:

aws comprehend detect-pii-entities \
--language-code en \
--text \
"Good morning, everybody. My name is Van Bokhorst Serdar, and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address SerdarvanBokhorst@dayrep.com. My address is 2657 Koontz Lane, Los Angeles, CA. My phone number is 818-828-6231. My Social security number is 548-95-6370. My Bank account number is 940517528812 and routing number 195991012. My credit card number is 5534816011668430, Expiration Date 6/1/2022, my C V V code is 121, and my pin 123456. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this document. Let's check."

The JSON response lists the detected PII entities. For each entity, the service returns the PII type, a confidence score, BeginOffset, and EndOffset. See the following code:

{
    "Entities": [
        {
            "Score": 0.9996334314346313,
            "Type": "NAME",
            "BeginOffset": 36,
            "EndOffset": 55
        },
        {
            "Score": 0.9999902248382568,
            "Type": "EMAIL",
            "BeginOffset": 167,
            "EndOffset": 195
        },
        {
            "Score": 0.9999983310699463,
            "Type": "ADDRESS",
            "BeginOffset": 211,
            "EndOffset": 245
        },
        {
            "Score": 0.9999997615814209,
            "Type": "PHONE",
            "BeginOffset": 265,
            "EndOffset": 277
        },
        {
            "Score": 0.9999996423721313,
            "Type": "SSN",
            "BeginOffset": 308,
            "EndOffset": 319
        },
        {
            "Score": 0.9999984502792358,
            "Type": "BANK_ACCOUNT_NUMBER",
            "BeginOffset": 347,
            "EndOffset": 359
        },
        {
            "Score": 0.9999974966049194,
            "Type": "BANK_ROUTING",
            "BeginOffset": 379,
            "EndOffset": 388
        },
        {
            "Score": 0.9999991655349731,
            "Type": "CREDIT_DEBIT_NUMBER",
            "BeginOffset": 415,
            "EndOffset": 431
        },
        {
            "Score": 0.9923601746559143,
            "Type": "CREDIT_DEBIT_EXPIRY",
            "BeginOffset": 449,
            "EndOffset": 457
        },
        {
            "Score": 0.9999997615814209,
            "Type": "CREDIT_DEBIT_CVV",
            "BeginOffset": 476,
            "EndOffset": 479
        },
        {
            "Score": 0.9998345375061035,
            "Type": "PIN",
            "BeginOffset": 492,
            "EndOffset": 498
        }
    ]
}

Asynchronous PII redaction batch processing on the Amazon Comprehend console

You can redact documents by using Amazon Comprehend asynchronous operations. Choose the redaction mode Replace with PII entity type to replace each PII entity with its entity type, or choose the redaction mode Replace with character to mask the characters in PII entities with a character of your choice (!, #, $, %, &, *, or @).

To analyze and redact large documents and large collections of documents, ensure that the documents are stored in an Amazon Simple Storage Service (Amazon S3) bucket and start an asynchronous operation to detect and redact PII in the documents. The results of the analysis are returned in an S3 bucket.

  1. On the Amazon Comprehend console, choose Analysis jobs.
  2. Choose Create job.

  3. On the Create analysis job page, for Name, enter a name (for this post, we enter comprehend-blog-redact-01).
  4. For Analysis type, choose Personally identifiable information (PII).
  5. For Language, choose English.
  6. In the PII detection settings section, for Output mode, select Redactions.
  7. Expand PII entity types and select the entity types to redact.
  8. For Redaction mode, choose Replace with PII entity type.

Alternatively, you can choose Replace with character to replace PII entities with a character of your choice (!, #, $, %, &, *, or @).

  9. In the Input data section, for Data source, select My documents.
  10. For S3 location, enter the S3 path for pii-s3-input.txt.

This text file has the same example content we used earlier for real-time analysis.

  11. In the Output data section, for S3 location, enter the path to the output folder in Amazon S3.

Make sure you choose the correct input and output paths based on how you organized the document.

  12. In the Access permissions section, for IAM role, select Create an IAM role.

You need an AWS Identity and Access Management (IAM) role with the required permissions to access the input and output S3 buckets for the job; the role is created and propagated for you.

  13. For Permissions to access, choose Input and Output S3 buckets.
  14. For Name suffix, enter a suffix for your role (for this post, we enter ComprehendPIIRole).
  15. Choose Create job.

You can see the job comprehend-blog-redact-01 with the job status In progress.

When the job status changes to Completed, you can access the output file to view the output. The pii-s3-input.txt file has the same example content we used earlier, and the Replace with PII entity type redaction mode replaces each PII entity with its entity type. Your output looks like the following text:

Good morning, everybody. My name is [NAME], and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address [EMAIL]. My address is [ADDRESS] My phone number is [PHONE]. My Social security number is [SSN]. My Bank account number is [BANK-ACCOUNT-NUMBER] and routing number [BANK-ROUTING]. My credit card number is [CREDIT-DEBIT-NUMBER], Expiration Date [CREDIT-DEBIT-EXPIRY], my C V V code is [CREDIT-DEBIT-CVV], and my pin [PIN]. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this document. Let's check.

If you have very long entity types, you may prefer to mask the PII with a character instead. If you choose to replace PII with the character *, your output looks like the following text:

Good morning, everybody. My name is *******************, and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address ****************************. My address is ********************************** My phone number is ************. My Social security number is ***********. My Bank account number is ************ and routing number *********. My credit card number is ****************, Expiration Date ********, my C V V code is ***, and my pin ******. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this document. Let's check.

Asynchronous PII redaction batch processing via the AWS CLI

To perform the PII redaction job using the AWS CLI, enter the following code:

aws comprehend start-pii-entities-detection-job \
 --input-data-config S3Uri="s3://ai-ml-services-lab/public/labs/comprehend/pii/input/redact/pii-s3-input.txt" \
 --output-data-config S3Uri="s3://ai-ml-services-lab/public/labs/comprehend/pii/output/redact/" \
 --mode "ONLY_REDACTION" \
 --redaction-config PiiEntityTypes="BANK_ACCOUNT_NUMBER","BANK_ROUTING","CREDIT_DEBIT_NUMBER","CREDIT_DEBIT_CVV","CREDIT_DEBIT_EXPIRY","PIN","EMAIL","ADDRESS","NAME","PHONE","SSN",MaskMode="REPLACE_WITH_PII_ENTITY_TYPE" \
 --data-access-role-arn "arn:aws:iam::<ACCOUNTID>:role/service-role/AmazonComprehendServiceRole-ComprehendPIIRole" \
 --job-name "comprehend-blog-redact-001" \
 --language-code "en"

The request yields the following output:

{
    "JobId": "e41101e2f0919a320bc0583a50f86b5f",
    "JobStatus": "SUBMITTED"
}

To monitor the job request, enter the following code:

aws comprehend describe-pii-entities-detection-job --job-id "e41101e2f0919a320bc0583a50f86b5f"

The following output shows that the job is complete:

{
    "PiiEntitiesDetectionJobProperties": {
        "JobId": "e41101e2f0919a320bc0583a50f86b5f",
        "JobName": "comprehend-blog-redact-001",
        "JobStatus": "COMPLETED",
        "SubmitTime": <SubmitTime>,
        "EndTime": <EndTime>,
        "InputDataConfig": {
            "S3Uri": "s3://ai-ml-services-lab/public/labs/comprehend/pii/input/redact/pii-s3-input.txt",
            "InputFormat": "ONE_DOC_PER_LINE"
        },
        "OutputDataConfig": {
            "S3Uri": "s3://ai-ml-services-lab/public/labs/comprehend/pii/output/redact/<AccountID>-PII-e41101e2f0919a320bc0583a50f86b5f/output/"
        },
        "RedactionConfig": {
            "PiiEntityTypes": [
                "BANK_ACCOUNT_NUMBER",
                "BANK_ROUTING",
                "CREDIT_DEBIT_NUMBER",
                "CREDIT_DEBIT_CVV",
                "CREDIT_DEBIT_EXPIRY",
                "PIN",
                "EMAIL",
                "ADDRESS",
                "NAME",
                "PHONE",
                "SSN"
            ],
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE"
        },
        "LanguageCode": "en",
        "DataAccessRoleArn": "arn:aws:iam::<AccountID>:role/ComprehendBucketAccessRole",
        "Mode": "ONLY_REDACTION"
    }
}

After the job is complete, the output file is plain text, just like the input file. Other Amazon Comprehend asynchronous jobs (such as start-entities-detection-job) produce an output file called output.tar.gz, which is a compressed archive that contains the output of the operation. In contrast, start-pii-entities-detection-job retains the folder and file structure of the input. Our comprehend-blog-redact-001 job's input file pii-s3-input.txt has a corresponding pii-s3-input.txt.out file with the redacted text in the job's output folder. You can find the Amazon S3 location in the output from monitoring the job; the JSON element PiiEntitiesDetectionJobProperties.OutputDataConfig.S3Uri points to the folder that contains pii-s3-input.txt.out with the content redacted by PII entity type.
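
If you prefer to fetch the redacted output programmatically, the following Python (boto3) sketch shows one way to do it. It assumes the job ID from the earlier request and that the job keeps the input file name with a .out suffix, as described above:

import boto3
from urllib.parse import urlparse

comprehend = boto3.client("comprehend")
s3 = boto3.client("s3")

# Job ID returned by start-pii-entities-detection-job (example value from this post)
job = comprehend.describe_pii_entities_detection_job(JobId="e41101e2f0919a320bc0583a50f86b5f")
output_uri = job["PiiEntitiesDetectionJobProperties"]["OutputDataConfig"]["S3Uri"]

# The job keeps the input file name and appends .out
parsed = urlparse(output_uri)
bucket, prefix = parsed.netloc, parsed.path.lstrip("/")
s3.download_file(bucket, prefix + "pii-s3-input.txt.out", "pii-s3-input.txt.out")
print(open("pii-s3-input.txt.out").read())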

Conclusion

As of this writing, the PII detection feature in Amazon Comprehend is available for US English in the following Regions:

  • US East (Ohio)
  • US East (N. Virginia)
  • US West (Oregon)
  • Asia Pacific (Mumbai)
  • Asia Pacific (Seoul)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Tokyo)
  • EU (Frankfurt)
  • EU (Ireland)
  • EU (London)
  • AWS GovCloud (US-West)

Take a look at the pricing page, give the feature a try, and please send us feedback either via the AWS forum for Amazon Comprehend or through your usual AWS support contacts.


About the Author

Sriharsha M S is an AI/ML specialist solutions architect in the Strategic Specialist team at Amazon Web Services. He works with strategic AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement AI/ML applications at scale. His expertise spans application architecture, big data, analytics, and machine learning.

Read More

Build alerting and human review for images using Amazon Rekognition and Amazon A2I

The volume of user-generated content (UGC) and third-party content has been increasing substantially in sectors like social media, ecommerce, online advertising, and photo sharing. However, such content needs to be reviewed to ensure that end-users aren’t exposed to inappropriate or offensive material, such as nudity, violence, adult products, or disturbing images. Today, some companies simply react to user complaints to take down offensive images, ads, or videos, whereas many employ teams of human moderators to review small samples of content. However, human moderators alone can’t scale to meet these needs, leading to a poor user experience or even a loss of brand reputation.

With Amazon Rekognition, you can automate or streamline your image and video analysis workflows using machine learning (ML). Amazon Rekognition provides an image moderation API that can detect unsafe or inappropriate content containing nudity, suggestiveness, violence, and more. You get a hierarchical taxonomy of labels that you can use to define your business rules, without needing any ML experience. Each detection by Amazon Rekognition comes with a confidence score between 0 and 100, which provides a measure of how confident the ML model is in its prediction.

Content moderation still requires human reviewers to audit results and judge nuanced situations where AI may not be certain in its prediction. Combining machine predictions with human judgment and managing the infrastructure needed to set up such workflows is hard, expensive, and time-consuming to do at scale. This is why we built Amazon Augmented AI (Amazon A2I), which lets you implement a human review of ML predictions and is directly integrated with Amazon Rekognition. Amazon A2I allows you to use in-house, private, or third-party vendor workforces with a web interface that has instructions and tools they need to complete their review tasks.

You can easily set up the criteria that triggers a human review of a machine prediction; for example, you can send an image for further human review if Amazon Rekognition's confidence score is between 50 and 90. Amazon Rekognition handles the bulk of the work and makes sure that every image gets scanned, and Amazon A2I helps send the remaining content for further review to best utilize human judgment. Together, this helps ensure that you get full moderation coverage while maintaining very high accuracy, at a fraction of the cost to review each image manually.

In this post, we show you how to use Amazon Rekognition image moderation APIs to automatically detect explicit adult, suggestive, violent, and disturbing content in an image and use Amazon A2I to onboard human workforces, set up human review thresholds of the images, and define human review tasks. When these conditions are met, images are sent to human reviewers for further review, which is performed according to the instructions in the human review task definition.

Prerequisites

This post requires you to complete the following prerequisites:

  • Create an AWS Identity and Access Management (IAM) role. To create a human review workflow, you need to provide an IAM role that grants Amazon A2I permission to access Amazon Simple Storage Service (Amazon S3) for reading objects to render in a human task UI and writing the results of the human review. This role also needs an attached trust policy to give Amazon SageMaker permission to assume the role. This allows Amazon A2I to perform actions in accordance with permissions that you attach to the role. For example policies that you can modify and attach to the role you use to create a flow definition, see Add Permissions to the IAM Role Used to Create a Flow Definition.
  • Configure permission to invoke the Amazon Rekognition DetectModerationLabels API operation. You need to attach the AmazonRekognitionFullAccess policy to the execution role of the AWS Lambda function that calls the Amazon Rekognition detect_moderation_labels API.
  • Grant the Lambda function Amazon S3 Get and Put permissions if you want Lambda to use Amazon S3 to access images for analysis.
  • Give the Lambda function's execution role AmazonSageMakerFullAccess so it can access the Amazon A2I services for the human review.

Creating a private work team

A work team is a group of people that you select to review your documents. You can create a work team from a workforce, which is made up of Amazon Mechanical Turk workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this post, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.

To create your private work team, complete the following steps:

  1. Navigate to the Labeling workforces page on the Amazon SageMaker console.
  2. On the Private tab, choose Create private team.
  3. For Team name, enter an appropriate team name.
  4. For Add workers, you can choose to add workers to your workforce by importing workers from an existing user group in Amazon Cognito or by inviting new workers by email.

For this post, we suggest adding workers by email. If you create a workforce using an existing Amazon Cognito user group, be sure that you can access an email address in that workforce to complete this use case.

  5. Choose Create private team.
  6. On the Private tab, choose the work team you just created to view your work team ARN.
  7. Record the ARN to use when you create a flow definition in the next section.

After you create the private team, you get an email invitation. The following screenshot shows an example email.

  8. Choose the link to log in and change your password.

You’re now registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.

Your one-person team is now ready, and you can create a human review workflow.

Creating a human review workflow

In this step, you create a human review workflow, where you specify your work team, identify where you want output data to be stored in Amazon S3, and create instructions to help workers complete your document review task.

To create a human review workflow, complete the following:

  1. In the Augmented AI section on the Amazon SageMaker console, navigate to the Human review workflows page.
  2. Choose Create human review workflow.

On this page, you configure your workflow.

  3. Enter a name for your workflow.
  4. Choose an S3 bucket where you want Amazon A2I to store the output of the human review.
  5. Choose an IAM role for the workflow.

You can create a new role automatically with Amazon S3 access and an Amazon SageMaker execution policy attached, or you can choose a role that already has these permissions attached.

  6. In the Task type section, select Rekognition – Image moderation.
  7. In the Amazon Rekognition-Image Moderation – Conditions for invoking human review section, you can specify conditions that trigger a human review.

For example, if the confidence of the output label produced by Amazon Rekognition is within the range provided (70–100, for this use case), the image is sent to the portal for human review. You can also select different confidence thresholds for each image moderation output label through the Amazon A2I APIs.
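
If you create the human review workflow programmatically instead of on the console, you can pass these conditions as a JSON document when you call create_flow_definition. The following Python (boto3) sketch is illustrative only; the label names, thresholds, ARNs, and S3 path are example values, not resources created in this post:

import json
import boto3

sagemaker_client = boto3.client("sagemaker")

# Illustrative per-label thresholds -- the label names and values are examples only
activation_conditions = {
    "Conditions": [
        {
            "ConditionType": "ModerationLabelConfidenceCheck",
            "ConditionParameters": {"ModerationLabelName": "Violence", "ConfidenceLessThan": 90},
        },
        {
            "ConditionType": "ModerationLabelConfidenceCheck",
            "ConditionParameters": {"ModerationLabelName": "Suggestive", "ConfidenceLessThan": 98},
        },
    ]
}

sagemaker_client.create_flow_definition(
    FlowDefinitionName="a2i-rekognition-wf",               # example name
    RoleArn="arn:aws:iam::111122223333:role/A2IFlowRole",  # example role
    HumanLoopRequestSource={
        "AwsManagedHumanLoopRequestSource": "AWS/Rekognition/DetectModerationLabels/Image/V3"
    },
    HumanLoopActivationConfig={
        "HumanLoopActivationConditionsConfig": {
            "HumanLoopActivationConditions": json.dumps(activation_conditions)
        }
    },
    HumanLoopConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/my-team",
        "HumanTaskUiArn": "arn:aws:sagemaker:us-east-1:111122223333:human-task-ui/my-task-ui",
        "TaskCount": 1,
        "TaskTitle": "Review image moderation labels",
        "TaskDescription": "Confirm or correct the moderation labels for this image",
    },
    OutputConfig={"S3OutputPath": "s3://AWSDOC-EXAMPLE-BUCKET/a2i-output/"},
)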

  8. In the Worker task template creation section, if you already have an A2I worker task template, you can choose Use your own template. Otherwise, select Create from a default template and enter a name and task description. For this use case, you can use the default worker instructions provided.
  9. In the Workers section, select Private.
  10. For Private teams, choose the private work team you created earlier.
  11. Choose Create.

You’re redirected to the Human review workflows page, where you can see the name and ARN of the human review workflow you just created.

  12. Record the ARN to use in the next section.

Configuring Lambda to run Amazon Rekognition

In this step, you create a Lambda function to call the Amazon Rekognition API detect_moderation_labels. You use the HumanLoopConfig parameter of detect_moderation_labels to integrate an Amazon A2I human review workflow into your Amazon Rekognition image moderation job.

  1. On the Lambda console, create a new function called A2IRegok.
  2. For Runtime, choose Python 3.7.
  3. Under Permission, choose Use an existing role.
  4. Choose the role you created.
  5. In the Function code section, remove the function code and replace it with the following code.
    1. Inside the Lambda function, import two libraries: uuid and boto3.
    2. Modify the function code as follows:
      1. Replace the FlowDefinitionArn in line 12 with the one you saved in the last step.
      2. On line 13, provide a unique name to the HumanLoopName or use uuid to generate a unique ID.
      3. You use the detect_moderation_labels API operation to analyze the picture (JPG, PNG). To use the picture from the Amazon S3 bucket, specify the bucket name and key of the file inside the API call as shown in lines 7 and 8.
  1 import boto3
  2 import uuid
  3
  4 def lambda_handler(event, context):
  5     if event:
  6
  7         bucket_name = "a2idemorekog"  # Add your source bucket name
  8         src_filename = "1.png"        # Add the source filename
  9         rekognition = boto3.client('rekognition')
 10         human_loop_unique_id = str(uuid.uuid4()) + '1'
 11         humanLoopConfig = {
 12             'FlowDefinitionArn': "arn:aws:sagemaker:us-east-1:123456789123:flow-definition/a2i-rekognition-wf",
 13             'HumanLoopName': human_loop_unique_id
 14         }
 15
 16         response = rekognition.detect_moderation_labels(
 17             Image={
 18                 "S3Object": {
 19                     "Bucket": bucket_name,
 20                     "Name": src_filename,
 21                 }
 22             },
 23             HumanLoopConfig=humanLoopConfig
 24         )
 25         return response

Calling Amazon Rekognition using Lambda

To configure and run a serverless function, complete the following steps:

  1. On the Lambda console, choose your function.
  2. Choose Configure test events from the drop-down menu.

An editor appears where you can enter an event to test your function.

  3. On the Configure test event page, select Create new test event.
  4. For Event template, choose hello-world.
  5. For Event name, enter a name; for example, DemoEvent.
  6. You can change the values in the sample JSON. For this use case, no change is needed.

For more information, see Run a Serverless “Hello, World!” and Create a Lambda function with the console.

  7. Choose Create.
  8. To run the function, choose Test.

When the test is complete, you can view the results on the console:

  • Execution result – Verifies that the test succeeded
  • Summary – Shows the key information reported in the log output
  • Log output – Shows the logs the Lambda function generated

The response to this call contains the inference from Amazon Rekognition and the evaluated activation conditions that may or may not have led to a human loop creation. If a human loop is created, the output contains HumanLoopArn. You can track its status using the Amazon A2I API operation DescribeHumanLoop.
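
The following Python (boto3) sketch shows one way to check that status; the human loop name here is an example value, so use the one you passed in HumanLoopConfig:

import boto3

a2i_runtime = boto3.client("sagemaker-a2i-runtime")

# The HumanLoopName you passed in HumanLoopConfig (example value shown here)
human_loop_name = "389fd1a7-c658-4020-8f73-e9afcbfa8fd31"

loop = a2i_runtime.describe_human_loop(HumanLoopName=human_loop_name)
print(loop["HumanLoopStatus"])                             # e.g. InProgress or Completed
print(loop.get("HumanLoopOutput", {}).get("OutputS3Uri"))  # location of output.json when complete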

Completing a human review of your image

To complete a human review of your image, complete the following steps:

  1. Open the URL in the email you received.

You see a list of reviews you are assigned to.

  2. Choose the image you want to review.
  3. Choose Start working.

After you start working, you must complete the task within 60 minutes.

  4. Choose an appropriate category for the image.

Before choosing Submit, if you go to the Human review workflow page on the Amazon SageMaker console and choose the human review workflow you created, you can see a Human loops summary section for that workflow.

  5. In your worker portal, when you're done working, choose Submit.

After you complete your job, the status of the human loop workflow is updated.

If you navigate back to the Human review workflow page, you can see the human loop you just completed has the status Completed.

Processing the output

The output data from your review is stored in the S3 bucket you specified when you configured your human review workflow on the Amazon A2I console. The path to the data uses the following pattern: YYYY/MM/DD/hh/mm/ss.

The output file (output.json) is structured as follows:

{
    "awsManagedHumanLoopRequestSource": "AWS/Rekognition/DetectModerationLabels/Image/V3",
    "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/a2i-rekog-blog",
    "humanAnswers": [
        {
            "answerContent": {
                "AWS/Rekognition/DetectModerationLabels/Image/V3": {
                    "moderationLabels": [
                        {
                            "name": "Weapon Violence",
                            "parentName": "Violence"
                        },
                        {
                            "name": "Violence",
                            "parentName": ""
                        }
                    ]
                }
            },
            "submissionTime": "2020-05-27T15:44:39.726Z",
            "workerId": "000cd1c234b5fcc7",
            "workerMetadata": {
                "identityData": {
                    "identityProviderType": "Cognito",
                    "issuer": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_00aa00a",
                    "sub": "b000a000-0b00-0ae0-bf00-0000f0bfd00d"
                }
            }
        }
    ],
    "humanLoopName": "389fd1a7-c658-4020-8f73-e9afcbfa8fd31",
    "inputContent": {
        "aiServiceRequest": {
            "humanLoopConfig": {
                "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/a2i-rekog-blog",
                "humanLoopName": "389fd1a7-c658-4020-8f73-e9afcbfa8fd31"
            },
            "image": {
                "s3Object": {
                    "bucket": "AWSDOC-EXAMPLE-BUCKET",
                    "name": "1.png"
                }
            }
        },
        "aiServiceResponse": {
            "moderationLabels": [
                {
                    "confidence": 80.41172,
                    "name": "Weapon Violence",
                    "parentName": "Violence"
                },
                {
                    "confidence": 80.41172,
                    "name": "Violence",
                    "parentName": ""
                }
            ],
            "moderationModelVersion": "3.0"
        },
        "humanTaskActivationConditionResults": {
            "Conditions": [
                {
                    "And": [
                        {
                            "ConditionParameters": {
                                "ConfidenceLessThan": 100,
                                "ModerationLabelName": "*"
                            },
                            "ConditionType": "ModerationLabelConfidenceCheck",
                            "EvaluationResult": true
                        },
                        {
                            "ConditionParameters": {
                                "ConfidenceGreaterThan": 60,
                                "ModerationLabelName": "*"
                            },
                            "ConditionType": "ModerationLabelConfidenceCheck",
                            "EvaluationResult": true
                        }
                    ],
                    "EvaluationResult": true
                }
            ]
        },
        "selectedAiServiceResponse": {
            "moderationLabels": [
                {
                    "confidence": 80.4117202758789,
                    "name": "Weapon Violence",
                    "parentName": "Violence"
                },
                {
                    "confidence": 80.4117202758789,
                    "name": "Violence",
                    "parentName": ""
                }
            ],
            "moderationModelVersion": "3.0"
        }
    }
}

In this JSON object, you have all the input and output content in one place so that you can parse one file to get the following:

  • humanAnswers – Contains answerContent, which lists the labels chosen by the human reviewer, and workerMetadata, which contains information that you can use to track private workers
  • inputContent – Contains information about the input data object that was reviewed, including the Amazon Rekognition request and response and the activation conditions that triggered the human loop

For more information about the location and format of your output data, see Monitor and Manage Your Human Loop.
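
As a simple illustration, the following Python sketch parses a locally downloaded copy of output.json and prints the reviewer's labels alongside the model's predictions; the local file name is assumed:

import json

# Parse a downloaded copy of output.json and compare human answers with model predictions
with open("output.json") as f:
    result = json.load(f)

for answer in result["humanAnswers"]:
    labels = answer["answerContent"]["AWS/Rekognition/DetectModerationLabels/Image/V3"]["moderationLabels"]
    print("Worker", answer["workerId"], "selected:")
    for label in labels:
        print("  ", label["name"], "(parent:", label["parentName"], ")")

for label in result["inputContent"]["selectedAiServiceResponse"]["moderationLabels"]:
    print("Model predicted:", label["name"], "with confidence", label["confidence"])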

Conclusion

This post has merely scratched the surface of what Amazon A2I can do. Amazon A2I is available in 12 Regions. For more information, see Region Table. To learn more about the Amazon Rekognition DetectModerationLabels API integration with Amazon A2I, see Use Amazon Augmented AI with Amazon Rekognition.

For video presentations, sample Jupyter notebooks, or more information about use cases like document processing, object detection, sentiment analysis, text translation, and others, see Amazon Augmented AI Resources.


About the Author

Suresh Patnam is a Solutions Architect at AWS. He helps customers innovate on the AWS platform by building highly available, scalable, and secure architectures on Big Data and AI/ML. In his spare time, Suresh enjoys playing tennis and spending time with his family.

Read More

Serving PyTorch models in production with the Amazon SageMaker native TorchServe integration

In April 2020, AWS and Facebook announced the launch of TorchServe to allow researchers and machine learning (ML) developers from the PyTorch community to bring their models to production more quickly and without needing to write custom code. TorchServe is an open-source project that answers the industry question of how to go from a notebook to production using PyTorch, and customers around the world, such as Matroid, are experiencing the benefits firsthand. Similarly, over 10,000 customers have adopted Amazon SageMaker to quickly build, train, and deploy ML models at scale, and many of them have made it their standard platform for ML. From a model serving perspective, Amazon SageMaker abstracts all the infrastructure-centric heavy lifting and allows you to deliver low-latency predictions securely and reliably to millions of concurrent users around the world.

TorchServe’s native integration with Amazon SageMaker

AWS is excited to announce that TorchServe is now natively supported in Amazon SageMaker as the default model server for PyTorch inference. Previously, you could use TorchServe with Amazon SageMaker by installing it on a notebook instance and starting a server to perform local inference or by building a TorchServe container and referencing its image to create a hosted endpoint. However, full notebook installations can be time-intensive, and some data scientists and ML developers prefer not to manage all the steps and AWS Identity and Access Management (IAM) permissions involved in building the Docker container, storing the image in Amazon Elastic Container Registry (Amazon ECR), uploading the model to Amazon Simple Storage Service (Amazon S3), and deploying the model endpoint. With this release, you can use the native Amazon SageMaker SDK to serve PyTorch models with TorchServe.

To support TorchServe natively in Amazon SageMaker, the AWS engineering teams submitted pull requests to the aws/sagemaker-pytorch-inference-toolkit and the aws/deep-learning-containers repositories. After these were merged, we could use TorchServe via the Amazon SageMaker APIs for PyTorch inference. This change introduces a tighter integration with the PyTorch community. As more features related to the TorchServe serving framework are released in the future, they are tested, ported over, and made available as an AWS Deep Learning Container image. It’s important to note that our implementation hides the .mar from the user while still using the Amazon SageMaker PyTorch API everyone is used to.

The TorchServe architecture in Amazon SageMaker

You can use TorchServe natively with Amazon SageMaker through the following steps:

  1. Create a model in Amazon SageMaker. By creating a model, you tell Amazon SageMaker where it can find the model components. This includes the Amazon S3 path where the model artifacts are stored and the Docker registry path for the Amazon SageMaker TorchServe image. In subsequent deployment steps, you specify the model by name. For more information, see CreateModel.
  2. Create an endpoint configuration for an HTTPS endpoint. You specify the name of one or more models in production variants and the ML compute instances that you want Amazon SageMaker to launch to host each production variant. When hosting models in production, you can configure the endpoint to elastically scale the deployed ML compute instances. For each production variant, you specify the number of ML compute instances that you want to deploy. When you specify two or more instances, Amazon SageMaker launches them in multiple Availability Zones. This provides continuous availability. Amazon SageMaker manages deploying the instances. For more information, see CreateEndpointConfig.
  3. Create an HTTPS endpoint. Provide the endpoint configuration to Amazon SageMaker. The service launches the ML compute instances and deploys the model or models as specified in the configuration. For more information, see CreateEndpoint. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint. For more information about the API, see InvokeEndpoint.
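
If you want to see what these steps look like at the API level, the following boto3 sketch walks through the same three calls; the model name, role ARN, container image URI, and artifact path are placeholders, not values created in this post:

import boto3

sm = boto3.client("sagemaker")

# 1. Create the model: the S3 model artifact plus the SageMaker PyTorch (TorchServe) serving image
sm.create_model(
    ModelName="roberta-model",
    ExecutionRoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # example role
    PrimaryContainer={
        "Image": "<pytorch-inference-image-uri>",            # placeholder Deep Learning Container URI
        "ModelDataUrl": "s3://my-bucket/model/model.tar.gz",  # placeholder artifact location
    },
)

# 2. Create an endpoint configuration with a single production variant
sm.create_endpoint_config(
    EndpointConfigName="roberta-model-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "roberta-model",
        "InstanceType": "ml.m5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

# 3. Create the HTTPS endpoint from the configuration
sm.create_endpoint(EndpointName="roberta-model", EndpointConfigName="roberta-model-config")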

The Amazon SageMaker Python SDK simplifies these steps as we will demonstrate in the following example notebook.

Using a fine-tuned HuggingFace base transformer (RoBERTa)

For this post, we use a HuggingFace transformer, which provides us with a general-purpose architecture for Natural Language Understanding (NLU). Specifically, we present you with a RoBERTa base transformer that was fine-tuned to perform sentiment analysis. The pre-trained checkpoint loads the additional head layers, and the model outputs positive, neutral, or negative sentiment for the input text.

Deploying a CloudFormation Stack and verifying notebook creation

You will deploy an ml.m5.xlarge Amazon SageMaker notebook instance. For more information about pricing, see Amazon SageMaker Pricing.

  1. Sign in to the AWS Management Console.
  2. Launch the CloudFormation template in one of the following Regions: N. Virginia (us-east-1), Ireland (eu-west-1), or Singapore (ap-southeast-1).

You can launch this stack for any Region by updating the hyperlink’s Region value.

  3. In the Capabilities and transforms section, select the three acknowledgement boxes.
  4. Choose Create stack.

Your CloudFormation stack takes about 5 minutes to complete creating the Amazon SageMaker notebook instance and its IAM role.

  5. When the stack creation is complete, check the output on the Resources tab.
  6. On the Amazon SageMaker console, under Notebook, choose Notebook instances.
  7. Locate your newly created notebook and choose Open Jupyter.

Accessing the Lab

From within the notebook instance, navigate to the serving_natively_with_amazon_sagemaker directory and open deploy.ipynb.

You can now run through the steps within the Jupyter notebook:

  1. Set up your hosting environment.
  2. Create your endpoint.
  3. Perform predictions with a TorchServe backend Amazon SageMaker endpoint.

After setting up your hosting environment, creating an Amazon SageMaker endpoint using the native TorchServe estimator is as easy as:

model = PyTorchModel(model_data=model_artifact,
                   name=name_from_base('roberta-model'),
                   role=role, 
                   entry_point='torchserve-predictor.py',
                   source_dir='source_dir',
                   framework_version='1.6.0',
                   predictor_cls=SentimentAnalysis)

endpoint_name = name_from_base('roberta-model')
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=endpoint_name)
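
You can then invoke the endpoint, for example through the SageMaker runtime API. The JSON payload shape below is an assumption; the actual input contract is defined by the handler in torchserve-predictor.py:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
endpoint_name = "<your-roberta-endpoint-name>"  # the endpoint_name value from the cell above

# The JSON payload shape is assumed; torchserve-predictor.py defines the real input format
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"text": "Amazon SageMaker with TorchServe makes deployment straightforward."}),
)
print(response["Body"].read().decode())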

Cleaning Up

When you’re finished with this lab, your Amazon SageMaker endpoint should have already been deleted. If not, complete the following steps to delete it:

  1. On the Amazon SageMaker console, under Inference, choose Endpoints.
  2. Select the endpoint (it should begin with roberta-model).
  3. From the Actions drop-down menu, choose Delete.

On the AWS CloudFormation console, delete the rest of your environment by choosing the torchserve-on-aws stack and choosing Delete.

You can see two other stacks that were built based on the original CloudFormation template. These are nested stacks and are automatically deleted with the main stack. The cleanup process takes just over 3 minutes to spin down your environment and deletes your notebook instance and the associated IAM role.

Conclusion

As TorchServe continues to evolve around the very specific needs of the PyTorch community, AWS is focused on ensuring that you have a common and performant way to serve models with PyTorch. Whether you’re using Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2), or Amazon Elastic Kubernetes Service (Amazon EKS), you can expect AWS to continue to optimize the backend infrastructure in support of our open-source community. We encourage all of you to submit pull requests and/or create issues in our repositories (TorchServe, AWS Deep learning containers, PyTorch inference toolkit, etc) as needed.


About the Author

As a Principal Solutions Architect, Todd spends his time working with strategic and global customers to define business requirements, provide architectural guidance around specific use cases, and design applications and services that are scalable, reliable, and performant. He has helped launch and scale the reinforcement learning powered AWS DeepRacer service, is a host for the AWS video series “This is My Architecture”, and speaks regularly at AWS re:Invent, AWS Summits, and technology conferences around the world.

Read More

Activity detection on a live video stream with Amazon SageMaker

Live video streams are continuously generated across industries including media and entertainment, retail, and many more. Live events like sports, music, news, and other special events are broadcast for viewers on TV and other online streaming platforms.

AWS customers increasingly rely on machine learning (ML) to generate actionable insights in real time and deliver an enhanced viewing experience or timely alert notifications. For example, AWS Sports explains how leagues, broadcasters, and partners can train teams, engage fans, and transform the business of sports with ML.

Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no ML expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and some pre-defined activities in videos.

However, if you want to use your own video activity dataset and your own model or algorithm, you can use Amazon SageMaker. Amazon SageMaker is a fully managed service that allows you to build, train, and deploy ML models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models.

In this post, you use Amazon SageMaker to automatically detect activities from a custom dataset in a live video stream.

The large volume of live video streams generated needs to be stored, processed in real time, and reviewed by a human team at low latency and low cost. Such pipelines include additional processing steps if specific activities are automatically detected in a video segment, such as penalty kicks in football triggering the generation of highlight clips for viewer replay.

ML inference on a live video stream can have the following challenges:

  • Input source – The live stream video input must be segmented and made available in a data store for consumption with negligible latency.
  • Large payload size – A video segment of 10 seconds can range from 1–15 MB. This is significantly larger than a typical tabular data record.
  • Frame sampling – Video frames must be extracted efficiently, at a sampling rate generally lower than the original video FPS (frames per second), without a loss in output accuracy.
  • Frame preprocessing – Video frames must be preprocessed to reduce image size and resolution without a loss in output accuracy. They also need to be normalized across all the images used in training.
  • External dependencies – External libraries like opencv and ffmpeg with different license types are generally used for image processing.
  • 3D model network – Video models are typically 3D networks with an additional temporal dimension to image networks. They work on 5D input (batch size, time, RGB channel, height, width) and require a large volume of annotated input videos.
  • Low latency, low cost – The output ML inference needs to be generated within an acceptable latency and at a low cost to demonstrate value.

This post describes two phases of an ML lifecycle. In the first phase, you build, train, and deploy a 3D video classification model on Amazon SageMaker. You fine-tune a pre-trained model with a ResNet50 backbone by transfer learning on another dataset and test inference on a sample video segment. The pre-trained model from a well-known model zoo reduces the need for large volumes of annotated input videos and can be adapted for tasks in another domain.

In the second phase, you deploy the fine-tuned model on an Amazon SageMaker endpoint in a production context of an end-to-end solution with a live video stream. The live video stream is simulated by a sample video in a continuous playback loop with AWS Elemental MediaLive. Video segments from the live stream are delivered to Amazon Simple Storage Service (Amazon S3), which invokes the Amazon SageMaker endpoint for inference. The inference payload is the Amazon S3 URI of the video segment. The use of the Amazon S3 URI is efficient because it eliminates the need to serialize and deserialize a large video frame payload over a REST API and transfers the responsibility of ingesting the video to the endpoint. The endpoint inference performs frame sampling and pre-processing followed by video segment classification, and outputs the result to Amazon DynamoDB.

Finally, the solution performance is summarized with respect to latency, throughput, and cost.

For the complete code associated with this post, see the GitHub repo.

Building, training, and deploying an activity detection model with Amazon SageMaker

In this section, you use Amazon SageMaker to build, train, and deploy an activity detection model in a development environment. The following diagram depicts the Amazon SageMaker ML pipeline used to perform the end-to-end model development. The training and inference Docker images are available for any deep learning framework of choice, like TensorFlow, PyTorch, and MXNet. You use the "bring your own script" capability of Amazon SageMaker to provide a custom entrypoint script for training and inference. For this use case, you use transfer learning to fine-tune a pretrained model from a model zoo with another dataset loaded in Amazon S3. You use model artifacts from the training job to deploy an endpoint with automatic scaling, which dynamically adjusts the number of instances in response to changes in the invocation workload.
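
As a rough sketch of what that looks like with the SageMaker Python SDK, the following code configures an MXNet estimator with a custom entry point, trains, and deploys; the script name, S3 path, instance types, and hyperparameters are illustrative, not the exact values used in the accompanying notebook:

import sagemaker
from sagemaker.mxnet import MXNet

role = sagemaker.get_execution_role()

# Illustrative settings -- the entry point, data path, and hyperparameters are placeholders
estimator = MXNet(
    entry_point="train.py",
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.6.0",
    py_version="py3",
    hyperparameters={"epochs": 20, "learning-rate": 0.001},
)

# Fine-tune on the dataset staged in S3, then deploy the resulting model artifact
estimator.fit({"train": "s3://my-bucket/ucf101/train/"})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.2xlarge")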

For this use case, you use two popular activity detection datasets: Kinetics400 and UCF101. Dataset size is a big factor in the performance of deep learning models. Kinetics400 has 306,245 short trimmed videos from 400 action categories. However, you may not have such a high volume of annotated videos in other domains. Training a deep learning model on small datasets from scratch may lead to severe overfitting, which is why you need transfer learning.

UCF101 has 13,320 videos from 101 action categories. For this post, we use UCF101 as a dataset from another domain for transfer learning. The pre-trained model is fine-tuned by replacing the last classification (dense) layer to match the number of classes in the UCF101 dataset. The final test inference is run on new sample videos from Pexels that were unavailable during the training phase. Similarly, you can train good models on activity videos from your own domain without large annotated datasets and with less computing resource utilization.

We chose Apache MXNet as the deep learning framework for this use case because of the availability of the GluonCV toolkit. GluonCV provides implementations of state-of-the-art deep learning algorithms in computer vision. It features training scripts that reproduce state-of-the-art results reported in the latest research papers, a model zoo with a large set of pre-trained models, carefully designed APIs, and easy-to-understand implementations and community support. The action recognition model zoo contains multiple pre-trained models with the Kinetics400 dataset. The following graph shows the inference throughputs vs. validation accuracy and device memory footprint for each of them.

I3D (Inflated 3D Networks) and SlowFast are 3D video classification networks with varying trade-offs between accuracy and efficiency. For this post, we chose the Inflated 3D model (I3D) with ResNet50 backbone trained on the Kinetics400 dataset. It uses 3D convolution to learn spatiotemporal information directly from videos. The input to this model is of the form (N x C x T x H x W), where N is the batch size, C is the number of colour channels, T is the number of frames in the video segment, H is the height of the video frame, and W is the width of the video frame.
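
As a rough illustration of this input shape, the following GluonCV sketch loads the pre-trained model from the model zoo and runs a random 32-frame clip through it; the frame count and 224 x 224 crop are common defaults for this model, but check the GluonCV documentation for your version:

import mxnet as mx
from gluoncv.model_zoo import get_model

# Load the pre-trained I3D network with a ResNet50 backbone (Kinetics400, 400 classes)
net = get_model('i3d_resnet50_v1_kinetics400', pretrained=True)

# A random clip shaped (N x C x T x H x W): 1 clip, 3 colour channels, 32 frames of 224 x 224
dummy_clip = mx.nd.random.uniform(shape=(1, 3, 32, 224, 224))
predictions = net(dummy_clip)
print(predictions.shape)  # (1, 400) -- one score per Kinetics400 class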

Both model training and inference require efficient and convenient slicing methods for video preprocessing. Decord provides such methods based on a thin wrapper on top of hardware accelerated video decoders. We use Decord as a video reader, and use VideoClsCustom as a custom GluonCV video data loader for pre-processing.
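
The following sketch shows the general idea with Decord alone, sampling 32 evenly spaced frames from a local file; the file name and frame count are illustrative:

import numpy as np
from decord import VideoReader, cpu

# Read a local video and sample 32 evenly spaced frames for a single clip
vr = VideoReader("sample.mp4", ctx=cpu(0))
frame_ids = np.linspace(0, len(vr) - 1, num=32).astype(int).tolist()
clip = vr.get_batch(frame_ids).asnumpy()  # shape: (32, height, width, 3)
print(clip.shape)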

We launch training in script mode with the Amazon SageMaker MXNet training toolkit and the AWS Deep Learning Container for training on Apache MXNet. The training process includes the following steps for the entire dataset:

  • Frame sampling and center cropping
  • Frame normalization with mean and standard deviation across all ImageNet images
  • Transfer learning on the pre-trained model with a new dataset

After you have a trained model artifact, you can package it in a Docker container that runs your inference script and deploy it to an Amazon SageMaker endpoint. Inference is launched with the Amazon SageMaker MXNet inference toolkit and the AWS Deep Learning Container for inference on Apache MXNet. The inference process includes the following steps for the video segment payload:

  • Pre-processing (similar to training)
  • Activity classification

We use the Amazon Elastic Compute Cloud (Amazon EC2) G4 instance for our Amazon SageMaker endpoint hosting. G4 instances provide the latest generation NVIDIA T4 GPUs, AWS custom Intel Cascade Lake CPUs, up to 100 Gbps of networking throughput, and up to 1.8 TB of local NVMe storage. G4 instances are optimized for computer vision application deployments like image classification and object detection. For more information about inference benchmarks, see NVIDIA Data Center Deep Learning Product Performance.

We test the model inference deployed as an Amazon SageMaker endpoint using the following video examples (included in the code repository).

Output activity  : TennisSwing
Output activity  : Skiing

Note that the UCF101 dataset does not have a snowboarding activity class. The most similar activity present in the dataset is skiing and the model predicts skiing when evaluated with a snowboarding video.

You can find the complete Amazon SageMaker Jupyter notebook example with transfer learning and inference on the GitHub repo. After the model is trained, deployed to an Amazon SageMaker endpoint, and tested with the sample videos in the development environment, you can deploy it in a production context of an end-to-end solution.

Deploying the solution with AWS CloudFormation

In this step, you deploy the end-to-end activity detection solution using an AWS CloudFormation template. After a successful deployment, you use the model to automatically detect an activity in a video segment from a live video stream. The following diagram depicts the roles of the AWS services in the solution.

The AWS Elemental MediaLive live streaming channel, created from a sample video file in an S3 bucket, is used to demonstrate the real-time architecture. You can also use other live streaming services that can deliver live video segments to Amazon S3.

  1. AWS Elemental MediaLive sends live video with HTTP Live Streaming (HLS) and regularly generates fragments of equal length (10 seconds) as .ts files, plus an index (.m3u8) file that references the fragmented files, in an S3 bucket.
  2. The upload of each .ts video fragment into the S3 bucket triggers a Lambda function.
  3. The Lambda function simply invokes the SageMaker endpoint for activity detection, with the S3 URI of the video fragment as the REST API payload input.
  4. The SageMaker inference container reads the video from Amazon S3, pre-processes it, detects an activity, and saves the prediction results to an Amazon DynamoDB table.
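
The Lambda function in the solution is created by the CloudFormation template; the following Python sketch only illustrates the general pattern described above. The endpoint name and the JSON payload shape are assumptions here; the inference handler in the GitHub repo defines the actual contract:

import json
import urllib.parse
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "activity-detection-endpoint"  # created by the CloudFormation stack

def lambda_handler(event, context):
    # The S3 PUT event carries the bucket and key of the new .ts video segment
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    # Send only the S3 URI; the endpoint reads and pre-processes the segment itself.
    # The payload shape below is an assumption -- the repo's inference handler defines it.
    payload = json.dumps({"bucket": bucket, "key": key})
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    return response["Body"].read().decode()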

For this post, we use the sample Skiing People video to create the MediaLive live stream channel (also included in the code repository). In addition, the deployed model is the I3D model with a ResNet50 backbone fine-tuned with the UCF101 dataset, as explained in the previous section. Autoscaling is enabled for the Amazon SageMaker endpoint to adjust the number of instances based on the actual workload.

The end-to-end solution example is provided in the GitHub repo. You can also use your own sample video and fine-tuned model.

There is a cost associated with deploying and running the solution, as mentioned in the Cost Estimation section of this post. Make sure to delete the CloudFormation stack if you no longer need it. The steps to delete the solution are detailed in the section Cleaning up.

You can deploy the solution by launching the CloudFormation stack.

The solution is deployed in the us-east-1 Region. For instructions on changing your Region, see the GitHub repo. After you launch the CloudFormation stack, you can update the parameters for your environments or leave the defaults. All their descriptions are provided when you launch it. Don’t create or select any other options in the configuration. In addition, don’t create or select an AWS Identity and Access Management (IAM) role on the Permissions tab. At the final page of the CloudFormation creation process, you need to select the two check-boxes under Capabilities. To create the AWS resources associated with the solution, you should have permissions to create CloudFormation stacks.

After you launch the stack, make sure that the root stack with its nested sub-stacks are successfully created by viewing the stack on the AWS CloudFormation console.

Wait until all the stacks are successfully created before you proceed to the next section. It takes 15–20 minutes to deploy the architecture.

The root CloudFormation stack ActivityDetectionCFN is mainly used to call the other nested sub-stacks and create the corresponding AWS resources.

The sub-stacks have the root stack name as their prefix. This is mainly to show that they are nested. In addition, the AWS Region name and account ID are added as suffixes in the live stream S3 bucket name to avoid naming conflicts.

After the solution is successfully deployed, the following AWS resources are created:

  • S3 bucket – The bucket activity-detection-livestream-bucket-us-east-1-<account-id> stores video segments from the live video stream
  • MediaLive channel – The channel activity-detection-channel creates video segments from the live video stream and saves them to the S3 bucket
  • Lambda function – The function activity-detection-lambda invokes the Amazon SageMaker endpoint to detect activities from each video segment
  • SageMaker endpoint – The endpoint activity-detection-endpoint loads a video segment, detects an activity, and saves the results into a DynamoDB table
  • DynamoDB table – The table activity-detection-table stores the prediction results from the endpoint

Other supporting resources, like IAM roles, are also created to provide permissions to their respective resources.

Using the Solution

After the solution is successfully deployed, it’s time to test the model with a live video stream. You can start the channel on the MediaLive console.

Wait for the channel state to change to Running. When the channel state changes to Running, .ts-formatted video segments are saved into the live stream S3 bucket.

Only the most recent 21 video segments are kept in the S3 bucket to save storage. For more information about increasing the number of segments, see the GitHub repo.

Each .ts-formatted video upload triggers a Lambda function. The function invokes the Amazon SageMaker endpoint with an Amazon S3 URI to the .ts video as the payload. The deployed model predicts the activity and saves the results to the DynamoDB table.

Evaluating performance results

The Amazon SageMaker endpoint instance in this post is an ml.g4dn.2xlarge with 8 vCPUs and 1 NVIDIA T4 Tensor Core GPU. The solution generates one video segment every 10 seconds at 30 FPS. Therefore, there are six endpoint requests per minute (RPM). The overhead latency in video segment creation from the live stream up to delivery in Amazon S3 is approximately 1 second.

Amazon SageMaker supports automatic scaling for your hosted models. Autoscaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload. You can run a load test to determine the throughput of the system and the scaling policy for autoscaling in Amazon SageMaker. For the activity detection model in this post, a single ml.g4dn.2xlarge instance hosting the endpoint can handle 600 RPM with a latency of 300 milliseconds and an HTTP error rate below 5%. With a SAFETY_FACTOR = 0.5, the autoscaling target metric can be set as SageMakerVariantInvocationsPerInstance = 600 * 0.5 = 300 RPM.
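
The following sketch shows how such a target could be applied with the Application Auto Scaling API. The production variant name (AllTraffic) and the capacity limits are assumptions; adjust them to your endpoint configuration.

import boto3

autoscaling = boto3.client("application-autoscaling")

# The scalable dimension for a SageMaker endpoint is the instance count
# of a production variant. The variant name "AllTraffic" is an assumption.
resource_id = "endpoint/activity-detection-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=3,
)

# Target-tracking policy: keep invocations per instance around 300 RPM,
# that is, the measured 600 RPM throughput times a safety factor of 0.5.
autoscaling.put_scaling_policy(
    PolicyName="activity-detection-invocations-policy",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 300.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)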

For more information on load testing, see Load testing your autoscaling configuration.

Observe the differences between two scenarios under the same increasing load, up to 1,700 invocations per minute over a period of 15 minutes. The BEFORE scenario is a single instance without autoscaling, and the AFTER scenario is after two additional instances are launched by autoscaling.

The following graphs report the number of invocations and invocations per instance per minute. In the BEFORE scenario, with a single instance, both lines merge. In the AFTER scenario, the load pattern and Invocations (the blue line) are similar, but InvocationsPerInstance (the red line) is lower because the endpoint has scaled out.

The following graphs report average ModelLatency per minute. In the BEFORE scenario, the average latency remains under 300 milliseconds for the first 3 minutes of the test. The average ModelLatency exceeds 1 second after the endpoint receives more than 600 RPM and goes up to 8 seconds during peak load. In the AFTER scenario, the endpoint has already scaled to three instances and the average ModelLatency per minute remains close to 300 milliseconds.

The following graphs report the errors from the endpoint. In the BEFORE scenario, as the load exceeds 300 RPM, the endpoint starts to return errors for some requests, and the error count grows as the number of invocations increases. In the AFTER scenario, the endpoint has already scaled out and reports no errors.

In the AFTER scenario, at peak load, average CPU utilization across instances is near the maximum (100% * 8 vCPUs = 800%), and average GPU utilization across instances is about 35%. To improve GPU utilization, additional effort can be made to optimize the CPU-intensive preprocessing.

Cost estimation

The following table summarizes the cost estimation for this solution. The cost includes the per-hour pricing for one Amazon SageMaker endpoint instance of type ml.g4dn.2xlarge in the us-east-1 Region. If autoscaling is applied, the cost per hour scales with the number of active instances. MediaLive channel pricing is governed by the input and output configuration for the channel.

Component | Details | Pricing (N. Virginia) | Cost per hour
AWS Elemental MediaLive | Single pipeline, On-Demand, SD | $0.036/hr input + $0.8496/hr output | $0.8856
Amazon S3 | 30 MB per minute | $0.023 per GB | $0.0414
AWS Lambda | Invokes the SageMaker endpoint only, minimum configuration (128 MB) | $0.0000002083 per 100 ms | $0.00007
SageMaker endpoint (ml.g4dn.2xlarge) | Single instance | $1.053 per hour | $1.053
DynamoDB | Write requests | $1.25 per million write request units | $0.0015
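
As a quick sanity check on these figures, the Amazon S3 line follows from 30 MB per minute, which is about 1.8 GB per hour, and 1.8 GB * $0.023 per GB is approximately $0.041 per hour.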

Cleaning up

If you no longer need the solution, complete the following steps to properly delete all the associated AWS resources:

  1. On the MediaLive console, stop the live stream channel.
  2. On the Amazon S3 console, delete the S3 bucket you created earlier.
  3. On the AWS CloudFormation console, delete the root stack ActivityDetectionCFN. This also removes all nested sub-stacks.

Conclusion

In this post, you used Amazon SageMaker to automatically detect activity in a simulated live video stream. You used a pre-trained model from the gluon-cv model zoo and the Apache MXNet framework for transfer learning on another dataset. You also used Amazon SageMaker inference to preprocess and classify a video segment delivered to Amazon S3. Finally, you evaluated the solution with respect to model latency, system throughput, and cost.


About the Authors

Hasan Poonawala is a Machine Learning Specialist Solution Architect at AWS, based in London, UK. Hasan helps customers design and deploy machine learning applications in production on AWS. He is passionate about the use of machine learning to solve business problems across various industries. In his spare time, Hasan loves to explore nature outdoors and spend time with friends and family.


Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across different verticals to accelerate their use of machine learning and AWS cloud services to solve their business challenges. Outside of work, he enjoys spending time with his family and reading books.


Automating the analysis of multi-speaker audio files using Amazon Transcribe and Amazon Athena


In an effort to drive customer service improvements, many companies record the phone conversations between their customers and call center representatives. These call recordings are typically stored as audio files and processed to uncover insights such as customer sentiment, product or service issues, and agent effectiveness. To provide an accurate analysis of these audio files, the transcriptions need to clearly identify who spoke what and when.

However, given the average customer service agent handles 30–50 calls a day, the sheer volume of audio files to analyze quickly becomes a challenge. Companies need a robust system for transcribing audio files in large batches to improve call center quality management. Similarly, legal investigations often need to efficiently analyze case-related audio files in search of potential evidence or insight that can help win legal cases. Also, in the healthcare sector, there is a growing need for this solution to help transcribe and analyze virtual patient-provider interactions.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy to convert audio to text. One key feature of the service is called speaker identification, which you can use to label each individual speaker when transcribing multi-speaker audio files. You can specify Amazon Transcribe to identify 2–10 speakers in the audio clip. For the best results, define the correct number of speakers for the audio input.

A contact center, which often records multi-channel audio, can also benefit from using a feature called channel identification. The feature can separate each channel from within a single audio file and simultaneously transcribe each track. Typically, an agent and a caller are recorded on separate channels, which are merged into a single audio file. Contact center applications like Amazon Connect record agent and customer conversations on different channels (for example, the agent’s voice is captured in the left channel, and the customer’s in the right for a two-channel stereo recording). Contact centers can submit the single audio file to Amazon Transcribe, which identifies the two channels and produces a coherent merged transcript with channel labels.
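
For reference, a job with channel identification enabled can be started with a call along the following lines; the job name, bucket, and file name are placeholders.

import boto3

transcribe = boto3.client("transcribe")

# Start a transcription job with channel identification enabled so that the
# agent and customer channels of a stereo recording are transcribed and
# merged into a single labeled transcript. Names below are placeholders.
transcribe.start_transcription_job(
    TranscriptionJobName="contact-center-call-0001",
    Media={"MediaFileUri": "s3://my-call-recordings/call-0001.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
    Settings={"ChannelIdentification": True},
)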

In this post, we walk through a solution that analyzes audio files involving multiple speakers using Amazon Transcribe and Amazon Athena, a serverless query service for big data. By combining these two services, you can easily set up a serverless, pay-per-use solution for processing audio files into readable text and analyzing the data using structured query language (SQL).

Solution overview

The following diagram illustrates the solution architecture.

The solution contains the following steps:

  1. You upload the audio file to the Amazon Simple Storage Service (Amazon S3) bucket AudioRawBucket.
  2. The Amazon S3 PUT event triggers the AWS Lambda function LambdaFunction1.
  3. The function invokes an asynchronous Amazon Transcribe API call on the uploaded audio file.
  4. The function also writes a message into Amazon Simple Queue Service (Amazon SQS) with the transcription job information (steps 3 and 4 are sketched in code after this list).
  5. The transcription job runs and writes the output in JSON format to the target S3 bucket, AudioPrcsdBucket.
  6. An Amazon CloudWatch Events rule triggers the function LambdaFunction2 to run every 2 minutes.
  7. The function LambdaFunction2 reads the SQS queue for transcription jobs, checks for job completion, converts the JSON file to CSV, and loads an Athena table with the audio text data.
  8. You can access the processed audio file transcription from the AudioPrcsdBucket.
  9. You also query the data with Amazon Athena.
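
The following is a minimal sketch of steps 3 and 4. The environment variable names, the speaker count, and the message format are assumptions for illustration; the function created by the CloudFormation template may be structured differently.

import json
import os
import boto3
from urllib.parse import unquote_plus

transcribe = boto3.client("transcribe")
sqs = boto3.client("sqs")

def lambda_handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])
    job_name = key.replace("/", "-").rsplit(".", 1)[0]

    # Step 3: start an asynchronous transcription job with speaker labels.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="wav",
        LanguageCode="en-US",
        OutputBucketName=os.environ["AUDIO_PRCSD_BUCKET"],
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )

    # Step 4: record the job in SQS so the scheduled function can poll it.
    sqs.send_message(
        QueueUrl=os.environ["TASK_AUDIO_QUEUE_URL"],
        MessageBody=json.dumps({"job_name": job_name, "source_key": key}),
    )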

Prerequisites

To get started, you need the following:

  • A valid AWS account with access to AWS services
  • The Athena database “default” in an AWS account in us-east-1
  • A multi-speaker audio file—for this post, we use medical-diarization.wav

To achieve the best results, we recommend the following:

  • Use a lossless format, such as WAV or FLAC, with PCM 16-bit encoding
  • Use a sample rate of 8000 Hz for low-fidelity audio and 16000 Hz for high-fidelity audio

Deploying the solution

You can use the provided AWS CloudFormation template to launch and configure all the resources for the solution.

  1. Choose Launch Stack:

This takes you to the Create stack wizard on the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default.

The CloudFormation templates used in this post are designed to work only in the us-east-1 Region. These templates are also not intended for production use without modification.

  2. On the Select Template page, keep the default URL for the CloudFormation template, and choose Next.
  3. On the Specify Details page, review and provide values for the required parameters in the template.
    • For EnvName, enter Dev.

Dev is the environment where you want to deploy the template. AWS CloudFormation uses this value for resources in Lambda, Amazon SQS, and other services.

  4. After you specify the template details, choose Next.
  5. On the Options page, choose Next again.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  7. Choose Create Stack.

It takes approximately 5–10 minutes for the deployment to complete. When the stack launch is complete, it returns outputs with information about the resources that were created.

You can view the stack outputs on the AWS Management Console or by using the following AWS Command Line Interface (AWS CLI) command:

aws cloudformation describe-stacks --stack-name <stack-name> --region us-east-1 --query "Stacks[0].Outputs"

Resources created by the CloudFormation stack

  • AudioRawBucket – Stores the raw audio files; an object PUT to this bucket triggers the Lambda function that starts the Amazon Transcribe job
  • AudioPrcsdBucket – Stores the processed output
  • LambdaRole1 – The Lambda role with required permissions for S3 buckets, Amazon SQS, Amazon Transcribe, and CloudWatch
  • LambdaFunction1 – The initial function to run Amazon Transcribe to process the audio file, create a JSON file, and update Amazon SQS
  • LambdaFunction2 – The post function that reads the SQS queue, converts (aggregates) the JSON to CSV format, and loads it into an Athena table
  • TaskAudioQueue – The SQS queue for storing all audio processing requests
  • ScheduledRule – The CloudWatch schedule for LambdaFunction2
  • AthenaNamedQuery – The Athena table definition for storing processed audio files transcriptions with object information

The Athena table for the audio text has the following definitions (a sample table creation sketch follows the list):

  • audio_transcribe_job – The job submitted to transcribe the audio
  • time_start – The beginning timestamp for the speaker
  • speaker – Speaker tags (for example, spk_0, spk_1, and so on)
  • speaker_text – The text from the speaker audio
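
For orientation, the table definition presumably resembles the following, submitted here through the Athena API from Python. The column types, the CSV delimiter, and the S3 locations are assumptions for illustration only, because the actual definition is created for you by the CloudFormation stack.

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Rough shape of the transcribe_data table. Column types, the delimiter,
# and the S3 locations are assumptions for illustration only.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS default.transcribe_data (
    audio_transcribe_job string,
    time_start string,
    speaker string,
    speaker_text string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<AudioPrcsdBucket>/transcripts-csv/'
"""

athena.start_query_execution(
    QueryString=ddl,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://<AudioPrcsdBucket>/athena-results/"},
)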

Validating the solution

You can now validate that the solution works.

  1. Verify the AWS CloudFormation resources were created (see previous section for instructions via the console or AWS CLI).
  2. Upload the sample audio file to the S3 bucket AudioRawBucket.

The transcription process is asynchronous, so it can take a few minutes for the job to complete. You can check the job status on the Amazon Transcribe console and CloudWatch console.
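
If you prefer to check from a script instead of the console, you can also poll the job status through the Amazon Transcribe API; the job name below is a placeholder.

import boto3

transcribe = boto3.client("transcribe")

# Poll the status of a transcription job; replace the placeholder with the
# job name shown on the Amazon Transcribe console.
response = transcribe.get_transcription_job(
    TranscriptionJobName="<your-transcription-job-name>"
)
print(response["TranscriptionJob"]["TranscriptionJobStatus"])  # e.g. IN_PROGRESS or COMPLETED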

When the transcription job is complete and the Athena table transcribe_data has been created, you can run Athena queries to verify the transcription output. See the following select statement:

select * from "default"."transcribe_data" order by 1,2

The following table shows the output for the above select statement.

audio_transcribe_job time_start speaker speaker_text
medical-diarization.wav 0:00:01 spk_0  Hey, Jane. So what brings you into my office today?
medical-diarization.wav 0:00:03 spk_1  Hey, Dr Michaels. Good to see you. I’m just coming in from a routine checkup.
medical-diarization.wav 0:00:07 spk_0  All right, let’s see, I last saw you. About what, Like a year ago. And at that time, I think you were having some minor headaches. I don’t recall prescribing anything, and we said we’d maintain some observations unless things were getting worse.
medical-diarization.wav 0:00:20 spk_1  That’s right. Actually, the headaches have gone away. I think getting more sleep with super helpful. I’ve also been more careful about my water intake throughout my work day.
medical-diarization.wav 0:00:29 spk_0  Yeah, I’m not surprised at all. Sleep deprivation and chronic dehydration or to common contributors to potential headaches. Rest is definitely vital when you become dehydrated. Also, your brain tissue loses water, causing your brain to shrink and, you know, kind of pull away from the skull. And this contributor, the pain receptors around the brain, giving you the sensation of a headache. So how much water are you roughly taking in each day
medical-diarization.wav 0:00:52 spk_1  of? I’ve become obsessed with drinking enough water. I have one of those fancy water bottles that have graduated markers on the side. I’ve also been logging my water intake pretty regularly on average. Drink about three litres a day.
medical-diarization.wav 0:01:06 spk_0  That’s excellent. Before I start the routine physical exam is there anything else you like me to know? Anything you like to share? What else has been bothering you?

Cleaning up

To avoid incurring additional charges, complete the following steps to clean up your resources when you are done with the solution:

  1. Delete the Athena table transcribe_data from the default database.
  2. Delete the prefixes and objects you created from the buckets AudioRawBucket and AudioPrcsdBucket.
  3. Delete the CloudFormation stack, which removes your additional resources.

Conclusion

In this post, we walked through the solution, reviewed a sample implementation of audio file conversion using Amazon S3, Amazon Transcribe, Amazon SQS, Lambda, and Athena, and validated the steps for processing and analyzing multi-speaker audio files.

You can further extend this solution to perform sentiment analytics and improve your customer experience. For more information, see Detect sentiment from customer reviews using Amazon Comprehend. For more information about live call and post-call analytics, see AWS announces AWS Contact Center Intelligence solutions.


About the Authors

Mahendar Gajula is a Big Data Consultant at AWS. He works with AWS customers in their journey to the cloud, with a focus on big data, data warehouse, and AI/ML projects. In his spare time, he enjoys playing tennis and spending time with his family.


Rajarao Vijjapu is a data architect with AWS. He works with AWS customers and partners to provide guidance and technical assistance about big data, analytics, AI/ML, and security projects, helping them improve the value of their solutions when using AWS.


Learn from the winner of the AWS DeepComposer Chartbusters Spin the Model Challenge


AWS is excited to announce the winner of the second AWS DeepComposer Chartbusters challenge, Lena Taupier. AWS DeepComposer gives developers a creative way to get started with machine learning (ML). In June, we launched the Chartbusters challenge, a global competition where developers use AWS DeepComposer to create original compositions and compete to showcase their ML and generative AI skills. The second challenge, Spin the Model, required developers to bring their own data and create a custom genre model using a sample Amazon SageMaker notebook.

When Lena Taupier first attended the AWS DeepComposer workshop at re:Invent 2019, she had no idea she would be the winner of the Spin the Model challenge. Lena, a software developer for Blubrry, helps lead the company’s cloud infrastructure and applications development team. She also has her own blog in which she creates tutorials to make AWS skills more accessible. She describes herself as an ML novice and never would have thought she’d be experimenting with machine learning today.

We interviewed Lena about her experience competing in the second Chartbusters challenge, which ran from July 31 to August 23, and asked her to tell us more about how she created her winning composition.

Lena with her AWS DeepComposer keyboard

Getting started with machine learning

Lena has a background in classical piano, so when she first learned about AWS DeepComposer, she was intrigued to learn more.

“When I was younger, I studied classical piano pretty seriously and I still enjoy playing piano very much. I was at re:Invent last year when AWS DeepComposer was announced, and I was so excited by the thought of learning about AI while creating music. I ended up waiting in line for several hours to attend one of the demo sessions, but I was so eager to try it out that I didn’t even mind!”

Lena first heard about the AWS DeepComposer Chartbusters challenge through the AWS blog, and thought the challenge was a great way to get started with ML.

Building in AWS DeepComposer

To get started, Lena used the AWS DeepComposer learning capsules to learn more about AR-CNN models. The learning capsules provide easy-to-consume, bite-size content to help you learn the concepts of generative AI algorithms.

“The first thing I did was to go through the learning capsules about autoregressive convolutional neural networks and how to train AR-CNN models. It was a great resource for learning about different generative AI techniques.”

The Chartbusters Spin the Model challenge required developers to get creative and make a custom genre model by bringing their own dataset to train. Lena drew from her own background, having grown up in St. Lucia, an island with a rich tradition of oral and folk music.

“Once I had a good understanding, I started brainstorming about what kind of music I wanted to use to train my model. I’m from St. Lucia, a small island in the Caribbean, where there is a rich history of unique music, so I thought it would be interesting to incorporate songs from there. I decided to create some of my own music clips inspired by Calypso and St. Lucian folk music to supplement my dataset.”

Lena’s workstation for the AWS DeepComposer Chartbusters challenge

Next, Lena began training her model using Amazon SageMaker.

“Once I had my dataset, I created a Jupyter notebook within Amazon SageMaker, using the repository provided as a starting point. I experimented with the hyperparameters and then let the training run overnight because I knew it would take many hours to process. The next day, I was finally able to use my trained model to make new music!”

Lena used her AWS DeepComposer keyboard and the music studio to generate different melodies and compositions until she was satisfied with her two final compositions.

“I submitted two AI-generated songs. The main theme in “Little Banjo” was inspired by a famous St. Lucian folk song. Layered on top of the melody generated by my AR-CNN model, I also used the MuseGAN Rock model to generate additional instruments for accompaniment. The other song is meant to resemble the style of Calypso, and has a rich beat with trumpet lines to complement the melody. I named it “Home Sweet Home” because I started feeling nostalgic about home after listening to so much St. Lucian music for this project!”

Lena working on her compositions in the AWS DeepComposer console

You can listen to Lena’s winning composition, “Home Sweet Home,” on the AWS DeepComposer SoundCloud page.

Conclusion

The AWS DeepComposer Chartbusters challenge Spin the Model helped Lena learn about generative AI through a hands-on and fun experience.

“By participating in this challenge, I was able to learn a lot about different generative AI techniques in a very hands-on way, which is the best way to learn. As someone with very little experience in AI and machine learning, it was a great feeling of accomplishment to be able to train a custom AR-CNN model and actually generate results.”

The Chartbusters challenge empowered Lena to go from beginner-level ML knowledge to creating winning compositions with AWS DeepComposer.

“I think AWS DeepComposer is such a great tool for reducing the barrier of entry into machine learning and making those concepts accessible to more people […] Even just a few months ago, I never would have thought I’d be experimenting with AI/ML. This challenge was such a great learning experience! I know there’s so much more to learn so I will definitely continue to explore and dive deeper.”

Her advice to future competitors? Now is the time to get started with ML.

“As a developer, I think it’s such an exciting time to have access to the cloud, because it really widens your horizons on what you can do […] The Chartbusters challenge is the perfect opportunity to get involved and start learning in a fun, creative, and hands-on manner!”

Congratulations to Lena for her well-deserved win!

We hope Lena’s story has inspired you to learn more about ML and get started with AWS DeepComposer. Check out the next AWS DeepComposer Chartbusters challenge, The Sounds of Science, running now until September 23.


About the Author

Paloma Pineda is a Product Marketing Manager for AWS Artificial Intelligence Devices. She is passionate about the intersection of technology, art, and human centered design. Out of the office, Paloma enjoys photography, watching foreign films, and cooking French cuisine.
