Securing Amazon Comprehend API calls with AWS PrivateLink

Amazon Comprehend now supports Amazon Virtual Private Cloud (Amazon VPC) endpoints via AWS PrivateLink so you can securely initiate API calls to Amazon Comprehend from within your VPC and avoid using the public internet.

Amazon Comprehend is a fully managed natural language processing (NLP) service that uses machine learning (ML) to find meaning and insights in text. You can use Amazon Comprehend to analyze text documents and identify insights such as sentiment, people, brands, places, and topics in text. No ML expertise required.

Using AWS PrivateLink, you can access Amazon Comprehend easily and securely by keeping your network traffic within the AWS network, while significantly simplifying your internal network architecture. It enables you to privately access Amazon Comprehend APIs from your VPC in a scalable manner by using interface VPC endpoints. A VPC endpoint is an elastic network interface in your subnet with a private IP address that serves as the entry point for all Amazon Comprehend API calls.

In this post, we show you how to set up a VPC endpoint and enforce the use of this private connectivity for all requests to Amazon Comprehend using AWS Identity and Access Management (IAM) policies.

Prerequisites

For this example, you should have an AWS account and sufficient access to create resources in the services used in this walkthrough, including AWS CloudFormation, Amazon VPC, IAM, AWS Lambda, Amazon S3, and Amazon Comprehend.

Solution overview

The walkthrough includes the following high-level steps:

  1. Deploy your resources.
  2. Create VPC endpoints.
  3. Enforce private connectivity with IAM.
  4. Use Amazon Comprehend via AWS PrivateLink.

Deploying your resources

For your convenience, we have supplied an AWS CloudFormation template to automate the creation of all prerequisite AWS resources. We use the us-east-2 Region in this post, so the console and URLs may differ depending on the Region you select. To use this template, complete the following steps:

  1. Choose Launch Stack:
  2. Confirm the following parameters, which you can leave at the default values:
    1. SubnetCidrBlock1 – The primary IPv4 CIDR block assigned to the first subnet. The default value is 10.0.1.0/24.
    2. SubnetCidrBlock2 – The primary IPv4 CIDR block assigned to the second subnet. The default value is 10.0.2.0/24.
  3. Acknowledge that AWS CloudFormation may create additional IAM resources.
  4. Choose Create stack.

The creation process should take roughly 10 minutes to complete.

The CloudFormation template creates the following resources on your behalf:

  • A VPC with two private subnets in separate Availability Zones
  • VPC endpoints for private Amazon S3 and Amazon Comprehend API access
  • IAM roles for use by Lambda and Amazon Comprehend
  • An IAM policy to enforce the use of VPC endpoints to interact with Amazon Comprehend
  • An IAM policy for Amazon Comprehend to access data in Amazon S3
  • An S3 bucket for storing open-source data

The next two sections detail how to manually create a VPC endpoint for Amazon Comprehend and enforce usage with an IAM policy. If you deployed the CloudFormation template and prefer to skip to testing the API calls, you can advance to the Using Amazon Comprehend via AWS PrivateLink section.

Creating VPC endpoints

To create a VPC endpoint, complete the following steps:

  1. On the Amazon VPC console, choose Endpoints.
  2. Choose Create Endpoint.
  3. For Service category, select AWS services.
  4. For Service Name, choose com.amazonaws.us-east-2.comprehend.
  5. For VPC, enter the VPC you want to use.
  6. For Availability Zone, select your preferred Availability Zones.
  7. For Enable DNS name, select Enable for this endpoint.

This creates a private hosted zone that enables you to access the resources in your VPC using custom DNS domain names, such as example.com, instead of using private IPv4 addresses or private DNS hostnames provided by AWS. The Amazon Comprehend DNS hostname that the AWS Command Line Interface (CLI) and Amazon Comprehend SDKs use by default (https://comprehend.Region.amazonaws.com) resolves to your VPC endpoint.
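
If you want to confirm the private path from a host inside your VPC, the following minimal Python sketch (assuming boto3 is installed, credentials are available, and the endpoint is in us-east-2) checks that the default hostname resolves to private addresses and then makes a call without any endpoint_url override:

# Run from an instance or notebook inside the VPC. With private DNS enabled, the
# default Comprehend hostname should resolve to private IPs from your subnets.
import socket

import boto3

host = "comprehend.us-east-2.amazonaws.com"  # adjust the Region to match your endpoint
addresses = sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})
print(host, "resolves to:", addresses)  # expect addresses inside your VPC CIDR, such as 10.0.1.x

# No endpoint_url override is needed; the SDK's default hostname now resolves to the VPC endpoint.
comprehend = boto3.client("comprehend", region_name="us-east-2")
print(comprehend.detect_entities(Text="AWS PrivateLink keeps this call inside the AWS network.", LanguageCode="en"))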

  8. For Security group, choose the security group to associate with the endpoint network interface.

If you don’t specify a security group, the default security group for your VPC is associated.

  9. Choose Create Endpoint.

When the Status changes to available, your VPC endpoint is ready for use.

  10. Choose the Policy tab to apply more restrictive access control to the VPC endpoint.

The following example policy limits VPC endpoint access to an IAM role used by a Lambda function in our deployment. You should apply the principle of least privilege when defining your own policy. For more information, see Controlling access to services with VPC endpoints.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "comprehend:DetectEntities",
                "comprehend:CreateDocumentClassifier"
            ],
            "Resource": [
                "*"
            ],
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::#########:role/ComprehendPrivateLink-LambdaExecutionRole"
                ]
            }
        }
    ]
}
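
If you prefer to script this step rather than use the console, the following boto3 sketch creates an equivalent interface endpoint (the VPC, subnet, and security group IDs are placeholders for your own resources, and you can also pass the policy above as PolicyDocument):

# Minimal sketch: create the Amazon Comprehend interface endpoint programmatically.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-xxxxxxxx",                               # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-2.comprehend",
    SubnetIds=["subnet-aaaaaaaa", "subnet-bbbbbbbb"],   # placeholder subnet IDs
    SecurityGroupIds=["sg-xxxxxxxx"],                   # placeholder security group ID
    PrivateDnsEnabled=True,  # lets the default comprehend.us-east-2.amazonaws.com hostname resolve privately
)
print(response["VpcEndpoint"]["VpcEndpointId"], response["VpcEndpoint"]["State"])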

Enforcing private connectivity with IAM

To allow or deny access to Amazon Comprehend based on the use of a VPC endpoint, we include an aws:sourceVpce condition in the IAM policy. The following example policy grants access to the DetectEntities and CreateDocumentClassifier APIs only when the request uses your VPC endpoint. You can include additional Amazon Comprehend APIs in the “Action” section of the policy or use “comprehend:*” to include them all. You can attach this policy to an IAM role to enable compute resources hosted within your VPC to interact with Amazon Comprehend.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ComprehendEnforceVpce",
            "Effect": "Allow",
            "Action": [
                "comprehend:CreateDocumentClassifier",
                "comprehend:DetectEntities"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceVpce": "vpce-xxxxxxxx"
                }
            }
        },
        {
            "Sid": "PassRole",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::#########:role/ComprehendDataAccessRole"
        }
    ]
}

You should replace the VPC endpoint ID with the ID of the endpoint you created earlier. Permission to call iam:PassRole is required for asynchronous operations in Amazon Comprehend like CreateDocumentClassifier and should be scoped to your specific data access role.

Using Amazon Comprehend via AWS PrivateLink

To start using Amazon Comprehend with AWS PrivateLink, you perform the following high-level steps:

  1. Review the Lambda function for API testing.
  2. Create the DetectEntities test event.
  3. Train a custom classifier.

Reviewing the Lambda function

To review your Lambda function, on the Lambda console, choose the Lambda function that contains ComprehendPrivateLink in its name.

The VPC section of the Lambda console provides links to the various networking components automatically created for you during the CloudFormation deployment.

The function code includes a sample program that takes user input to invoke the specific Amazon Comprehend APIs supported by our example IAM policy.
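
The function deployed by the CloudFormation stack is the source of truth for this logic; the following minimal sketch (not the deployed code, and the environment variable names for the data access role and S3 bucket are hypothetical) illustrates how such a handler could dispatch on the comprehend_api field of the test events used later in this post:

import os

import boto3

comprehend = boto3.client("comprehend")

def lambda_handler(event, context):
    api = event.get("comprehend_api")
    if api == "DetectEntities":
        # Synchronous entity detection on the supplied text
        return comprehend.detect_entities(
            Text=event["text"], LanguageCode=event.get("language_code", "en")
        )
    if api == "CreateDocumentClassifier":
        # Asynchronous training job; requires iam:PassRole on the data access role
        return comprehend.create_document_classifier(
            DocumentClassifierName=event["custom_classifier_name"],
            DataAccessRoleArn=os.environ["DATA_ACCESS_ROLE_ARN"],       # hypothetical variable name
            InputDataConfig={
                "S3Uri": "s3://{}/{}".format(os.environ["BUCKET_NAME"],  # hypothetical variable name
                                             event["training_data_s3_key"])
            },
            LanguageCode=event.get("language_code", "en"),
        )
    return {"error": "Unsupported comprehend_api: {}".format(api)}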

Creating a test event

In this section, we create an event to detect entities within sample text using a pretrained model.

  1. From the Test drop-down menu, choose Create new test event.
  2. For Event name, enter a name (for example, DetectEntities).
  3. Replace the event JSON with the following code:
    {
      "comprehend_api": "DetectEntities",
      "language_code": "en",
      "text": "Amazon.com, Inc. is located in Seattle, WA and was founded July 5th, 1994 by Jeff Bezos, allowing customers to buy everything from books to blenders."
    }

  4. Choose Save to store the test event.
  5. Choose Save to update the Lambda function.
  6. Choose Test to invoke the DetectEntities API.

The response should include results similar to the following code:

{
    "Entities": [
        {
            "Score": 0.9266431927680969,
            "Type": "ORGANIZATION",
            "Text": "Amazon.com, Inc.",
            "BeginOffset": 0,
            "EndOffset": 16
        },
        {
            "Score": 0.9952651262283325,
            "Type": "LOCATION",
            "Text": "Seattle, WA",
            "BeginOffset": 31,
            "EndOffset": 42
        },
        {
            "Score": 0.9998188018798828,
            "Type": "DATE",
            "Text": "July 5th, 1994",
            "BeginOffset": 59,
            "EndOffset": 73
        },
        {
            "Score": 0.9999810457229614,
            "Type": "PERSON",
            "Text": "Jeff Bezos",
            "BeginOffset": 77,
            "EndOffset": 87
        }
    ]
}

You can update the test event to identify entities from your own text.

Training a custom classifier

We now demonstrate how to build a custom classifier. For training data, we use a version of the Yahoo answers corpus that is preprocessed into the format expected by Amazon Comprehend. This corpus, available on the AWS Open Data Registry, is cited in the paper Text Understanding from Scratch by Xiang Zhang and Yann LeCun. It is also used in the post Building a custom classifier using Amazon Comprehend.

  1. Retrieve the training data from Amazon S3.
  2. On the Amazon S3 console, choose the example S3 bucket created for you.
  3. Choose Upload and add the file you retrieved.
  4. Choose the uploaded object and note the Key.
  5. Return to the test function on the Lambda console.
  6. From the Test drop-down menu, choose Create new test event.
  7. For Event name, enter a name (for example, TrainCustomClassifier).
  8. Replace the event input with the following code:
    {
      "comprehend_api": "CreateDocumentClassifier",
      "custom_classifier_name": "custom-classifier-example",
      "language_code": "en",
      "training_data_s3_key": "comprehend-train.csv"
    }

  9. If you changed the default file name, update the training_data_s3_key to match.
  10. Choose Save to store the test event.
  11. Choose Save to update the Lambda function.
  12. Choose Test to invoke the CreateDocumentClassifier API.

The response should include results similar to the following code:

{
    "DocumentClassifierArn": "arn:aws:comprehend:us-east-2:0123456789:document-classifier/custom-classifier-example"
}

  13. On the Amazon Comprehend console, choose Custom classification to check the status of the document classifier training.

After approximately 20 minutes, the document classifier is trained and available for use.
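
If you prefer to watch the training status from code instead of the console, a small polling loop like the following works from any environment that has comprehend:DescribeDocumentClassifier permission (the ARN below is the placeholder value from the earlier response; substitute your own):

import time

import boto3

comprehend = boto3.client("comprehend", region_name="us-east-2")
classifier_arn = "arn:aws:comprehend:us-east-2:0123456789:document-classifier/custom-classifier-example"

while True:
    properties = comprehend.describe_document_classifier(
        DocumentClassifierArn=classifier_arn
    )["DocumentClassifierProperties"]
    print("Classifier status:", properties["Status"])
    if properties["Status"] in ("TRAINED", "IN_ERROR"):
        break
    time.sleep(60)  # check once a minute until training finishes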

Cleaning up

To avoid incurring future charges, delete the resources you created during this walkthrough after concluding your testing.

  1. On the Amazon Comprehend console, delete the custom classifier.
  2. On the Amazon S3 console, empty the bucket created for you.
  3. If you launched the automated deployment, on the AWS CloudFormation console, delete the appropriate stack.

The deletion process takes approximately 10 minutes.

Conclusion

You have now successfully invoked Amazon Comprehend APIs using AWS PrivateLink. Enforcing the use of the VPC endpoint with IAM policies keeps these requests from leaving your VPC and further improves your security posture. You can extend this solution to securely test additional features like Amazon Comprehend custom entity recognition real-time endpoints.

All Amazon Comprehend API calls are now supported via AWS PrivateLink. This feature is available in all commercial Regions where both AWS PrivateLink and Amazon Comprehend are available. To learn more about securing Amazon Comprehend, see Security in Amazon Comprehend.


About the Authors

Dave Williams is a Cloud Consultant for AWS Professional Services. He works with public sector customers to securely adopt AI/ML services. In his free time, he enjoys spending time with his family, traveling, and watching college football.

Adarsha Subick is a Cloud Consultant for AWS Professional Services based out of Virginia. He works with public sector customers to help solve their AI/ML-focused business problems. In his free time, he enjoys archery and hobby electronics.

Saman Zarandioon is a Sr. Software Development Engineer for Amazon Comprehend. He earned a PhD in Computer Science from Rutgers University.

Machine learning best practices in financial services

We recently published a new whitepaper, Machine Learning Best Practices in Financial Services, that outlines security and model governance considerations for financial institutions building machine learning (ML) workflows. The whitepaper discusses common security and compliance considerations and aims to accompany a hands-on demo and workshop that walks you through an end-to-end example. Although the whitepaper focuses on financial services considerations, much of the information around authentication and access management, data and model security, and ML operationalization (MLOps) best practices may be applicable to other regulated industries, such as healthcare.

A typical ML workflow, as shown in the following diagram, involves multiple stakeholders. To successfully govern and operationalize this workflow, you should collaborate across multiple teams, including business stakeholders, sysops administrators, data engineers, and software and devops engineers.

In the whitepaper, we discuss considerations for each team and also provide examples and illustrations of how you can use Amazon SageMaker and other AWS services to build, train, and deploy ML workloads. More specifically, based on feedback from customers running workloads in regulated environments, we cover the following topics:

  • Provisioning a secure ML environment – This includes the following:
    • Compute and network isolation – How to deploy Amazon SageMaker in a customer’s private network, with no internet connectivity.
    • Authentication and authorization – How to authenticate users in a controlled fashion and authorize these users based on their AWS Identity and Access Management (IAM) permissions, with no multi-tenancy.
    • Data protection – How to encrypt data in transit and at rest with customer-provided encryption keys.
    • Auditability – How to audit, prevent, and detect who did what at any given point in time to help identify and protect against malicious activities.
  • Establishing ML governance – This includes the following:
    • Traceability – Methods to trace ML model lineage from data preparation, model development, and training iterations, and how to audit who did what at any given point in time.
    • Explainability and interpretability – Methods that may help explain and interpret the trained model and obtain feature importance.
    • Model monitoring – How to monitor your model in production to protect against data drift, and automatically react to rules that you define.
    • Reproducibility – How to reproduce the ML model based on model lineage and the stored artifacts.
  • Operationalizing ML workloads – This includes the following:
    • Model development workload – How to build automated and manual review processes in the dev environment.
    • Preproduction workload – How to build automated CI/CD pipelines using the AWS CodeStar suite and AWS Step Functions.
    • Production and continuous monitoring workload – How to combine continuous deployment and automated model monitoring.
    • Tracking and alerting – How to track model metrics (operational and statistical) and alert appropriate users if anomalies are detected.

Provisioning a secure ML environment

A well-governed and secure ML workflow begins with establishing a private and isolated compute and network environment. This may be especially important in regulated industries, particularly when dealing with PII data for model building or training. The Amazon Virtual Private Cloud (VPC) that hosts Amazon SageMaker and its associated components, such as Jupyter notebooks, training instances, and hosting instances, should be deployed in a private network with no internet connectivity.

Furthermore, you can associate these Amazon SageMaker resources with your VPC environment, which allows you to apply network-level controls, such as security groups to govern access to Amazon SageMaker resources and control ingress and egress of data into and out of the environment. You can establish connectivity between Amazon SageMaker and other AWS services, such as Amazon Simple Storage Service (Amazon S3), using VPC endpoints or AWS PrivateLink. The following diagram illustrates a suggested reference architecture of a secure Amazon SageMaker deployment.

The next step is to ensure that only authorized users can access the appropriate AWS services. IAM can help you create preventive controls for many aspects of your ML environment, including access to Amazon SageMaker resources, your data in Amazon S3, and API endpoints. You can access AWS services using a RESTful API, and every API call is authorized by IAM. You grant explicit permissions through IAM policy documents, which specify the principal (who), the actions (API calls), and the resources (such as Amazon S3 objects) that are allowed, as well as the conditions under which the access is granted. For a deeper dive into building secure environments for financial services as well as other well-architected pillars, also refer to this whitepaper.

In addition, as ML environments may contain sensitive data and intellectual property, the third consideration for a secure ML environment is data encryption. We recommend that you enable data encryption both at rest and in transit with your own encryption keys. And lastly, another consideration for a well-governed and secure ML environment is a robust and transparent audit trail that logs all access and changes to the data and models, such as a change in the model configuration or the hyperparameters. More details on all those fronts are included in the whitepaper.
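
To make these controls concrete, the following sketch shows how network isolation, VPC attachment, traffic encryption, and customer managed KMS keys surface as parameters on a SageMaker training job (all ARNs, IDs, image URIs, and bucket names below are placeholders, not values from the whitepaper):

import boto3

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="secure-training-job-example",
    AlgorithmSpecification={
        "TrainingImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-secure-bucket/train/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
            "ContentType": "text/csv",
        }
    ],
    OutputDataConfig={
        "S3OutputPath": "s3://my-secure-bucket/output/",
        "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",        # encrypt model artifacts at rest
    },
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
        "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # encrypt the training volume
    },
    VpcConfig={  # run the training containers in your private subnets, behind your security groups
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
    EnableNetworkIsolation=True,                 # block outbound network calls from the training container
    EnableInterContainerTrafficEncryption=True,  # encrypt traffic between distributed training nodes
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)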

To enable self-service provisioning and automation, administrators can use tools such as the AWS Service Catalog to create these secure environments in a repeatable manner for their data scientists. This way, data scientists can simply log in to a secure portal using AWS Single Sign-On, and create a secure Jupyter notebook environment provisioned for their use with appropriate security guardrails in place.

Establishing ML governance

In this section of the whitepaper, we discuss the considerations around ML governance, which includes four key aspects: traceability, explainability, real-time model monitoring, and reproducibility. The financial services industry has various compliance and regulatory obligations that may touch on these aspects of ML governance. You should review and understand those obligations with your legal counsel, compliance personnel, and other stakeholders involved in the ML process.

As an example, if Jane Smith is denied a bank loan, the lender may be required to explain how that decision was made in order to comply with regulatory requirements. If the financial services industry customer is using ML as part of the loan review process, the prediction made by the ML model may need to be interpreted or explained in order to meet these requirements. Generally, an ML model’s interpretability or explainability refers to people’s ability to understand and explain the processes that the model uses to arrive at its predictions. It is also important to note that many ML models make predictions of a likely answer, rather than the answer itself. Therefore, it may be appropriate to have human review of predictions made by ML models before any action is taken. The model may also need to be monitored, so that if the underlying data changes, the model is periodically retrained on new data. Finally, the ML model may need to be reproducible, such that if the steps leading to the model’s output are retraced, the model outputs don’t change.

Operationalizing ML workloads

In the final section, we discuss some best practices around operationalizing ML workloads. We begin with a high-level discussion and then dive deeper into a specific architecture that uses AWS native tools and services. In addition to the process of deploying models, or what in traditional software deployments is referred to as CI/CD (continuous integration/deployment), deploying ML models into production for regulated industries may have additional implications from an implementation perspective.

The following diagram captures some of the high-level requirements that an enterprise ML platform might need to address to meet guidelines around governance, auditing, logging, and reporting:

  • A data lake for managing raw data and associated metadata
  • A feature store for managing ML features and associated metadata (mapping from raw data to generated features such as one-hot encodings or scaling transformations)
  • A model and container registry containing trained model artifacts and associated metadata (such as hyperparameters, training times, and dependencies)
  • A code repository (such as Git, AWS CodeCommit, or Artifactory) for maintaining and versioning source code
  • A pipeline registry to version and maintain training and deployment pipelines
  • Logging tools for maintaining access logs
  • Production monitoring and performance logs
  • Tools for auditing and reporting

The following diagram illustrates a specific implementation that uses AWS native tools and services. Although several scheduling and orchestration tools are on the market, such as Airflow or Jenkins, for concreteness, we focus predominantly on Step Functions.

In the whitepaper, we dive deeper into each part of the preceding diagram, and more specifically into the following workloads:

  • Model development
  • Pre-production
  • Production and continuous monitoring

Summary

The Machine Learning Best Practices in Financial Services whitepaper is available here. Start using it today to help illustrate how you can build secure and well-governed ML workflows and feel free to reach out to the authors if you have any questions. As you progress on your journey, also refer to this whitepaper for a lens on the AWS well-architected principles applied to machine learning workloads. You can also use the video demo walkthrough, and the following two workshops:


About the authors

Stefan Natu is a Sr. Machine Learning Specialist at Amazon Web Services. He is focused on helping financial services customers build and operationalize end-to-end machine learning solutions on AWS. His academic background is in theoretical physics, and in the past, he worked on a number of data science problems in retail and energy verticals. In his spare time, he enjoys reading machine learning blogs, traveling, playing the guitar, and exploring the food scene in New York City.

Kosti Vasilakakis is a Sr. Business Development Manager for Amazon SageMaker, the AWS fully managed service for end-to-end machine learning, and he focuses on helping financial services and technology companies achieve more with ML. He spearheads curated workshops, hands-on guidance sessions, and pre-packaged open-source solutions to ensure that customers build better ML models quicker and safer. Outside of work, he enjoys traveling the world, philosophizing, and playing tennis.

Alvin Huang is a Capital Markets Specialist for Worldwide Financial Services Business Development at Amazon Web Services with a focus on data lakes and analytics, and artificial intelligence and machine learning. Alvin has over 19 years of experience in the financial services industry, and prior to joining AWS, he was an Executive Director at J.P. Morgan Chase & Co, where he managed the North America and Latin America trade surveillance teams and led the development of global trade surveillance. Alvin also teaches a Quantitative Risk Management course at Rutgers University and serves on the Rutgers Mathematical Finance Master’s program (MSMF) Advisory Board.

David Ping is a Principal Machine Learning Architect and Sr. Manager of AI/ML Solutions Architecture at Amazon Web Services. He helps enterprise customers build and operate machine learning solutions on AWS. David enjoys hiking and following the latest machine learning advancement.

Build more effective conversations on Amazon Lex with confidence scores and increased accuracy

In the rush of our daily lives, we often have conversations that contain ambiguous or incomplete sentences. For example, when talking to a banking associate, a customer might say, “What’s my balance?” This request is ambiguous: it is difficult to tell whether the customer wants the balance of her credit card or of her checking account. Perhaps she only has a checking account with the bank. The agent can provide a good customer experience by looking up the account details and identifying that the customer is referring to the checking account. In the customer service domain, agents often resolve such uncertainty about the user’s intent by using the contextual data available to them. Bots face the same ambiguity and need to determine the correct intent by drawing on the contextual data available about the customer.

Today, we’re launching natural language understanding improvements and confidence scores on Amazon Lex. We continuously improve the service based on customer feedback and advances in research. These improvements enable better intent classification accuracy. You can also achieve better detection of user inputs not included in the training data (out of domain utterances). In addition, we provide confidence score support to indicate the likelihood of a match with a certain intent. The confidence scores for the top five intents are surfaced as part of the response. This better equips you to handle ambiguous scenarios such as the one we described. In such cases, where two or more intents are matched with reasonably high confidence, intent classification confidence scores can help you determine when you need to use business logic to clarify the user’s intent. If the user only has a credit card, then you can trigger the intent to surface the balance on the credit card. Alternately, if the user has both a credit card and a checking account, you can pose a clarification question such as “Is that for your credit card or checking account?” before proceeding with the query. You now have better insights to manage the conversation flow and create more effective conversations.

This post shows how you can use these improvements with confidence scores to trigger the best response based on business knowledge.

Building a Lex bot

This post uses the following conversations to model a bot.

If the customer has only one account with the bank:

User: What’s my balance?
Agent: Please enter your ATM card PIN to confirm
User: 5555
Agent: Your checking account balance is $1,234.00

As well as an alternate conversation path, where the customer has multiple types of accounts:

User: What’s my balance?
Agent: Sure. Is this for your checking account, or credit card?
User: My credit card
Agent: Please enter your card’s CCV number to confirm
User: 1212
Agent: Your credit card balance is $3,456.00

The first step is to build an Amazon Lex bot with intents to support transactions such as balance inquiry, funds transfer, and bill payment. The GetBalanceCreditCard, GetBalanceChecking, and GetBalanceSavings intents provide account balance information. The PayBills intent processes payments to payees, and TransferFunds enables the transfer of funds from one account to another. Lastly, you can use the OrderChecks intent to replenish checks. When a user makes a request that the Lex bot can’t process with any of these intents, the fallback intent is triggered to respond.

Deploying the sample Lex bot

To create the sample bot, perform the following steps. For this post, you create an Amazon Lex bot BankingBot, and an AWS Lambda function called BankingBot_Handler.

  1. Download the Amazon Lex bot definition and Lambda code.
  2. On the Lambda console, choose Create function.
  3. Enter the function name BankingBot_Handler.
  4. Choose the latest Python runtime (for example, Python 3.8).
  5. For Permissions, choose Create a new role with basic Lambda permissions.
  6. Choose Create function.
  7. When your new Lambda function is available, in the Function code section, choose Actions and Upload a .zip file.
  8. Choose the BankingBot.zip file that you downloaded.
  9. Choose Save.
  10. On the Amazon Lex console, choose Actions, Import.
  11. Choose the file BankingBot.zip that you downloaded and choose Import.
  12. Select the BankingBot bot on the Amazon Lex console.
  13. In the Fulfillment section, for each of the intents, including the fallback intent (BankingBotFallback), choose AWS Lambda function and choose the BankingBot_Handler function from the drop-down menu.
  14. When prompted to Add permission to Lambda function, choose OK.
  15. When all the intents are updated, choose Build.

At this point, you should have a working Lex bot.

Setting confidence score thresholds

You’re now ready to set an intent confidence score threshold. This setting controls when Amazon Lex will default to Fallback Intent based on the confidence scores of intents. To configure the settings, complete the following steps:

  1. On the Amazon Lex console, choose Settings, and then choose General.
  2. If your bot is in us-east-1, us-west-2, ap-southeast-2, or eu-west-1, scroll down to Advanced options and select Yes to opt in to the accuracy improvements and enable the confidence score feature.

These improvements and confidence score support are enabled by default in other Regions.

  3. For Confidence score threshold, enter a number between 0 and 1. You can choose to leave it at the default value of 0.4.

  4. Choose Save and then choose Build.

When the bot is configured, Amazon Lex surfaces the confidence scores and alternative intents in the PostText and PostContent responses:

   "alternativeIntents": [ 
      { 
         "intentName": "string",
         "nluIntentConfidence": { 
            "score": number
         },
         "slots": { 
            "string": "string" 
         }
      }
   ]

Using a Lambda function and confidence scores to identify the user intent

When the user makes an ambiguous request such as “Can I get my account balance?” the Lambda function parses the list of intents that Amazon Lex returned. If multiple intents are returned, the function checks whether the top intents have similar scores as defined by an AMBIGUITY_RANGE value. For example, if one intent has a confidence score of 0.95 and another has a score of 0.65, the first intent is probably correct. However, if one intent has a score of 0.75 and another has a score of 0.72, you may be able to discriminate between the two intents using business knowledge in your application. In our use case, if the customer holds multiple accounts, the function is configured to respond with a clarification question such as, “Is this for your credit card or for your checking account?” But if the customer holds only a single account (for example, checking), the balance for that account is returned.

When you use confidence scores, Amazon Lex returns the most likely intent and up to four alternative intents with their associated scores in each response. If all the confidence scores are less than the threshold you defined, Amazon Lex includes the AMAZON.FallbackIntent intent, the AMAZON.KendraSearchIntent intent, or both. You can use the default threshold or you can set your own threshold value.
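
You can inspect these scores directly with the Lex runtime API. The following sketch (bot alias and user ID are placeholders; the bot name matches the sample bot in this post) prints the top intent and its alternatives for an ambiguous utterance:

import boto3

lex = boto3.client("lex-runtime")

response = lex.post_text(
    botName="BankingBot",
    botAlias="prod",        # placeholder; use your published alias, or "$LATEST" while testing
    userId="test-user-1",   # placeholder user ID
    inputText="What's my balance?",
)

print("Top intent:", response.get("intentName"), response.get("nluIntentConfidence"))
for alternative in response.get("alternativeIntents", []):
    print("Alternative:", alternative["intentName"], alternative["nluIntentConfidence"]["score"])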

The following code samples are from the Lambda code you downloaded when you deployed this sample bot. You can adapt it for use with any Amazon Lex bot.

The Lambda function’s dispatcher function forwards requests to handler functions, but for the GetBalanceCreditCard, GetBalanceChecking, and GetBalanceSavings intents, it forwards to determine_intent instead.

The determine_intent function inspects the top intent reported by Amazon Lex, as well as any alternative intents. If an alternative intent is valid for the user (based on their accounts) and is within the AMBIGUITY_RANGE of the top intent's score, it is added to a list of possible intents.

possible_intents = []
# start with the top intent (if it is valid for the user)
top_intent = intent_request["currentIntent"]
if top_intent["name"] in valid_intents:
    possible_intents.append(top_intent)

# add any alternative intents that are within the AMBIGUITY_RANGE
# if they are valid for the user
if intent_request.get("alternativeIntents", None):
    top_intent_score = top_intent["nluIntentConfidenceScore"]
    for alternative_intent in intent_request["alternativeIntents"]:
        alternative_intent_score = alternative_intent["nluIntentConfidenceScore"]
        if top_intent_score is None:
            top_intent_score = alternative_intent_score                            
        if alternative_intent["name"] in valid_intents:
            if abs(top_intent_score - alternative_intent_score) <= AMBIGUITY_RANGE:
                possible_intents.append(alternative_intent)

If there is only one possible intent for the user, it is fulfilled (after first eliciting any missing slots).

num_intents = len(possible_intents)
if num_intents == 1:
    # elicit any missing slots or fulfill the intent
    slots = possible_intents[0]["slots"]
    for slot_name, slot_value in slots.items():
        if slot_value is None:
            return elicit_slot(intent_request['sessionAttributes'],
                          possible_intents[0]["name"], slots, slot_name)
    # dispatch to the appropriate fulfillment method 
    return HANDLERS[possible_intents[0]['name']]['fulfillment'](intent_request)

If there are multiple possible intents, ask the user for clarification.

elif num_intents > 1:
    counter = 0
    response = ""
    while counter < num_intents:
        if counter == 0:
            response += "Sure. Is this for your " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]]
        elif counter < num_intents - 1:
            response += ", " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]]
        else:
            response += ", or " + \
                INTENT_TO_ACCOUNT_MAPPING[possible_intents[counter]["name"]] + "?"
        counter += 1
    return elicit_intent(form_message(response))

If there are no possible intents for the user, the fallback intent is triggered.

else:
    return fallback_handler(intent_request)

To test this, you can change the test user configuration in the code by changing the return value from the check_available_accounts function:

# This could be a DynamoDB table or other data store
USER_LIST = {
    "user_with_1": [AccountType.CHECKING],
    "user_with_2": [AccountType.CHECKING, AccountType.CREDIT_CARD],
    "user_with_3": [AccountType.CHECKING, AccountType.SAVINGS, AccountType.CREDIT_CARD]
}

def check_available_accounts(user_id: str):
    # change user ID to test different scenarios
    return USER_LIST.get("user_with_2")

You can see the Lex confidence scores on the Amazon Lex console, or in your Lambda function's CloudWatch Logs log file.

Confidence scores can also be used to test different versions of your bot. For example, if you add new intents, utterances, or slot values, you can test the bot and inspect the confidence scores to see if your changes had the desired effect.

Conclusion

Although people aren’t always precise in their wording when they interact with a bot, we still want to provide them with a natural user experience. With natural language understanding improvements and confidence scores now available on Amazon Lex, you have additional information available to design a more intelligent conversation. You can couple the machine learning-based intent matching capabilities of Amazon Lex with your own business logic to zero in on your user’s intent. You can also use the confidence score threshold while testing during bot development, to determine if changes to the sample utterances for intents have the desired effect. These improvements enable you to design more effective conversation flows. For more information about incorporating these techniques into your bots, see Amazon Lex documentation.


About the Authors

Trevor Morse works as a Software Development Engineer at Amazon AI. He focuses on building and expanding the NLU capabilities of Lex. When not at a keyboard, he enjoys playing sports and spending time with family and friends.

Brian Yost is a Senior Consultant with the AWS Professional Services Conversational AI team. In his spare time, he enjoys mountain biking, home brewing, and tinkering with technology.

As a Product Manager on the Amazon Lex team, Harshal Pimpalkhute spends his time trying to get machines to engage (nicely) with humans.

Training knowledge graph embeddings at scale with the Deep Graph Library

We’re extremely excited to share the Deep Graph Knowledge Embedding Library (DGL-KE), a knowledge graph (KG) embeddings library built on top of the Deep Graph Library (DGL). DGL is an easy-to-use, high-performance, scalable Python library for deep learning on graphs. You can now create embeddings for large KGs containing billions of nodes and edges two-to-five times faster than competing techniques.

For example, DGL-KE has created embeddings on top of the Drug Repurposing Knowledge Graph (DRKG) to show which drugs can be repurposed to fight COVID-19. These embeddings can be used to predict the likelihood of a drug’s ability to treat a disease or bind to a protein associated with the disease.

In this post, we focus on creating knowledge graph embeddings (KGE) using the Kensho Derived Wikimedia Dataset (KDWD). You can use those embeddings to find similar nodes and predict new relations. For example, in natural language processing (NLP) and information retrieval use cases, you can parse a new query and transform it syntactically into a triplet (subject, predicate, object). Upon adding new triplets to a KG, you can augment nodes and relations by classifying nodes and inferring relations based on the existing KG embeddings. This helps guide and find the intent for a chatbot application, for example, and provide the right FAQ or information to a customer.

Knowledge graph

A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions purposely built for a given domain or application. Knowledge graphs are also known as heterogeneous graphs, because they contain multiple entity types and relation types. The information stored in a KG is often specified in triplets, which contain three elements: head, relation, and tail ([h,r,t]). Heads and tails are also known as entities. The union of triplets is also known as statements.

KGs allow you to model your information very intuitively and expressively, giving you the ability to integrate data easily. For example, you can use Amazon Neptune to build an identity graph powering your customer 360 or Know Your Customer application commonly found in financial services. In healthcare and life sciences, where data is usually sparse, KGs can integrate and harmonize data from different silos using taxonomy and vocabularies. In e-commerce and telco, KGs are commonly used in question answering, chatbots, and recommender systems. For more information on using Amazon Neptune for your use case, visit the Amazon Neptune homepage.

Knowledge graph embeddings

Knowledge graph embeddings are low-dimensional representations of the entities and relations in a knowledge graph. They generalize information of the semantic and local structure for a given node.

Many popular KGE models exist, such as TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE.

Each model has a different score function that measures the distance between two entities relative to their relation. The general intuition is that entities connected by a relation are close to each other, whereas entities that aren't connected are far apart in the vector space.

The scoring functions for the models currently supported by DGL-KE are listed in the DGL-KE documentation.
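
As one concrete example, TransE with the L2 norm (the TransE_l2 model we train later in this post) scores a triplet as gamma - ||h + r - t||2, so plausible triplets receive higher scores. The following minimal NumPy sketch (an illustration only, not DGL-KE code) shows the idea:

import numpy as np

def transe_l2_score(h, r, t, gamma=19.9):  # gamma matches the training command used later
    # score(h, r, t) = gamma - || h + r - t ||_2; higher means more plausible
    return gamma - np.linalg.norm(h + r - t, ord=2)

rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=400) for _ in range(3))  # 400 matches --hidden_dim below
print(transe_l2_score(h, r, t))      # random embeddings: low score
print(transe_l2_score(h, r, h + r))  # a "perfect" tail embedding scores exactly gamma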

Wikimedia dataset

In our use case, we use the Kensho Derived Wikimedia Dataset (KDWD). You can find the notebook example and code in the DGL-KE GitHub repo.

The KDWD, which combines Wikipedia and Wikidata, is composed of three data layers:

  • Base – Contains the English Wikipedia corpus
  • Middle – Identifies which text spans are links and annotates the corpus
  • Top – Connects Wikipedia links to items in the Wikidata KG

The following diagram illustrates these data layers.

The KDWD contains the following:

  • 2,315,761,359 tokens
  • 121,835,453 page links
  • 5,343,564 Wikipedia pages
  • 51,450,317 Wikidata items
  • 141,206,854 Wikidata statements

The following code is an example of the entity.txt file:

ID	Label		Description
1	Universe	totality of space and all contents
2	Earth		third planet from the Sun in the Solar System
3	life		matter capable of extracting energy from the environment for replication

Before you can create your embeddings, you need to pre-process the data. DGL-KE gives you the ability to compute embeddings using two formats.

In raw user-defined knowledge graphs, you provide the triplets; the entities and relations can be arbitrary strings. The dataloader automatically generates the ID mappings.

The following table from Train User-Defined Knowledge Graphs shows an example of triplets.

train.tsv
Beijing is_capital_of China
London is_capital_of UK
UK located_at Europe

In user-defined knowledge graphs, you also provide the ID mapping for entities and relations (the triplets should only contain these IDs). The IDs start from 0 and are continuous. The following table from Train User-Defined Knowledge Graphs shows an example of mapping and triplets files.

entities.dict           relation.dict           train.tsv
Beijing   0             is_capital_of   0       0 0 2
London    1             located_at      1       1 0 3
China     2                                     3 1 4
UK        3
Europe    4

For more information, see DGL-KE Command Lines.

Although the KDWD provides dictionaries, we can’t use them for our use case because the indexes don’t start at 0 and aren’t continuous. We preprocess our data and use the raw format to generate our embeddings. After merging and cleaning the data, we end up with a KG with the following properties:

  • 39,569,815 entities
  • 1,213 relations
  • Approximately 120 million statements

The following code is an example of the triplets.

Head                    Relation                          Tail
Eiksteinen              located in the administrative…    Rogaland
Trivellona marlowi      instance of                       taxon
Acta Numerica           main subject                      mathematical analysis
Günther Neukirchner     given name                        Günther
Ruth Pointer            given name                        Ruth

DGL-KE has many different training modes. CPU, GPU, mix-CPU-GPU mode, and distributed training are all supported options, depending on your dataset and training requirements. For our use case, we use mix mode to generate our embeddings. If you can contain all the data in GPU memory, GPU is the preferred method. Because we’re training on a large KG, we use mix mode to get a larger pool of CPU- and GPU-based memory and still benefit from GPU for accelerated training.

We create our embeddings with the dgl-ke command line. See the following code:

!DGLBACKEND=pytorch dglke_train \
--model_name TransE_l2 \
--batch_size 1000 \
--neg_sample_size 200 \
--hidden_dim 400 \
--gamma 19.9 \
--lr 0.25 \
--max_step 24000 \
--log_interval 100 \
--batch_size_eval 16 \
-adv \
--regularization_coef 1.00E-09 \
--test \
--gpu 0 1 2 3 \
--mix_cpu_gpu \
--save_path ./wikimedia \
--data_path ./data/wikimedia/ \
--format raw_udd_hrt \
--data_files train.txt valid.txt test.txt \
--neg_sample_size_eval 10000

For more information about the DGL-KE arguments, see the DGL-KE website.

We trained our KG of about 40 million entities, 1,200 relations, and approximately 120 million statements on a p3.8xlarge instance in about 7 minutes.

We evaluate our model by entering the following code:

!DGLBACKEND=pytorch dglke_eval --dataset wikimedia --model_name TransE_l2 \
--neg_sample_size 200 --hidden_dim 400 --gamma 19.9 \
--batch_size_eval 16 --gpu 0 1 2 3 --model_path ./wikimedia/TransE_l2_wikimedia_0/ \
--data_path ./data/wikimedia/ --format raw_udd_hrt --data_files train.txt valid.txt test.txt --neg_sample_size_eval 10000 --no_eval_filter

The following code is the output:

-------------- Test result --------------
Test average MRR: 0.4159753346227368
Test average MR: 1001.1689418833716
Test average HITS@1: 0.3540242971873324
Test average HITS@3: 0.45541123141672746
Test average HITS@10: 0.5213350742247359
-----------------------------------------

DGL-KE allows you to perform downstream KG tasks by using any combination of [h,r,t].

In the following example, we find similar node entities for two people by creating a head.list file with the following entries:

head.list
Jeff Bezos
Barack Obama

DGL-KE provides functions to perform offline inference on entities and relations.

To find similar node entities from our head.list, we enter the following code:

!DGLBACKEND=pytorch dglke_emb_sim \
--format 'l_*' --data_files /home/ec2-user/SageMaker/DGL-Neptune/notebooks/data/wikimedia/head.list \
--mfile ./data/wikimedia/entities.tsv \
--emb_file ./wikimedia/TransE_l2_wikimedia_1/wikimedia_TransE_l2_entity.npy \
--raw_data --gpu 0 \
--exec_mode 'batch_left' \
--sim_func 'cosine' --topK 5

The following code is the output:

result.tsv
head tail score
Jeff Bezos Jeff Bezos 1.0
Jeff Bezos Aga Khan IV 0.8602205514907837
Jeff Bezos Alisher Usmanov 0.8584005236625671
Jeff Bezos Klaus Tschira 0.8512368202209473
Jeff Bezos Bill Gates 0.8441287875175476
Barack Obama Barack Obama 1.0
Barack Obama Donald Trump 0.9529082179069519
Barack Obama George W. Bush 0.9426612854003906
Barack Obama Harry S. Truman 0.9414601922035217
Barack Obama Ronald Reagan 0.9393566250801086

Interestingly, all the nodes similar to Jeff Bezos describe tech tycoons. Barack Obama’s similar nodes show former and current presidents of the US.
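
Under the hood, dglke_emb_sim computes cosine similarity over the saved entity embedding matrix. If you want to reproduce the lookup yourself, the following sketch shows the equivalent computation with NumPy and pandas (the file paths follow the commands above; the column layout of entities.tsv can vary between runs, so inspect the file before relying on this):

import numpy as np
import pandas as pd

# Assumes entities.tsv rows are ordered by entity ID, matching the rows of the .npy matrix
entities = pd.read_csv("./data/wikimedia/entities.tsv", sep="\t", header=None, names=["id", "label"])
emb = np.load("./wikimedia/TransE_l2_wikimedia_1/wikimedia_TransE_l2_entity.npy", mmap_mode="r")

def top_k_similar(label, k=5, chunk=1000000):
    query = np.asarray(emb[entities.index[entities["label"] == label][0]], dtype=np.float64)
    query /= np.linalg.norm(query)
    scores = np.empty(emb.shape[0])
    for start in range(0, emb.shape[0], chunk):  # chunked so the ~40M x 400 matrix never sits in memory at once
        block = np.asarray(emb[start:start + chunk], dtype=np.float64)
        scores[start:start + chunk] = (block @ query) / np.linalg.norm(block, axis=1)
    best = np.argsort(-scores)[:k]
    return [(entities.iloc[i]["label"], float(scores[i])) for i in best]

print(top_k_similar("Jeff Bezos"))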

Conclusion

Graphs can be found in many domains, such as chemistry, biology, financial services, and social networks, and allow us to represent complex concepts intuitively using entities and relations. Graphs can be homogeneous or heterogeneous, where you have many types of entities and relations.

Knowledge graph embeddings give you powerful methods to encode semantic and local structure information for a given node, and you can also use them as input for machine learning and deep learning models. DGL-KE supports popular embedding models and allows you to compute those embeddings on CPU or GPU at scale two-to-five times faster than other techniques.

We’re excited to see how you use graph embeddings on your existing KGs or new machine learning problems. For more information about the library, see the DGL-KE GitHub repo. For instructions on using Wikimedia KG embeddings in your KG, see the DGL-KE notebook example.


About the Authors

Phi Nguyen is a solution architect at AWS helping customers with their cloud journey, with a special focus on data lakes, analytics, semantic technologies, and machine learning. In his spare time, you can find him biking to work, coaching his son’s soccer team, or enjoying nature walks with his family.

Xiang Song is an Applied Scientist with the AWS Shanghai AI Lab. He received his Bachelor’s degree in Software Engineering and his Ph.D. in Operating Systems and Architecture from Fudan University. His research interests include building machine learning systems and graph neural networks for real-world applications.

Building a Pictionary-style game with AWS DeepLens and Amazon Alexa

Are you bored of the same old board games? Tired of going through the motions with charades week after week? In need of a fun and exciting way to mix up game night? Well we have a solution for you!

From the makers of AWS DeepLens, Guess My Drawing with DeepLens is a do-it-yourself recipe for building your very own Machine Learning (ML)-enabled Pictionary-style game! In this post, you learn how to harness the power of AWS DeepLens, the AWS programmable video camera for developers to learn ML, and Amazon Alexa, the Amazon cloud-based voice service.

You start by learning to deploy a trained model to AWS DeepLens that can recognize sketches drawn on a whiteboard and pair it with an Alexa skill that serves as the official scorekeeper.

When your recipe is complete, the fun begins!

Solution overview

Guess My Drawing with AWS DeepLens uses Alexa to host a multi-player drawing challenge game. To get started, gather your game supplies mentioned in the Prerequisites section.

To initiate gameplay, simply say, “Alexa, play Guess My Drawing with DeepLens.” Alexa explains the game rules and asks how many players are playing the game. The players decide the turn order.

Alexa provides each player with a common word. For example, Alexa may say, “Your object to draw is bowtie.” The player has 12 seconds to draw it on a whiteboard without writing letters or words.

When time runs out, the player stops drawing and asks Alexa to share the results. The ML model running on AWS DeepLens predicts the object that you drew. If the object matches with what Alexa asks, Alexa awards 10 points. If DeepLens can’t correctly guess the drawing or the player takes more than 12 seconds to draw, no points are earned.

Alexa prompts the next participant with their word, repeating until all participants have taken a turn. After each round, Alexa provides a score update. The game ends after five rounds, and whoever has the highest score wins the game!

The following diagram shows the architecture of our solution.

This tutorial includes the following steps:

  1. Create an AWS DeepLens inference AWS Lambda function to isolate the drawing area and feed each camera frame into the model to generate predictions on the sketches.
  2. Deploy a pre-trained model included in this post to AWS DeepLens to perform image classification.
  3. Create an AWS IoT Core rule to send the results to Amazon Kinesis Data Streams.
  4. Create a custom Alexa skill with a different Lambda function to retrieve the detected objects from the Kinesis data stream and have Alexa verbalize the result to you.

Prerequisites

Before you begin this tutorial, make sure you have the following prerequisites:

Creating an AWS DeepLens inference Lambda function

In this section, you create an inference function that you deploy to AWS DeepLens. The inference function isolates the drawing area, optimizes the model to run on AWS DeepLens, and feeds each camera frame into the model to generate predictions.

To create your function, complete the following steps:

  1. Download aws-deeplens-pictionary-lambda.zip.
  2. On the Lambda console, choose Create function.
  3. Choose Author from scratch and choose the following options:
    1. For Runtime, choose Python 2.7.
    2. For Choose or create an execution role, choose Use an existing role.
    3. For Existing role, enter service-role/AWSDeepLensLambdaRole.
  4. After you create the function, go to the Function code section.
  5. From the Actions drop-down menu in Function code, choose Upload a .zip file.
  6. Upload the aws-deeplens-pictionary-lambda.zip file you downloaded earlier.
  7. Choose Save.
  8. From the Actions drop-down menu, choose Publish new version.
  9. Enter a version number and choose Publish.

Publishing the function makes it available on the AWS DeepLens console so you can add it to your custom project.

Understanding the Lambda function

You should pay attention to the following two files:

  • labels.txt – This file is for the inference function to translate the numerical result from the model into the human-readable labels used in our game. It contains a list of the 36 objects whose sketches the model has been trained to recognize.
  • lambda_function.py – This file contains the preprocessing algorithm and the function being called to generate predictions on drawings and send back results.

Because the model was trained on digital sketches with clean, white backgrounds, we have a preprocessing algorithm that helps isolate the drawing and remove any clutter in the background. You can find the algorithm to do this in the isolate_image() function inside the lambda_function.py file. In this section, we walk you through some important parts of the preprocessing algorithm.

Fisheye calibration

AWS DeepLens uses a wide-angle lens to get as much information as possible in the frame. As a result, any input frame is distorted, especially for the rectangular shape of a whiteboard. Therefore, you need to perform fisheye calibration to straighten the edges. As part of this post, we provide the calibration code to undistort your AWS DeepLens images. The following code straightens the edges and eliminates the distortion:

def undistort(frame):
    frame_height, frame_width, _ = frame.shape
    K = np.array([[511.98828907136766, 0.0, 426.48016197546474],
                  [0.0, 513.8644747557715, 236.89875770956868],
                  [0.0, 0.0, 1.0]])
    D = np.array([[-0.10969105781526832], [0.03463562293251206],
                  [-0.2341226037892333], [0.34335682066685935]])
    DIM = (int(frame_width/3), int(frame_height/3))
    frame_resize = cv2.resize(frame, DIM)
    map1, map2 = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3),
                                                     K, DIM, cv2.CV_16SC2)
    undistorted_img = cv2.remap(frame_resize, map1, map2,
                                interpolation=cv2.INTER_LINEAR,
                                borderMode=cv2.BORDER_CONSTANT)
    return undistorted_img

The following screenshot shows the raw image captured by AWS DeepLens.

The following screenshot shows the results of the undistort function with fisheye calibration.

The next code section enhances the images to eliminate the effects caused by different lighting conditions:

enh_con = ImageEnhance.Contrast(img_colored)
contrast = 5.01
img_contrasted = enh_con.enhance(contrast)
image = img_contrasted
image = np.array(image)

The following screenshot shows the results of the contrast enhancement.

Canny Edge Detection

The next part of the preprocessing algorithm uses OpenCV’s Canny Edge Detection technique to find the edges in the image. See the following code:

# these constants are carefully picked
MORPH = 9
CANNY = 84
HOUGH = 25
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.GaussianBlur(img, (3, 3), 0, img)
# this is to recognize white on white
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (MORPH, MORPH))
dilated = cv2.dilate(img, kernel)
edges = cv2.Canny(dilated, 0, CANNY, apertureSize=3)
lines = cv2.HoughLinesP(edges, 1, 3.14/180, HOUGH)
for line in lines[0]:
    cv2.line(edges, (line[0], line[1]), (line[2], line[3]), (255, 0, 0), 2, 8)

The following screenshot shows the results from applying the Canny Edge Detector.

For more information about how Canny Edge Detection works, see the Canny Edge Detection tutorial on the OpenCV website.

Finding contours

After the edges are found, you can use OpenCV’s findContours() function to extract the polygon contours from the image. This function returns a list of polygon shapes that are closed and ignores any open edges or lines. See the following code:

contours, _ = cv2.findContours(edges.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = filter(lambda cont: cv2.arcLength(cont, False) > 100, contours)
contours = filter(lambda cont: cv2.contourArea(cont) > 10000, contours)
result = None
for idx, c in enumerate(contours):
    if len(c) < Config.min_contours:
        continue
    epsilon = Config.epsilon_start
    while True:
        approx = cv2.approxPolyDP(c, epsilon, True)
        approx = approx.reshape((len(approx), 2))
        new_approx = []
        for i in range(len(approx)):
            if 80 < approx[i][0] < 750:
                new_approx.append(approx[i])
        approx = np.array(new_approx)
        if (len(approx) < 4):
            break
        if math.fabs(cv2.contourArea(approx)) > Config.min_area:
            if (len(approx) > 4):
                epsilon += Config.epsilon_step
                continue
            else:
                # for p in approx:
                #     cv2.circle(binary, (p[0][0], p[0][1]), 8, (255, 255, 0), thickness=-1)
                approx = approx.reshape((4, 2))
                # [top-left, top-right, bottom-right, bottom-left]
                src_rect = order_points(approx)
                cv2.drawContours(image, c, -1, (0, 255, 255), 1)
                cv2.line(image, (src_rect[0][0], src_rect[0][1]), (src_rect[1][0], src_rect[1][1]),
                         color=(100, 255, 100))
                cv2.line(image, (src_rect[2][0], src_rect[2][1]), (src_rect[1][0], src_rect[1][1]),
                         color=(100, 255, 100))
                cv2.line(image, (src_rect[2][0], src_rect[2][1]), (src_rect[3][0], src_rect[3][1]),
                         color=(100, 255, 100))
                cv2.line(image, (src_rect[0][0], src_rect[0][1]), (src_rect[3][0], src_rect[3][1]),
                         color=(100, 255, 100))

For more information, see Contours: Getting Started.

Perspective transformation

Finally, the preprocessing algorithm does perspective transformation to correct for any skew. The following code helps achieve perspective transformation and crop a rectangular area:

M = cv2.getPerspectiveTransform(src_rect, dst_rect)
warped = cv2.warpPerspective(image, M, (w, h))	
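
The dst_rect, w, and h values used above are derived from the ordered corner points found in the contour step. A minimal sketch of how they might be computed (this helper is illustrative, not the exact project code):

import numpy as np

def get_dst_rect(src_rect):
    # src_rect is ordered [top-left, top-right, bottom-right, bottom-left]
    (tl, tr, br, bl) = src_rect.astype("float32")
    # Use the longest opposing edges as the output width and height
    w = int(max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl)))
    h = int(max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl)))
    dst_rect = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype="float32")
    return dst_rect, w, h

dst_rect, w, h = get_dst_rect(src_rect)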

The following image is the input of the preprocessing algorithm.

The following image is the final result.

Performing inference

In this section, you learn how to perform inference with an ML model and send back results from AWS DeepLens.

AWS DeepLens uses the Intel OpenVINO model optimizer to optimize the ML model to run on AWS DeepLens hardware. The following code optimizes a model to run locally:

error, model_path = mo.optimize(model_name, INPUT_WIDTH, INPUT_HEIGHT)

The following code loads the model:

model = awscam.Model(model_path, {'GPU': 1})

The following code runs the model frame by frame over the images from the camera:

ret, frame = awscam.getLastFrame()

Viewing the text results in the cloud is a convenient way to make sure the model is working correctly. Each AWS DeepLens device has a dedicated iot_topic automatically created to receive the inference results. The following code sends the messages from AWS DeepLens to the IoT Core console:

# Send the top k results to the IoT console via MQTT
cloud_output = {}
for obj in top_k:
    cloud_output[output_map[obj['label']]] = obj['prob']
client.publish(topic=iot_topic, payload=json.dumps(cloud_output))
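
For context, top_k in the snippet above is derived from the model's raw inference on each preprocessed frame. A minimal sketch of that step follows; the model type, resize step, number of results kept, and the label map are illustrative assumptions rather than the exact project code:

# Run inference on the preprocessed drawing and keep the highest-probability predictions
frame_resize = cv2.resize(warped, (INPUT_WIDTH, INPUT_HEIGHT))
inference_results = model.doInference(frame_resize)
parsed_results = model.parseResult('classification', inference_results)['classification']
# output_map maps numeric labels to the 36 sketch names (loaded elsewhere in the Lambda function)
top_k = sorted(parsed_results, key=lambda r: r['prob'], reverse=True)[:5]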

Deploying the model to AWS DeepLens

In this section, you set up your AWS DeepLens device, import a pre-trained model, and deploy the model to AWS DeepLens.

Setting up your AWS DeepLens device

You first need to register your AWS DeepLens device, if you haven’t already.

After you register your device, you need to install the latest OpenCV (version 4.x) packages and Pillow libraries to enable the preprocessing algorithm in the DeepLens inference Lambda function. To do so, you need the IP address of AWS DeepLens on the local network, which is listed in the Device details section. You also need to ensure that Secure Shell (SSH) is enabled for your device. For more information about enabling SSH on your device, see View or Update Your AWS DeepLens 2019 Edition Device Settings.

Open a terminal application on your computer. SSH into your DeepLens by entering the following code into your terminal application:

ssh aws_cam@<YOUR_DEEPLENS_IP>

Then enter the following commands in the SSH terminal:

sudo su
pip install --upgrade pip
pip install opencv-python
pip install pillow
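
To confirm the installs succeeded, you can run a quick, optional check from the same SSH session:

python -c "import cv2, PIL; print(cv2.__version__, PIL.__version__)"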

Importing the model to AWS DeepLens

For this post, you use a pre-trained model. We trained the model for 36 objects on The Quick Draw Dataset made available by Google, Inc., under the CC BY 4.0 license. For each object, we took 1,600 images for training and 400 images for testing the model from the dataset. Holding back 400 images for testing allows us to measure the accuracy of our model against images that it has never seen.

For instructions on training a model using Amazon SageMaker as the development environment, see AWS DeepLens Recipes and Amazon SageMaker: Build an Object Detection Model Using Images Labeled with Ground Truth.

To import your model, complete the following steps:

  1. Download the model aws-deeplens-pictionary-game.tar.gz.
  2. Create an Amazon Simple Storage Service (Amazon S3) bucket to store this model. For instructions, see How do I create an S3 Bucket?

The S3 bucket name must contain the term deeplens. The AWS DeepLens default role only has permission to access buckets whose names contain deeplens.

  3. After the bucket is created, upload aws-deeplens-pictionary-game.tar.gz to the bucket and copy the model artifact path.
  4. On the AWS DeepLens console, under Resources, choose Models.
  5. Choose Import model.
  6. On the Import model to AWS DeepLens page, choose Externally trained model.
  7. For Model artifact path, enter the Amazon S3 location for the model you uploaded earlier.
  8. For Model name, enter a name.
  9. For Model framework, choose MXNet.
  10. Choose Import model.

Deploying the model to your AWS DeepLens device

To deploy your model, complete the following steps:

  1. On the AWS DeepLens console, under Resources, choose Projects.
  2. Choose Create new project.
  3. Choose Create a new blank project.
  4. For Project name, enter a name.
  5. Choose Add model and choose the model you imported earlier.
  6. Choose Add function and choose the Lambda function you created earlier.
  7. Choose Create.
  8. Select your newly created project and choose Deploy to device.
  9. On the Target device page, select your device from the list.
  10. On the Review and deploy page, choose Deploy.

The deployment can take up to 5 minutes to complete, depending on the speed of the network your AWS DeepLens is connected to. When the deployment is complete, you should see a green banner message that the deployment succeeded.

To verify that the project was deployed successfully, you can check the text prediction results sent to the cloud via AWS IoT Greengrass. For instructions, see Using the AWS IoT Greengrass Console to View the Output of Your Custom Trained Model (Text Output).

In addition to the text results, you can view the detection results overlaid on top of your AWS DeepLens live video stream. For instructions, see Viewing AWS DeepLens Output Streams.

Sending results from AWS DeepLens to a data stream

In this section, you learn how to send messages from AWS DeepLens to a Kinesis data stream by configuring an AWS IoT rule.

  1. On the Kinesis console, create a new data stream.
  2. For Data stream name, enter a name.
  3. For Number of shards, choose 1.
  4. Choose Create data stream.
  5. On the AWS IoT console, under Act, choose Rules.
  6. Choose Create to set up a rule to push MQTT messages from AWS DeepLens to the newly created data stream.
  7. On the Create a rule page, enter a name for your rule.
  8. For Rule query statement, enter the DeepLens device MQTT topic (an example query follows this list).
  9. Choose Add action.
  10. Choose Send a message to an Amazon Kinesis Stream.
  11. Choose Configuration.
  12. Choose the data stream you created earlier.
  13. For Partition key, enter ${newuuid()}.
  14. Choose Create a new role or Update role.
  15. Choose Add action.
  16. Choose Create rule to finish the setup.
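
For reference, the rule query statement is a simple SQL-style select on your device's inference topic. The topic shown here is a placeholder; copy the exact iot_topic value from your AWS DeepLens device details page:

SELECT * FROM '$aws/things/deeplens_<your-device-id>/infer'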

Now that the rule is set up, MQTT messages are loaded into the data stream.

Amazon Kinesis Data Streams is not currently available in the AWS Free Tier, which offers a free trial for a group of AWS services. For more information, see Amazon Kinesis Data Streams pricing.

We recommend that you delete the data stream after completing the tutorial because charges occur on an active data stream even when you aren’t sending and receiving messages.

Creating an Alexa skill

In this section, you first create a Lambda function that queries a data stream and returns the sketches detected by AWS DeepLens to Alexa. Then, you create a custom Alexa skill to start playing the game.

Creating a custom skill with Lambda

To create your custom skill in Lambda, complete the following steps:

  1. On the Lambda console, create a new function.

The easiest way to create an Alexa skill is to create the function from the existing blueprints or serverless app repository provided by Lambda and overwrite the code with your own.

  2. For Create function, choose Browse serverless app repository.
  3. For Public repositories, search for and choose alexa-skills-kit-color-expert-python.
  4. Under Application settings, enter an application name and TopicNameParameter.
  5. Choose Deploy.
  6. When the application has been deployed, open the Python file.
  7. Download the alexa-lambda-function.py file onto your computer.
  8. Copy the Python code from the file and replace the sample code in the lambda_function.py file in the Function code section.

This function includes the entire game logic, reads data from the data stream, and returns the result to Alexa. Be sure to change the Region from the default (us-east-1) if you’re in a different Region. See the following code:

kinesis = boto3.client('kinesis', region_name='us-east-1')

  9. Set the Timeout value to 20 seconds.

You now need to give your Lambda function IAM permissions to read data from the data stream.

  1. In your Lambda function editor, choose Permissions.
  2. Choose the Role name under the Execution role.

You’re directed to the IAM role editor.

  3. In the editor, choose Attach policies.
  4. Enter Kinesis and choose AmazonKinesisFullAccess.
  5. Choose Attach policy.
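
The game-logic Lambda function described above retrieves the most recent inference results from the data stream before responding to Alexa. A minimal sketch of that read, assuming a hypothetical stream name (use the data stream you created earlier):

import json
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')
stream_name = 'guess-my-drawing-stream'  # hypothetical name; use your data stream

# Read recent records from the stream's first shard. ShardIteratorType='LATEST' only
# returns records published after this call; your function may prefer 'TRIM_HORIZON'
# or a timestamp-based iterator instead.
shard_id = kinesis.describe_stream(StreamName=stream_name)['StreamDescription']['Shards'][0]['ShardId']
iterator = kinesis.get_shard_iterator(StreamName=stream_name,
                                      ShardId=shard_id,
                                      ShardIteratorType='LATEST')['ShardIterator']
records = kinesis.get_records(ShardIterator=iterator, Limit=10)['Records']
predictions = [json.loads(r['Data']) for r in records]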

Creating a custom skill to play the game

To create your second custom skill to start playing the game, complete the following steps:

  1. Log in to the Alexa Developer Console.
  2. Create a new custom Alexa skill.
  3. On the Create a new skill page, for Skill name, enter a skill name
  4. For Choose a model to add to your skill, choose Custom.
  5. For Choose a method to host your skill’s backend resources, choose Provision your own.
  6. Choose Create skill.
  7. On the next page, choose the default template to add your skill.
  8. Choose Continue with template.

After about 1–2 minutes, your skill appears on the console.

  9. In the Endpoint section, enter the Amazon Resource Name (ARN) of the Lambda function created for the Alexa skill in the previous step.
  10. Download alexa-skill-json-code.txt onto your computer.
  11. Copy the code from the file and paste it in the Alexa skill JSON editor to automatically configure intents and sample utterances for the custom skill.

In the Alexa architecture, intents can be thought of as distinct functions that a skill can perform. Intents can take arguments that are known here as slots.

  12. Choose Save Model to apply the changes.
  13. Choose Build Model.
  14. On the Lambda console, open the Lambda function for the Alexa skill you created earlier.

You need to enable the skill by adding a trigger to the Lambda function.

  15. Choose Add trigger.
  16. Choose Alexa Skills Kit.
  17. For Skill ID, enter the ID for the Alexa skill you created.
  18. Choose Add.

Testing the skill

Your Alexa skill is now ready to tell you the drawings detected by AWS DeepLens. To test with an Alexa-enabled device (such as an Amazon Echo), register the device with the same email address you used to sign up for your developer account on the Amazon Developer Portal. You can invoke your skill with the wake word and your invocation name: “Alexa, Play Guess My Drawing with DeepLens.”

The language in your Alexa companion app should match the language chosen in your developer account. Alexa considers English (US) and English (UK) to be separate languages.

Alternatively, the Test page includes a simulator that lets you test your skill without a device. For Skill testing is enabled in, choose Development. You can test your skill with the phrase, “Alexa, Play Guess My Drawing with DeepLens.”

Windows 10 users can download the free Alexa app from the Microsoft Store and interact with it from their PC.

For more information on testing your Alexa skill, see Test Your Skill. For information on viewing the logs, check Amazon CloudWatch logs for AWS Lambda.

The following diagram shows the user interaction flow of our game.

The following images show the prediction outputs of our model with the name of an object and its probability. You need to have your AWS DeepLens located in front of a rectangular-shaped whiteboard or a piece of white paper to ensure that the edges are visible in the frame.

Conclusion

In this post, you learned about the preprocessing algorithm to isolate a drawing area and how to deploy a pre-trained model onto AWS DeepLens to recognize sketches. Next, you learned how to send results from AWS IoT to Kinesis Data Streams. Finally, you learned how to create a custom Alexa skill with Lambda to retrieve the detected objects in the data stream and return the results to players via Alexa.

For other tutorials, samples, and project ideas with AWS DeepLens, see AWS DeepLens Recipes.


About the Authors

Amit Choudhary is a Senior Product Manager Technical Intern. He loves to build products that help developers learn about various machine learning techniques in a fun and hands-on manner.


Phu Nguyen is a Product Manager for AWS DeepLens. He builds products that give developers of any skill level an easy, hands-on introduction to machine learning.

Brian Nguyen is an undergraduate senior majoring in Electrical Engineering with a concentration in Digital Signal & Image Processing at the University of Washington, Seattle.

Hanwen Guo is an undergraduate senior majoring in Electrical Engineering with a concentration in Digital Signal & Image Processing at the University of Washington, Seattle.

Jack Ma is an undergraduate senior majoring in Electrical Engineering with a concentration in Embedded Systems at the University of Washington, Seattle.

Sairam Tabibu is pursuing a master’s degree in Electrical Engineering with an interest in machine learning/deep learning at the University of Washington, Seattle.

Aaron Liang is pursuing a master’s degree in Electrical Engineering with an interest in software engineering at the University of Washington, Seattle.


Safely deploying and monitoring Amazon SageMaker endpoints with AWS CodePipeline and AWS CodeDeploy

As machine learning (ML) applications become more popular, customers are looking to streamline the process for developing, deploying, and continuously improving models. To reliably increase the frequency and quality of this cycle, customers are turning to ML operations (MLOps), which is the discipline of bringing continuous delivery principles and practices to the data science team. The following diagram illustrates the continuous deployment workflow.

There are many ways to operationalize ML. In this post, you see how to build an ML model that predicts taxi fares in New York City using Amazon SageMaker, AWS CodePipeline, and AWS CodeDeploy in a safe blue/green deployment pipeline.

Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to quickly build, train, deploy, and monitor ML models. When combined with CodePipeline and CodeDeploy, it’s easy to create a fully serverless build pipeline that follows security and performance best practices at lower cost.

CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the Build, Test, and Deploy phases of your release process every time a code change occurs, based on the release model you define.

What is blue/green deployment?

Blue/green deployment is a technique that reduces downtime and risk by running two identical production environments, called Blue and Green. After you deploy a fully tested model to the Green endpoint, a fraction of traffic (for this use case, 10%) is sent to this new replacement endpoint. Traffic continues to shift, with an optional ramp up to 100%, as long as no errors are detected, at which point the Blue endpoint can be decommissioned. Green becomes live, and the process can repeat. If any errors are detected during this process, a rollback occurs: Blue remains live and Green is decommissioned. The following diagram illustrates this architecture.

In this solution, the blue/green deployment is managed by AWS CodeDeploy with the AWS Lambda compute platform to switch between the blue/green autoscaling Amazon SageMaker endpoints.

Solution overview

For this post, you use a publicly available New York green taxi dataset to train an ML model to predict the fare amount using the Amazon SageMaker built-in XGBoost algorithm.

You automate the process of training, deploying, and monitoring the model with CodePipeline, which you orchestrate within an Amazon SageMaker notebook.

Getting started

This walkthrough uses AWS CloudFormation to create a continuous integration pipeline. You can configure this to a public or private GitHub repo with your own access token, or you can use an AWS CodeCommit repository in your environment that is cloned from the public GitHub repo.

Complete the following steps:

  1. Optionally, fork a copy of the GitHub repo into your own GitHub account by choosing Fork.
    1. Create a personal access token (OAuth 2) with the scopes (permissions) admin:repo_hook and repo. If you already have a token with these permissions, you can use that. You can find a list of all your personal access tokens at https://github.com/settings/tokens.
    2. Copy the access token to your clipboard. For security reasons, after you navigate off the page, you can’t see the token again. If you lose your token, you can regenerate it.
  2. Choose Launch Stack:
  3. Enter the following parameters:
    1. Model Name – A unique name for this model (must be fewer than 15 characters)
    2. Notebook Instance Type – The Amazon SageMaker instance type (default is ml.t3.medium)
    3. GitHub Access Token – Your access token

  4. Acknowledge that AWS CloudFormation may create additional AWS Identity and Access Management (IAM) resources.
  5. Choose Create stack.

The CloudFormation template creates an Amazon SageMaker notebook and pipeline.

When the deployment is complete, you have a new pipeline linked to your GitHub source. It starts in a Failed state because it’s waiting on an Amazon Simple Storage Service (Amazon S3) data source.

The pipeline has the following stages:

  1. Build Artifacts – Runs a CodeBuild job to create the CloudFormation templates.
  2. Train – Runs the Amazon SageMaker training job and a baseline processing job.
  3. Deploy Dev – Deploys a development Amazon SageMaker endpoint.
  4. Manual Approval – Waits for the user to approve the deployment.
  5. Deploy Prod – Deploys an Amazon API Gateway endpoint and AWS Lambda function in front of the Amazon SageMaker endpoints, using CodeDeploy for blue/green deployment and rollback.

The following diagram illustrates this workflow.

Running the pipeline

Launch the newly created Amazon SageMaker notebook in your AWS account. For more information, see Build, Train, and Deploy a Machine Learning Model.

Navigate to the notebook directory and open the notebook by choosing the mlops.ipynb link.

The notebook guides you through a series of steps, which we also review in this post:

  1. Data Prep
  2. Start Build
  3. Wait for Training Job
  4. Test Dev Deployment
  5. Approve Prod Deployment
  6. Test Prod Deployment
  7. Model Monitoring
  8. CloudWatch Monitoring

The following diagram illustrates this workflow.

Step 1: Data Prep

In this step, you download the February 2018 trips from New York green taxi trip records to a local file for input into a pandas DataFrame. See the following code:

import pandas as pd
parse_dates = ["lpep_dropoff_datetime", "lpep_pickup_datetime"]
trip_df = pd.read_csv("nyc-tlc.csv", parse_dates=parse_dates)

You then add a feature engineering step to calculate the duration in minutes from the pick-up and drop-off times:

trip_df["duration_minutes"] = (
    trip_df["lpep_dropoff_datetime"] - trip_df["lpep_pickup_datetime"]
).dt.seconds / 60

You create a new DataFrame just to include the total amount as the target column, using duration in minutes, passenger count, and trip distance as input features:

cols = ["total_amount", "duration_minutes", "passenger_count", "trip_distance"]
data_df = trip_df[cols]
print(data_df.shape)
data_df.head()

The following table shows you the first five rows in your DataFrame.

   total_amount  duration_minutes  passenger_count  trip_distance
1  23            0.05              1                0
2  9.68          7.11667           5                1.6
3  35.76         22.81667          1                9.6
4  5.8           3.16667           1                0.73
5  9.3           6.63333           2                1.87

Continue through the notebook to visualize a sample of the DataFrame, before splitting the dataset into 80% training, 15% validation, and 5% test. See the following code:

from sklearn.model_selection import train_test_split
train_df, val_df = train_test_split(data_df, test_size=0.20, random_state=42)
val_df, test_df = train_test_split(val_df, test_size=0.05, random_state=42)
# Set the index for our test dataframe
test_df.reset_index(inplace=True, drop=True)
print('split train: {}, val: {}, test: {} '.format(train_df.shape[0], val_df.shape[0], test_df.shape[0]))

Step 2: Start Build

The pipeline source has two inputs:

  • A Git source repository containing the model definition and all supporting infrastructure
  • An Amazon S3 data source that includes a reference to the training and validation datasets

The Start Build section in the notebook uploads a .zip file to the Amazon S3 data source that triggers the build. See the following code:

from io import BytesIO
import zipfile
import json
import boto3
input_data = {
    "TrainingUri": s3_train_uri,
    "ValidationUri": s3_val_uri,
    "BaselineUri": s3_baseline_uri,
}
hyperparameters = {"num_round": 50}
data_source_key = "{}/data-source.zip".format(pipeline_name)
zip_buffer = BytesIO()
with zipfile.ZipFile(zip_buffer, "a") as zf:
    zf.writestr("inputData.json", json.dumps(input_data))
    zf.writestr("hyperparameters.json", json.dumps(hyperparameters))
zip_buffer.seek(0)
s3 = boto3.client("s3")
s3.put_object(
    Bucket=artifact_bucket, Key=data_source_key, Body=bytearray(zip_buffer.read())
)

Specifically, you see a VersionId in the output from this cell:

{'ResponseMetadata': {'RequestId': 'ED389631CA6A9815',
  'HostId': '3jAk/BJoRb78yElCVxrEpekVKE34j/WKIqwTIJIxgb2IoUSV8khz7T5GLiSKO/u0c66h8/Iye9w=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '3jAk/BJoRb78yElCVxrEpekVKE34j/WKIqwTIJIxgb2IoUSV8khz7T5GLiSKO/u0c66h8/Iye9w=',
   'x-amz-request-id': 'ED389631CA6A9815',
   'date': 'Mon, 15 Jun 2020 05:06:39 GMT',
   'x-amz-version-id': 'NJMR4LzjbC0cNarlnZwtDKYwTnYsIdF3',
   'etag': '"47f9ca2b44d0e2d66def2f098dd13094"',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'ETag': '"47f9ca2b44d0e2d66def2f098dd13094"',
 'VersionId': 'NJMR4LzjbC0cNarlnZwtDKYwTnYsIdF3'}

This corresponds to the Amazon S3 data source version id in the pipeline. See the following screenshot.

The Build stage in the pipeline runs a CodeBuild job defined in buildspec.yml that runs the following actions:

  • Runs the model’s run.py Python file to output the training job definition.
  • Packages CloudFormation templates for the Dev and Prod deploy stages, including the API Gateway and Lambda resources for the blue/green deployment.

The source code for the model and API are available in the Git repository. See the following directory tree:

├── api
│   ├── __init__.py
│   ├── app.py
│   ├── post_traffic_hook.py
│   └── pre_traffic_hook.py
├── model
│   ├── buildspec.yml
│   ├── requirements.txt
│   └── run.py

The Build stage is also responsible for deploying the Lambda custom resources referenced in the CloudFormation stacks.

Step 3: Wait for Training Job

When the training and baseline job is complete, you can inspect the metrics associated with the Experiment and Trial component linked to the pipeline execution ID.

In addition to the train metrics, there is another job to create a baseline, which outputs statistics and constraints used for model monitoring after the model is deployed to production. The following table summarizes the parameters.

TrialComponentName                                                         DisplayName  SageMaker.InstanceType  train:rmse – Last  validation:rmse – Last
mlops-nyctaxi-pbl-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba-aws-processing-job  Baseline     ml.m5.xlarge            NaN                NaN
mlops-nyctaxi-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba-aws-training-job        Training     ml.m4.xlarge            2.69262            2.73961

These actions run in parallel using custom AWS CloudFormation resources that poll the jobs every 2 minutes to check on their status. See the following screenshot.

Step 4: Test Dev Deployment

With the training job complete, the development endpoint is deployed in the next stage. The notebook polls the endpoint status until it becomes InService. See the following code:

import time
import boto3

sm = boto3.client('sagemaker')

# Poll the development endpoint until it is InService
while True:
    try:
        response = sm.describe_endpoint(EndpointName=dev_endpoint_name)
        print("Endpoint status: {}".format(response['EndpointStatus']))
        if response['EndpointStatus'] == 'InService':
            break
    except:
        pass
    time.sleep(10)

With an endpoint in service, you can use the notebook to predict the expected fare amount based on the inputs from the test dataset. See the following code:

pred_df = pd.DataFrame({'total_amount_predictions': predictions })
pred_df = test_df.join(pred_df) # Join on all
pred_df['error'] = abs(pred_df['total_amount']-pred_df['total_amount_predictions'])

ax = pred_df.tail(1000).plot.scatter(x='total_amount_predictions', y='total_amount', 
                                     c='error', title='actual amount vs pred')
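
The predictions variable above holds the model output for the test dataset. A minimal sketch of how those predictions might be obtained by invoking the development endpoint directly (the CSV serialization and response parsing shown here are assumptions; the notebook may use the SageMaker SDK instead):

import boto3

runtime = boto3.client('sagemaker-runtime')

# Send the test features (all columns except total_amount) as CSV
csv_payload = test_df[test_df.columns[1:]].to_csv(header=False, index=False)
response = runtime.invoke_endpoint(EndpointName=dev_endpoint_name,
                                   ContentType='text/csv',
                                   Body=csv_payload)
raw = response['Body'].read().decode('utf-8')
predictions = [float(v) for v in raw.replace('\n', ',').split(',') if v]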

You can join these predictions back to the target total amount, and visualize them in a scatter plot.

The notebook also calculates the root mean square error, which is commonly used in regression problems like this.

Step 5: Approve Prod Deployment

If you’re happy with the model, you can approve it directly using the Jupyter notebook widget.

As an administrator, you can also approve or reject the manual approval on the AWS CodePipeline console.

Approving this action moves the pipeline to the final blue/green production deployment stage.
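
You can also approve programmatically with the CodePipeline API. A minimal sketch follows; the stage and action names here are placeholders, so check your pipeline definition for the actual values:

import boto3

codepipeline = boto3.client('codepipeline')

# Look up the token for the pending manual approval, then approve it
state = codepipeline.get_pipeline_state(name=pipeline_name)
approval_stage = next(s for s in state['stageStates'] if s['stageName'] == 'ManualApproval')  # assumed stage name
token = approval_stage['actionStates'][0]['latestExecution']['token']

codepipeline.put_approval_result(
    pipelineName=pipeline_name,
    stageName='ManualApproval',   # assumed stage name
    actionName='ApproveDeploy',   # assumed action name
    result={'summary': 'Approved from notebook', 'status': 'Approved'},
    token=token,
)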

Step 6: Test Prod Deployment

Production deployment is managed through a single AWS CloudFormation stack, which performs several dependent actions:

  1. Creates an Amazon SageMaker endpoint with AutoScaling enabled
  2. Updates the endpoints to enable data capture and schedule model monitoring
  3. Calls CodeDeploy to create or update a RESTful API using blue/green Lambda deployment

The following diagram illustrates this workflow.

The first time this pipeline is run, there’s no existing Blue deployment, so CodeDeploy creates a new API Gateway and Lambda resource, which is configured to invoke an Amazon SageMaker endpoint that has been configured with AutoScaling and data capture enabled.

Rerunning the build pipeline

If you go back to Step 2 in the notebook and upload a new Amazon S3 data source artifact, you trigger a new build in the pipeline, which results in an update to the production deployment CloudFormation stack. This results in a new Lambda version pointing to a second Green Amazon SageMaker endpoint being created in parallel with the original Blue endpoint.

You can use the notebook to query the most recent events in the CloudFormation stack associated with this deployment. See the following code:

import boto3
import pandas as pd
from datetime import datetime
from dateutil.tz import tzlocal

cfn = boto3.client('cloudformation')

def get_event_dataframe(events):
    stack_cols = [
        "LogicalResourceId",
        "ResourceStatus",
        "ResourceStatusReason",
        "Timestamp",
    ]
    stack_event_df = pd.DataFrame(events)[stack_cols].fillna("")
    stack_event_df["TimeAgo"] = datetime.now(tzlocal()) - stack_event_df["Timestamp"]
    return stack_event_df.drop("Timestamp", axis=1)

# Get the latest stack events
response = cfn.describe_stack_events(StackName=stack_name)
get_event_dataframe(response["StackEvents"]).head()

The output shows the latest events and, for this post, illustrates that the endpoint has been updated for data capture and CodeDeploy is in the process of switching traffic to the new endpoint. The following table summarizes the output.

LogicalResourceId         ResourceStatus      ResourceStatusReason         TimeAgo
PreTrafficLambdaFunction  UPDATE_COMPLETE                                  00:06:17.143352
SagemakerDataCapture      UPDATE_IN_PROGRESS                               00:06:17.898352
PreTrafficLambdaFunction  UPDATE_IN_PROGRESS                               00:06:18.114352
Endpoint                  UPDATE_COMPLETE                                  00:06:20.911352
Endpoint                  UPDATE_IN_PROGRESS  Resource creation Initiated  00:12:56.016352

When the Blue endpoint status is InService, the notebook outputs a link to the CodeDeploy Deployment Application page, where you can watch as the traffic shifts from the original to the replacement endpoint.

A successful blue/green deployment is contingent on the post-deployment validation passing, which for this post is configured to check that live traffic has been received, as evidenced by data capture logs in Amazon S3.

The notebook guides you through the process of sending requests to the RESTful API, which is provided as an output in the CloudFormation stack. See the following code:

from urllib import request

headers = {"Content-type": "text/csv"}
payload = test_df[test_df.columns[1:]].head(1).to_csv(header=False, index=False).encode('utf-8')

while True:
    try:
        resp = request.urlopen(request.Request(outputs['RestApi'], data=payload, headers=headers))
        print("Response code: %d: endpoint: %s" % (resp.getcode(), resp.getheader('x-sagemaker-endpoint')))
        status, outputs = get_stack_status(stack_name) 
        if status.endswith('COMPLETE'):
            print('Deployment complete\n')
            break
    except Exception as e:
        pass
    time.sleep(10)

This cell loops every 10 seconds until the deployment is complete, retrieving a header that indicates which Amazon SageMaker endpoint was hit for that request. Because we’re using the canary mode, you see a small sample of hits from the new target endpoint (ending in c3e945b2b3ba) until the CodeDeploy process completes successfully, at which point the original endpoint (ending in 5e62980afced) is deleted because it’s no longer required. See the following output:

Response code: 200: endpoint: mlops-nyctaxi-prd-b7e92138-1aad-4197-8cfb-5e62980afced
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-b7e92138-1aad-4197-8cfb-5e62980afced
Response code: 200: endpoint: mlops-nyctaxi-prd-b7e92138-1aad-4197-8cfb-5e62980afced
Response code: 200: endpoint: mlops-nyctaxi-prd-b7e92138-1aad-4197-8cfb-5e62980afced
Response code: 200: endpoint: mlops-nyctaxi-prd-b7e92138-1aad-4197-8cfb-5e62980afced
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba
Response code: 200: endpoint: mlops-nyctaxi-prd-4baf03ab-e738-4c3e-9f89-c3e945b2b3ba

Step 7: Model Monitoring

As part of the production deployment, Amazon SageMaker Model Monitor is scheduled to run every hour on the newly created endpoint, which is configured to capture request input and output data to Amazon S3.

You can use the notebook to list these data capture files, which are collected as a series of JSON lines. See the following code:

bucket = sagemaker_session.default_bucket()
data_capture_logs_uri = 's3://{}/{}/datacapture/{}'.format(bucket, model_name, prd_endpoint_name)

capture_files = S3Downloader.list(data_capture_logs_uri)
print('Found {} files'.format(len(capture_files)))

if capture_files:
    # Get the first line of the most recent file    
    event = json.loads(S3Downloader.read_file(capture_files[-1]).split('\n')[0])
    print('\nLast file:\n{}'.format(json.dumps(event, indent=2)))

If you take the first line of the last file, you can see that the input contains a CSV with fields for the following:

  • Duration (10.65 minutes)
  • Passenger count (1 person)
  • Trip distance (2.56 miles)

The output is the following:

  • Predicted fare ($12.72)

See the following code:

Found 8 files

Last file:
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "10.65,1,2.56n",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "12.720224380493164",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "44daf7d7-97c8-4504-8b3d-399891f8f217",
    "inferenceTime": "2020-05-12T04:18:39Z"
  },
  "eventVersion": "0"
}

The monitoring job rolls up all the results in the last hour and compares these to the baseline statistics and constraints captured in the Train phase of the pipeline. As we can see from the notebook visualization, we can detect baseline drift with respect to the total_amount and trip_distance inputs, which were randomly sampled from our test set.
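
To inspect the drift programmatically, you can read the constraint violations report that the monitoring job writes to Amazon S3. A minimal sketch follows; the report prefix below is an assumption, so check your monitoring schedule's output configuration for the exact location:

import json
from sagemaker.s3 import S3Downloader  # same utility used earlier in the notebook

# Assumed location of the monitoring reports; adjust to your schedule's output path
reports_uri = 's3://{}/{}/monitoring/reports'.format(bucket, model_name)
report_files = [f for f in S3Downloader.list(reports_uri) if f.endswith('constraint_violations.json')]

if report_files:
    violations = json.loads(S3Downloader.read_file(report_files[-1]))
    for v in violations.get('violations', []):
        print('{}: {}'.format(v['feature_name'], v['constraint_check_type']))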

Step 8: CloudWatch Monitoring

Amazon CloudWatch Synthetics provides a way to set up a canary that tests whether your API returns a successful HTTP 200 response at a regular interval. This is a great way to validate that the blue/green deployment isn’t causing any downtime for your end-users.

The notebook loads the canary.js template and parameterizes it with the REST API hostname, path, and request payload. The resulting script is packaged into the canary’s Lambda layer, and the canary is configured to hit the production REST API every 10 minutes. See the following code:

from urllib.parse import urlparse
from string import Template
from io import BytesIO
import zipfile
import boto3
import sagemaker

# Format the canary_js with rest_api and payload
rest_url = urlparse(rest_api)
with open('canary.js') as f:
    canary_js = Template(f.read()).substitute(hostname=rest_url.netloc, 
                                              path=rest_url.path, 
                                              data=payload.decode('utf-8').strip())
# Write the zip file
zip_buffer = BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zf:
    zip_path = 'nodejs/node_modules/apiCanaryBlueprint.js' # Set a valid path
    zip_info = zipfile.ZipInfo(zip_path)
    zip_info.external_attr = 0o0755 << 16 # Ensure the file is readable
    zf.writestr(zip_info, canary_js)
zip_buffer.seek(0)

# Create the canary
synth = boto3.client('synthetics')

role = sagemaker.get_execution_role()
s3_canary_uri = 's3://{}/{}'.format(artifact_bucket, model_name)
canary_name = 'mlops-{}'.format(model_name)

response = synth.create_canary(
    Name=canary_name,
    Code={
        'ZipFile': bytearray(zip_buffer.read()),
        'Handler': 'apiCanaryBlueprint.handler'
    },
    ArtifactS3Location=s3_canary_uri,
    ExecutionRoleArn=role,
    Schedule={ 
        'Expression': 'rate(10 minutes)', 
        'DurationInSeconds': 0 },
    RunConfig={
        'TimeoutInSeconds': 60,
        'MemoryInMB': 960
    },
    SuccessRetentionPeriodInDays=31,
    FailureRetentionPeriodInDays=31,
    RuntimeVersion='syn-1.0',
)

You can also configure an alarm for this canary to alert when the success rates drop below 90%:

cloudwatch = boto3.client('cloudwatch')

canary_alarm_name = '{}-synth-lt-threshold'.format(canary_name)

response = cloudwatch.put_metric_alarm(
    AlarmName=canary_alarm_name,
    ComparisonOperator='LessThanThreshold',
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Period=600, # 10 minute interval
    Statistic='Average',
    Threshold=90.0,
    ActionsEnabled=False,
    AlarmDescription='SuccessPercent LessThanThreshold 90%',
    Namespace='CloudWatchSynthetics',
    MetricName='SuccessPercent',
    Dimensions=[
        {
          'Name': 'CanaryName',
          'Value': canary_name
        },
    ],
    Unit='Seconds'
)

With the canary created, you can choose the link provided to view the details on the Amazon CloudWatch console, which include metrics such as uptime as well as the log output from your Lambda code.

Returning to the notebook, create a CloudWatch dashboard from a template parameterized with the current region, account_id, and model_name:

sts = boto3.client('sts')
account_id = sts.get_caller_identity().get('Account')
dashboard_name = 'mlops-{}'.format(model_name)

with open('dashboard.json') as f:
    dashboard_body = Template(f.read()).substitute(region=region, 
                                                   account_id=account_id, 
                                                   model_name=model_name)
    response = cloudwatch.put_dashboard(
        DashboardName=dashboard_name,
        DashboardBody=dashboard_body
    )

This creates a dashboard with a four-row-by-three-column layout with metrics for your production deployment, including the following:

  • Lambda metrics for latency and throughput
  • Amazon SageMaker endpoint metrics
  • CloudWatch alarms for CodeDeploy and model drift

You can choose the link to open the dashboard in full screen and use Dark mode. See the following screenshot.

Cleaning up

You can remove the canary and CloudWatch dashboard directly within the notebook. The pipeline created Amazon SageMaker training, baseline jobs, and endpoints using AWS CloudFormation, so to clean up these resources, delete the stacks prefixed with the name of your model. For example, for nyctaxi, the resources are the following:

  • nyctaxi-devploy-prd
  • nyctaxi-devploy-dev
  • nyctaxi-training-job
  • nyctaxi-suggest-baseline

After these are deleted, complete your cleanup by emptying the S3 bucket created to store your pipeline artifacts, and delete the original stack, which removes the pipeline, Amazon SageMaker notebook, and other resources.

Conclusion

In this post, we walked through creating an end-to-end safe deployment pipeline for Amazon SageMaker models using native AWS development tools CodePipeline, CodeBuild, and CodeDeploy. We demonstrated how you can trigger the pipeline directly from within an Amazon SageMaker notebook, validate and approve deployment to production, and continue to monitor your models after they go live. By running the pipeline a second time, we saw how CodeDeploy created a new Green replacement endpoint that traffic was cut over to after the post-deployment validation confirmed it had received live traffic. Finally, we saw how to use CloudWatch Synthetics to constantly monitor our live endpoints and visualize metrics in a custom CloudWatch dashboard.

You could easily extend this solution to support your own dataset and model. The source code is available on the GitHub repo.


About the Author

Julian Bright is a Senior AI/ML Specialist Solutions Architect based out of Melbourne, Australia. Julian works as part of the global AWS machine learning team and is passionate about helping customers realise their AI and ML journey through MLOps. In his spare time he loves running around after his kids, playing soccer and getting outdoors.


Deploying your own data processing code in an Amazon SageMaker Autopilot inference pipeline

The machine learning (ML) model-building process requires data scientists to manually prepare data features, select an appropriate algorithm, and optimize its model parameters. It involves a lot of effort and expertise. Amazon SageMaker Autopilot removes the heavy lifting required by this ML process. It inspects your dataset, generates several ML pipelines, and compares their performance to produce a leaderboard of candidate pipelines. Each candidate pipeline is a combination of data preprocessing steps, an ML algorithm, and its optimized hyperparameters. You can easily deploy any of these candidate pipelines to use for real-time prediction or batch prediction.

But what if you want to preprocess the data before invoking Amazon SageMaker Autopilot? For example, you might have a dataset with several features and need customized feature selection to remove irrelevant variables before using it to train a model in an Autopilot job. Then you need to incorporate your custom processing code into the pipeline when deploying it to a real-time endpoint or for batch processing. This post shows you how to customize an Autopilot inference pipeline with your own data processing code. The code from this post is available in the GitHub repo.

Solution overview

The solution for customizing a pipeline that combines custom feature selection with Autopilot models includes the following steps:

  1. Prepare a dataset with 100 features as the example dataset for this post and upload it to Amazon Simple Storage Service (Amazon S3).
  2. Train the feature selection model and prepare the dataset using sagemaker-scikit-learn-container to feed to Autopilot.
  3. Configure and launch the Autopilot job.
  4. Create an inference pipeline that combines feature selection with the Autopilot models.
  5. Make predictions with the inference pipeline.

The following diagram outlines the architecture of this workflow.

Preparing and uploading the dataset

First, we generate a regression dataset using sklearn.datasets.make_regression. Set the number of features to 100. Five of these features are informative. The 100 variable names are indexed as x_i and the name of the target variable is y:

from sklearn.datasets import make_regression
import pandas as pd

X, y = make_regression(n_features=100, n_samples=1500, n_informative=5, random_state=0)
df_X = pd.DataFrame(X).rename(columns=lambda x: 'x_' + str(x))
df_y = pd.DataFrame(y).rename(columns=lambda x: 'y')
df = pd.concat([df_X, df_y], axis=1)

The following screenshot shows the data generated. You upload this dataset to Amazon S3 to use in later steps.
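
A minimal sketch of that upload, assuming the DataFrame is first written to a local CSV file (the key prefix is a placeholder):

import sagemaker

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
prefix = 'autopilot-feature-selection'  # hypothetical prefix

# Write the generated data locally, then upload it to Amazon S3
df.to_csv('train.csv', index=False)
train_input = sagemaker_session.upload_data('train.csv', bucket=bucket, key_prefix=prefix + '/training_data')
print(train_input)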

Training the feature selection model and preparing the dataset

Feature selection is the process of selecting a subset of the most relevant features on which to train an ML model. This simplification shortens training time and reduces the chance of overfitting. The sklearn.feature_selection module contains several feature selection algorithms. For this post, we use the following:

  • feature_selection.RFE – The recursive feature elimination (RFE) algorithm selects features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained. Then, the least important features are pruned from the current set of features. We use Epsilon-Support Vector Regression (sklearn.svm.SVR) as our learning estimator for RFE.
  • feature_selection.SelectKBest – The SelectKBest algorithm selects the k features that have the highest scores of a specified metric. We use mutual information and f regression as the score functions—both methods measure the dependency between variables. For more information about f regression and mutual information, see Feature Selection.

We stack these three feature selection steps into one sklearn.pipeline.Pipeline. RFE by default eliminates 50% of the total features, SelectKBest then selects the top 30 features using the f_regression score, and a second SelectKBest step reduces the number of features to 10 using the mutual_info_regression score. The feature selection algorithms used here are for demonstration purposes only; you can update the script to incorporate the feature selection algorithms of your choice.

We also create a Python script for feature selection. In the following code example, we build a sklearn Pipeline object that implements the method we described:

'''Feature selection pipeline'''
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import RFE, SelectKBest, f_regression, mutual_info_regression
from sklearn.svm import SVR

# The pipe alias is used later when retrieving the selected feature names
feature_selection_pipe = pipe = Pipeline([
    ('svr', RFE(SVR(kernel="linear"))),
    ('f_reg', SelectKBest(f_regression, k=30)),
    ('mut_info', SelectKBest(mutual_info_regression, k=10))
])
feature_selection_pipe.fit(X_train, y_train)

To provide visibility on which features are selected, we use the following script to generate and save the names of selected features as a list:

'''Save selected feature names'''
feature_names = concat_data.columns[:-1]
feature_names = feature_names[pipe.named_steps['svr'].get_support()]
feature_names = feature_names[pipe.named_steps['f_reg'].get_support()]
feature_names = feature_names[pipe.named_steps['mut_info'].get_support()]

We use the Amazon SageMaker SKLearn Estimator with a feature selection script as an entry point. The script is very similar to a training script you might run outside of Amazon SageMaker, but you can access useful properties about the training environment through various environment variables, such as SM_MODEL_DIR, which represents the path to the directory inside the container to write model artifacts to. These artifacts are uploaded to the Amazon S3 output path by the Amazon SageMaker training job. After training is complete, we save model artifacts and selected column names for use during inference to SM_MODEL_DIR. See the following code:

    joblib.dump(feature_selection_pipe, os.path.join(args.model_dir, "model.joblib"))
    ...
    joblib.dump(feature_names, os.path.join(args.model_dir, "selected_feature_names.joblib"))

Although we use feature selection algorithms in this post, you can customize and add additional data preprocessing code, such as code for data imputation or other forms of data cleaning, to this entry point script.

Now that our feature selection model is properly fitted, we transform the raw input data to the training dataset with selected features. To use Amazon SageMaker batch transform to directly process the raw data and store back to Amazon S3, enter the following code:

# Define a SKLearn Transformer from the trained SKLearn Estimator
transformer_output = os.path.join('s3://',bucket, prefix, 'Feature_selection_output/')
transformer = sklearn_preprocessor.transformer(
    instance_count=1, 
    instance_type='ml.m4.xlarge',
    output_path = transformer_output,
    assemble_with = 'Line',
    accept = 'text/csv')
    
transformer.transform(train_input, content_type='text/csv') 

The notebook contains an additional step that adds the selected column names as headers to the generated CSV data files.

Configuring and launching the Autopilot job

The output from batch transform is the new training dataset for Autopilot. The new dataset has 10 features. To use Autopilot, we simply provide our new dataset and choose the target column to be y. Autopilot automatically inspects our dataset and runs several candidates to determine the optimal combination of data preprocessing steps, ML algorithms, and hyperparameters. Before launching the Autopilot job, we define the job input configuration, output configuration, and stopping criteria:

input_data_config = [{
      'DataSource': {
        'S3DataSource': {
          'S3DataType': 'S3Prefix',
          'S3Uri': 's3://{}/{}/training_data_new'.format(bucket,prefix)
        }
      },
      'TargetAttributeName': 'y'
    }
  ]

output_data_config = {
    'S3OutputPath': 's3://{}/{}/autopilot_job_output'.format(bucket,prefix)
  }

AutoML_Job_Config = {
    'CompletionCriteria': {
            'MaxCandidates': 50,
            'MaxAutoMLJobRuntimeInSeconds': 1800
        }
  }

Then we call the create_auto_ml_job API to launch the Autopilot job:

import boto3
from time import gmtime, strftime

sm = boto3.Session().client(service_name='sagemaker', region_name=region)
timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())

auto_ml_job_name = 'automl-blog' + timestamp_suffix
print('AutoMLJobName: ' + auto_ml_job_name)

sm.create_auto_ml_job(AutoMLJobName=auto_ml_job_name,
                      InputDataConfig=input_data_config,
                      OutputDataConfig=output_data_config,
                      AutoMLJobConfig = AutoML_Job_Config,
                      RoleArn=role)
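
Autopilot jobs can take a while to complete. A minimal sketch of waiting for the job and retrieving the best candidate, which is used when building the inference pipeline later (the polling interval and print format are illustrative):

import time

# Wait for the Autopilot job to finish
while True:
    job = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)
    status = job['AutoMLJobStatus']
    print('Job status: {} ({})'.format(status, job['AutoMLJobSecondaryStatus']))
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)

# The best candidate holds the inference containers used later in the pipeline model
best_candidate = sm.describe_auto_ml_job(AutoMLJobName=auto_ml_job_name)['BestCandidate']
print(best_candidate['CandidateName'])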

Creating an inference pipeline that combines feature selection with Autopilot models

So far, we have created a model that takes raw data with 100 features and selects the 10 most relevant features. We also used Autopilot to create data processing and ML models to predict y. We now combine the feature selection model with Autopilot models to create an inference pipeline. After defining the models and assigning names, we create a PipelineModel that points to our preprocessing and prediction models. The pipeline.py file is available on GitHub. See the following code:

sklearn_image = sklearn_preprocessor.image_name
container_1_source = os.path.join("s3://", 
                                  sagemaker_session.default_bucket(), 
                                  sklearn_preprocessor.latest_training_job.job_name,
                                  "sourcedir.tar.gz"
                                 )
inference_containers = [
        {
            'Image': sklearn_image,
            'ModelDataUrl': sklearn_preprocessor.model_data,
            'Environment': {
                'SAGEMAKER_SUBMIT_DIRECTORY':container_1_source,
                'SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT': "text/csv",
                'SAGEMAKER_PROGRAM':'sklearn_feature_selection.py'
            }
        }]

inference_containers.extend(best_candidate['InferenceContainers'])

response = sagemaker.create_model(
        ModelName=pipeline_name,
        Containers=inference_containers,
        ExecutionRoleArn=role)
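
The create_endpoint call in the next step references an endpoint configuration. A minimal sketch of creating one for the pipeline model (the variant name and instance type are illustrative choices):

pipeline_endpoint_config_name = '{}-config'.format(pipeline_name)
response = sagemaker.create_endpoint_config(
    EndpointConfigName=pipeline_endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': pipeline_name,
            'InstanceType': 'ml.m4.xlarge',
            'InitialInstanceCount': 1,
        }
    ],
)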

We then deploy the pipeline model to a single endpoint:

response = sagemaker.create_endpoint(
        EndpointName=pipeline_endpoint_name,
        EndpointConfigName=pipeline_endpoint_config_name,
    )

Making predictions with the inference pipeline

We can test our pipeline by sending data for prediction. The pipeline accepts raw data, transforms it using the feature selection model, and creates a prediction using the models Autopilot generated.

First, we define a payload variable that contains the data we want to send through the pipeline. We use the first five rows of the training data as our payload. Then we define a predictor using our pipeline endpoint, send the payload to the predictor, and print the model prediction:

from sagemaker.predictor import RealTimePredictor, csv_serializer
from sagemaker.content_types import CONTENT_TYPE_CSV
predictor = RealTimePredictor(
    endpoint=pipeline_endpoint_name,
    serializer=csv_serializer,
    sagemaker_session=sagemaker_session,
    content_type=CONTENT_TYPE_CSV,
    accept=CONTENT_TYPE_CSV)

predictor.content_type = 'text/csv'
predictor.predict(test_data.to_csv(sep=',', header=True, index=False)).decode('utf-8')

Our Amazon SageMaker endpoint returns one prediction for each corresponding row of the data sent. See the following code:

'-102.248855591\n-165.823532104\n115.50453186\n111.306632996\n5.91651535034'

Deleting the endpoint

When we are finished with the endpoint, we delete it to save cost:

sm_client = sagemaker_session.boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=pipeline_endpoint_name)

Conclusions

In this post, we demonstrated how to customize an Autopilot inference pipeline with your own data processing code. We first trained a feature selection model and converted our raw data using the trained feature selection model. Then we launched an Amazon SageMaker Autopilot job that automatically trained and tuned the best ML models for our regression problem. We also built an inference pipeline that combined feature selection with the Autopilot models. Lastly, we made predictions with the inference pipeline. For more information, see Amazon SageMaker Autopilot.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Piali Das is a Senior Software Engineer in the AWS SageMaker Autopilot team. She previously contributed to building SageMaker Algorithms. She enjoys scientific programming in general and has developed an interest in machine learning and distributed systems.
