Customize business rules for intelligent document processing with human review and BI visualization

Businesses across industries process massive volumes of documents every day. Many of these documents are paper-based, scanned into your system as images, or stored in an unstructured format like PDF. Each company may apply unique rules, rooted in its business context, while processing these documents. Extracting information accurately and processing it flexibly is a challenge many companies face.

Amazon Intelligent Document Processing (IDP) allows you to take advantage of industry-leading machine learning (ML) technology without previous ML experience. This post introduces a solution included in the Amazon IDP workshop showcasing how to process documents to serve flexible business rules using Amazon AI services. You can use the following step-by-step Jupyter notebook to complete the lab.

Amazon Textract helps you easily extract text from various documents, and Amazon Augmented AI (Amazon A2I) allows you to implement a human review of ML predictions. The default Amazon A2I template allows you to build a human review pipeline based on rules, such as when the extraction confidence score is lower than a pre-defined threshold or required keys are missing. But in a production environment, you need the document processing pipeline to support flexible business rules, such as validating the string format, verifying the data type and range, and validating fields across documents. This post shows how you can use Amazon Textract and Amazon A2I to customize a generic document processing pipeline supporting flexible business rules.

Solution overview

For our sample solution, we use the Tax Form 990, a US IRS (Internal Revenue Service) form that provides the public with financial information about a non-profit organization. For this example, we only cover the extraction logic for some of the fields on the first page of the form. You can find more sample documents on the IRS website.

The following diagram illustrates the IDP pipeline that supports customized business rules with human review.

IDP HITM Overview

The architecture is composed of three logical stages:

  • Extraction – Extract data from the 990 Tax Form (we use page 1 as an example).

    • Retrieve a sample image stored in an Amazon Simple Storage Service (Amazon S3) bucket.
    • Call the Amazon Textract analyze_document API using the Queries feature to extract text from the page.
  • Validation – Apply flexible business rules with a human-in-the-loop review.

    • Validate the extracted data against business rules, such as validating the length of an ID field.
    • Send the document to Amazon A2I for a human to review if any business rules fail.
    • Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result.
  • BI visualization – We use Amazon QuickSight to build a business intelligence (BI) dashboard showing the process insights.

Customize business rules

You can define a generic business rule in the following JSON format. In the sample code, we define three rules:

  • The first rule is for the employer ID field. The rule fails if the Amazon Textract confidence score is lower than 99%. For this post, we set the confidence score threshold high so the rule breaks by design. In a real-world environment, you could adjust the threshold to a more reasonable value, such as 90%, to reduce unnecessary human effort.
  • The second rule is for the DLN field (the unique identifier of the tax form), which is required for the downstream processing logic. This rule fails if the DLN field is missing or has an empty value.
  • The third rule is also for the DLN field but with a different condition type: LengthCheck. The rule breaks if the DLN length is not 16 characters.

The following code shows our business rules in JSON format:

rules = [
    {
        "description": "Employee Id confidence score should greater than 99",
        "field_name": "d.employer_id",
        "field_name_regex": None, # support Regex: "_confidence$",
        "condition_category": "Confidence",
        "condition_type": "ConfidenceThreshold",
        "condition_setting": "99",
    },
    {
        "description": "dln is required",
        "field_name": "dln",
        "condition_category": "Required",
        "condition_type": "Required",
        "condition_setting": None,
    },
    {
        "description": "dln length should be 16",
        "field_name": "dln",
        "condition_category": "LengthCheck",
        "condition_type": "ValueRegex",
        "condition_setting": "^[0-9a-zA-Z]{16}$",
    }
]

You can expand the solution by adding more business rules following the same structure.

Extract text using an Amazon Textract query

In the sample solution, we call the Amazon Textract analyze_document API with the Queries feature to extract fields by asking specific questions. You don’t need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats. Queries use a combination of visual, spatial, and language cues to extract the information you seek with high accuracy.

To extract the value of the DLN field, you can send a request with questions in natural language, such as “What is the DLN?” Amazon Textract returns the text, confidence, and other metadata if it finds corresponding information in the image or document. The following is an example of an Amazon Textract query request:

textract.analyze_document(
    Document={'S3Object': {'Bucket': data_bucket, 'Name': s3_key}},
    FeatureTypes=["QUERIES"],
    QueriesConfig={
        'Queries': [
            {
                'Text': 'What is the DLN?',
                'Alias': 'The DLN number - unique identifier of the form'
            }
        ]
    }
)
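
The response contains QUERY blocks linked to QUERY_RESULT blocks through an ANSWER relationship. The following is a minimal sketch of how you might parse the response into field values; the helper name and the comments are illustrative, not part of the workshop code.

def parse_query_answers(response):
    """Map Textract QUERY blocks to their QUERY_RESULT answers (illustrative helper)."""
    blocks = {b["Id"]: b for b in response["Blocks"]}
    answers = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "QUERY":
            continue
        # Each QUERY block points to its QUERY_RESULT block(s) via an ANSWER relationship
        for rel in block.get("Relationships", []):
            if rel["Type"] == "ANSWER":
                for result_id in rel["Ids"]:
                    result = blocks[result_id]
                    answers[block["Query"]["Text"]] = {
                        "value": result.get("Text"),
                        "confidence": result.get("Confidence"),
                        "block": result,
                    }
    return answers

# Note: Textract returns confidence on a 0-100 scale; the data model in the next
# section normalizes it to a 0-1 fraction.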

Define the data model

The sample solution constructs the data in a structured format to serve the generic business rule evaluation. To keep extracted values, you can define a data model for each document page. The following image shows how the text on page 1 maps to the JSON fields.

Custom data model

Each field represents a document’s text, check box, or table/form cell on the page. The JSON object looks like the following code:

{
    "dln": {
        "value": "93493319020929",
        "confidence": 0.9765, 
        "block": {} 
    },
    "omb_no": {
        "value": "1545-0047",
        "confidence": 0.9435,
        "block": {}
    },
    ...
}

You can find the detailed JSON structure definition in the GitHub repo.

Evaluate the data against business rules

The sample solution comes with a Condition class—a generic rules engine that takes the extracted data (as defined in the data model) and the rules (as defined in the customized business rules). It returns two lists with failed and satisfied conditions. We can use the result to decide if we should send the document to Amazon A2I for human review.

The Condition class source code is in the sample GitHub repo. It supports basic validation logic, such as validating a string’s length, value range, and confidence score threshold. You can modify the code to support more condition types and complex validation logic.
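
The following is a minimal sketch of how such a rules engine could be structured; the class shape and field handling are simplified for illustration, so refer to the GitHub repo for the actual implementation.

import re

class Condition:
    """Simplified rules engine: evaluates extracted data against business rules (illustrative)."""

    def __init__(self, data, rules):
        self.data = data      # data model, e.g. {"dln": {"value": "...", "confidence": 0.97}}
        self.rules = rules    # business rules in the JSON format defined earlier

    def check_all(self):
        failed, satisfied = [], []
        for rule in self.rules:
            field = self.data.get(rule["field_name"], {})
            value = field.get("value")
            confidence = field.get("confidence", 0)
            if rule["condition_type"] == "Required":
                ok = value is not None and str(value).strip() != ""
            elif rule["condition_type"] == "ConfidenceThreshold":
                ok = confidence * 100 >= float(rule["condition_setting"])
            elif rule["condition_type"] == "ValueRegex":
                ok = value is not None and re.match(rule["condition_setting"], str(value)) is not None
            else:
                ok = True  # unknown condition types pass by default in this sketch
            (satisfied if ok else failed).append({**rule, "actual_value": value})
        return failed, satisfied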

Create a customized Amazon A2I web UI

Amazon A2I allows you to customize the reviewer’s web UI by defining a worker task template. The template is a static webpage in HTML and JavaScript. You can pass data to the customized reviewer page using the Liquid syntax.

In the sample solution, the custom Amazon A2I UI template displays the page on the left and the failure conditions on the right. Reviewers can use it to correct the extraction value and add their comments.

The following screenshot shows our customized Amazon A2I UI. It shows the original image document on the left and the following failed conditions on the right:

  • The DLN number should be 16 characters long. The actual DLN has 15 characters.
  • The confidence score of employer_id is lower than 99%. The actual confidence score is around 98%.

The reviewers can manually verify these results and add comments in the CHANGE REASON text boxes.

Customized A2I review UI

For more information about integrating Amazon A2I into any custom ML workflow, refer to over 60 pre-built worker templates on the GitHub repo and Use Amazon Augmented AI with Custom Task Types.
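
When at least one condition fails, the solution starts an Amazon A2I human loop against the flow definition that references this worker task template. The following is a hedged sketch of that call using the boto3 sagemaker-a2i-runtime client; the flow definition ARN is a placeholder, and the input payload keys, as well as the data_bucket, s3_key, and failed variables, are assumed from the earlier steps rather than taken from the workshop code.

import json
import uuid
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

# Placeholder: replace with your own flow definition ARN
flow_definition_arn = "arn:aws:sagemaker:us-east-1:111122223333:flow-definition/a2i-custom-ui-demo-workflow"

human_loop_input = {
    "image_s3_uri": f"s3://{data_bucket}/{s3_key}",   # document image shown on the left of the UI
    "failed_conditions": failed,                      # output of the rules engine
}

response = a2i.start_human_loop(
    HumanLoopName=f"custom-loop-{uuid.uuid4()}",
    FlowDefinitionArn=flow_definition_arn,
    HumanLoopInput={"InputContent": json.dumps(human_loop_input)},
)
print(response["HumanLoopArn"])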

Process the Amazon A2I output

After the reviewer using the Amazon A2I customized UI verifies the result and chooses Submit, Amazon A2I stores a JSON file in the S3 bucket folder. The JSON file includes the following information on the root level:

  • The Amazon A2I flow definition ARN and human loop name
  • Human answers (the reviewer’s input collected by the customized Amazon A2I UI)
  • Input content (the original data sent to Amazon A2I when starting the human loop task)

The following is a sample JSON generated by Amazon A2I:

{
  "flowDefinitionArn": "arn:aws:sagemaker:us-east-1:711334203977:flow-definition/a2i-custom-ui-demo-workflow",
  "humanAnswers": [
    {
      "acceptanceTime": "2022-08-23T15:23:53.488Z",
      "answerContent": {
        "Change Reason 1": "Missing X at the end.",
        "True Value 1": "93493319020929X",
        "True Value 2": "04-3018996"
      },
      "submissionTime": "2022-08-23T15:24:47.991Z",
      "timeSpentInSeconds": 54.503,
      "workerId": "94de99f1bc6324b8",
      "workerMetadata": {
        "identityData": {
          "identityProviderType": "Cognito",
          "issuer": "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_URd6f6sie",
          "sub": "cef8d484-c640-44ea-8369-570cdc132d2d"
        }
      }
    }
  ],
  "humanLoopName": "custom-loop-9b4e67ff-2c9f-40f9-aae5-0e26316c905c",
  "inputContent": {...} # the original input send to A2I when starting the human review task
}

You can implement extract, transform, and load (ETL) logic to parse information from the Amazon A2I output JSON and store it in a file or database. The sample solution comes with a CSV file with processed data. You can use it to build a BI dashboard by following the instructions in the next section.
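
As an illustration, the following hedged sketch reads the Amazon A2I output objects from Amazon S3 and flattens a few fields into rows suitable for a CSV; the bucket prefix and the selected columns are assumptions, not the exact logic in the sample solution.

import csv
import json
import boto3

s3 = boto3.client("s3")

def a2i_outputs_to_csv(bucket, prefix, csv_path):
    """Flatten A2I output JSON files into a simple CSV (illustrative ETL)."""
    rows = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if not obj["Key"].endswith("output.json"):
                continue
            doc = json.loads(s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read())
            for answer in doc.get("humanAnswers", []):
                rows.append({
                    "human_loop_name": doc["humanLoopName"],
                    "worker_id": answer.get("workerId"),
                    "time_spent_seconds": answer.get("timeSpentInSeconds"),
                    "answer_content": json.dumps(answer.get("answerContent", {})),
                })
    if not rows:
        return
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)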

Create a dashboard in Amazon QuickSight

The sample solution includes a reporting stage with a visualization dashboard served by Amazon QuickSight. The BI dashboard shows key metrics such as the number of documents processed automatically or manually, the most popular fields that required human review, and other insights. This dashboard can help you gain oversight of the document processing pipeline and analyze the common reasons that trigger human review, so you can optimize the workflow and further reduce human input.

The sample dashboard includes basic metrics. You can expand the solution using Amazon QuickSight to show more insights into the data.

BI dashboard

Expand the solution to support more documents and business rules

To expand the solution to support more document pages with corresponding business rules, you need to make the following changes:

  • Create a data model for the new page in JSON structure representing all the values you want to extract out of the pages. Refer to the Define the data model section for a detailed format.
  • Use Amazon Textract to extract text out of the document and populate values to the data model.
  • Add business rules corresponding to the page in JSON format. Refer to the Customize business rules section for the detailed format.

The custom Amazon A2I UI in the solution is generic and doesn’t require changes to support new business rules.

Conclusion

Intelligent document processing is in high demand, and companies need a customized pipeline to support their unique business logic. Amazon A2I offers a built-in template integrated with Amazon Textract to implement your human review use cases, and it also allows you to customize the reviewer page to serve flexible requirements.

This post guided you through a reference solution using Amazon Textract and Amazon A2I to build an IDP pipeline that supports flexible business rules. You can try it out using the Jupyter notebook in the GitHub IDP workshop repo.


About the authors

Lana Zhang is a Sr. Solutions Architect at the AWS WWSO AI Services team with expertise in AI and ML for intelligent document processing and content moderation. She is passionate about promoting AWS AI services and helping customers transform their business solutions.


Sonali Sahu leads the Intelligent Document Processing AI/ML Solutions Architect team at Amazon Web Services. She is a passionate technophile and enjoys working with customers to solve complex problems using innovation. Her core areas of focus are artificial intelligence and machine learning for intelligent document processing.

Read More

Automate classification of IT service requests with an Amazon Comprehend custom classifier

Enterprises often deal with large volumes of IT service requests. Traditionally, the burden is put on the requester to choose the correct category for every issue. A manual error or misclassification of a ticket usually means a delay in resolving the IT service request. This can result in reduced productivity, a decrease in customer satisfaction, an impact to service level agreements (SLAs), and broader operational impacts. As your enterprise grows, the problem of getting the right service request to the right team becomes even more important. Using an approach based on machine learning (ML) and artificial intelligence can help with your enterprise’s ever-evolving needs.

Supervised ML is a process that uses labeled datasets and outputs to train learning algorithms on how to classify data or predict an outcome. Amazon Comprehend is a natural language processing (NLP) service that uses ML to uncover valuable insights and connections in text. It provides APIs powered by ML to extract key phrases, entities, sentiment analysis, and more.

In this post, we show you how to implement a supervised ML model that can help classify IT service requests automatically using Amazon Comprehend custom classification. Amazon Comprehend custom classification helps you customize Amazon Comprehend for your specific requirements without the skillset required to build ML-based NLP solutions. With automatic ML, or AutoML, Amazon Comprehend custom classification builds customized NLP models on your behalf, using the training data that you provide.

Overview of solution

To illustrate the IT service request classification, this solution uses the SEOSS dataset, a systematically retrieved collection of 33 open-source software projects that contains a large number of typed artifacts and trace links between them. This solution uses the issue data from these 33 open-source projects, specifically the summaries and descriptions reported by end-users, to build a custom classifier model using Amazon Comprehend.

This post demonstrates how to implement and deploy the solution using the AWS Cloud Development Kit (AWS CDK) in an isolated Amazon Virtual Private Cloud (Amazon VPC) environment consisting of only private subnets. We also use the code to demonstrate how you can use the AWS CDK provider framework, a mini-framework for implementing a provider for AWS CloudFormation custom resources to create, update, or delete a custom resource, such as an Amazon Comprehend endpoint. The Amazon Comprehend endpoint includes managed resources that make your custom model available for real-time inference to a client machine or third-party applications. The code for this solution is available on GitHub.

You use the AWS CDK to deploy the infrastructure, application code, and configuration for the solution. You also need an AWS account and the ability to create AWS resources. You use the AWS CDK to create AWS resources such as a VPC with private subnets, Amazon VPC endpoints, Amazon Elastic File System (Amazon EFS), an Amazon Simple Notification Service (Amazon SNS) topic, an Amazon Simple Storage Service (Amazon S3) bucket, Amazon S3 event notifications, and AWS Lambda functions. Collectively, these AWS resources constitute the training stack, which you use to build and train the custom classifier model.

After you create these AWS resources, you download the SEOSS dataset and upload it to the S3 bucket created by the solution. If you’re deploying this solution in AWS Region us-east-2, the format of the S3 bucket name is comprehendcustom-<AWS-ACCOUNT-NUMBER>-us-east-2-s3stack. The solution uses the Amazon S3 multi-part upload trigger to invoke a Lambda function that starts the pre-processing of the input data, and uses the preprocessed data to train the Amazon Comprehend custom classifier, creating the custom classifier model. You then use the Amazon Resource Name (ARN) of the custom classifier model to create the inference stack, which uses the AWS CDK provider framework to create an Amazon Comprehend endpoint that you can then use for inference from a third-party application or client machine.

The following diagram illustrates the architecture of the training stack.

Training stack architecture

The workflow steps are as follows:

  1. Upload the SEOSS dataset to the S3 bucket created as part of the training stack deployment process. This creates an event trigger that invokes the etl_lambda function.
  2. The etl_lambda function downloads the raw data set from Amazon S3 to Amazon EFS.
  3. The etl_lambda function performs the data preprocessing task of the SEOSS dataset.
  4. When the function execution completes, it uploads the transformed data with prepped_data prefix to the S3 bucket.
  5. After the upload of the transformed data is complete, a successful ETL completion message is sent to Amazon SNS.
  6. In Amazon Comprehend, you can classify your documents using two modes: multi-class or multi-label. Multi-class mode identifies one and only one class for each document, and multi-label mode identifies one or more labels for each document. Because we want to assign a single class to each document, we train the custom classifier model in multi-class mode. Amazon SNS triggers the train_classifier_lambda function, which initiates the Amazon Comprehend classifier training in multi-class mode.
  7. The train_classifier_lambda function initiates the Amazon Comprehend custom classifier training (a minimal sketch of this call follows the list).
  8. Amazon Comprehend downloads the transformed data from the prepped_data prefix in Amazon S3 to train the custom classifier model.
  9. When the model training is complete, Amazon Comprehend uploads the model.tar.gz file to the output_data prefix of the S3 bucket. The average completion time to train this custom classifier model is approximately 10 hours.
  10. The Amazon S3 upload trigger invokes the extract_comprehend_model_name_lambda function, which retrieves the custom classifier model ARN.
  11. The function extracts the custom classifier model ARN from the S3 event payload and the response of list-document-classifiers call.
  12. The function sends the custom classifier model ARN to the email address that you had subscribed earlier as part of the training stack creation process. You then use this ARN to deploy the inference stack.
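
The following is a minimal sketch of how the train_classifier_lambda function might start training in multi-class mode with the boto3 Comprehend client; the classifier name, bucket name, prefixes, and IAM role ARN are placeholders, not the values used by the actual Lambda code.

import boto3

comprehend = boto3.client("comprehend")

response = comprehend.create_document_classifier(
    DocumentClassifierName="it-service-request-classifier",   # placeholder name
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccessRole",  # placeholder role
    InputDataConfig={
        "S3Uri": "s3://comprehendcustom-111122223333-us-east-2-s3stack/prepped_data/"
    },
    OutputDataConfig={
        "S3Uri": "s3://comprehendcustom-111122223333-us-east-2-s3stack/output_data/"
    },
    LanguageCode="en",
    Mode="MULTI_CLASS",
)
print(response["DocumentClassifierArn"])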

This deployment creates the inference stack, as shown in the following figure. The inference stack provides you with a REST API secured by an AWS Identity and Access Management (IAM) authorizer, which you can then use to generate confidence scores of the labels based on the input text supplied from a third-party application or client machine.

Inference stack architecture

Prerequisites

For this demo, you should have the following prerequisites:

  • An AWS account.
  • Python 3.7 or later, Node.js, and Git on the development machine. The AWS CDK uses specific versions of Node.js (>=10.13.0, except for versions 13.0.0 to 13.6.0). A version in active long-term support (LTS) is recommended.
    To install the active LTS version of Node.js, you can use the following install script for nvm and use nvm to install the Node.js LTS version. You can also install the current active LTS version of Node.js via a package manager, depending on the operating system of your choice.

    For macOS, you can install Node.js via a package manager using the following instructions.

    For Windows, you can install Node.js via a package manager using the following instructions.

  • AWS CDK v2. If you’re using an AWS Cloud9 IDE, AWS CDK v2 is pre-installed and you can skip this step. If you don’t have the AWS CDK installed on the development machine, install AWS CDK v2 globally using the Node Package Manager command npm install -g aws-cdk. This step requires Node.js to be installed on the development machine.
  • Configure your AWS credentials to access and create AWS resources using the AWS CDK. For instructions, refer to Specifying credentials and region.
  • Download the SEOSS dataset consisting of requirements, bug reports, code history, and trace links of 33 open-source software projects. Save the file dataverse_files.zip on your local machine.

SEOSS dataset

Deploy the AWS CDK training stack

For AWS CDK deployment, we start with the training stack. Complete the following steps:

  1. Clone the GitHub repository:
$ git clone https://github.com/aws-samples/amazon-comprehend-custom-automate-classification-it-service-request.git
  2. Navigate to the amazon-comprehend-custom-automate-classification-it-service-request folder:
$ cd amazon-comprehend-custom-automate-classification-it-service-request/

All the following commands are run within the amazon-comprehend-custom-automate-classification-it-service-request directory.

  3. In the amazon-comprehend-custom-automate-classification-it-service-request directory, initialize the Python virtual environment and install requirements.txt with pip:
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt
  4. If you’re using the AWS CDK in a specific AWS account and Region for the first time, see the instructions for bootstrapping your AWS CDK environment:
$ cdk bootstrap aws://<AWS-ACCOUNT-NUMBER>/<AWS-REGION>
  5. Synthesize the CloudFormation templates for this solution using cdk synth and use cdk deploy to create the AWS resources mentioned earlier:
$ cdk synth
$ cdk deploy VPCStack EFSStack S3Stack SNSStack ExtractLoadTransformEndPointCreateStack --parameters SNSStack:emailaddressarnnotification=<emailaddress@example.com>

After you enter cdk deploy, the AWS CDK prompts whether you want to deploy changes for each of the stacks called out in the cdk deploy command.

  6. Enter y for each of the stack creation prompts; the cdk deploy step then creates these stacks. Subscribe the email address you provided to the SNS topic created as part of the cdk deploy.
  7. After cdk deploy completes successfully, create a folder called raw_data in the S3 bucket comprehendcustom-<AWS-ACCOUNT-NUMBER>-<AWS-REGION>-s3stack.
  8. Upload the SEOSS dataset dataverse_files.zip that you downloaded earlier to this folder.

After the upload is complete, the solution invokes the etl_lambda function using an Amazon S3 event trigger to start the extract, transform, and load (ETL) process. After the ETL process completes successfully, a message is sent to the SNS topic, which invokes the train_classifier_lambda function. This function triggers an Amazon Comprehend custom classifier model training. Depending on whether you train your model on the complete SEOSS dataset, training could take up to 10 hours. When the training process is complete, Amazon Comprehend uploads the model.tar.gz file to the output_data prefix in the S3 bucket.

This upload triggers the extract_comprehend_model_name_lambda function using an S3 event trigger that extracts the custom classifier model ARN and sends it to the email address you subscribed earlier. This custom classifier model ARN is then used to create the inference stack. When the model training is complete, you can view the performance metrics of the custom classifier model by navigating to the version details section in the Amazon Comprehend console (see the following screenshot), or by using the Amazon Comprehend Boto3 SDK.

Performance metrics

Deploy the AWS CDK inference stack

Now you’re ready to deploy the inference stack.

  1. Copy the custom classifier model ARN from the email you received and use the following cdk deploy command to create the inference stack.

This command deploys an API Gateway REST API secured by an IAM authorizer, which you use for inference with an AWS user ID or IAM role that just has the execute-api:Invoke IAM privilege. The following cdk deploy command deploys the inference stack. This stack uses the AWS CDK provider framework to create the Amazon Comprehend endpoint as a custom resource, so that creating, deleting, and updating of the Amazon Comprehend endpoint can be done as part of the inference stack lifecycle using the cdk deploy and cdk destroy commands.

Because you need to run the following command after model training is complete, which could take up to 10 hours, ensure that you’re in the Python virtual environment that you initialized in an earlier step and in the amazon-comprehend-custom-automate-classification-it-service-request directory:

$ cdk deploy APIGWInferenceStack --parameters APIGWInferenceStack:documentclassifierarn=<custom classifier model ARN retrieved from email>

For example:

$ cdk deploy APIGWInferenceStack --parameters APIGWInferenceStack:documentclassifierarn=arn:aws:comprehend:us-east-2:111122223333:document-classifier/ComprehendCustomClassifier-11111111-2222-3333-4444-abc5d67e891f/version/v1
  2. After the cdk deploy command completes successfully, copy the APIGWInferenceStack.ComprehendCustomClassfierInvokeAPI value from the console output, and use this REST API to generate inferences from a client machine or a third-party application that has the execute-api:Invoke IAM privilege. If you’re running this solution in us-east-2, the format of this REST API is https://<restapi-id>.execute-api.us-east-2.amazonaws.com/prod/invokecomprehendV1.

Alternatively, you can use the test client apiclientinvoke.py from the GitHub repository to send a request to the custom classifier model. Before using apiclientinvoke.py, ensure that the following prerequisites are in place:

  • You have the boto3 and requests Python packages installed using pip on the client machine.
  • You have configured Boto3 credentials. By default, the test client assumes that a profile named default is present and it has the execute-api:Invoke IAM privilege on the REST API.
  • SigV4Auth points to the Region where the REST API is deployed. Update the <AWS-REGION> value to us-east-2 in apiclientinvoke.py if your REST API is deployed in us-east-2.
  • You have assigned the raw_data variable with the text on which you want to make the class prediction or the classification request:
raw_data="""Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis."""
  • You have assigned the restapi variable with the REST API copied earlier:

restapi="https://<restapi-id>.execute-api.us-east-2.amazonaws.com/prod/invokecomprehendV1"

  3. Run apiclientinvoke.py after the preceding updates:
$ python3 apiclientinvoke.py

You get the following response from the custom classifier model:

{
 "statusCode": 200,
 "body": [
	{
	 "Name": "SPARK",
	 "Score": 0.9999773502349854
	},
	{
	 "Name": "HIVE",
	 "Score": 1.1613215974648483e-05
	},
	{
	 "Name": "DROOLS",
	 "Score": 1.1110682862636168e-06
	}
   ]
}

Amazon Comprehend returns a confidence score for each label it attributes to the text. If the service is highly confident about a label, the score will be closer to 1. Therefore, the Amazon Comprehend custom classifier model trained on the SEOSS dataset predicts that the text belongs to class SPARK. The classification returned by the model can then be used to classify IT service requests or predict their correct category, thereby reducing manual errors or misclassification of service requests.

Clean up

To clean up all the resources created in this post as part of the training stack and inference stack, use the following command. It deletes all the AWS resources created by the previous cdk deploy commands:

$ cdk destroy --all

Conclusion

In this post, we showed you how enterprises can implement a supervised ML model using Amazon Comprehend custom classification to predict the category of IT service requests based on either the subject or the description of the request submitted by the end-user. After you build and train a custom classifier model, you can run real-time analysis for custom classification by creating an endpoint. After you deploy this model to an Amazon Comprehend endpoint, it can be used to run real-time inference by third-party applications or other client machines, including IT service management tools. You can then use this inference to predict the defect category and reduce manual errors or misclassifications of tickets. This helps reduce delays for ticket resolution and increases resolution accuracy and customer productivity, which ultimately results in increased customer satisfaction.

You can extend the concepts in this post to other use cases, such as routing business or IT tickets to various internal teams such as business departments, customer service agents, and Tier 2/3 IT support, created either by end-users or through automated means.

References

  • Rath, Michael; Mäder, Patrick, 2019, “The SEOSS Dataset – Requirements, Bug Reports, Code History, and Trace Links for Entire Projects”, https://doi.org/10.7910/DVN/PDDZ4Q, Harvard Dataverse, V1

About the Authors

Arnab Chakraborty is a Sr. Solutions Architect at AWS based out of Cincinnati, Ohio. He is passionate about topics in enterprise and solution architecture, data analytics, serverless, and machine learning. In his spare time, he enjoys watching movies, travel shows, and sports.

Viral Desai is a Principal Solutions Architect at AWS. With more than 25 years of experience in information technology, he has been helping customers adopt AWS and modernize their architectures. He likes hiking, and enjoys diving deep with customers on all things AWS.

Read More

Detect fraud in mobile-oriented businesses using GrabDefence device intelligence and Amazon Fraud Detector

In this post, we present a solution that combines rich mobile device intelligence with customized machine learning (ML) modeling to help you catch fraudsters who exploit mobile apps.

GrabDefence (GD), Grab’s proprietary fraud detection and prevention technology, and AWS have launched GDxAFD, a fraud detection solution tailored for mobile apps that integrates GD’s device intelligence capabilities with Amazon Fraud Detector, AWS’s fully managed ML fraud detection solution. With GDxAFD, you can take advantage of more than 20 years of fraud detection expertise from Amazon as well as extensive mobile fraud experience from Southeast Asia’s leading superapp to safeguard your mobile application from fraudsters.

This solution rides on a larger global wave of anti-fraud efforts, which experts forecast to grow to USD 62.70 billion by 2028. With the rise of the digital economy, fraud syndicates increasingly target online businesses, causing financial loss and destroying the trust between end-users and the platform. The true cost of battling fraud is also increasing rapidly, because more fraud checks lead to poorer customer experience, false positives, and operational burden, which as a whole is estimated to be three times larger than the actual fraud losses, according to the True Cost of Fraud™ APAC Study by LexisNexis® Risk Solutions.

From the combined industry experience, the solution team believes that many of the modus operandi in a mobile environment are driven by fraudsters having tools and methods to create fake accounts at scale and bypass a platform’s security checks on the device, thereby enabling them to exploit the platform for large returns. Therefore, preventing mobile fraud starts with clearly understanding the risk profile of the devices used to access the mobile app, and then using the device risk intelligence gathered, together with additional data about the user, event, or account, to detect potential fraudulent behavior in real time and at scale. By combining rich device intelligence and ML, companies are better positioned to stay ahead of mobile-focused fraud syndicates and reduce fraud on their platforms.

GD device intelligence

GD is a product from Grab’s fraud prevention team, which has years of experience building solutions for Grab. Grab is a NASDAQ listed company and a leading superapp in South East Asia, with over 30 million monthly transacting users (as per Grab’s Q1 2022 Results). Due to the scale of its operations as a leading superapp in SEA and the nature of a mobile-first business, Grab has been investing heavily in building fraud prevention solutions enabled by rich data, technology focus, and insights gathered from its operational experience and exposure. GD’s device intelligence service collects rich device-level data, excluding any personally identifiable information (PII), from mobile application users and securely analyzes it to understand the risk profile of the device. Learning from a large device network built via Grab’s superapp, GD’s device intelligence service can accurately generate device fingerprints and detect risky attributes such as device or app modification or tampering, emulator usage, and GPS spoofing. As mentioned earlier, many fraud modus operandi on mobile platforms involve mass creation of fake accounts, device reengineering, and location spoofing, which GD device intelligence is capable of detecting. As a result, by integrating GD device intelligence and Amazon Fraud Detector, platforms that face similar fraud attacks can expect up to a 23% increase in fraud detection based on statistical studies done by GrabDefence on Grab’s fraud prevention systems.

Custom fraud detection ML models in Amazon Fraud Detector

Amazon Fraud Detector customizes each model it creates to your own dataset, making the accuracy of models higher than one-size-fits-all ML solutions. During the fully automated model training process, a series of models that have learned patterns of fraud from AWS and Amazon’s own fraud expertise are used to boost your model performance even further.

With the GDxAFD solution, you now have step-by-step guidance and a reference architecture for how to use flexible event schemas in Amazon Fraud Detector to add GD device intelligence findings into your custom fraud detector models. The end result is an ML model that, once trained, has the benefit of learning from multiple data sources, including your own historical data, GD’s device intelligence data, fraud patterns seen across Amazon, and additional third-party data (added automatically by Amazon Fraud Detector). Based on our pilot between GD and Amazon Fraud Detector, our model using GD device intelligence has shown a 23% increase in detection performance for detecting fake account registrations. You can deploy these models to detect mobile fraud to prevent not only fake account registration but also fraudulent payments, promotion abuse, or loyalty program abuse, among others.

To get started, you first integrate GD’s mobile SDK into your mobile application to collect device-level data. Next, you use Amazon Fraud Detector to define the event you want to evaluate for fraud by specifying the event and account data points you have available for the event or account, including the device risk intelligence data points from GD. After this, you train your ML model in Amazon Fraud Detector in just a few steps. After you train the model, you can add it to a detector.

To begin performing real-time predictions, you integrate Amazon Fraud Detector’s low-latency prediction API into your application and begin sending new mobile events to generate fraud predictions. Each fraud prediction considers the GD device intelligence data for the device associated with the event as well as additional data and intelligence automatically added by Amazon Fraud Detector, including signals from fraud patterns experienced across Amazon.

Solution overview

Device intelligence is a critical type of input for risk decisions. One of the common challenges faced in fraud detection in the mobile space is the lack of enriched data availability to make risk decisions. On the other hand, mobile devices are typically the most expensive asset the fraudsters and fraud syndicates possess and, therefore, a significant level of effort is put into masking the true identity and profile of the device being used. Understanding the risk profile of the mobile device (which sometimes isn’t even a real device) and being able to drive insights from the relationship between different mobile devices can significantly improve risk decisions for any mobile business, and becomes central to any mobile-based fraud management strategy.

For generating real-time fraud predictions, the GDxAFD solution uses Amazon Fraud Detector and GrabDefence’s device intelligence SDK, along with Amazon API Gateway and AWS Lambda. You can provision the AWS portions of the solution using AWS CloudFormation.

The following diagram illustrates our solution architecture.

The workflow consists of the following steps:

  1. When an end-user interacts with your mobile app, GD’s mobile SDK passively gathers device data and streams this data to GD’s device intelligence service, where a risk profile for the device is generated.
  2. Then, when that user transacts using the mobile app and you want to assess fraud risk in real time, the mobile app sends the transaction data gathered by the app via API Gateway to a Lambda function.
  3. The Lambda function gathers the GrabDefence risk profile for the device used during the transaction, combines that profile data with the other transaction data, and sends it to the fraud detector.
  4. The fraud detector performs a fraud prediction using your custom fraud detection ML model and ruleset, and returns a risk score and outcome to the Lambda function. This result is sent back to your mobile app via API Gateway.
  5. If desired, the mobile app can then choose to adjust the end-user experience accordingly based on this risk assessment.

Use cases for device intelligence with Amazon Fraud Detector

The ideal end-state solution is an Amazon Fraud Detector model that is trained on a dataset of your historical events and their associated historical GD device intelligence data. To achieve this, you need to integrate the GD Guardian SDK for mobile devices and then gather device intelligence data for your events until you have enough to train a model (for example, 10,000 events with at least 400 examples of fraud events). Depending on your use case and availability of fraud labels, you have a couple of ways to get started sooner as you gather data for this solution:

  • Use case A: Use GD device intelligence data directly in the fraud detector rules – With this use case, you create a detector in Amazon Fraud Detector with a ruleset designed to flag high-risk events provided by the device intelligence. This works effectively when you have clear risk mitigation policies that you want to deploy for your platform (for example, act on the user if the device is jailbroken, or don’t allow redemption of a promo if the device has more than five accounts). In such cases, you can set up your detector rules to flag events based on a combination of GD device risk score and GD device verdicts. This option requires no historical event data or labels to get started, so it can be ready to use sooner than the ML-based detection options.
  • Use case B: Use GD device intelligence and an Amazon Fraud Detector ML model with the fraud detector rules – If you have a historical event dataset and are able to train an Amazon Fraud Detector ML model immediately, you can build on use case A by adding an Amazon Fraud Detector model to your rules-based detector. This way, your detector logic is evaluating device intelligence with rules and all other event data with a customized ML model. This allows you to solve for more complex fraud tactics where statistical methods are required to separate fraud from non-fraud.

Best results are often achieved when both of these scenarios work in tandem, because they can serve different use cases over time even after you have more historical data. With these methods, Amazon Fraud Detector makes it easy to transition to the ideal solution in a few steps.

In the following sections, we walk through the steps to get started using Amazon Fraud Detector with GD device intelligence data.

Integrate the GD mobile SDK and start collecting device intelligence data

Prior to using GrabDefence device intelligence within your application, you must first register as a GrabDefence client. You receive the following credentials from the GrabDefence team:

  • tenant_id – A unique client identifier that represents your organization
  • app_id – A unique application identifier that represents the application you’re integrating

Refer to the GrabDefence documentation for further guidance on how to integrate this SDK.

Create your event type in Amazon Fraud Detector

An event type defines the schema for the event you want to assess for fraud. When creating an event type in Amazon Fraud Detector, you map all the data elements you will have available at the time of the fraud evaluation, including the GD device intelligence risk profile data elements such as the unique device ID and various device verdicts, to Amazon Fraud Detector variables. You need to include event variables (such as IP, email, or billing address) that are unique to the type of event you’re evaluating for fraud, as well as the GD device intelligence data. The following table shows examples of event variables, the GD device intelligence data, and the recommended Amazon Fraud Detector variable type to map each element to.

Event Variable Type | Event Variable (Not Exhaustive) | Amazon Fraud Detector Event Variable | Example
Event Metadata | EVENT_TIMESTAMP | EVENT_TIMESTAMP | 2019-11-30T13:01:01Z
Event Metadata | EVENT_ID | EVENT_ID | test0299df10-e2db-11eb-96e2-f7dgje3d3k03
Event Metadata | ENTITY_ID | ENTITY_ID | 123
Event Metadata | EVENT_LABEL | EVENT_LABEL | FRAUD or LEGIT
Event Metadata | LABEL_TIMESTAMP | LABEL_TIMESTAMP | 2019-11-30T13:01:01Z
Event Variables | Email | EMAIL_ADDRESS | test@example.com
Event Variables | IP | IP_ADDRESS | 192.0.2.1
Event Variables | Phone | PHONE_NUMBER | 555-0123
GD Device Intelligence Verdicts | Verdict: IOS Jailbroken Device | CUSTOM: CATEGORICAL | GV_IOS_JAIL_BROKEN
GD Device Intelligence Verdicts | Verdict: Debugger Detected | CUSTOM: CATEGORICAL | GV_DEBUGGER_DETECTED
GD Device Intelligence Verdicts | Verdict: Event Token Signature Mismatch | CUSTOM: CATEGORICAL | GV_EVENT_TOKEN_SIGNATURE_MISMATCH
GD Device Intelligence Verdicts | Verdict: Server Challenge Mismatch | CUSTOM: CATEGORICAL | GV_SERVER_CHALLENGE_MISMATCH
GD Risk Scores | User account risk score | CUSTOM: NUMERICAL | 0.9
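
The following is a hedged sketch of registering such an event type with the boto3 Fraud Detector client; it assumes the variables, entity type, and labels have already been created (for example, with the CreateVariable, PutEntityType, and PutLabel APIs), and the names used here are illustrative rather than the exact names in the solution.

import boto3

frauddetector = boto3.client("frauddetector")

# Assumes variables such as email_address, ip_address, and the GD verdict/score
# variables were created beforehand with CreateVariable.
frauddetector.put_event_type(
    name="mobile_transaction",                      # illustrative event type name
    eventVariables=[
        "email_address",
        "ip_address",
        "phone_number",
        "gd_verdict_ios_jail_broken",
        "gd_verdict_debugger_detected",
        "gd_user_account_risk_score",
    ],
    labels=["fraud", "legit"],
    entityTypes=["customer"],
)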

Build your detection logic in Amazon Fraud Detector

At this point, you need to decide whether you want to start with use case A or use case B. For use case A, you start building a rules-based detector. For use case B, you build an Amazon Fraud Detector model first and, once finished, add the model to your detector.

For instructions on building an Amazon Fraud Detector model and detector, refer to the Amazon Fraud Detector user guide.

The following screenshot shows sample detector rules on the Amazon Fraud Detector console.

Test your detector using Amazon Fraud Detector batch predictions

You can use a batch predictions job to test your detector against a set of events using either the Amazon Fraud Detector console or the CreateBatchPredictionJob API. You need to specify the detector version (created in the previous step) and provide the events via a CSV file (up to 50 MB in size) stored in an Amazon Simple Storage Service (Amazon S3) bucket. The output file, containing the original input data along with the appended results of the detector’s predictions, will be available in the same S3 bucket (unless you specify a different location).

For more information on running an Amazon Fraud Detector batch prediction, refer to the Amazon Fraud Detector batch predictions documentation.
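
The following hedged sketch shows starting a batch prediction job with the boto3 client; the job name, S3 paths, detector name, and IAM role ARN are placeholders.

import boto3

frauddetector = boto3.client("frauddetector")

frauddetector.create_batch_prediction_job(
    jobId="gd-device-intel-batch-test-001",                      # placeholder job name
    inputPath="s3://your-bucket/batch/events.csv",                # CSV of events to score
    outputPath="s3://your-bucket/batch/",                         # results written here
    eventTypeName="mobile_transaction",                           # event type defined earlier (illustrative)
    detectorName="mobile_fraud_detector",                         # placeholder detector name
    detectorVersion="1",
    iamRoleArn="arn:aws:iam::111122223333:role/FraudDetectorBatchRole",  # placeholder role
)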

Set up the supporting infrastructure

To perform real-time predictions using the detector you built, you must set up a Lambda function that performs the following actions:

  1. Receives transaction data (via API Gateway) gathered from your mobile app. This includes data such as IP address, email address, shipping and billing info, and so on, that is unique to the transaction and use case.
  2. Collects the risk profile from the GD API. This includes device intelligence data and risk signals from GD. You need to convert the GD verdicts to the appropriate Amazon Fraud Detector CUSTOM: CATEGORICAL variables. For example, if the GD verdict list contains GV_IOS_JAIL_BROKEN, you need to set the Verdict: IOS Jailbroken Device variable to TRUE when sending to Amazon Fraud Detector (as detailed in the next section).
  3. Sends the data to the detector using the GetEventPrediction API (see the next section).

Perform real-time predictions using the Amazon Fraud Detector GetEventPrediction API

Your Lambda function can call the Amazon Fraud Detector GetEventPrediction API to perform real-time predictions and obtain results synchronously. The GetEventPrediction API returns matched outcomes based on the rules you set up earlier. If you attached a model to your detector in Amazon Fraud Detector, the model score is also returned as part of the GetEventPrediction API response. You can find examples of GetEventPrediction requests on the aws-fraud-detector-samples GitHub repository.

You can configure your Lambda function accordingly to parse the response from this API, and return the appropriate action to the mobile application (via API Gateway).
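
The following is a minimal sketch of how the Lambda function might call GetEventPrediction with boto3; the detector, event type, and variable names correspond to the illustrative values used earlier and are not the exact names in the solution.

import boto3
from datetime import datetime, timezone

frauddetector = boto3.client("frauddetector")

response = frauddetector.get_event_prediction(
    detectorId="mobile_fraud_detector",            # illustrative detector name
    eventId="evt-0001",
    eventTypeName="mobile_transaction",
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "customer", "entityId": "123"}],
    eventVariables={
        "email_address": "test@example.com",
        "ip_address": "192.0.2.1",
        "gd_verdict_ios_jail_broken": "TRUE",      # converted from the GD verdict list
        "gd_user_account_risk_score": "0.9",
    },
)
print(response["ruleResults"])        # matched outcomes per rule
print(response.get("modelScores"))    # model scores, if a model is attached to the detector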

Build and train your model

After you integrate the GD SDK and are generating predictions with Amazon Fraud Detector, your events are stored in Amazon Fraud Detector and you can use the UpdateEventLabel API to add fraud labels for confirmed fraud events. When your stored dataset has 10,000 events with device data and at least 400 labelled as fraud, you can start building a custom Amazon Fraud Detector model that learns from GD’s device intelligence data.
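
A hedged sketch of labeling a stored event with the boto3 client follows; the event ID and label values match the illustrative examples above.

import boto3
from datetime import datetime, timezone

frauddetector = boto3.client("frauddetector")

frauddetector.update_event_label(
    eventId="evt-0001",
    eventTypeName="mobile_transaction",
    assignedLabel="fraud",                         # one of the labels defined on the event type
    labelTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
)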

At this point, you’re ready to train the model. This takes a few steps on the Amazon Fraud Detector console, and model training typically takes around an hour but can be longer depending on the size of your training dataset.

  1. On the Amazon Fraud Detector console, choose Create model.
  2. Choose Transaction Fraud Insights as the model type.
  3. Choose the event type you created earlier.
  4. Choose the date range for your training dataset that encompasses the period where you’ve collected GD device intelligence data.
  5. Add all the event type variables, including the GD device-specific elements, to your model’s input configuration.
  6. Start training the model.

After your model is trained, you can review performance metrics and then deploy it by changing its status to Active. To learn more about model scores and performance metrics, see Model scores and Training performance metrics. At this point, you can now add your model to your detector, add threshold rules to interpret the risk scores that the model outputs, and continue making predictions using the GetEventPrediction API.

Automate the solution

You can use AWS CloudFormation to automate the creation of your Amazon Fraud Detector event type and related resources. For more details, refer to Managing resources using AWS CloudFormation.

Conclusion

Congrats! You have successfully built an Amazon Fraud Detector model that integrates GD device intelligence into your detector. The Amazon Fraud Detector ML model you trained has learned from multiple data sources, including your own historical data, GD’s device intelligence data, fraud patterns seen across Amazon, and additional third-party data (added automatically by Amazon Fraud Detector). You can deploy this solution on your mobile apps to detect and capture various types of mobile fraud.

Special thanks to everyone who contributed to this blog, including Abhishek Ravi, Tanay Bhargava, Eric Burris, Puneet Gambhir (GrabDefence), Brian Kim (GrabDefence), and Sing Kwan Ng (GrabDefence).


About the author

Marcel Pividal is a Sr. AI Services Solutions Architect in the World-Wide Specialist Organization. Marcel has more than 20 years of experience solving business problems through technology for Fintechs, Payment Providers, Pharma, and government agencies. His current areas of focus are Risk Management, Fraud Prevention, and Identity Verification.

Adriaan de Jonge is Partner Solutions Architect at AWS in Singapore. He is part of the AWS GSI team in the ASEAN geography. Adriaan is particularly interested in serverless, cloud-native development, and DevOps. In his spare time, he likes to bake cakes that are suitable for people with allergies.

Jianbo Liu is a Research Scientist with Amazon Fraud Detector.

Read More

How Synamedia uses Amazon Rekognition Video to build advanced video search capabilities for long-form video

Synamedia is a leading video technology provider addressing the needs of premium video service providers and direct-to-consumer (D2C) businesses with a comprehensive solution portfolio. Synamedia solutions span several pillars, such as video networks, TV platforms, advertisement and monetization, and content protection and piracy disruption.

Synamedia partnered with AWS to use artificial intelligence (AI) to develop enhanced video search capabilities for long-form video. The goal is to assist their customers in searching for videos based on a description of scenes that aren’t captured in the assets’ metadata. For example, searching for a video (even within a series) that contains a scene on a boat that isn’t significant enough to be mentioned in the metadata. This enables content discovery driven by real-world objects.

With Amazon Rekognition Video, Synamedia built an AI solution that was able to perform label detection in videos and in images using standard and custom models. This enabled scene-level detection of specific objects in long-form video, based on what is actually in the scene at the time. This new capability allows users to find specific occurrences within the long-form video, based only on a general description of what they’re looking for. It also makes onboarding new content extremely fast: it now takes a few hours to spin up and get results. The solution is simple to use and extensible, providing the ability to add further custom models for domain-specific images.

“Amazon Rekognition Video is a powerful service that is simple to use. It gave us ready-made access to best-in-class computer vision capabilities, which we could use to build and test innovative video search features in a matter of weeks.”

– Avi Fruchter, Software Engineering Fellow at Synamedia.

Using AI to index visual content

As both supply of video content and demand for greater video insights continue to grow, effective video search capabilities are becoming more important. Traditional video search, however, is typically limited to basic information such as the video title, or in some instances, to metadata attached as tags that describe the key themes or content of the video.

Most descriptive information needs to be added manually, but this becomes prohibitive as the quantity of video grows. As a result, traditional video search performance is often limited. This limitation is even more pronounced for long-form video content, for which scene-level metadata usually doesn’t exist, given how expensive and time-consuming it is to produce.

To address this limitation, Synamedia set out to develop an AI-powered video search solution using computer vision to automatically identify scene-level details in any given video, and make that information discoverable to users based on general descriptions of those scenes.

Using Amazon Rekognition to build a custom computer vision solution in just 2 weeks

To accomplish this goal, Synamedia’s Software Engineering Fellow, Avi Fruchter, turned to Amazon Rekognition, a fully managed video analysis service that helps accelerate the process of using computer vision models to detect relevant scene-level occurrences such as objects, activities, and even text and scenes.

Amazon Rekognition Video accelerates the development of computer vision solutions for video by automatically processing and tagging video content using computer vision models. These models are fully managed and maintained by Amazon Rekognition. It removes the undifferentiated heavy lifting of managing the necessary infrastructure, and also reduces the technical expertise required to build and deploy these models.

To get started, you simply choose which of Amazon Rekognition’s wide range of capabilities is relevant to your task, and call the relevant API. The results are then returned as an easy-to-manage JSON response for each job.

For example, Synamedia used the StartLabelDetection API to automatically generate a list of labels for objects detected in each video frame of their video library. From this simple API call, Amazon Rekognition returned the list of labels, the confidence score of each, and the relevant timestamps for each frame. This enabled Synamedia to immediately create an entirely new set of search metadata for each video in their test library. Users are then able to search for specific video content just by describing specific objects or scenery they’re interested in, and get results that not only match their query, but that also point them to the specific scene in the video that featured that content.
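
The following is a hedged sketch of that kind of call with the boto3 Rekognition client; the bucket, object key, and polling logic are simplified placeholders rather than Synamedia’s implementation.

import time
import boto3

rekognition = boto3.client("rekognition")

# Start an asynchronous label detection job for a video stored in Amazon S3
job = rekognition.start_label_detection(
    Video={"S3Object": {"Bucket": "your-video-bucket", "Name": "episodes/episode-01.mp4"}},
    MinConfidence=80,
)

# Poll for completion (a production system would typically use the SNS notification channel instead)
while True:
    result = rekognition.get_label_detection(JobId=job["JobId"], SortBy="TIMESTAMP")
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(10)

# Each entry pairs a label with the timestamp (in milliseconds) where it was detected
for detection in result.get("Labels", []):
    print(detection["Timestamp"], detection["Label"]["Name"], detection["Label"]["Confidence"])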

Other relevant Amazon Rekognition APIs for video analysis are StartFaceDetection, StartPersonTracking, and StartSegmentDetection—a feature that can identify the moment that scenes in a video change.

Amazon Rekognition works on both pre-recorded and live video. Pre-recorded video is read from Amazon Simple Storage Service (Amazon S3), and live video can be processed from Amazon Kinesis Video Streams.

Synamedia chose Amazon Rekognition for its ability to rapidly expand their capabilities. Synamedia’s innovation team is dedicated solely to building new technical innovations in video and has strong technical expertise. However, even for them it’s not always possible to have deep domain expertise in all areas of video technology. Enter Amazon Rekognition, which extended their capabilities in computer vision, enabling them to conceptualize a use case and quickly test its viability.

“It was extremely fast to onboard, and the results were extremely quick,” Avi Fruchter says. “We are not always domain experts in all areas of ML, and Amazon Rekognition gives us the ability to leverage our existing expertise into new types of enhanced use cases for our customers.”

Synamedia anticipates their solution will have broad benefits for a wide range of customers, including companies with large video libraries as well as the growing number of companies who need to monitor specific events in live video feeds, such as health and safety risks.

Summary

With Amazon Rekognition Video, Synamedia was able to build and test an advanced video search capability in a matter of weeks, without needing to hire or develop additional specialized computer vision expertise.

This new capability has enabled Synamedia to expand the impact of its innovation team and continue with its mission to drive new video innovation for its customers.

Learn more about how you can quickly build advanced computer vision solutions for video by visiting Amazon Rekognition Video or referring to Amazon Rekognition resources.


About the authors

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. Daniel works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

John Shaw is the North American lead for AI and ML in the Private Equity group at AWS. John works directly with Private Equity funds and their portfolio companies, helping them accelerate their AI and ML adoption to improve innovation and increase enterprise value.

Read More

Increase ML model performance and reduce training time using Amazon SageMaker built-in algorithms with pre-trained models

Model training forms the core of any machine learning (ML) project, and having a trained ML model is essential to adding intelligence to a modern application. A performant model is the output of a rigorous and diligent data science methodology. Not implementing a proper model training process can lead to high infrastructure and personnel costs, because model training underpins the experimental phase of the ML process and by nature tends to be highly iterative.

Generally speaking, training a model from scratch is time-consuming and compute intensive. When the training data is small, we can’t expect to train a very performant model. A better alternative is to fine-tune a pretrained model on the target dataset. For certain use cases, Amazon SageMaker provides high-quality pretrained models that were trained on very large datasets. Fine-tuning these models takes a fraction of the training time compared to training a model from scratch.

To validate this assertion, we ran a study using built-in algorithms with pretrained models. We compared two types of pretrained models within Amazon SageMaker Studio, Type 1 (legacy) and Type 2 (latest), against a model trained from scratch using the Defect Detection Network (DDN), with regard to training time and infrastructure cost. To demonstrate the training process, we used the defect detection dataset from the post Visual inspection automation using Amazon SageMaker JumpStart. This post showcases the results of the study. We also provide a Studio notebook, which you can modify to run the experiments using your own dataset and an algorithm or model of your choosing.

Model training in Studio

SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment.

There are many ways in which you can train ML models using SageMaker, such as using Amazon SageMaker Debugger, Spark MLlib, or custom Python code with TensorFlow, PyTorch, or Apache MXNet. You can also bring your own custom algorithm or choose an algorithm from AWS Marketplace.

Furthermore, SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and ML practitioners get started on training and deploying ML models quickly.

You can use built-in algorithms for either classification or regression problems, or for a variety of unsupervised learning tasks. Other built-in algorithms include text analysis and image processing. You can train a model from scratch using a built-in algorithm for a specific use case. For a full list of available built-in algorithms, see Common Information About Built-in Algorithms.

Some built-in algorithms also include pre-trained models for popular problem types that use the SageMaker SDK as well as Studio. These pre-trained models can greatly reduce the training time as well as infrastructure cost for common use cases such as semantic segmentation, object detection, text summarization, and question answering. For a complete list of pre-trained models, see Models.

For choosing the best model, SageMaker automatic model tuning, also known as hyperparameter tuning or hyperparameter optimization (HPO), can be very useful because it finds the best version of a model by running a slew of training jobs on your dataset using the algorithm and hyperparameters that you specify. Depending on the number of hyperparameters and the size of the search space, finding the best model can require thousands or even tens of thousands of training runs. Automatic model tuning provides a built-in HPO algorithm that removes the undifferentiated heavy lifting required to build your own HPO algorithm. Automatic model tuning provides the option of parallelizing model runs in order to reduce the time and cost of finding the best fit.

After the automatic model tuning has completed multiple runs for a set of hyperparameters, it chooses the hyperparameter values that result in the model with the best performance, as measured by the loss function specific to the model.
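As a rough illustration of how automatic model tuning is wired up with the SageMaker Python SDK, the following sketch configures the built-in (Type 1, legacy) Object Detection algorithm and tunes three of its hyperparameters. The S3 paths, hyperparameter values, ranges, and job counts are illustrative placeholders, not the exact settings used in the study.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Container image for the built-in (Type 1, legacy) Object Detection algorithm
container = sagemaker.image_uris.retrieve("object-detection", session.boto_region_name)

od_estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    output_path="s3://<your-bucket>/od-output/",  # placeholder
    sagemaker_session=session,
)
od_estimator.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,    # start from the ImageNet-pretrained backbone
    num_classes=6,             # six surface-defect classes
    num_training_samples=1152,
    epochs=30,                 # illustrative value
    mini_batch_size=16,        # illustrative value
)

# Ranges for the three hyperparameters we tune; automatic model tuning monitors
# the training log and maximizes mAP on the validation channel.
tuner = HyperparameterTuner(
    estimator=od_estimator,
    objective_metric_name="validation:mAP",
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-4, 1e-1),
        "momentum": ContinuousParameter(0.8, 0.99),
        "weight_decay": ContinuousParameter(1e-5, 1e-2),
    },
    max_jobs=20,           # total training jobs to launch
    max_parallel_jobs=4,   # parallel jobs to reduce wall-clock time
)

tuner.fit({
    "train": "s3://<your-bucket>/neu-det/train/",           # placeholder channels
    "validation": "s3://<your-bucket>/neu-det/validation/",
})
print(tuner.best_training_job())  # the run whose hyperparameters gave the best mAP
```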

Training and validation loss is just one of the metrics needed to pick the best model for the use case. With so many options, it’s not always easy to make the right choice, and picking the best model boils down to the training time, cost of infrastructure, complexity, and quality of the resulting model, among other factors. There are other extraneous costs such as platform and personnel costs that we don’t take into account for this study.

In the subsequent sections, we discuss the study design and the results.

Dataset

We train a detector on the NEU-DET steel surface defect dataset. This dataset contains 1,800 images and 4,189 bounding boxes in total. The types of defects in our dataset are as follows:

  • Crazing (class: Cr, label: 0)
  • Inclusion (class: In, label: 1)
  • Pitted surface (class: PS, label: 2)
  • Patches (class: Pa, label: 3)
  • Rolled-in scale (class: RS, label: 4)
  • Scratches (class: Sc, label: 5)

For more details about the dataset, refer to Visual inspection automation using Amazon SageMaker JumpStart.
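For convenience when decoding predictions later, the class-to-label mapping above can be captured directly in code; this small dictionary simply mirrors the list.

```python
# Mapping between numeric labels and defect classes, as listed above;
# useful when turning model predictions back into class names.
DEFECT_CLASSES = {
    0: ("Cr", "Crazing"),
    1: ("In", "Inclusion"),
    2: ("PS", "Pitted surface"),
    3: ("Pa", "Patches"),
    4: ("RS", "Rolled-in scale"),
    5: ("Sc", "Scratches"),
}

def label_to_name(label: int) -> str:
    """Return the human-readable class name for a numeric label."""
    return DEFECT_CLASSES[label][1]
```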

Models

We introduced the Defect Detection Network in the post Visual inspection automation using Amazon SageMaker JumpStart. We trained this model from scratch with the default hyperparameters, so we could have a benchmark to evaluate the rest of the models.

For object detection use cases, SageMaker provides a set of built-in object detection models, available as Type 1 (legacy) and Type 2 (latest) variants.

Aside from training a model from scratch, we used these models to evaluate four approaches that typically reflect an ML model training process. The output of each approach is a trained ML model. In cases 1 and 3, a set of fixed hyperparameters is provided to train a single model, whereas in cases 2 and 4, SageMaker produces the best model and the set of hyperparameters that led to the best fit.

  1. Fine-tune Type 1 (legacy) model – We use the model with a ResNet backbone, which is pre-trained on ImageNet, with default hyperparameters and no hyperparameter optimization.
  2. Fine-tune Type 1 (legacy) model with HPO – Now we run HPO to find better hyperparameters that lead to a better model (the tuner sketch shown earlier illustrates this setup). For a list of all parameters you can fine-tune, refer to Tune an Object Detection Model. In this notebook, we only tune the learning rate, momentum, and weight decay. We use automatic model tuning to run HPO and provide hyperparameter ranges for these three parameters. Automatic model tuning monitors the training log and parses the objective metric. For object detection, we use Mean Average Precision (mAP) on the validation dataset as our metric.
  3. Fine-tune Type 2 (latest) model – For the Type 2 (latest) object detection model, we follow the instructions in Fine-tune a Model and Deploy to a SageMaker Endpoint and use standard SageMaker APIs. You can find all fine-tunable Type 2 (latest) object detection models in the Built-in Algorithms with pre-trained Model table by setting FineTunable?=True. Currently, there are nine fine-tunable object detection models. We use the one with a VGG backbone, pretrained on the VOC dataset, and fine-tune it with a set of static hyperparameters (a minimal fine-tuning sketch follows this list).
  4. Fine-tune Type 2 (latest) model with HPO – We provide a range for the ADAM learning rate; the rest of the hyperparameters stay at their defaults. Also note that Type 2 (latest) model training reports Val_CrossEntropy loss and Val_SmoothL1 loss instead of mAP on the validation dataset. Because we can only specify one evaluation metric for automatic model tuning, we choose to minimize Val_CrossEntropy.

For details on the hyperparameters, you can go through the Studio notebook.
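The sketch below follows the generic JumpStart fine-tuning pattern for a Type 2 (latest) model (approaches 3 and 4). The model ID, instance type, and S3 paths are illustrative placeholders, and the exact tunable hyperparameter names should be read from the defaults returned by retrieve_default rather than assumed.

```python
import sagemaker
from sagemaker import hyperparameters, image_uris, model_uris, script_uris
from sagemaker.estimator import Estimator

# Illustrative JumpStart model ID for an SSD detector with a VGG backbone
# pretrained on VOC; look up the exact ID in the pre-trained model table.
model_id, model_version = "mxnet-od-ssd-512-vgg16-atrous-voc", "*"

train_image_uri = image_uris.retrieve(
    region=None, framework=None, image_scope="training",
    model_id=model_id, model_version=model_version, instance_type="ml.p3.2xlarge",
)
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

# Start from the model's default hyperparameters; inspect them to find the
# ADAM learning rate key before overriding it (approach 4 tunes a range for it).
hps = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)
print(hps)

estimator = Estimator(
    role=sagemaker.get_execution_role(),
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",  # JumpStart training script
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    hyperparameters=hps,
    output_path="s3://<your-bucket>/jumpstart-od-output/",  # placeholder
)

# The training channel points at the fine-tuning dataset (placeholder path).
estimator.fit({"training": "s3://<your-bucket>/neu-det/"})
```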

Metrics

Next, we compare the results from the approaches based on important metrics and the infrastructure cost:

  • Loss function difference across models – All the algorithms define the same loss functions for the object detection task: cross-entropy loss and smooth L1 loss. However, we use them differently:

    • The Type 1 (legacy) object detection algorithm has defined mAP on the validation data, and we use it as the metric to find a training job that maximizes mAP.
    • The Type 2 (latest) object detection algorithm, however, doesn’t define mAP. Instead, it defines Val_SmoothL1 loss and Val_CrossEntropy loss on the validation data. During model training with HPO, we need to specify one metric for automatic model tuning to monitor and parse. Therefore, we use Val_CrossEntropy loss as the metric and find the training job that minimizes it.
  • Validation metric (mAP) – We use the mAP on the validation dataset as our metric, where average precision (AP) summarizes the precision-recall curve for each class and mAP is the mean of the per-class AP values. mAP is the standard evaluation metric used in the COCO challenge for object detection tasks. For more information about the applicability of mAP for object detection, refer to mAP (mean Average Precision) for Object Detection. Because there is a difference in loss function between Type 1 and Type 2 models, we manually calculate the mAP for each type of model on the test dataset. We accomplish this by deploying the models behind a SageMaker endpoint and calling the model endpoint to score a subset of the dataset. The results are then compared against the ground truth to calculate the mAP for each model type (a simplified scoring sketch follows this list).
  • Training instance runtime cost – For simplicity, we only report the infrastructure cost incurred for each of the four approaches highlighted in the previous section. The cost is reported in dollars and calculated based on the runtime of the underlying Amazon Elastic Compute Cloud (Amazon EC2) instances.
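The comparison against ground truth can be sketched as follows. This simplified version scores one class at a single IoU threshold and integrates the precision-recall curve step-wise rather than using the exact COCO/VOC interpolation; the data structures are illustrative.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """AP for one class. `predictions` is a list of (image_id, confidence, box);
    `ground_truths` maps image_id -> list of boxes."""
    total_gt = sum(len(boxes) for boxes in ground_truths.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tps, fps = [], []
    # Walk detections from most to least confident, matching each to ground truth.
    for img_id, _, box in sorted(predictions, key=lambda p: -p[1]):
        gt_boxes = ground_truths.get(img_id, [])
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_threshold and not matched[img_id][best_idx]:
            matched[img_id][best_idx] = True
            tps.append(1); fps.append(0)
        else:
            tps.append(0); fps.append(1)
    tps, fps = np.cumsum(tps), np.cumsum(fps)
    recall = tps / max(total_gt, 1)
    precision = tps / np.maximum(tps + fps, 1e-9)
    # Step-wise integration of precision over recall (simplified AP).
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap

# mAP is then the mean of the per-class APs, for example:
# m_ap = np.mean([average_precision(preds[c], gts[c]) for c in classes])
```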

Notebook

The Studio notebook is available on GitHub.

Results

The steel surface dataset has a total of 1,800 images in six categories. As discussed in the previous section, because the Type 1 (legacy) and Type 2 (latest) models optimize different objective metrics during tuning, we first perform a train/test split on the dataset. In the final phase of the study, we run inference on the held-out test dataset so that we can compare the four approaches using the same metric (mAP).

The test set contains 20% of the original dataset, which we randomly allocate from the full dataset. The remaining 80% is used for the model training phase, which requires us to define the training as well as the validation dataset. Therefore, for the training phase, we do a further 80/20 split on the data, where 80% of the training data is used for training and 20% for validation. See the following table.

Data | Number of Samples | Percentage of Original Dataset
Full | 1,800 | 100
Train | 1,152 | 64
Validation | 288 | 16
Test | 360 | 20
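A minimal sketch of this two-stage split, assuming the annotated samples are listed in image_paths (a placeholder list of file paths):

```python
from sklearn.model_selection import train_test_split

# image_paths holds the 1,800 annotated samples (placeholder).
# First split off a 20% held-out test set, then split the remaining 80%
# into training (80%) and validation (20%), matching the table above.
trainval_paths, test_paths = train_test_split(image_paths, test_size=0.20, random_state=42)
train_paths, val_paths = train_test_split(trainval_paths, test_size=0.20, random_state=42)

print(len(train_paths), len(val_paths), len(test_paths))  # roughly 1152, 288, 360
```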

The output of each of the four approaches was a trained ML model. We plot the results from each of the four approaches alongside the bounding boxes from ground truth as well as the DDN model. The following plot also shows the confidence score for the class prediction.

A confidence score is provided alongside each prediction. It indicates the probability, expressed as a percentage, that the object of interest was detected correctly by the algorithm. The mAP itself is evaluated at a given IoU (Intersection over Union) threshold, which determines whether a predicted bounding box counts as a correct detection.

For the purpose of generating the mAP score against the test dataset, we deployed each model behind its own SageMaker real-time endpoint. Each inference test produced an mAP score.
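A minimal sketch of this deploy-and-score step, assuming estimator is one of the trained estimators from the earlier sketches; the instance type, file name, and response-format note are illustrative.

```python
import json
from sagemaker.serializers import IdentitySerializer

# Deploy a trained estimator behind a real-time endpoint (instance type is illustrative).
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
predictor.serializer = IdentitySerializer(content_type="image/jpeg")

# Score one test image; for the built-in Object Detection algorithm the JSON response
# contains normalized [class, confidence, xmin, ymin, xmax, ymax] entries.
with open("test_image.jpg", "rb") as f:
    detections = json.loads(predictor.predict(f.read()))

# Clean up the endpoint when the evaluation is finished to avoid idle cost.
predictor.delete_endpoint()
```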

A larger mAP score implies higher accuracy on the test set. The Type 2 (latest) models clearly outperform the Type 1 (legacy) models in terms of accuracy, with or without HPO. Type 2 with HPO has a slight edge (mAP 0.375) over Type 2 without HPO (mAP 0.371).

We also measured the cost of training for each of the four approaches. We used P3 instances, specifically ml.p3.2xlarge, for each approach. Each ml.p3.2xlarge instance costs $3.06/hour. Both the inference test mAP score and the cost of training are summarized in the following chart for comparison.

For simplicity, we did a cost comparison on the runtime of the training instances only.

For a more granular estimate of the total cost incurred, including the cost of Studio notebooks as well as the real-time endpoints used for inferencing, refer to the AWS Pricing Calculator for SageMaker.

The results indicate considerable gains in accuracy when moving from the Type 1 (legacy) to the Type 2 (latest) model: the mAP score went up from 0.067 to 0.371 without HPO and from 0.226 to 0.375 with HPO. The Type 2 model also took longer to train on the same instance type, implying that the accuracy gains came with higher infrastructure cost. However, all of these approaches outperformed the DDN model (introduced in Visual inspection automation using Amazon SageMaker JumpStart) on all metrics. Training the Type 1 (legacy) model took 34 minutes, the Type 2 (latest) model took 1 hour, and the DDN model took over 8 hours. This indicates that fine-tuning a pre-trained model is much more efficient than training a model from scratch.
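As a back-of-the-envelope check, the single-run training-instance cost implied by these runtimes at $3.06/hour works out roughly as follows (HPO multiplies this by the number of tuning jobs launched):

```python
# Approximate training-instance cost per single run on ml.p3.2xlarge at $3.06/hour.
HOURLY_RATE = 3.06

for name, hours in [("Type 1 (legacy)", 34 / 60), ("Type 2 (latest)", 1.0), ("DDN from scratch", 8.0)]:
    print(f"{name}: ~${hours * HOURLY_RATE:.2f}")
# Type 1 (legacy): ~$1.73, Type 2 (latest): ~$3.06, DDN from scratch: ~$24.48 or more
```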

We also found that HPO (SageMaker automatic model tuning) is extremely effective, especially for models with large hyperparameter search spaces, with a 4x improvement in mAP score for the Type 1 (legacy) model. We observed much better accuracy when tuning three hyperparameters (learning rate, momentum, and weight decay) for the Type 1 (legacy) model than when tuning only one (the ADAM learning rate) for the Type 2 (latest) model. This is because the Type 1 (legacy) model has a relatively larger search space and therefore more room for improvement. However, we need to trade off model performance against the additional infrastructure cost and training time of running HPO.

Conclusion

In this post, we walked through the many ML model training options available with SageMaker and focused specifically on SageMaker built-in algorithms and pre-trained models. We introduced Type 1 (legacy) and Type 2 (latest) models. The built-in SageMaker object detection models discussed in this post were pre-trained on large-scale datasets: the ImageNet dataset includes 14,197,122 images across 21,841 categories, and the PASCAL VOC dataset includes 11,530 images across 20 categories. The pre-trained models have learned rich and diverse low-level features and can efficiently transfer that knowledge to fine-tuned models, which can then focus on learning the high-level semantic features of the target dataset. You can find all built-in algorithms and fine-tunable pre-trained models in the Built-in Algorithms with pre-trained Model Table and choose one for your use case. The use cases span from text summarization and question answering to computer vision and regression or classification.

In the beginning, we made the assertion that fine-tuning a SageMaker pre-trained model takes a fraction of the training time required to train a model from scratch. We trained a DDN model from scratch and introduced two types of SageMaker built-in algorithms with pretrained models: Type 1 (legacy) and Type 2 (latest). We then showcased four approaches, two of which used SageMaker automatic model tuning, and arrived at the most performant model. When considering both training time and runtime cost, all SageMaker built-in algorithms outperformed the DDN model, thereby validating our assertion.

Although both the Type 1 (legacy) and Type 2 (latest) approaches outperformed training the DDN model from scratch, visual and numerical comparison confirmed that the Type 2 (latest) model, with or without HPO, outperforms the Type 1 (legacy) models. HPO had a big impact on accuracy for Type 1 models, whereas Type 2 models saw only modest gains from HPO, due to the more constrained hyperparameter search space.

In summary, for certain use cases, fine-tuning a pretrained model is both more efficient and more performant. We suggest taking advantage of the SageMaker built-in pretrained models and fine-tuning them on your target datasets. To get started, you need a Studio environment. For more information, refer to the Studio Development Guide and make sure to enable SageMaker projects and JumpStart. When your Studio setup is complete, navigate to the Studio Launcher to find the full list of JumpStart solutions and models. To recreate or modify the experiment in this post, choose the “Product Defect Detection” solution, which comes prepackaged with the notebook used for the experiments, as shown in the following video. After you launch the solution, you can access the mentioned work in the notebook titled visual_object_detection.ipynb.


About the authors

Vedant Jain is a Sr. AI/ML Specialist Solutions Architect, helping customers derive value out of the Machine Learning ecosystem at AWS. Prior to joining AWS, Vedant has held ML/Data Science Specialty positions at various companies such as Databricks, Hortonworks (now Cloudera) & JP Morgan Chase. Outside of his work, Vedant is passionate about making music, using Science to lead a meaningful life & exploring delicious vegetarian cuisine from around the world.

Tao Sun is an Applied Scientist in Amazon Search. He obtained his Ph.D. in Computer Science from University of Massachusetts, Amherst. His research interests lie in deep reinforcement learning and probabilistic modeling. In the past, Tao worked for the AWS SageMaker Reinforcement Learning team and contributed to RL research and applications. Tao is now working on Page Template Optimization at Amazon Search.
