Run image segmentation with Amazon SageMaker JumpStart

In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment.

This post is the third in a series on using JumpStart for specific ML tasks. In the first post, we showed how you can run image classification use cases on JumpStart. In the second post, we showed how you can run text classification use cases on JumpStart. In this post, we provide a step-by-step walkthrough on how to fine-tune and deploy an image segmentation model, using trained models from MXNet. We explore two ways of obtaining the same result: via JumpStart’s graphical interface on Amazon SageMaker Studio, and programmatically through JumpStart APIs.

If you want to jump straight into the JumpStart API code we explain in this post, you can refer to the following sample Jupyter notebooks:

JumpStart overview

JumpStart helps you get started with ML models for a variety of tasks without writing a single line of code. At the time of writing, JumpStart enables you to do the following:

  • Deploy pre-trained models for common ML tasks – JumpStart enables you to address common ML tasks with no development effort by providing easy deployment of models pre-trained on large, publicly available datasets. The ML research community has put a large amount of effort into making a majority of recently developed models publicly available for use. JumpStart hosts a collection of over 300 models, spanning the 15 most popular ML tasks such as object detection, text classification, and text generation, making it easy for beginners to use them. These models are drawn from popular model hubs such as TensorFlow, PyTorch, Hugging Face, and MXNet.
  • Fine-tune pre-trained models – JumpStart allows you to fine-tune pre-trained models with no need to write your own training algorithm. In ML, the ability to transfer the knowledge learned in one domain to another domain is called transfer learning. You can use transfer learning to produce accurate models on your smaller datasets, with much lower training costs than the ones involved in training the original model. JumpStart also includes popular training algorithms based on LightGBM, CatBoost, XGBoost, and Scikit-learn, which you can train from scratch for tabular regression and classification.
  • Use pre-built solutions – JumpStart provides a set of 17 solutions for common ML use cases, such as demand forecasting and industrial and financial applications, which you can deploy with just a few clicks. Solutions are end-to-end ML applications that string together various AWS services to solve a particular business use case. They use AWS CloudFormation templates and reference architectures for quick deployment, which means they’re fully customizable.
  • Refer to notebook examples for SageMaker algorithms – SageMaker provides a suite of built-in algorithms to help data scientists and ML practitioners get started with training and deploying ML models quickly. JumpStart provides sample notebooks that you can use to quickly use these algorithms.
  • Review training videos and blogs – JumpStart also provides numerous blog posts and videos that teach you how to use different functionalities within SageMaker.

JumpStart accepts custom VPC settings and AWS Key Management Service (AWS KMS) encryption keys, so you can use the available models and solutions securely within your enterprise environment. You can pass your security settings to JumpStart within Studio or through the SageMaker Python SDK.

Semantic segmentation

Semantic segmentation delineates each class of objects appearing in an input image. It tags (classifies) each pixel of the input image with a class label from a predefined set of classes. Multiple objects of the same class are mapped to the same mask.

The model available for fine-tuning builds a fully convolutional network (FCN) “head” on top of the base network. The fine-tuning step fine-tunes the FCNHead while keeping the parameters of the rest of the model frozen, and returns the fine-tuned model. The objective is to minimize per-pixel softmax cross entropy loss to train the FCN. The model returned by fine-tuning can be further deployed for inference.

The input directory should look like the following code if the training data contains two images. The names of the .png files can be anything.

input_directory
    |--images
        |--abc.png
        |--def.png
    |--masks
        |--abc.png
        |--def.png
    class_label_to_prediction_index.json

The mask files should have class label information for each pixel.

Instance segmentation

Instance segmentation detects and delineates each distinct object of interest appearing in an image. It tags every pixel with an instance label. Whereas semantic segmentation assigns the same tag to pixels of multiple objects of the same class, instance segmentation further labels pixels corresponding to each occurrence of an object on the image with a separate tag.

Currently, JumpStart offers inference-only models for instance segmentation and doesn’t support fine-tuning.

The following images illustrate the difference between the inference in semantic segmentation and instance segmentation. The original image has two people in the image. Semantic segmentation treats multiple people in the image as one entity: Person. However, instance segmentation identifies individual people within the Person category.

Solution overview

The following sections provide a step-by-step demo to perform semantic segmentation with JumpStart, both via the Studio UI and via JumpStart APIs.

We walk through the following steps:

  1. Access JumpStart through the Studio UI:
    1. Run inference on the pre-trained model.
    2. Fine-tune the pre-trained model.
  2. Use JumpStart programmatically with the SageMaker Python SDK:
    1. Run inference on the pre-trained model.
    2. Fine-tune the pre-trained model.

We also discuss additional advanced features of JumpStart.

Access JumpStart through the Studio UI

In this section, we demonstrate how to train and deploy JumpStart models through the Studio UI.

Run inference on the pre-trained model

The following video shows you how to find a pre-trained semantic segmentation model on JumpStart and deploy it. The model page contains valuable information about the model, how to use it, expected data format, and some fine-tuning details. You can deploy any of the pre-trained models available in JumpStart. For inference, we pick the ml.g4dn.xlarge instance type. It provides the GPU acceleration needed for low inference latency, but at a lower price point. After you configure the SageMaker hosting instance, choose Deploy. It may take 5–10 minutes until your persistent endpoint is up and running.

After a few minutes, your endpoint is operational and ready to respond to inference requests.

Similarly, you can deploy a pre-trained instance segmentation model by following the same steps in the preceding video while searching for instance segmentation instead of semantic segmentation in the JumpStart search bar.

Fine-tune the pre-trained model

The following video shows how to find and fine-tune a semantic segmentation model in JumpStart. In the video, we fine-tune the model using the PennFudanPed dataset, provided by default in JumpStart, which you can download under the Apache 2.0 License.

Fine-tuning on your own dataset involves taking the correct formatting of data (as explained on the model page), uploading it to Amazon Simple Storage Service (Amazon S3), and specifying its location in the data source configuration. We use the same hyperparameter values set by default (number of epochs, learning rate, and batch size). We also use a GPU-backed ml.p3.2xlarge as our SageMaker training instance.

You can monitor your training job running directly on the Studio console, and are notified upon its completion. After training is complete, you can deploy the fine-tuned model from the same page that holds the training job details. The deployment workflow is the same as deploying a pre-trained model.

Use JumpStart programmatically with the SageMaker SDK

In the preceding sections, we showed how you can use the JumpStart UI to deploy a pre-trained model and fine-tune it interactively, in a matter of a few clicks. However, you can also use JumpStart’s models and easy fine-tuning programmatically by using APIs that are integrated into the SageMaker SDK. We now go over a quick example of how you can replicate the preceding process. All the steps in this demo are available in the accompanying notebooks Introduction to JumpStart – Instance Segmentation and Introduction to JumpStart – Semantic Segmentation.

Run inference on the pre-trained model

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and run inference on the deployed endpoint.

SageMaker is a platform based on Docker containers. JumpStart uses the available framework-specific SageMaker Deep Learning Containers (DLCs). We fetch any additional packages, as well as scripts to handle training and inference for the selected task. Finally, the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. You can use any number of models pre-trained for the same task with a single training or inference script. See the following code:

model_id, model_version = "mxnet-semseg-fcn-resnet50-ade", "*"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)

# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

For instance segmentation, we can set model_id to mxnet-semseg-fcn-resnet50-ade. The is in the identifier corresponds to instance segmentation.

Next, we feed the resources into a SageMaker model instance and deploy an endpoint:

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model. Note that we need to pass Predictor class when we deploy model through Model class,
# for being able to run inference through the sagemaker API.
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

After a few minutes, our model is deployed and we can get predictions from it in real time!

The following code snippet gives you a glimpse of what semantic segmentation looks like. The predicted mask for each pixel is visualized. To get inferences from a deployed model, an input image needs to be supplied in binary format. The response of the endpoint is a predicted label for each pixel in the image. We use the query_endpoint and parse_response helper functions, which are defined in the accompanying notebook:

query_response = query(base_model_predictor, pedestrian_img)
predictions, labels, image_labels = parse_response(query_response)
print("Objects present in the picture:", image_labels)

Fine-tune the pre-trained model

To fine-tune a selected model, we need to get that model’s URI, as well as that of the training script and the container image used for training. Thankfully, these three inputs depend solely on the model name, version (for a list of the available models, see JumpStart Available Model Table), and the type of instance you want to train on. This is demonstrated in the following code snippet:

from sagemaker import image_uris, model_uris, script_uris

model_id, model_version = "mxnet-semseg-fcn-resnet50-ade", "*"
training_instance_type = "ml.p3.2xlarge"
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,)# Retrieve the training script

train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")# Retrieve the pre-trained model tarball to further fine-tune

train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

We retrieve the model_id corresponding to the same model we used previously. You can now fine-tune this JumpStart model on your own custom dataset using the SageMaker SDK. We use a dataset that is publicly hosted on Amazon S3, conveniently focused on semantic segmentation. The dataset should be structured for fine-tuning as explained in the previous section. See the following example code:

# URI of your training dataset
training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
training_data_prefix = "training-datasets/PennFudanPed_SemSeg/"
training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"
training_job_name = name_from_base(f"jumpstart-example-{model_id}-transfer-learning")# Create SageMaker Estimator instance
semseg_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,)# Launch a SageMaker Training job by passing s3 path of the training data
semseg_estimator.fit({"training": training_dataset_s3_path}, logs=True)

We obtain the same default hyperparameters for our selected model as the ones we saw in the previous section, using sagemaker.hyperparameters.retrieve_default(). We then instantiate a SageMaker estimator and call the .fit method to start fine-tuning our model, passing it the Amazon S3 URI for our training data. The entry_point script provided is named transfer_learning.py (the same for other tasks and models), and the input data channel passed to .fit must be named training.

While the algorithm trains, you can monitor its progress either in the SageMaker notebook where you’re running the code itself, or on Amazon CloudWatch. When training is complete, the fine-tuned model artifacts are uploaded to the Amazon S3 output location specified in the training configuration. You can now deploy the model in the same manner as the pre-trained model.

Advanced features

In addition to fine-tuning and deploying pre-trained models, JumpStart offers many advanced features.

The first is automatic model tuning. This allows you to automatically tune your ML models to find the hyperparameter values with the highest accuracy within the range provided through the SageMaker API.

The second is incremental training. This allows you to train a model you have already fine-tuned using an expanded dataset that contains an underlying pattern not accounted for in previous fine-tuning runs, which resulted in poor model performance. Incremental training saves both time and resources because you don’t need to retrain the model from scratch.

Conclusion

In this post, we showed how to fine-tune and deploy a pre-trained semantic segmentation model, and how to adapt it for instance segmentation using JumpStart. You can accomplish this without needing to write code. Try out the solution on your own and send us your comments.

To learn more about JumpStart and how you can use open-source pre-trained models for a variety of other ML tasks, check out the following AWS re:Invent 2020 video.


About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Santosh Kulkarni is an Enterprise Solutions Architect at Amazon Web Services who works with sports customers in Australia. He is passionate about building large-scale distributed applications to solve business problems using his knowledge in AI/ML, big data, and software development.

Leonardo Bachega is a senior scientist and manager in the Amazon SageMaker JumpStart team. He’s passionate about building AI services for computer vision.

Read More

Process mortgage documents with intelligent document processing using Amazon Textract and Amazon Comprehend

Organizations in the lending and mortgage industry process thousands of documents on a daily basis. From a new mortgage application to mortgage refinance, these business processes involve hundreds of documents per application. There is limited automation available today to process and extract information from all the documents, especially due to varying formats and layouts. Due to high volume of applications, capturing strategic insights and getting key information from the contents is a time-consuming, highly manual, error prone and expensive process. Legacy optical character recognition (OCR) tools are cost-prohibitive, error-prone, involve a lot of configuring, and are difficult to scale. Intelligent document processing (IDP) with AWS artificial intelligence (AI) services helps automate and accelerate the mortgage application processing with goals of faster and quality decisions, while reducing overall costs.

In this post, we demonstrate how you can utilize machine learning (ML) capabilities with Amazon Textract, and Amazon Comprehend to process documents in a new mortgage application, without the need for ML skills. We explore the various phases of IDP as shown in the following figure, and how they connect to the steps involved in a mortgage application process, such as application submission, underwriting, verification, and closing.

Image shows the phases of intelligent document processing (IDP).

Although each mortgage application may be unique, we took into account some of the most common documents that are included in a mortgage application, such as the Unified Residential Loan Application (URLA-1003) form, 1099 forms, and mortgage note.

Solution overview

Amazon Textract is an ML service that automatically extracts text, handwriting, and data from scanned documents using pre-trained ML models. Amazon Comprehend is a natural-language processing (NLP) service that uses ML to uncover valuable insights and connections in text and can perform document classification, name entity recognition (NER), topic modeling, and more.

The following figure shows the phases of IDP as it relates to the phases of a mortgage application process.

Image shows a high-level solution architecture for the phases of intelligent document processing (IDP) as it relates to the stages of a mortgage application.

At the start of the process, documents are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. This initiates a document classification process to categorize the documents into known categories. After the documents are categorized, the next step is to extract key information from them. We then perform enrichment for select documents, which can be things like personally identifiable information (PII) redaction, document tagging, metadata updates, and more. The next step involves validating the data extracted in previous phases to ensure completeness of a mortgage application. Validation can be done via business validation rules and cross document validation rules. The confidence scores of the extracted information can also be compared to a set threshold, and automatically routed to a human reviewer through Amazon Augmented AI (Amazon A2I) if the threshold isn’t met. In the final phase of the process, the extracted and validated data is sent to downstream systems for further storage, processing, or data analytics.

In the following sections, we discuss the phases of IDP as it relates to the phases of a mortgage application in detail. We walk through the phases of IDP and discuss the types of documents; how we store, classify, and extract information, and how we enrich the documents using machine learning.

Document storage

Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. We use Amazon S3 to securely store the mortgage documents during and after the mortgage application process. A mortgage application packet may contain several types of forms and documents, such as URLA-1003, 1099-INT/DIV/RR/MISC, W2, paystubs, bank statements, credit card statements, and more. These documents are submitted by the applicant in the mortgage application phase. Without manually looking through them, it might not be immediately clear which documents are included in the packet. This manual process can be time-consuming and expensive. In the next phase, we automate this process using Amazon Comprehend to classify the documents into their respective categories with high accuracy.

Document classification

Document classification is a method by means of which a large number of unidentified documents can be categorized and labeled. We perform this document classification using an Amazon Comprehend custom classifier. A custom classifier is an ML model that can be trained with a set of labeled documents to recognize the classes that are of interest to you. After the model is trained and deployed behind a hosted endpoint, we can utilize the classifier to determine the category (or class) a particular document belongs to. In this case, we train a custom classifier in multi-class mode, which can be done either with a CSV file or an augmented manifest file. For the purposes of this demonstration, we use a CSV file to train the classifier. Refer to our GitHub repository for the full code sample. The following is a high-level overview of the steps involved:

  1. Extract UTF-8 encoded plain text from image or PDF files using the Amazon Textract DetectDocumentText API.
  2. Prepare training data to train a custom classifier in CSV format.
  3. Train a custom classifier using the CSV file.
  4. Deploy the trained model with an endpoint for real-time document classification or use multi-class mode, which supports both real-time and asynchronous operations.

The following diagram illustrates this process.

Image shows Amazon Comprehend custom classifier training process and document classification using the trained and deployed classifier model (real time or batch).

You can automate document classification using the deployed endpoint to identify and categorize documents. This automation is useful to verify whether all the required documents are present in a mortgage packet. A missing document can be quickly identified, without manual intervention, and notified to the applicant much earlier in the process.

Document extraction

In this phase, we extract data from the document using Amazon Textract and Amazon Comprehend. For structured and semi-structured documents containing forms and tables, we use the Amazon Textract AnalyzeDocument API. For specialized documents such as ID documents, Amazon Textract provides the AnalyzeID API. Some documents may also contain dense text, and you may need to extract business-specific key terms from them, also known as entities. We use the custom entity recognition capability of Amazon Comprehend to train a custom entity recognizer, which can identify such entities from the dense text.

In the following sections, we walk through the sample documents that are present in a mortgage application packet, and discuss the methods used to extract information from them. For each of these examples, a code snippet and a short sample output is included.

Extract data from Unified Residential Loan Application URLA-1003

A Unified Residential Loan Application (URLA-1003) is an industry standard mortgage loan application form. It’s a fairly complex document that contains information about the mortgage applicant, type of property being purchased, amount being financed, and other details about the nature of the property purchase. The following is a sample URLA-1003, and our intention is to extract information from this structured document. Because this is a form, we use the AnalyzeDocument API with a feature type of FORM.

Image shows a sample of a Unified Residential Loan Application URLA-1003 form

The FORM feature type extracts form information from the document, which is then returned in key-value pair format. The following code snippet uses the amazon-textract-textractor Python library to extract form information with just a few lines of code. The convenience method call_textract() calls the AnalyzeDocument API internally, and the parameters passed to the method abstract some of the configurations that the API needs to run the extraction task. Document is a convenience method used to help parse the JSON response from the API. It provides a high-level abstraction and makes the API output iterable and easy to get information out of. For more information, refer to Textract Response Parser and Textractor.

from textractcaller.t_call import call_textract, Textract_Features
from trp import Document

response_urla_1003 = call_textract(input_document='s3://<your-bucket>/URLA-1003.pdf', 
                                   features=[Textract_Features.FORMS])
doc_urla_1003 = Document(response_urla_1003)
for page in doc_urla_1003.pages:
    forms=[]
    for field in page.form.fields:
        obj={}
        obj[f'{field.key}']=f'{field.value}'
        forms.append(obj)
print(json.dumps(forms, indent=4))

Note that the output contains values for check boxes or radio buttons that exist in the form. For example, in the sample URLA-1003 document, the Purchase option was selected. The corresponding output for the radio button is extracted as “Purchase” (key) and “SELECTED” (value), indicating that radio button was selected.

[
    { "No. of Units": "1" },
    { "Amount": "$ 450,000.00" },
    { "Year Built": "2010" },
    { "Purchase": "SELECTED" },
    { "Title will be held in what Name(s)": "Alejandro Rosalez" },
    { "Fixed Rate": "SELECTED" },
    ...
]

Extract data from 1099 forms

A mortgage application packet may also contain a number of IRS documents, such as 1099-DIV, 1099-INT, 1099-MISC, and 1099-R. These documents show the applicant’s earnings via interests, dividends, and other miscellaneous income components that are useful during underwriting to make decisions. The following image shows a collection of these documents, which are similar in structure. However, in some instances, the documents contain form information (marked using the red and green bounding boxes) as well as tabular information (marked by the yellow bounding boxes).

Image shows samples of 1099 INT, DIV, MISC, and R forms.

To extract form information, we use similar code as explained earlier with the AnalyzeDocument API. We pass an additional feature of TABLE to the API to indicate that we need both form and table data extracted from the document. The following code snippet uses the AnalyzeDocument API with FORMS and TABLES features on the 1099-INT document:

from textractcaller.t_call import call_textract, Textract_Features
from trp import Document
response_1099_int = call_textract(input_document='s3://<your-bucket>/1099-INT-2018.pdf',
                                  features=[Textract_Features.TABLES, 
                                            Textract_Features.FORMS])
doc_1099_int = Document(response_1099_int)
num_tables=1
for page in doc_1099_int.pages:     
    for table in page.tables:
        num_tables=num_tables+1
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                print(f"Cell[{r}][{c}] = {cell.text}")
        print('n')

Because the document contains a single table, the output of the code is as follows:

Table 1
-------------------
Cell[0][0] = 15 State 
Cell[0][1] = 16 State identification no. 
Cell[0][2] = 17 State tax withheld 
Cell[1][0] = 
Cell[1][1] = 34564 
Cell[1][2] = $ 2000 
Cell[2][0] = 
Cell[2][1] = 23543 
Cell[2][2] = $ 1000

The table information contains the cell position (row 0, column 0, and so on) and the corresponding text within each cell. We use a convenience method that can transform this table data into easy-to-read grid view:

from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string, Pretty_Print_Table_Format
print(get_string(textract_json=response_1099_int, 
                 table_format=Pretty_Print_Table_Format.grid, 
                 output_type=[Textract_Pretty_Print.TABLES]))

We get the following output:

+----------+-----------------------------+-----------------------+
| 15 State | 16 State identification no. | 17 State tax withheld |
+----------+-----------------------------+-----------------------+
|          | 34564                       | $ 2000                |
+----------+-----------------------------+-----------------------+
|          | 23543                       | $ 1000                |
+----------+-----------------------------+-----------------------+

To get the output in an easy-to-consume CSV format, the format type of Pretty_Print_Table_Format.csv can be passed into the table_format parameter. Other formats such as TSV (tab separated values), HTML, and Latex are also supported. For more information, refer to Textract-PrettyPrinter.

Extract data from a mortgage note

A mortgage application packet may contain unstructured documents with dense text. Some examples of dense text documents are contracts and agreements. A mortgage note is an agreement between a mortgage applicant and the lender or mortgage company, and contains information in dense text paragraphs. In such cases, the lack of structure makes it difficult to find key business information that is important in the mortgage application process. There are two approaches to solving this problem:

In the following sample mortgage note, we’re specifically interested in finding out the monthly payment amount and principal amount.

Image shows a sample of a mortgage note document.

For the first approach, we use the Query and QueriesConfig convenience methods to configure a set of questions that is passed to the Amazon Textract AnalyzeDocument API call. In case the document is multi-page (PDF or TIFF), we can also specify the page numbers where Amazon Textract should look for answers to the question. The following code snippet demonstrates how to create the query configuration, make an API call, and subsequently parse the response to get the answers from the response:

from textractcaller import QueriesConfig, Query
import trp.trp2 as t2

#Setup the queries
query2 = Query(text="What is the principal amount borrower has to pay?", alias="PRINCIPAL_AMOUNT", pages=["1"])
query4 = Query(text="What is the monthly payment amount?", alias="MONTHLY_AMOUNT", pages=["1"])

#Setup the query config with the above queries
queries_config = QueriesConfig(queries=[query1, query2, query3, query4])
#Call AnalyzeDocument with the queries_config
response_mortgage_note = call_textract(input_document='s3://<your-bucket>/Mortgage-Note.pdf',
                                       features=[Textract_Features.QUERIES],
                                       queries_config=queries_config)
doc_mortgage_note: t2.TDocumentSchema = t2.TDocumentSchema().load(response_mortgage_note) 

entities = {}
for page in doc_mortgage_note.pages:
    query_answers = doc_mortgage_note.get_query_answers(page=page)
    if query_answers:
        for answer in query_answers:
            entities[answer[1]] = answer[2]
print(entities)

We get the following output:

{
    'PRINCIPAL_AMOUNT': '$ 555,000.00',
    'MONTHLY_AMOUNT': '$2,721.23',
}

For the second approach, we use the Amazon Comprehend DetectEntities API with the mortgage note, which returns the entities it detects within the text from a predefined set of entities. These are entities that the Amazon Comprehend entity recognizer is pre-trained with. However, because our requirement is to detect specific entities, an Amazon Comprehend custom entity recognizer is trained with a set of sample mortgage note documents, and a list of entities. We define the entity names as PRINCIPAL_AMOUNT and MONTHLY_AMOUNT. Training data is prepared following the Amazon Comprehend training data preparation guidelines for custom entity recognition. The entity recognizer can be trained with document annotations or with entity lists. For the purposes of this example, we use entity lists to train the model. After we train the model, we can deploy it with a real-time endpoint or in batch mode to detect the two entities from the document contents. The following are the steps involved to train a custom entity recognizer and deploy it. For a full code walkthrough, refer to our GitHub repository.

  1. Prepare the training data (the entity list and the documents with (UTF-8 encoded) plain text format).
  2. Start the entity recognizer training using the CreateEntityRecognizer API using the training data.
  3. Deploy the trained model with a real-time endpoint using the CreateEndpoint API.

Extract data from a US passport

The Amazon Textract analyze identity documents capability can detect and extract information from US-based ID documents such as a driver’s license and passport. The AnalyzeID API is capable of detecting and interpreting implied fields in ID documents, which makes it easy to extract specific information from the document. Identity documents are almost always part of a mortgage application packet, because it’s used to verify the identity of the borrower during the underwriting process, and to validate the correctness of the borrower’s biographical data.

Image shows a sample of a US passport

We use a convenience method named call_textract_analyzeid, which calls the AnalyzeID API internally. We then iterate over the response to obtain the detected key-value pairs from the ID document. See the following code:

from textractcaller import call_textract_analyzeid
import trp.trp2_analyzeid as t2id

response_passport = call_textract_analyzeid(document_pages=['s3://<your-bucket>/Passport.pdf'])
doc_passport: t2id.TAnalyzeIdDocument = t2id.TAnalyzeIdDocumentSchema().load(response_passport)

for id_docs in response_passport['IdentityDocuments']:
    id_doc_kvs={}
    for field in id_docs['IdentityDocumentFields']:
        if field['ValueDetection']['Text']:
            id_doc_kvs[field['Type']['Text']] = field['ValueDetection']['Text']
print(id_doc_kvs)

AnalyzeID returns information in a structure called IdentityDocumentFields, which contains the normalized keys and their corresponding value. For example, in the following output, FIRST_NAME is a normalized key and the value is ALEJANDRO. In the example passport image, the field for the first name is labeled as “Given Names / Prénoms / Nombre,” however AnalyzeID was able to normalize that into the key name FIRST_NAME. For a list of supported normalized fields, refer to Identity Documentation Response Objects.

{
    'FIRST_NAME': 'ALEJANDRO',
    'LAST_NAME': 'ROSALEZ',
    'DOCUMENT_NUMBER': '918268822',
    'EXPIRATION_DATE': '31 JAN 2029',
    'DATE_OF_BIRTH': '15 APR 1990',
    'DATE_OF_ISSUE': '29 JAN 2009',
    'ID_TYPE': 'PASSPORT',
    'ENDORSEMENTS': 'SEE PAGE 27',
    'PLACE_OF_BIRTH': 'TEXAS U.S.A.'
}

A mortgage packet may contain several other documents, such as a paystub, W2 form, bank statement, credit card statement, and employment verification letter. We have samples for each of these documents along with the code required to extract data from them. For the complete code base, check out the notebooks in our GitHub repository.

Document enrichment

One of the most common forms of document enrichment is sensitive or confidential information redaction on documents, which may be mandated due to privacy laws or regulations. For example, a mortgage applicant’s paystub may contain sensitive PII data, such as name, address, and SSN, that may need redaction for extended storage.

In the preceding sample paystub document, we perform redaction of PII data such as SSN, name, bank account number, and dates. To identify PII data in a document, we use the Amazon Comprehend PII detection capability via the DetectPIIEntities API. This API inspects the content of the document to identify the presence of PII information. Because this API requires input in UTF-8 encoded plain text format, we first extract the text from the document using the Amazon Textract DetectDocumentText API, which returns the text from the document and also returns geometry information such as bounding box dimensions and coordinates. A combination of both outputs is then used to draw redactions on the document as part of the enrichment process.

Review, validate, and integrate data

Extracted data from the document extraction phase may need validation against specific business rules. Specific information may also be validated across several documents, also known as cross-doc validation. An example of cross-doc validation could be comparing the applicant’s name in the ID document to the name in the mortgage application document. You can also do other validations such as property value estimations and conditional underwriting decisions in this phase.

A third type of validation is related to the confidence score of the extracted data in the document extraction phase. Amazon Textract and Amazon Comprehend return a confidence score for forms, tables, text data, and entities detected. You can configure a confidence score threshold to ensure that only correct values are being sent downstream. This is achieved via Amazon A2I, which compares the confidence scores of detected data with a predefined confidence threshold. If the threshold isn’t met, the document and the extracted output is routed to a human for review through an intuitive UI. The reviewer takes corrective action on the data and saves it for further processing. For more information, refer to Core Concepts of Amazon A2I.

Conclusion

In this post, we discussed the phases of intelligent document processing as it relates to phases of a mortgage application. We looked at a few common examples of documents that can be found in a mortgage application packet. We also discussed ways of extracting and processing structured, semi-structured, and unstructured content from these documents. IDP provides a way to automate end-to-end mortgage document processing that can be scaled to millions of documents, enhancing the quality of application decisions, reducing costs, and serving customers faster.

As a next step, you can try out the code samples and notebooks in our GitHub repository. To learn more about how IDP can help your document processing workloads, visit Automate data processing from documents.


About the authors

Anjan Biswas is a Senior AI Services Solutions Architect with focus on AI/ML and Data Analytics. Anjan is part of the world-wide AI services team and works with customers to help them understand, and develop solutions to business problems with AI and ML. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations and is actively helping customers get started and scale on AWS AI services.

Dwiti Pathak is a Senior Technical Account Manager based out of San Diego. She is focused on helping Semiconductor industry engage in AWS. In her spare time, she likes reading about new technologies and playing board games.

Balaji Puli is a Solutions Architect based in Bay Area, CA. Currently helping select Northwest U.S healthcare life sciences customers accelerate their AWS cloud adoption. Balaji enjoys traveling and loves to explore different cuisines.

Read More

Achieve low-latency hosting for decision tree-based ML models on NVIDIA Triton Inference Server on Amazon SageMaker

Machine learning (ML) model deployments can have very demanding performance and latency requirements for businesses today. Use cases such as fraud detection and ad placement are examples where milliseconds matter and are critical to business success. Strict service level agreements (SLAs) need to be met, and a typical request may require multiple steps such as preprocessing, data transformation, model selection logic, model aggregation, and postprocessing. At scale, this often means maintaining a huge volume of traffic while maintaining low latency. Common design patterns include serial inference pipelines, ensembles (scatter-gather), and business logic workflows, which result in realizing the entire workflow of the request as a Directed Acyclic Graph (DAG). However, as workflows get more complex, this can lead to an increase in overall response times, which in turn can negatively impact the end-user experience and jeopardize business goals. Triton can address these use cases where multiple models are composed in a pipeline with input and output tensors connected between them, helping you address these workloads.

As you evaluate your goals in relation to ML model inference, many options can be considered, but few are as capable and proven as Amazon SageMaker with Triton Inference Server. SageMaker with Triton Inference Server has been a popular choice for many customers because it’s purpose-built to maximize throughput and hardware utilization with ultra-low (single-digit milliseconds) inference latency. It has wide range of supported ML frameworks (including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT) and infrastructure backends, including NVIDIA GPUs, CPUs, and AWS Inferentia. Additionally, Triton Inference Server is integrated with SageMaker, a fully managed end-to-end ML service, providing real-time inference options for model hosting.

In this post, we walk through deploying a fraud detection ensemble workload to SageMaker with Triton Inference Server.

Solution overview

It’s essential for any project to have a list of requirements and an effort estimation, in order to approximate the total cost of the project. It’s important to estimate the return on investment (ROI) that supports the decision of an organization. Some considerations to take account when moving a workload to Triton include:

Effort estimation is key in software development, and its measurement is often based on incomplete, uncertain, and noisy inputs. ML workloads are no different. Multiple factors will affect an architecture for ML inference, some of which include:

  • Client-side latency budget – It specifies the client-side round-trip maximum acceptable waiting time for an inference response, commonly expressed in percentiles. For workloads that require a latency budget near tens of milliseconds, network transfers could become expensive, so using models at the edge would be a better fit.
  • Data payload distribution size – Payload, often referred to as message body, is the request data transmitted from the client to the model, as well as the response data transmitted from the model to the client. The payload size often has a major impact on latency and should be taken into consideration.
  • Data format – This specifies how the payload is sent to the ML model. Format can be human-readable, such as JSON and CSV, however there are also binary formats, which are often compressed and smaller in size. This is a trade-off between compression overhead and transfer size, meaning that CPU cycles and latency is added to compress or decompress, in order to save bytes transferred over the network. This post shows how to utilize both JSON and binary formats.
  • Software stack and components required – A stack is a collection of components that operate together to support an ML application, including operating system, runtimes, and software layers. Triton comes with built-in popular ML frameworks, called backends, such as ONNX, TensorFlow, FIL, OpenVINO, native Python, and others. You can also author a custom backend for your own homegrown components. This post goes over an XGBoost model and data preprocessing, which we migrate to the NVIDIA provided FIL and Python Triton backends, respectively.

All these factors should play a vital part in evaluating how your workloads perform, but in this use case we focus on the work needed to move your ML models to be hosted in SageMaker with Triton Inference Server. Specifically, we use an example of a fraud detection ensemble composed of an XGBoost model with preprocessing logic written in Python.

NVIDIA Triton Inference Server

Triton Inference Server has been designed from the ground up to enable teams to deploy, run, and scale trained AI models from any framework on GPU or CPU based infrastructure. In addition, it has been optimized to offer high-performance inference at scale with features like dynamic batching, concurrent runs, optimal model configuration, model ensemble, and support for streaming inputs.

The following diagram shows an example NVIDIA Triton ensemble pipeline.

Workloads should take into account the capabilities that Triton provides along with SageMaker hosting to maximize the benefits offered. For example, Triton supports both HTTP and gRPC protocols as well a C API, which allow for flexibility as well as payload optimization when needed. As previously mentioned, Triton supports several popular frameworks out of the box, including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT. These frameworks are supported through Triton backends, and in the rare event that a backend doesn’t support your use case, Triton allows you to implement your own and integrate it easily.

The following diagram shows an example of the NVIDIA Triton architecture.

NVIDIA Triton on SageMaker

SageMaker hosting services are the set of SageMaker features aimed at making model deployment and serving easier. It provides a variety of options to easily deploy, auto scale, monitor, and optimize ML models tailored for different use cases. This means that you can optimize your deployments for all types of usage patterns, from persistent and always available with serverless options, to transient, long-running, or batch inference needs.

Under the SageMaker hosting umbrella is also the set of SageMaker inference Deep Learning Containers (DLCs), which come prepackaged with the appropriate model server software for their corresponding supported ML framework. This enables you to achieve high inference performance with no model server setup, which is often the most complex technical aspect of model deployment and in general isn’t part of a data scientist’s skill set. Triton inference server is now available on SageMaker DLCs.

This breadth of options, modularity, and ease of use of different serving frameworks makes SageMaker and Triton a powerful match.

NVIDIA FIL backend support

With the 22.05 version release of Triton, NVIDIA now supports forest models trained by several popular ML frameworks, including XGBoost, LightGBM, Scikit-learn, and cuML. When using the FIL backend for Triton, you should ensure that the model artifacts that you provide are supported. For example, FIL supports model_type xgboost, xgboost_json, lightgbm, or treelite_checkpoint, indicating whether the provided model is in XGBoost binary format, XGBoost JSON format, LightGBM text format, or Treelite binary format, respectively.

This backend support is essential for us to use in our example because FIL supports XGBoost models. The only consideration to check is to ensure that the model that we deploy supports binary or JSON formats.

In addition to ensuring that you have the proper model format, other considerations should be taken. The FIL backend for Triton provides configurable options for developers to tune their workloads and optimize model run performance. The configuration dynamic_batching allows Triton to hold client-side requests and batch them on the server side, in order to efficiently use FIL’s parallel computation to inference the entire batch together. The option max_queue_delay_microseconds offers a fail-safe control of how long Triton waits to form a batch. FIL comes with Shapley explainer, which can be activated by the configuration treeshap_output; however, you should keep in mind that Shapley outputs hurt performance due to its output size. Another important aspect is storage_type in order to trade-off between memory footprint and runtime. For example, using storage as SPARSE can reduce the memory consumption, whereas DENSE can reduce your model run performance at the expense of higher memory usage. Deciding the best choice for each of these depends on your workload and your latency budget, so we recommend a deeper look into all options in the FIL backend FAQ and the list of configurations available in FIL.

Steps to host a model on triton

Let’s look at our fraud detection use case as an example of what to consider when moving a workload to Triton.

Identify your workload

In this use case, we have a fraud detection model used during the checkout process of a retail customer. The inference pipeline is using an XGBoost algorithm with preprocessing logic that includes data preparation for preprocessing.

Identify current and target performance metrics and other goals that may apply

You may find that your end-to-end inference time is taking too long to be acceptable. Your goal could be to go from tens of milliseconds of latency to single-digit latency for the same volume of requests and respective throughput. You determine that the bulk of the time is consumed by data preprocessing and the XGBoost model. Other factors such as network and payload size play a minimal role in the overhead associated with the end-to-end inference time.

Work backward to determine if Triton can host your workload based on your requirements

To determine if Triton can meet your requirements, you want to pay attention to two main areas of concern. The first is to ensure that Triton can serve with an acceptable front end option such as HTTP or C API.

As mentioned previously, it’s also critical to determine if Triton supports a backend that can serve your artifacts. Triton supports a number of backends that are tailor-made to support various frameworks like PyTorch and TensorFlow. Check to ensure that your models are supported and that you have the proper model format that Triton expects. To do this, first check to see what model formats the Triton backend supports. In many cases, this doesn’t require any changes for the model. In other cases, your model may require transformation to a different format. Depending on the source and target format, various options exist, such as transforming a Python pickle file to use Treelite’s binary checkpoint format.

For this use case, we determine the FIL backend can support the XGBoost model with no changes needed and that we can use the Python backend for the preprocessing. With the ensemble feature of Triton, you can further optimize your workload by avoiding costly network calls between hosting instances.

Create a plan and estimate the effort required to use Triton for hosting

Let’s talk about the plan to move your models to Triton. Every Triton deployment requires the following:

  • Model artifacts required by Triton backends
  • Triton configuration files
  • A model repository folder with the proper structure

We show an example of how to create these deployment dependencies later in this post.

Run the plan and validate the results

After you create the required files and artifacts in the properly structured model repository, you need to tune your deployment and test it to validate that you have now hit your target metrics.

At this point, you can use SageMaker Inference Recommender to determine what endpoint instance type is best for you based upon your requirements. In addition, Triton provides tools to make build optimizations to get better performance.

Implementation

Now let’s look at the implementation details. For this we have prepared two notebooks that provide an example of what can be expected. The first notebook shows the training of the given XGBoost model as well as the preprocessing logic that is used for both training and inference time. The second notebook shows how we prepare the artifacts needed for deployment on Triton.

The first notebook shows an existing notebook your organization has that uses the RAPIDS suite of libraries and the RAPIDS Conda kernel. This instance runs on a G4DN instance type provided by AWS, which is GPU accelerated by using NVIDIA T4 processors.

Preprocessing tasks in this example benefit from GPU acceleration and heavily use the cuML and cuDF libraries. An example of this is in the following code, where we show categorical label encoding using cuML. We also generate a label_encoders.pkl file that we can use to serialize the encoders and use them for preprocessing during inference time.

The first notebook concludes by training our XGBoost model and saving the artifacts accordingly.

In this scenario, the training code already existed and no changes are needed for the model at training time. Additionally, although we used GPU acceleration for preprocessing during training, we plan to use CPUs for preprocessing at inference time. We explain more later in the post.

Let’s now move on to the second notebook and recall what we need for a successful Triton deployment.

First, we need the model artifacts required by backends. The files that we need to create for this ensemble include:

  • Preprocessing artifacts (model.py, label_encoders.pkl)
  • XGBoost model artifacts (xgboost.json)

The Python backend in Triton requires us to use a Conda environment as a dependency. In this case, we use the Python backend to preprocess the raw data before feeding it into the XGBoost model being run in the FIL backend. Even though we originally used RAPIDS cuDF and cuML libraries to do the data preprocessing (as referenced earlier using our GPU), here we use Pandas and Scikit-learn as preprocessing dependencies for inference time (using our CPU). We do this for three reasons:

  • To show how to create a Conda environment for your dependencies and how to package it in the format expected by Triton’s Python backend.
  • By showing the preprocessing model running in the Python backend on the CPU while the XGBoost model runs on the GPU in the FIL backend, we illustrate how each model in Triton’s ensemble pipeline can run on a different framework backend, and run on different hardware with different configurations.
  • It highlights how the RAPIDS libraries (cuDF, cuML) are compatible with their CPU counterparts (Pandas, Scikit-learn). This way, we can show how LabelEncoders created in cuML can be used in Scikit-learn and vice-versa. Note that if you expect to preprocess large amounts of tabular data during inference time, you can still use RAPIDS to GPU-accelerate it.

Recall that we created the label_encoders.pkl file in the first notebook. There’s nothing more to do for category encoding other than include it in our model.py file for preprocessing.

To create the model.py file required by the Triton Python backend, we adhere to the formatting required by the backend and include our Python logic to process the incoming tensor and use the label encoder referenced earlier. You can review the file used for preprocessing.

For the XGBoost model, nothing more needs to be done. We trained the model in the first notebook and Triton’s FIL backend requires no additional effort for XGBoost models.

Next, we need the Triton configuration files. Each model in the Triton ensemble requires a config.pbtxt file. In addition, we also create a config.pbtxt file for the ensemble as a whole. These files allow Triton to know metadata about the ensemble with information such as the inputs and outputs we expect as well as help defining the DAG associated with the ensemble.

Lastly, to deploy a model on Triton, we need our model repository folder to have the proper folder structure. Triton has specific requirements for model repository layout. Within the top-level model repository directory, each model has its own sub-directory containing the information for the corresponding model. Each model directory in Triton must have at least one numeric sub-directory representing a version of the model. For our use case, the resulting structure should look like the following.

After we have these three prerequisites, we create a compressed file as packaging for deployment and upload it to Amazon Simple Storage Service (Amazon S3).

We can now create a SageMaker model from the model repository we uploaded to Amazon S3 in the previous step.

In this step, we also provide the additional environment variable SAGEMAKER_TRITON_DEFAULT_MODEL_NAME, which specifies the name of the model to be loaded by Triton. The value of this key should match the folder name in the model package uploaded to Amazon S3. This variable is optional in the case of a single model. In the case of ensemble models, this key has to be specified for Triton to start up in SageMaker.

Additionally, you can set SAGEMAKER_TRITON_BUFFER_MANAGER_THREAD_COUNT and SAGEMAKER_TRITON_THREAD_COUNT for optimizing the thread counts. Both configuration values help tune the number of threads that are running on your CPUs, so you can possibly gain better utilization by increasing these values for CPUs with a greater number of cores. In the majority of cases, the default values often work well, but it may be worth experimenting see if further efficiency can be gained for your workloads.

With the preceding model, we create an endpoint configuration where we can specify the type and number of instances we want in the endpoint.

Lastly, we use the preceding endpoint configuration to create a new SageMaker endpoint and wait for the deployment to finish. The status changes to InService after the deployment is successful.

That’s it! Your endpoint is now ready for testing and validation. At this point, you may want to use various tools to help optimize your instance types and configuration to get the best possible performance. The following figure provides an example of the gains that can be achieved by using the FIL backend for an XGBoost model on Triton.

Summary

In this post, we walked you through deploying an XGBoost ensemble workload to SageMaker with Triton Inference Server. Moving workloads to Triton on SageMaker can be a highly beneficial return on investment. As with any adoption of technology, a vetting process and plan are key, and we detailed a five-step process to guide you through what to consider when moving your workloads. In addition, we dove deep into the steps needed to deploy an ensemble that uses using Python preprocessing and an XGBoost model on Triton on SageMaker.

SageMaker provides the tools to remove the undifferentiated heavy lifting from each stage of the ML lifecycle, thereby facilitating the rapid experimentation and exploration needed to fully optimize your model deployments. SageMaker hosting support for Triton Inference Server enables low-latency, high transactions per second (TPS) workloads.

We highly recommend evaluating Triton Inference Server on SageMaker hosting for your inference needs; it can be well worth the effort to move your existing models to take advantage of this technology.

You can find the notebooks used for this example on GitHub.


About the author

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences,  and staying up to date with the latest technology trends.

Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Kshitiz Gupta is a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and assisting them with accelerating their machine learning and deep learning applications. Outside of work, he enjoys running, hiking and wildlife watching.

Bruno Aguiar de Melo is a Software Development Engineer at Amazon.com, where he helps science teams to build, deploy and release ML workloads. He is interested in instrumentation and controllable aspects within the ML modelling/design phase that must be considered and measured with the insight that model execution performance is just as important as model quality performance, particularly in latency constrained use cases. In his spare time, he enjoys wine, board games and cooking.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He connects Amazon and AWS product leaders, developers, and scientists with NVIDIA technologists and product leaders to accelerate Amazon ML/DL workloads, EC2 products, and AWS AI services. In addition, Eliuth is a passionate mountain biker, skier, and poker player.

Read More

Build a multi-lingual document translation workflow with domain-specific and language-specific customization

In the digital world, providing information in a local language isn’t novel, but it can be a tedious and expensive task. Advancements in machine learning (ML) and natural language processing (NLP) have made this task much easier and less expensive.

We have seen increased adoption of ML for multi-lingual data and document processing workloads. Enterprise and government customers are migrating their manual translation workloads to take advantage of automated ML translation services. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation between several thousand language pairings that can be used for synchronous (real-time) or asynchronous translation tasks. For a complete list of available translation pairs, refer to Supported languages and language codes.

Customers migrating and modernizing their translation workloads need the ability to customize translations for their business domain. A translation workload may also need the ability to adapt to regional language dialects or usage. For example, the Spanish translation of “elderly” is anciano(a) but in Puerto Rico the word envejeciente is preferred.

In this post, we demonstrate how to incorporate Amazon Translate’s Active Custom Translation (ACT) feature. We propose a solution to create a multi-lingual document translation workflow with domain- and language-specific customizations that you can review and augment as needed to continuously improve results and delight end-users.

Solution overview

ACT produces custom-translated output without the need to build and maintain a custom translation model. Using ACT, Amazon Translate will use your preferred translation examples as parallel data to customize your translation result, eliminating the time and cost of required to build and train a new machine learning model.

The solution covered in this post explains how to create a human-in-the-loop workflow using Amazon Augmented AI (Amazon A2I) to continuously improve the customized translation. Amazon A2I provides a simple way to integrate human oversight into your ML workflows, with no ML experience required. Amazon A2I makes it straightforward to integrate human judgement and AI into any ML application, regardless of whether it’s run on AWS or on another platform.

For more information refer to Designing human review workflows with Amazon Translate and Amazon Augmented AI post.

The following diagram displays the command flow and data flow of the solution. The command flow shows the logical sequence of events in the workflow. A data flow indicates how data is being created or used by various components in the solution.

The following sequence diagram shows two separate processes in the solution: the translation workflow (A) and the process to update parallel data (B).

The translation workflow is initiated by an Amazon CloudWatch scheduled event which starts the Translation Job Invoker AWS Lambda function. This function creates an asynchronous translation job in Amazon Translate, passing along the document to translate and the location of the parallel data to customize the translation. The translation job reads the parallel data, performs the translation, and writes the translated result back to an Amazon S3 bucket. As of this writing, only asynchronous translation jobs can use parallel data.

When the translation job is complete, an event is generated that triggers the Translation Job Completion Handler Lambda function. This function creates a human workflow loop—the main component of the Amazon A2I portion of the workflow.

Human reviewers assess the translation and accept or modify the translation. Any corrections are used to update the translated document and also added to a customization dictionary. When the review is finalized, another event is generated to trigger the Workflow Completion Handler function. This function writes the latest translated document back to Amazon S3. The customization data is used to update an Amazon DynamoDB table with the source and translated text pairs.

To close the loop, we must incorporate this customization data stored in DynamoDB back into the parallel data stored in Amazon S3. To accomplish this, we use a scheduled CloudWatch event to trigger the Parallel Data Refresher function, which reads the data from the DynamoDB table, reformats it as parallel data, and updates the S3 bucket, storing the parallel data.

Deploy the solution with AWS CloudFormation

Launch the provided AWS CloudFormation template to deploy the solution in your account. This stack only works in the us-east-1 Region. If you want to deploy this solution in other Regions, refer to the following GitHub repo.

  1. Choose Launch Stack:
  2. Follow the instructions to populate the necessary parameters. If you’re running this stack for the first time, SNS Email is the only required parameter.
  3. On the Review page, in the Capabilities section, select the check box and choose Create stack.

The stack creates the following key components:

  • Customization data – A DynamoDB table (translate_parallel_data) to maintain the customization data. You migrate the existing customization data to this table. This table is used to continuously add and update customizations.
  • Parallel Data Refresher – The Lambda function to convert the customization data in the DynamoDB table to a parallel data format—CSV, TSV, or TMX—and store it in Amazon S3. It creates and updates parallel data with the new parallel data file in Amazon S3.
  • Translation Job Invoker – The Lambda function to start the Amazon Translate batch job with parallel data.
  • Translation Job Completion Handler – This Lambda function is triggered when the Amazon Translate batch job is complete. The function creates one human loop per document (we’ll refine this in the future to create a human loop only for a select percentage of documents processed). It uses the original and translated documents to create the human loop.
  • Amazon A2I customized template – This template is used to render the translation pair for human review. The template has the Add option for every translation segment. Users can select this option to add the corrections to the customization data. The new customization data is used in the next batch translation job.
  • Workflow Completion Handler – This Lambda function is triggered when the human workflow is complete. The function updates the translated document with corrections and checks for parallel data updates. New parallel data is added to the DynamoDB table.
  • Amazon A2I private team – An Amazon A2I private team is created with a human worker using the email provided. Initial credentials are emailed upon successful creation of the private team. You use this email and credential to log in to the Amazon A2I worker portal.

Test the solution

The sample_text.txt file would have been created under the input prefix of the S3 bucket created by the stack. We use this file for our testing. It contains the following content:

Life insurance companies have the freedom to charge different premiums based on risk
factors that predict mortality. Purchasing a life insurance policy often entails a health 
status check or medical exam, and asking for vaccination status is not banned.

Health insurers are a different story. A slew of state and federal regulations in the 
last three decades have heavily restricted their ability to use health factors in issuing 
or pricing polices. The use of health status in any group health insurance policy is 
prohibited by law. The Affordable Care Act, passed in 2014, prevents insurers from pricing 
plans according to health – with one exception: smoking status.

To test the solution, complete the following steps:

  1. Invoke the Translation Job Invoker function manually, or wait for it to be triggered by CloudWatch based on the cron schedule you specified.
    This function triggers the Amazon Translate batch job. You can observe the progress of the job on the Amazon Translate console.
    This batch job takes approximately 30 minutes to complete. When it’s complete, the TextTranslationJob state change event triggers the Translation Job Completion Handler function. This function creates one human loop per translated document.
  2. Navigate to the Amazon A2I workforces page.
  3. Choose the Private tab.
  4. Log in to the Amazon A2I worker portal by choosing the link for Labelling portal sign-in URL.
  5. Select the task Human review task in the jobs list.
  6. Choose Start working.

    You can see the following page displayed.
  7. Follow the instructions to make domain- and language-specific corrections.
    In the preceding screenshot, the phrase “The use of health status in any group health insurance policy is prohibited by law” has been translated to “La ley prohíbe el uso del estado de salud en cualquier póliza de seguro médico de grupo.” Although the translation is accurate, the phrases have been rearranged.
  8. Let’s modify this to “El uso del estado de salud en cualquier póliza de seguro de salud grupal está prohibido por ley” to make this a more direct translation reflecting the original phraseology.
  9. Select Add to add this to the dictionary.
  10. When you’re done, choose Submit.

This triggers the Workflow Completion Handler function, and the customization data is updated in the DynamoDB table. The function also stores the corrected translation under the post-edits prefix.

You can observe the customizations being added to translate_parallel_data table on the DynamoDB console.

Command flow

The Parallel Data Refresher function is triggered every hour by a CloudWatch scheduled event. This function checks for new updates in the translate_parallel_data table, creates a new parallel data TMX file in Amazon S3 under the parallel_data prefix, and updates the Amazon Translate parallel data component. You can trigger this function manually if you don’t want to wait for the scheduled event trigger.

You can observe the parallel data being updated on the Amazon Translate console.

When it’s complete, the job status should be Active and the value for Updated records should reflect the number of customizations you added (in this case 1).

Now we can run the translation job again with the updated data. Trigger the Translation Job Invoker function again to observe the customization being added to the translation in the second iteration. Amazon Translate now uses the parallel data provided to customize the translation.

You can observe the change in the translation output in the labeling portal. Instead of the default translation, we see the customized translation being applied.

This workflow helps create a virtuous cycle to continuously improve translation output using Amazon A2I and Amazon Translate customization features.

Cost

With Amazon Translate and Amazon A2I, you pay as you go based on the number of text characters that you processed and for each human-reviewed object. We use DynamoDB on-demand mode for this example. DynamoDB charges you for the reads and writes performed on your tables. Refer to the pricing pages for Amazon Translate, Amazon A2I, and Amazon DynamoDB for actual costs.

Clean up

When you’re finished experimenting with this solution, clean up your resources by using the AWS CloudFormation console to delete all the resources deployed in this example. This helps you avoid continuing costs in your account.

Conclusion

You can use the solution presented in this post to build a multi-lingual translation workflow that uses and augments domain-specific customization incrementally to continuously improve translation results. We provided a simple mechanism to integrate your existing customization assets with managed AI services like Amazon Translate and Amazon A2I to build a robust translation service for your application. Amazon Translate can help you scale this solution to support over 5,550 translation pairs out of the box. Amazon A2I can help you easily integrate with your in-house linguistic expert or take advantage of an external workforce to scale the solution.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to AWS Translate FAQs. Please share your thoughts with us in the comments section, or in the issues section of the project’s Github repository.


About the Authors

Sathya Balakrishnan is a Sr Customer Delivery Architect in the Professional Services team at AWS, specializing in Data/ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

Paul W. Joireman is a Sr Customer Delivery Architect in Professional Services at AWS, specializing in Application Migration and working with US federal financial clients. Paul enjoys creating technology solutions, traveling with family and hiking in the Shenandoah National Park, as long as the hike finishes at a local craft brewery.

Read More

AWS Deep Learning Challenge sees innovative and impactful use of Amazon EC2 DL1 instances

In the AWS Deep Learning Challenge held from January 5, 2022, to March 1, 2022, participants from academia, startups, and enterprise organizations joined to test their skills and train a deep learning model of their choice using Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances and Habana’s SynapseAI SDK. The EC2 DL1 instances powered by Gaudi accelerators from Habana Labs, an Intel company, are designed specifically for training deep learning models. Participants were able to realize the significant price/performance benefits that DL1 offers over GPU-based instances.

We are excited to announce the winners and showcase some of the machine learning (ML) models that were trained in this hackathon. You will learn about some of the deep learning use cases that are supported by EC2 DL1 instances, including computer vision, natural language processing, and acoustic modeling.

Winning models

Our first-place winner is a project submitted by Gustavo Zomer. It’s an implementation of multi-lingual CLIP (Contrastive Language-Image Pre-Training). CLIP was introduced by OpenAI in 2021 as a way to train a more generalizable image classifier across larger datasets through self-supervised learning. It’s trained on a large set of images with a wide variety of natural language supervision that’s abundantly available on the internet, but is limited to the English language. This project replaces the text encoder in CLIP with a multi-lingual text encoder called XLM-RoBERTa to broaden the model’s applicability to multiple languages. This modified implementation of CLIP is able to pair images with captions across multiple languages. The model was trained on 16 accelerators across two DL1 instances, showing how ML training can be scaled to use multiple Gaudi accelerators across multiple nodes to increase training throughput and reduce the time to train. The judges were impressed by the impactful use of deep learning to break down language barriers, and the technical implementation, which used distributed training.

In second place, we have a project submitted by Remco van Akker. It uses a GAN (Generative Adversarial Network) to generate synthetic retinal image data for medical applications. Synthetic data is used in model training in medical applications to overcome the scarcity of annotated medical data, which is labor-intensive and costly to produce. Synthetic data can be used as part of data augmentation to remove biases and make vision models in medical applications more generalizable. This project stood out because it implemented a generative model on DL1 to solve a real-world problem impacting the application of AI and ML in healthcare.

Rounding out our top three was a project submitted by Zohar Jackson that implemented a vision transformer model for semantic segmentation. This project uses the Ray Tune library to fine-tune hyperparameters and uses Horovod to parallelize training on 16 Gaudi accelerators across two DL1 instances.

In addition to the top three winners, participants won several other prizes, including best technical implementation, highest potential impact, and most creative project. We offer our congratulations to all the winners of this hackathon for building such a diverse set of impactful projects on Gaudi accelerator-based EC2 DL1 instances. We can’t wait to see what our participants will continue to build on DL1 instances going forward.

Get started with DL1 instances

As demonstrated by the various projects in this hackathon, you can use EC2 DL1 instances to train deep learning models for use cases such as natural language processing, object detection, and image recognition. With DL1 instances, you also get up to 40% better price/performance for training deep learning models compared to current generation GPU-based EC2 instances. Visit Amazon EC2 DL1 Instances to learn more about how DL1 instances can accelerate your training workloads.


About the authors

Dvij Bajpai is a Senior Product Manager at AWS. He works on developing EC2 instances for workloads in machine learning and high-performance computing.

Amr Ragab is a Principal Solutions Architect at AWS. He provides technical guidance to help customers run complex computational workloads at scale.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.

Read More