Process mortgage documents with intelligent document processing using Amazon Textract and Amazon Comprehend

Organizations in the lending and mortgage industry process thousands of documents on a daily basis. From a new mortgage application to a mortgage refinance, these business processes involve hundreds of documents per application. There is limited automation available today to process and extract information from all these documents, especially due to varying formats and layouts. Due to the high volume of applications, capturing strategic insights and extracting key information from the contents is a time-consuming, highly manual, error-prone, and expensive process. Legacy optical character recognition (OCR) tools are cost-prohibitive, error-prone, require extensive configuration, and are difficult to scale. Intelligent document processing (IDP) with AWS artificial intelligence (AI) services helps automate and accelerate mortgage application processing, with the goal of faster and higher-quality decisions at lower overall cost.

In this post, we demonstrate how you can use the machine learning (ML) capabilities of Amazon Textract and Amazon Comprehend to process documents in a new mortgage application, without the need for ML skills. We explore the various phases of IDP as shown in the following figure, and how they connect to the steps involved in a mortgage application process, such as application submission, underwriting, verification, and closing.

Image shows the phases of intelligent document processing (IDP).

Although each mortgage application may be unique, we took into account some of the most common documents that are included in a mortgage application, such as the Unified Residential Loan Application (URLA-1003) form, 1099 forms, and mortgage note.

Solution overview

Amazon Textract is an ML service that automatically extracts text, handwriting, and data from scanned documents using pre-trained ML models. Amazon Comprehend is a natural-language processing (NLP) service that uses ML to uncover valuable insights and connections in text and can perform document classification, named entity recognition (NER), topic modeling, and more.

The following figure shows the phases of IDP as it relates to the phases of a mortgage application process.

Image shows a high-level solution architecture for the phases of intelligent document processing (IDP) as it relates to the stages of a mortgage application.

At the start of the process, documents are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. This initiates a document classification process to categorize the documents into known categories. After the documents are categorized, the next step is to extract key information from them. We then perform enrichment for select documents, which can be things like personally identifiable information (PII) redaction, document tagging, metadata updates, and more. The next step involves validating the data extracted in previous phases to ensure completeness of a mortgage application. Validation can be done via business validation rules and cross document validation rules. The confidence scores of the extracted information can also be compared to a set threshold, and automatically routed to a human reviewer through Amazon Augmented AI (Amazon A2I) if the threshold isn’t met. In the final phase of the process, the extracted and validated data is sent to downstream systems for further storage, processing, or data analytics.

In the following sections, we discuss the phases of IDP as they relate to the phases of a mortgage application in detail. We walk through the types of documents involved, how we store, classify, and extract information from them, and how we enrich the documents using machine learning.

Document storage

Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. We use Amazon S3 to securely store the mortgage documents during and after the mortgage application process. A mortgage application packet may contain several types of forms and documents, such as URLA-1003, 1099-INT/DIV/R/MISC, W2, paystubs, bank statements, credit card statements, and more. These documents are submitted by the applicant in the mortgage application phase. Without manually looking through them, it might not be immediately clear which documents are included in the packet. This manual process can be time-consuming and expensive. In the next phase, we automate this process using Amazon Comprehend to classify the documents into their respective categories with high accuracy.

Document classification

Document classification is a method for categorizing and labeling a large number of unidentified documents. We perform this document classification using an Amazon Comprehend custom classifier. A custom classifier is an ML model that can be trained with a set of labeled documents to recognize the classes that are of interest to you. After the model is trained and deployed behind a hosted endpoint, we can use the classifier to determine the category (or class) a particular document belongs to. In this case, we train a custom classifier in multi-class mode, which can be done either with a CSV file or an augmented manifest file. For the purposes of this demonstration, we use a CSV file to train the classifier. Refer to our GitHub repository for the full code sample. The following is a high-level overview of the steps involved:

  1. Extract UTF-8 encoded plain text from image or PDF files using the Amazon Textract DetectDocumentText API.
  2. Prepare training data to train a custom classifier in CSV format.
  3. Train a custom classifier using the CSV file.
  4. Deploy the trained model behind an endpoint for real-time document classification, or run asynchronous (batch) classification jobs; the custom classifier supports both modes.

The following diagram illustrates this process.

Image shows Amazon Comprehend custom classifier training process and document classification using the trained and deployed classifier model (real time or batch).

You can automate document classification using the deployed endpoint to identify and categorize documents. This automation is useful to verify whether all the required documents are present in a mortgage packet. A missing document can be quickly identified, without manual intervention, and notified to the applicant much earlier in the process.
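
For example, a minimal sketch of classifying a document’s extracted plain text against the deployed endpoint might look like the following; the endpoint ARN is a placeholder, and classify_document is the standard Amazon Comprehend API for real-time custom classification:

import boto3

comprehend = boto3.client('comprehend')

# Plain text previously extracted from the document with Amazon Textract
document_text = "..."

# The endpoint ARN is a placeholder for your deployed custom classifier endpoint
response = comprehend.classify_document(
    Text=document_text,
    EndpointArn='arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/<your-endpoint>'
)

# Pick the class with the highest confidence score
top_class = max(response['Classes'], key=lambda c: c['Score'])
print(top_class['Name'], top_class['Score'])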

Document extraction

In this phase, we extract data from the document using Amazon Textract and Amazon Comprehend. For structured and semi-structured documents containing forms and tables, we use the Amazon Textract AnalyzeDocument API. For specialized documents such as ID documents, Amazon Textract provides the AnalyzeID API. Some documents may also contain dense text, and you may need to extract business-specific key terms from them, also known as entities. We use the custom entity recognition capability of Amazon Comprehend to train a custom entity recognizer, which can identify such entities from the dense text.

In the following sections, we walk through the sample documents that are present in a mortgage application packet, and discuss the methods used to extract information from them. For each of these examples, a code snippet and a short sample output are included.

Extract data from Unified Residential Loan Application URLA-1003

A Unified Residential Loan Application (URLA-1003) is an industry standard mortgage loan application form. It’s a fairly complex document that contains information about the mortgage applicant, the type of property being purchased, the amount being financed, and other details about the nature of the property purchase. The following is a sample URLA-1003, and our intention is to extract information from this structured document. Because this is a form, we use the AnalyzeDocument API with a feature type of FORMS.

Image shows a sample of a Unified Residential Loan Application URLA-1003 form

The FORMS feature type extracts form information from the document, which is then returned in key-value pair format. The following code snippet uses the amazon-textract-textractor Python library to extract form information with just a few lines of code. The convenience method call_textract() calls the AnalyzeDocument API internally, and the parameters passed to the method abstract some of the configurations that the API needs to run the extraction task. Document is a convenience class used to parse the JSON response from the API. It provides a high-level abstraction and makes the API output iterable and easy to get information out of. For more information, refer to Textract Response Parser and Textractor.

import json

from textractcaller.t_call import call_textract, Textract_Features
from trp import Document

response_urla_1003 = call_textract(input_document='s3://<your-bucket>/URLA-1003.pdf', 
                                   features=[Textract_Features.FORMS])
doc_urla_1003 = Document(response_urla_1003)
for page in doc_urla_1003.pages:
    forms=[]
    for field in page.form.fields:
        obj={}
        obj[f'{field.key}']=f'{field.value}'
        forms.append(obj)
print(json.dumps(forms, indent=4))

Note that the output contains values for check boxes or radio buttons that exist in the form. For example, in the sample URLA-1003 document, the Purchase option was selected. The corresponding output for the radio button is extracted as “Purchase” (key) and “SELECTED” (value), indicating that the radio button was selected.

[
    { "No. of Units": "1" },
    { "Amount": "$ 450,000.00" },
    { "Year Built": "2010" },
    { "Purchase": "SELECTED" },
    { "Title will be held in what Name(s)": "Alejandro Rosalez" },
    { "Fixed Rate": "SELECTED" },
    ...
]

Extract data from 1099 forms

A mortgage application packet may also contain a number of IRS documents, such as 1099-DIV, 1099-INT, 1099-MISC, and 1099-R. These documents show the applicant’s earnings via interest, dividends, and other miscellaneous income components that are useful during underwriting to make decisions. The following image shows a collection of these documents, which are similar in structure. However, in some instances, the documents contain form information (marked using the red and green bounding boxes) as well as tabular information (marked by the yellow bounding boxes).

Image shows samples of 1099 INT, DIV, MISC, and R forms.

To extract form information, we use similar code as explained earlier with the AnalyzeDocument API. We pass an additional feature of TABLES to the API to indicate that we need both form and table data extracted from the document. The following code snippet uses the AnalyzeDocument API with the FORMS and TABLES features on the 1099-INT document:

from textractcaller.t_call import call_textract, Textract_Features
from trp import Document

response_1099_int = call_textract(input_document='s3://<your-bucket>/1099-INT-2018.pdf',
                                  features=[Textract_Features.TABLES, 
                                            Textract_Features.FORMS])
doc_1099_int = Document(response_1099_int)
num_tables = 0
for page in doc_1099_int.pages:
    for table in page.tables:
        num_tables += 1
        print(f"Table {num_tables}")
        print("-------------------")
        for r, row in enumerate(table.rows):
            for c, cell in enumerate(row.cells):
                print(f"Cell[{r}][{c}] = {cell.text}")
        print()

Because the document contains a single table, the output of the code is as follows:

Table 1
-------------------
Cell[0][0] = 15 State 
Cell[0][1] = 16 State identification no. 
Cell[0][2] = 17 State tax withheld 
Cell[1][0] = 
Cell[1][1] = 34564 
Cell[1][2] = $ 2000 
Cell[2][0] = 
Cell[2][1] = 23543 
Cell[2][2] = $ 1000

The table information contains the cell position (row 0, column 0, and so on) and the corresponding text within each cell. We use a convenience method that can transform this table data into an easy-to-read grid view:

from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string, Pretty_Print_Table_Format
print(get_string(textract_json=response_1099_int, 
                 table_format=Pretty_Print_Table_Format.grid, 
                 output_type=[Textract_Pretty_Print.TABLES]))

We get the following output:

+----------+-----------------------------+-----------------------+
| 15 State | 16 State identification no. | 17 State tax withheld |
+----------+-----------------------------+-----------------------+
|          | 34564                       | $ 2000                |
+----------+-----------------------------+-----------------------+
|          | 23543                       | $ 1000                |
+----------+-----------------------------+-----------------------+

To get the output in an easy-to-consume CSV format, the format type Pretty_Print_Table_Format.csv can be passed into the table_format parameter. Other formats such as TSV (tab-separated values), HTML, and LaTeX are also supported. For more information, refer to Textract-PrettyPrinter.

Extract data from a mortgage note

A mortgage application packet may contain unstructured documents with dense text. Some examples of dense text documents are contracts and agreements. A mortgage note is an agreement between a mortgage applicant and the lender or mortgage company, and contains information in dense text paragraphs. In such cases, the lack of structure makes it difficult to find key business information that is important in the mortgage application process. There are two approaches to solving this problem:

  • Use the Amazon Textract Queries feature of the AnalyzeDocument API to ask natural-language questions of the document
  • Train an Amazon Comprehend custom entity recognizer to detect business-specific entities in the dense text

In the following sample mortgage note, we’re specifically interested in finding out the monthly payment amount and principal amount.

Image shows a sample of a mortgage note document.

For the first approach, we use the Query and QueriesConfig convenience methods to configure a set of questions that is passed to the Amazon Textract AnalyzeDocument API call. If the document is multi-page (PDF or TIFF), we can also specify the page numbers where Amazon Textract should look for answers to each question. The following code snippet demonstrates how to create the query configuration, make the API call, and parse the response to get the answers:

from textractcaller import QueriesConfig, Query
from textractcaller.t_call import call_textract, Textract_Features
import trp.trp2 as t2

# Set up the queries
query1 = Query(text="What is the principal amount borrower has to pay?", alias="PRINCIPAL_AMOUNT", pages=["1"])
query2 = Query(text="What is the monthly payment amount?", alias="MONTHLY_AMOUNT", pages=["1"])

# Set up the query config with the above queries
queries_config = QueriesConfig(queries=[query1, query2])

# Call AnalyzeDocument with the queries_config
response_mortgage_note = call_textract(input_document='s3://<your-bucket>/Mortgage-Note.pdf',
                                       features=[Textract_Features.QUERIES],
                                       queries_config=queries_config)
doc_mortgage_note: t2.TDocumentSchema = t2.TDocumentSchema().load(response_mortgage_note) 

entities = {}
for page in doc_mortgage_note.pages:
    query_answers = doc_mortgage_note.get_query_answers(page=page)
    if query_answers:
        for answer in query_answers:
            entities[answer[1]] = answer[2]
print(entities)

We get the following output:

{
    'PRINCIPAL_AMOUNT': '$ 555,000.00',
    'MONTHLY_AMOUNT': '$2,721.23',
}

For the second approach, we use the Amazon Comprehend DetectEntities API with the mortgage note, which returns the entities it detects within the text from a predefined set of entities. These are entities that the Amazon Comprehend entity recognizer is pre-trained with. However, because our requirement is to detect specific entities, an Amazon Comprehend custom entity recognizer is trained with a set of sample mortgage note documents, and a list of entities. We define the entity names as PRINCIPAL_AMOUNT and MONTHLY_AMOUNT. Training data is prepared following the Amazon Comprehend training data preparation guidelines for custom entity recognition. The entity recognizer can be trained with document annotations or with entity lists. For the purposes of this example, we use entity lists to train the model. After we train the model, we can deploy it with a real-time endpoint or in batch mode to detect the two entities from the document contents. The following are the steps involved to train a custom entity recognizer and deploy it. For a full code walkthrough, refer to our GitHub repository.

  1. Prepare the training data (the entity list and the documents in UTF-8 encoded plain text format).
  2. Start the entity recognizer training with the CreateEntityRecognizer API, using the prepared training data (see the sketch following this list).
  3. Deploy the trained model with a real-time endpoint using the CreateEndpoint API.
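
The following is a minimal sketch of starting the recognizer training with boto3; the bucket, file keys, and IAM role ARN are placeholders you would replace with your own:

import boto3

comprehend = boto3.client('comprehend')

# The role ARN and S3 locations below are placeholders
response = comprehend.create_entity_recognizer(
    RecognizerName='mortgage-note-recognizer',
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::111122223333:role/<comprehend-data-access-role>',
    InputDataConfig={
        'EntityTypes': [
            {'Type': 'PRINCIPAL_AMOUNT'},
            {'Type': 'MONTHLY_AMOUNT'},
        ],
        'Documents': {'S3Uri': 's3://<your-bucket>/entity-training/documents.txt'},
        'EntityList': {'S3Uri': 's3://<your-bucket>/entity-training/entity_list.csv'},
    },
)
print(response['EntityRecognizerArn'])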

Extract data from a US passport

The Amazon Textract analyze identity documents capability can detect and extract information from US-based ID documents such as a driver’s license and passport. The AnalyzeID API is capable of detecting and interpreting implied fields in ID documents, which makes it easy to extract specific information from the document. Identity documents are almost always part of a mortgage application packet, because they’re used to verify the identity of the borrower during the underwriting process and to validate the correctness of the borrower’s biographical data.

Image shows a sample of a US passport

We use a convenience method named call_textract_analyzeid, which calls the AnalyzeID API internally. We then iterate over the response to obtain the detected key-value pairs from the ID document. See the following code:

from textractcaller import call_textract_analyzeid
import trp.trp2_analyzeid as t2id

response_passport = call_textract_analyzeid(document_pages=['s3://<your-bucket>/Passport.pdf'])
doc_passport: t2id.TAnalyzeIdDocument = t2id.TAnalyzeIdDocumentSchema().load(response_passport)

for id_docs in response_passport['IdentityDocuments']:
    id_doc_kvs={}
    for field in id_docs['IdentityDocumentFields']:
        if field['ValueDetection']['Text']:
            id_doc_kvs[field['Type']['Text']] = field['ValueDetection']['Text']
print(id_doc_kvs)

AnalyzeID returns information in a structure called IdentityDocumentFields, which contains the normalized keys and their corresponding values. For example, in the following output, FIRST_NAME is a normalized key and the value is ALEJANDRO. In the example passport image, the field for the first name is labeled as “Given Names / Prénoms / Nombre”; however, AnalyzeID was able to normalize that into the key name FIRST_NAME. For a list of supported normalized fields, refer to Identity Documentation Response Objects.

{
    'FIRST_NAME': 'ALEJANDRO',
    'LAST_NAME': 'ROSALEZ',
    'DOCUMENT_NUMBER': '918268822',
    'EXPIRATION_DATE': '31 JAN 2029',
    'DATE_OF_BIRTH': '15 APR 1990',
    'DATE_OF_ISSUE': '29 JAN 2009',
    'ID_TYPE': 'PASSPORT',
    'ENDORSEMENTS': 'SEE PAGE 27',
    'PLACE_OF_BIRTH': 'TEXAS U.S.A.'
}

A mortgage packet may contain several other documents, such as a paystub, W2 form, bank statement, credit card statement, and employment verification letter. We have samples for each of these documents along with the code required to extract data from them. For the complete code base, check out the notebooks in our GitHub repository.

Document enrichment

One of the most common forms of document enrichment is redaction of sensitive or confidential information in documents, which may be mandated by privacy laws or regulations. For example, a mortgage applicant’s paystub may contain sensitive PII data, such as name, address, and SSN, that may need redaction for extended storage.

In a sample paystub document, we perform redaction of PII data such as SSN, name, bank account number, and dates. To identify PII data in a document, we use the Amazon Comprehend PII detection capability via the DetectPIIEntities API. This API inspects the content of the document to identify the presence of PII information. Because this API requires input in UTF-8 encoded plain text format, we first extract the text from the document using the Amazon Textract DetectDocumentText API, which returns the text from the document along with geometry information such as bounding box dimensions and coordinates. A combination of both outputs is then used to draw redactions on the document as part of the enrichment process.
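
A minimal sketch of this two-step flow with boto3 follows; the S3 object is a placeholder, and drawing the actual redaction boxes on the image is omitted for brevity:

import boto3

textract = boto3.client('textract')
comprehend = boto3.client('comprehend')

# Step 1: Extract the plain text (and geometry) from the paystub; the S3 object is a placeholder
textract_response = textract.detect_document_text(
    Document={'S3Object': {'Bucket': '<your-bucket>', 'Name': 'Paystub.png'}}
)
lines = [block['Text'] for block in textract_response['Blocks'] if block['BlockType'] == 'LINE']
document_text = '\n'.join(lines)

# Step 2: Detect PII entities in the extracted text
pii_response = comprehend.detect_pii_entities(Text=document_text, LanguageCode='en')
for entity in pii_response['Entities']:
    # The offsets can be mapped back to the Textract geometry to draw redaction boxes
    print(entity['Type'], document_text[entity['BeginOffset']:entity['EndOffset']])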

Review, validate, and integrate data

Extracted data from the document extraction phase may need validation against specific business rules. Specific information may also be validated across several documents, also known as cross-doc validation. An example of cross-doc validation could be comparing the applicant’s name in the ID document to the name in the mortgage application document. You can also do other validations such as property value estimations and conditional underwriting decisions in this phase.

A third type of validation is related to the confidence score of the extracted data in the document extraction phase. Amazon Textract and Amazon Comprehend return a confidence score for forms, tables, text data, and entities detected. You can configure a confidence score threshold to ensure that only correct values are being sent downstream. This is achieved via Amazon A2I, which compares the confidence scores of detected data with a predefined confidence threshold. If the threshold isn’t met, the document and the extracted output are routed to a human for review through an intuitive UI. The reviewer takes corrective action on the data and saves it for further processing. For more information, refer to Core Concepts of Amazon A2I.
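
As an illustration, a low-confidence result could be routed to reviewers with the Amazon A2I runtime API; the flow definition ARN, threshold, and field values below are assumptions for the sketch:

import json
import boto3

a2i = boto3.client('sagemaker-a2i-runtime')

CONFIDENCE_THRESHOLD = 90.0  # assumed business threshold
extracted_field = {'key': 'Amount', 'value': '$ 450,000.00', 'confidence': 72.5}  # example extraction

if extracted_field['confidence'] < CONFIDENCE_THRESHOLD:
    # Route the low-confidence extraction to the human review workflow
    a2i.start_human_loop(
        HumanLoopName='mortgage-review-001',
        FlowDefinitionArn='arn:aws:sagemaker:us-east-1:111122223333:flow-definition/<your-flow-definition>',
        HumanLoopInput={'InputContent': json.dumps(extracted_field)},
    )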

Conclusion

In this post, we discussed the phases of intelligent document processing as it relates to phases of a mortgage application. We looked at a few common examples of documents that can be found in a mortgage application packet. We also discussed ways of extracting and processing structured, semi-structured, and unstructured content from these documents. IDP provides a way to automate end-to-end mortgage document processing that can be scaled to millions of documents, enhancing the quality of application decisions, reducing costs, and serving customers faster.

As a next step, you can try out the code samples and notebooks in our GitHub repository. To learn more about how IDP can help your document processing workloads, visit Automate data processing from documents.


About the authors

Anjan Biswas is a Senior AI Services Solutions Architect with a focus on AI/ML and data analytics. Anjan is part of the worldwide AI services team and works with customers to help them understand and develop solutions to business problems with AI and ML. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations, and is actively helping customers get started and scale on AWS AI services.

Dwiti Pathak is a Senior Technical Account Manager based out of San Diego. She is focused on helping the semiconductor industry engage with AWS. In her spare time, she likes reading about new technologies and playing board games.

Balaji Puli is a Solutions Architect based in the Bay Area, CA. He currently helps select healthcare and life sciences customers in the northwest US accelerate their AWS Cloud adoption. Balaji enjoys traveling and loves to explore different cuisines.


Achieve low-latency hosting for decision tree-based ML models on NVIDIA Triton Inference Server on Amazon SageMaker

Machine learning (ML) model deployments can have very demanding performance and latency requirements for businesses today. Use cases such as fraud detection and ad placement are examples where milliseconds matter and are critical to business success. Strict service level agreements (SLAs) need to be met, and a typical request may require multiple steps such as preprocessing, data transformation, model selection logic, model aggregation, and postprocessing. At scale, this often means maintaining a huge volume of traffic while maintaining low latency. Common design patterns include serial inference pipelines, ensembles (scatter-gather), and business logic workflows, which result in realizing the entire workflow of the request as a Directed Acyclic Graph (DAG). However, as workflows get more complex, this can lead to an increase in overall response times, which in turn can negatively impact the end-user experience and jeopardize business goals. NVIDIA Triton Inference Server can address these use cases by composing multiple models into a pipeline with input and output tensors connected between them, helping you handle these workloads.

As you evaluate your goals in relation to ML model inference, many options can be considered, but few are as capable and proven as Amazon SageMaker with Triton Inference Server. SageMaker with Triton Inference Server has been a popular choice for many customers because it’s purpose-built to maximize throughput and hardware utilization with ultra-low (single-digit millisecond) inference latency. It supports a wide range of ML frameworks (including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT) and infrastructure backends, including NVIDIA GPUs, CPUs, and AWS Inferentia. Additionally, Triton Inference Server is integrated with SageMaker, a fully managed end-to-end ML service, providing real-time inference options for model hosting.

In this post, we walk through deploying a fraud detection ensemble workload to SageMaker with Triton Inference Server.

Solution overview

It’s essential for any project to have a list of requirements and an effort estimation, in order to approximate the total cost of the project. It’s also important to estimate the return on investment (ROI) that supports the decision of an organization. There are several considerations to take into account when moving a workload to Triton.

Effort estimation is key in software development, and its measurement is often based on incomplete, uncertain, and noisy inputs. ML workloads are no different. Multiple factors will affect an architecture for ML inference, some of which include:

  • Client-side latency budget – This specifies the maximum acceptable client-side round-trip waiting time for an inference response, commonly expressed in percentiles. For workloads that require a latency budget near tens of milliseconds, network transfers could become expensive, so using models at the edge would be a better fit.
  • Data payload distribution size – Payload, often referred to as message body, is the request data transmitted from the client to the model, as well as the response data transmitted from the model to the client. The payload size often has a major impact on latency and should be taken into consideration.
  • Data format – This specifies how the payload is sent to the ML model. The format can be human-readable, such as JSON and CSV, but there are also binary formats, which are often compressed and smaller in size. This is a trade-off between compression overhead and transfer size, meaning that CPU cycles and latency are added to compress or decompress in order to save bytes transferred over the network. This post shows how to utilize both JSON and binary formats.
  • Software stack and components required – A stack is a collection of components that operate together to support an ML application, including operating system, runtimes, and software layers. Triton comes with built-in popular ML frameworks, called backends, such as ONNX, TensorFlow, FIL, OpenVINO, native Python, and others. You can also author a custom backend for your own homegrown components. This post goes over an XGBoost model and data preprocessing, which we migrate to the NVIDIA provided FIL and Python Triton backends, respectively.

All these factors should play a vital part in evaluating how your workloads perform, but in this use case we focus on the work needed to move your ML models to be hosted in SageMaker with Triton Inference Server. Specifically, we use an example of a fraud detection ensemble composed of an XGBoost model with preprocessing logic written in Python.

NVIDIA Triton Inference Server

Triton Inference Server has been designed from the ground up to enable teams to deploy, run, and scale trained AI models from any framework on GPU or CPU based infrastructure. In addition, it has been optimized to offer high-performance inference at scale with features like dynamic batching, concurrent runs, optimal model configuration, model ensemble, and support for streaming inputs.

The following diagram shows an example NVIDIA Triton ensemble pipeline.

Workloads should take into account the capabilities that Triton provides along with SageMaker hosting to maximize the benefits offered. For example, Triton supports both HTTP and gRPC protocols as well a C API, which allow for flexibility as well as payload optimization when needed. As previously mentioned, Triton supports several popular frameworks out of the box, including TensorFlow, PyTorch, ONNX, XGBoost, and NVIDIA TensorRT. These frameworks are supported through Triton backends, and in the rare event that a backend doesn’t support your use case, Triton allows you to implement your own and integrate it easily.

The following diagram shows an example of the NVIDIA Triton architecture.

NVIDIA Triton on SageMaker

SageMaker hosting services are the set of SageMaker features aimed at making model deployment and serving easier. It provides a variety of options to easily deploy, auto scale, monitor, and optimize ML models tailored for different use cases. This means that you can optimize your deployments for all types of usage patterns, from persistent and always available with serverless options, to transient, long-running, or batch inference needs.

Under the SageMaker hosting umbrella is also the set of SageMaker inference Deep Learning Containers (DLCs), which come prepackaged with the appropriate model server software for their corresponding supported ML framework. This enables you to achieve high inference performance with no model server setup, which is often the most complex technical aspect of model deployment and in general isn’t part of a data scientist’s skill set. Triton inference server is now available on SageMaker DLCs.

This breadth of options, modularity, and ease of use of different serving frameworks makes SageMaker and Triton a powerful match.

NVIDIA FIL backend support

With the 22.05 version release of Triton, NVIDIA now supports forest models trained by several popular ML frameworks, including XGBoost, LightGBM, Scikit-learn, and cuML. When using the FIL backend for Triton, you should ensure that the model artifacts that you provide are supported. For example, FIL supports model_type xgboost, xgboost_json, lightgbm, or treelite_checkpoint, indicating whether the provided model is in XGBoost binary format, XGBoost JSON format, LightGBM text format, or Treelite binary format, respectively.

This backend support is essential for us to use in our example because FIL supports XGBoost models. The only consideration to check is to ensure that the model that we deploy supports binary or JSON formats.

In addition to ensuring that you have the proper model format, other considerations should be taken into account. The FIL backend for Triton provides configurable options for developers to tune their workloads and optimize model run performance. The configuration dynamic_batching allows Triton to hold client-side requests and batch them on the server side, in order to efficiently use FIL’s parallel computation to inference the entire batch together. The option max_queue_delay_microseconds offers a fail-safe control of how long Triton waits to form a batch. FIL comes with a Shapley explainer, which can be activated by the configuration treeshap_output; however, you should keep in mind that Shapley outputs hurt performance due to their size. Another important option is storage_type, which trades off memory footprint against runtime. For example, using SPARSE storage can reduce memory consumption, whereas DENSE can reduce your model runtime at the expense of higher memory usage. Deciding the best choice for each of these depends on your workload and your latency budget, so we recommend a deeper look at all the options in the FIL backend FAQ and the list of configurations available in FIL.

Steps to host a model on Triton

Let’s look at our fraud detection use case as an example of what to consider when moving a workload to Triton.

Identify your workload

In this use case, we have a fraud detection model used during the checkout process of a retail customer. The inference pipeline uses an XGBoost algorithm with preprocessing logic for data preparation.

Identify current and target performance metrics and other goals that may apply

You may find that your end-to-end inference time is taking too long to be acceptable. Your goal could be to go from tens of milliseconds of latency to single-digit latency for the same volume of requests and respective throughput. You determine that the bulk of the time is consumed by data preprocessing and the XGBoost model. Other factors such as network and payload size play a minimal role in the overhead associated with the end-to-end inference time.

Work backward to determine if Triton can host your workload based on your requirements

To determine if Triton can meet your requirements, you want to pay attention to two main areas of concern. The first is to ensure that Triton can serve requests through a front-end option that works for your clients, such as HTTP, gRPC, or the C API.

As mentioned previously, it’s also critical to determine if Triton supports a backend that can serve your artifacts. Triton supports a number of backends that are tailor-made to support various frameworks like PyTorch and TensorFlow. Check to ensure that your models are supported and that you have the proper model format that Triton expects. To do this, first check to see what model formats the Triton backend supports. In many cases, this doesn’t require any changes for the model. In other cases, your model may require transformation to a different format. Depending on the source and target format, various options exist, such as transforming a Python pickle file to use Treelite’s binary checkpoint format.

For this use case, we determine the FIL backend can support the XGBoost model with no changes needed and that we can use the Python backend for the preprocessing. With the ensemble feature of Triton, you can further optimize your workload by avoiding costly network calls between hosting instances.

Create a plan and estimate the effort required to use Triton for hosting

Let’s talk about the plan to move your models to Triton. Every Triton deployment requires the following:

  • Model artifacts required by Triton backends
  • Triton configuration files
  • A model repository folder with the proper structure

We show an example of how to create these deployment dependencies later in this post.

Run the plan and validate the results

After you create the required files and artifacts in the properly structured model repository, you need to tune your deployment and test it to validate that you have now hit your target metrics.

At this point, you can use SageMaker Inference Recommender to determine what endpoint instance type is best for you based upon your requirements. In addition, Triton provides tools to make build optimizations to get better performance.

Implementation

Now let’s look at the implementation details. For this we have prepared two notebooks that provide an example of what can be expected. The first notebook shows the training of the given XGBoost model as well as the preprocessing logic that is used for both training and inference time. The second notebook shows how we prepare the artifacts needed for deployment on Triton.

The first notebook represents an existing notebook that your organization might already have; it uses the RAPIDS suite of libraries and the RAPIDS Conda kernel. It runs on a G4dn instance type provided by AWS, which is GPU accelerated with NVIDIA T4 GPUs.

Preprocessing tasks in this example benefit from GPU acceleration and heavily use the cuML and cuDF libraries. An example of this is in the following code, where we show categorical label encoding using cuML. We also generate a label_encoders.pkl file that we can use to serialize the encoders and use them for preprocessing during inference time.
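
A minimal sketch of that pattern is shown below; the file name and column names are illustrative stand-ins for the dataset and features used in the notebook:

import pickle

import cudf
from cuml.preprocessing import LabelEncoder

# Illustrative dataset and categorical columns; the notebook defines its own
df = cudf.read_csv('transactions.csv')
categorical_cols = ['merchant_category', 'card_type']

label_encoders = {}
for col in categorical_cols:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le

# Serialize the fitted encoders so the same mappings can be reused at inference time
with open('label_encoders.pkl', 'wb') as f:
    pickle.dump(label_encoders, f)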

The first notebook concludes by training our XGBoost model and saving the artifacts accordingly.

In this scenario, the training code already existed and no changes are needed for the model at training time. Additionally, although we used GPU acceleration for preprocessing during training, we plan to use CPUs for preprocessing at inference time. We explain more later in the post.

Let’s now move on to the second notebook and recall what we need for a successful Triton deployment.

First, we need the model artifacts required by backends. The files that we need to create for this ensemble include:

  • Preprocessing artifacts (model.py, label_encoders.pkl)
  • XGBoost model artifacts (xgboost.json)

The Python backend in Triton requires us to use a Conda environment as a dependency. In this case, we use the Python backend to preprocess the raw data before feeding it into the XGBoost model being run in the FIL backend. Even though we originally used RAPIDS cuDF and cuML libraries to do the data preprocessing (as referenced earlier using our GPU), here we use Pandas and Scikit-learn as preprocessing dependencies for inference time (using our CPU). We do this for three reasons:

  • To show how to create a Conda environment for your dependencies and how to package it in the format expected by Triton’s Python backend.
  • By showing the preprocessing model running in the Python backend on the CPU while the XGBoost model runs on the GPU in the FIL backend, we illustrate how each model in Triton’s ensemble pipeline can run on a different framework backend, and run on different hardware with different configurations.
  • It highlights how the RAPIDS libraries (cuDF, cuML) are compatible with their CPU counterparts (Pandas, Scikit-learn). This way, we can show how LabelEncoders created in cuML can be used in Scikit-learn and vice-versa. Note that if you expect to preprocess large amounts of tabular data during inference time, you can still use RAPIDS to GPU-accelerate it.

Recall that we created the label_encoders.pkl file in the first notebook. There’s nothing more to do for category encoding other than include it in our model.py file for preprocessing.

To create the model.py file required by the Triton Python backend, we adhere to the formatting required by the backend and include our Python logic to process the incoming tensor and use the label encoder referenced earlier. You can review the file used for preprocessing.
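
The following is a minimal sketch of the structure such a model.py can take; the tensor names, dtype handling, and preprocessing body are assumptions and must match your own config.pbtxt and feature logic:

import os
import pickle

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load the encoders serialized during training from this model's version directory
        model_dir = os.path.join(args['model_repository'], args['model_version'])
        with open(os.path.join(model_dir, 'label_encoders.pkl'), 'rb') as f:
            self.label_encoders = pickle.load(f)

    def execute(self, requests):
        responses = []
        for request in requests:
            # "INPUT" and "OUTPUT" are assumed tensor names defined in config.pbtxt
            raw = pb_utils.get_input_tensor_by_name(request, 'INPUT').as_numpy()
            # ... apply self.label_encoders and any other feature engineering here ...
            features = raw.astype(np.float32)
            out_tensor = pb_utils.Tensor('OUTPUT', features)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses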

For the XGBoost model, nothing more needs to be done. We trained the model in the first notebook and Triton’s FIL backend requires no additional effort for XGBoost models.

Next, we need the Triton configuration files. Each model in the Triton ensemble requires a config.pbtxt file. In addition, we also create a config.pbtxt file for the ensemble as a whole. These files give Triton metadata about the ensemble, such as the inputs and outputs we expect, and help define the DAG associated with the ensemble.

Lastly, to deploy a model on Triton, we need our model repository folder to have the proper folder structure. Triton has specific requirements for model repository layout. Within the top-level model repository directory, each model has its own sub-directory containing the information for the corresponding model. Each model directory in Triton must have at least one numeric sub-directory representing a version of the model. For our use case, the resulting structure should look like the following.
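
As an illustration, the layout takes roughly the following shape; the model folder names (ensemble, preprocessing, fil) are assumptions and must match the names referenced in the configuration files:

model_repository/
├── ensemble/
│   ├── config.pbtxt
│   └── 1/
├── preprocessing/
│   ├── config.pbtxt
│   └── 1/
│       ├── model.py
│       └── label_encoders.pkl
└── fil/
    ├── config.pbtxt
    └── 1/
        └── xgboost.json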

After we have these three prerequisites, we create a compressed file as packaging for deployment and upload it to Amazon Simple Storage Service (Amazon S3).

We can now create a SageMaker model from the model repository we uploaded to Amazon S3 in the previous step.

In this step, we also provide the additional environment variable SAGEMAKER_TRITON_DEFAULT_MODEL_NAME, which specifies the name of the model to be loaded by Triton. The value of this key should match the folder name in the model package uploaded to Amazon S3. This variable is optional in the case of a single model. In the case of ensemble models, this key has to be specified for Triton to start up in SageMaker.

Additionally, you can set SAGEMAKER_TRITON_BUFFER_MANAGER_THREAD_COUNT and SAGEMAKER_TRITON_THREAD_COUNT for optimizing the thread counts. Both configuration values help tune the number of threads that are running on your CPUs, so you can possibly gain better utilization by increasing these values for CPUs with a greater number of cores. In the majority of cases, the default values often work well, but it may be worth experimenting to see if further efficiency can be gained for your workloads.
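
A minimal sketch of this step with boto3 follows; the image URI, role ARN, model data location, and the ensemble name assumed for SAGEMAKER_TRITON_DEFAULT_MODEL_NAME are placeholders:

import boto3

sm_client = boto3.client('sagemaker')

# Placeholders: Triton DLC image URI, IAM role, and the model package uploaded to Amazon S3
triton_image_uri = '<account>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:<tag>'
model_data_url = 's3://<your-bucket>/triton/model.tar.gz'

sm_client.create_model(
    ModelName='fraud-detection-triton',
    ExecutionRoleArn='arn:aws:iam::111122223333:role/<sagemaker-execution-role>',
    PrimaryContainer={
        'Image': triton_image_uri,
        'ModelDataUrl': model_data_url,
        'Environment': {
            # Must match the ensemble folder name in the model repository
            'SAGEMAKER_TRITON_DEFAULT_MODEL_NAME': 'ensemble',
        },
    },
)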

With the preceding model, we create an endpoint configuration where we can specify the type and number of instances we want in the endpoint.

Lastly, we use the preceding endpoint configuration to create a new SageMaker endpoint and wait for the deployment to finish. The status changes to InService after the deployment is successful.
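
Continuing the same sketch, the endpoint configuration and endpoint creation might look like the following; the instance type and resource names are illustrative:

# The instance type and count are illustrative choices for this sketch
sm_client.create_endpoint_config(
    EndpointConfigName='fraud-detection-triton-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'fraud-detection-triton',
        'InstanceType': 'ml.g4dn.xlarge',
        'InitialInstanceCount': 1,
    }],
)

# Create the endpoint and wait until its status changes to InService
sm_client.create_endpoint(
    EndpointName='fraud-detection-triton-endpoint',
    EndpointConfigName='fraud-detection-triton-config',
)
waiter = sm_client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName='fraud-detection-triton-endpoint')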

That’s it! Your endpoint is now ready for testing and validation. At this point, you may want to use various tools to help optimize your instance types and configuration to get the best possible performance. The following figure provides an example of the gains that can be achieved by using the FIL backend for an XGBoost model on Triton.

Summary

In this post, we walked you through deploying an XGBoost ensemble workload to SageMaker with Triton Inference Server. Moving workloads to Triton on SageMaker can deliver a highly beneficial return on investment. As with any adoption of technology, a vetting process and plan are key, and we detailed a five-step process to guide you through what to consider when moving your workloads. In addition, we dove deep into the steps needed to deploy an ensemble that uses Python preprocessing and an XGBoost model on Triton on SageMaker.

SageMaker provides the tools to remove the undifferentiated heavy lifting from each stage of the ML lifecycle, thereby facilitating the rapid experimentation and exploration needed to fully optimize your model deployments. SageMaker hosting support for Triton Inference Server enables low-latency, high transactions per second (TPS) workloads.

We highly recommend evaluating Triton Inference Server on SageMaker hosting for your inference needs; it can be well worth the effort to move your existing models to take advantage of this technology.

You can find the notebooks used for this example on GitHub.


About the authors

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time he enjoys seeking out new cultures, new experiences,  and staying up to date with the latest technology trends.

Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Kshitiz Gupta is a Solutions Architect at NVIDIA. He enjoys educating cloud customers about the GPU AI technologies NVIDIA has to offer and assisting them with accelerating their machine learning and deep learning applications. Outside of work, he enjoys running, hiking and wildlife watching.

Bruno Aguiar de Melo is a Software Development Engineer at Amazon.com, where he helps science teams to build, deploy and release ML workloads. He is interested in instrumentation and controllable aspects within the ML modelling/design phase that must be considered and measured with the insight that model execution performance is just as important as model quality performance, particularly in latency constrained use cases. In his spare time, he enjoys wine, board games and cooking.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He connects Amazon and AWS product leaders, developers, and scientists with NVIDIA technologists and product leaders to accelerate Amazon ML/DL workloads, EC2 products, and AWS AI services. In addition, Eliuth is a passionate mountain biker, skier, and poker player.


Build a multi-lingual document translation workflow with domain-specific and language-specific customization

In the digital world, providing information in a local language isn’t novel, but it can be a tedious and expensive task. Advancements in machine learning (ML) and natural language processing (NLP) have made this task much easier and less expensive.

We have seen increased adoption of ML for multi-lingual data and document processing workloads. Enterprise and government customers are migrating their manual translation workloads to take advantage of automated ML translation services. Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation between several thousand language pairings that can be used for synchronous (real-time) or asynchronous translation tasks. For a complete list of available translation pairs, refer to Supported languages and language codes.

Customers migrating and modernizing their translation workloads need the ability to customize translations for their business domain. A translation workload may also need the ability to adapt to regional language dialects or usage. For example, the Spanish translation of “elderly” is anciano(a) but in Puerto Rico the word envejeciente is preferred.

In this post, we demonstrate how to incorporate Amazon Translate’s Active Custom Translation (ACT) feature. We propose a solution to create a multi-lingual document translation workflow with domain- and language-specific customizations that you can review and augment as needed to continuously improve results and delight end-users.

Solution overview

ACT produces custom-translated output without the need to build and maintain a custom translation model. Using ACT, Amazon Translate will use your preferred translation examples as parallel data to customize your translation result, eliminating the time and cost required to build and train a new machine learning model.

The solution covered in this post explains how to create a human-in-the-loop workflow using Amazon Augmented AI (Amazon A2I) to continuously improve the customized translation. Amazon A2I provides a simple way to integrate human oversight into your ML workflows, with no ML experience required. Amazon A2I makes it straightforward to integrate human judgement and AI into any ML application, regardless of whether it’s run on AWS or on another platform.

For more information, refer to the post Designing human review workflows with Amazon Translate and Amazon Augmented AI.

The following diagram displays the command flow and data flow of the solution. The command flow shows the logical sequence of events in the workflow. A data flow indicates how data is being created or used by various components in the solution.

The following sequence diagram shows two separate processes in the solution: the translation workflow (A) and the process to update parallel data (B).

The translation workflow is initiated by an Amazon CloudWatch scheduled event which starts the Translation Job Invoker AWS Lambda function. This function creates an asynchronous translation job in Amazon Translate, passing along the document to translate and the location of the parallel data to customize the translation. The translation job reads the parallel data, performs the translation, and writes the translated result back to an Amazon S3 bucket. As of this writing, only asynchronous translation jobs can use parallel data.
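
A minimal sketch of the call the Translation Job Invoker function might make is shown below; the S3 locations, role ARN, parallel data name, and language codes are assumptions based on the solution description:

import boto3

translate = boto3.client('translate')

# Placeholders for the S3 locations, IAM role, and parallel data created by the stack
response = translate.start_text_translation_job(
    JobName='document-translation-job',
    InputDataConfig={
        'S3Uri': 's3://<your-bucket>/input/',
        'ContentType': 'text/plain',
    },
    OutputDataConfig={'S3Uri': 's3://<your-bucket>/output/'},
    DataAccessRoleArn='arn:aws:iam::111122223333:role/<translate-data-access-role>',
    SourceLanguageCode='en',
    TargetLanguageCodes=['es'],
    ParallelDataNames=['customization-parallel-data'],
)
print(response['JobId'], response['JobStatus'])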

When the translation job is complete, an event is generated that triggers the Translation Job Completion Handler Lambda function. This function creates a human workflow loop—the main component of the Amazon A2I portion of the workflow.

Human reviewers assess the translation and accept or modify the translation. Any corrections are used to update the translated document and also added to a customization dictionary. When the review is finalized, another event is generated to trigger the Workflow Completion Handler function. This function writes the latest translated document back to Amazon S3. The customization data is used to update an Amazon DynamoDB table with the source and translated text pairs.

To close the loop, we must incorporate this customization data stored in DynamoDB back into the parallel data stored in Amazon S3. To accomplish this, we use a scheduled CloudWatch event to trigger the Parallel Data Refresher function, which reads the data from the DynamoDB table, reformats it as parallel data, and updates the S3 bucket, storing the parallel data.

Deploy the solution with AWS CloudFormation

Launch the provided AWS CloudFormation template to deploy the solution in your account. This stack only works in the us-east-1 Region. If you want to deploy this solution in other Regions, refer to the following GitHub repo.

  1. Choose Launch Stack:
  2. Follow the instructions to populate the necessary parameters. If you’re running this stack for the first time, SNS Email is the only required parameter.
  3. On the Review page, in the Capabilities section, select the check box and choose Create stack.

The stack creates the following key components:

  • Customization data – A DynamoDB table (translate_parallel_data) to maintain the customization data. You migrate the existing customization data to this table. This table is used to continuously add and update customizations.
  • Parallel Data Refresher – The Lambda function to convert the customization data in the DynamoDB table to a parallel data format—CSV, TSV, or TMX—and store it in Amazon S3. It creates and updates parallel data with the new parallel data file in Amazon S3.
  • Translation Job Invoker – The Lambda function to start the Amazon Translate batch job with parallel data.
  • Translation Job Completion Handler – This Lambda function is triggered when the Amazon Translate batch job is complete. The function creates one human loop per document (we’ll refine this in the future to create a human loop only for a select percentage of documents processed). It uses the original and translated documents to create the human loop.
  • Amazon A2I customized template – This template is used to render the translation pair for human review. The template has the Add option for every translation segment. Users can select this option to add the corrections to the customization data. The new customization data is used in the next batch translation job.
  • Workflow Completion Handler – This Lambda function is triggered when the human workflow is complete. The function updates the translated document with corrections and checks for parallel data updates. New parallel data is added to the DynamoDB table.
  • Amazon A2I private team – An Amazon A2I private team is created with a human worker using the email provided. Initial credentials are emailed upon successful creation of the private team. You use this email and credential to log in to the Amazon A2I worker portal.

Test the solution

The sample_text.txt file is created under the input prefix of the S3 bucket created by the stack. We use this file for our testing. It contains the following content:

Life insurance companies have the freedom to charge different premiums based on risk
factors that predict mortality. Purchasing a life insurance policy often entails a health 
status check or medical exam, and asking for vaccination status is not banned.

Health insurers are a different story. A slew of state and federal regulations in the 
last three decades have heavily restricted their ability to use health factors in issuing 
or pricing polices. The use of health status in any group health insurance policy is 
prohibited by law. The Affordable Care Act, passed in 2014, prevents insurers from pricing 
plans according to health – with one exception: smoking status.

To test the solution, complete the following steps:

  1. Invoke the Translation Job Invoker function manually, or wait for it to be triggered by CloudWatch based on the cron schedule you specified.
    This function triggers the Amazon Translate batch job. You can observe the progress of the job on the Amazon Translate console.
    This batch job takes approximately 30 minutes to complete. When it’s complete, the TextTranslationJob state change event triggers the Translation Job Completion Handler function. This function creates one human loop per translated document.
  2. Navigate to the Amazon A2I workforces page.
  3. Choose the Private tab.
  4. Log in to the Amazon A2I worker portal by choosing the link for Labeling portal sign-in URL.
  5. Select the Human review task in the jobs list.
  6. Choose Start working.

    You can see the following page displayed.
  7. Follow the instructions to make domain- and language-specific corrections.
    In the preceding screenshot, the phrase “The use of health status in any group health insurance policy is prohibited by law” has been translated to “La ley prohíbe el uso del estado de salud en cualquier póliza de seguro médico de grupo.” Although the translation is accurate, the phrases have been rearranged.
  8. Let’s modify this to “El uso del estado de salud en cualquier póliza de seguro de salud grupal está prohibido por ley” to make this a more direct translation reflecting the original phraseology.
  9. Select Add to add this to the dictionary.
  10. When you’re done, choose Submit.

This triggers the Workflow Completion Handler function, and the customization data is updated in the DynamoDB table. The function also stores the corrected translation under the post-edits prefix.

You can observe the customizations being added to the translate_parallel_data table on the DynamoDB console.

Command flow

The Parallel Data Refresher function is triggered every hour by a CloudWatch scheduled event. This function checks for new updates in the translate_parallel_data table, creates a new parallel data TMX file in Amazon S3 under the parallel_data prefix, and updates the Amazon Translate parallel data component. You can trigger this function manually if you don’t want to wait for the scheduled event trigger.
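
The parallel data refresh itself can be done with the Amazon Translate UpdateParallelData API; the following is a minimal sketch, with the parallel data name and S3 location as placeholders:

import boto3

translate = boto3.client('translate')

# Placeholders for the parallel data resource and the refreshed TMX file in Amazon S3
response = translate.update_parallel_data(
    Name='customization-parallel-data',
    ParallelDataConfig={
        'S3Uri': 's3://<your-bucket>/parallel_data/parallel_data.tmx',
        'Format': 'TMX',
    },
)
print(response['Name'], response['Status'])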

You can observe the parallel data being updated on the Amazon Translate console.

When it’s complete, the job status should be Active and the value for Updated records should reflect the number of customizations you added (in this case 1).

Now we can run the translation job again with the updated data. Trigger the Translation Job Invoker function again to observe the customization being added to the translation in the second iteration. Amazon Translate now uses the parallel data provided to customize the translation.

You can observe the change in the translation output in the labeling portal. Instead of the default translation, we see the customized translation being applied.

This workflow helps create a virtuous cycle to continuously improve translation output using Amazon A2I and Amazon Translate customization features.

Cost

With Amazon Translate and Amazon A2I, you pay as you go based on the number of text characters that you processed and for each human-reviewed object. We use DynamoDB on-demand mode for this example. DynamoDB charges you for the reads and writes performed on your tables. Refer to the pricing pages for Amazon Translate, Amazon A2I, and Amazon DynamoDB for actual costs.

Clean up

When you’re finished experimenting with this solution, clean up your resources by using the AWS CloudFormation console to delete all the resources deployed in this example. This helps you avoid continuing costs in your account.

Conclusion

You can use the solution presented in this post to build a multi-lingual translation workflow that uses and augments domain-specific customization incrementally to continuously improve translation results. We provided a simple mechanism to integrate your existing customization assets with managed AI services like Amazon Translate and Amazon A2I to build a robust translation service for your application. Amazon Translate can help you scale this solution to support over 5,550 translation pairs out of the box. Amazon A2I can help you easily integrate with your in-house linguistic expert or take advantage of an external workforce to scale the solution.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to the Amazon Translate FAQs. Please share your thoughts with us in the comments section, or in the issues section of the project’s GitHub repository.


About the Authors

Sathya Balakrishnan is a Sr Customer Delivery Architect in the Professional Services team at AWS, specializing in Data/ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

Paul W. Joireman is a Sr Customer Delivery Architect in Professional Services at AWS, specializing in Application Migration and working with US federal financial clients. Paul enjoys creating technology solutions, traveling with family and hiking in the Shenandoah National Park, as long as the hike finishes at a local craft brewery.

Read More

AWS Deep Learning Challenge sees innovative and impactful use of Amazon EC2 DL1 instances

In the AWS Deep Learning Challenge held from January 5, 2022, to March 1, 2022, participants from academia, startups, and enterprise organizations joined to test their skills and train a deep learning model of their choice using Amazon Elastic Compute Cloud (Amazon EC2) DL1 instances and Habana’s SynapseAI SDK. The EC2 DL1 instances powered by Gaudi accelerators from Habana Labs, an Intel company, are designed specifically for training deep learning models. Participants were able to realize the significant price/performance benefits that DL1 offers over GPU-based instances.

We are excited to announce the winners and showcase some of the machine learning (ML) models that were trained in this hackathon. You will learn about some of the deep learning use cases that are supported by EC2 DL1 instances, including computer vision, natural language processing, and acoustic modeling.

Winning models

Our first-place winner is a project submitted by Gustavo Zomer. It’s an implementation of multi-lingual CLIP (Contrastive Language-Image Pre-Training). CLIP was introduced by OpenAI in 2021 as a way to train a more generalizable image classifier across larger datasets through self-supervised learning. It’s trained on a large set of images with a wide variety of natural language supervision that’s abundantly available on the internet, but is limited to the English language. This project replaces the text encoder in CLIP with a multi-lingual text encoder called XLM-RoBERTa to broaden the model’s applicability to multiple languages. This modified implementation of CLIP is able to pair images with captions across multiple languages. The model was trained on 16 accelerators across two DL1 instances, showing how ML training can be scaled to use multiple Gaudi accelerators across multiple nodes to increase training throughput and reduce the time to train. The judges were impressed by the impactful use of deep learning to break down language barriers, and the technical implementation, which used distributed training.

In second place, we have a project submitted by Remco van Akker. It uses a GAN (Generative Adversarial Network) to generate synthetic retinal image data for medical applications. Synthetic data is used in model training in medical applications to overcome the scarcity of annotated medical data, which is labor-intensive and costly to produce. Synthetic data can be used as part of data augmentation to remove biases and make vision models in medical applications more generalizable. This project stood out because it implemented a generative model on DL1 to solve a real-world problem impacting the application of AI and ML in healthcare.

Rounding out our top three was a project submitted by Zohar Jackson that implemented a vision transformer model for semantic segmentation. This project uses the Ray Tune library to fine-tune hyperparameters and uses Horovod to parallelize training on 16 Gaudi accelerators across two DL1 instances.

In addition to the top three winners, participants won several other prizes, including best technical implementation, highest potential impact, and most creative project. We offer our congratulations to all the winners of this hackathon for building such a diverse set of impactful projects on Gaudi accelerator-based EC2 DL1 instances. We can’t wait to see what our participants will continue to build on DL1 instances going forward.

Get started with DL1 instances

As demonstrated by the various projects in this hackathon, you can use EC2 DL1 instances to train deep learning models for use cases such as natural language processing, object detection, and image recognition. With DL1 instances, you also get up to 40% better price/performance for training deep learning models compared to current generation GPU-based EC2 instances. Visit Amazon EC2 DL1 Instances to learn more about how DL1 instances can accelerate your training workloads.


About the authors

Dvij Bajpai is a Senior Product Manager at AWS. He works on developing EC2 instances for workloads in machine learning and high-performance computing.

Amr Ragab is a Principal Solutions Architect at AWS. He provides technical guidance to help customers run complex computational workloads at scale.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.

Read More

Conduct what-if analyses with Amazon Forecast, up to 80% faster than before

Now with Amazon Forecast, you can seamlessly conduct what-if analyses up to 80% faster to analyze and quantify the potential impact of business levers on your demand forecasts. Forecast is a service that uses machine learning (ML) to generate accurate demand forecasts, without requiring any ML experience. Simulating scenarios through what-if analyses is a powerful business tool to navigate through the uncertainty of future events by capturing possible outcomes from hypothetical scenarios. It’s a common practice to assess the impact of business decisions on revenue or profitability, quantify the risk associated with market trends, evaluate how to organize logistics and workforce to meet customer demand, and much more.

Conducting a what-if analysis for demand forecasting can be challenging because you first need accurate models to forecast demand and then a quick and easy way to reproduce the forecast across a range of scenarios. Until now, although Forecast provided accurate demand forecasts, conducting what-if analysis using Forecast could be cumbersome and time-consuming. For example, retail promotion planning is a common application of what-if analysis to identify the optimal price point for a product to maximize the revenue. Previously on Forecast, you had to prepare and import a new input file for each scenario you wanted to test. If you wanted to test three different price points, you first had to create three new input files by manually transforming the data offline and then importing each file into Forecast separately. In effect, you were doing the same set of tasks for each and every scenario. Additionally, to compare scenarios, you had to download the prediction from each scenario individually and then merge them offline.

With today’s launch, you can easily conduct what-if analysis up to 80% faster. We have made it easy to create new scenarios by removing the need for offline data manipulation and import for each scenario. Now, you can define a scenario by transforming your initial dataset through simple operations, such as multiplying the price for product A by 90% or decreasing the price for product B by $10. These transformations can also be combined with conditions that control where the scenario applies (for example, reducing product A’s price in one location only). With this launch, you can define and run multiple scenarios of the same type of analysis (such as promotion analysis) or different types of analyses (such as promotion analysis in geographical region 1 and inventory planning in geographical region 2) simultaneously. Lastly, you no longer need to merge and compare results of scenarios offline. Now, you can view the forecast predictions across all scenarios in the same graph or bulk export the data for offline review.

Solution overview

The steps in this post demonstrate how to use what-if analysis on the AWS Management Console. To directly use Forecast APIs for what-if analysis, follow the notebook in our GitHub repo that provides an analogous demonstration.

Import your training data

To conduct a what-if analysis, you must import two CSV files representing the target time series data (showing the prediction target) and the related time series data (showing attributes that impact the target). Our example target time series file contains the product item ID, timestamp, demand, store ID, city, and region, and our related time series file contains the product item ID, store ID, timestamp, city, region, and price.

To import your data, complete the following steps:

  1. On the Forecast console, choose View dataset groups.

Figure 1: View dataset group on the Amazon Forecast home page

  2. Choose Create dataset group.

Figure 2: Creating a dataset group

  3. For Dataset group name, enter a dataset name (for this post, my_company_consumer_sales_history).
  4. For Forecasting domain, choose a forecasting domain (for this post, Retail).
  5. Choose Next.

Figure 3: Provide a dataset name and select your forecasting domain

  6. On the Create target time series dataset page, provide the dataset name, frequency of your data, and data schema.
  7. Provide the dataset import details.
  8. Choose Start.

The following screenshot shows the information for the target time series page filled out for our example.

Figure 4: Sample information filled out for the target time series data import page

You will be taken to the dashboard that you can use to track progress.

  9. To import the related time series file, on the dashboard, choose Import.

Figure 5: Dashboard that allows you to track progress

  10. On the Create related time series dataset page, provide the dataset name and data schema.
  11. Provide the dataset import details.
  12. Choose Start.

The following screenshot shows the information filled out for our example.

Figure 6: Sample information filled out for the related time series data import page
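
If you prefer to script the import instead of using the console, the boto3 calls below sketch the same dataset group, dataset, and import flow. The dataset names, schema, bucket path, and IAM role ARN are hypothetical and should be replaced with values that match your own files.

import boto3

forecast = boto3.client("forecast")

# Create the dataset group in the Retail domain
dsg = forecast.create_dataset_group(
    DatasetGroupName="my_company_consumer_sales_history",
    Domain="RETAIL",
)

# Create the target time series dataset (schema must match your CSV column order)
tts = forecast.create_dataset(
    DatasetName="consumer_sales_target",          # hypothetical
    Domain="RETAIL",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",
    Schema={"Attributes": [
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "demand", "AttributeType": "float"},
        {"AttributeName": "store_id", "AttributeType": "string"},
        {"AttributeName": "city", "AttributeType": "string"},
        {"AttributeName": "region", "AttributeType": "string"},
    ]},
)
forecast.update_dataset_group(
    DatasetGroupArn=dsg["DatasetGroupArn"], DatasetArns=[tts["DatasetArn"]]
)

# Import the target time series CSV from Amazon S3
forecast.create_dataset_import_job(
    DatasetImportJobName="tts_import",            # hypothetical
    DatasetArn=tts["DatasetArn"],
    DataSource={"S3Config": {
        "Path": "s3://my-forecast-bucket/target_time_series.csv",         # hypothetical
        "RoleArn": "arn:aws:iam::123456789012:role/ForecastS3AccessRole",  # hypothetical
    }},
    TimestampFormat="yyyy-MM-dd",
)
# Repeat create_dataset and create_dataset_import_job with DatasetType="RELATED_TIME_SERIES"
# for the related time series file that contains the price attribute.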

Train a predictor

Next, we train a predictor.

  1. On the dashboard, choose Train predictor.

Figure 7: Dashboard of completed dataset import step and button to train a predictor

  2. On the Train predictor page, enter a name for your predictor, how long in the future you want to forecast and at what frequency, and the number of quantiles you want to forecast for.
  3. Enable AutoPredictor – this is required to use what-if analysis.
  4. Choose Create.

The following screenshot shows the information filled out for our example.

Figure 8: Sample information filled out to train a predictor
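
The console step above maps to the CreateAutoPredictor API. The following boto3 sketch shows an equivalent call with illustrative values for the horizon, frequency, and quantiles; the predictor name and dataset group ARN are hypothetical.

import boto3

forecast = boto3.client("forecast")

predictor = forecast.create_auto_predictor(
    PredictorName="my_whatif_predictor",   # hypothetical
    ForecastHorizon=30,                    # predict 30 future periods
    ForecastFrequency="D",                 # at daily frequency
    ForecastTypes=["0.1", "0.5", "0.9"],   # quantiles to generate
    DataConfig={
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/my_company_consumer_sales_history"  # hypothetical
    },
)
print(predictor["PredictorArn"])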

Create a forecast

After our predictor is trained (this can take approximately 2.5 hours), we create a forecast. You will know that your predictor is trained when you see the View Predictors button on your dashboard.

  1. On the dashboard, choose Create a forecast.

Figure 9: Dashboard of completed train predictor step and button to create a forecast

  2. On the Create a forecast page, enter a forecast name, choose the predictor that you created, and specify the forecast quantiles (optional) and the items to generate a forecast for.
  3. Choose Start.

Figure 10: Sample information filled out to create a forecast
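
The equivalent API call is CreateForecast, as in the following boto3 sketch with a hypothetical forecast name and predictor ARN.

import boto3

forecast = boto3.client("forecast")

baseline = forecast.create_forecast(
    ForecastName="my_baseline_forecast",  # hypothetical
    PredictorArn="arn:aws:forecast:us-east-1:123456789012:predictor/my_whatif_predictor",  # hypothetical
)
print(baseline["ForecastArn"])  # used later as the baseline for the what-if analysis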

After you complete these steps, you have successfully created a forecast. This represents your baseline forecast scenario that you use to do what-if analyses on.

If you need more help creating your baseline forecasts, refer to Getting Started (Console). We now move to the next steps of conducting a what-if analysis.

Create a what-if analysis

At this point, we have created our baseline forecast and will start the walkthrough of how to conduct a what-if analysis. There are three stages to conducting a what-if analysis: setting up the analysis, creating the what-if forecast by defining what is changed in the scenario, and comparing the results.

  1. To set up your analysis, choose Explore what-if analysis on the dashboard.

Figure 11: Dashboard of complete create forecast step and button to start what-if analysis

  2. Choose Create.

Figure 12: Page to create a new what-if analysis

  3. Enter a unique name and select the baseline forecast from the drop-down menu.
  4. Choose the items in your dataset you want to conduct a what-if analysis for. You have two options:
    1. Select all items is the default, which we choose in this post.
    2. If you want to pick specific items, choose Select items with a file and import a CSV file containing the unique identifier for the corresponding item and any associated dimension (such as region).
  5. Choose Create what-if analysis.

Figure 13: Option to specify items to conduct what-if analysis for and button to create the analysis
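
Through the API, this step corresponds to CreateWhatIfAnalysis. The following boto3 sketch uses a hypothetical analysis name and baseline forecast ARN, and omits TimeSeriesSelector to mirror the Select all items option.

import boto3

forecast = boto3.client("forecast")

analysis = forecast.create_what_if_analysis(
    WhatIfAnalysisName="my_whatif_analysis",  # hypothetical
    ForecastArn="arn:aws:forecast:us-east-1:123456789012:forecast/my_baseline_forecast",  # hypothetical
    # Omitting TimeSeriesSelector analyzes all items in the baseline forecast.
)
print(analysis["WhatIfAnalysisArn"])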

Create a what-if forecast

Next, we create a what-if forecast to define the scenario we want to analyze.

  1. Choose Create.

Figure 14: Creating a what-if forecast

  2. Enter a name for your scenario.

You can define your scenario through two options:

  • Use transformation functions – Use the transformation builder to transform the related time series data you imported. For this walkthrough, we evaluate how the demand for an item in our dataset changes when its price is reduced by 10% and then by 30%, compared to the price in the baseline forecast.
  • Define the what-if forecast with a replacement dataset – Replace the related time series dataset you imported.

Figure 15: Options to define a scenario

The transformation function builder lets you transform the related time series data you imported earlier through simple operations that add, subtract, divide, or multiply a feature in your data (for example, price) by a value you specify. For our example, we create a scenario that reduces the price, which is a feature in the dataset, by 10%.

  3. For What-if forecast definition method, select Use transformation functions.
  4. Choose Multiply as our operator, price as our time series, and enter 0.9.

Figure 16: Using the transformation builder to reduce price by 10%

You can also add conditions to further refine your scenario. For example, if your dataset contains store information organized by region, you can limit the price reduction scenario by region. You could define a scenario of a 10% price reduction that applies only to stores not in Region_1.

  5. Choose Add condition.
  6. Choose Not equals as the operation and enter Region_1.

Figure 17: Using the transformation builder to reduce price by 10% for stores that are not in region 1
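
For reference, the same scenario can be expressed through the CreateWhatIfForecast API. The following boto3 sketch applies the 10% price reduction to stores outside Region_1; the forecast name and what-if analysis ARN are hypothetical.

import boto3

forecast = boto3.client("forecast")

# Scenario_1: multiply price by 0.9 for every store whose region is not Region_1
scenario_1 = forecast.create_what_if_forecast(
    WhatIfForecastName="Scenario_1",  # hypothetical
    WhatIfAnalysisArn="arn:aws:forecast:us-east-1:123456789012:what-if-analysis/my_whatif_analysis",  # hypothetical
    TimeSeriesTransformations=[
        {
            "Action": {
                "AttributeName": "price",
                "Operation": "MULTIPLY",
                "Value": 0.9,
            },
            "TimeSeriesConditions": [
                {
                    "AttributeName": "region",
                    "AttributeValue": "Region_1",
                    "Condition": "NOT_EQUALS",
                }
            ],
        }
    ],
)
print(scenario_1["WhatIfForecastArn"])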

Another option to modify your related time series is to import a new dataset that already contains the data defining the scenario. For example, to define a scenario with a 10% price reduction, we can upload a new dataset specifying the unique identifier for the items that are changing and their new prices, which are 10% lower. To do so, select Define the what-if forecast with a replacement dataset and import a CSV containing the price change.

Figure 18: Importing a replacement dataset to define a new scenario
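
Through the API, the replacement-dataset option maps to the TimeSeriesReplacementsDataSource parameter of CreateWhatIfForecast. The following boto3 sketch assumes a hypothetical replacement CSV, schema, bucket path, and IAM role; adjust them to match your own replacement file.

import boto3

forecast = boto3.client("forecast")

# Alternative: define the scenario with a replacement related time series
# that already contains the reduced prices.
scenario_from_file = forecast.create_what_if_forecast(
    WhatIfForecastName="Scenario_replacement",  # hypothetical
    WhatIfAnalysisArn="arn:aws:forecast:us-east-1:123456789012:what-if-analysis/my_whatif_analysis",  # hypothetical
    TimeSeriesReplacementsDataSource={
        "S3Config": {
            "Path": "s3://my-forecast-bucket/whatif/price_reduction.csv",      # hypothetical
            "RoleArn": "arn:aws:iam::123456789012:role/ForecastS3AccessRole",  # hypothetical
        },
        "Schema": {"Attributes": [
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "price", "AttributeType": "float"},
        ]},
        "Format": "CSV",
        "TimestampFormat": "yyyy-MM-dd",
    },
)
print(scenario_from_file["WhatIfForecastArn"])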

  7. To complete the what-if forecast definition, choose Create.

Figure 19: Completing the what-if forecast creation

Repeat the process to create another what-if forecast with a 30% price reduction.

Figure 20: Showing the completed run of the two what-if forecasts

After the what-if analysis has run for each what-if forecast, the status changes to Active. This concludes the second stage, and you can move on to comparing the what-if forecasts.

Compare the forecasts

We can now compare the what-if forecasts for both our scenarios, comparing a 10% price reduction with a 30% price reduction.

  1. On the analysis insights page, navigate to the Compare what-if forecasts section.

Figure 21: Inputs required to compare what-if forecasts

  2. For item_id, enter the item to analyze.
  3. For What-if forecasts, choose the scenarios to compare (for this post, Scenario_1 and Scenario_2).
  4. Choose Compare what-if.

Figure 22: Button to generate the what-if forecast comparison graph

The following graph shows the resulting demand in both our scenarios.

Figure 23: What-if forecast comparison for scenario 1 and 2

By default, the graph shows the P50 forecast and the base case scenario. You can view all generated quantiles by selecting your preferred quantiles from the Choose forecasts drop-down menu.
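
If you’re working through the API instead of the console, you can retrieve the predictions of a single what-if forecast with the QueryWhatIfForecast operation of the forecastquery client, as in the following sketch with a hypothetical what-if forecast ARN and item ID.

import boto3

forecastquery = boto3.client("forecastquery")

# Retrieve the predicted demand for a single item from one what-if forecast
response = forecastquery.query_what_if_forecast(
    WhatIfForecastArn="arn:aws:forecast:us-east-1:123456789012:what-if-forecast/Scenario_1",  # hypothetical
    Filters={"item_id": "item_001"},  # hypothetical item
)
for quantile, series in response["Forecast"]["Predictions"].items():
    print(quantile, series[:3])  # first few timestamp/value pairs per quantile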

Export your data

To export your data to CSV, complete the following steps:

  1. Choose Create export.

Figure 24: Creating a what-if forecast export

  2. Enter a name for your export file (for this post, my_scenario_export).
  3. Specify the scenarios to be exported by selecting them from the What-If Forecast drop-down menu. You can export multiple scenarios at once in a combined file.
  4. For Export location, specify the Amazon Simple Storage Service (Amazon S3) location.
  5. To begin the export, choose Create Export.

Figure 25: Specifying the scenario information and export location for the bulk export

  6. To download the export, navigate to the S3 location on the AWS Management Console, select the file, and choose Download. The export file contains the timestamp, item ID, dimensions, and the forecasts for each quantile for all selected scenarios (including the base scenario).
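
The bulk export can also be created through the CreateWhatIfForecastExport API, as in the following boto3 sketch; the export name matches this post’s example, while the what-if forecast ARNs, bucket path, and role ARN are hypothetical.

import boto3

forecast = boto3.client("forecast")

export = forecast.create_what_if_forecast_export(
    WhatIfForecastExportName="my_scenario_export",
    WhatIfForecastArns=[
        "arn:aws:forecast:us-east-1:123456789012:what-if-forecast/Scenario_1",  # hypothetical
        "arn:aws:forecast:us-east-1:123456789012:what-if-forecast/Scenario_2",  # hypothetical
    ],
    Destination={"S3Config": {
        "Path": "s3://my-forecast-bucket/exports/",                         # hypothetical
        "RoleArn": "arn:aws:iam::123456789012:role/ForecastS3AccessRole",   # hypothetical
    }},
    Format="CSV",
)
print(export["WhatIfForecastExportArn"])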

Conclusion

Scenario analysis is a critical tool to help navigate through the uncertainties of business. It provides foresight and a mechanism to stress-test ideas, leaving businesses more resilient, better prepared, and in control of their future. Forecast now supports what-if scenario analyses. To conduct your scenario analysis, open the Forecast console and follow the steps outlined in this post, or refer to our GitHub notebook on how to access the functionality via the API.

To learn more, refer to the CreateWhatIfAnalysis page in the developer guide.


About the authors

Brandon Nair is a Sr. Product Manager for Amazon Forecast. His professional interest lies in creating scalable machine learning services and applications. Outside of work he can be found exploring national parks, perfecting his golf swing or planning an adventure trip.

Akhil Raj Azhikodan is a Software Development Engineer working on Amazon Forecast. His interests are in designing and building reliable systems that solve complex customer problems. Outside of work, he enjoys learning about history, hiking and playing video games.

Conner Smith is a Software Development Engineer working on Amazon Forecast. He focuses on building secure, scalable distributed systems that provide value to customers. Outside of work he spends time reading fiction, playing guitar, and watching random YouTube videos.

Shannon Killingsworth is the UX Designer for Amazon Forecast. He has been improving the user experience in Forecast for two years by simplifying processes as well as adding new features in ways that make sense to our users. Outside of work he enjoys running, drawing, and reading.

Read More