Specify and extract information from documents using the new Queries feature in Amazon Textract

Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract now offers the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You don’t need to know the structure of the data in the document (table, form, implied field, nested data) or worry about variations across document versions and formats.

In this post, we discuss the following topics:

  • Success stories from AWS customers and benefits of the new Queries feature
  • How the Analyze Document Queries API helps extract information from documents
  • A walkthrough of the Amazon Textract console
  • Code examples to utilize the Analyze Document Queries API
  • How to process the response with the Amazon Textract parser library

Benefits of the new Queries feature

Traditional OCR solutions struggle to extract data accurately from most semi-structured and unstructured documents because of significant variations in how the data is laid out across multiple versions and formats of these documents. You need to implement custom postprocessing code or manually review the extracted information from these documents. With the Queries feature, you can specify the information you need in the form of natural language questions (for example, “What is the customer name”) and receive the exact information (“John Doe”) as part of the API response. The feature uses a combination of visual, spatial, and language models to extract the information you seek with high accuracy. The Queries feature is pre-trained on a large variety of semi-structured and unstructured documents. Some examples include paystubs, bank statements, W-2s, loan application forms, mortgage notes, and vaccine and insurance cards.

“Amazon Textract enables us to automate the document processing needs of our customers. With the Queries feature, we will be able to extract data from a variety of documents with even greater flexibility and accuracy,” said Robert Jansen, Chief Executive Officer at TekStream Solutions. “We see this as a big productivity win for our business customers, who will be able to use the Queries capability as part of our IDP solution to quickly get key information out of their documents.”

“Amazon Textract enables us to extract text as well as structured elements like Forms and Tables from images with high accuracy. Amazon Textract Queries has helped us drastically improve the quality of information extraction from several business-critical documents such as safety data sheets or material specifications,” said Thorsten Warnecke, Principal | Head of PC Analytics, Camelot Management Consultants. “The natural language query system offers great flexibility and accuracy, which has reduced our post-processing load and enabled us to add new documents to our data extraction tools quicker.”

How the Analyze Document Queries API helps extract information from documents

Companies have increased their adoption of digital platforms, especially in light of the COVID-19 pandemic. Most organizations now offer a digital way to acquire their services and products using smartphones and other mobile devices, which offers flexibility to users but also adds to the scale at which digital documents need to be reviewed, processed, and analyzed. In workloads where mortgage documents, vaccination cards, paystubs, insurance cards, and other documents must be analyzed digitally, the complexity of data extraction increases considerably because these documents lack a standard format or have significant variations in data format across different versions of the document.

Even powerful OCR solutions struggle to extract data accurately from these documents, and you may have to implement custom postprocessing for these documents. This includes mapping possible variations of form keys to customer-native field names or including custom machine learning to identify specific information in an unstructured document.

The new Analyze Document Queries API in Amazon Textract can take natural language written questions such as “What is the interest rate?” and perform powerful AI and ML analysis on the document to figure out the desired information and extract it from the document without any postprocessing. The Queries feature doesn’t require any custom model training or setting up of templates or configurations. You can quickly get started by uploading your documents and specifying questions on those documents via the Amazon Textract console, the AWS Command Line Interface (AWS CLI), or AWS SDK.

In subsequent sections of this post, we go through detailed examples of how to use this new functionality on common workload use cases and how to use the Analyze Document Queries API to add agility to the process of digitalizing your workload.

Use the Queries feature on the Amazon Textract console

Before we get started with the API and code samples, let’s review the Amazon Textract console. The following image shows an example of a vaccination card on the Queries tab for the Analyze Document API on the Amazon Textract console. After you upload the document to the Amazon Textract console, choose Queries in the Configure Document section. You can then add queries in the form of natural language questions. After you add all your queries, choose Apply Configuration. The answers to the questions are located on the Queries tab.

Code examples

In this section, we explain how to invoke the Analyze Document API with the Queries parameter to get answers to natural language questions about the document. The input document is either in a byte array format or located in an Amazon Simple Storage Service (Amazon S3) bucket. You pass image bytes to an Amazon Textract API operation by using the Bytes property. For example, you can use the Bytes property to pass a document loaded from a local file system. Image bytes passed by using the Bytes property must be base64 encoded. Your code might not need to encode document file bytes if you’re using an AWS SDK to call Amazon Textract API operations. Alternatively, you can pass images stored in an S3 bucket to an Amazon Textract API operation by using the S3Object property. Documents stored in an S3 bucket don’t need to be base64 encoded.
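For reference, the following is a minimal boto3 sketch of the S3Object variant with a single query (the bucket and object names are placeholders); the Bytes variant is shown in the paystub example that follows:

import boto3

textract = boto3.client('textract')

# Analyze a document stored in an S3 bucket (no base64 encoding required)
response = textract.analyze_document(
    Document={'S3Object': {'Bucket': 'your-s3-bucket', 'Name': 'paystub.jpg'}},
    FeatureTypes=['QUERIES'],
    QueriesConfig={
        'Queries': [
            {'Text': 'What is the year to date gross pay', 'Alias': 'PAYSTUB_YTD_GROSS'}
        ]
    })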

You can use the Queries feature to get answers from different types of documents like paystubs, vaccination cards, mortgage documents, bank statements, W-2 forms, 1099 forms, and others. In the following sections, we go over some of these documents and show how the Queries feature works.

Paystub

In this example, we walk through the steps to analyze a paystub using the Queries feature, as shown in the following example image.

We use the following sample Python code:

import boto3
import json

#create a Textract Client
textract = boto3.client('textract')

image_filename = "paystub.jpg"

response = None
with open(image_filename, 'rb') as document:
    imageBytes = bytearray(document.read())

# Call Textract AnalyzeDocument by passing a document from local disk
response = textract.analyze_document(
    Document={'Bytes': imageBytes},
    FeatureTypes=["QUERIES"],
    QueriesConfig={
        "Queries": [{
            "Text": "What is the year to date gross pay",
            "Alias": "PAYSTUB_YTD_GROSS"
        },
        {
            "Text": "What is the current gross pay?",
            "Alias": "PAYSTUB_CURRENT_GROSS"
        }]
    })

The following code is a sample AWS CLI command:

aws textract analyze-document --document '{"S3Object":{"Bucket":"your-s3-bucket","Name":"paystub.jpg"}}' --feature-types '["QUERIES"]' --queries-config '{"Queries":[{"Text":"What is the year to date gross pay", "Alias": "PAYSTUB_YTD_GROSS"}]}'

Let’s analyze the response we get for the two queries we passed to the Analyze Document API in the preceding example. The following response has been trimmed to only show the relevant parts:

{
         "BlockType":"QUERY",
         "Id":"cbbba2fa-45be-452b-895b-adda98053153", #id of first QUERY
         "Relationships":[
            {
               "Type":"ANSWER",
               "Ids":[
                  "f2db310c-eaa6-481d-8d18-db0785c33d38" #id of first QUERY_RESULT
               ]
            }
         ],
         "Query":{
            "Text":"What is the year to date gross pay", #First Query
            "Alias":"PAYSTUB_YTD_GROSS"
         }
      },
      {
         "BlockType":"QUERY_RESULT",
         "Confidence":87.0,
         "Text":"23,526.80", #Answer to the first Query
         "Geometry":{...},
         "Id":"f2db310c-eaa6-481d-8d18-db0785c33d38" #id of first QUERY_RESULT
      },
      {
         "BlockType":"QUERY",
         "Id":"4e2a17f0-154f-4847-954c-7c2bf2670c52", #id of second QUERY
         "Relationships":[
            {
               "Type":"ANSWER",
               "Ids":[
                  "350ab92c-4128-4aab-a78a-f1c6f6718959"#id of second QUERY_RESULT
               ]
            }
         ],
         "Query":{
            "Text":"What is the current gross pay?", #Second Query
            "Alias":"PAYSTUB_CURRENT_GROSS"
         }
      },
      {
         "BlockType":"QUERY_RESULT",
         "Confidence":95.0,
         "Text":"$ 452.43", #Answer to the Second Query
         "Geometry":{...},
         "Id":"350ab92c-4128-4aab-a78a-f1c6f6718959" #id of second QUERY_RESULT
      }

The response has a BlockType of QUERY that shows the question that was asked and a Relationships section that has the ID for the block that has the answer. The answer is in the BlockType of QUERY_RESULT. The alias that is passed in as an input to the Analyze Document API is returned as part of the response and can be used to label the answer.
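If you prefer not to use the response parser library described next, the following sketch pairs each QUERY block with its QUERY_RESULT by following the Relationships IDs (it assumes response holds the Analyze Document output from the earlier call):

# Index all blocks by ID, then resolve each QUERY's ANSWER relationship
blocks_by_id = {block['Id']: block for block in response['Blocks']}

for block in response['Blocks']:
    if block['BlockType'] != 'QUERY':
        continue
    question = block['Query']['Text']
    alias = block['Query'].get('Alias')
    answers = []
    for relationship in block.get('Relationships', []):
        if relationship['Type'] == 'ANSWER':
            answers = [blocks_by_id[answer_id]['Text'] for answer_id in relationship['Ids']]
    print(f"{question} ({alias}): {', '.join(answers) if answers else 'no answer found'}")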

We use the Amazon Textract Response Parser to extract just the questions, the alias, and the corresponding answers to those questions:

import trp.trp2 as t2

d = t2.TDocumentSchema().load(response)
page = d.pages[0]

# get_query_answers returns a list of [query, alias, answer]
query_answers = d.get_query_answers(page=page)
for x in query_answers:
    print(f"{image_filename},{x[1]},{x[2]}")

from tabulate import tabulate
print(tabulate(query_answers, tablefmt="github"))

The preceding code returns the following results:

|------------------------------------|-----------------------|-----------|
| What is the current gross pay?     | PAYSTUB_CURRENT_GROSS | $ 452.43  |
| What is the year to date gross pay | PAYSTUB_YTD_GROSS     | 23,526.80 |

More questions and the full code can be found in the notebook on the GitHub repo.

Mortgage note

The Analyze Document Queries API also works well with mortgage notes like the following.

The process to call the API and process results is the same as the previous example. You can find the full code example on the GitHub repo.

The following are example responses obtained using the API:

|------------------------------------------------------------|----------------------------------|---------------|
| When is this document dated?                               | MORTGAGE_NOTE_DOCUMENT_DATE      | March 4, 2022 |
| What is the note date?                                     | MORTGAGE_NOTE_DATE               | March 4, 2022 |
| When is the Maturity date the borrower has to pay in full? | MORTGAGE_NOTE_MATURITY_DATE      | April, 2032   |
| What is the note city and state?                           | MORTGAGE_NOTE_CITY_STATE         | Anytown, ZZ   |
| what is the yearly interest rate?                          | MORTGAGE_NOTE_YEARLY_INTEREST    | 4.150%        |
| Who is the lender?                                         | MORTGAGE_NOTE_LENDER             | AnyCompany    |
| When does payments begin?                                  | MORTGAGE_NOTE_BEGIN_PAYMENTS     | April, 2022   |
| What is the beginning date of payment?                     | MORTGAGE_NOTE_BEGIN_DATE_PAYMENT | April, 2022   |
| What is the initial monthly payments?                      | MORTGAGE_NOTE_MONTHLY_PAYMENTS   | $ 2500        |
| What is the interest rate?                                 | MORTGAGE_NOTE_INTEREST_RATE      | 4.150%        |
| What is the principal amount borrower has to pay?          | MORTGAGE_NOTE_PRINCIPAL_PAYMENT  | $ 500,000     |

Vaccination card

The Amazon Textract Queries feature also works very well for extracting information from vaccination cards and cards that resemble them, like in the following example.

The process to call the API and parse the results is the same as used for a paystub. After we process the response, we get the following information:

|------------------------------------------------------------|--------------------------------------|--------------|
| What is the patients first name                            | PATIENT_FIRST_NAME                   | Major        |
| What is the patients last name                             | PATIENT_LAST_NAME                    | Mary         |
| Which clinic site was the 1st dose COVID-19 administrated? | VACCINATION_FIRST_DOSE_CLINIC_SITE   | XYZ          |
| Who is the manufacturer for 1st dose of COVID-19?          | VACCINATION_FIRST_DOSE_MANUFACTURER  | Pfizer       |
| What is the date for the 2nd dose covid-19?                | VACCINATION_SECOND_DOSE_DATE         | 2/8/2021     |
| What is the patient number                                 | PATIENT_NUMBER                       | 012345abcd67 |
| Who is the manufacturer for 2nd dose of COVID-19?          | VACCINATION_SECOND_DOSE_MANUFACTURER | Pfizer       |
| Which clinic site was the 2nd dose covid-19 administrated? | VACCINATION_SECOND_DOSE_CLINIC_SITE  | CVS          |
| What is the lot number for 2nd dose covid-19?              | VACCINATION_SECOND_DOSE_LOT_NUMBER   | BB5678       |
| What is the date for the 1st dose covid-19?                | VACCINATION_FIRST_DOSE_DATE          | 1/18/21      |
| What is the lot number for 1st dose covid-19?              | VACCINATION_FIRST_DOSE_LOT_NUMBER    | AA1234       |
| What is the MI?                                            | MIDDLE_INITIAL                       | M            |

The full code can be found in the notebook on the GitHub repo.

Insurance card

The Queries feature also works well with insurance cards like the following.

The process to call the API and process the results is the same as shown earlier. The full code example is available in the notebook on the GitHub repo.

The following are the example responses obtained using the API:

|-------------------------------------|-----------------------------------|---------------|
| What is the insured name?           | INSURANCE_CARD_NAME               | Jacob Michael |
| What is the level of benefits?      | INSURANCE_CARD_LEVEL_BENEFITS     | SILVER        |
| What is medical insurance provider? | INSURANCE_CARD_PROVIDER           | Anthem        |
| What is the OOP max?                | INSURANCE_CARD_OOP_MAX            | $6000/$12000  |
| What is the effective date?         | INSURANCE_CARD_EFFECTIVE_DATE     | 11/02/2021    |
| What is the office visit copay?     | INSURANCE_CARD_OFFICE_VISIT_COPAY | $55/0%        |
| What is the specialist visit copay? | INSURANCE_CARD_SPEC_VISIT_COPAY   | $65/0%        |
| What is the member id?              | INSURANCE_CARD_MEMBER_ID          | XZ 9147589652 |
| What is the plan type?              | INSURANCE_CARD_PLAN_TYPE          | Pathway X-EPO |
| What is the coinsurance amount?     | INSURANCE_CARD_COINSURANCE        | 30%           |

Best practices for crafting queries

When crafting your queries, consider the following best practices:

  • In general, ask a natural language question that starts with “What is,” “Where is,” or “Who is.” The exception is when you’re trying to extract standard key-value pairs, in which case you can pass the key name as a query.
  • Avoid ill-formed or grammatically incorrect questions, because these could result in unexpected answers. For example, an ill-formed query is “When?” whereas a well-formed query is “When was the first vaccine dose administered?”
  • Where possible, use words from the document to construct the query. Although the Queries feature tries to do acronym and synonym matching for some common industry terms such as “SSN,” “tax ID,” and “Social Security number,” using language directly from the document improves results. For example, if the document says “job progress,” try to avoid using variations like “project progress,” “program progress,” or “job status.”
  • Construct a query that contains words from both the row header and column header. For example, in the preceding vaccination card example, in order to know the date of the second vaccination, you can frame the query as “What date was the 2nd dose administered?”
  • Long answers increase response latency and can lead to timeouts. Try to ask questions that respond with answers fewer than 100 words.
  • Passing only the key name as the question works when trying to extract standard key-value pairs from a form. We recommend framing full questions for all other extraction use cases.
  • Be as specific as possible. For example:
    • When the document contains multiple sections (such as “Borrower” and “Co-Borrower”) and both sections have a field called “SSN,” ask “What is the SSN for Borrower?” and “What is the SSN for Co-Borrower?”
    • When the document has multiple date-related fields, be specific in the query language and ask “What is the date the document was signed on?” or “What is the date of birth of the applicant?” Avoid asking ambiguous questions like “What is the date?”
  • If you know the layout of the document beforehand, give location hints to improve accuracy of results. For example, ask “What is the date at the top?” or “What is the date on the left?” or “What is the date at the bottom?”
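To illustrate these practices, the following hypothetical QueriesConfig uses specific questions to disambiguate the two SSN fields mentioned above and passes a plain key name for a standard key-value pair (the document name is a placeholder):

queries_config = {
    "Queries": [
        # Specific questions disambiguate fields that share the same key name
        {"Text": "What is the SSN for Borrower?", "Alias": "BORROWER_SSN"},
        {"Text": "What is the SSN for Co-Borrower?", "Alias": "CO_BORROWER_SSN"},
        # For a standard key-value pair, the key name alone also works
        {"Text": "Loan number", "Alias": "LOAN_NUMBER"}
    ]
}

response = textract.analyze_document(
    Document={'S3Object': {'Bucket': 'your-s3-bucket', 'Name': 'loan-application.png'}},
    FeatureTypes=['QUERIES'],
    QueriesConfig=queries_config)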

For more information about the Queries feature, refer to the Amazon Textract documentation.

Conclusion

In this post, we provided an overview of the new Queries feature of Amazon Textract to quickly and easily retrieve information from documents such as paystubs, mortgage notes, insurance cards, and vaccination cards based on natural language questions. We also described how you can parse the response JSON.

For more information, see Analyzing Documents, or check out the Amazon Textract console and try out this feature.


About the Authors

Uday Narayanan is a Sr. Solutions Architect at AWS. He enjoys helping customers find innovative solutions to complex business challenges. His core areas of focus are data analytics, big data systems, and machine learning. In his spare time, he enjoys playing sports, binge-watching TV shows, and traveling.

Rafael Caixeta is a Sr. Solutions Architect at AWS based in California. He has over 10 years of experience developing architectures for the cloud. His core areas are serverless, containers, and machine learning. In his spare time, he enjoys reading fiction books and traveling the world.

Navneeth Nair is a Senior Product Manager, Technical with the Amazon Textract team. He is focused on building machine learning-based services for AWS customers.

Martin Schade is a Senior ML Product SA with the Amazon Textract team. He has over 20 years of experience with internet-related technologies, engineering, and architecting solutions. He joined AWS in 2014, first guiding some of the largest AWS customers on the most efficient and scalable use of AWS services, and later focused on AI/ML with a focus on computer vision. Currently, he’s obsessed with extracting information from documents.

Read More

Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra

Organizations use collaborative document authoring solutions like Salesforce Quip to embed real-time, collaborative documents inside Salesforce records. Quip is Salesforce’s productivity platform that transforms the way enterprises work together, delivering modern collaboration securely and simply across any device. A Quip repository captures invaluable organizational knowledge in the form of collaborative documents and workflows. However, finding this organizational knowledge easily and securely along with other document repositories, such as Box or Amazon Simple Storage Service (Amazon S3), can be challenging. Additionally, the conversational nature of collaborative workflows renders the traditional keyword-based approach to search ineffective because information ends up fragmented and dispersed across multiple places.

We’re excited to announce that you can now use the Amazon Kendra connector for Quip to search messages and documents in your Quip repository. In this post, we show you how to find the information you need in your Quip repository using the intelligent search function of Amazon Kendra, powered by machine learning.

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to configure a Quip repository as a data source of a search index using the Amazon Kendra connector for Quip.

The following screenshot shows an example Quip repository.

The workspace in this example has a private folder that is not shared. That folder has a subfolder that is used to keep expense receipts. Another folder called example.com is shared with others and used to collaborate with the team. This folder has five subfolders that hold documentation for development.

To configure the Quip connector, we first note the domain name, folder IDs, and access token of the Quip repository. Then we simply create the Amazon Kendra index and add Quip as a data source.

Prerequisites

To get started using the Quip connector for Amazon Kendra, you must have a Quip repository.

Gather information from Quip

Before we set up the Quip data source, we need a few details about your repository. Let’s gather those in advance.

Domain name

Find out the domain name. For example, for the Quip URL https://example-com.quip.com/browse, the domain name is quip. Depending on how single sign-on (SSO) is set up in your organization, the domain name may vary. Save this domain name to use later.

Folder IDs

Folders in Quip have a unique ID associated with them. We need to configure the Quip connector to access the right folders by supplying the correct folder IDs. For this post, we index the folder example.com.

To find the ID of the folder, choose the folder. The URL changes to show the folder ID.

The folder ID in this case is xj1vOyaCGB3u. Make a list of the folder IDs to scan; we use these IDs when configuring the connector.

Access token

Log in to Quip and open https://{subdomain.domain}/dev/token in a web browser. In the following example, we navigate to https://example-com.quip.com/dev/token. Then choose Get Personal Access Token.

Copy the token to use in a later step.

We now have the information we need to configure the data source.

Create an Amazon Kendra index

To set up your Amazon Kendra index, complete the following steps:

  1. Sign in to the AWS Management Console and open the Amazon Kendra console.

If you’re using Amazon Kendra for the first time, you should see the following screenshot.

  2. Choose Create an index.
  3. For Index name, enter my-quip-example-index.
  4. For Description, enter an optional description.
  5. For IAM role, use an existing role or create a new one.
  6. Choose Next.
  7. Under Access control settings, select No to make all indexed content available to all users.
  8. For User-group expansion, select None.
  9. Choose Next.

For Provisioning editions, you can choose from two options depending on the volume of the content and frequency of access.

  10. For this post, select Developer edition.
  11. Choose Create.

Role creation takes approximately 30 seconds; index creation can take up to 30 minutes. When complete, you can view your index on the Amazon Kendra console.

Add Quip as a data source

Now let’s add Quip as a data source to the index.

  1. On the Amazon Kendra console, under Data management in the navigation pane, choose Data sources.
  2. Choose Add connector under Quip.
  3. For Data source name, enter my-quip-data-source.
  4. For Description, enter an optional description.
  5. Choose Next.
  6. Enter the Quip domain name that you saved earlier.
  7. Under Secrets, choose Create and add a new Secrets Manager secret.
  8. For Secret name, enter the name of your secret.
  9. For Quip token, enter the access token you saved earlier.
  10. Choose Save and add secret.
  11. Under IAM role, choose a role or create a new one.
  12. Choose Next.
  13. Under Sync scope, for Add folder IDs to crawl, enter the folder IDs you saved earlier.
  14. Under Sync run schedule, for Frequency, select Run on demand.
  15. Choose Next.

The Quip connector lets you capture additional fields like authors, categories, and folder names (and even rename them as needed).

  16. For this post, we don’t configure any field mappings.
  17. Choose Next.
  18. Confirm all the options and add the data source.

Your data source is ready in a few minutes.

  19. When your data source is ready, choose Sync now.

Depending on the size of the data in the Quip repository, this process can take a few minutes to a few hours. Syncing is a two-step process. First, the documents are crawled to determine the ones to index. Then the selected documents are indexed. Some factors that affect sync speed include repository throughput and throttling, network bandwidth, and the size of documents.

The sync status shows as successful when the sync is complete. Your Quip repository is now connected.
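The console steps in this section can also be scripted. The following boto3 sketch is one way to register the same Quip data source and start an on-demand sync; it assumes the index, the data source IAM role, and the Secrets Manager secret from the earlier steps already exist, and all IDs and ARNs shown are placeholders:

import boto3

kendra = boto3.client('kendra')

index_id = 'your-kendra-index-id'                                # placeholder
role_arn = 'arn:aws:iam::111122223333:role/your-kendra-ds-role'  # placeholder
secret_arn = 'arn:aws:secretsmanager:us-east-1:111122223333:secret:your-quip-secret'  # placeholder

# Register the Quip repository as a data source for the index
data_source = kendra.create_data_source(
    Name='my-quip-data-source',
    IndexId=index_id,
    Type='QUIP',
    RoleArn=role_arn,
    Configuration={
        'QuipConfiguration': {
            'Domain': 'quip',                # the domain name you noted earlier
            'SecretArn': secret_arn,         # secret that holds the Quip access token
            'FolderIds': ['xj1vOyaCGB3u']    # folder IDs to crawl
        }
    })

# Equivalent to choosing Sync now on the console
kendra.start_data_source_sync_job(Id=data_source['Id'], IndexId=index_id)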

Run a search in Amazon Kendra

Let’s test the connector by running a few searches.

  1. On the Amazon Kendra console, under Data management in the navigation pane, choose Search indexed content.
  2. Enter your search in the search field. For this post, we search for EC2 on Linux.

The following screenshot shows our results.
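You can also run the same search programmatically with the Query API; the following is a minimal boto3 sketch (the index ID is a placeholder):

import boto3

kendra = boto3.client('kendra')

result = kendra.query(
    IndexId='your-kendra-index-id',   # placeholder
    QueryText='EC2 on Linux')

# Print the title and excerpt of each result
for item in result['ResultItems']:
    title = item.get('DocumentTitle', {}).get('Text', '')
    excerpt = item.get('DocumentExcerpt', {}).get('Text', '')
    print(f"{title}: {excerpt}")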

Limitations

There are some known limitations for data source ingestion. Some are due to the admin access required to reach certain content, and others are due to specific implementation details:

  • Only full crawls are supported. If you want the connector to support changelog crawls, admin API access is required, and you have to enable the admin API on the Quip website.
  • Only shared folders are crawled. Even if we use the personal access token of an admin user, we can’t crawl data in the private folders of other users.
  • The solution doesn’t support specifying file types for inclusion and exclusion, because Quip doesn’t store the file type extension, just the file name.
  • Real-time events require a subscription and admin API access.

Conclusion

The Amazon Kendra connector for Quip enables organizations to make the invaluable information stored in Quip documents available to their users securely using intelligent search powered by Amazon Kendra. The connector also provides facets for Quip repository attributes such as authors, file type, source URI, creation dates, parent files, and category so users can interactively refine the search results based on what they’re looking for.

For more information on how you can create, modify, and delete data and metadata using custom document enrichment as content is ingested from the Quip repository, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Authors

Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.

Vikas Shah is an Enterprise Solutions Architect at Amazon web services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.

Read More

Integrate ServiceNow with Amazon Lex chatbot for ticket processing

Conversational interfaces (or chatbots) can provide an intuitive interface for processes such as creating and monitoring tickets. Let’s consider a situation in which a recent hire on your team is required to cut tickets for office equipment. To do so, they have to interact with the ticketing software that the organization uses. This often requires accessing the ticketing system, knowing which ticket to open, and then tracking the ticket manually through the process until completion. In this post, we show you how to integrate an Amazon Lex chatbot with ServiceNow. The bot makes it easier to create and track tickets for day-to-day activities such as issuing new office equipment to new hires. You can also integrate the experience into a customer support call to seamlessly create tickets for callers.

Solution overview

The following diagram illustrates the solution workflow.

The solution includes the following steps:

  1. A user sends a message to create a ticket or get pending tickets in the queue through a Slack app.
  2. Slack forwards the message to be processed by Amazon Lex.
  3. Amazon Lex invokes the fulfillment Lambda function:
    1. Amazon Lex sends the event to the fulfillment AWS Lambda function.
    2. The Lambda function processes the message and makes HTTP requests to the backend ServiceNow instance.
  4. A response is sent to the user:
    1. The ServiceNow instance returns a response to the fulfillment Lambda function.
    2. The fulfillment Lambda function returns a response to the Amazon Lex bot based on the detected sentiment.
    3. Amazon Lex returns the response to the user through the Slack app.
    4. The user is able to see the response on the Slack bot and reply with another query.

To implement this architecture, you create the following:

  • A ServiceNow instance
  • The fulfillment Lambda function
  • An Amazon Lex bot
  • A Slack app

Prerequisites

Before getting started, make sure you have the following prerequisites:

Create the ServiceNow developer instance

To create your ServiceNow instance, complete the following steps:

  1. Sign up for a ServiceNow developer instance.

You receive an email with a personal sandbox environment in the format devNNNNN.service-now.com.

This step sends a verification email to the email address that you used during the signup process.

  2. After you’re verified, you can sign in to your account.
  3. Enter your email and choose Next.

You’re asked if you need a developer-oriented IDE or a guided experience.

  4. For this post, choose I need a guided experience.
  5. Select the check box to agree to the terms of service and choose Finish Setup.

You’re redirected to a page where you should be able to see that the instance is being set up.

When the instance is ready, you should be able to see the instance details.

  6. Note the instance URL, user name, and password, which you use in the following steps.

You need to log in as the system administrator user so you can view the ServiceNow incidents.

  7. Navigate to the following URL (replace https://devNNNNN.service-now.com with your own instance URL that you noted earlier): https://devNNNNN.service-now.com/nav_to.do?uri=change_request_list.do.
  8. Log in using the user name admin and the password you noted earlier.

You’re redirected to the ServiceNow console.

  9. Choose Incidents in the navigation pane.

The default search criteria should show you a sample incident.

  10. If you remove all the search criteria and choose Run, you should be able to see all the ServiceNow incidents available.

The following screenshot shows the search with no filters and the sample ServiceNow incidents.
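You can also confirm your instance credentials from outside the console by calling the ServiceNow Table REST API, which is the same API the Lambda function calls later in this post. The following is a quick sketch; the instance URL and password are the values you noted earlier:

import requests

instance = 'https://devNNNNN.service-now.com'   # your instance URL
auth = ('admin', 'your-password')               # credentials you noted earlier

# List the two most recently updated incidents
response = requests.get(
    f'{instance}/api/now/table/incident',
    params={'sysparm_query': 'ORDERBYDESCsys_updated_on', 'sysparm_limit': 2},
    auth=auth,
    headers={'Accept': 'application/json'})

for record in response.json()['result']:
    print(record['number'], record['short_description'])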

Create the Lambda function

Now that you have set up a ServiceNow instance and logged in to check out the incidents, you’re ready to set up the solution. The first step is to create the Lambda function and configure its environment variables, which securely store the ServiceNow instance URL and credentials so the function can connect to your ServiceNow instance account.

Create the fulfillment Lambda function

In this step, you create a Lambda function that helps the Amazon Lex bot communicate with ServiceNow to create or describe the incidents, and have some logic to frame a response to Amazon Lex based on the sentiment analysis that Amazon Lex forwards to Lambda. To create your function, complete the following steps:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose Create function.
  3. Select Author from scratch.
  4. For Function name, enter a name (for this post, ChatBotLambda).
  5. For Runtime, choose Node.js 14.x.

We use the latest Node.js runtime (as of this writing), but you can use your preferred runtime.

  6. For the function permissions, select Create a new role with basic Lambda permissions.
  7. Use the policy AWSLambdaBasicExecutionRole.

This execution role should be sufficient for this post. For more information, see AWS Lambda execution role.

  8. Choose Create function.
  9. After you create the function, you can use the inline editor to edit the code for index.js.

The following is sample code for the function that you’re using as the compute layer for our logic:

var https = require('https');
exports.handler = async (event, context) => {
    console.log('Received event:',JSON.stringify(event, null,2));
    var intent = event.sessionState.intent.name;
    var slots = event.sessionState.intent.slots;
    var ticketType = slots.ticketType.value.interpretedValue.toLowerCase();
    
    return new Promise((resolve, reject) => {
      if (intent == 'GetTicket') {
        for (var i=0; i<event.interpretations.length; i++){
          if (event.interpretations[i].intent.name == intent){
            var sentimentResponse = event.interpretations[i].sentimentResponse;
            if (sentimentResponse.sentiment != null) {
              var sentimentLabel = sentimentResponse.sentiment;
            }
          }
        }
        
        var ticketCount = slots.ticketCount.value.interpretedValue;
        getTickets(ticketType, ticketCount, sentimentLabel, resolve, reject);       // Get the records
      }
      else if (intent == 'LogTicket') {
        var shortDesc = slots.shortDesc.value.interpretedValue;
        logTicket(ticketType, shortDesc, resolve, reject);
      }
    });
};
// Get tickets from ServiceNow
//
function getTickets(recType, count, sentimentLabel, resolve, reject) {
  var snowInstance = process.env.SERVICENOW_HOST;
  console.log("sentimentLabel:-", sentimentLabel);
  var options = {
      hostname: snowInstance,
      port: 443,
      path: '/api/now/table/' + recType + '?sysparm_query=ORDERBYDESCsys_updated_on&sysparm_limit='+count,
      method: 'get',
      headers: {
        'Content-Type': 'application/json',
        Accept: 'application/json',
        Authorization: 'Basic ' + Buffer.from(process.env.SERVICENOW_USERNAME + ":" + process.env.SERVICENOW_PASSWORD).toString('base64'),
      }
  };
  var request = https.request(options, function(response) {
      var returnData = '';
      response.on('data', chunk => returnData += chunk);
      response.on('end', function() {
        var responseObj = JSON.parse(returnData);
        var speechText = "";
        if(responseObj.result){
          if (sentimentLabel == "NEGATIVE") {
            speechText =  "I am sorry you are having a bad day. Here are the " + count + " most recent incidents: ";
          }
          else {
            speechText =  "Here are the " + count + " most recent incidents: "; 
          }
          for (let i = 0; i < count; i++) {
            var rec_number = i + 1;
            speechText += "Record " + rec_number + " " + responseObj.result[i].short_description + ". ";
          }
          speechText += "End of tickets.";
          var retMsg = {
            "sessionState": {
              "dialogAction": {
                "type": "Close"
              },
              "intent": {
                "confirmationState": "Confirmed",
                "name": "GetTicket",
                "state": "Fulfilled",
              },
            },
            "messages": [
              {
                "contentType": "PlainText",
                "content": speechText,
                
              }
            ]
          };
          resolve(retMsg);
        }
        else{
          reject(JSON.parse('{"Error": "No tickets Found"}'));
        }
      });
      response.on('error', e => reject('error:' + e.message));
    });
    request.end();
}
// Create a ticket in ServiceNow
function logTicket(recType, shortDesc, resolve, reject) {
  var requestData = {
        "short_description": shortDesc,
        "created_by": 'me',
        "caller_id": 'me'
  };
  var postData = JSON.stringify(requestData);
  var options = {
        host: process.env.SERVICENOW_HOST,
        port: '443',
        path: '/api/now/table/' + recType,
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Accept': 'application/json',
            'Authorization': 'Basic ' + Buffer.from(process.env.SERVICENOW_USERNAME + ":" + process.env.SERVICENOW_PASSWORD).toString('base64'),
            'Content-Length': Buffer.byteLength(postData)
        }
    };
    var request = https.request(options, function (res) {
        console.log("res:", res);
        var body = '';
        res.on('data', chunk => body += chunk);
        res.on('end', function() {
        var responseObj = JSON.parse(body);
        console.log("responseObj:", responseObj);
        var ticketNumber = responseObj.result.number;
        var ticketType = responseObj.result.sys_class_name;
          var retMsg = {
            "sessionState": {
              "dialogAction": {
                "type": "Close"
              },
              "intent": {
                "confirmationState": "Confirmed",
                "name": "LogTicket",
                "state": "Fulfilled",
              },
            },
            "messages": [
              {
                "contentType": "PlainText",
                "content": "Done! I've opened an " + ticketType + " ticket for you in ServiceNow. Your ticket number is: " + ticketNumber + "."
              }
            ]
          };  
          resolve(retMsg);
        });
        res.on('error', e => reject('error:' + e.message));
    });
    request.write(postData);
    request.end();
}

Before moving on to the next step, don’t forget to choose Deploy to deploy this code to the $LATEST version of the Lambda function.
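Before wiring the function to Amazon Lex, you can exercise it directly with a hand-built Lex V2-style test event that contains only the fields the handler reads. The following sketch invokes the function with a LogTicket event; it assumes the environment variables described in the next section are already configured:

import boto3
import json

lambda_client = boto3.client('lambda')

# Minimal Lex V2-style event for the LogTicket intent
test_event = {
    "sessionState": {
        "intent": {
            "name": "LogTicket",
            "slots": {
                "ticketType": {"value": {"interpretedValue": "incident"}},
                "shortDesc": {"value": {"interpretedValue": "order a new laptop"}}
            }
        }
    },
    "interpretations": []
}

result = lambda_client.invoke(
    FunctionName='ChatBotLambda',
    Payload=json.dumps(test_event))

print(json.loads(result['Payload'].read()))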

Configure the fulfillment Lambda function

Next, you create the following environment variables with appropriate values. These variables securely store the ServiceNow instance URL and credentials. Every time a user sends a message through the Amazon Lex bot to create or retrieve incident tickets, this Lambda function is invoked and makes a request to the ServiceNow instance, so it needs the instance URL and credentials to connect.

  • SERVICENOW_HOST – The domain name for the ServiceNow instance that you created earlier
  • SERVICENOW_USERNAME – The user name for the system administrator role (admin)
  • SERVICENOW_PASSWORD – The password that you received earlier

These variables are available on the Configuration tab, as shown in the following screenshot.
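If you prefer to set these values from code rather than the console, one option is the UpdateFunctionConfiguration API; the following is a sketch with placeholder values:

import boto3

lambda_client = boto3.client('lambda')

lambda_client.update_function_configuration(
    FunctionName='ChatBotLambda',
    Environment={
        'Variables': {
            'SERVICENOW_HOST': 'devNNNNN.service-now.com',   # instance domain only, no https://
            'SERVICENOW_USERNAME': 'admin',
            'SERVICENOW_PASSWORD': 'your-password'
        }
    })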

Create the Amazon Lex chatbot

Now that you have created the Lambda function, you create the conversational interface (the chatbot) using Amazon Lex. For this post, you build the chatbot IncidentBot to communicate with ServiceNow and read or create incident tickets to process the events. This type of bot can be created for organizations or businesses that have multiple interfaces to internal systems, ranging from HR to travel to support, which employees must otherwise memorize or bookmark. The chatbot also performs sentiment analysis on the users’ messages sent via the bot and returns a response based on the detected sentiment.

You create two intents:

  • GetTicket – Gets the existing tickets from ServiceNow
  • LogTicket – Submits a new ticket, which creates a ServiceNow incident in our instance

This post uses the following conversation to model a bot:

  • User: Create an incident ticket to order a new laptop.
  • IncidentBot: Done! I’ve opened an incident ticket for you in ServiceNow. Your ticket number is: INC0010006.
  • User: List top 2 incident.
  • IncidentBot: Here are the 2 most recent incidents: Record 1 order a new laptop. Record 2 request access to ServiceNow. End of tickets.

The Lambda function that you set up earlier only works with Lex V2. If you’re using the V1 console, choose Try the new Lex V2 Console as shown in the following screenshot, or choose Switch to the new Lex V2 Console in the navigation pane.

Complete the following steps to create your chatbot:

  1. Download the file IncidentBot.zip.
  2. On the Amazon Lex console, choose Bots in the navigation pane.
  3. On the Action menu, choose Import.
  4. For Bot name, enter IncidentBot.
  5. For Input file, choose Browse file and choose the .zip file you downloaded.
  6. Select Create a role with basic Amazon Lex permissions.

This creates a new IAM role that the chatbot uses to make requests to other AWS services.

  7. In the section Children’s Online Privacy Protection Act (COPPA), select No (COPPA doesn’t apply to this example).
  8. Keep the remaining fields at their default and choose Create bot.
  9. When the bot is available, choose Aliases in the navigation pane to see the alias created for this bot.
  10. Choose the alias TestBotAlias to see the alias details.

As shown in the following screenshot, this chatbot just uses the language English (US).

To have an effective conversation, it’s important to understand the sentiment and respond appropriately. In a conversation, a simple acknowledgment when talking to an unhappy user might be helpful, such as, “I am sorry you are having a bad day.”

To achieve such a conversational flow with a bot, you have to detect the sentiment expressed by the user and react appropriately. Previously, you had to build a custom integration by using Amazon Comprehend APIs. As of this writing, you can determine the sentiment natively in Amazon Lex.

You can enable sentiment analysis on the Lex V2 bot by editing the alias.

  11. On the alias details page, choose Edit.
  12. Select Enable sentiment analysis and choose Confirm.

For this post, you analyze the messages that you receive from end users to understand their mood and return an appropriate response. This is governed by the Lambda logic, which uses the detected sentiment to change the response text accordingly.

  13. To add the function to the alias, on the alias details page, choose English (US).
  14. For Source, choose ChatBotLambda.
  15. For Lambda function version or alias, choose $LATEST.
  16. Choose Save.

You’re now ready to build the intent.

  17. In the navigation pane, choose Bot versions.
  18. Choose the draft version of your bot to see its details.
  19. Choose Intents in the navigation pane to explore the intents you created.
  20. To build the bot, choose Build.

Test the Amazon Lex bot

We test the following scenarios:

  • The user sends a message to create a new ServiceNow incident using the example utterance “create an incident ticket with request access to ServiceNow.”
  • The user retrieves the existing ServiceNow incidents using the utterance “list top 2 incident tickets.”
  • The user can also show negative sentiment in the message and retrieve the response accordingly using the utterance “what are the top 2 bad incident tickets.”

To test the bot, on the Intents page, choose Test.

As shown in the following screenshot, you created two incident tickets using the following utterances:

  • create an incident ticket with request access to service now
  • create an incident ticket with order a new laptop

This creates two tickets in the ServiceNow instance.

Now let’s retrieve the last two tickets using the utterance “list top 2 incident tickets.”

You can test sentiment analysis as shown in the following screenshot, in which the bot responds to a negative sentiment.
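You can also send the same test utterances programmatically through the Lex V2 runtime API; the following is a minimal boto3 sketch (the bot ID and alias ID are placeholders that you can copy from the bot details page):

import boto3

lex_runtime = boto3.client('lexv2-runtime')

response = lex_runtime.recognize_text(
    botId='YOUR_BOT_ID',           # placeholder
    botAliasId='YOUR_ALIAS_ID',    # placeholder for the TestBotAlias ID
    localeId='en_US',
    sessionId='test-session-1',
    text='list top 2 incident tickets')

for message in response.get('messages', []):
    print(message['content'])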

Create a Slack app and integrate Slack with the bot

You can integrate the Amazon Lex bot with various web or mobile applications and client-side codes, in addition to popular platforms like Facebook Messenger, Slack, Kik, and Twilio SMS. For this post, you create a Slack app and integrate your Amazon Lex bot with Slack. For instructions, see Integrating an Amazon Lex Bot with Slack.

An added benefit is that the chatbot can determine the sentiment of the user and respond accordingly. Real-time sentiment analysis gives supervisors the feedback they need in an organic and automated way, without requiring a separate process for feedback collection. Supervisors can use the sentiment analysis to track negative sentiment on tickets created by users, and the Lambda function can use it to tailor the response, for example, returning different responses for negative versus positive or neutral sentiment.

You should now be able to use your Slack app to send messages to the Amazon Lex bot and retrieve the same responses as you tested earlier. The following screenshot shows the same messages tested on the Slack app, with the same results.

Congratulations! You just built an incident bot using Amazon Lex with sentiment analysis that integrates with ServiceNow.

Clean up

To avoid incurring future charges, delete the resources that you created and clean up your account.

You can clean up the AWS environment using the following steps:

  1. On the Lex V2 console, choose Bots in the navigation pane to see a list of all your Lex V2 bots.
  2. Select the bot you created and on the Actions menu, choose Delete.
  3. On the Lambda console, choose Functions in the navigation pane.
  4. Select the function you created and on the Actions menu, choose Delete.

Conclusion

This post showed how you can integrate an Amazon Lex bot with ServiceNow incident management and a Slack app. You can integrate the same experience to create and manage tickets as part of your customer support calls. For more information about incorporating these techniques into your bots, see the Lex V2 Developer Guide.


About the Authors

Chanki Nathani is a Cloud Application Architect for AWS Professional Services. As an architect, he supports customers with architecting, designing, automating and building new applications, as well as migrating existing applications to AWS. He is passionate about Cloud and Serverless Technologies. In his spare time, he enjoys traveling and blogging about food from different places.

Vaibhav Chaddha is a Machine Learning Engineer with AWS Professional Services. He spends his time helping customers design and implement solutions using Amazon ML services, to address their business challenges.

Read More

Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker

Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech signals to text, also referred to as speech-to-text applications.

This technology has matured in recent years, and many of the latest models can achieve very good performance, such as the transformer-based models Wav2Vec2 and Speech2Text. Transformer is a sequence-to-sequence deep learning architecture originally proposed for machine translation. Now it’s extended to solve all kinds of natural language processing (NLP) tasks, such as text classification, text summarization, and ASR. The transformer architecture yields very good model performance and results in various NLP tasks; however, the models’ sizes (the number of parameters) as well as the amount of data they’re pre-trained on increase exponentially when pursuing better performance. It becomes very time-consuming and costly to train a transformer from scratch; for example, training a BERT model from scratch could take 4 days and cost $6,912 (for more information, see The Staggering Cost of Training SOTA AI Models). Hugging Face, an AI company, provides an open-source platform where developers can share and reuse thousands of pre-trained transformer models. With the transfer learning technique, you can fine-tune your model with a small set of labeled data for a target use case. This reduces the overall compute cost, speeds up the development lifecycle, and lessens the carbon footprint of the community.

AWS announced collaboration with Hugging Face in 2021. Developers can easily work with Hugging Face models on Amazon SageMaker and benefit from both worlds. You can fine-tune and optimize all models from Hugging Face, and SageMaker provides managed training and inference services that offer high performance resources and high scalability via Amazon SageMaker distributed training libraries. This collaboration can help you accelerate your NLP tasks’ productization journey and realize business benefits.

This post shows how to use SageMaker to easily fine-tune the latest Wav2Vec2 model from Hugging Face, and then deploy the model with a custom-defined inference process to a SageMaker managed inference endpoint. Finally, you can test the model performance with sample audio clips, and review the corresponding transcription as output.

Wav2Vec2 background

Wav2Vec2 is a transformer-based architecture for ASR tasks and was released in September 2020. The following diagram shows its simplified architecture. For more details, see the original paper. As the diagram shows, the model is composed of a multi-layer convolutional neural network (CNN) as a feature extractor, which takes an input audio signal and outputs audio representations, also considered as features. They are fed into a transformer network to generate contextualized representations. This part of training can be self-supervised; the transformer can be trained with unlabeled speech and learn from it. Then the model is fine-tuned on labeled data with the Connectionist Temporal Classification (CTC) algorithm for specific ASR tasks. The base model we use in this post is Wav2Vec2-Base-960h, fine-tuned on 960 hours of Librispeech on 16 kHz sampled speech audio.

CTC is a character-based algorithm. During training, it’s able to demarcate each character of the transcription in the speech automatically, so the timeframe alignment isn’t required between audio signal and transcription. For example, if the audio clip says “Hello World,” we don’t need to know in which second the word “hello” is located. It saves a lot of labeling effort for ASR use cases. For more information about how the algorithm works, refer to Sequence Modeling With CTC.
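To see what the pre-trained checkpoint produces before any fine-tuning, you can run it directly from the Hugging Face Hub. The following is a minimal sketch that uses the facebook/wav2vec2-base-960h checkpoint and one clip from the SUPERB dataset introduced in the next section:

import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Take a single 16 kHz audio clip from the SUPERB test split
sample = load_dataset("superb", "asr", split="test[:1]", ignore_verifications=True)[0]["audio"]

inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: most likely token per frame, then collapse repeats and blanks
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])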

Solution overview

In this post, we use the SUPERB (Speech processing Universal PERformance Benchmark) dataset available from the Hugging Face Datasets library, and fine-tune the Wav2Vec2 model and deploy it as a SageMaker endpoint for real-time inference for an ASR task. SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.

The following diagram provides a high-level view of the solution workflow.

First, we show how to load and preprocess the SUPERB dataset in a SageMaker environment in order to obtain a tokenizer and feature extractor, which are required for fine-tuning the Wav2Vec2 model. Then we use SageMaker Script Mode for training and inference steps, which allows you to define and use custom training and inference scripts, and SageMaker provides supported Hugging Face framework Docker containers. For more information about training and serving Hugging Face models on SageMaker, see Use Hugging Face with Amazon SageMaker. This functionality is available through the development of Hugging Face AWS Deep Learning Containers (DLCs).

The notebook and code from this post are available on GitHub. The notebook is tested in both Amazon SageMaker Studio and SageMaker notebook environments.

Data preprocessing

In this section, we walk through the steps to preprocess the data.

Process the dataset

In this post, we use the SUPERB dataset, which you can load from the Hugging Face Datasets library directly using the load_dataset function. The SUPERB dataset also includes speaker_id and chapter_id; we remove these columns and only keep audio files and transcriptions to fine-tune the Wav2Vec2 model for an ASR task, which transcribes speech to text. To speed up the fine-tuning process for this example, we only take the test dataset from the original dataset, then split it into train and test datasets. See the following code:

data = load_dataset("superb", 'asr', ignore_verifications=True) 
data = data.remove_columns(['speaker_id', 'chapter_id', 'id'])
# reduce the data volume for this example. only take the test data from the original dataset for fine-tune
data = data['test'] 

train_test = data.train_test_split(test_size=0.2)
dataset = DatasetDict({
    'train': train_test['train'],
    'test': train_test['test']})

After we process the data, the dataset structure is as follows:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 2096
    })
    test: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 524
    })
})

Let’s print one data point from the train dataset and examine the information in each feature. ‘file’ is the audio file path where it’s saved and cached in the local repository. ‘audio’ contains three components: ‘path’ is the same as ‘file’, ‘array’ is the numerical representation of the raw waveform of the audio file in NumPy array format, and ‘sampling_rate’ shows the number of samples of audio recorded every second. ‘text’ is the transcript of the audio file.

print(dataset['train'][0])
result: 
{'file': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
 'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
  'array': array([-0.00018311, -0.00024414, -0.00018311, ...,  0.00061035,
          0.00064087,  0.00061035], dtype=float32),
  'sampling_rate': 16000},
 'text': 'but anders cared nothing about that'}

Build a vocabulary file

The Wav2Vec2 model uses the CTC algorithm to train deep neural networks in sequence problems, and its output is a single letter or blank. It uses a character-based tokenizer. Therefore, we extract distinct letters from the dataset and build the vocabulary file using the following code:

def extract_characters(batch):
  texts = " ".join(batch["text"])
  vocab = list(set(texts))
  return {"vocab": [vocab], "texts": [texts]}

vocabs = dataset.map(extract_characters, batched=True, batch_size=-1, 
                   keep_in_memory=True, remove_columns= dataset.column_names["train"])

vocab_list = list(set(vocabs["train"]["vocab"][0]) | set(vocabs["test"]["vocab"][0]))
vocab_dict = {v: k for k, v in enumerate(vocab_list)}
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]

vocab_dict["[UNK]"] = len(vocab_dict) # add "unknown" token 
vocab_dict["[PAD]"] = len(vocab_dict) # add a padding token that corresponds to CTC's "blank token"

with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)

Create a tokenizer and feature extractor

The Wav2Vec2 model contains a tokenizer and feature extractor. In this step, we use the vocab.json file that we created from the previous step to create the Wav2Vec2CTCTokenizer. We use Wav2Vec2FeatureExtractor to make sure that the dataset used in fine-tuning has the same audio sampling rate as the dataset used for pre-training. Finally, we create a Wav2Vec2 processor that can wrap the feature extractor and the tokenizer into one single processor. See the following code:

from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor, Wav2Vec2Processor

# create Wav2Vec2 tokenizer
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                  pad_token="[PAD]", word_delimiter_token="|")

# create Wav2Vec2 feature extractor
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000,
                                             padding_value=0.0, do_normalize=True, return_attention_mask=False)
# create a processor pipeline
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

Prepare the train and test datasets

Next, we extract the array representation of the audio files and their sampling rate from the dataset and process them with the processor, so that the train and test data can be consumed by the model:

# extract the numerical representation from the dataset
def extract_array_samplingrate(batch):
    batch["speech"] = batch['audio']['array'].tolist()
    batch["sampling_rate"] = batch['audio']['sampling_rate']
    batch["target_text"] = batch["text"]
    return batch

dataset = dataset.map(extract_array_samplingrate, 
                      remove_columns=dataset.column_names["train"])

# process the dataset with the processor pipeline created above
def process_dataset(batch):  
    batch["input_values"] = processor(batch["speech"], 
                            sampling_rate=batch["sampling_rate"][0]).input_values

    with processor.as_target_processor():
        batch["labels"] = processor(batch["target_text"]).input_ids
    return batch

data_processed = dataset.map(process_dataset, 
                    remove_columns=dataset.column_names["train"], batch_size=8, 
                    batched=True)

train_dataset = data_processed['train']
test_dataset = data_processed['test']

Then we upload the train and test data to Amazon Simple Storage Service (Amazon S3) using the following code:

from datasets.filesystems import S3FileSystem
s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f's3://{BUCKET}/{PREFIX}/train'
train_dataset.save_to_disk(training_input_path,fs=s3)

# save test_dataset to s3
test_input_path = f's3://{BUCKET}/{PREFIX}/test'
test_dataset.save_to_disk(test_input_path,fs=s3)

Fine-tune the Hugging Face model (Wav2Vec2)

We use SageMaker Hugging Face DLC script mode to construct the training and inference jobs, which allows you to write custom training and serving code while using the Hugging Face framework containers that are maintained and supported by AWS.

When we create a training job using script mode, the entry_point script, hyperparameters, its dependencies (inside requirements.txt), and input data (train and test datasets) are copied into the container. SageMaker then invokes the entry_point training script, which loads the train and test datasets, performs the training steps, and saves the model artifacts to /opt/ml/model in the container. After training, the artifacts in this directory are uploaded to Amazon S3 for later model hosting.

You can inspect the training script in the GitHub repo, in the scripts/ directory.
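While the repository’s train.py is the authoritative implementation, the following minimal sketch (with assumed argument names matching the hyperparameters defined later in this post) illustrates the script-mode contract: hyperparameters arrive as command-line arguments, the train and test channels and the model output directory are exposed through environment variables, and anything written to /opt/ml/model is uploaded to Amazon S3 after training.

import argparse
import os

from datasets import load_from_disk

# Minimal sketch of a script-mode entry point; the actual train.py in the repository
# additionally builds the processor, data collator, Trainer, and WER metric.
def parse_args():
    parser = argparse.ArgumentParser()
    # hyperparameters passed to the Hugging Face estimator arrive as CLI arguments
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--train_batch_size", type=int, default=8)
    parser.add_argument("--model_name", type=str, default="facebook/wav2vec2-base")
    parser.add_argument("--vocab_url", type=str, default=None)
    # SageMaker exposes the input channels and the model output directory as environment variables
    parser.add_argument("--train_dir", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test_dir", type=str, default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--model_dir", type=str, default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    # the datasets saved to Amazon S3 earlier are copied into the container by SageMaker
    train_dataset = load_from_disk(args.train_dir)
    test_dataset = load_from_disk(args.test_dir)
    print(f"Loaded {len(train_dataset)} train and {len(test_dataset)} test examples")
    # ... fine-tune the model here and save the artifacts to args.model_dir,
    # which SageMaker uploads to Amazon S3 after the job completes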

Create an estimator and start a training job

We use the Hugging Face estimator class to train our model. When creating the estimator, you need to specify the following parameters:

  • entry_point – The name of the training script. It loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model.
  • source_dir – The location of the training scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use.
  • pytorch_version – The PyTorch version that’s compatible with the Transformers library.

For this use case and dataset, we use one ml.p3.2xlarge instance, and the training job finishes in around 2 hours. You can select a more powerful instance with more memory and GPUs to reduce the training time; however, it incurs more cost.

When you create a Hugging Face estimator, you can configure hyperparameters and pass custom parameters to the training script, such as vocab_url in this example. You can also specify metric definitions in the estimator; SageMaker parses these metrics from the training logs and sends them to Amazon CloudWatch so you can monitor and track training performance. For more details, see Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics.

import time

from sagemaker.huggingface import HuggingFace

# create a unique id to tag the training job, model name, and endpoint name
id = int(time.time())

TRAINING_JOB_NAME = f"huggingface-wav2vec2-training-{id}"
vocab_url = f"s3://{BUCKET}/{PREFIX}/vocab.json"

hyperparameters = {'epochs':10, # you can increase the epoch number to improve model accuracy
                   'train_batch_size': 8,
                   'model_name': "facebook/wav2vec2-base",
                   'vocab_url': vocab_url
                  }
                  
# define metrics definitions
metric_definitions=[
        {'Name': 'eval_loss', 'Regex': "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_wer', 'Regex': "'eval_wer': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_runtime', 'Regex': "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_samples_per_second', 'Regex': "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'epoch', 'Regex': "'epoch': ([0-9]+(.|e-)[0-9]+),?"}]

OUTPUT_PATH= f's3://{BUCKET}/{PREFIX}/{TRAINING_JOB_NAME}/output/'

huggingface_estimator = HuggingFace(entry_point='train.py',
                                    source_dir='./scripts',
                                    output_path= OUTPUT_PATH, 
                                    instance_type='ml.p3.2xlarge',
                                    instance_count=1,
                                    transformers_version='4.6.1',
                                    pytorch_version='1.7.1',
                                    py_version='py36',
                                    role=ROLE,
                                    hyperparameters = hyperparameters,
                                    metric_definitions = metric_definitions,
                                   )

# start the training job using the fit function; training takes approximately 2 hours to complete
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path},
                          job_name=TRAINING_JOB_NAME)

In the following figure of CloudWatch training job logs, you can see that, after 10 epochs of training, the model evaluation metrics WER (word error rate) can achieve around 0.17 for the subset of the SUPERB dataset. WER is a commonly used metric to evaluate speech recognition model performance, and the objective is to minimize it. You can increase the number of epochs or use the full SUPERB dataset to improve the model further.
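WER measures the proportion of word-level substitutions, insertions, and deletions needed to turn a predicted transcript into the reference transcript. If you want to compute it yourself on a few predictions outside the training job, a small sketch with the Hugging Face datasets metric (the same metric reported as eval_wer above) could look like the following; the example strings are made up.

from datasets import load_metric

# the "wer" metric requires the jiwer package (pip install jiwer)
wer_metric = load_metric("wer")

references = ["but anders cared nothing about that"]
predictions = ["but anders cared nothing about it"]  # hypothetical model output

# WER = (substitutions + insertions + deletions) / number of words in the reference
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2f}")  # 1 substitution over 6 reference words, approximately 0.17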

Deploy the model as an endpoint on SageMaker and run inference

In this section, we walk through the steps to deploy the model and perform inference.

Inference script

We use the SageMaker Hugging Face Inference Toolkit to host our fine-tuned model. It provides default functions for preprocessing, predicting, and postprocessing for certain tasks. However, the default handlers can’t serve our model properly. Therefore, we define the custom functions model_fn(), input_fn(), predict_fn(), and output_fn() in the inference.py script to override the default behavior with our custom requirements. For more details, refer to the GitHub repo.

As of January 2022, the Inference Toolkit supports inference only for tasks from architectures whose names end with 'TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', and 'T5WithLMHeadModel'. The Wav2Vec2 model is not currently supported.

You can inspect the full inference script in the GitHub repo, in the scripts/ directory.
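The inference.py in the repository is the authoritative version; the following condensed sketch only illustrates the shape of the four overrides, assuming the processor and model were saved together in the model directory: model_fn loads the fine-tuned model and processor, input_fn deserializes the JSON request used later in this post, predict_fn runs CTC decoding, and output_fn serializes the transcript.

# Condensed, hypothetical sketch of the custom inference handlers;
# see scripts/inference.py in the repository for the actual implementation.
import json

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

def model_fn(model_dir):
    # load the fine-tuned model and processor saved by the training job
    processor = Wav2Vec2Processor.from_pretrained(model_dir)
    model = Wav2Vec2ForCTC.from_pretrained(model_dir)
    return {"model": model, "processor": processor}

def input_fn(request_body, content_type="application/json"):
    # the request carries the raw waveform and its sampling rate as JSON
    return json.loads(request_body)

def predict_fn(data, model_artifacts):
    model = model_artifacts["model"]
    processor = model_artifacts["processor"]
    inputs = processor(data["speech_array"],
                       sampling_rate=data["sampling_rate"],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    # decode the predicted token ids back into a transcript
    return processor.batch_decode(predicted_ids)

def output_fn(prediction, accept="application/json"):
    return json.dumps(prediction)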

Create a Hugging Face model from the estimator

We use the Hugging Face Model class to create a model object, which you can deploy to a SageMaker endpoint. When creating the model, specify the following parameters:

  • entry_point – The name of the inference script. The methods defined in the inference script are implemented to the endpoint.
  • source_dir – The location of the inference scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use. It should be consistent with the training step.
  • pytorch_version – The PyTorch version that is compatible with the Transformers library. It should be consistent with the training step.
  • model_data – The Amazon S3 location of a SageMaker model data .tar.gz file.

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
        entry_point = 'inference.py',
        source_dir='./scripts',
        name = f'huggingface-wav2vec2-model-{id}',
        transformers_version='4.6.1', 
        pytorch_version='1.7.1', 
        py_version='py36',
        model_data=huggingface_estimator.model_data,
        role=ROLE,
    )

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge", 
    endpoint_name = f'huggingface-wav2vec2-endpoint-{id}'
)

When you create a predictor by using the model.deploy function, you can change the instance count and instance type based on your performance requirements.

Inference audio files

After you deploy the endpoint, you can run prediction tests to check the model performance. You can download an audio file from the S3 bucket by using the following code:

import boto3
s3 = boto3.client('s3')
s3.download_file(BUCKET, 'huggingface-blog/sample_audio/xxx.wav', 'downloaded.wav')
file_name ='downloaded.wav'

Alternatively, you can download a sample audio file to run the inference request:

import soundfile
!wget https://datashare.ed.ac.uk/bitstream/handle/10283/343/MKH800_19_0001.wav
file_name ='MKH800_19_0001.wav'
speech_array, sampling_rate = soundfile.read(file_name)
json_request_data = {"speech_array": speech_array.tolist(),
                     "sampling_rate": sampling_rate}

prediction = predictor.predict(json_request_data)
print(prediction)

The predicted result is as follows:

['"she had your dark suit in grecy wash water all year"', 'application/json']

Clean up

When you’re finished using the solution, delete the SageMaker endpoint to avoid ongoing charges:

predictor.delete_endpoint()
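If you also want to remove the SageMaker model resource that was registered during deployment, you can additionally call delete_model on the predictor:

# optionally remove the SageMaker model resource created by model.deploy
predictor.delete_model()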

Conclusion

In this post, we showed how to fine-tune the pre-trained Wav2Vec2 model on SageMaker using a Hugging Face estimator, and how to host the model on SageMaker as a real-time inference endpoint using the SageMaker Hugging Face Inference Toolkit. For both the training and inference steps, we provided custom scripts for greater flexibility, which are enabled and supported by SageMaker Hugging Face DLCs. You can use the method from this post to fine-tune a Wav2Vec2 model with your own datasets, or to fine-tune and deploy a different transformer model from Hugging Face.

Check out the notebook and code of this project from GitHub, and let us know your comments. For more comprehensive information, see Hugging Face on SageMaker and Use Hugging Face with Amazon SageMaker.

In addition, Hugging Face and AWS announced a partnership in 2022 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through the development of Hugging Face AWS DLCs. These containers include the Hugging Face Transformers, Tokenizers, and Datasets libraries, which allow us to use these resources for training and inference jobs. For a list of the available DLC images, see Available Deep Learning Containers Images. They are maintained and regularly updated with security patches. You can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo.


About the Author

Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her main areas of interest are deep learning, computer vision, NLP, and time series data prediction. In her spare time, she enjoys reading novels and hiking in national parks in the UK.

Read More

Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect

Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for decisions on their credit applications. To retain customers, organizations need to process borrower applications quickly with low turnaround times.

With an automated credit approval assistant using machine learning, financial organizations can expedite the process, reduce cost, and provide better customer experience with faster decisions. Banks and Fintechs can build a virtual agent that can review a customer’s financial documents and provide a decision instantly. Building an effective credit approval process not only improves the customer experience, but also lowers the cost.

In this post, we show how to build a virtual credit approval assistant that reviews the financial documents required for loan approval and makes decisions instantly for a seamless customer experience. The solution uses Amazon Lex, Amazon Textract, and Amazon Connect, among other AWS services.

Overview of the solution

You can deploy the solution using an AWS CloudFormation template. The solution creates a virtual agent using Amazon Lex and associates it with Amazon Connect, which acts as the conversational interface with customers and asks the loan applicant to upload the necessary documents. The documents are stored in an Amazon Simple Storage Service (Amazon S3) bucket used only for that customer.

This solution is completely serverless and uses Amazon S3 to store a static website that hosts the front end and custom JavaScript to enable the rest of the requests. Amazon CloudFront serves as a content delivery network (CDN) to allow a public front end for the website. CloudFront is a fast CDN service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment.

This is a sample project designed to be easily deployable for experimentation. The AWS Identity and Access Management (IAM) policy permissions in this solution use least privilege; however, the CloudFront and Amazon API Gateway resources deployed are publicly accessible. To take the appropriate measures to secure your CloudFront distribution and API Gateway resources, refer to Configuring secure access and restricting access to content and Security in Amazon API Gateway, respectively.

Additionally, the backend features API Gateway with HTTP routes for two AWS Lambda functions. The first function creates the session with Amazon Connect for chat; the second passes the pre-signed URL link fetched by the front end from Amazon Connect to Amazon Lex. Amazon Lex triggers the Lambda function associated with it and lets Amazon Textract read the documents and capture all the fields and information in them. This function also makes the credit decisions based on business processes previously defined by the organization. The solution is integrated with Amazon Connect to let customers connect to contact center agents if the customer is having difficulty or needs help through the process.
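To illustrate the first function (the code in the repository is the reference; the event shape and environment variable names below are assumptions), a Lambda handler that starts the chat session might look like the following sketch. It calls the Amazon Connect StartChatContact API and returns the identifiers the front end needs to join the chat through the Amazon Connect Participant Service.

# Hypothetical sketch of the Lambda function that starts the Amazon Connect chat session;
# the actual function in the repository may differ in event shape and response format.
import json
import os

import boto3

connect = boto3.client("connect")

def lambda_handler(event, context):
    body = json.loads(event.get("body", "{}"))
    response = connect.start_chat_contact(
        InstanceId=os.environ["CONNECT_INSTANCE_ID"],         # assumed environment variable
        ContactFlowId=os.environ["CONNECT_CONTACT_FLOW_ID"],  # assumed environment variable
        ParticipantDetails={"DisplayName": body.get("name", "Customer")},
        Attributes={"customerName": body.get("name", "Customer")},
    )
    # the front end uses these identifiers to connect to the chat session
    return {
        "statusCode": 200,
        "body": json.dumps({
            "ContactId": response["ContactId"],
            "ParticipantId": response["ParticipantId"],
            "ParticipantToken": response["ParticipantToken"],
        }),
    }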

The following example depicts the interaction between bot and borrower.

The following diagram illustrates the solution architecture.

The solution workflow is as follows:

  1. Customers navigate to a URL served by CloudFront, which fetches webpages from an S3 bucket and sends JavaScript to the web browser.
  2. The web browser renders the webpages and makes an API call to API Gateway.
  3. API Gateway triggers the associated Lambda function.
  4. The function initiates a startChatContact API call with Amazon Connect and triggers the contact flow associated with it.
  5. Amazon Connect triggers Amazon Lex with the utterance to classify the intent. After the intent is classified, Amazon Lex elicits the required slots and asks the customer to upload the document to fulfill the intent.
  6. The applicant uploads the W2 document to the S3 bucket using the upload attachment icon in the chat window.

As a best practice, consider implementing encryption at rest for the S3 bucket using AWS Key Management Service (AWS KMS). Additionally, you can attach a bucket policy to the S3 bucket to ensure data is always encrypted in transit. Consider enabling server access logging for the S3 bucket to capture detailed records of requests to assist with security and access audits. For more information, see Security Best Practices for Amazon S3.
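For example, a minimal sketch of applying both recommendations with boto3 might look like the following; the bucket name and KMS key alias are placeholders, not resources created by this solution.

# Illustrative hardening of the attachments bucket; bucket name and KMS key alias are placeholders.
import json

import boto3

s3 = boto3.client("s3")
bucket = "my-credit-app-attachments-bucket"  # placeholder

# enable default encryption at rest with an AWS KMS key
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/my-app-key",  # placeholder
            }
        }]
    },
)

# deny any request to the bucket that is not made over TLS
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))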

  7. The web browser makes a call to Amazon Connect to retrieve a pre-signed URL of the uploaded image. Make sure the pre-signed URLs expire a few minutes after the Lambda function runs the logic.
  8. After the document has been uploaded successfully, the web application makes an API call to API Gateway to update the file location for use in Amazon Lex session attributes.
  9. API Gateway triggers a Lambda function to pass the W2 pre-signed URL location. The function updates the session attributes in Amazon Lex with the pre-signed URL of the W2 document.
  10. The web browser also updates the slot to uploaded, which fulfills the intent.
  11. Amazon Lex triggers a Lambda function, which downloads the W2 image data and sends it to Amazon Textract for processing.
  12. Amazon Textract reads all the fields from the W2 image document, converts them into key-value pairs, and passes the data back to the Lambda function (see the sketch after this workflow for an illustration of this step).

Amazon Textract conforms to the AWS shared responsibility model, which outlines the responsibilities for data protection between AWS and the customer. For more information, refer to Data Protection in Amazon Textract.

  13. Lambda uses the W2 data to evaluate the loan application and returns the result to the web browser.

Follow the best practices for enabling logging in Lambda. Refer to part 1 and part 2 of the blog series “Operating Lambda: Building a solid security foundation.”

Data in transit is secured using TLS, and we highly recommend encrypting data at rest. For more information about protecting data inside your S3 bucket, refer to Strengthen the security of sensitive data stored in Amazon S3 by using additional AWS services.
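To make the Amazon Textract step (step 12 in the preceding workflow) concrete, here is a minimal, hypothetical sketch of how a Lambda function could call the AnalyzeDocument API with the FORMS feature and assemble the key-value pairs; the actual Lambda function in the repository also applies the credit decision rules.

# Minimal sketch of extracting key-value pairs from a W2 image with Amazon Textract;
# the Lambda function in the repository also applies the credit decision logic.
import boto3

textract = boto3.client("textract")

def extract_key_values(image_bytes):
    response = textract.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS"],
    )
    blocks = {b["Id"]: b for b in response["Blocks"]}

    def block_text(block):
        # concatenate the WORD children of a KEY or VALUE block
        words = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                for child_id in rel["Ids"]:
                    child = blocks[child_id]
                    if child["BlockType"] == "WORD":
                        words.append(child["Text"])
        return " ".join(words)

    # pair each KEY block with the text of its linked VALUE block
    key_values = {}
    for block in response["Blocks"]:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            value_text = ""
            for rel in block.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    for value_id in rel["Ids"]:
                        value_text = block_text(blocks[value_id])
            key_values[block_text(block)] = value_text
    return key_values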

Prerequisites

For this walkthrough, you should have the following prerequisites:

  1. An AWS account.
  2. An Amazon Connect contact center instance in the us-east-1 Region. You can use an existing one or create a new one. For instructions, refer to Get started with Amazon Connect. If you have an existing Amazon Connect instance and chat isn’t enabled, refer to Enabling Chat in an Existing Amazon Connect Contact Center.
  3. Chat attachments enabled in Amazon Connect. For instructions, refer to Enable attachments to share files using chat. For the CORS setup, use option 2, which uses the * wildcard for AllowedOrigin.
  4. The example project located in the GitHub repository. You need to clone this repository on your local machine and use AWS Serverless Application Model (AWS SAM) to deploy the project. To install the AWS SAM CLI and configure AWS credentials, refer to Getting started with AWS SAM.
  5. Python 3.9 runtime to support the AWS SAM deployment.

Import the Amazon Connect flow

To import the Amazon Connect flow, complete the following steps:

  1. Log in to your Amazon Connect instance.
  2. Under Routing, choose Contact Flows.
  3. Choose Create contact flow.
  4. On the Save menu, choose Import flow.
  5. Choose Select and choose the import flow file located in the /flow subdirectory, called Loan_App_Connect_Flow.
  6. Save the flow. Do not publish yet.
  7. Expand Show additional flow information and choose the copy icon to capture the ARN.
  8. Save these IDs for use as parameters in the CloudFormation template to be deployed in the next step:
    arn:aws:connect:us-east-1:123456789012:instance/11111111-1111-1111-1111-111111111111/contact-flow/22222222-2222-2222-2222-222222222222

The Amazon Connect instance ID is the long alphanumeric value between the slashes immediately following instance in the ARN. For this post, the instance ID is 11111111-1111-1111-1111-111111111111.

The contact flow ID is the long value after the slash following contact-flow in the ARN. For this post, the flow ID is 22222222-2222-2222-2222-222222222222.
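If you prefer to extract these IDs programmatically, a quick split of the ARN works; the ARN below is the placeholder example from the previous step.

# Pull the instance ID and contact flow ID out of a contact flow ARN.
arn = ("arn:aws:connect:us-east-1:123456789012:instance/"
       "11111111-1111-1111-1111-111111111111/contact-flow/"
       "22222222-2222-2222-2222-222222222222")

parts = arn.split("/")
instance_id = parts[1]       # value after "instance"
contact_flow_id = parts[3]   # value after "contact-flow"
print(instance_id, contact_flow_id)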

Deploy with AWS SAM

With the instance and flow IDs captured, we’re ready to deploy the project.

  1. Open a terminal window and clone the GitHub repository in a directory of your choice.
  2. Navigate to the amazon-connect-virtual-credit-agent directory and follow the deployment instructions in the GitHub repo.
  3. Record the Amazon Lex bot name from the Outputs section of the deployment for the next steps (called Loan_App_Bot if you accepted the default name).
  4. Return to these instructions once the AWS SAM deploy completes successfully.

Update the contact flow blocks

To update the contact flow blocks, complete the following steps:

  1. Log in to your Amazon Connect instance
  2. Under Routing, choose Contact Flows.
  3. Choose the flow named Loan_App_Flow.
  4. Choose the Get customer input block.
  5. Under the Amazon Lex section, choose the bot named Loan_App_Bot and the dev alias created earlier.
  6. Choose Save.
  7. Choose the Set working queue block.
  8. Choose the X icon and on the drop-down menu, choose BasicQueue.
  9. Choose Save.
  10. Save the flow.
  11. Publish the flow.

Test the solution

You’re now ready to test the solution.

  1. Log in to your Amazon Connect instance to set up an Amazon Connect agent for the chat.
  2. On the dashboard, choose the phone icon to open the Contact Control Panel (CCP) in a separate window.
  3. In the CCP, change the agent state to Available.
  4. On the Outputs tab for your CloudFormation stack, choose the value for cloudFrontDistribution.

This is a link to your CloudFront URL. You’re redirected to a webpage with your loan services bot. A floating action button (FAB) is on the bottom right of the screen.

  1. Choose the FAB to open the chat bot.
  2. After you get the welcome message, enter I need a loan.
  3. When prompted, choose a loan type and enter a loan amount.
  4. Upload an image of a W2 document.

A sample W2 image file is located in the project repository in the /img subdirectory. The file is called w2.png.

After the image is uploaded, the bot asks you if you want to submit the application.

  1. Choose Yes to submit.

After submission, the bot evaluates the W2 image and provides a response. After a few seconds, you’re connected to an agent.

You should see a request to connect with chat in the CCP.

  1. Choose the request to accept.

The agent is now connected to the chat user. You can simulate each side of the conversation to test the chat session.

  1. Choose End Chat when you’re done.

Troubleshooting

After you deploy the stack, if you see an Amazon S3 permission error when viewing the CloudFront URL, it means the domain isn’t ready yet. The CDN can take up to 1 hour to be ready.

If you can’t add your attachments, check your CORS settings. For instructions, refer to Enable attachments to share files using chat. For the CORS setup, use option 2, which uses the * wildcard for AllowedOrigin.

Clean up

To avoid incurring future charges, remove all resources created by deleting the CloudFormation stack.

Conclusion

In this post, we demonstrated how to quickly and securely set up a loan application processing solution. Data at rest and in transit are both encrypted and secured. This solution can act as a blueprint to build other self-service processing flows where Amazon Connect and Amazon Lex provide a conversational interface for customer engagement. We look forward to seeing what other solutions you build using this architecture.

Should you need assistance building these capabilities and Amazon Connect contact flows, please reach out to one of the dozens of Amazon Connect partners available worldwide.


About the Authors

Dipkumar Mehta is a Senior Conversational AI Consultant with the Amazon ProServe Natural Language AI team. He focuses on helping customers design, deploy, and scale end-to-end Conversational AI solutions in production on AWS. He is also passionate about improving customer experiences and driving business outcomes by leveraging data.

Cecil Patterson is a Natural Language AI consultant with AWS Professional services based in North Texas. He has many years of experience working with large enterprises to enable and support global infrastructure solutions. Cecil uses his experience and diverse skill set to build exceptional conversational solutions for customers of all types.

Sanju Sunny is a Digital Innovation Specialist with Amazon ProServe. He engages with customers in a variety of industries around Amazon’s distinctive customer-obsessed innovation mechanisms in order to rapidly conceive, validate and prototype new products, services and experiences.

Matt Kurio is a Security Transformation Consultant with the Amazon ProServe Shared Delivery Team. He excels at helping enterprise customers build secure platforms and manage security effectively and efficiently. He also enjoys relaxing at the beach and outdoor activities with his family.
