Expedite IVR development with industry grammars on Amazon Lex

Amazon Lex is a service for building conversational interfaces into any application using voice and text. With Amazon Lex, you can easily build sophisticated, natural language, conversational bots (chatbots), virtual agents, and interactive voice response (IVR) systems. You can now use industry grammars to accelerate IVR development on Amazon Lex as part of your IVR migration effort. Industry grammars are a set of XML files made available as a grammar slot type. You can select from a range of pre-built industry grammars across domains, such as financial services, insurance, and telecom. In this post, we review the industry grammars for these industries and use them to create IVR experiences.

Financial services

You can use Amazon Lex in the financial services domain to automate customer service interactions such as credit card payments, mortgage loan applications, portfolio status, and account updates. During these interactions, the IVR flow needs to collect several details, including credit card number, mortgage loan ID, and portfolio details, to fulfill the user’s request. We use the financial services industry grammars in the following sample conversation:

Agent: Welcome to ACME bank. To get started, can I get your account ID?

User: Yes, it’s AB12345.

IVR: Got it. How can I help you?

User: I’d like to transfer funds to my savings account.

IVR: Sure. How much would you like to transfer?

User: $100

IVR: Great, thank you.

The following grammars are supported for financial services: account ID, credit card number, transfer amount, and different date formats such as expiration date (mm/yy) and payment date (mm/dd).

Let’s review the sample account ID grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My account number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My account number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: Hmm My account number is 1 2 3 4 A B C 1
                Output: 123ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">account number is</item>
                <item repeat="0-1">Account Number</item>
                <item repeat="0-1">Here is my Account Number </item>
                <item repeat="0-1">Yes, It is</item>
                <item repeat="0-1">Yes It is</item>
                <item repeat="0-1">Yes It's</item>
                <item repeat="0-1">My account Id is</item>
                <item repeat="0-1">This is the account Id</item>
                <item repeat="0-1">account Id</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>

Using the industry grammar for financial services

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called Financialbot and adds the grammars for financial services, which we store in Amazon Simple Storage Service (Amazon S3):

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions and then choose Import.
  3. Choose the Financialbot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for financial services, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and choose the accountID slot type so you can associate the fin_accountNumber.grxml file.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The AWS Identity and Access Management (IAM) role used to create the bot must have permission to read files from the S3 bucket.
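If the role is missing that permission, the following is a minimal sketch of attaching an inline policy with boto3. The role name and bucket name are placeholders for your own resources; your organization may prefer a customer managed policy instead.

import json
import boto3

iam = boto3.client("iam")

# Hypothetical role and bucket names: use the role associated with your bot
# and the bucket where you uploaded the grammar XML files.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-grammar-bucket/*",
    }],
}

iam.put_role_policy(
    RoleName="FinancialbotServiceRole",
    PolicyName="ReadGrammarFiles",
    PolicyDocument=json.dumps(policy),
)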

  9. Repeat steps 6–8 for the transferFunds slot type with fin_transferAmount.grxml. (You can also associate the grammar files programmatically; see the sketch after this list.)
  10. After you save the grammars, choose Build.
  11. Download the financial services contact flow to integrate it with the Amazon Lex bot via Amazon Connect.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.
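If you prefer to script the grammar association instead of using the console, the Amazon Lex V2 model-building API exposes the grammar slot type through its external source setting. The following is a minimal sketch using boto3; the bot ID, bot version, locale, and bucket values are placeholders, and if the imported bot already defines the slot type you would call update_slot_type with its slotTypeId instead of create_slot_type.

import boto3

lex = boto3.client("lexv2-models")

# Hypothetical identifiers: replace with your bot ID, version, locale, and S3 location.
response = lex.create_slot_type(
    botId="BOTID12345",
    botVersion="DRAFT",
    localeId="en_US",
    slotTypeName="accountID",
    description="Account ID captured with the fin_accountNumber.grxml industry grammar",
    externalSourceSetting={
        "grammarSlotTypeSetting": {
            "source": {
                "s3BucketName": "my-grammar-bucket",
                "s3ObjectKey": "fin_accountNumber.grxml",
                # "kmsKeyArn": "arn:aws:kms:...",  # only if the object is KMS-encrypted
            }
        }
    },
)
print(response["slotTypeId"])

The same call applies to the transferFunds slot type with fin_transferAmount.grxml.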

Insurance

You can use Amazon Lex in the insurance domain to automate customer service interactions such as claims processing, policy management, and premium payments. During these interactions, the IVR flow needs to collect several details, including policy ID, license plate, and premium amount, to fulfill the policy holder’s request. We use the insurance industry grammars in the following sample conversation:

Agent: Welcome to ACME insurance company. To get started, can I get your policy ID?

Caller: Yes, it’s AB1234567.

IVR: Got it. How can I help you?

Caller: I’d like to file a claim.

IVR: Sure. Is this claim regarding your auto policy or homeowners policy?

Caller: Auto

IVR: What’s the license plate on the vehicle?

Caller: ABCD1234

IVR: Thank you. And how much is the claim for?

Caller: $900

IVR: What was the date and time of the accident?

Caller: March 1st 2:30pm.

IVR: Thank you. I’ve got that started for you. Someone from our office should be in touch with you shortly. Your claim ID is 12345.

The following grammars are supported for the insurance domain: policy ID, driver’s license, social security number, license plate, claim number, and renewal date.

Let’s review the sample claimDateTime grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">

         <!-- Test Cases

         Grammar will support the following inputs:

             Scenario 1:
                 Input: The accident occured at july three at five am
                 Output:  july 3 5am

             Scenario 2:
                 Input: Damage was reported at july three at five am
                 Output:  july 3 5am

             Scenario 3:
                 Input: Schedule virtual inspection for july three at five am
                 Output:  july 3 5am
         -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item repeat="1-10">
                <item><ruleref uri="#months"/><tag>out = out + rules.months + " ";</tag></item>
                <one-of>
                    <item><ruleref uri="#digits"/><tag>out += rules.digits + " ";</tag></item>
                    <item><ruleref uri="#teens"/><tag>out += rules.teens+ " ";</tag></item>
                    <item><ruleref uri="#above_twenty"/><tag>out += rules.above_twenty+ " ";</tag></item>
                </one-of>
                <item><ruleref uri="#at"/><tag>out += rules.at.new;</tag></item>
                <item repeat="0-1"><ruleref uri="#mins"/><tag>out +=":" + rules.mins.min;</tag></item>
                <item><ruleref uri="#ampm"/><tag>out += rules.ampm;</tag></item>
            </item>
            <item repeat="0-1"><ruleref uri="#thanks"/></item>
        </rule>

        <rule id="text">
           <one-of>
             <item repeat="0-1">The accident occured at</item>
             <item repeat="0-1">Time of accident is</item>
             <item repeat="0-1">Damage was reported at</item>
             <item repeat="0-1">Schedule virtual inspection for</item>
           </one-of>
        </rule>

        <rule id="thanks">
            <one-of>
               <item>Thanks</item>
               <item>I think</item>
            </one-of>
          </rule>

        <rule id="months">
           <item repeat="0-1"><ruleref uri="#text"/></item>
           <one-of>
             <item>january<tag>out="january";</tag></item>
             <item>february<tag>out="february";</tag></item>
             <item>march<tag>out="march";</tag></item>
             <item>april<tag>out="april";</tag></item>
             <item>may<tag>out="may";</tag></item>
             <item>june<tag>out="june";</tag></item>
             <item>july<tag>out="july";</tag></item>
             <item>august<tag>out="august";</tag></item>
             <item>september<tag>out="september";</tag></item>
             <item>october<tag>out="october";</tag></item>
             <item>november<tag>out="november";</tag></item>
             <item>december<tag>out="december";</tag></item>
             <item>jan<tag>out="january";</tag></item>
             <item>feb<tag>out="february";</tag></item>
             <item>aug<tag>out="august";</tag></item>
             <item>sept<tag>out="september";</tag></item>
             <item>oct<tag>out="october";</tag></item>
             <item>nov<tag>out="november";</tag></item>
             <item>dec<tag>out="december";</tag></item>
           </one-of>
       </rule>

        <rule id="digits">
            <one-of>
                <item>0<tag>out=0;</tag></item>
                <item>1<tag>out=1;</tag></item>
                <item>2<tag>out=2;</tag></item>
                <item>3<tag>out=3;</tag></item>
                <item>4<tag>out=4;</tag></item>
                <item>5<tag>out=5;</tag></item>
                <item>6<tag>out=6;</tag></item>
                <item>7<tag>out=7;</tag></item>
                <item>8<tag>out=8;</tag></item>
                <item>9<tag>out=9;</tag></item>
                <item>first<tag>out=1;</tag></item>
                <item>second<tag>out=2;</tag></item>
                <item>third<tag>out=3;</tag></item>
                <item>fourth<tag>out=4;</tag></item>
                <item>fifth<tag>out=5;</tag></item>
                <item>sixth<tag>out=6;</tag></item>
                <item>seventh<tag>out=7;</tag></item>
                <item>eighth<tag>out=8;</tag></item>
                <item>ninth<tag>out=9;</tag></item>
                <item>one<tag>out=1;</tag></item>
                <item>two<tag>out=2;</tag></item>
                <item>three<tag>out=3;</tag></item>
                <item>four<tag>out=4;</tag></item>
                <item>five<tag>out=5;</tag></item>
                <item>six<tag>out=6;</tag></item>
                <item>seven<tag>out=7;</tag></item>
                <item>eight<tag>out=8;</tag></item>
                <item>nine<tag>out=9;</tag></item>
            </one-of>
        </rule>


      <rule id="at">
        <tag>out.new=""</tag>
        <item>at</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.new+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.new+= rules.teens</tag></item>
        </one-of>
      </rule>

      <rule id="mins">
        <tag>out.min=""</tag>
        <item repeat="0-1">:</item>
        <item repeat="0-1">and</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.min+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.min+= rules.teens</tag></item>
          <item repeat="0-1"><ruleref uri="#above_twenty"/><tag>out.min+= rules.above_twenty</tag></item>
        </one-of>
      </rule>

      <rule id="ampm">
            <tag>out=""</tag>
            <one-of>
                <item>AM<tag>out="am";</tag></item>
                <item>PM<tag>out="pm";</tag></item>
                <item>am<tag>out="am";</tag></item>
                <item>pm<tag>out="pm";</tag></item>
            </one-of>
        </rule>


        <rule id="teens">
            <one-of>
                <item>ten<tag>out=10;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleven<tag>out=11;</tag></item>
                <item>twelve<tag>out=12;</tag></item>
                <item>thirteen<tag>out=13;</tag></item>
                <item>fourteen<tag>out=14;</tag></item>
                <item>fifteen<tag>out=15;</tag></item>
                <item>sixteen<tag>out=16;</tag></item>
                <item>seventeen<tag>out=17;</tag></item>
                <item>eighteen<tag>out=18;</tag></item>
                <item>nineteen<tag>out=19;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleventh<tag>out=11;</tag></item>
                <item>twelveth<tag>out=12;</tag></item>
                <item>thirteenth<tag>out=13;</tag></item>
                <item>fourteenth<tag>out=14;</tag></item>
                <item>fifteenth<tag>out=15;</tag></item>
                <item>sixteenth<tag>out=16;</tag></item>
                <item>seventeenth<tag>out=17;</tag></item>
                <item>eighteenth<tag>out=18;</tag></item>
                <item>nineteenth<tag>out=19;</tag></item>
            </one-of>
        </rule>

        <rule id="above_twenty">
            <one-of>
                <item>twenty<tag>out=20;</tag></item>
                <item>thirty<tag>out=30;</tag></item>
            </one-of>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits;</tag></item>
        </rule>
</grammar>

Using the industry grammar for insurance

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called InsuranceBot and adds the grammars for the insurance domain:

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions, then choose Import.
  3. Choose the InsuranceBot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for insurance, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and select the policyID slot type so you can associate the ins_policyNumber.grxml grammar file.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

  9. Repeat steps 6–8 for the licensePlate slot type (ins_NJ_licensePlateNumber.grxml) and dateTime slot type (ins_claimDateTime.grxml).
  10. After you save the grammars, choose Build.
  11. Download the insurance contact flow to integrate with the Amazon Lex bot.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows. (A scripted alternative for this association follows the list.)
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.
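Making the bot available in Amazon Connect (step 13) can also be scripted through the Amazon Connect API. The following is a minimal sketch; the instance ID and bot alias ARN are placeholders you would replace with your own values.

import boto3

connect = boto3.client("connect")

# Hypothetical identifiers: replace with your Connect instance ID and Lex V2 bot alias ARN.
connect.associate_bot(
    InstanceId="11111111-2222-3333-4444-555555555555",
    LexV2Bot={
        "AliasArn": "arn:aws:lex:us-east-1:123456789012:bot-alias/BOTID12345/ALIASID123"
    },
)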

Telecom

You can use Amazon Lex in the telecom domain to automate customer service interactions such as activating service, paying bills, and managing device installations. During these interactions, the IVR flow needs to collect several details, including SIM number, zip code, and the service start date, to fulfill the user’s request. We use the telecom industry grammars in the following sample conversation:

Agent: Welcome to ACME cellular. To get started, can I have the telephone number associated with your account?

User: Yes, it’s 123 456 7890.

IVR: Thanks. How can I help you?

User: I am calling to activate my service.

IVR: Sure. What’s the SIM number on the device?

User: 12345ABC

IVR: Ok. And can I have the zip code?

User: 12345

IVR: Great, thank you. The device has been activated.

The following grammars are supported for telecom: SIM number, device serial number, zip code, phone number, service start date, and ordinals.

Let’s review the sample SIM number grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My SIM number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My SIM number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: My SIM number is 1 2 3 4 A B C 1
                Output: 123ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">My SIM number is</item>
                <item repeat="0-1">SIM number is</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>

Using the industry grammar for telecom

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called TelecomBot and adds the grammars for telecom:

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions, then choose Import.
  3. Choose the TelecomBot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for the telecom domain, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and select phoneNumber so you can associate the tel_phoneNumber.grxml grammar.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

  9. Repeat steps 6–8 for the slot types SIM number (tel_simNumber.grxml) and zipcode (tel_usZipcode.grxml).
  10. After you save the grammars, choose Build.
  11. Download the telecom contact flow to integrate with the Amazon Lex bot.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.

Test the solution

You can call in to the Amazon Connect phone number and interact with the bot. You can also test the solution directly on the Amazon Lex V2 console using voice or text.
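For a quick text-based check before dialing in, you can also exercise a bot through the Amazon Lex V2 runtime API. The following sketch sends a sample utterance; the bot ID and alias ID are placeholders that you copy from the Amazon Lex console.

import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Hypothetical bot ID and alias ID: use the values for your imported bot.
response = lex_runtime.recognize_text(
    botId="BOTID12345",
    botAliasId="TSTALIASID",
    localeId="en_US",
    sessionId="test-session-1",
    text="My SIM number is 1 2 3 4 A B C",
)

for message in response.get("messages", []):
    print(message["content"])
print(response["sessionState"].get("intent", {}).get("slots"))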

Conclusion

Industry grammars provide a set of pre-built XML files that you can use to quickly create IVR flows. You can select grammars to enable customer service conversations for use cases across financial services, insurance, and telecom. The grammars are available as a grammar slot type and can be used in an Amazon Lex bot configuration. You can download the grammars and enable these via the Amazon Lex V2 console or SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to Using a custom grammar slot type.


About the Authors

John Heater has over 15 years of experience in AI and automation. As the SVP of the Contact Center Practice at NeuraFlash, he leads the implementation of the latest AI and automation techniques for a portfolio of products and customer solutions.

Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.


Easily migrate your IVR flows to Amazon Lex using the IVR migration tool

This post was co-written by John Heater, SVP of the Contact Center Practice at NeuraFlash. NeuraFlash is an Advanced AWS Partner with over 40 collective years of experience in the voice and automation space. With a dedicated team of conversation designers, data engineers, and AWS developers, NeuraFlash helps customers take advantage of the power of Amazon Lex in their contact centers.

Amazon Lex provides automatic speech recognition and natural language understanding technologies so you can build sophisticated conversational experiences and create effective interactive voice response (IVR) flows. A native integration with Amazon Connect, AWS’s cloud-based contact center, enables the addition of a conversational interface to any call center application. You can design IVR experiences to identify user requests and fulfill these by running the appropriate business logic.

Today, NeuraFlash, an AWS APN partner, launched a migration tool on AWS Marketplace that helps you easily migrate your VoiceXML (VXML) IVR flows to Amazon Lex and Amazon Connect. The migration tool takes the VXML configuration and grammar XML files as input and provides an Amazon Lex bot definition. It also supports grammars and Amazon Connect contact flows so you can quickly get started with your IVR conversational experiences.

In this post, we cover the use of the IVR migration tool and review the resulting Amazon Lex bot definition and Amazon Connect contact flows.

Sample conversation overview

You can use the sample VXML and grammar files as input to try out the tool. The sample IVR supports the following conversation:

IVR: Welcome to ACME bank. For verification, can I get the last four digits of the SSN on the account?

Caller: Yes, it’s 1234.

IVR: Great. And the date of birth for the primary holder?

Caller: Jan 1st 2000.

IVR: Thank you. How can I help you today?

Caller: I’d like to make a payment.

IVR: Sure. What’s the credit card number?

Caller: 1234 5678 1234 5678

IVR: Got it. What’s the CVV?

Caller: 123

IVR: How about the expiration date?

Caller: Jan 2025.

IVR: Great. How much are we paying today?

Caller: $100

IVR: Thank you. Your payment of $100 on card ending in 5678 is processed. Anything else we can help you with?

Caller: No thanks.

IVR: Have a great day.

Migration tool overview

The following diagram illustrates the architecture of the migration tool.

You can access the migration tool in the AWS Marketplace. Follow the instructions to upload your VXML and grammar XML files.

The tool processes the input XML files to create an IVR flow. You can download the Amazon Connect contact flow, Amazon Lex bot definition, and supporting grammar files.

Migration methodology

The IVR migration tool analyzes the uploaded IVR application and generates an Amazon Lex bot, Amazon Connect flows, and SRGS grammar files. One bot is generated per VXML application (or VXML file). Each input state in the VXML file is mapped to a dialog prompt in the Amazon Lex bot, and the corresponding grammar file for the input state is used to create a grammar slot. For the Amazon Connect flow, each VXML file maps to a node in the IVR flow. Within the flow, a Get customer input block hands off control to Amazon Lex to manage the dialog.
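To make the mapping concrete, the following sketch shows the kind of information the tool extracts from each VXML form: the form ID (which becomes a dialog prompt and slot), the SRGS grammar it references, and the initial prompt text. This is an illustrative outline using Python's standard library against a local file such as the samples below, not the migration tool's actual implementation; the file name is hypothetical.

import xml.etree.ElementTree as ET

def outline_vxml(path):
    """List each form's ID, grammar reference, and initial prompt from a VXML file."""
    root = ET.parse(path).getroot()
    for form in root.iter("form"):
        field = form.find("field")
        grammar = field.find("grammar").get("srcexpr", "").strip("'")
        prompt = " ".join(field.find("prompt/audio").text.split())
        yield {"form": form.get("id"), "grammar": grammar, "prompt": prompt}

for state in outline_vxml("sample.vxml"):  # hypothetical local copy of the VXML file
    print(f"{state['form']}: slot grammar {state['grammar']}")
    print(f"  prompt: {state['prompt']}")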

Let’s consider the following VXML content in the sample dialog for user verification. You can download the VerifyAccount VXML file.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Verify user with SSN ***-->
<form id="Verify_SSN">
  <field name="Verify_SSN">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/last4ssn.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_SSN/Init.wav'">
                To verify your account, can I please have the last four digits of your social security number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/Verify_SSN/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me the last four digits of your social security number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/Verify_SSN/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the last four digits of your social security number.  You can also say I dont know if you do not have it.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/Verify_SSN/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me the last four digits of your social security number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/Verify_SSN/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the last four digits of your social security number.  You can also say I dont know if you do not have it.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_SSN.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="Verify_SSN.option == 'dunno'" />
                <assign name="transfer_reason" expr="'no_ssn'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="last4_ssn" expr="Verify_SSN.option"/>
                <goto next="#Verify_DOB"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Verify user with date of birth ***-->
<form id="Verify_DOB">
  <field name="Verify_DOB">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/dateofbirth.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_DOB/Init.wav'">
                Thank you.  And can I also have your date of birth?
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/Verify_DOB/nm1.wav'">
         I'm sorry, I didn't understand. Please say your date of birth.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/Verify_DOB/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/Verify_DOB/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say your date of birth.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/Verify_DOB/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_DOB.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="date_of_birth" expr="Verify_DOB.option"/>
                <goto next="validate_authentication.vxml"/>
            </if>
        </filled>
    </field>
</form>


</vxml>

In addition to the preceding VXML file, we include the SRGS grammars referenced by the IVR application (last4ssn.grxml and dateofbirth.grxml) in the IVR migration tool.

An Amazon Lex bot is created to verify the caller. The Verification bot has one intent (VerifyAccount).

The bot has two slots (SSN, DOB) that reference the grammar files for the SSN and date of birth grammars, respectively. You can download the last4SSN.grxml and dateOfBirth.grxml grammar files as output to create the custom slot types in Amazon Lex.

In another example of a payment flow, the IVR migration tool reads in the payment collection flows to generate an Amazon Lex bot that can handle payments. You can download the corresponding Payment VXML file and SRGS grammars.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Collect the users credit card for payment ***-->
<form id="CreditCard_Collection">
  <field name="CreditCard_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CreditCard_Collection/Init.wav'">
                To start your payment, can I please have your credit card number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/CreditCard_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me your credit card number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/CreditCard_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card number.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/CreditCard_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me your credit card number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/CreditCard_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card number.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
                <assign name="creditcard_number" expr="CreditCard_Collection.option"/>
                <goto next="#ExpirationDate_Collection"/>
        </filled>
    </field>
</form>

<!--*** Collect the credit card expiration date ***-->
<form id="ExpirationDate_Collection">
  <field name="ExpirationDate_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_expiration.grxml'"/>
      <prompt>
            <audio expr="'./prompts/ExpirationDate_Collection/Init.wav'">
                Thank you.  Now please provide your credit card expiration date.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/ExpirationDate_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the expiration date.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/ExpirationDate_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card expiration date.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/ExpirationDate_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the expiration date.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/ExpirationDate_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card expiration date.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="ExpirationDate_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_expiration" expr="ExpirationDate_Collection.option"/>
                <goto next="#CVV_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the credit card CVV number ***-->
<form id="CVV_Collection">
  <field name="CVV_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_cvv.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CVV_Collection/Init.wav'">
                Almost done.  Now please tell me the CVV code.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/CVV_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the CVV on the credit card.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/CVV_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the credit card CVV.  It can be found on the back of the card.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/CVV_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the CVV on the credit card.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/CVV_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the credit card CVV.  It can be found on the back of the card.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="CVV_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_cvv" expr="CVV_Collection.option"/>
                <goto next="#PaymentAmount_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the payment amount ***-->
<form id="PaymentAmount_Collection">
  <field name="PaymentAmount_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/amount.grxml'"/>
      <prompt>
            <audio expr="'./prompts/PaymentAmount_Collection/Init.wav'">
                Finally, please tell me how much you will be paying.  You can also say full amount.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="./prompts/PaymentAmount_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the amount of your payment.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="./prompts/PaymentAmount_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="./prompts/PaymentAmount_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the amount of your payment.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="./prompts/PaymentAmount_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="PaymentAmount_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="Verify_SSN.option == 'full_amount'" />
                <assign name="creditcard_amount" expr="'full''"/>
                <goto next="processpayment.vxml"/>
            <else/>
                <assign name="creditcard_amount" expr="PaymentAmount_Collection.option"/>
                <goto next="processpayment.vxml"/>
            </if>
        </filled>
    </field>
</form>

</vxml>

In addition to the preceding VXML file, we include the SRGS grammars referenced by the IVR application (creditcard.grxml, creditcard_expiration.grxml, creditcard_cvv.grxml, and amount.grxml) in the IVR migration tool.

An Amazon Lex bot is created to collect the payment details. The Payment bot has one intent (MakePayment).

The bot has four slots (credit card number, expiration date, CVV, payment amount) that reference the grammar files. You can download the creditCard.grxml, creditCardExpiration.grxml, creditCardCVV.grxml, and paymentAmount.grxml grammar files as output to create the custom slot types in Amazon Lex.

Lastly, the migration tool provides the payment IVR contact flow to manage the end-to-end conversation.

Conclusion

Amazon Lex enables you to easily build sophisticated, natural language conversational experiences. The IVR migration tool allows you to easily migrate your VXML IVR flows to Amazon Lex. The tool provides the bot definitions and grammars in addition to the Amazon Connect contact flows. It enables you to migrate your IVR flows as is and get started on Amazon Lex, giving you the flexibility to build out the conversational experience at your own pace.

Use the migration tool on AWS Marketplace and migrate your IVR to Amazon Lex today.


About the Authors

John Heater has over 15 years of experience in AI and automation. As the SVP of the Contact Center Practice at NeuraFlash, he leads the implementation of the latest AI and automation techniques for a portfolio of products and customer solutions.

Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.


How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS

Amazon Search’s vision is to enable customers to search effortlessly. Our spelling correction helps you find what you want even if you don’t know the exact spelling of the intended words. In the past, we used classical machine learning (ML) algorithms with manual feature engineering for spelling correction. To make the next generational leap in spelling correction performance, we are embracing a number of deep-learning approaches, including sequence-to-sequence models. Deep learning (DL) models are compute-intensive both in training and inference, and these costs have historically made DL models impractical in a production setting at Amazon’s scale. In this post, we present the results of an inference optimization experiment in which we overcome those obstacles and achieve a 534% inference speed-up for the popular Hugging Face T5 Transformer.

Challenge

The Text-to-Text Transfer Transformer (T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al.) is a state-of-the-art natural language processing (NLP) model architecture. T5 is a promising architecture for spelling correction, which we found to perform well in our experiments. T5 models are easy to research, develop, and train, thanks to open-source deep learning frameworks and ongoing academic and enterprise research.

However, it’s difficult to achieve production-grade, low-latency inference with a T5. For example, a single inference with a PyTorch T5 takes 45 milliseconds on one of the four NVIDIA V100 Tensor Core GPUs of an Amazon Elastic Compute Cloud (Amazon EC2) p3.8xlarge instance. (All inference numbers reported are for an input of 9 tokens and an output of 11 tokens. The latency of T5 architectures is sensitive to both input and output lengths.)
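As a reference point, the following sketch shows how such a baseline latency number can be measured with the Hugging Face transformers library on a GPU. The prompt, token counts, and resulting milliseconds are illustrative and will vary by model, input, and hardware; this is not Amazon Search's benchmarking harness.

import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval().to("cuda")

# Hypothetical spelling-correction style prompt.
inputs = tokenizer("fix spelling: amazn serch", return_tensors="pt").to("cuda")

with torch.no_grad():
    for _ in range(10):                       # warm up CUDA kernels
        model.generate(**inputs, max_length=12)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model.generate(**inputs, max_length=12)
    torch.cuda.synchronize()

print(f"mean latency: {(time.perf_counter() - start) / 100 * 1000:.1f} ms")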

Low-latency, cost-efficient T5 inference at scale is a known difficulty that has been reported by several AWS customers beyond Amazon Search, which boosts our motivation to contribute this post. To go from an offline, scientific achievement to a customer-facing production service, Amazon Search faces the following challenges:

  • Latency – How to realize T5 inference in less than 50-millisecond P99 latency
  • Throughput – How to handle large-scale concurrent inference requests
  • Cost efficiency – How to keep costs under control

In the rest of this post, we explain how the NVIDIA inference optimization stack—namely the NVIDIA TensorRT compiler and the open source NVIDIA Triton Inference Server—solves those challenges. Read NVIDIA’s press release to learn about the updates.

NVIDIA TensorRT: Reducing costs and latency with inference optimization

Deep learning frameworks are convenient for iterating quickly on the science, and come with numerous functionalities for scientific modeling, data loading, and training optimization. However, most of those tools are suboptimal for inference, which only requires a minimal set of operators for matrix multiplication and activation functions. Therefore, significant gains can be realized by using a specialized, prediction-only application instead of running inference in the deep learning development framework.

NVIDIA TensorRT is an SDK for high-performance deep learning inference. TensorRT delivers both an optimized runtime, using low-level optimized kernels available on NVIDIA GPUs, and an inference-only model graph, which rearranges inference computation in an optimized order.

In the following section, we look at what TensorRT does under the hood and how it improves performance.

  1. Reduced Precision maximizes throughput with FP16 or INT8 by quantizing models while maintaining correctness.
  2. Layer and Tensor Fusion optimizes use of GPU memory and bandwidth by fusing nodes in a kernel to avoid kernel launch latency.
  3. Kernel Auto-Tuning selects best data layers and algorithms based on the target GPU platform and data kernel shapes.
  4. Dynamic Tensor Memory minimizes memory footprint by freeing unnecessary memory consumption of intermediate results and reuses memory for tensors efficiently.
  5. Multi-Stream Execution uses a scalable design to process multiple input streams in parallel with dedicated CUDA streams.
  6. Time Fusion optimizes recurrent neural networks over time steps with dynamically generated kernels.
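Several of these optimizations are applied when you build a TensorRT engine from an exported model. The following is a minimal sketch of building an FP16 engine from an ONNX file with the TensorRT 8.x Python API. For T5 specifically, the encoder and decoder are exported and built separately, and NVIDIA's published T5 demo scripts are the recommended path, so treat this only as an outline; the file names are placeholders.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# "t5_encoder.onnx" is a hypothetical export of the T5 encoder.
with open("t5_encoder.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # reduced precision
config.max_workspace_size = 1 << 30     # scratch space for kernel auto-tuning

engine = builder.build_serialized_network(network, config)
with open("t5_encoder.plan", "wb") as f:
    f.write(engine)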

T5 uses transformer layers as building blocks for its architectures. The latest release of NVIDIA TensorRT 8.2 introduces new optimizations for the T5 and GPT-2 models for real-time inference. In the following table, we can see the speedup with TensorRT on some public T5 models running on Amazon EC2 G4dn instances, powered by NVIDIA T4 GPUs, and EC2 G5 instances, powered by NVIDIA A10G GPUs.

 

Model | Instance | Baseline PyTorch FP32 latency (ms): Encoder / Decoder / End to end | TensorRT 8.2 FP32 latency (ms): Encoder / Decoder / End to end | TensorRT 8.2 FP16 latency (ms): Encoder / Decoder / End to end | Speedup vs. the HF baseline (FP32, end to end) | Speedup vs. the HF baseline (FP16, end to end)
t5-small | g4dn.xlarge | 5.98 / 9.74 / 30.71 | 1.28 / 2.25 / 7.54 | 0.93 / 1.59 / 5.91 | 407.40% | 519.34%
t5-small | g5.xlarge | 4.63 / 7.56 / 24.22 | 0.61 / 1.05 / 3.99 | 0.47 / 0.80 / 3.19 | 606.66% | 760.01%
t5-base | g4dn.xlarge | 11.61 / 19.05 / 78.44 | 3.18 / 5.45 / 19.59 | 3.15 / 2.96 / 13.76 | 400.48% | 569.97%
t5-base | g5.xlarge | 8.59 / 14.23 / 59.98 | 1.55 / 2.47 / 11.32 | 1.54 / 1.65 / 8.46 | 530.05% | 709.20%

For more information about optimizations and replication of the attached performance, refer to Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT.

It is important to note that compilation preserves model accuracy, because it operates on the inference environment and the computation scheduling, leaving the model science unaltered, unlike weight-removal compression techniques such as distillation or pruning. NVIDIA TensorRT allows you to combine compilation with quantization for further gains. Quantization has double benefits on recent NVIDIA hardware: it reduces memory usage and enables the use of NVIDIA Tensor Cores, DL-specific cells that run a fused matrix multiply-add in mixed precision.

In the case of the Amazon Search experimentation with the Hugging Face T5 model, replacing PyTorch with TensorRT for model inference increased speed by 534%.

NVIDIA Triton: Low-latency, high-throughput inference serving

Modern model serving solutions can transform offline trained models into customer-facing ML-powered products. To maintain reasonable costs at such a scale, it’s important to keep serving overhead low (HTTP handling, preprocessing and postprocessing, CPU-GPU communication), and fully take advantage of the parallel processing ability of GPUs.

NVIDIA Triton is inference serving software offering wide support for model runtimes (NVIDIA TensorRT, ONNX, PyTorch, and XGBoost, among others) and infrastructure backends, including GPUs, CPUs, and AWS Inferentia.

ML practitioners love Triton for multiple reasons. Its dynamic batching ability allows it to accumulate inference requests during a user-defined delay and within a maximum user-defined batch size, so that GPU inference is batched, amortizing the CPU-GPU communication overhead. Note that dynamic batching happens server-side and within very short time frames, so the requesting client still has a synchronous, near-real-time invocation experience. Triton users also enjoy its concurrent model execution capacity. GPUs are powerful multitaskers that excel at executing compute-intensive workloads in parallel. Triton maximizes GPU utilization and throughput by using CUDA streams to run multiple model instances concurrently. These model instances can be different models from different frameworks for different use cases, or direct copies of the same model. This translates to a direct throughput improvement when you have enough idle GPU memory. Also, because Triton is not tied to a specific DL development framework, it allows scientists to fully express themselves in the tool of their choice.
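On the client side, sending requests to Triton is straightforward. The following sketch uses the Triton HTTP client to call a deployed model. The model name and tensor names ("t5_trt", "input_ids", "output_ids") are assumptions for illustration and must match your model's configuration; dynamic batching itself is enabled server-side in the model configuration rather than in this code.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical tokenized query; shapes and names must match the deployed model config.
input_ids = np.array([[13959, 1566, 12, 2968, 10, 571, 33, 25, 58]], dtype=np.int32)

infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT32")
infer_input.set_data_from_numpy(input_ids)
requested_output = httpclient.InferRequestedOutput("output_ids")

result = client.infer(
    model_name="t5_trt", inputs=[infer_input], outputs=[requested_output]
)
print(result.as_numpy("output_ids"))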

With Triton on AWS, Amazon Search expects to better serve Amazon.com customers and meet latency requirements at low cost. The tight integration between the TensorRT runtime and the Triton server facilitates the development experience. Using AWS cloud infrastructure allows Amazon Search to scale up or down in minutes based on throughput requirements, while maintaining a high bar for reliability and security.

How AWS lowers the barrier to entry

While Amazon Search conducted this experiment on Amazon EC2 infrastructure, other AWS services exist to facilitate the development, training and hosting of state-of-the-art deep learning solutions.

For example, AWS and NVIDIA have collaborated to release a managed implementation of Triton Inference Server in Amazon SageMaker; for more information, see Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker. AWS also collaborated with Hugging Face to develop a managed, optimized integration between Amazon SageMaker and Hugging Face Transformers, the open-source framework from which the Amazon Search T5 model is derived; read more at https://aws.amazon.com/machine-learning/hugging-face/.

We encourage customers with latency-sensitive CPU and GPU deep learning serving applications to consider NVIDIA TensorRT and Triton on AWS. Let us know what you build!

Passionate about deep learning and building deep learning-based solutions for Amazon Search? Check out our careers page.


About the Authors

RJ is an engineer on the Search M5 team, leading the efforts to build large-scale deep learning systems for training and inference. Outside of work, he explores different cuisines and plays racquet sports.

Hemant Pugaliya is an Applied Scientist at Search M5. He works on applying the latest natural language processing and deep learning research to improve customer experience on Amazon shopping worldwide. His research interests include natural language processing and large-scale machine learning systems. Outside of work, he enjoys hiking, cooking and reading.

Andy Sun is a Software Engineer and Technical Lead for Search Spelling Correction. His research interests include optimizing deep learning inference latency, and building rapid experimentation platforms. Outside of work, he enjoys filmmaking, and acrobatics.

Le Cai is a Software Engineer at Amazon Search. He works on improving Search Spelling Correction performance to help customers with their shopping experience. He is focusing on high-performance online inference and distributed training optimization for deep learning models. Outside of work, he enjoys skiing, hiking and cycling.

Anthony Ko is currently working as a software engineer at Search M5 in Palo Alto, CA. He works on building tools and products for model deployment and inference optimization. Outside of work, he enjoys cooking and playing racquet sports.

Olivier Cruchant is a Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Anish Mohan is a Machine Learning Architect at NVIDIA and the technical lead for ML and DL engagements with its customers in the greater Seattle region.

Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He connects Amazon and AWS product leaders, developers, and scientists with NVIDIA technologists and product leaders to accelerate Amazon ML/DL workloads, EC2 products, and AWS AI services. In addition, Eliuth is a passionate mountain biker, skier, and poker player.

Read More

Enable conversational chatbots for telephony using Amazon Lex and the Amazon Chime SDK

Conversational AI can deliver powerful, automated, interactive experiences through voice and text. Amazon Lex is a service that combines automatic speech recognition and natural language understanding technologies, so you can build these sophisticated conversational experiences. A common application of conversational AI is found in contact centers: self-service virtual agents. We’re excited to announce that you can now use Amazon Chime SDK Public Switched Telephone Network (PSTN) audio to enable conversational self-service applications to reduce call resolution times and automate informational responses.

The Amazon Chime SDK is a set of real-time communications components that developers can use to add audio, messaging, video, and screen-sharing to their web and mobile applications. Amazon Chime SDK PSTN audio integration with Amazon Lex enables builders to develop conversational interfaces for calls to or from the public telephone network. You can now build AI-powered self-service applications such as conversational interactive voice response systems (IVRs), virtual agents, and other telephony applications that use Session Initiation Protocol (SIP) for voice communications.

In addition, we have launched several new features. Amazon Voice Focus for PSTN provides deep learning-based noise suppression to reduce unwanted noise on calls. You can also now use machine learning (ML)-driven text-to-speech in your application through our native integration with Amazon Polly. All features are now directly integrated with Amazon Chime SDK PSTN audio.

In this post, we teach you how to build a conversational IVR system for a fictitious travel service that accepts reservations over the phone using Amazon Lex.

Solution overview

Amazon Chime SDK PSTN audio makes it easy for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions.

For this solution, we use the following components:

  • Amazon Chime SDK PSTN audio
  • AWS Lambda
  • Amazon Lex
  • Amazon Polly

Amazon Lex natively integrates with Amazon Polly to provide text-to-speech capabilities. In this post, we also enable Amazon Voice Focus to reduce background noise on phone calls. In a previous post, we showed how to integrate with Amazon Lex v1 using the API interface. That is no longer required. The heavy lifting of working with Amazon Lex and Amazon Polly is now replaced by a few simple function calls.

The following diagram illustrates the high-level design of the Amazon Chime SDK Amazon Lex chatbot system.

To help you learn to build using the Amazon Chime SDK PSTN audio service, we have published a repository of source code and documentation explaining how that source code works. The source code is in a workshop format, with each example program building upon the previous lesson. The final lesson is how to build a complete Amazon Lex-driven chatbot over the phone. That is the lesson we focus on in this post.

As part of this solution, you create the following resources:

  • SIP media application – A managed object that specifies a Lambda function to invoke.
  • SIP rule – A managed object that specifies a phone number to trigger on and which SIP media application managed object to use to invoke a Lambda function.
  • Phone number – An Amazon Chime SDK PSTN phone number provisioned for receiving phone calls.
  • Lambda function – A function written in Typescript that is integrated with the PSTN audio service. It receives invocations from the SIP media application and sends actions back that instruct the SIP media application to perform Amazon Polly and Amazon Lex tasks.

The demo code is deployed in two parts. The Amazon Lex chatbot example is one of a series of workshop examples that teach how to use Amazon Chime SDK PSTN audio. For this post, you complete the following high-level steps to deploy the chatbot:

  1. Configure the Amazon Lex chatbot.
  2. Clone the code from the GitHub repository.
  3. Deploy the common resources for the workshop (including a phone number).
  4. Deploy the Lambda function that connects Amazon Lex to the phone number.

We go through each step in detail.

Prerequisites

You must have the following prerequisites:

  • node V12+/npm installed
  • The AWS Command Line Interface (AWS CLI) installed
  • Node Version Manager (nvm) installed
  • The typescript and aws-sdk node modules installed (using npm)
  • AWS credentials configured for the account and Region that you use for this demo
  • Permissions to create Amazon Chime SIP media applications and phone numbers (make sure your service quota in us-east-1 or us-west-2 for phone numbers, voice connectors, SIP media applications, and SIP rules hasn’t been reached)
  • Deployment must be done in us-east-1 or us-west-2 to align with PSTN audio resources

For detailed installation instructions, including a script that can automate the installation and an AWS Cloud Development Kit (AWS CDK) project to easily create an Amazon Elastic Compute Cloud (Amazon EC2) development environment, see the workshop instructions.

Configure the Amazon Lex chatbot

You can build a complete conversational voice bot using Amazon Lex. In this example, you use the Amazon Lex console to build a bot. We skip the steps where you build the Lambda function for Amazon Lex; the focus here is how to connect Amazon Chime SDK PSTN audio to Amazon Lex. For instructions on building custom Amazon Lex bots, refer to Amazon Lex: How It Works. Here, we use the pre-built “book trip” example.

Create a bot

To create your chatbot, complete the following steps:

  1. Sign in to the Amazon Lex console in the same Region that you deployed the Amazon Chime SDK resources in.

This must be in either us-east-1 or us-west-2, depending on where you deployed the Amazon Chime SDK resources using AWS CDK.

  2. In the navigation pane, choose Bots.
  3. Choose Create bot.
  4. Select Start with an example.

  5. For Bot name, enter a name (for example, BookTrip).
  6. For Description, enter an optional description.
  7. Under IAM permissions, select Create a role with basic Amazon Lex permissions.
  8. Under Children’s Online Privacy Protection Act, select No.

This example doesn’t need that protection, but when you create your own bots, answer this question appropriately.

  9. Under Idle session timeout, set Session timeout to 1 minute.
  10. You can skip the Advanced settings section.
  11. Choose Next.

  12. For Select Language, choose your preferred language (for this post, we choose English (US)).
  13. For Voice interaction, choose the voice you want to use.
  14. You can enter a voice sample and choose Play to test the phrase and confirm the voice is to your liking.
  15. Leave other settings at their default.
  16. Choose Done.

  17. In the Fulfillment section, enter the following text for On successful fulfillment:
Thank you!  We'll see you on {CheckInDate}.
  18. Under Closing responses, enter the following text for Message:

Goodbye!

  19. Choose Save intent.
  20. Choose Build.

The build process takes a few moments to complete. When it’s finished, you can test the bot on the Amazon Lex console.

Create a version

You have now built the bot. Next, we create a version.

  1. Navigate to the Versions page of your bot (under the bot name in the navigation pane).
  2. Choose Create version.
  3. Accept all the default values and choose Create.

Your new version is now listed on the Versions page.

Create an alias

Next, we create an alias.

  1. In the navigation pane, choose Aliases.
  2. Choose Create alias.
  3. For Alias name, enter a name (for example, production).
  4. Under Associate with a version, choose Version 1 on the drop-down menu.

If you had more than one version of the bot, you could choose the appropriate version here.

  5. Choose Create.

The alias is now listed on the Aliases page.

  6. On the Aliases page, choose the alias you just created.
  7. Under Resource-based policy, choose Edit.
  8. Add the following policy, which allows the Amazon Chime SDK PSTN audio service to invoke Amazon Lex on your behalf:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SMALexAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": "voiceconnector.chime.amazonaws.com"
      },
      "Action": "lex:StartConversation",
      "Resource": "<Resource-ARN-for-the-Alias>",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "<account-num>"
        },
        "ArnEquals": {
          "AWS:SourceArn": "arn:aws:voiceconnector:<region>:<account-num>:*"
        }
      }
    }
  ]
}

In the preceding code, provide the resource ARN (located directly above the text box), which is the ARN for the bot alias. Also provide your account number and specify the Region you’re deploying into (us-east-1 or us-west-2). That defines the ARN of the PSTN audio control plane in your account.

  9. Choose Save to store the policy.
  10. Choose Copy next to the resource ARN to use in a later step.

Congratulations! You have configured an Amazon Lex bot!

In a real chatbot application, you would almost certainly implement a Lambda function to process the intents. This demo program focuses on explaining how to connect to Amazon Chime SDK PSTN audio, so we don’t go into that level of detail. For more information, refer to Add the Lambda Function as a Code Hook.
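For illustration, a minimal fulfillment code hook for an Amazon Lex V2 bot might look like the following Python sketch. The handler name and closing message are hypothetical and this is not the workshop's code; it only shows the general shape of a Lex V2 code hook response.

# Minimal sketch of an Amazon Lex V2 fulfillment code hook (illustrative only;
# the handler name and messages are hypothetical, not part of the workshop code).
def lambda_handler(event, context):
    intent = event["sessionState"]["intent"]

    # A real bot would act on the collected slot values here, for example
    # booking the trip described by the BookTrip slots.
    slots = intent.get("slots", {})

    # Close the conversation and mark the intent as fulfilled.
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "slots": slots, "state": "Fulfilled"},
        },
        "messages": [
            {"contentType": "PlainText", "content": "Your reservation is booked. Goodbye!"}
        ],
    }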

Clone the GitHub repository

You can get the code for the entire workshop by cloning the repository:

git clone https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-workshop
cd amazon-chime-sdk-pstn-audio-workshop

Deploy the common resources for the workshop

This workshop uses the AWS CDK to automate the deployment of all needed resources (except the Amazon Lex bot, which you already did). To deploy, run the following code from your terminal:

cdk bootstrap
yarn deploy

The AWS CDK deploys the resources. We do the bootstrap step to make sure that AWS CDK is properly initialized in the Region you’re deploying into. Note that these examples use AWS CDK version 2.

The repository has a series of lessons that are designed to explain how to develop PSTN audio applications. We recommend reviewing these documents to understand the basics using the first few sample programs. You can then review the Lambda sample program folder. Lastly, follow the steps to configure and then deploy your code. In the terminal, enter the following command:

cd lambdas/call-lex-bot

Configure your Lambda function to use the Amazon Lex bot ARN

Open the src/index.ts source code file for the Lambda function and edit the variable botAlias near the top of the file (provide the ARN you copied earlier):

const botAlias = "<Resource-ARN-for-the-Alias>";

You can now deploy the bot with yarn deploy and swap the new Lambda function into PSTN audio with yarn swap. You can also note the welcome text in the startBotConversationAction object:

const startBotConversationAction = {
  Type: "StartBotConversation",
  Parameters: {
    BotAliasArn: "none",
    LocaleId: "en_US",
    Configuration: {
      SessionState: {
        DialogAction: {
          Type: "ElicitIntent"
        }
      },
      WelcomeMessages: [
        {
          ContentType: "PlainText",
          Content: "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to book a room, or, I'd like to rent a car."
        },
      ]
    }
  }
}

Amazon Lex starts the bot and uses Amazon Polly to read that text. This gives the caller a greeting, and tells them what they should do next.

How it works

The following example adds more actions to what we learned in the Call and Bridge Call lesson. The NEW_INBOUND_CALL event arrives and is processed the same way. We enable Amazon Voice Focus (which enhances the ability of Amazon Lex to understand words) and then immediately hand the incoming call off to the bot with a StartBotConversation action. An example of that action looks like the following object:

{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Pause",
            "Parameters": {
                "DurationInMilliseconds": "1000"
            }
        },
        {
            "Type": "VoiceFocus",
            "Parameters": {
                "Enable": true,
                "CallId": "2947dfba-0748-46fc-abc5-a2c21c7569eb"
            }
        },
        {
            "Type": "StartBotConversation",
            "Parameters": {
                "BotAliasArn": "arn:aws:lex:us-east-1:<account-num>:bot-alias/RQXM74UXC7/ZYXLOINIJL",
                "LocaleId": "en_US",
                "Configuration": {
                    "SessionState": {
                        "DialogAction": {
                            "Type": "ElicitIntent"
                        }
                    },
                    "WelcomeMessages": [
                        {
                            "ContentType": "PlainText",
                            "Content": "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to order flowers."
                        }
                    ]
                }
            }
        }
    ]
}

When the bot returns an ACTION_SUCCESSFUL event, the data collected by the Amazon Lex bot is included in the event, and your Lambda function can use that data if needed. However, a common practice for Amazon Lex applications is to process the data in the Lambda function associated with the Amazon Lex bot. Examples of the event and the returned action are provided in the workshop documentation for this lesson.
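If you do want to inspect the collected data directly in the PSTN audio Lambda function, the following sketch shows one way to do it. The workshop's function is written in TypeScript; this Python sketch is only meant to illustrate the idea, and the field names (ActionData, IntentResult, SessionState, Intent) are assumptions based on the event examples in the workshop documentation.

# Illustrative sketch only: the field names are assumptions based on the event
# examples in the workshop documentation; the workshop Lambda itself is TypeScript.
def handle_action_successful(event):
    action_data = event.get("ActionData", {})
    intent_result = action_data.get("IntentResult", {})
    intent = intent_result.get("SessionState", {}).get("Intent", {})

    # Log what the Amazon Lex bot collected; a real application could branch on
    # the intent name or forward the slot values to a backend system.
    print("Intent:", intent.get("Name"), "state:", intent.get("State"))
    print("Slots:", intent.get("Slots"))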

Sequence diagram

The following diagram shows the sequence of calls made between PSTN audio and the Lambda function:

For a more detailed explanation of the operation, refer to the workshop documentation.

Clean up

To clean up the resources used in this demo and avoid incurring further charges, complete the following steps:

  1. In the terminal, enter the following code:
yarn destroy
  2. Return to the workshop folder (cd ../../) and enter the following code:
yarn destroy

The AWS CloudFormation stack created by the AWS CDK is destroyed, removing all the allocated resources.

Conclusion

In this post, you learned how to build a conversational interactive voice response (IVR) system using Amazon Lex and Amazon Chime SDK PSTN audio. You can use these techniques to build your own system to reduce customer call resolution times and automate informational responses on your customers’ calls.

For more information, see the project GitHub repository and Using the Amazon Chime SDK PSTN Audio service.


About the Author

Greg Herlein has led software teams for over 25 years at large and small companies, including several startups. He is currently the Principal Evangelist for the Amazon Chime SDK service, where he is passionate about helping customers build advanced communications software.

Read More

Build a traceable, custom, multi-format document parsing pipeline with Amazon Textract

Organizational forms serve as a primary business tool across industries, from financial services to healthcare and beyond. Consider, for example, tax filing forms in the tax management industry, where new forms come out each year with largely the same information. AWS customers across sectors need to process and store information in forms as part of their daily business practice. These forms often serve as a primary means for information to flow into an organization where technological means of data capture are impractical.

In addition to using forms to capture information, over the years of offering Amazon Textract, we have observed that AWS customers frequently version their organizational forms based on structural changes made, fields added or changed, or other considerations such as a change of year or version of the form.

When the structure or content of a form changes, this frequently causes challenges for traditional OCR systems or impacts the downstream tools used to capture information, even when you need to capture the same information year over year and aggregate the data for use regardless of the format of the document.

To solve this problem, in this post we demonstrate how you can build and deploy an event-driven, serverless, multi-format document parsing pipeline with Amazon Textract.

Solution overview

The following diagram illustrates our solution architecture:

First, the solution offers pipeline ingest using Amazon Simple Storage Service (Amazon S3), Amazon S3 Event Notifications, and an Amazon Simple Queue Service (Amazon SQS) queue so that processing begins when a form lands in the target Amazon S3 partition. An event on Amazon EventBridge is created and sent to an AWS Lambda target that triggers an Amazon Textract job.
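As a rough sketch of that trigger (illustrative only; the actual pipeline code may differ, and the event parsing below assumes an S3-style record even though the real pipeline receives its events through Amazon SQS and EventBridge), the Lambda target could start an asynchronous Amazon Textract job with boto3 as follows:

import boto3

textract = boto3.client("textract")

def lambda_handler(event, context):
    # Hypothetical event parsing: pull the bucket and key of the form that landed in Amazon S3
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Start an asynchronous analysis job that extracts key-value pairs (FORMS) from the document
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS"],
    )

    # The JobId is used later to fetch the paginated results
    return {"JobId": response["JobId"]}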

You can use serverless AWS services such as Lambda and AWS Step Functions to create asynchronous service integrations between AWS AI services and AWS Analytics and Database services for warehousing, analytics, and AI and machine learning (ML). In this post, we demonstrate how to use Step Functions to asynchronously control and maintain the state of requests to Amazon Textract asynchronous APIs. This is achieved by using a state machine for managing calls and responses. We use Lambda within the state machine to merge the paginated API response data from Amazon Textract into a single JSON object containing semi-structured text data extracted using OCR.
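As an illustration of that merge step (the function and variable names are ours, not the pipeline's), a Lambda function can page through GetDocumentAnalysis with NextToken and concatenate the Blocks into a single object:

import boto3

textract = boto3.client("textract")

def merge_textract_pages(job_id):
    # Collect all pages of an asynchronous Amazon Textract job into one JSON-serializable object
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id, "MaxResults": 1000}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_document_analysis(**kwargs)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            break

    # One merged object containing every block from every page
    return {"JobId": job_id, "Blocks": blocks}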

Then we filter across different forms using a standardized approach to aggregate this OCR data into a common structured format, using Amazon Athena with a JSON SerDe applied to the Amazon Textract output.

You can trace the steps taken through this pipeline because the serverless Step Functions workflow tracks the processing state and retains the output of each state. Customers in some industries prefer this approach when they must retain the results of all predictions from services such as Amazon Textract to promote the long-term explainability of pipeline results.

Finally, you can query the extracted data in Athena tables.
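For example (the output location below is a placeholder; the database and table names match the ones used in the test walkthrough later in this post), you can run an Athena query programmatically with boto3:

import boto3

athena = boto3.client("athena")

# Query the structured form data produced by the pipeline. Replace the output
# location (and, if different, the database and table names) with the values
# created by your stack.
response = athena.start_query_execution(
    QueryString="SELECT * FROM applications_data_table LIMIT 10",
    QueryExecutionContext={"Database": "jobapplicationsdatabase"},
    ResultConfiguration={"OutputLocation": "s3://<your-output-bucket>/athena-results/"},
)

print("Query execution ID:", response["QueryExecutionId"])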

In the following sections, we walk you through setting up the pipeline using AWS CloudFormation, testing the pipeline, and adding new form versions. This pipeline provides a maintainable solution because every component (ingest, text extraction, text processing) is independent and isolated.

Define default input parameters for CloudFormation stacks

To define the input parameters for the CloudFormation stacks, open default.properties under the params folder and set the following parameters:

- set the default value for parameter 'pInputBucketName' for Input S3 bucket 
- set the default value for parameter 'pOutputBucketName' for Output S3 bucket 
- set the default value for parameter 'pInputQueueName' for Ingest SQS (a.k.a job scheduler)

Deploy the solution

To deploy your pipeline, complete the following steps:

  1. Choose Launch Stack:
  2. Choose Next.
  3. Specify the stack details as shown in the following screenshot and choose Next.
  4. In the Configure stack options section, add optional tags, permissions, and other advanced settings.
  5. Choose Next.
  6. Review the stack details and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  7. Choose Create stack.

This initiates stack deployment in your AWS account.

After the stack is deployed successfully, you can start testing the pipeline as described in the next section.

Test the pipeline

After a successful deployment, complete the following steps to test your pipeline:

  1. Download the sample files onto your computer.
  2. Create an /uploads folder (partition) under the newly created input S3 bucket.
  3. Create the separate folders (partitions) like jobapplications under /uploads.
  4. Upload the first version of the job application from the sample docs folder to the /uploads/jobapplications partition.

When the pipeline is complete, you can find the extracted key-value pairs for this version of the document in /OuputS3/03-textract-parsed-output/jobapplications on the Amazon S3 console.

You can also find it in the Athena table (applications_data_table) on the Database menu (jobapplicationsdatabase).

  5. Upload the second version of the job application from the sample docs folder to the /uploads/jobapplications partition.

When the pipeline is complete, you can find the extracted key-value pairs for this version in /OuputS3/03-textract-parsed-output/jobapplications on the Amazon S3 console.

You can also find it in the Athena table (applications_data_table) on the Database menu (jobapplicationsdatabase).

You’re done! You’ve successfully deployed your pipeline.

Add new form versions

Updating the solution for a new form version is straightforward: for each new form version, you only need to update and test the queries in the processing stack.

After you make the updates, you can redeploy the updated pipeline using AWS CloudFormation APIs and process new documents, arriving at the same standard data points for your schema with minimal disruption and development effort needed to make changes to your pipeline. This flexibility, which is achieved by decoupling the parsing and extraction behavior and using the JSON SerDe functionality in Athena, makes this pipeline a maintainable solution for any number of form versions that your organization needs to process to gather information.

As you run the ingest solution, data from incoming forms is automatically populated to Athena, along with information about the files and inputs associated with them. When the data in your forms moves from unstructured to structured, it’s ready to use for downstream applications such as analytics, ML modeling, and more.

Clean up

To avoid incurring ongoing charges, delete the resources you created as part of this solution when you’re done.

  1. On the Amazon S3 console, manually delete the buckets you created as part of the CloudFormation stack.
  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Select the main stack and choose Delete.

This automatically deletes the nested stacks.

Conclusion

In this post, we demonstrated how customers seeking to trace and customize their document processing can build and deploy an event-driven, serverless, multi-format document parsing pipeline with Amazon Textract. This pipeline provides a maintainable solution because every component (ingest, text extraction, text processing) is independent and isolated, allowing organizations to operationalize their solutions to address diverse processing needs.

Try the solution today and leave your feedback in the comments section.


About the Authors

Emily Soward is a Data Scientist with AWS Professional Services. She holds a Master of Science with Distinction in Artificial Intelligence from the University of Edinburgh in Scotland, United Kingdom with emphasis on Natural Language Processing (NLP). Emily has served in applied scientific and engineering roles focused on AI-enabled product research and development, operational excellence, and governance for AI workloads running at organizations in the public and private sector. She contributes to customer guidance as an AWS Senior Speaker and recently, as an author for AWS Well-Architected in the Machine Learning Lens.

Sandeep Singh is a Data Scientist with AWS Professional Services. He holds a Master of Science in Information Systems with a concentration in AI and Data Science from San Diego State University (SDSU), California. He is a full-stack data scientist with a strong computer science background and a trusted adviser specializing in AI systems and control design. He is passionate about helping customers get their high-impact projects in the right direction, advising and guiding them in their cloud journey, and building state-of-the-art AI/ML-enabled solutions.

Read More

Amazon SageMaker JumpStart models and algorithms now available via API

In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment.

Previously, all JumpStart content was available only through Amazon SageMaker Studio, which provides a user-friendly graphical interface to interact with the feature. Today, we’re excited to announce the launch of easy-to-use JumpStart APIs as an extension of the SageMaker Python SDK. These APIs allow you to programmatically deploy and fine-tune a vast selection of JumpStart-supported pre-trained models on your own datasets. This launch unlocks the usage of JumpStart capabilities in your code workflows, MLOps pipelines, and anywhere else you’re interacting with SageMaker via SDK.

In this post, we provide an update on the current state of JumpStart’s capabilities and guide you through the usage flow of the JumpStart API with an example use case.

JumpStart overview

JumpStart is a multi-faceted product that includes different capabilities to help get you quickly started with ML on SageMaker. At the time of writing, JumpStart enables you to do the following:

  • Deploy pre-trained models for common ML tasks – JumpStart enables you to solve common ML tasks with no development effort by providing easy deployment of models pre-trained on publicly available large datasets. The ML research community has put a large amount of effort into making a majority of recently developed models publicly available for use. JumpStart hosts a collection of over 300 models, spanning the 15 most popular ML tasks such as object detection, text classification, and text generation, making it easy for beginners to use them. These models are drawn from popular model hubs, such as TensorFlow, PyTorch, Hugging Face, and MXNet Hub.
  • Fine-tune pre-trained models – JumpStart allows you to fine-tune pre-trained models with no need to write your own training algorithm. In ML, the ability to transfer the knowledge learned in one domain to another domain is called transfer learning. You can use transfer learning to produce accurate models on your smaller datasets, with much lower training costs than the ones involved in training the original model from scratch. JumpStart also includes popular training algorithms based on LightGBM, CatBoost, XGBoost, and Scikit-learn that you can train from scratch for tabular data regression and classification.
  • Use pre-built solutions – JumpStart provides a set of 17 pre-built solutions for common ML use cases, such as demand forecasting and industrial and financial applications, which you can deploy with just a few clicks. The solutions are end-to-end ML applications that string together various AWS services to solve a particular business use case. They use AWS CloudFormation templates and reference architectures for quick deployment, which means they are fully customizable.
  • Use notebook examples for SageMaker algorithms – SageMaker provides a suite of built-in algorithms to help data scientists and ML practitioners get started with training and deploying ML models quickly. JumpStart provides sample notebooks that you can use to quickly use these algorithms.
  • Take advantage of training videos and blogs – JumpStart also provides numerous blog posts and videos that teach you how to use different functionalities within SageMaker.

JumpStart accepts custom VPC settings and KMS encryption keys, so that you can use the available models and solutions securely within your enterprise environment. You can pass your security settings to JumpStart within SageMaker Studio or through the SageMaker Python SDK.
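For example, the following is a minimal sketch of passing network and encryption settings through the SageMaker Python SDK. The subnet, security group, and KMS key values are placeholders, and the artifact variables (deploy_image_uri, deploy_source_uri, base_model_uri, aws_role) are retrieved the same way as in the walkthrough later in this post.

from sagemaker.model import Model

# Minimal sketch: attach your own VPC configuration and KMS key when deploying a
# JumpStart model through the SageMaker Python SDK. The IDs below are placeholders.
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    vpc_config={
        "Subnets": ["subnet-0123456789abcdef0"],
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    kms_key="arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # encrypts the endpoint's storage volume
)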

JumpStart-supported ML tasks and API example Notebooks

JumpStart currently supports 15 of the most popular ML tasks; 13 of them are vision and NLP-based tasks, of which 8 support no-code fine-tuning. It also supports four popular algorithms for tabular data modeling. The tasks and links to their sample notebooks are summarized in the following table.

Task | Inference with pre-trained models | Training on custom dataset | Frameworks supported | Example Notebooks
Image Classification | yes | yes | PyTorch, TensorFlow | Introduction to JumpStart – Image Classification
Object Detection | yes | yes | PyTorch, TensorFlow, MXNet | Introduction to JumpStart – Object Detection
Semantic Segmentation | yes | yes | MXNet | Introduction to JumpStart – Semantic Segmentation
Instance Segmentation | yes | no | MXNet | Introduction to JumpStart – Instance Segmentation
Image Embedding | yes | no | TensorFlow, MXNet | Introduction to JumpStart – Image Embedding
Text Classification | yes | yes | TensorFlow | Introduction to JumpStart – Text Classification
Sentence Pair Classification | yes | yes | TensorFlow, Hugging Face | Introduction to JumpStart – Sentence Pair Classification
Question Answering | yes | yes | PyTorch | Introduction to JumpStart – Question Answering
Named Entity Recognition | yes | no | Hugging Face | Introduction to JumpStart – Named Entity Recognition
Text Summarization | yes | no | Hugging Face | Introduction to JumpStart – Text Summarization
Text Generation | yes | no | Hugging Face | Introduction to JumpStart – Text Generation
Machine Translation | yes | no | Hugging Face | Introduction to JumpStart – Machine Translation
Text Embedding | yes | no | TensorFlow, MXNet | Introduction to JumpStart – Text Embedding
Tabular Classification | yes | yes | LightGBM, CatBoost, XGBoost, Linear Learner | Introduction to JumpStart – Tabular Classification – LightGBM, CatBoost; Introduction to JumpStart – Tabular Classification – XGBoost, Linear Learner
Tabular Regression | yes | yes | LightGBM, CatBoost, XGBoost, Linear Learner | Introduction to JumpStart – Tabular Regression – LightGBM, CatBoost; Introduction to JumpStart – Tabular Regression – XGBoost, Linear Learner

Depending on the task, the sample notebooks linked in the preceding table can guide you on all or a subset of the following processes:

  • Select a JumpStart supported pre-trained model for your specific task.
  • Host a pre-trained model, get predictions from it in real time, and adequately display the results.
  • Fine-tune a pre-trained model with your own selection of hyperparameters and deploy it for inference.

Fine-tune and deploy an object detection model with JumpStart APIs

In the following sections, we provide a step-by-step walkthrough of how to use the new JumpStart APIs on the representative task of object detection. We show how to use a pre-trained object detection model to identify objects from a predefined set of classes in an image with bounding boxes. Finally, we show how to fine-tune a pre-trained model on your own dataset to detect objects in images that are specific to your business needs, simply by bringing your own data. We provide an accompanying notebook for this walkthrough.

We walk through the following high-level steps:

  1. Run inference on the pre-trained model.
    1. Retrieve JumpStart artifacts and deploy an endpoint.
    2. Query the endpoint, parse the response, and display model predictions.
  2. Fine-tune the pre-trained model on your own dataset.
    1. Retrieve training artifacts.
    2. Run training.

Run inference on the pre-trained model

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and show how to run inference on the deployed endpoint. All the steps are available in the accompanying Jupyter notebook.

Retrieve JumpStart artifacts and deploy an endpoint

SageMaker is a platform based on Docker containers. JumpStart uses the available framework-specific SageMaker Deep Learning Containers (DLCs). We fetch any additional packages, as well as scripts to handle training and inference for the selected task. Finally, the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. You can use any number of models pre-trained for the same task with a single training or inference script. See the following code:

from sagemaker import image_uris, model_uris, script_uris

# inference_instance_type is defined in the accompanying notebook.
infer_model_id, infer_model_version = "pytorch-od-nvidia-ssd", "*"

# Retrieve the inference Docker container URI. This is the base container PyTorch image for the model selected above.
deploy_image_uri = image_uris.retrieve(region=None, framework=None, image_scope="inference", model_id=infer_model_id, model_version=infer_model_version, instance_type=inference_instance_type)

# Retrieve the inference script URI. This includes all dependencies and scripts for model loading, inference handling, and so on.
deploy_source_uri = script_uris.retrieve(model_id=infer_model_id, model_version=infer_model_version, script_scope="inference")

# Retrieve the base model URI. This includes the pre-trained nvidia-ssd model and parameters.
base_model_uri = model_uris.retrieve(model_id=infer_model_id, model_version=infer_model_version, model_scope="inference")

Next, we feed the resources into a SageMaker Model instance and deploy an endpoint:

from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Create the SageMaker model instance
model = Model(image_uri=deploy_image_uri, source_dir=deploy_source_uri, model_data=base_model_uri, entry_point="inference.py", role=aws_role, predictor_cls=Predictor, name=endpoint_name)

# Deploy the model. We pass the Predictor class when we deploy the model through the Model class so that we can run inference through the SageMaker API.
base_model_predictor = model.deploy(initial_instance_count=1, instance_type=inference_instance_type, predictor_cls=Predictor, endpoint_name=endpoint_name)

Endpoint deployment may take a few minutes to complete.

Query the endpoint, parse the response, and display predictions

To get inferences from a deployed model, supply an input image in binary format along with an accept type. In JumpStart, you can define the number of bounding boxes returned. In the following code snippet, we predict ten bounding boxes per image by appending ;n_predictions=10 to Accept. To predict xx boxes, change it to ;n_predictions=xx, or omit ;n_predictions entirely to get all the predicted boxes.

def query(model_predictor, image_file_name):
    # Read the raw image bytes
    with open(image_file_name, "rb") as file:
        input_img_rb = file.read()

    # Send the image to the endpoint, requesting a verbose JSON response with up to 10 bounding boxes
    return model_predictor.predict(input_img_rb, {
            "ContentType": "application/x-image",
            "Accept": "application/json;verbose;n_predictions=10"})

query_response = query(base_model_predictor, Naxos_Taverna)

The following code snippet gives you a glimpse of what object detection looks like. The probability predicted for each object class is visualized, along with its bounding box. We use the parse_response and display_predictions helper functions, which are defined in the accompanying notebook.

normalized_boxes, classes_names, confidences = parse_response(query_response)
display_predictions(Naxos_Taverna, normalized_boxes, classes_names, confidences)

The following screenshot shows the output of an image with prediction labels and bounding boxes.

Fine-tune a pre-trained model on your own dataset

Existing object detection models in JumpStart are pre-trained either on the COCO or the VOC datasets. However, if you need to identify object classes that don’t exist in the original pre-training dataset, you have to fine-tune the model on a new dataset that includes these new object types. For example, if you need to identify kitchen utensils and run inference on a deployed pre-trained SSD model, the model doesn’t recognize any characteristics of the new image types and therefore the output is incorrect.

In this section, we demonstrate how easy it is to fine-tune a pre-trained model to detect new object classes using JumpStart APIs. The full code example with more details is available in the accompanying notebook.

Retrieve training artifacts

Training artifacts are similar to the inference artifacts discussed in the preceding section. Training requires a base Docker container, namely the MXNet container in the following example code. Any additional packages required for training are included with the training scripts in train_source_uri. The pre-trained model and its parameters are packaged separately.

train_model_id, train_model_version, train_scope = "mxnet-od-ssd-512-vgg16-atrous-coco", "*", "training"
training_instance_type = "ml.p2.xlarge"

# Retrieve the Docker image. This is the base container MXNet image for the model selected above.
train_image_uri = image_uris.retrieve(region=None, framework=None,
                                      model_id=train_model_id, model_version=train_model_version,
                                      image_scope=train_scope, instance_type=training_instance_type)

# Retrieve the training script and dependencies. This contains all the necessary files, including data processing, model training, and so on.
train_source_uri = script_uris.retrieve(model_id=train_model_id, model_version=train_model_version, script_scope=train_scope)

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(model_id=train_model_id, model_version=train_model_version, model_scope=train_scope)

Run training

To run training, we simply feed the required artifacts along with some additional parameters to a SageMaker Estimator and call the .fit function:

from sagemaker.estimator import Estimator

# Create the SageMaker Estimator instance. hyperparameters and s3_output_location are defined in the accompanying notebook.
od_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",  # Entry-point file in source_dir and present in train_source_uri.
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Launch a SageMaker training job by passing the S3 path of the training data
od_estimator.fit({"training": training_dataset_s3_path}, logs=True)

While the algorithm trains, you can monitor its progress either in the SageMaker notebook where you’re running the code itself, or on Amazon CloudWatch. When training is complete, the fine-tuned model artifacts are uploaded to the Amazon Simple Storage Service (Amazon S3) output location specified in the training configuration. You can now deploy the model in the same manner as the pre-trained model. You can follow the rest of the process in the accompanying notebook.
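For instance, deploying the fine-tuned estimator can be as simple as the following sketch; the accompanying notebook is the authoritative reference and may pass additional arguments, such as a custom inference script and image.

# Minimal sketch: deploy the fine-tuned model straight from the estimator.
finetuned_predictor = od_estimator.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
)

# The returned predictor can then be queried the same way as the pre-trained model.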

Conclusion

In this post, we described the value of the newly released JumpStart APIs and how to use them. We provided links to 17 example notebooks for the different ML tasks supported in JumpStart, and walked you through the object detection notebook.

We look forward to hearing from you as you experiment with JumpStart.


About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post-Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design, and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use-cases and helping customers optimize Deep Learning model training and deployment.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He is an active researcher in machine learning and statistical inference and has published many papers in NeurIPS, ICML, ICLR, JMLR, and ACL conferences.

Read More