NVIDIA Inception Introduces New and Updated Benefits for Startup Members to Accelerate Computing

This week at GTC, we’re celebrating – celebrating the amazing and impactful work that developers and startups are doing around the world.

Nowhere is that more apparent than among the members of our global NVIDIA Inception program, designed to nurture cutting-edge startups that are revolutionizing industries. The program is free for startups of all sizes and stages of growth, offering go-to-market support, expertise and technology.

Inception members are doing amazing things on NVIDIA platforms across a multitude of areas, from digital twins and climate science, to healthcare and robotics. Now with over 10,000 members in 110 countries, Inception is a true reflection of the global startup ecosystem.

And we’re continuing momentum by offering new benefits to help startups accelerate even more.

Expanded Benefits

Inception members are now eligible for discounts across the NVIDIA Enterprise Software Suite, including NVIDIA AI Enterprise (NVAIE), Omniverse Enterprise and Riva Enterprise. NVAIE is a cloud-native software suite that is optimized, certified and supported by NVIDIA to streamline AI development and deployment. NVIDIA Omniverse Enterprise positions startups to build high-quality 3D tools or to simplify and accelerate complex 3D workflows. NVIDIA Riva Enterprise helps easily develop real-time applications like virtual assistants, transcription services and chatbots.

These discounts give Inception members greater access to NVIDIA software tools for building computing applications that align with their own solutions.

Another new benefit for Inception members is access to special leasing for NVIDIA DGX systems. Available now for members in the U.S., this offers an enhanced opportunity for startups to leverage DGX to deliver leading solutions for enterprise AI infrastructure at scale.

Inception members continue to receive credits and exclusive discounts for technical self-paced courses and instructor-led workshops through the NVIDIA Deep Learning Institute. Upcoming DLI workshops include “Building Conversational AI Applications” and “Applications of AI for Predictive Maintenance,” and courses include “Building Real-Time Video AI Applications” and “Deploying a Model for Inference at Production Scale.”

A Growing Ecosystem

NVIDIA Inception is home for startups to do all types of interesting work, and welcomes developers in every field, area and industry.

Within the program, healthcare is a leading field, with over 1,600 healthcare startups. This is followed closely by over 1,500 IT services startups, more than 825 media and entertainment (M&E) startups and upwards of 800 video analytics startups. More than 660 robotics startups are members of Inception, paving the way for the next wave of AI through digital and physical robots.

An indicator of Inception’s growing popularity is the increase in startups working in emerging areas such as NVIDIA Omniverse, a development platform for 3D design collaboration and real-time, physically accurate simulation, as well as climate science and more. Several Inception startups are already developing on the Omniverse platform.

Inception member Charisma is leveraging Omniverse to build digital humans for virtual worlds, games and education. The company enters interactive dialogue into the Omniverse Audio2Face app, tapping into NVIDIA V100 Tensor Core GPUs in the cloud.

Another Inception member, RIOS, helps enterprises automate factories, warehouses and supply chain operations by deploying AI-powered end-to-end robotic workcells. The company is harnessing Isaac Sim on Omniverse, which it also uses for customer deployments.

And RADiCAL is developing computer vision technology focused on detecting and reconstructing 3D human motion from 2D content. The startup is already developing on Omniverse to accelerate its work.

In the field of climate science, many Inception members are also doing revolutionary work to push the boundaries of what’s possible.

Inception member TrueOcean is running NVIDIA DGX A100 systems to develop AI algorithms for predicting and quantifying carbon dioxide capture within seagrass meadows, as well as for understanding subsea geology. Seagrass meadows can absorb and store carbon in the oxygen-depleted seabed, where it decomposes much slower than on land.

In alignment with NVIDIA’s own plans to build the world’s most powerful AI supercomputer for predicting climate change, Inception member Blackshark provides a semantic, photorealistic 3D digital twin of Earth as a plugin for Unreal Engine, relying on Omniverse as one of its platforms for building large virtual geographic environments.

If you’re a startup doing disruptive and exciting development, join NVIDIA Inception today.

Check out GTC sessions on Omniverse and climate change from NVIDIA Inception members. Registration is free. And watch NVIDIA founder and CEO Jensen Huang’s GTC keynote address, which features a new I AM AI video with Inception members HeartDub and PRENAV.

NVIDIA Omniverse Upgrade Delivers Extraordinary Benefits to 3D Content Creators

At GTC, NVIDIA announced significant updates for millions of creators using the NVIDIA Omniverse real-time 3D design collaboration platform.

The announcements kicked off with updates to the Omniverse apps Create, Machinima and Showroom, with an imminent View release. Powered by GeForce RTX and NVIDIA RTX GPUs, they dramatically accelerate 3D creative workflows.

New Omniverse Connections are expanding the ecosystem and are now available in beta: Unreal Engine 5 Omniverse Connector and the Adobe Substance 3D Material Extension, with the Adobe Substance 3D Painter Omniverse Connector very close behind.

Maxon’s Cinema 4D now has Universal Scene Description (USD) support. Unlocking Cinema 4D workflows via OmniDrive brings deeper integration and flexibility to the Omniverse ecosystem.

Leveraging the Hydra render delegate feature, artists can now use Pixar HDStorm, Chaos V-Ray, Maxon Redshift and OTOY Octane Hydra render delegates within the viewport of all Omniverse apps, with Blender Cycles coming soon.

Whether refining 3D scenes or exporting final projects, artists can switch between the lightning-fast Omniverse RTX Renderer or their preferred renderer, giving them ultimate freedom to create however they like.

The Junk Shop by Alex Treviño. Original Concept by Anaïs Maamar. Note Hydra render delegates displayed in the renderer toggle menu.

These updates and more are available today in the Omniverse launcher, free to download, alongside the March NVIDIA Studio Driver release.

To celebrate the Machinima app update, we’re kicking off the #MadeInMachinima contest, in which artists can remix iconic characters from Squad, Mount & Blade II: Bannerlord and Mechwarrior 5 into a cinematic short in Omniverse Machinima to win NVIDIA Studio laptops. The submission window opens on March 29 and runs through June 27. Visit the contest landing page for details.

Can’t Wait to Create

Omniverse Create allows users to interactively assemble full-fidelity scenes by connecting to their favorite creative apps. Artists can add lighting, simulate physically accurate scenes and choose to render with Omniverse’s advanced RTX Renderer, or their favorite Hydra Render delegate.

Create version 2022.1 includes USD support for NURBS curves, a type of curve modeling useful for hair, particles and more. Scenes can now be rendered in passes with arbitrary output variables, or AOVs, delivering more control to artists during the compositing stage.

Animation curve editing is now possible with the addition of a graph editor. The feature will feel familiar to animators who work in creative apps such as Autodesk Maya and Blender, letting them iterate more simply, quickly and intuitively.

The new ActionGraph feature unlocks keyboard shortcuts and user-interface buttons to trigger complex events simultaneously.

Apply different colors and textures with ease in Omniverse Create.

NVIDIA PhysX 5.0 updates provide soft and deformable body support for objects such as fabric, jelly and balloons, adding further realism to scenes with no animation necessary.

vMaterials 2.0, a curated collection of MDL materials and lights, now offers over 900 physically accurate, real-world materials that artists can apply to their scenes with just a double click, no shader writing necessary.

Several new Create features are also available in beta:

  • AnimGraph based on OmniGraph brings characters to life with a new graph editor for simple, no-code, realistic animations.
  • New animation retargeting allows artists to map animations from one character to another, automating complex animation tasks such as joint mapping, reference pose matching and previewing. When used with AnimGraph, artists can automate character rigging, saving countless hours of manual, tedious work.
  • Users can drag and drop assets they own, or click on others to purchase directly from the asset’s product page. Nearly 1 million assets from TurboSquid by Shutterstock, Sketchfab and Reallusion ActorCore are directly searchable in the Omniverse asset browser.

This otherworldly set of features is Create-ing infectious excitement for 3D workflows.

Machinima Magic

Omniverse Machinima 2022.1 beta provides tools for artists to remix, recreate and redefine animated video game storytelling through immersive visualization, collaborative design and photorealistic rendering.

The integration of NVIDIA Maxine’s body pose estimation feature gives users the ability to track and capture motion in real time using a single camera — without requiring a MoCap suit — with live conversion from a 2D camera capture to a 3D model.

Prerecorded videos can now be converted to animations with a new easy-to-use interface.

The retargeting feature applies these captured animations to custom-built skeletons, providing an easy way to animate a character with a webcam. No fancy, expensive device necessary, just a webcam.

Sequencer functionality updates include a new user interface for easier navigation; new tools including splitting, looping, hold and scale; more drag-and-drop functionality to simplify pipelines; and a new audio graph display.

Stitching and building cinematics is now as intuitive as editing video projects.

Step Into the Showroom

Omniverse Showroom 2022.1 includes seven new scenes that invite the newest of users to get started and embrace the incredible possibilities and technology within the platform.

Artists can engage with tech demos showcasing PhysX rigid and soft body dynamics; Flow, with combustible fluid, smoke and fire; and Blast, featuring destruction and fractures.

Enjoy the View

Omniverse View 2022.1 will enable non-technical project reviewers to collaboratively and interactively review 3D design projects in stunning photorealism, with several astonishing new features.

Markup gives artists the ability to add 2D feedback based on their viewpoint, including shapes and scribbles, for 3D feedback in the cloud.

Turntable places an interactive scene on a virtual table that can be rotated to see how realistic lighting conditions affect the scene in real time, advantageous for high-end movie production and architecture.

Teleport and Waypoints allow artists to easily jump around their scenes and preset fully interactive views of Omniverse scenes for sharing.

Omniverse Ecosystem Expansion Continues

New beta Omniverse Connectors and extensions add variety and versatility to 3D creative workflows.

Now available, an Omniverse Connector for Unreal Engine 5 allows live-sync workflows.

The Adobe Substance 3D Material extension is now available, with a beta Substance 3D Painter Omniverse Connector coming soon, enabling artists to achieve more seamless, live-sync texture and material workflows.

Maxon’s Cinema 4D now supports USD and is compatible with OmniDrive, unlocking Omniverse workflows for visualization specialists.

Finally, a new CAD importer enables product designers to convert 26 popular CAD formats into Omniverse USD scenes.

More Machinima Magic — With Prizes

The #MadeInMachinima contest asks participants to build scenes and assets — composed of characters from Squad, Mount & Blade II: Bannerlord and Mechwarrior 5 — using Omniverse Machinima.

Rooster Teeth, the legendary studio behind the Halo-based machinima series Red vs. Blue, produced this magnificent cinematic short in Machinima. Take a look to see what’s possible.

Machinima expertise, while welcome, is not required; this contest is for creators of all levels. Three talented winners will get an NVIDIA Studio laptop, powerful and purpose-built with vivid color displays and blazing-fast memory and storage, to boost future Omniverse sessions.

Machinima will be prominently featured at the Game Developers Conference, where game artists, producers, developers and designers come together to exchange ideas, educate and inspire. At the show, we also launched Omniverse for Developers, providing a more collaborative environment for the creation of virtual worlds.

NVIDIA offers sessions at GDC to assist content creators, covering virtual worlds and AI, real-time ray tracing, and developer tools. Check out the complete list.

Launch or download Omniverse today.

Expedite IVR development with industry grammars on Amazon Lex

Amazon Lex is a service for building conversational interfaces into any application using voice and text. With Amazon Lex, you can easily build sophisticated, natural language, conversational bots (chatbots), virtual agents, and interactive voice response (IVR) systems. You can now use industry grammars to accelerate IVR development on Amazon Lex as part of your IVR migration effort. Industry grammars are a set of XML files made available as a grammar slot type. You can select from a range of pre-built industry grammars across domains, such as financial services, insurance, and telecom. In this post, we review the industry grammars for these industries and use them to create IVR experiences.

Financial services

You can use Amazon Lex in the financial services domain to automate customer service interactions such as credit card payments, mortgage loan applications, portfolio status, and account updates. During these interactions, the IVR flow needs to collect several details, including credit card number, mortgage loan ID, and portfolio details, to fulfill the user’s request. We use the financial services industry grammars in the following sample conversation:

Agent: Welcome to ACME bank. To get started, can I get your account ID?

User: Yes, it’s AB12345.

IVR: Got it. How can I help you?

User: I’d like to transfer funds to my savings account.

IVR: Sure. How much would you like to transfer?

User: $100

IVR: Great, thank you.

The following grammars are supported for financial services: account ID, credit card number, transfer amount, and different date formats such as expiration date (mm/yy) and payment date (mm/dd).

Let’s review the sample account ID grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My account number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My account number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: Hmm My account number is 1 2 3 4 A B C 1
                Output: 123ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">account number is</item>
                <item repeat="0-1">Account Number</item>
                <item repeat="0-1">Here is my Account Number </item>
                <item repeat="0-1">Yes, It is</item>
                <item repeat="0-1">Yes It is</item>
                <item repeat="0-1">Yes It's</item>
                <item repeat="0-1">My account Id is</item>
                <item repeat="0-1">This is the account Id</item>
                <item repeat="0-1">account Id</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>

Using the industry grammar for financial services

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called Financialbot and adds the grammars for financial services, which we store in Amazon Simple Storage Service (Amazon S3):

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions and then choose Import.
  3. Choose the Financialbot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for financial services, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and choose the accountID slot type so you can associate the fin_accountNumber.grxml file.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The AWS Identity and Access Management (IAM) role used to create the bot must have permission to read files from the S3 bucket. If you prefer to script the grammar upload and slot type association, see the sketch after this list.

  9. Repeat steps 6–8 for the transferFunds slot type with fin_transferAmount.grxml.
  10. After you save the grammars, choose Build.
  11. Download the financial services contact flow to integrate it with the Amazon Lex bot via Amazon Connect.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.
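If you prefer to script the grammar upload and slot type association rather than click through the console, the same result can be achieved with the AWS SDK. The following is a minimal sketch, assuming a hypothetical bucket named my-grammar-bucket and placeholder bot, locale, and slot type IDs; the field names follow the grammar slot type settings in the Amazon Lex V2 model-building API, so verify them against the current SDK documentation before relying on them.

import boto3

# Placeholder values (assumptions): replace with your own bucket, bot, and slot type IDs.
BUCKET = "my-grammar-bucket"
GRAMMAR_KEY = "grammars/fin_accountNumber.grxml"
BOT_ID = "EXAMPLEBOTID"
LOCALE_ID = "en_US"
SLOT_TYPE_ID = "EXAMPLESLOTID"

# Upload the grammar file to Amazon S3; the bot's IAM role needs read access to this object.
s3 = boto3.client("s3")
s3.upload_file("fin_accountNumber.grxml", BUCKET, GRAMMAR_KEY)

# Point the accountID grammar slot type at the uploaded grammar file.
lex = boto3.client("lexv2-models")
lex.update_slot_type(
    botId=BOT_ID,
    botVersion="DRAFT",
    localeId=LOCALE_ID,
    slotTypeId=SLOT_TYPE_ID,
    slotTypeName="accountID",
    externalSourceSetting={
        "grammarSlotTypeSetting": {
            "source": {
                "s3BucketName": BUCKET,
                "s3ObjectKey": GRAMMAR_KEY,
            }
        }
    },
)

# Rebuild the locale so the updated grammar takes effect.
lex.build_bot_locale(botId=BOT_ID, botVersion="DRAFT", localeId=LOCALE_ID)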

Insurance

You can use Amazon Lex in the insurance domain to automate customer service interactions such as claims processing, policy management, and premium payments. During these interactions, the IVR flow needs to collect several details, including policy ID, license plate, and premium amount, to fulfill the policy holder’s request. We use the insurance industry grammars in the following sample conversation:

Agent: Welcome to ACME insurance company. To get started, can I get your policy ID?

Caller: Yes, it’s AB1234567.

IVR: Got it. How can I help you?

Caller: I’d like to file a claim.

IVR: Sure. Is this claim regarding your auto policy or home owners’ policy?

Caller: Auto

IVR: What’s the license plate on the vehicle?

Caller: ABCD1234

IVR: Thank you. And how much is the claim for?

Caller: $900

IVR: What was the date and time of the accident?

Caller: March 1st 2:30pm.

IVR: Thank you. I’ve got that started for you. Someone from our office should be in touch with you shortly. Your claim ID is 12345.

The following grammars are supported for the insurance domain: policy ID, driver’s license, social security number, license plate, claim number, and renewal date.

Let’s review the sample claimDateTime grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">

         <!-- Test Cases

         Grammar will support the following inputs:

             Scenario 1:
                 Input: The accident occured at july three at five am
                 Output:  july 3 5am

             Scenario 2:
                 Input: Damage was reported at july three at five am
                 Output:  july 3 5am

             Scenario 3:
                 Input: Schedule virtual inspection for july three at five am
                 Output:  july 3 5am
         -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item repeat="1-10">
                <item><ruleref uri="#months"/><tag>out = out + rules.months + " ";</tag></item>
                <one-of>
                    <item><ruleref uri="#digits"/><tag>out += rules.digits + " ";</tag></item>
                    <item><ruleref uri="#teens"/><tag>out += rules.teens+ " ";</tag></item>
                    <item><ruleref uri="#above_twenty"/><tag>out += rules.above_twenty+ " ";</tag></item>
                </one-of>
                <item><ruleref uri="#at"/><tag>out += rules.at.new;</tag></item>
                <item repeat="0-1"><ruleref uri="#mins"/><tag>out +=":" + rules.mins.min;</tag></item>
                <item><ruleref uri="#ampm"/><tag>out += rules.ampm;</tag></item>
            </item>
            <item repeat="0-1"><ruleref uri="#thanks"/></item>
        </rule>

        <rule id="text">
           <one-of>
             <item repeat="0-1">The accident occured at</item>
             <item repeat="0-1">Time of accident is</item>
             <item repeat="0-1">Damage was reported at</item>
             <item repeat="0-1">Schedule virtual inspection for</item>
           </one-of>
        </rule>

        <rule id="thanks">
            <one-of>
               <item>Thanks</item>
               <item>I think</item>
            </one-of>
          </rule>

        <rule id="months">
           <item repeat="0-1"><ruleref uri="#text"/></item>
           <one-of>
             <item>january<tag>out="january";</tag></item>
             <item>february<tag>out="february";</tag></item>
             <item>march<tag>out="march";</tag></item>
             <item>april<tag>out="april";</tag></item>
             <item>may<tag>out="may";</tag></item>
             <item>june<tag>out="june";</tag></item>
             <item>july<tag>out="july";</tag></item>
             <item>august<tag>out="august";</tag></item>
             <item>september<tag>out="september";</tag></item>
             <item>october<tag>out="october";</tag></item>
             <item>november<tag>out="november";</tag></item>
             <item>december<tag>out="december";</tag></item>
             <item>jan<tag>out="january";</tag></item>
             <item>feb<tag>out="february";</tag></item>
             <item>aug<tag>out="august";</tag></item>
             <item>sept<tag>out="september";</tag></item>
             <item>oct<tag>out="october";</tag></item>
             <item>nov<tag>out="november";</tag></item>
             <item>dec<tag>out="december";</tag></item>
           </one-of>
       </rule>

        <rule id="digits">
            <one-of>
                <item>0<tag>out=0;</tag></item>
                <item>1<tag>out=1;</tag></item>
                <item>2<tag>out=2;</tag></item>
                <item>3<tag>out=3;</tag></item>
                <item>4<tag>out=4;</tag></item>
                <item>5<tag>out=5;</tag></item>
                <item>6<tag>out=6;</tag></item>
                <item>7<tag>out=7;</tag></item>
                <item>8<tag>out=8;</tag></item>
                <item>9<tag>out=9;</tag></item>
                <item>first<tag>out=1;</tag></item>
                <item>second<tag>out=2;</tag></item>
                <item>third<tag>out=3;</tag></item>
                <item>fourth<tag>out=4;</tag></item>
                <item>fifth<tag>out=5;</tag></item>
                <item>sixth<tag>out=6;</tag></item>
                <item>seventh<tag>out=7;</tag></item>
                <item>eighth<tag>out=8;</tag></item>
                <item>ninth<tag>out=9;</tag></item>
                <item>one<tag>out=1;</tag></item>
                <item>two<tag>out=2;</tag></item>
                <item>three<tag>out=3;</tag></item>
                <item>four<tag>out=4;</tag></item>
                <item>five<tag>out=5;</tag></item>
                <item>six<tag>out=6;</tag></item>
                <item>seven<tag>out=7;</tag></item>
                <item>eight<tag>out=8;</tag></item>
                <item>nine<tag>out=9;</tag></item>
            </one-of>
        </rule>


      <rule id="at">
        <tag>out.new=""</tag>
        <item>at</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.new+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.new+= rules.teens</tag></item>
        </one-of>
      </rule>

      <rule id="mins">
        <tag>out.min=""</tag>
        <item repeat="0-1">:</item>
        <item repeat="0-1">and</item>
        <one-of>
          <item repeat="0-1"><ruleref uri="#digits"/><tag>out.min+= rules.digits</tag></item>
          <item repeat="0-1"><ruleref uri="#teens"/><tag>out.min+= rules.teens</tag></item>
          <item repeat="0-1"><ruleref uri="#above_twenty"/><tag>out.min+= rules.above_twenty</tag></item>
        </one-of>
      </rule>

      <rule id="ampm">
            <tag>out=""</tag>
            <one-of>
                <item>AM<tag>out="am";</tag></item>
                <item>PM<tag>out="pm";</tag></item>
                <item>am<tag>out="am";</tag></item>
                <item>pm<tag>out="pm";</tag></item>
            </one-of>
        </rule>


        <rule id="teens">
            <one-of>
                <item>ten<tag>out=10;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleven<tag>out=11;</tag></item>
                <item>twelve<tag>out=12;</tag></item>
                <item>thirteen<tag>out=13;</tag></item>
                <item>fourteen<tag>out=14;</tag></item>
                <item>fifteen<tag>out=15;</tag></item>
                <item>sixteen<tag>out=16;</tag></item>
                <item>seventeen<tag>out=17;</tag></item>
                <item>eighteen<tag>out=18;</tag></item>
                <item>nineteen<tag>out=19;</tag></item>
                <item>tenth<tag>out=10;</tag></item>
                <item>eleventh<tag>out=11;</tag></item>
                <item>twelveth<tag>out=12;</tag></item>
                <item>thirteenth<tag>out=13;</tag></item>
                <item>fourteenth<tag>out=14;</tag></item>
                <item>fifteenth<tag>out=15;</tag></item>
                <item>sixteenth<tag>out=16;</tag></item>
                <item>seventeenth<tag>out=17;</tag></item>
                <item>eighteenth<tag>out=18;</tag></item>
                <item>nineteenth<tag>out=19;</tag></item>
            </one-of>
        </rule>

        <rule id="above_twenty">
            <one-of>
                <item>twenty<tag>out=20;</tag></item>
                <item>thirty<tag>out=30;</tag></item>
            </one-of>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits;</tag></item>
        </rule>
</grammar>

Using the industry grammar for insurance

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called InsuranceBot and adds the grammars for the insurance domain:

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions, then choose Import.
  3. Choose the InsuranceBot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for insurance, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and select the policyID slot type so you can associate the ins_policyNumber.grxml grammar file.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

  9. Repeat steps 6–8 for the licensePlate slot type (ins_NJ_licensePlateNumber.grxml) and dateTime slot type (ins_claimDateTime.grxml).
  10. After you save the grammars, choose Build.
  11. Download the insurance contact flow to integrate with the Amazon Lex bot.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.
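After the bot is built, the values recognized by the grammar slots reach your fulfillment logic like any other Amazon Lex V2 slot. The following is a minimal AWS Lambda code hook sketch that reads the resolved values and closes the conversation; it assumes the intent's slots are named policyID and dateTime (matching the slot types above) and follows the documented Lex V2 Lambda event and response shapes.

def lambda_handler(event, context):
    # Minimal fulfillment sketch for the insurance claim intent (illustrative only).
    slots = event["sessionState"]["intent"]["slots"]

    def interpreted(name):
        # Grammar slots resolve to the semantic interpretation produced by the grxml tags.
        slot = slots.get(name)
        return slot["value"]["interpretedValue"] if slot else None

    policy_id = interpreted("policyID")   # e.g., "AB1234567"
    claim_time = interpreted("dateTime")  # e.g., "march 1 2:30pm"

    message = (
        f"Thanks. I've started a claim on policy {policy_id} "
        f"for the incident on {claim_time}. Someone will be in touch shortly."
    )

    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {
                "name": event["sessionState"]["intent"]["name"],
                "state": "Fulfilled",
            },
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }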

Telecom

You can use Amazon Lex in the telecom domain to automate customer service interactions such as activating service, paying bills, and managing device installations. During these interactions, the IVR flow needs to collect several details, including SIM number, zip code, and the service start date, to fulfill the user’s request. We use the telecom industry grammars in the following sample conversation:

Agent: Welcome to ACME cellular. To get started, can I have the telephone number associated with your account?

User: Yes, it’s 123 456 7890.

IVR: Thanks. How can I help you?

User: I am calling to activate my service.

IVR: Sure. What’s the SIM number on the device?

User: 12345ABC

IVR: Ok. And can I have the zip code?

User: 12345

IVR: Great, thank you. The device has been activated.

The following grammars are supported for telecom: SIM number, device serial number, zip code, phone number, service start date, and ordinals.

Let’s review the sample SIM number grammar. You can refer to the other grammars in the documentation.

<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.w3.org/2001/06/grammar
                             http://www.w3.org/TR/speech-grammar/grammar.xsd"
         xml:lang="en-US" version="1.0"
         root="main"
         mode="voice"
         tag-format="semantics/1.0">


        <!-- Test Cases

        Grammar will support the following inputs:

            Scenario 1:
                Input: My SIM number is A B C 1 2 3 4
                Output: ABC1234

            Scenario 2:
                Input: My SIM number is 1 2 3 4 A B C
                Output: 1234ABC

            Scenario 3:
                Input: My SIM number is 1 2 3 4 A B C 1
                Output: 123ABC1
        -->

        <rule id="main" scope="public">
            <tag>out=""</tag>
            <item><ruleref uri="#alphanumeric"/><tag>out += rules.alphanumeric.alphanum;</tag></item>
            <item repeat="0-1"><ruleref uri="#alphabets"/><tag>out += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out += rules.digits.numbers</tag></item>
        </rule>

        <rule id="text">
            <item repeat="0-1"><ruleref uri="#hesitation"/></item>
            <one-of>
                <item repeat="0-1">My SIM number is</item>
                <item repeat="0-1">SIM number is</item>
            </one-of>
        </rule>

        <rule id="hesitation">
          <one-of>
             <item>Hmm</item>
             <item>Mmm</item>
             <item>My</item>
          </one-of>
        </rule>

        <rule id="alphanumeric" scope="public">
            <tag>out.alphanum=""</tag>
            <item><ruleref uri="#alphabets"/><tag>out.alphanum += rules.alphabets.letters;</tag></item>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.alphanum += rules.digits.numbers</tag></item>
        </rule>

        <rule id="alphabets">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.letters=""</tag>
            <tag>out.firstOccurence=""</tag>
            <item repeat="0-1"><ruleref uri="#digits"/><tag>out.firstOccurence += rules.digits.numbers; out.letters += out.firstOccurence;</tag></item>
            <item repeat="1-">
                <one-of>
                    <item>A<tag>out.letters+='A';</tag></item>
                    <item>B<tag>out.letters+='B';</tag></item>
                    <item>C<tag>out.letters+='C';</tag></item>
                    <item>D<tag>out.letters+='D';</tag></item>
                    <item>E<tag>out.letters+='E';</tag></item>
                    <item>F<tag>out.letters+='F';</tag></item>
                    <item>G<tag>out.letters+='G';</tag></item>
                    <item>H<tag>out.letters+='H';</tag></item>
                    <item>I<tag>out.letters+='I';</tag></item>
                    <item>J<tag>out.letters+='J';</tag></item>
                    <item>K<tag>out.letters+='K';</tag></item>
                    <item>L<tag>out.letters+='L';</tag></item>
                    <item>M<tag>out.letters+='M';</tag></item>
                    <item>N<tag>out.letters+='N';</tag></item>
                    <item>O<tag>out.letters+='O';</tag></item>
                    <item>P<tag>out.letters+='P';</tag></item>
                    <item>Q<tag>out.letters+='Q';</tag></item>
                    <item>R<tag>out.letters+='R';</tag></item>
                    <item>S<tag>out.letters+='S';</tag></item>
                    <item>T<tag>out.letters+='T';</tag></item>
                    <item>U<tag>out.letters+='U';</tag></item>
                    <item>V<tag>out.letters+='V';</tag></item>
                    <item>W<tag>out.letters+='W';</tag></item>
                    <item>X<tag>out.letters+='X';</tag></item>
                    <item>Y<tag>out.letters+='Y';</tag></item>
                    <item>Z<tag>out.letters+='Z';</tag></item>
                </one-of>
            </item>
        </rule>

        <rule id="digits">
            <item repeat="0-1"><ruleref uri="#text"/></item>
            <tag>out.numbers=""</tag>
            <item repeat="1-10">
                <one-of>
                    <item>0<tag>out.numbers+=0;</tag></item>
                    <item>1<tag>out.numbers+=1;</tag></item>
                    <item>2<tag>out.numbers+=2;</tag></item>
                    <item>3<tag>out.numbers+=3;</tag></item>
                    <item>4<tag>out.numbers+=4;</tag></item>
                    <item>5<tag>out.numbers+=5;</tag></item>
                    <item>6<tag>out.numbers+=6;</tag></item>
                    <item>7<tag>out.numbers+=7;</tag></item>
                    <item>8<tag>out.numbers+=8;</tag></item>
                    <item>9<tag>out.numbers+=9;</tag></item>
                </one-of>
            </item>
        </rule>
</grammar>

Using the industry grammar for telecom

To create the sample bot and add the grammars, perform the following steps. This creates an Amazon Lex bot called TelecomBot and adds the grammars for telecom:

  1. Download the Amazon Lex bot definition.
  2. On the Amazon Lex console, choose Actions, then choose Import.
  3. Choose the TelecomBot.zip file that you downloaded, and choose Import.
  4. Copy the grammar XML files for the telecom domain, listed in the preceding section.
  5. On the Amazon S3 console, upload the XML files.
  6. Navigate to the slot types on the Amazon Lex console and select phoneNumber so you can associate the tel_phoneNumber.grxml grammar.
  7. In the slot type, enter the Amazon S3 link for the XML file and the object key.
  8. Choose Save slot type.

The IAM role used to create the bot must have permission to read files from the S3 bucket.

  9. Repeat steps 6–8 for the slot types SIM number (tel_simNumber.grxml) and zipcode (tel_usZipcode.grxml).
  10. After you save the grammars, choose Build.
  11. Download the telecom contact flow to integrate with the Amazon Lex bot.
  12. On the Amazon Connect console, choose Contact flows.
  13. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flows.
  14. Select the contact flow to load it into the application.
  15. Test the IVR flow by calling in to the phone number.

Test the solution

You can call in to the Amazon Connect phone number and interact with the bot. You can also test the solution directly on the Amazon Lex V2 console using voice or text.
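If you want to script these checks, the Amazon Lex V2 runtime API can send test utterances directly to the bot. The following is a minimal sketch using RecognizeText; the bot ID and session ID are placeholders, and TSTALIASID refers to the built-in test alias for the draft bot version.

import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Placeholder IDs (assumptions): replace with your bot ID and a unique session ID.
response = lex_runtime.recognize_text(
    botId="EXAMPLEBOTID",
    botAliasId="TSTALIASID",
    localeId="en_US",
    sessionId="test-session-1",
    text="My SIM number is 1 2 3 4 A B C",
)

# Print the bot's reply and any slot values resolved by the grammars.
for message in response.get("messages", []):
    print("Bot:", message["content"])

for name, slot in response["sessionState"]["intent"]["slots"].items():
    if slot:
        print(name, "=", slot["value"]["interpretedValue"])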

Conclusion

Industry grammars provide a set of pre-built XML files that you can use to quickly create IVR flows. You can select grammars to enable customer service conversations for use cases across financial services, insurance, and telecom. The grammars are available as a grammar slot type and can be used in an Amazon Lex bot configuration. You can download the grammars and enable these via the Amazon Lex V2 console or SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to Using a custom grammar slot type.


About the Authors

John Heater has over 15 years of experience in AI and automation. As the SVP of the Contact Center Practice at NeuraFlash, he leads the implementation of the latest AI and automation techniques for a portfolio of products and customer solutions.

Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.

Easily migrate your IVR flows to Amazon Lex using the IVR migration tool

This post was co-written by John Heater, SVP of the Contact Center Practice at NeuraFlash. NeuraFlash is an Advanced AWS Partner with over 40 collective years of experience in the voice and automation space. With a dedicated team of conversation designers, data engineers, and AWS developers, NeuraFlash helps customers take advantage of the power of Amazon Lex in their contact centers.

Amazon Lex provides automatic speech recognition and natural language understanding technologies so you can build sophisticated conversational experiences and create effective interactive voice response (IVR) flows. A native integration with Amazon Connect, AWS’s cloud-based contact center, enables the addition of a conversational interface to any call center application. You can design IVR experiences to identify user requests and fulfill these by running the appropriate business logic.

Today, NeuraFlash, an AWS APN partner, launched a migration tool on AWS Marketplace that helps you easily migrate your VoiceXML (VXML) IVR flows to Amazon Lex and Amazon Connect. The migration tool takes the VXML configuration and grammar XML files as input and provides an Amazon Lex bot definition. It also supports grammars and Amazon Connect contact flows so you can quickly get started with your IVR conversational experiences.

In this post, we cover the use of the IVR migration tool and review the resulting Amazon Lex bot definition and Amazon Connect contact flows.

Sample conversation overview

You can use the sample VXML and grammar files as input to try out the tool. The sample IVR supports the following conversation:

IVR: Welcome to ACME bank. For verification, can I get the last four of the SSN on the account?

Caller: Yes, it’s 1234.

IVR: Great. And the date of birth for the primary holder?

Caller: Jan 1st 2000.

IVR: Thank you. How can I help you today?

Caller: I’d like to make a payment.

IVR: Sure. What’s the credit card number?

Caller: 1234 5678 1234 5678

IVR: Got it. What’s the CVV?

Caller: 123

IVR: How about the expiration date?

Caller: Jan 2025.

IVR: Great. How much are we paying today?

Caller: $100

IVR: Thank you. Your payment of $100 on card ending in 5678 is processed. Anything else we can help you with?

Caller: No thanks.

IVR: Have a great day.

Migration tool overview

The following diagram illustrates the architecture of the migration tool.

You can access the migration tool in the AWS Marketplace. Follow the instructions to upload your VXML and grammar XML files.

The tool processes the input XML files to create an IVR flow. You can download the Amazon Connect contact flow, Amazon Lex bot definition, and supporting grammar files.

Migration methodology

The IVR migration tool analyzes the uploaded IVR application and generates an Amazon Lex bot, Amazon Connect flows, and SRGS grammar files. One bot is generated per VXML application (or VXML file). Each input state in the VXML file is mapped to a dialog prompt in the Amazon Lex bot. The corresponding grammar file for the input state is used to create a grammar slot. For the Amazon Connect flow, each VXML file maps to a node in the IVR flow. Within the flow, a Get customer input block hands off control to Amazon Lex to manage the dialog.
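To make this mapping concrete, the sketch below walks a VXML file with Python's standard library and collects, for each form field, the slot name, the referenced SRGS grammar, and the initial prompt text, which is roughly the information needed to produce a grammar slot and its elicitation prompt. It is an illustration of the methodology under these assumptions, not the tool's actual implementation.

import xml.etree.ElementTree as ET

def extract_slots(vxml_path):
    # Illustrative only: map each VXML <form>/<field> to a (slot, grammar, prompt) record.
    tree = ET.parse(vxml_path)
    slots = []
    for form in tree.getroot().iter("form"):
        for field in form.iter("field"):
            grammar = field.find("grammar")
            prompt_audio = field.find("./prompt/audio")
            slots.append({
                "formId": form.get("id"),      # e.g., Verify_SSN
                "slotName": field.get("name"),
                "grammarFile": grammar.get("srcexpr") if grammar is not None else None,
                "elicitationPrompt": (prompt_audio.text or "").strip()
                    if prompt_audio is not None else None,
            })
    return slots

# "verify_account.vxml" is a placeholder path for the sample file shown below.
for slot in extract_slots("verify_account.vxml"):
    print(slot)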

Let’s consider the following VXML content in the sample dialog for user verification. You can download the VerifyAccount VXML file.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Verify user with SSN ***-->
<form id="Verify_SSN">
  <field name="Verify_SSN">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/last4ssn.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_SSN/Init.wav'">
                To verify your account, can I please have the last four digits of your social security number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/Verify_SSN/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me the last four digits of your social security number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/Verify_SSN/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the last four digits of your social security number.  You can also say I don't know if you do not have it.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/Verify_SSN/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me the last four digits of your social security number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/Verify_SSN/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the last four digits of your social security number.  You can also say I don't know if you do not have it.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_SSN.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="Verify_SSN.option == 'dunno'" />
                <assign name="transfer_reason" expr="'no_ssn'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="last4_ssn" expr="Verify_SSN.option"/>
                <goto next="#Verify_DOB"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Verify user with date of birth ***-->
<form id="Verify_DOB">
  <field name="Verify_DOB">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/dateofbirth.grxml'"/>
      <prompt>
            <audio expr="'./prompts/Verify_DOB/Init.wav'">
                Thank you.  And can I also have your date of birth?
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/Verify_DOB/nm1.wav'">
         I'm sorry, I didn't understand. Please say your date of birth.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/Verify_DOB/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/Verify_DOB/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say your date of birth.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/Verify_DOB/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your date of birth.  For example, you can say July twenty fifth nineteen eighty or enter zero seven two five one nine eight zero.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="Verify_DOB.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="date_of_birth" expr="Verify_DOB.option"/>
                <goto next="validate_authentication.vxml"/>
            </if>
        </filled>
    </field>
</form>


</vxml>

In addition to the preceding VXML file, we include the corresponding SRGS grammars from the IVR application in the IVR migration tool.

An Amazon Lex bot is created to verify the caller. The Verification bot has one intent (VerifyAccount).

The bot has two slots (SSN, DOB) that reference the grammar files for the SSN and date of birth grammars, respectively. You can download the last4SSN.grxml and dateOfBirth.grxml grammar files as output to create the custom slot types in Amazon Lex.

In another example of a payment flow, the IVR migration tool reads in the payment collection flows to generate an Amazon Lex bot that can handle payments. You can download the corresponding Payment VXML file and SRGS grammars.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="1.0" application="app_root.vxml">


<!--*** Collect the users credit card for payment ***-->
<form id="CreditCard_Collection">
  <field name="CreditCard_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CreditCard_Collection/Init.wav'">
                To start your payment, can I please have your credit card number.
            </audio>
        </prompt>
<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/CreditCard_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please tell me your credit card number.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/CreditCard_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card number.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/CreditCard_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please tell me your credit card number.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/CreditCard_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card number.
  </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
                <assign name="creditcard_number" expr="CreditCard_Collection.option"/>
                <goto next="#ExpirationDate_Collection"/>
        </filled>
    </field>
</form>

<!--*** Collect the credit card expiration date ***-->
<form id="ExpirationDate_Collection">
  <field name="ExpirationDate_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_expiration.grxml'"/>
      <prompt>
            <audio expr="'./prompts/ExpirationDate_Collection/Init.wav'">
                Thank you.  Now please provide your credit card expiration date.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/ExpirationDate_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the expiration date.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/ExpirationDate_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your credit card expiration date.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/ExpirationDate_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the expiration date.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/ExpirationDate_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your credit card expiration date.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="ExpirationDate_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_expiration" expr="ExpirationDate_Collection.option"/>
                <goto next="#CVV_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the credit card CVV number ***-->
<form id="CVV_Collection">
  <field name="CVV_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/creditcard_cvv.grxml'"/>
      <prompt>
            <audio expr="'./prompts/CVV_Collection/Init.wav'">
                Almost done.  Now please tell me the CVV code.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/CVV_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the CVV on the credit card.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/CVV_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter the credit card CVV.  It can be found on the back of the card.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/CVV_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the CVV on the credit card.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/CVV_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter the credit card CVV.  It can be found on the back of the card.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="CVV_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <else/>
                <assign name="creditcard_cvv" expr="CVV_Collection.option"/>
                <goto next="#PaymentAmount_Collection"/>
            </if>
        </filled>
    </field>
</form>

<!--*** Collect the payment amount ***-->
<form id="PaymentAmount_Collection">
  <field name="PaymentAmount_Collection">
      <grammar type="application/srgs+xml" srcexpr="'./grammar/amount.grxml'"/>
      <prompt>
            <audio expr="'./prompts/PaymentAmount_Collection/Init.wav'">
                Finally, please tell me how much you will be paying.  You can also say full amount.
            </audio>
        </prompt>

<!--*** Handling when the user input is not understood ***-->
<nomatch count="1">
  <audio expr="'./prompts/PaymentAmount_Collection/nm1.wav'">
         I'm sorry, I didn't understand. Please say the amount of your payment.
    </audio>
</nomatch>
<nomatch count="2">
    <audio expr="'./prompts/PaymentAmount_Collection/nm2.wav'">
        I'm sorry, I still didn't get that.  Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
    </audio>
</nomatch>

<!--*** Handling when the user does not provide input ***-->

<noinput count="1">
    <audio expr="'./prompts/PaymentAmount_Collection/ni1.wav'">
         I'm sorry, I couldn't hear you.  Please say the amount of your payment.
    </audio>
</noinput>
<noinput count="2">
  <audio expr="'./prompts/PaymentAmount_Collection/ni1.wav'">
       I'm sorry, I still could not hear you. Please say or enter your payment amount.  If you will be paying in full you can just say full amount.
   </audio>
</noinput>

<!--*** Handling when the user input is recognized ***-->
        <filled>
            <if cond="PaymentAmount_Collection.option == 'agent'">
                <assign name="transfer_reason" expr="'agent'"/>
                <goto next="transfer.vxml"/>
            <elseif cond="PaymentAmount_Collection.option == 'full_amount'" />
                <assign name="creditcard_amount" expr="'full'"/>
                <goto next="processpayment.vxml"/>
            <else/>
                <assign name="creditcard_amount" expr="PaymentAmount_Collection.option"/>
                <goto next="processpayment.vxml"/>
            </if>
        </filled>
    </field>
</form>

</vxml>

In addition to the preceding VXML file, we provide the SRGS grammars from the IVR application as input to the IVR migration tool.

An Amazon Lex bot is created to collect the payment details. The Payment bot has one intent (MakePayment).

The bot has four slots (credit card number, expiration date, CVV, payment amount) that reference the grammar files. You can download the creditCard.grxml, creditCardExpiration.grxml, creditCardCVV.grxml, and paymentAmount.grxml grammar files as output to create the custom slot types in Amazon Lex.
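As a rough sketch of how those grammar files map onto Amazon Lex, the following Python snippet shows how a grammar-backed custom slot type could be created with the Lex V2 model-building API via boto3. The bot ID, S3 bucket, and object key are placeholders, and the exact shape of externalSourceSetting should be checked against the current SDK documentation.

import boto3

# Hypothetical sketch: create a custom slot type backed by one of the exported
# SRGS grammar files. Bot ID, bucket, and key names are placeholders.
lex = boto3.client("lexv2-models")

response = lex.create_slot_type(
    slotTypeName="CreditCardCVV",
    description="CVV collection backed by an SRGS grammar",
    botId="EXAMPLEBOTID",                                 # placeholder bot ID
    botVersion="DRAFT",
    localeId="en_US",
    externalSourceSetting={
        "grammarSlotTypeSetting": {
            "source": {
                "s3BucketName": "my-ivr-grammars",        # placeholder bucket
                "s3ObjectKey": "creditCardCVV.grxml",     # exported grammar file
            }
        }
    },
)
print(response["slotTypeId"])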

Lastly, the migration tool provides the payment IVR contact flow to manage the end-to-end conversation.

Conclusion

Amazon Lex enables you to easily build sophisticated, natural language conversational experiences. The IVR migration tool allows you to easily migrate your VXML IVR flows to Amazon Lex. The tool provides the bot definitions and grammars in addition to the Amazon Connect contact flows. It enables you to migrate your IVR flows as is and get started on Amazon Lex, giving you the flexibility to build out the conversational experience at your own pace.

Use the migration tool on AWS Marketplace and migrate your IVR to Amazon Lex today.


About the Authors

John Heater has over 15 years of experience in AI and automation. As the SVP of the Contact Center Practice at NeuraFlash, he leads the implementation of the latest AI and automation techniques for a portfolio of products and customer solutions.

Sandeep Srinivasan is a Product Manager on the Amazon Lex team. As a keen observer of human behavior, he is passionate about customer experience. He spends his waking hours at the intersection of people, technology, and the future.

Read More

At GTC: NVIDIA RTX Professional Laptop GPUs Debut, New NVIDIA Studio Laptops, a Massive Omniverse Upgrade and NVIDIA Canvas Update

Digital artists and creative professionals have plenty to be excited about at NVIDIA GTC.

Impressive NVIDIA Studio laptop offerings from ASUS and MSI launch with upgraded RTX GPUs, providing more options for professional content creators to elevate and expand creative possibilities.

NVIDIA Omniverse gets a significant upgrade, including updates to the Omniverse Create, Machinima and Showroom apps, with a View release coming soon. A new Unreal Engine Omniverse Connector beta is out now, with our Adobe Substance 3D Painter Connector close behind.

Omniverse artists can now use Pixar HDStorm, Chaos V-Ray, Maxon Redshift and OTOY Octane Hydra render delegates within the viewport of all Omniverse apps, bringing more freedom and choice to 3D creative workflows, with Blender Cycles coming soon. Read our Omniverse blog for more details.

NVIDIA Canvas, the beta app sensation using advanced AI to quickly turn simple brushstrokes into realistic landscape images, has received a stylish update.

The March Studio Driver, available for download today, optimizes the latest creative app updates, featuring Blender Cycles 3.1, all with the stability and reliability NVIDIA Studio delivers.

To celebrate, NVIDIA is kicking off the #MadeInMachinima contest. Artists can remix iconic characters from Squad, Mount & Blade II: Bannerlord and Mechwarrior 5 into a cinematic short in Omniverse Machinima to win NVIDIA Studio laptops. The submission window opens on March 29 and runs through June 27. Visit the contest landing page for details.

New NVIDIA RTX Laptop GPUs Unlock Endless Creative Possibilities

Professionals on the go have powerful new laptop GPUs to choose from, with faster speeds and larger memory options: RTX A5500, RTX A4500, RTX A3000 12GB, RTX A2000 8GB and NVIDIA RTX A1000. These GPUs incorporate the latest RTX and Max-Q technology, are available in thin and light laptops, and deliver extraordinary performance.

New NVIDIA RTX laptop GPUs tackle creative workflows, enabling creation from anywhere.

Our new flagship laptop GPU, the NVIDIA RTX A5500 with 16GB of memory, is capable of handling the most challenging 3D and video workloads, with up to double the rendering performance of the previous-generation RTX 5000.

The most complex, advanced, creative workflows have met their match.

NVIDIA Studio Laptop Drop

Three extraordinary Studio laptops are available for purchase today.

The ASUS ProArt Studiobook 16 is capable of incredible performance and is configurable with a wide range of professional and consumer GPUs. It’s rich with creative features: a certified color-accurate 16-inch 120 Hz 3.2K OLED wide-view 16:10 display, a three-button touchpad for 3D designers, an ASUS dial for video editing and an enlarged touchpad for stylus support.

MSI’s Creator Z16P and Z17 sport an elegant and minimalist design, featuring up to an NVIDIA RTX 3080 Ti or RTX A5500 GPU, and boast a factory-calibrated True Pixel display with QHD+ resolution and 100 percent DCI-P3 color.

NVIDIA Studio laptops are tested and validated for maximum performance and reliability. They feature the latest NVIDIA technologies that deliver real-time ray tracing, AI-enhanced features and time-saving rendering capabilities. These laptops have access to the exclusive Studio suite of software — including best-in-class Studio Drivers, NVIDIA Omniverse, Canvas, Broadcast and more.

In the weeks ahead, ASUS and GIGABYTE will make it even easier for new laptop owners to enjoy one of the Studio benefits. Upgraded livestreams, voice chats and video calls — powered by AI — will be available immediately with the NVIDIA Broadcast app preinstalled in their ProArt and AERO product lines.

To Omniverse and Beyond

New Omniverse Connections are expanding the ecosystem and are now available in beta: Unreal Engine 5 Omniverse Connector and the Adobe Substance 3D Material Extension, with the Adobe Substance 3D Painter Omniverse Connector very close behind, allowing users to enjoy seamless, live-edit texture and material workflows.

Maxon’s Cinema4D now supports USD and is compatible with OmniDrive, unlocking Omniverse workflows for visualization specialists.

Artists can now use Pixar HDStorm, Chaos V-Ray, Maxon Redshift and OTOY Octane renderers within the viewport of all Omniverse apps, with Blender Cycles coming soon. Whether refining 3D scenes or exporting final projects, artists can switch between the lightning-fast Omniverse RTX Renderer and their preferred renderer’s advantageous features.

The Junk Shop by Alex Treviño. Original Concept by Anaïs Maamar. Note Hydra render delegates displayed in the renderer toggle menu.

CAD designers can now directly import 26 popular CAD formats into Omniverse USD scenes.

The integration of NVIDIA Maxine’s body pose estimation feature in the Omniverse Machinima app gives users the ability to track and capture motion in real time using a single camera — without requiring a MoCap suit — with live conversion from a 2D camera capture to a 3D model.

Read more about Omniverse for content creators here.

And if you haven’t downloaded Omniverse, now’s the time.

Your Canvas, Never Out of Style

Styles in Canvas — preset filters that modify the look and feel of a painting — now come in up to 10 different variations.

More style variations enhance artist creativity while providing additional options within the theme of their selected style.

Check out the style variations and, if you haven’t already, download Canvas, which is free for RTX owners.

3D Creative App Updates Backed by March NVIDIA Studio Driver

In addition to supporting the latest updates for NVIDIA Omniverse and NVIDIA Canvas, the March Studio Driver also supports a host of other recent creative app and renderer updates.

The highly anticipated Blender 3.1 update adds USD preview surface material export support, making it easier to move assets between USD-supported apps, including Omniverse.

Blender artists equipped with NVIDIA RTX GPUs maintain performance advantages over Macs. Midrange GeForce RTX 3060 Studio laptops deliver 3.5x faster rendering than the fastest M1 Max MacBooks, per Blender’s benchmark testing.

Performance testing conducted by NVIDIA in March 2022 with Intel Core i9-12900HK, 32GB RAM and MacBook Pro 16 with M1 Max, 32GB RAM. NVIDIA Driver 511.97.

Luxion KeyShot 11 brings several updates: GPU-accelerated 3D paint features, physical simulation using NVIDIA PhysX, and NVIDIA OptiX shader enhancements, speeding up animation workflows by up to 3x.

GPU Audio Inc., with an eye on the future, taps into parallel processing power for audio solutions, introducing an NVIDIA GPU-based VST filter to remove extreme frequencies and improve sound quality — an audio production game changer.

Download the March Studio Driver today.

On-Demand Sessions for Creators

Join the first GTC breakout session dedicated to the creative community.

“NVIDIA Studio and Omniverse for the Next Era of Creativity” will include artists and directors from NVIDIA’s creative team. Network with fellow 3D artists and get Omniverse feature support to enhance 3D workflows. Join this free session on Wednesday, March 23, from 7-8 a.m. Pacific.

It’s just one of many Omniverse sessions available to watch live or on demand, including the featured sessions below:

Themed GTC sessions and demos covering visual effects, virtual production and rendering, AI art galleries, and building and infrastructure design are also available to help realize your creative ambition.

Real or rendered?

Also this week, game artists, producers, developers and designers are coming together for the annual Game Developers Conference where NVIDIA launched Omniverse for Developers, providing a more collaborative environment for the creation of virtual worlds.

At GDC, NVIDIA sessions to assist content creators in the gaming industry will feature virtual worlds and AI, real-time ray tracing, and developer tools. Check out the complete list.

To boost your creativity throughout the year, follow NVIDIA Studio on Facebook, Twitter and Instagram. There you’ll find the latest information on creative app updates, new Studio apps, creator contests and more. Get updates directly to your inbox by subscribing to the Studio newsletter.

The post At GTC: NVIDIA RTX Professional Laptop GPUs Debut, New NVIDIA Studio Laptops, a Massive Omniverse Upgrade and NVIDIA Canvas Update appeared first on NVIDIA Blog.

Read More


Microsoft Translator enhanced with Z-code Mixture of Experts models

Z-code multilingual model representation diagram

Translator, a Microsoft Azure Cognitive Service, is adopting Z-code Mixture of Experts models, a breakthrough AI technology that significantly improves the quality of production translation models. As a component of Microsoft’s larger XYZ-code initiative to combine AI models for text, vision, audio, and language, Z-code supports the creation of AI systems that can speak, see, hear, and understand. This effort is a part of Azure AI and Project Turing, focusing on building multilingual, large-scale language models that support various production teams. Translator is using NVIDIA GPUs and Triton Inference Server to deploy and scale these models efficiently for high-performance inference. Translator is the first machine translation provider to introduce this technology live for customers.

Z-code MoE boosts efficiency and quality

Z-code models utilize a new architecture called Mixture of Experts (MoE), where different parts of the models can learn different tasks. The models learn to translate between multiple languages at the same time. The Z-code MoE model utilizes more parameters while dynamically selecting which parameters to use for a given input. This enables the model to specialize a subset of the parameters (experts) during training. At runtime, the model uses the relevant experts for the task, which is more computationally efficient than utilizing all of the model’s parameters.
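To make the expert-routing idea concrete, here is a minimal, illustrative PyTorch sketch of an MoE layer with top-k gating. It is not the production Z-code implementation; the dimensions, number of experts, and top-k value are arbitrary.

# Conceptual sketch of a Mixture of Experts (MoE) layer with top-k gating.
# It illustrates activating only a subset of parameters per token; it is not
# the production Z-code implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)           # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)              # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(TinyMoE()(tokens).shape)  # torch.Size([16, 512])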

animated graphic showing Z-code MoE model translating from English to French
Figure 1: Z-code MoE model translating from English to French. The model dynamically selects subsets of its parameters to be utilized for each input.

Newly introduced Z-code MoE models leverage transfer learning, which enables efficient knowledge sharing across similar languages. Moreover, the models utilize both parallel and monolingual data during the training process. This opens the way to high quality machine translation beyond the high-resource languages and improves the quality of low-resource languages that lack significant training data. This approach can provide a positive impact on AI fairness, since both high-resource and low-resource languages see improvements.

We have trained translation systems for research purposes with 200 billion parameters supporting 100 language pairs. Though such large systems significantly improved translation quality, they also introduced challenges for deploying them cost effectively in a production environment. For our production deployment, we opted to train a set of 5-billion-parameter models, which are 80 times larger than our currently deployed models. We trained one multilingual model per set of languages, where each model can serve up to 20 language pairs and therefore replace up to 20 of the current systems. This enabled our models to maximize transfer learning among languages while remaining deployable at an effective runtime cost. We compared the quality improvements of the new MoE models to the current production systems using human evaluation. The figure below shows the results on various language pairs. The Z-code MoE systems outperformed individual bilingual systems, with average improvements of 4%. For instance, the models improved English to French translations by 3.2 percent, English to Turkish by 5.8 percent, Japanese to English by 7.6 percent, English to Arabic by 9.3 percent, and English to Slovenian by 15 percent.

graphic showing quality gains of Z-code MoE models over existing models. Languages are ordered by training data sizes.
Figure 2: Quality gains of Z-code MoE models over existing models. Languages are ordered by training data sizes.

Training large models with billions of parameters is challenging. The Translator team collaborated with Microsoft DeepSpeed to develop a high-performance system that helped train massive scale Z-code MoE models, enabling us to efficiently scale and deploy Z-code models for translation.

We partnered with NVIDIA to build optimized runtime engines for deploying the new Z-code/MoE models on GPUs. NVIDIA developed custom CUDA kernels and leveraged the CUTLASS and FasterTransformer libraries to efficiently implement MoE layers on a single V100 GPU. This implementation achieved up to 27x throughput improvements over standard GPU (PyTorch) runtimes. We used NVIDIA’s open-source Triton Inference Server to serve Z-code MoE models, and used Triton’s dynamic batching feature to pool several requests into a larger batch for higher throughput, which enabled us to ship large models with relatively low runtime costs.

How can you use the new Z-code models?

Z-code models are available now by invitation to customers using Document Translation, a feature that translates entire documents, or volumes of documents, in a variety of file formats while preserving their original formatting. Z-code models will be made available to all customers and to other Translator products in phases. Please fill out this form to request access to Document Translation using Z-code models.

Learn more

Acknowledgements

The following people contributed to this work: Abdelrahman Abouelenin, Ahmed Salah, Akiko Eriguchi, Alex Cheng, Alex Muzio, Amr Hendy, Arul Menezes, Brad Ballinger, Christophe Poulain, Evram Narouz, Fai Sigalov, Hany Hassan Awadalla, Hitokazu Matsushita, Mohamed Afify, Raffy Bekhit, Rohit Jain, Steven Nguyen, Vikas Raunak, Vishal Chowdhary, and Young Jin Kim.

The post Microsoft Translator enhanced with Z-code Mixture of Experts models appeared first on Microsoft Research.

Read More


Powering the next generation of trustworthy AI in a confidential cloud using NVIDIA GPUs

Animation showing the process of how encrypted data is transferred between the GPU drive and the GPU through a secure channel. The GPU driver on the host CPU and the SEC2 microcontroller on the NVIDIA A100 Tensor Core GPU work together to achieve end-to-end encryption of data transfers

Cloud computing is powering a new age of data and AI by democratizing access to scalable compute, storage, and networking infrastructure and services. Thanks to the cloud, organizations can now collect data at an unprecedented scale and use it to train complex models and generate insights.  

While this increasing demand for data has unlocked new possibilities, it also raises concerns about privacy and security, especially in regulated industries such as government, finance, and healthcare. One area where data privacy is crucial is patient records, which are used to train models to aid clinicians in diagnosis. Another example is in banking, where models that evaluate borrower creditworthiness are built from increasingly rich datasets, such as bank statements, tax returns, and even social media profiles. This data contains very personal information, and to ensure that it’s kept private, governments and regulatory bodies are implementing strong privacy laws and regulations to govern the use and sharing of data for AI, such as the General Data Protection Regulation (GDPR) and the proposed EU AI Act. You can learn more about some of the industries where it’s imperative to protect sensitive data in this Microsoft Azure Blog post.

Commitment to a confidential cloud

Microsoft recognizes that trustworthy AI requires a trustworthy cloud—one in which security, privacy, and transparency are built into its core. A key component of this vision is confidential computing—a set of hardware and software capabilities that give data owners technical and verifiable control over how their data is shared and used. Confidential computing relies on a new hardware abstraction called trusted execution environments (TEEs). In TEEs, data remains encrypted not just at rest or during transit, but also during use. TEEs also support remote attestation, which enables data owners to remotely verify the configuration of the hardware and firmware supporting a TEE and grant specific algorithms access to their data.  

At Microsoft, we are committed to providing a confidential cloud, where confidential computing is the default for all cloud services. Today, Azure offers a rich confidential computing platform comprising different kinds of confidential computing hardware (Intel SGX, AMD SEV-SNP), core confidential computing services like Azure Attestation and Azure Key Vault managed HSM, and application-level services such as Azure SQL Always Encrypted, Azure confidential ledger, and confidential containers on Azure. However, these offerings are limited to using CPUs. This poses a challenge for AI workloads, which rely heavily on AI accelerators like GPUs to provide the performance needed to process large amounts of data and train complex models.  

The Confidential Computing group at Microsoft Research identified this problem and defined a vision for confidential AI powered by confidential GPUs, proposed in two papers, “Oblivious Multi-Party Machine Learning on Trusted Processors” and “Graviton: Trusted Execution Environments on GPUs.” In this post, we share this vision. We also take a deep dive into the NVIDIA GPU technology that’s helping us realize this vision, and we discuss the collaboration among NVIDIA, Microsoft Research, and Azure that enabled NVIDIA GPUs to become a part of the Azure confidential computing ecosystem.

Vision for confidential GPUs

Today, CPUs from companies like Intel and AMD allow the creation of TEEs, which can isolate a process or an entire guest virtual machine (VM), effectively eliminating the host operating system and the hypervisor from the trust boundary. Our vision is to extend this trust boundary to GPUs, allowing code running in the CPU TEE to securely offload computation and data to GPUs.  

Diagram showing the trust boundary extended from the host trusted execution environment of the CPU to the trusted execution environment of the GPU through a secure channel.
Figure 1: Vision for confidential computing with NVIDIA GPUs.

Unfortunately, extending the trust boundary is not straightforward. On the one hand, we must protect against a variety of attacks, such as man-in-the-middle attacks, where the attacker can observe or tamper with traffic on the PCIe bus or on an NVIDIA NVLink connecting multiple GPUs, as well as impersonation attacks, where the host assigns the guest VM an incorrectly configured GPU, a GPU running outdated or malicious firmware, or one without confidential computing support. At the same time, we must ensure that the Azure host operating system has enough control over the GPU to perform administrative tasks. Furthermore, the added protection must not introduce large performance overheads, increase thermal design power, or require significant changes to the GPU microarchitecture.  

Our research shows that this vision can be realized by extending the GPU with the following capabilities:

  • A new mode where all sensitive state on the GPU, including GPU memory, is isolated from the host
  • A hardware root-of-trust on the GPU chip that can generate verifiable attestations capturing all security sensitive state of the GPU, including all firmware and microcode 
  • Extensions to the GPU driver to verify GPU attestations, set up a secure communication channel with the GPU, and transparently encrypt all communications between the CPU and GPU 
  • Hardware support to transparently encrypt all GPU-GPU communications over NVLink  
  • Support in the guest operating system and hypervisor to securely attach GPUs to a CPU TEE, even if the contents of the CPU TEE are encrypted

Confidential computing with NVIDIA A100 Tensor Core GPUs

NVIDIA and Azure have taken a significant step toward realizing this vision with a new feature called Ampere Protected Memory (APM) in the NVIDIA A100 Tensor Core GPUs. In this section, we describe how APM supports confidential computing within the A100 GPU to achieve end-to-end data confidentiality.  

APM introduces a new confidential mode of execution in the A100 GPU. When the GPU is initialized in this mode, the GPU designates a region in high-bandwidth memory (HBM) as protected and helps prevent leaks through memory-mapped I/O (MMIO) access into this region from the host and peer GPUs. Only authenticated and encrypted traffic is permitted to and from the region.  

In confidential mode, the GPU can be paired with any external entity, such as a TEE on the host CPU. To enable this pairing, the GPU includes a hardware root-of-trust (HRoT). NVIDIA provisions the HRoT with a unique identity and a corresponding certificate created during manufacturing. The HRoT also implements authenticated and measured boot by measuring the firmware of the GPU as well as that of other microcontrollers on the GPU, including a security microcontroller called SEC2. SEC2, in turn, can generate attestation reports that include these measurements and that are signed by a fresh attestation key, which is endorsed by the unique device key. These reports can be used by any external entity to verify that the GPU is in confidential mode and running last known good firmware.  

When the NVIDIA GPU driver in the CPU TEE loads, it checks whether the GPU is in confidential mode. If so, the driver requests an attestation report and checks that the GPU is a genuine NVIDIA GPU running known good firmware. Once confirmed, the driver establishes a secure channel with the SEC2 microcontroller on the GPU using the Security Protocol and Data Model (SPDM)-backed Diffie-Hellman-based key exchange protocol to establish a fresh session key. When that exchange completes, both the GPU driver and SEC2 hold the same symmetric session key.  

The GPU driver uses the shared session key to encrypt all subsequent data transfers to and from the GPU. Because pages allocated to the CPU TEE are encrypted in memory and not readable by the GPU DMA engines, the GPU driver allocates pages outside the CPU TEE and writes encrypted data to those pages. On the GPU side, the SEC2 microcontroller is responsible for decrypting the encrypted data transferred from the CPU and copying it to the protected region. Once the data is in high bandwidth memory (HBM) in cleartext, the GPU kernels can freely use it for computation.
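The flow above can be pictured with a small, purely conceptual Python sketch: both sides hold the same session key (assumed here to have already been derived via the SPDM-backed key exchange), the driver role encrypts buffers before staging them outside the CPU TEE, and the SEC2 role authenticates and decrypts them. This mirrors the roles only; it is not NVIDIA’s actual implementation.

# Conceptual sketch of the encrypted staging path described above: the GPU
# driver encrypts data with the shared session key before placing it in
# untrusted "bounce" pages, and the SEC2 side decrypts it into protected
# memory. Roles only; not NVIDIA's implementation.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

session_key = AESGCM.generate_key(bit_length=256)   # assume derived via SPDM key exchange
driver_side = AESGCM(session_key)
sec2_side = AESGCM(session_key)

def driver_stage(plaintext: bytes) -> tuple[bytes, bytes]:
    """Driver role: encrypt a buffer before writing it outside the CPU TEE."""
    nonce = os.urandom(12)
    return nonce, driver_side.encrypt(nonce, plaintext, associated_data=None)

def sec2_ingest(nonce: bytes, ciphertext: bytes) -> bytes:
    """SEC2 role: authenticate and decrypt into the protected HBM region."""
    return sec2_side.decrypt(nonce, ciphertext, associated_data=None)

tensor_bytes = b"\x00" * 64                 # stand-in for model weights or activations
nonce, staged = driver_stage(tensor_bytes)
assert sec2_ingest(nonce, staged) == tensor_bytes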

Diagram showing how the GPU driver on the host CPU and the SEC2 microcontroller on the NVIDIA Ampere GPU work together to achieve end-to-end encryption of data transfers.
Figure 2: The GPU driver on the host CPU and the SEC2 microcontroller on the NVIDIA A100 Tensor Core GPU work together to achieve end-to-end encryption of data transfers.

Accelerating innovation with confidential AI

The implementation of APM is an important milestone toward achieving broader adoption of confidential AI in the cloud and beyond. APM is the foundational building block of Azure Confidential GPU VMs, now in private preview. These VMs, designed in collaboration with NVIDIA, Azure, and Microsoft Research, feature up to four A100 GPUs with 80 GB of HBM and APM technology and enable users to host AI workloads on Azure with a new level of security.  

But this is just the beginning. We look forward to taking our collaboration with NVIDIA to the next level with NVIDIA’s Hopper architecture, which will enable customers to protect both the confidentiality and integrity of data and AI models in use. We believe that confidential GPUs can enable a confidential AI platform where multiple organizations can collaborate to train and deploy AI models by pooling together sensitive datasets while remaining in full control of their data and models. Such a platform can unlock the value of large amounts of data while preserving data privacy, giving organizations the opportunity to drive innovation.  

A real-world example involves Bosch Research, the research and advanced engineering division of Bosch, which is developing an AI pipeline to train models for autonomous driving. Much of the data it uses includes personally identifiable information (PII), such as license plate numbers and people’s faces. At the same time, it must comply with GDPR, which requires a legal basis for processing PII, namely, consent from data subjects or legitimate interest. The former is challenging because it is practically impossible to get consent from pedestrians and drivers recorded by test cars. Relying on legitimate interest is challenging too because, among other things, it requires showing that there is no less privacy-intrusive way of achieving the same result. This is where confidential AI shines: using confidential computing can help reduce risks for data subjects and data controllers by limiting exposure of data (for example, to specific algorithms), while enabling organizations to train more accurate models.   

At Microsoft Research, we are committed to working with the confidential computing ecosystem, including collaborators like NVIDIA and Bosch Research, to further strengthen security, enable seamless training and deployment of confidential AI models, and help power the next generation of technology.

About confidential computing at Microsoft Research  

The Confidential Computing team at Microsoft Research Cambridge conducts pioneering research in system design that aims to guarantee strong security and privacy properties to cloud users. We tackle problems around secure hardware design, cryptographic and security protocols, side channel resilience, and memory safety. We are also interested in new technologies and applications that security and privacy can uncover, such as blockchains and multiparty machine learning. Please visit our careers page to learn about opportunities for both researchers and engineers. We’re hiring.

Related GTC Conference sessions

The post Powering the next generation of trustworthy AI in a confidential cloud using NVIDIA GPUs appeared first on Microsoft Research.

Read More

How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS

Amazon Search’s vision is to enable customers to search effortlessly. Our spelling correction helps you find what you want even if you don’t know the exact spelling of the intended words. In the past, we used classical machine learning (ML) algorithms with manual feature engineering for spelling correction. To make the next generational leap in spelling correction performance, we are embracing a number of deep-learning approaches, including sequence-to-sequence models. Deep learning (DL) models are compute-intensive both in training and inference, and these costs have historically made DL models impractical in a production setting at Amazon’s scale. In this post, we present the results of an inference optimization experimentation where we overcome those obstacles and achieve 534% inference speed-up for the popular Hugging Face T5 Transformer.

Challenge

The Text-to-Text Transfer Transformer (T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al.) is the state-of-the-art natural language processing (NLP) model architecture. T5 is a promising architecture for spelling correction that we found to perform well in our experiments. T5 models are easy to research, develop, and train, thanks to open-source deep learning frameworks and ongoing academic and enterprise research.

However, it’s difficult to achieve production-grade, low-latency inference with a T5. For example, a single inference with a PyTorch T5 takes 45 milliseconds on one of the four NVIDIA V100 Tensor Core GPUs equipping an Amazon Elastic Compute Cloud (EC2) p3.8xlarge instance. (All inference numbers reported are for an input of 9 tokens and output of 11 tokens. The latency of T5 architectures is sensitive to both input and output lengths.)
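For readers who want to reproduce a comparable baseline measurement, the following is a minimal sketch using Hugging Face Transformers and PyTorch. The checkpoint, prompt, and exact measurement harness used by Amazon Search are not public, so the model name and text below are illustrative; only the output-length setting matches the setup described above.

# Hedged sketch: timing a single T5 generate() call on GPU with PyTorch and
# Hugging Face Transformers. Model size and prompt are illustrative.
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device).eval()

inputs = tokenizer("fix spelling: recieve the pacakge", return_tensors="pt").to(device)

with torch.no_grad():
    for _ in range(10):                                   # warm up kernels
        model.generate(**inputs, max_new_tokens=11)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=11)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"single inference: {(time.perf_counter() - start) * 1000:.1f} ms")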

Low-latency, cost-efficient T5 inference at scale is a known difficulty that has been reported by several AWS customers beyond Amazon Search, which boosts our motivation to contribute this post. To go from an offline, scientific achievement to a customer-facing production service, Amazon Search faces the following challenges:

  • Latency – How to realize T5 inference in less than 50-millisecond P99 latency
  • Throughput – How to handle large-scale concurrent inference requests
  • Cost efficiency – How to keep costs under control

In the rest of this post, we explain how the NVIDIA inference optimization stack—namely the NVIDIA TensorRT compiler and the open source NVIDIA Triton Inference Server—solves those challenges. Read NVIDIA’s press release to learn about the updates.

NVIDIA TensorRT: Reducing costs and latency with inference optimization

Deep learning frameworks are convenient for iterating quickly on the science, and they come with numerous functionalities for scientific modeling, data loading, and training optimization. However, most of those tools are suboptimal for inference, which only requires a minimal set of operators for matrix multiplication and activation functions. Therefore, significant gains can be realized by using a specialized, prediction-only application instead of running inference in the deep learning development framework.

NVIDIA TensorRT is an SDK for high-performance deep learning inference. TensorRT delivers both an optimized runtime, using low-level optimized kernels available on NVIDIA GPUs, and an inference-only model graph, which rearranges inference computation in an optimized order.

The following are the key optimizations TensorRT applies behind the scenes to speed up inference:

  1. Reduced Precision maximizes throughput with FP16 or INT8 by quantizing models while maintaining correctness.
  2. Layer and Tensor Fusion optimizes use of GPU memory and bandwidth by fusing nodes in a kernel to avoid kernel launch latency.
  3. Kernel Auto-Tuning selects the best data layouts and algorithms based on the target GPU platform and the kernel’s data shapes.
  4. Dynamic Tensor Memory minimizes memory footprint by freeing unnecessary memory consumption of intermediate results and reuses memory for tensors efficiently.
  5. Multi-Stream Execution uses a scalable design to process multiple input streams in parallel with dedicated CUDA streams.
  6. Time Fusion optimizes recurrent neural networks over time steps with dynamically generated kernels.

T5 uses transformer layers as building blocks for its architectures. The latest release of NVIDIA TensorRT 8.2 introduces new optimizations for the T5 and GPT-2 models for real-time inference. In the following table, we can see the speedup with TensorRT on some public T5 models running on Amazon EC2 G4dn instances, powered by NVIDIA T4 GPUs, and EC2 G5 instances, powered by NVIDIA A10G GPUs.

 

All latencies are in milliseconds; speedups are end-to-end (E2E) versus the Hugging Face PyTorch baseline.

Model     Instance      PyTorch FP32 (Enc / Dec / E2E)   TensorRT 8.2 FP32 (Enc / Dec / E2E)   TensorRT 8.2 FP16 (Enc / Dec / E2E)   Speedup (FP32 / FP16 E2E)
t5-small  g4dn.xlarge   5.98 / 9.74 / 30.71              1.28 / 2.25 / 7.54                    0.93 / 1.59 / 5.91                    407.40% / 519.34%
t5-small  g5.xlarge     4.63 / 7.56 / 24.22              0.61 / 1.05 / 3.99                    0.47 / 0.80 / 3.19                    606.66% / 760.01%
t5-base   g4dn.xlarge   11.61 / 19.05 / 78.44            3.18 / 5.45 / 19.59                   3.15 / 2.96 / 13.76                   400.48% / 569.97%
t5-base   g5.xlarge     8.59 / 14.23 / 59.98             1.55 / 2.47 / 11.32                   1.54 / 1.65 / 8.46                    530.05% / 709.20%

For more information about optimizations and replication of the attached performance, refer to Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA TensorRT.

It is important to note that compilation preserves model accuracy, as it operates on the inference environment and the computation scheduling, leaving the model science unaltered, unlike weight-removal compression techniques such as distillation or pruning. NVIDIA TensorRT allows you to combine compilation with quantization for further gains. Quantization has a double benefit on recent NVIDIA hardware: it reduces memory usage and enables the use of NVIDIA Tensor Cores, DL-specific units that run a fused matrix multiply-add in mixed precision.
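As a hedged sketch of what such compilation looks like in practice, the snippet below builds a TensorRT engine from an ONNX export with FP16 enabled. The ONNX file name is a placeholder, and a production T5 deployment (as in NVIDIA’s TensorRT demos) typically compiles the encoder and decoder separately and adds optimization profiles for dynamic sequence lengths.

# Hedged sketch: building a TensorRT engine from an ONNX export with FP16
# enabled. The ONNX path is a placeholder; real T5 deployments also define
# optimization profiles for dynamic input/output sequence lengths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("t5_encoder.onnx", "rb") as f:                  # placeholder ONNX export
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)                     # reduced precision
engine_bytes = builder.build_serialized_network(network, config)

with open("t5_encoder.plan", "wb") as f:                  # serialized engine for deployment
    f.write(engine_bytes)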

In the case of the Amazon Search experimentation with Hugging Face T5 model, replacing PyTorch with TensorRT for model inference increases speed by 534%.

NVIDIA Triton: Low-latency, high-throughput inference serving

Modern model serving solutions can transform offline trained models into customer-facing ML-powered products. To maintain reasonable costs at such a scale, it’s important to keep serving overhead low (HTTP handling, preprocessing and postprocessing, CPU-GPU communication), and fully take advantage of the parallel processing ability of GPUs.

NVIDIA Triton is inference serving software offering wide support for model runtimes (NVIDIA TensorRT, ONNX, PyTorch, and XGBoost, among others) and infrastructure backends, including GPUs, CPUs, and AWS Inferentia.

ML practitioners love Triton for multiple reasons. Its dynamic batching ability allows it to accumulate inference requests during a user-defined delay and within a maximal user-defined batch size, so that GPU inference is batched, amortizing the CPU-GPU communication overhead. Note that dynamic batching happens server-side and within very short time frames, so the requesting client still has a synchronous, near-real-time invocation experience. Triton users also enjoy its concurrent model execution capacity. GPUs are powerful multitaskers that excel at executing compute-intensive workloads in parallel. Triton maximizes GPU utilization and throughput by using CUDA streams to run multiple model instances concurrently. These model instances can be different models from different frameworks for different use cases, or direct copies of the same model. This translates to direct throughput improvement when you have enough idle GPU memory. Also, because Triton is not tied to a specific DL development framework, it allows scientists to fully express themselves in the tool of their choice.
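From the client’s point of view, invoking a Triton-hosted model stays simple because batching happens server-side. The following sketch uses the tritonclient Python package; the model name, tensor names, and shapes are placeholders for whatever the deployed engine actually exposes.

# Hedged sketch: invoking a model hosted on NVIDIA Triton from Python. The
# model name and tensor names/shapes are placeholders; dynamic batching is
# handled server-side, so the client makes a plain synchronous call.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

input_ids = np.array([[37, 423, 898, 12, 1]], dtype=np.int32)   # toy token IDs
infer_input = httpclient.InferInput("input_ids", list(input_ids.shape), "INT32")
infer_input.set_data_from_numpy(input_ids)

result = client.infer(
    model_name="t5_trt",                                        # placeholder model name
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output_ids")],    # placeholder output name
)
print(result.as_numpy("output_ids"))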

With Triton on AWS, Amazon Search expects to better serve Amazon.com customers and meet latency requirements at low cost. The tight integration between the TensorRT runtime and the Triton server facilitates the development experience. Using AWS cloud infrastructure allows us to scale up or down in minutes based on throughput requirements, while keeping the bar high for reliability and security.

How AWS lowers the barrier to entry

While Amazon Search conducted this experiment on Amazon EC2 infrastructure, other AWS services exist to facilitate the development, training and hosting of state-of-the-art deep learning solutions.

For example, AWS and NVIDIA have collaborated to release a managed implementation of Triton Inference Server in Amazon SageMaker; for more information, see Deploy fast and scalable AI with NVIDIA Triton Inference Server in Amazon SageMaker. AWS also collaborated with Hugging Face to develop a managed, optimized integration between Amazon SageMaker and Hugging Face Transformers, the open-source framework from which the Amazon Search T5 model is derived; read more at https://aws.amazon.com/machine-learning/hugging-face/.

We encourage customers with latency-sensitive CPU and GPU deep learning serving applications to consider NVIDIA TensorRT and Triton on AWS. Let us know what you build!

Passionate about deep learning and building deep learning-based solutions for Amazon Search? Check out our careers page.


About the Authors

RJ is an engineer on the Search M5 team, leading the efforts to build large-scale deep learning systems for training and inference. Outside of work, he explores different cuisines and plays racquet sports.

Hemant Pugaliya is an Applied Scientist at Search M5. He works on applying latest natural language processing and deep learning research to improve customer experience on Amazon shopping worldwide. His research interests include natural language processing and large-scale machine learning systems. Outside of work, he enjoys hiking, cooking and reading.

Andy Sun is a Software Engineer and Technical Lead for Search Spelling Correction. His research interests include optimizing deep learning inference latency, and building rapid experimentation platforms. Outside of work, he enjoys filmmaking, and acrobatics.

Le Cai is a Software Engineer at Amazon Search. He works on improving Search Spelling Correction performance to help customers with their shopping experience. He is focusing on high-performance online inference and distributed training optimization for deep learning model. Outside of work, he enjoys skiing, hiking and cycling.

Anthony Ko is currently working as a software engineer at Search M5 in Palo Alto, CA. He works on building tools and products for model deployment and inference optimization. Outside of work, he enjoys cooking and playing racquet sports.

Olivier Cruchant is a Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Anish Mohan is a Machine Learning Architect at NVIDIA and the technical lead for ML and DL engagements with its customers in the greater Seattle region.

Jiahong Liu is a Solution Architect on the Cloud Service Provider team at NVIDIA. He assists clients in adopting machine learning and AI solutions that leverage NVIDIA accelerated computing to address their training and inference challenges. In his leisure time, he enjoys origami, DIY projects, and playing basketball.

Eliuth Triana is a Developer Relations Manager at NVIDIA. He connects Amazon and AWS product leaders, developers, and scientists with NVIDIA technologists and product leaders to accelerate Amazon ML/DL workloads, EC2 products, and AWS AI services. In addition, Eliuth is a passionate mountain biker, skier, and poker player.

Read More