Universities Expand Research Horizons with NVIDIA Systems, Networks

Just as the Dallas/Fort Worth airport became a hub for travelers crisscrossing America, the north Texas region will be a gateway to AI if folks at Southern Methodist University have their way.

SMU is installing an NVIDIA DGX SuperPOD, an accelerated supercomputer it expects will power projects in machine learning for its sprawling metro community with more than 12,000 students and 2,400 faculty and staff.

It’s one of three universities in the south-central U.S. announcing plans to use NVIDIA technologies to shift research into high gear.

Texas A&M and Mississippi State University are adopting NVIDIA Quantum-2, our 400 Gbit/second InfiniBand networking platform, as the backbone for their latest high-performance computers. In addition, a supercomputer in the U.K. has upgraded its InfiniBand network.

Texas Lassos a SuperPOD

“We’re the second university in America to get a DGX SuperPOD and that will put this community ahead in AI capabilities to fuel our degree programs and corporate partnerships,” said Michael Hites, chief information officer of SMU, referring to a system installed earlier this year at the University of Florida.

A September report called the Dallas area “hobbled” by a lack of major AI research. Ironically, the story hit the local newspaper just as SMU was buttoning up its plans for its DGX SuperPOD.

Previewing its initiative, an SMU report in March said AI is “at the heart of digital transformation … and no sector of society will remain untouched” by the technology. “The potential for dramatic improvements in K-12 education and workforce development is enormous and will contribute to the sustained economic growth of the region,” it added.

SMU Ignite, a $1.5 billion fundraiser kicked off in September, will fuel the AI initiative, helping propel Southern Methodist into the top ranks of university research nationally. The university is hiring a chief innovation officer to help guide the effort.

Crafting a Computational Crucible

It’s all about the people, says Jason Warner, who manages the IT teams that support SMU’s researchers. So, he hired a founding group of data science specialists to staff a new center at SMU’s Ford Hall for Research and Innovation, a hub Warner calls SMU’s “computational crucible.”

Eric Godat leads that team. He earned his Ph.D. in particle physics at SMU modeling nuclear structure using data from the Large Hadron Collider.

Now he’s helping fire up SMU’s students about opportunities on the DGX SuperPOD. As a first step, he asked two SMU students to build a miniature model of a DGX SuperPOD using NVIDIA Jetson modules.

“We wanted to give people — especially those in nontechnical fields who haven’t done AI — a sense of what’s coming,” Godat said.

SMU's Jetson SuperPOD
SMU undergrad Connor Ozenne helped build a miniature DGX SuperPOD that was featured in SMU’s annual report. It uses 16 Jetson modules in a cluster students will benchmark as if it were a TOP500 system.

The full-sized supercomputer, made up of 20 NVIDIA DGX A100 systems on an NVIDIA Quantum InfiniBand network, could be up and running as early as January thanks to its Lego-like, modular architecture. It will deliver a whopping 100 petaflops of computing power, enough to give it a respectable slot on the TOP500 list of the world’s fastest supercomputers.

Aggies Tap NVIDIA Quantum-2 InfiniBand for ACES

About 200 miles south, the high performance computing center at Texas A&M will be among the first to plug into the NVIDIA Quantum-2 InfiniBand platform. Its ACES supercomputer, built by Dell Technologies, will use the 400G InfiniBand network to connect researchers to a mix of five accelerators from four vendors.

NVIDIA Quantum-2 ensures “that a single job on ACES can scale up using all the computing cores and accelerators.  Besides the obvious 2x jump in throughput from NVIDIA Quantum-1 InfiniBand at 200G, it will provide improved total cost of ownership, beefed up in-network computing features and increased scaling,” said Honggao Liu, ACES’s principal investigator and project director.

Texas A&M already gives researchers access to accelerated computing in four systems that include more than 600 NVIDIA A100 Tensor Core and prior-generation GPUs. Two of the four systems use an earlier version of NVIDIA’s InfiniBand technology.

MSU Rides a 400G Train

Mississippi State University will also tap the NVIDIA Quantum-2 InfiniBand platform. It’s the network of choice for a new system that supplements Orion, the largest of four clusters MSU manages, all using earlier versions of InfiniBand.

Both Orion and the new system are funded by the U.S. National Oceanic and Atmospheric Administration (NOAA) and built by Dell. They conduct work for NOAA’s missions as well as research for MSU.

Orion was listed as the fourth largest academic supercomputer in America when it debuted on the TOP500 list in June 2019.

“We’re using InfiniBand in four generations of supercomputers here at MSU so we know it’s both powerful and mature to run our big jobs reliably,” said Trey Breckenridge, director of high performance computing at MSU.

“We’re adding a new system with NVIDIA Quantum-2 to stay at the leading edge in HPC,” he added.

Quantum Nets Cover the UK

Across the pond in the U.K., the Data Intensive supercomputer at the University of Leicester, known as the DIaL system, has upgraded to NVIDIA Quantum, the 200G version of InfiniBand.

“DIaL is specifically designed to tackle the complex, data-intensive questions which must be answered to evolve our understanding of the universe around us,” said Mark Wilkinson, professor of theoretical astrophysics at the University of Leicester and director of its HPC center.

“The intense requirements of these specialist workloads rely on the unparalleled bandwidth and latency that only InfiniBand can provide to make the research possible,” he said.

DIaL is one of four supercomputers in the U.K.’s DiRAC facility using InfiniBand, including the Tursa system at the University of Edinburgh.

InfiniBand Shines in Evaluation

In a technical evaluation, researchers found Tursa with NVIDIA GPU accelerators on a Quantum network delivered 5x the performance of their CPU-only Tesseract system using an alternative interconnect.

Application benchmarks show 16 nodes of Tursa have twice the performance of 512 nodes of Tesseract. Tursa delivers 10 teraflops/node using 90 percent of the network’s bandwidth at a significant improvement in performance per kilowatt over Tesseract.

It’s another example of why most of the world’s TOP500 systems are using NVIDIA technologies.

For more, watch our special address at SC21 either live on Monday, Nov. 15 at 3 pm PST or later on demand. NVIDIA’s Marc Hamilton will provide an overview of our latest news, innovations and technologies, followed by a live Q&A panel with NVIDIA experts.



An Open Source Vibrotactile Haptics Platform for On-Body Applications

Posted by Artem Dementyev, Hardware Engineer, Google Research

Most wearable smart devices and mobile phones have the means to communicate with the user through tactile feedback, enabling applications from simple notifications to sensory substitution for accessibility. Typically, they accomplish this using vibrotactile actuators, which are small electric vibration motors. However, designing a haptic system that is well-targeted and effective for a given task requires experimentation with the number of actuators and their locations in the device, yet most practical applications require standalone on-body devices and integration into small form factors. This combination of factors can be difficult to address outside of a laboratory as integrating these systems can be quite time-consuming and often requires a high level of expertise.

A typical lab setup on the left and the VHP board on the right.

In “VHP: Vibrotactile Haptics Platform for On-body Applications”, presented at ACM UIST 2021, we develop a low-power miniature electronics board that can drive up to 12 independent channels of haptic signals with arbitrary waveforms. The VHP electronics board can be battery-powered and integrated into wearable devices and small gadgets. It allows all-day wear, has low latency, offers between 3 and 25 hours of battery life, and can run 12 actuators simultaneously. We show that VHP can be used in bracelet, sleeve, and phone-case form factors. The bracelet was programmed with an audio-to-tactile interface to aid lipreading and remained functional when worn for multiple months by developers. To give the field of wearable multi-channel haptics the tools needed for design, implementation, and experimentation, we are releasing the hardware design and software for the VHP system via GitHub.

Front and back sides of the VHP circuit board.
Block diagram of the system.

Platform Specifications
VHP consists of a custom-designed circuit board whose main components are the microcontroller and the haptic amplifier, which converts the microcontroller’s digital output into signals that drive the actuators. The haptic actuators can be controlled by signals arriving via serial, USB, and Bluetooth Low Energy (BLE), as well as by the onboard microphones. We chose the nRF52840 microcontroller because it offers many input and output options and BLE, all in a small package. We added several sensors to the board to provide more experimental flexibility: an on-board digital microphone, an analog microphone amplifier, and an accelerometer. The firmware is a portable C/C++ library that works in the Arduino ecosystem.

To allow for rapid iteration during development, the interface between the board and the actuators is critical. The wiring for the 12 tactile signals has to be quick to set up, while remaining flexible and robust enough to stand up to prolonged use. For the interface, we use a 24-pin FPC (flexible printed circuit) connector on the VHP. We support interfacing to the actuators in two ways: with a custom flexible circuit board and with a rigid breakout board.

VHP board (small board on the right) connected to three different types of tactile actuators via rigid breakout board (large board on the left).

Using Haptic Actuators as Sensors
In our previous blog post, we explored how back-EMF in a haptic actuator could be used for sensing and demonstrated a variety of useful applications. Instead of using back-EMF sensing in the VHP system, we measure the electrical current that drives each vibrotactile actuator and use the current load as the sensing mechanism. Unlike back-EMF sensing, this current-sensing approach allows simultaneous sensing and actuation, while minimizing the additional space needed on the board.

One challenge with the current-sensing approach is that there is a wide variety of vibrotactile actuators, each of which may behave differently and need different presets. In addition, because different actuators can be added and removed during prototyping with the adapter board, it would be useful if the VHP were able to identify the actuator automatically. This would improve the speed of prototyping and make the system more novice-friendly.

To explore this possibility, we collected current-load data from three off-the-shelf haptic actuators and trained a simple support vector machine classifier to recognize the difference in the signal pattern between actuators. The test accuracy was 100% for classifying the three actuators, indicating that each actuator has a very distinct response.

Different actuators have a different current signature during a frequency sweep, thus allowing for automatic identification.
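As a rough illustration of this kind of classifier (not the actual data or pipeline from the paper), the sketch below trains a scikit-learn SVM on synthetic current-load traces whose resonance peaks differ by actuator type; the trace length, noise level, and peak positions are all made-up assumptions.

# Illustrative only: synthetic current-load traces stand in for real frequency-sweep
# measurements, and the preprocessing is a guess at one reasonable approach.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_SWEEPS_PER_ACTUATOR = 50   # sweeps recorded per actuator type (assumed)
TRACE_LEN = 128              # current samples per frequency sweep (assumed)

def synthetic_sweep(resonance_bin, n):
    """Fake current traces with a resonance peak at a different location per actuator type."""
    x = np.arange(TRACE_LEN)
    peak = np.exp(-0.5 * ((x - resonance_bin) / 6.0) ** 2)
    return peak + 0.05 * rng.standard_normal((n, TRACE_LEN))

# Three actuator types, each with a distinct resonance signature
X = np.vstack([synthetic_sweep(b, N_SWEEPS_PER_ACTUATOR) for b in (30, 64, 95)])
y = np.repeat([0, 1, 2], N_SWEEPS_PER_ACTUATOR)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))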

Additionally, vibrotactile actuators require proper contact with the skin for consistent control over stimulation. Thus, the device should measure skin contact and either provide an alert or self-adjust if it is not loaded correctly. To test whether a skin contact measuring technique works in practice, we measured the current load on actuators in a bracelet as it was tightened and loosened around the wrist. As the bracelet strap is tightened, the contact pressure between the skin and the actuator increases and the current required to drive the actuator signal increases commensurately.

Current load sensing is responding to touch, while the actuator is driven at 250 Hz frequency.

Quality of the fit of the bracelet is measured.
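A toy sketch of how such a contact check could work in software is shown below. The current readings, baseline, and threshold are made up for illustration; in the real system the values would come from the board's current-sense measurements while the actuator is driven at 250 Hz.

import numpy as np

current_ma = np.array([3.1, 3.2, 3.3, 4.8, 5.6, 5.9, 6.0, 4.2, 3.4])  # fake current readings (mA)
baseline_ma = 3.2        # unloaded actuator (no skin contact), measured once
contact_threshold = 1.2  # arbitrary margin above the baseline

in_contact = current_ma > baseline_ma + contact_threshold
for t, (i, c) in enumerate(zip(current_ma, in_contact)):
    print(f"t={t}: {i:.1f} mA -> {'contact' if c else 'no contact'}")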

Audio-to-Tactile Feedback
To demonstrate the utility of the VHP platform, we used it to develop an audio-to-tactile feedback device to help with lipreading. Lipreading can be difficult for many speech sounds that look similar (visemes), such as “pin” and “min”. In order to help the user differentiate visemes like these, we attach a microphone to the VHP system, which can then pick up the speech sounds and translate the audio to vibrations on the wrist. For audio-to-tactile translation, we used our previously developed algorithms for real-time audio-to-tactile conversion, available via GitHub. Briefly, audio filters are paired with neural networks to recognize certain visemes (e.g., picking up the hard consonant “p” in “pin”), which are then translated to vibrations in different parts of the bracelet. Our approach is inspired by the tactile phonemic sleeve (TAPS); however, the major difference is that in our approach the tactile signal is presented continuously and in real time.
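The released audio-to-tactile code (linked above) pairs filter banks with neural networks; as a much simpler toy sketch of the general idea, one could map the energy in a few frequency bands to per-channel vibration amplitudes. Everything below (sample rate, band edges, channel count) is an arbitrary assumption, not the actual algorithm.

# Toy sketch only: map the energy of a few frequency bands of one audio frame
# to normalized drive levels for a handful of actuator channels.
import numpy as np

SAMPLE_RATE = 16000
BAND_EDGES_HZ = [100, 500, 1000, 2000, 4000]  # arbitrary split into 4 bands / channels

def frame_to_channel_amplitudes(frame):
    """Map one 20 ms audio frame to one vibration amplitude per actuator channel."""
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / SAMPLE_RATE)
    amps = []
    for lo, hi in zip(BAND_EDGES_HZ[:-1], BAND_EDGES_HZ[1:]):
        band = spectrum[(freqs >= lo) & (freqs < hi)]
        amps.append(band.mean() if band.size else 0.0)
    amps = np.array(amps)
    return amps / (amps.max() + 1e-9)  # normalize to 0..1 drive levels

frame = np.random.randn(int(0.020 * SAMPLE_RATE))  # stand-in for a microphone frame
print(frame_to_channel_amplitudes(frame))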

One of the developers who employs lipreading in daily life wore the bracelet daily for several months and found it to give better information to facilitate lipreading than previous devices, allowing improved understanding of lipreading visemes with the bracelet versus lipreading alone. In the future, we plan to conduct full-scale experiments with multiple users wearing the device for an extended time.

Left: Audio-to-tactile sleeve. Middle: Audio-to-tactile bracelet. Right: One of our developers tests out the bracelets, which are worn on both arms.

Potential Applications
The VHP platform enables rapid experimentation and prototyping that can be used to develop techniques for a variety of applications. For example:

  • Rich haptics on small devices: Expanding the number of actuators on mobile phones, which typically only have one or two, could be useful to provide additional tactile information. This is especially useful as fingers are sensitive to vibrations. We demonstrated a prototype mobile phone case with eight vibrotactile actuators. This could be used to provide rich notifications and enhance effects in a mobile game or when watching a video.
  • Lab psychophysical experiments: Because VHP can be easily set up to send and receive haptic signals in real time, e.g., from a Jupyter notebook, it could be used to perform real-time haptic experiments.
  • Notifications and alerts: The wearable VHP could be used to provide haptic notifications from other devices, e.g., alerting if someone is at the door, and could even communicate distinguishable alerts through use of multiple actuators.
  • Sensory substitution: Besides the lipreading assistance example above, there are many other potential applications for accessibility using sensory substitution, such as visual-to-tactile sensing or even sensing magnetic fields.
  • Loading sensing: The ability to sense from the haptic actuator current load is unique to our platform, and enables a variety of features, such as pressure sensing or automatically adjusting actuator output.
Integrating eight voice coils into a phone case. We used loading sensing to understand which voice coils are being touched.

What’s next?
We hope that others can utilize the platform to build a diverse set of applications. If you are interested and have ideas about using our platform or want to receive updates, please fill out this form. We hope that with this platform, we can help democratize the use of haptics and inspire a more widespread use of tactile devices.

Acknowledgments
This work was done by Artem Dementyev, Pascal Getreuer, Dimitri Kanevsky, Malcolm Slaney and Richard Lyon. We thank Alex Olwal, Thad Starner, Hong Tan, Charlotte Reed, and Sarah Sterman for valuable feedback and discussion on the paper, and Yuhui Zhao, Dmitrii Votintcev, Chet Gnegy, Whitney Bai, and Sagar Savla for feedback on the design and engineering.


Automatically detect sports highlights in video with Amazon SageMaker

Extracting highlights from a video is a time-consuming and complex process. In this post, we provide a new take on instant replay for sporting events using a machine learning (ML) solution for automatically creating video highlights from original video content. Video highlights are then available for download so that users can continue to view them via a web app.

We use Amazon SageMaker to analyze a full-length sports video (in our case, a soccer match) and tag segments of the original video that are highlights (penalty kicks). We also show how to apply our end-to-end architecture to not only other sports, but other types of videos, given the availability of appropriate training data.

Architecture overview

The following diagram depicts our solution architecture.

Orchestration overview

We use an AWS Step Functions workflow to orchestrate a series of AWS Lambda functions, one for each step of the process.

The workflow consists of the following steps:

  1. Start a MediaConvert job that breaks down the video into individual frames.
  2. When the MediaConvert job completes, a Lambda function converts each frame to a feature vector by passing the individual image through a pretrained model (Inception V3).
  3. The feature vectors are sent via Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose, and are finally stored in Amazon S3.
  4. A machine learning model is invoked to infer whether a video segment is interesting enough to pick up, based on the sequence of feature vectors. The model determines which actions defined in the UCF101 labels are seen in the video. AWS Fargate acts as a driver that loops through all sequences of feature vectors, prepares them for inference, performs inference using a SageMaker endpoint, and collates the results in an Amazon DynamoDB table.
  5. After the Fargate task completes, a message is placed in an Amazon SQS queue, which a Lambda function periodically polls.
  6. When a completion message is detected, the Lambda function triggers a MediaConvert job to prepare highlight segments based on the results of the machine learning inference.
  7. Finally, an email containing links to the highlight clips is sent to the email address specified by the user.

Methodology

We use deep learning techniques to identify an activity in a given video. We use a deep Convolutional Neural Network (CNN), based on a pretrained Inception V3 model, to generate features from the images extracted from the video, and an LSTM (Long Short-Term Memory) network to predict actions from sequences of those features. Both CNNs and LSTMs are types of neural networks used in ML-based computer vision solutions. Let’s briefly discuss neural networks and related terms before we jump into 2D-CNNs and LSTMs.

Neural networks

Neural networks are computer systems loosely inspired by the biological neural networks that constitute animal brains. Just as the basic unit of the brain is the neuron, the building block of an artificial neural network is the perceptron. Perceptrons perform very simple processing and are connected together into a large meshed network, which forms a neural network. A neural network is organized into layers, and the connections between its perceptrons are weighted. A neural network isn’t an algorithm; it’s a framework that many different ML algorithms can use. We describe the different layers in a CNN later in this post when we build a model for extracting features from images extracted from videos.
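To make the idea concrete, here is a minimal illustration (not from the original post) of the computation a single perceptron performs: a weighted sum of its inputs plus a bias, passed through a simple step activation. The input values, weights, and bias below are arbitrary.

import numpy as np

def perceptron(x, w, b):
    """A single perceptron: weighted sum of inputs plus bias, passed through a step function."""
    return 1 if np.dot(w, x) + b > 0 else 0

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, -0.4])   # learned weights
b = 0.1                          # learned bias
print(perceptron(x, w, b))       # the perceptron fires (1) or not (0)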

In this solution, the neural network is trained with supervised learning. This means the model gets better and better as it sees more labeled examples, so more training samples generally result in better accuracy.

Let’s break down the terms in deep CNNs and understand why this technique is effective in image recognition. Together with LSTMs, we use this technique later for activity identification in a given video.

Deep Convolutional Neural Networks

Deep Convolutional Neural Networks such as Inception V3 and YOLOv5 have proven to be very effective techniques for image recognition and other downstream fine-tuning tasks. More recently, vision-transformer-based models are also being used for state-of-the-art image classification, object detection, and segmentation masks. Image recognition has many applications. Something that started as a technique to improve the accuracy of recognizing handwritten digits has evolved to solve more complex problems, such as identifying and labeling specific objects and backgrounds in an image.

Although deep CNNs have made image classification and identification simpler and have significantly improved the accuracy of the results, implementing an end-to-end solution from scratch is not a simple task. We recommend using services such as Amazon Rekognition, which provides a state-of-the-art API-based solution for image classification and image recognition. If a custom model is required to solve a computer vision problem for either images or video, SageMaker provides a framework for training and inference. SageMaker supports multiple ML frameworks using BYOC (bring your own container). We use BYOC in SageMaker with Keras to develop a model for activity recognition and deploy the model for inference.

The convolution technique makes the output of neural networks more robust and accurate because, instead of processing every image as a single tile of pixels, it breaks an image into multiple tiles using a sliding window of fixed size. Each tile activates the next layer separately, and all tiles of an image are aggregated in successive layers to generate an output. For example, this allows the digit 8 in the left corner of an image to be identified as the same digit 8 in the right corner of an image. This is called translation invariance.
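The sliding-window idea can be illustrated with a few lines of NumPy (an illustrative toy, not the Inception V3 implementation): the same small filter is applied to every tile of the image, so a pattern produces the same response wherever it appears.

import numpy as np

def convolve2d_valid(image, kernel):
    """Slide a small kernel over the image and sum the element-wise products at each position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            tile = image[i:i + kh, j:j + kw]   # one "tile" seen through the sliding window
            out[i, j] = np.sum(tile * kernel)
    return out

image = np.zeros((6, 6))
image[1, 1] = image[4, 4] = 1.0           # the same pattern placed in two different corners
kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=float)
response = convolve2d_valid(image, kernel)
print(response[0, 0], response[3, 3])     # identical response wherever the pattern appears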

LSTM

LSTM networks are a type of Recurrent Neural Network (RNN) that contains special cells or neurons that allow information to persist due to the existence of loops or special memory units. LSTMs in particular are useful in learning tasks involving time sequences of data (like our use case of video classification, which is a time sequence of static frames), especially when there is a need to remember information for long periods of time.

Challenges with video processing

It’s important to keep in mind that videos are like a flip book. The static image on each page when flipped generates the perception of motion. The faster you flip, the better the quality of motion perception you get.

Images are stored as a stream of pixels in a 2D spatial arrangement. This is how a computer program reads images. As an extension of images, videos have an extra dimension of time. Videos are a time series of static images. This makes videos a 3D spatial and temporal arrangement of pixels.
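As a quick illustration of this spatial and temporal layout (toy numbers, assuming a 1280x720 video captured at 20 frames per second, like the frame-capture rate used later in this post), note how quickly the extra time dimension grows the raw data:

import numpy as np

HEIGHT, WIDTH, CHANNELS = 720, 1280, 3
FPS, SECONDS = 20, 10
NUM_FRAMES = FPS * SECONDS

frame = np.zeros((HEIGHT, WIDTH, CHANNELS), dtype=np.uint8)   # one image: a 2D grid of pixels
video_shape = (NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)           # time adds a third, temporal dimension

print(frame.nbytes / 1e6, "MB of raw pixels per frame")
print(np.prod(video_shape) / 1e9, "GB of raw pixels for a 10-second clip")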

The extra dimension requires more compute and memory to develop an ML model. A lot of preprocessing is required before we can feed video input into CNNs and LSTM.

Apart from the increased complexity of preprocessing and processing, there is also a lack of open datasets available for research on video data.

In this post, we use samples provided in the UCF101 dataset for building a model and deploying an endpoint for inference.

Reading a video and extracting frames

Assume that the video source that we’re analyzing in order to extract highlights is in Amazon Simple Storage Service (Amazon S3). We use AWS Elemental MediaConvert to split the video into individual frames, and this MediaConvert job is triggered from the following Lambda function:

1.	import json  
2.	import boto3  
3.	  
4.	s3_location = 's3://<ARTIFACT-BUCKET>/BYUfootballmatch.mp4'  
5.	  
6.	  
7.	def lambda_handler(event, context):  
8.	    with open('mediaconvert.json') as f:  
9.	        data = json.load(f)  
10.	      
11.	    client = boto3.client('mediaconvert')  
12.	    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']  
13.	      
14.	    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)  
15.	  
16.	    
17.	  
18.	    data['Settings']['Inputs'][0]['FileInput'] = s3_location  
19.	      
20.	    response = myclient.create_job(  
21.	    Queue=data['Queue'],  
22.	    Role=data['Role'],  
23.	    Settings=data['Settings'])  
24.	      

Lines 11–14 use the AWS SDK for Python (Boto3) to initialize the MediaConvert client, and lines 20–23 submit the job using the following JSON template. You can specify codec settings, width, height, and other parameters specific to your video format.

{
  "Queue": "arn:aws:mediaconvert:<REGION>:<AWS ACCOUNT NUMBER>:queues/Default",
  "UserMetadata": {},
  "Role": "arn:aws:iam::<AWS ACCOUNT ID>:role/MediaConvertRole",
  "Settings": {
    "OutputGroups": [
      {
        "CustomName": "MP4",
        "Name": "File Group",
        "Outputs": [
          {
            "ContainerSettings": {
              "Container": "MP4",
              "Mp4Settings": {
                "CslgAtom": "INCLUDE",
                "FreeSpaceBox": "EXCLUDE",
                "MoovPlacement": "PROGRESSIVE_DOWNLOAD"
              }
            },
            "VideoDescription": {
              "Width": 1280,
              "ScalingBehavior": "DEFAULT",
              "Height": 720,
              "TimecodeInsertion": "DISABLED",
              "AntiAlias": "ENABLED",
              "Sharpness": 50,
              "CodecSettings": {
                "Codec": "H_264",
                "H264Settings": {
                  "InterlaceMode": "PROGRESSIVE",
                  "NumberReferenceFrames": 3,
                  "Syntax": "DEFAULT",
                  "Softness": 0,
                  "GopClosedCadence": 1,
                  "GopSize": 90,
                  "Slices": 1,
                  "GopBReference": "DISABLED",
                  "SlowPal": "DISABLED",
                  "SpatialAdaptiveQuantization": "ENABLED",
                  "TemporalAdaptiveQuantization": "ENABLED",
                  "FlickerAdaptiveQuantization": "DISABLED",
                  "EntropyEncoding": "CABAC",
                  "Bitrate": 3000000,
                  "FramerateControl": "INITIALIZE_FROM_SOURCE",
                  "RateControlMode": "CBR",
                  "CodecProfile": "MAIN",
                  "Telecine": "NONE",
                  "MinIInterval": 0,
                  "AdaptiveQuantization": "HIGH",
                  "CodecLevel": "AUTO",
                  "FieldEncoding": "PAFF",
                  "SceneChangeDetect": "ENABLED",
                  "QualityTuningLevel": "SINGLE_PASS",
                  "FramerateConversionAlgorithm": "DUPLICATE_DROP",
                  "UnregisteredSeiTimecode": "DISABLED",
                  "GopSizeUnits": "FRAMES",
                  "ParControl": "INITIALIZE_FROM_SOURCE",
                  "NumberBFramesBetweenReferenceFrames": 2,
                  "RepeatPps": "DISABLED"
                }
              },
              "AfdSignaling": "NONE",
              "DropFrameTimecode": "ENABLED",
              "RespondToAfd": "NONE",
              "ColorMetadata": "INSERT"
            },
            "AudioDescriptions": [
              {
                "AudioTypeControl": "FOLLOW_INPUT",
                "CodecSettings": {
                  "Codec": "AAC",
                  "AacSettings": {
                    "AudioDescriptionBroadcasterMix": "NORMAL",
                    "Bitrate": 96000,
                    "RateControlMode": "CBR",
                    "CodecProfile": "LC",
                    "CodingMode": "CODING_MODE_2_0",
                    "RawFormat": "NONE",
                    "SampleRate": 48000,
                    "Specification": "MPEG4"
                  }
                },
                "LanguageCodeControl": "FOLLOW_INPUT"
              }
            ]
          }
        ],
        "OutputGroupSettings": {
          "Type": "FILE_GROUP_SETTINGS",
          "FileGroupSettings": {
            "Destination": "s3://<ARTIFACT-BUCKET>/MP4/"
          }
        }
      },
      {
        "CustomName": "Thumbnails",
        "Name": "File Group",
        "Outputs": [
          {
            "ContainerSettings": {
              "Container": "RAW"
            },
            "VideoDescription": {
              "Width": 768,
              "ScalingBehavior": "DEFAULT",
              "Height": 576,
              "TimecodeInsertion": "DISABLED",
              "AntiAlias": "ENABLED",
              "Sharpness": 50,
              "CodecSettings": {
                "Codec": "FRAME_CAPTURE",
                "FrameCaptureSettings": {
                  "FramerateNumerator": 20,
                  "FramerateDenominator": 1,
                  "MaxCaptures": 10000000,
                  "Quality": 100
                }
              },
              "AfdSignaling": "NONE",
              "DropFrameTimecode": "ENABLED",
              "RespondToAfd": "NONE",
              "ColorMetadata": "INSERT"
            }
          }
        ],
        "OutputGroupSettings": {
          "Type": "FILE_GROUP_SETTINGS",
          "FileGroupSettings": {
            "Destination": "s3://<ARTIFACT-BUCKET>/Thumbnails/"
          }
        }
      }
    ],
    "AdAvailOffset": 0,
    "Inputs": [
      {
        "AudioSelectors": {
          "Audio Selector 1": {
            "Offset": 0,
            "DefaultSelection": "DEFAULT",
            "ProgramSelection": 1
          }
        },
        "VideoSelector": {
          "ColorSpace": "FOLLOW"
        },
        "FilterEnable": "AUTO",
        "PsiControl": "USE_PSI",
        "FilterStrength": 0,
        "DeblockFilter": "DISABLED",
        "DenoiseFilter": "DISABLED",
        "TimecodeSource": "EMBEDDED",
        "FileInput": "s3://<ARTIFACT-BUCKET>/BYUfootballmatch.mp4"
      }
    ]
  }
}

While the MediaConvert job is running, another Lambda function checks for job completion. This function is written as follows:

import json
import boto3
import pprint

def lambda_handler(event, context):

    client = boto3.client('mediaconvert')
    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']

    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)

    response = myclient.list_jobs(
        MaxResults=1,
        Order='DESCENDING')

    # Status = 'SUBMITTED'|'PROGRESSING'|'COMPLETE'|'CANCELED'|'ERROR'
    print(len(response['Jobs']))
    Status = response['Jobs'][0]['Status']
    Id = response['Jobs'][0]['Id']
    print(Id, Status)

    # Return the job ID and status so the Step Functions workflow can branch on completion
    return {'JobId': Id, 'Status': Status}

Collect feature vectors for training

Each frame is passed through a pre-trained InceptionV3 model to extract features. The model is small enough to be packaged within the Lambda function along with an ML framework that was used to train the model (MXNet). We don’t describe the image classification model training here, but the overall procedure to do this is as follows:

  1. Train the InceptionV3 network on MXNet using ILSVRC 2012 data. For details about the network architecture and the dataset, see Rethinking the Inception Architecture for Computer Vision.
  2. Load the trained model into a Lambda function (model.json and model.params files) and pop the final layer. We’re left with an output layer that doesn’t perform classification, but provides us with a feature vector of size 1024×1.
  3. Each time a frame is passed through the Lambda function, it outputs this feature vector into a data stream, via Amazon Kinesis Data Streams.
  4. Records from this stream are collected by Amazon Kinesis Data Firehose and delivered to another S3 bucket.
  5. An AWS Fargate job orders the files based on the original order in which the frames appear. Because we trigger 1,000 instances of this Lambda function in parallel, with one frame per function, the outputs can arrive slightly out of order. (You can also use SageMaker Processing instead of Fargate.) This gives us our final training data, which we use in our custom SageMaker video classification model to identify groups of frames as highlights. In our example, the highlights in our soccer video are penalty kicks.

The code for this Lambda function is as follows:

import logging
import boto3
import json
import numpy as np
import tempfile

logger = logging.getLogger()
logger.setLevel(logging.INFO)
region = '<REGION>'
relevant_timestamps = []

import mxnet as mx


def load_model(s_fname, p_fname):
    """
    Load model checkpoint from file.
    :return: (arg_params, aux_params)
    arg_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's weights.
    aux_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's auxiliary states.
    """
    symbol = mx.symbol.load(s_fname)
    save_dict = mx.nd.load(p_fname)
    arg_params = {}
    aux_params = {}
    for k, v in save_dict.items():
        tp, name = k.split(':', 1)
        if tp == 'arg':
            arg_params[name] = v
        if tp == 'aux':
            aux_params[name] = v
    return symbol, arg_params, aux_params

sym, arg_params, aux_params = load_model('model2.json', 'model2.params')

# load json and params into model
# mod = None

# We bind the module with the input shape and specify that it is only for predicting.
# The number 1 added before the image shape (3x224x224) means that we will only predict one image at a time.

# FULL MODEL
# mod = mx.mod.Module(symbol=sym, label_names=None)
# mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))], label_shapes=mod._label_shapes)
# mod.set_params(arg_params, aux_params, allow_missing=True)


from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

def lambda_handler(event, context):
    # PARTIAL MODEL: keep everything up to the global pooling layer to get a feature vector
    mod2 = None
    all_layers = sym.get_internals()
    print(all_layers.list_outputs()[-10:])
    sym2 = all_layers['global_pool_output']
    mod2 = mx.mod.Module(symbol=sym2, label_names=None)
    # mod2.bind(for_training=False, data_shapes=[('data', (1,3,224,224))], label_shapes=mod2._label_shapes)
    mod2.bind(for_training=False, data_shapes=[('data', (1, 3, 299, 299))])
    mod2.set_params(arg_params, aux_params)

    # Get image(s) from S3
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(event['bucketname'])
    object = bucket.Object(event['filename'])

    # img = mx.image.imread('image.jpg')

    tmp = tempfile.NamedTemporaryFile()
    with open(tmp.name, 'wb') as f:
        object.download_fileobj(f)
        img = mx.image.imread(tmp.name)
        # convert into format (batch, RGB, width, height)
        img = mx.image.imresize(img, 299, 299)  # resize
        img = img.transpose((2, 0, 1))  # Channel first
        img = img.expand_dims(axis=0)  # batchify

        mod2.forward(Batch([img]))
    out = np.squeeze(mod2.get_outputs()[0].asnumpy())

    kinesis_client = boto3.client('kinesis')
    put_response = kinesis_client.put_record(
        StreamName='bottleneck_stream',
        Data=json.dumps({'filename': event['filename'], 'features': out.tolist()}),
        PartitionKey="partitionkey")
    return 'Wrote features to kinesis stream'

Label the images for training the model

As mentioned earlier, we use the UCF101 action recognition dataset, which you can obtain from within a Jupyter notebook instance using the following command:

!wget http://crcv.ucf.edu/data/UCF101/UCF101.rar

We extract the same InceptionV3 feature vectors for all of the action recognition clips contained within the downloaded .rar file (it contains several examples of each of 101 different actions, including ones relevant for our soccer example, such as the soccer penalty and soccer juggling labels).
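The post extracts features with an MXNet InceptionV3 model inside a Lambda function; for preparing the UCF101 training features in a notebook, a roughly equivalent Keras sketch might look like the following. The file layout is hypothetical, and the pooled feature size of the stock Keras InceptionV3 may differ from the checkpoint used elsewhere in this post.

# Notebook-side sketch (not the post's MXNet Lambda): one feature vector per frame
# using a pretrained Keras InceptionV3 with global average pooling.
import glob
import numpy as np
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet", pooling="avg")

def frame_features(jpeg_path):
    img = tf.keras.preprocessing.image.load_img(jpeg_path, target_size=(299, 299))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.inception_v3.preprocess_input(x)
    return base.predict(x, verbose=0)[0]          # one fixed-length vector per frame

frame_paths = sorted(glob.glob("frames/v_SoccerPenalty_g01_c01/*.jpg"))  # hypothetical layout
features = np.stack([frame_features(p) for p in frame_paths])            # shape: (num_frames, feature_dim)
np.save("v_SoccerPenalty_g01_c01_features.npy", features)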

We construct a custom LSTM model in TensorFlow and use features extracted in the previous step to train the model. The LSTM model is structured as follows:

  • Layer 1 – 2048 LSTM cells
  • Layer 2 – 512 Dense cells
  • Layer 3 – Dropout layer (p=0.5)
  • Layer 4 – Softmax layer for 101 classes

Model.summary() provides the following summary:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 2048)              33562624  
_________________________________________________________________
dense_1 (Dense)              (None, 512)               1049088   
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 101)               51813     
=================================================================
Total params: 34,663,525
Trainable params: 34,663,525
Non-trainable params: 0
___________________________
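For reference, a minimal Keras sketch that reproduces this architecture is shown below. The sequence length and activation functions are assumptions (they do not affect the parameter counts); with a per-frame feature size of 2048, the parameter counts match the summary above.

import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN = 40        # frames per training sequence (assumed)
FEATURE_DIM = 2048  # length of the per-frame feature vector
NUM_CLASSES = 101   # UCF101 action labels

model = models.Sequential([
    layers.LSTM(2048, input_shape=(SEQ_LEN, FEATURE_DIM)),
    layers.Dense(512, activation="relu"),   # activation is an assumption
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()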

With a more relevant dataset, you should only include the classes you require in the classification task. Due to the nature of the problem selected, we only had access to this open dataset. However, for extracting highlights from custom videos, you can use your own labeled video datasets.

We save the model using the following code in the notebook:

trainedmodel.save('lstm_model.h5')

We create a container, with the following code and a Dockerfile, to host the model using SageMaker. The following is the container's Python training entry point, which simply loads the pretrained model:

#!/usr/bin/env python
from __future__ import print_function

import os
import sys
import traceback

import numpy as np
import pandas as pd
import tensorflow as tf
from keras.layers import Dropout, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.models import load_model

# Standard SageMaker location for training output (the failure file is read by SageMaker)
output_path = '/opt/ml/output'

def train():
    print('Starting the training.')
    try:
        model = load_model('lstm_model.h5')
        print('Model is loaded ... Training is complete.')
    except Exception as e:
        # Write out an error file. This will be returned as the failure Reason
        # in the DescribeTrainingJob result.
        trc = traceback.format_exc()
        with open(os.path.join(output_path, 'failure'), 'w') as s:
            s.write('Exception during training: ' + str(e) + '\n' + trc)
        # Printing this causes the exception to be in the training job logs
        print('Exception during training: ' + str(e) + '\n' + trc, file=sys.stderr)
        # A non-zero exit code causes the training job to be marked as Failed.
        sys.exit(255)

if __name__ == '__main__':
    train()
    # A zero exit code causes the job to be marked as Succeeded.
    sys.exit(0)

We containerize and push the Docker image to Amazon Elastic Container Registry (Amazon ECR):

%%sh

# The name of our algorithm
algorithm_name=kerassample14

cd container

chmod +x keras-model/train
chmod +x keras-model/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
echo $fullname

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Lastly, we host the model on SageMaker:

import sagemaker as sage
from sagemaker.predictor import csv_serializer

# sess and role are assumed to be defined earlier in the notebook
sess = sage.Session()
role = sage.get_execution_role()

account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = f'{account}.dkr.ecr.{region}.amazonaws.com/<containername>:latest'

classifier = sage.estimator.Estimator(
    image,
    role,
    1,
    'ml.c5.2xlarge',
    output_path="s3://{}/output".format(sess.default_bucket()),
    sagemaker_session=sess)

predictor = classifier.deploy(1, 'ml.m5.xlarge', serializer=csv_serializer)

SageMaker provides us with an endpoint to call for predictions, where the input is a set of feature vectors (for example, 10 frames correspond to a 10×1024 feature matrix) and the output is a probability distribution across the 101 UCF101 classes. We're only interested in the soccer penalty class. For the purposes of this blog, we use the UCF101 dataset, but for your own use cases, do take the time to research relevant action recognition datasets or pretrained models.
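As a sketch of what calling this endpoint looks like outside of the Fargate driver, the following uses the low-level Boto3 runtime API. The endpoint name, the CSV payload layout (one row of features per frame), the JSON response format, and the class index are assumptions; check the deployed predictor and your label ordering for the exact contract.

import io
import json
import boto3
import numpy as np

ENDPOINT_NAME = "<SAGEMAKER-ENDPOINT-NAME>"   # from classifier.deploy(...) above
runtime = boto3.client("sagemaker-runtime")

features = np.load("segment_features.npy")    # e.g. shape (10, feature_dim) for 10 frames
buf = io.StringIO()
np.savetxt(buf, features, delimiter=",", fmt="%.6f")

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="text/csv",
    Body=buf.getvalue(),
)
probs = json.loads(response["Body"].read())   # assumed: a list of 101 class probabilities
soccer_penalty_idx = 84                       # hypothetical index; look it up from your label ordering
print("P(soccer penalty) =", probs[soccer_penalty_idx])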

Extract highlights

In our architecture, the Fargate job calls the SageMaker endpoint sequentially with sets of feature vectors and stores the decision to pick up a set of frames or not in an Amazon DynamoDB table. When the Fargate job is complete, another Lambda function (see the following code) uses the DynamoDB table to edit a clipping job definition and submits it as a MediaConvert job. This MediaConvert job splits the original video into smaller sections where the desired class of action was identified; in our case, this was the soccer penalty kicks. These extracted videos are then made public for access from outside the account using a Boto3 command from within the same Lambda function.

import json
import boto3
import time
from boto3.dynamodb.conditions import Key, Attr
import math

s3_location = 's3://<ARTIFACT-BUCKET>/BYUfootballmatch.mp4'

def start_mediaconvert_job(data, sec_in, sec_out):

    client = boto3.client('mediaconvert')
    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']

    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)

    data['Settings']['Inputs'][0]['FileInput'] = s3_location

    starttime = time.strftime('%H:%M:%S:00', time.gmtime(sec_in))
    endtime = time.strftime('%H:%M:%S:00', time.gmtime(sec_out))

    data['Settings']['Inputs'][0]['InputClippings'][0] = {'EndTimecode': endtime, 'StartTimecode': starttime}

    data['Settings']['OutputGroups'][0]['Outputs'][0]['NameModifier'] = '-from-' + str(sec_in) + '-to-' + str(sec_out)

    response = myclient.create_job(
        Queue=data['Queue'],
        Role=data['Role'],
        Settings=data['Settings'])

def lambda_handler(event, context):

    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sports-lstm-final-output')
    response = table.scan()
    timeins = []
    timeouts = []
    for i in response['Items']:
        if i['pickup'] == 'yes':
            timeins.append(i['timein'])
            timeouts.append(i['timeout'])

    timeins = sorted([int(x) for x in timeins])
    timeouts = sorted([int(x) for x in timeouts])
    mintime = min(timeins)
    maxtime = max(timeouts)

    print('mintime=' + str(mintime))
    print('maxtime=' + str(maxtime))

    print(timeins)
    print(timeouts)
    mystarttime = mintime

    # find continuous ranges
    ranges = {}
    maxisofar = 0
    rangecount = 0
    lastnum = timeouts[0]
    for i in range(len(timeins) - 1):
        c = 0
        if timeouts[i] >= lastnum:
            for j in range(i, len(timeouts) - 1):
                if timeins[j + 1] - timeins[j] == 20 and timeouts[j] - timeins[i] == 40 + c * 20:
                    c = c + 1
                    continue
                if timeins[i + 1] - timeins[i] > 20 and timeouts[j] - timeins[i] == 40:
                    print('single frame', i, j)
                    ranges[rangecount] = {'start': timeins[i], 'end': timeouts[i], 'count': 1}
                    rangecount = rangecount + 1
                    lastnum = timeouts[i + 1]
                    continue

            if c > 0:
                ranges[rangecount] = {'start': timeins[i], 'end': timeouts[i + c], 'count': c}
                rangecount = rangecount + 1
                lastnum = timeouts[i + c + 1]

    print(lastnum)
    if lastnum == timeouts[-1]:
        # Last segment is a single frame
        ranges[rangecount] = {'start': timeins[-1], 'end': timeouts[-1], 'count': 1}

    print(ranges)
    # Find the longest continuous range
    maxc = 0
    maxi = 0
    for i in range(len(ranges)):
        if maxc < ranges[i]['count']:
            maxi = i
            maxc = ranges[i]['count']
    buffer = 1  # seconds

    with open('mediaconvert.json') as f:
        data = json.load(f)

    # Submit a clipping job for every range that was picked up
    for i in range(len(ranges)):
        if ranges[i]['count']:
            sec_in = math.floor(ranges[i]['start'] / 20.0) - buffer
            sec_out = math.ceil(ranges[i]['end'] / 20.0) + buffer  # 20:1 was the frame rate of the original video
            sec_in = 0 if sec_in < 0 else sec_in
            start_mediaconvert_job(data, sec_in, sec_out)
            time.sleep(1)

    print(ranges)
    return json.dumps({'bucket': 'elemental-media-input', 'prefix': 'High', 'postfix': 'mp4'})

Deployment Prerequisites

To deploy this solution, you will need to create an Amazon S3 bucket and designate it as the ARTIFACT-BUCKET. This bucket is used for storing the video file as well as the model artifacts. You can run the following AWS CLI command to create an Amazon S3 bucket:

aws s3api create-bucket --bucket <ARTIFACT-BUCKET>

Next, run the following command to copy the required artifacts to the artifact bucket:

aws s3 cp s3://aws-ml-blog/artifacts/sportshighlights/ s3://<ARTIFACT-BUCKET> --recursive --copy-props none

Deploy the solution using AWS CloudFormation

We provide an AWS CloudFormation template for creating resources and setting up the workflow for this post. AWS CloudFormation enables you to model, provision, and manage AWS resources by treating infrastructure as code.

The CloudFormation template requires you to provide an email address that is used for sending links to highlight clips at the end of the workflow. The stack sets up all of the resources needed for the workflow.

Follow these steps to deploy this in your own account:

  1. Choose Launch Stack:

  2. Enter a name for the stack.
  3. Enter an email address where you choose to receive notifications from the Step Functions workflow.
  4. For S4Bucket, enter the name of the ARTIFACT-BUCKET that you created earlier.
  5. Choose Next.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
  7. Choose Create stack.

After the stack is successfully created, you will receive an email with a request to subscribe to the topic. Choose Confirm subscription.

  8. On the AWS CloudFormation console, navigate to the Resources tab for the stack you created, and choose the hyperlink (Physical ID) for the MLStateMachine resource.

This should navigate you to the Step Functions console. Select ‘Start Execution’.

  9. Enter a name, keep the default input, and choose Start Execution.

You can monitor progress of the Step Functions execution by navigating to the ‘Graph Inspector’.

Wait for the Step Functions workflow to complete.

After the Step Functions execution completes, you should receive an email with links to the Amazon S3 files representing the highlight clips from the original video. Following best practices, we do not expose these highlight clips publicly. You can navigate to the Amazon S3 bucket you created, where you will find the clips in a folder named HighLightclips.

At the end of the process, you should see that the following input video:

generates the following output highlight clip of a penalty kick:

Clean up

To avoid incurring ongoing charges, clean up your infrastructure by deleting the stack from the AWS CloudFormation console.

Empty the artifact bucket that you created for this post, and then delete it, by running the following AWS CLI commands:

aws s3 rm s3://<ARTIFACT-BUCKET> --recursive 
aws s3api delete-bucket --bucket <ARTIFACT-BUCKET>

After this, navigate to AWS CloudFormation in the AWS Management Console, select the stack you created, and choose Delete.

Conclusion

In this post, we showed you how to use a custom SageMaker model to generate sports highlights from full-length sports videos. You can extend this solution to generate highlights containing slam dunks, touchdowns, home runs, or sixers from your favorite sports videos, or from other shows, movies, meetings, and any other content in a video format – as long as you have a pretrained model, or you train a model specific to your use case.

For more information about how to make preprocessing easier, check out Amazon SageMaker Processing.

For more information about how to fine-tune state-of-the-art action recognition models like PAN Resnet 101, TSM, and R2+1D BERT, or host them on SageMaker as endpoints, see Deploy a Model in Amazon SageMaker.


About the Authors

Shreyas Subramanian is an AI/ML specialist Solutions Architect, and helps customers by using Machine Learning to solve their business challenges on the AWS Cloud.

Mohit Mehta is a leader in the AWS Professional Services Organization with expertise in AI/ML and Big Data technologies. Mohit holds an M.S. in Computer Science, all 12 AWS certifications, an MBA from the College of William and Mary, and a GMP from the Michigan Ross School of Business.

Vikrant Kahlir is a Principal Architect on the Solutions Architecture team. He works with AWS strategic customers’ product and engineering teams to help them with technology solutions using AWS services for Managed Databases, AI/ML, HPC, Autonomous Computing, and IoT.


Monitor operational metrics for your Amazon Lex chatbot

Chatbots are increasingly becoming an important channel for companies to interact with their customers, employees, and partners. Amazon Lex allows you to build conversational interfaces into any application using voice and text. The Amazon Lex V2 console and APIs make it easier to build, deploy, and manage bots so that you can expedite building virtual agents, conversational IVR systems, self-service chatbots, or informational bots. Designing a bot and deploying it in production is only the beginning of the journey. You want to analyze the bot’s performance over time to gather insights that can help you adapt the bot to your customers’ needs. A deeper understanding of key metrics such as trending topics, top utterances, missed utterances, conversation flow patterns, and customer sentiment helps you enhance your bot to better engage with customers and improve their overall satisfaction. It then becomes crucial to have a conversational analytics dashboard to gain these insights from a single place.

In this post, we look at deploying an analytics dashboard solution for your Amazon Lex bot. The solution uses your Amazon Lex bot conversation logs to automatically generate metrics and visualizations. It creates an Amazon CloudWatch dashboard where you can track your chatbot performance, trends, and engagement insights.

Solution overview

The Amazon Lex V2 Analytics Dashboard Solution helps you monitor and visualize the performance and operational metrics of your Amazon Lex chatbot. You can use it to continuously analyze and improve the experience of end users interacting with your chatbot.

The solution includes the following features:

  • A common view of valuable chatbot insights, such as:
    • User and session activity (sentiment analysis, top-N sessions, text/speech modality)
    • Conversation statistics and aggregations (average session duration, messages per session, session heatmaps)
    • Conversation flow, trends, and history (intent path chart, intent per hour heatmaps)
    • Utterance history and performance (missed utterances, top-N utterances)
    • Slot and session attributes most frequently used values
  • Rich visualizations and widgets such as metrics charts, top-N lists, heatmaps, and utterance management
  • Serverless architecture using pay-per-use managed services that scale transparently
  • CloudWatch metrics that you can use to configure CloudWatch alarms
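For example, a hypothetical alarm on one of the published metrics could be created with Boto3 as follows. The namespace, metric name, and SNS topic below are placeholders; substitute the names actually emitted by the deployed solution.

import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="lex-missed-utterances-high",
    Namespace="<DASHBOARD-METRIC-NAMESPACE>",      # placeholder: namespace used by the deployed dashboard
    MetricName="<MISSED-UTTERANCE-METRIC-NAME>",   # placeholder: metric name published by the solution
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:<REGION>:<AWS ACCOUNT ID>:<ALERT-TOPIC>"],  # placeholder SNS topic
)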

Architecture

The solution uses a number of managed AWS services and features.

The following diagram illustrates the solution architecture.

The source code of this solution is available in the GitHub repository.

Additional resources

Several previous blog posts for Amazon Lex also explore monitoring and analytics dashboards.

This post was inspired by the concepts in those previous posts, but the current solution has been updated to work with Amazon Lex bots created from the V2 APIs. It also adds new capabilities such as CloudWatch custom widgets.

Enable conversation logs

Before you deploy the solution for your existing Amazon Lex bot (created using the V2 APIs), you should enable conversation logs. If your bot already has conversation logs enabled, you can skip this step.

We also provide the option to deploy the solution with an accompanying bot that has conversation logs enabled and a scheduled Lambda function to generate conversation logs. This is an alternative if you just want to test drive this solution without using an existing bot or configuring conversation logs yourself.

We first create a log group.

  1. On the CloudWatch console, in the navigation pane, choose Log groups.
  2. Choose Actions, then choose Create log group.
  3. Enter a name for the log group, then choose Create log group.

Now we can enable the conversation logs.

  1. On the Amazon Lex V2 console, from the list, choose your bot.
  2. On the left menu, choose Aliases.
  3. In the list of aliases, choose the alias for which you want to configure conversation logs.
  4. In the Conversation logs section, choose Manage conversation logs.
  5. For text logs, choose Enable.
  6. Enter the CloudWatch log group name that you created.
  7. Choose Save to start logging conversations.

If necessary, Amazon Lex updates your service role with permissions to access the log group.
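If you prefer to script this step, a sketch of the equivalent Boto3 call is shown below with placeholder identifiers. The request shape follows my understanding of the lexv2-models UpdateBotAlias API, so treat the field names as assumptions and confirm them in the Boto3 documentation; also note that the call replaces the alias configuration, so include any existing settings you want to keep.

import boto3

lex = boto3.client("lexv2-models")
lex.update_bot_alias(
    botId="<BOT-ID>",
    botAliasId="<BOT-ALIAS-ID>",
    botAliasName="<BOT-ALIAS-NAME>",
    botVersion="<BOT-VERSION>",
    conversationLogSettings={
        "textLogSettings": [
            {
                "enabled": True,
                "destination": {
                    "cloudWatch": {
                        # ARN of the log group created above
                        "cloudWatchLogGroupArn": "arn:aws:logs:<REGION>:<AWS ACCOUNT ID>:log-group:<LOG-GROUP-NAME>",
                        "logPrefix": "lex/",
                    }
                },
            }
        ]
    },
)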

The following screenshot shows the resulting conversation log configuration on the Amazon Lex console.
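If you prefer to script this configuration instead of using the console, the following is a minimal boto3 sketch of the same steps. The log group name, bot ID, and alias ID are placeholders, and the sketch assumes the alias already exists; it is an illustration of the Amazon Lex V2 and CloudWatch Logs APIs, not part of the solution itself.

```python
# Minimal sketch: create a log group and enable text conversation logs on a
# Lex V2 bot alias. All IDs and names below are placeholders.
import boto3

logs = boto3.client("logs")
lex = boto3.client("lexv2-models")

LOG_GROUP = "my-lex-conversation-logs"   # placeholder log group name
BOT_ID = "ABCDEFGHIJ"                    # placeholder bot ID
BOT_ALIAS_ID = "TSTALIASID"              # placeholder alias ID

# 1. Create the CloudWatch log group (skip this call if it already exists).
logs.create_log_group(logGroupName=LOG_GROUP)
log_group_arn = logs.describe_log_groups(logGroupNamePrefix=LOG_GROUP)[
    "logGroups"
][0]["arn"].removesuffix(":*")

# 2. Read the current alias configuration so required fields can be passed back.
alias = lex.describe_bot_alias(botId=BOT_ID, botAliasId=BOT_ALIAS_ID)

# 3. Enable text logs on the alias, pointing them at the log group.
lex.update_bot_alias(
    botId=BOT_ID,
    botAliasId=BOT_ALIAS_ID,
    botAliasName=alias["botAliasName"],
    botVersion=alias.get("botVersion", "DRAFT"),
    conversationLogSettings={
        "textLogSettings": [
            {
                "enabled": True,
                "destination": {
                    "cloudWatch": {
                        "cloudWatchLogGroupArn": log_group_arn,
                        "logPrefix": "lex/",
                    }
                },
            }
        ]
    },
)
```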

Deploy the solution

You can easily install this solution in your AWS account by launching it from the AWS Serverless Application Repository. At a minimum, you provide your bot ID, bot locale ID, and the conversation log group name when you deploy the dashboard. To deploy the solution, complete the following steps:

  1. Choose Launch Stack.

You’re redirected to the create application page on the Lambda console (this is a Serverless solution!).

  2. Scroll down to the Application Settings section and enter the parameters to point the dashboard to your existing bot:
    1. BotId – The ID of an existing Amazon Lex V2 bot that is going to be used with this dashboard. To get the ID of your bot, find your bot on the Amazon Lex console and look for the ID in the Bot details section.
    2. BotLocaleId – The bot locale ID associated with the bot ID for this dashboard, which defaults to en_US. To get the locales configured for your bot, choose View languages on the same page where you found the bot ID. Each dashboard creates metrics for a specific locale ID of a Lex bot. For more details on supported languages, see Supported languages and locales.
    3. LexConversationLogGroupName – The name of an existing CloudWatch log group containing the Amazon Lex conversation logs. The bot and locale must be configured to use this log group for their conversation logs.

Alternatively, if you just want to test drive the dashboard, this solution can deploy a fully functional sample bot. The sample bot comes with a Lambda function that is invoked every 2 minutes to generate conversation traffic. If you want to deploy the dashboard with the sample bot instead of using an existing bot, set the ShouldDeploySampleBots parameter to true. This is a quick and easy way to test the solution.

  3. After you set the desired values in the Application settings section, scroll down to the bottom of the page and select I acknowledge that this app creates custom IAM roles, resource policies and deploys nested applications.
  4. Choose Deploy to create the dashboard.

You’re redirected to the application overview page (it may take a moment).

  5. Choose the Deployments tab to watch the deployment status.
  6. Choose View stack events to go to the AWS CloudFormation console to see the deployment details.

The stack may take around 5 minutes to create. Wait until the stack status is CREATE_COMPLETE.

  7. When the stack creation is complete, you can look for a direct link to your dashboard on the Outputs tab of the stack (the DashboardConsoleLink output variable).

You may need to wait a few minutes for data to be reflected in the dashboard.
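If you want to script the deployment instead of using the Launch Stack button, the following boto3 sketch deploys the same application from the AWS Serverless Application Repository. The application ARN, stack name, and parameter values are placeholders; copy the real application ARN from the application's page in the Serverless Application Repository.

```python
# Minimal sketch: deploy the dashboard application from the Serverless
# Application Repository with boto3, then wait for the stack to finish.
import boto3

sar = boto3.client("serverlessrepo")
cfn = boto3.client("cloudformation")

APPLICATION_ID = (
    "arn:aws:serverlessrepo:us-east-1:123456789012:applications/placeholder-app"
)  # placeholder -- use the real application ARN

change_set = sar.create_cloud_formation_change_set(
    ApplicationId=APPLICATION_ID,
    StackName="lex-analytics-dashboard",  # CloudFormation prefixes this with serverlessrepo-
    Capabilities=[
        "CAPABILITY_IAM",
        "CAPABILITY_RESOURCE_POLICY",
        "CAPABILITY_AUTO_EXPAND",
    ],
    ParameterOverrides=[
        {"Name": "BotId", "Value": "ABCDEFGHIJ"},                      # placeholder bot ID
        {"Name": "BotLocaleId", "Value": "en_US"},
        {"Name": "LexConversationLogGroupName", "Value": "my-lex-conversation-logs"},
    ],
)

# Execute the change set to create the stack, then wait for completion.
cfn.execute_change_set(ChangeSetName=change_set["ChangeSetId"])
cfn.get_waiter("stack_create_complete").wait(StackName=change_set["StackId"])
```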

Use the solution

The dashboard provides a single pane of glass that allows you to monitor the performance of your Amazon Lex bot. The solution is built using CloudWatch features that are intended for monitoring and operational management purposes.

The dashboard displays widgets showing bot activity over a selectable time range. You can use the widgets to visualize trends, confirm that your bot is performing as expected, and optimize your bot configuration. For general information about using CloudWatch dashboards, see Using Amazon CloudWatch dashboards.

The dashboard contains widgets with metrics about end user interactions with your bot covering statistics of sessions, messages, sentiment, and intents. These statistics are useful to monitor activity and identify engagement trends. Your bot must have sentiment analysis enabled if you want to see sentiment metrics.

Additionally, the dashboard contains metrics for missed utterances (phrases that didn’t match the configured intents or slot values). You can expand entries in the Missed Utterance History widget to look for details of the state of the bot at the point of the missed utterance so that you can fine-tune your bot configuration. For example, you can look at the session attributes, context, and session ID of a missed utterance to better understand the related application state.

You can use the dashboard to monitor session duration, messages per session, and top session contributors.

You can track conversations with a widget listing the top messages (utterances) sent to your bot and a table containing a history of messages. You can expand each message in the Message History section to look at conversation state details when the message was sent.

You can visualize utilization with heatmap widgets that aggregate sessions and intents by day or time. You can hover your pointer over blocks to see the aggregation values.

You can look at a chart containing conversation paths aggregated by sessions. The thickness of the connecting path lines is proportional to the usage. Grey path lines show forward flows and red path lines show flows returning to a previously hit intent in the same session. You can hover your pointer over the end blocks to see the aggregated counts. The conversation path chart is useful to visualize the most common paths taken by your end users and to uncover unexpected flows.

The dashboard shows tables that aggregate the top slot and session attribute values. The session attributes and slots are dynamically extracted from the conversation logs. You can configure these widgets to exclude specific session attributes and slots by modifying the widget parameters. These tables are useful for identifying the top values provided to the bot in slots (data inputs) and for tracking the top custom application information kept in session attributes.

You can add missed utterances to intents of the Draft version of your bot with the Add Missed Utterances widget. For more information about bot versions, see Creating versions. This widget is optionally added to the dashboard if you set the ShouldAddWriteWidgets parameter to true when you deploy the solution.

CloudWatch features

This section describes the CloudWatch features used to create the dashboard widgets.

Custom metrics

The dashboard includes custom CloudWatch metrics that are created using metric filters that extract data from the bot conversation logs. These custom metrics track your bot activity, including the number of messages and missed utterances.

The metrics are collected under a custom namespace based on the bot ID and locale ID. The namespace is Lex/Activity/<BotId>/<LocaleId>, where <BotId> and <LocaleId> are the bot and locale IDs that you passed when creating the stack. To see these metrics on the CloudWatch console, navigate to the Metrics section and look for the namespace under Custom Namespaces.

Additionally, the metrics are categorized using dimensions based on bot characteristics, such as bot alias, bot version, and intent names. These dimensions are dynamically extracted from conversation logs so they automatically create metrics subcategories as your bot configuration changes over time.

You can use various CloudWatch capabilities with these custom metrics, including alarms and anomaly detection.
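For example, the following sketch lists the metrics published under the solution's namespace and puts an alarm on one of them. The metric name and threshold shown are assumptions for illustration; list the metrics in your account first to confirm the exact names the solution publishes.

```python
# Minimal sketch: inspect the solution's custom namespace and create an alarm.
import boto3

cloudwatch = boto3.client("cloudwatch")

BOT_ID, LOCALE_ID = "ABCDEFGHIJ", "en_US"          # placeholders
namespace = f"Lex/Activity/{BOT_ID}/{LOCALE_ID}"   # namespace pattern described above

# Inspect what the solution actually publishes under its namespace.
for metric in cloudwatch.list_metrics(Namespace=namespace)["Metrics"]:
    print(metric["MetricName"], metric["Dimensions"])

# Alarm when missed utterances spike (metric name and threshold are assumptions).
cloudwatch.put_metric_alarm(
    AlarmName="lex-missed-utterances-high",
    Namespace=namespace,
    MetricName="MissedUtterance",                  # assumed metric name -- verify above
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```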

Contributor Insights

Similar to the custom metrics, the solution creates CloudWatch Contributor Insights rules to track the unique contributors of highly variable data such as utterances and session IDs. The widgets in the dashboard using Contributor Insights rules include Top 10 Messages and Top 10 Sessions.

The Contributor Insights rules are used to create top-N metrics and dynamically create aggregation metrics from this highly variable data. You can use these metrics to identify outliers in the number of messages sent in a session and see which utterances are the most commonly used. You can download the top-N items from these widgets as a CSV file by choosing the widget menu and choosing Export contributors.
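To show what such a rule looks like, here is a minimal sketch of a Contributor Insights rule similar in spirit to the Top 10 Messages widget: it counts log events per utterance value in the conversation log group. The rule name, log group name, and the inputTranscript key are placeholders and assumptions about the conversation log schema; the rules the solution creates may differ.

```python
# Minimal sketch: a Contributor Insights rule that ranks utterances by count.
import json
import boto3

cloudwatch = boto3.client("cloudwatch")

rule_definition = {
    "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
    "LogGroupNames": ["my-lex-conversation-logs"],   # placeholder log group
    "LogFormat": "JSON",
    # Each distinct utterance becomes a "contributor"; events are counted per key.
    "Contribution": {"Keys": ["$.inputTranscript"], "Filters": []},
    "AggregateOn": "Count",
}

cloudwatch.put_insight_rule(
    RuleName="lex-top-utterances",                   # placeholder rule name
    RuleState="ENABLED",
    RuleDefinition=json.dumps(rule_definition),
)
```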

Logs Insights

The dashboard uses the CloudWatch Logs Insights feature to query conversation logs. Various widgets in the dashboard, including the Missed Utterance History and the Message History, use CloudWatch Logs Insights queries to generate their tables.

The CloudWatch Logs Insights widgets allow you to inspect the details of the items returned by the queries by choosing the arrow next to the item. Additionally, the Logs Insights widgets have a link that can take you to the CloudWatch Logs Insights console to edit and run the query used to generate the results. You can access this link by choosing the widget menu and choosing View in CloudWatch Logs Insights. The CloudWatch Logs Insights console also allows you to export the result of the query as a CSV file by choosing Export results.
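You can also run similar queries against the conversation logs outside the dashboard. The sketch below runs a Logs Insights query that lists the most frequent utterances; the log group name is a placeholder and the inputTranscript field is an assumption based on the Lex conversation log format, so adjust the query to your own log schema if it differs.

```python
# Minimal sketch: run a Logs Insights query over the conversation logs.
import time
import boto3

logs = boto3.client("logs")

QUERY = """
stats count(*) as messages by inputTranscript
| sort messages desc
| limit 10
"""

now = int(time.time())
query_id = logs.start_query(
    logGroupName="my-lex-conversation-logs",   # placeholder log group
    startTime=now - 24 * 3600,                 # last 24 hours
    endTime=now,
    queryString=QUERY,
)["queryId"]

# Poll until the query completes, then print the aggregated rows.
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})
```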

Custom widgets

The dashboard includes widgets that are rendered using the custom widgets feature. These widgets are powered by Lambda functions using Python or JavaScript code. The functions use the D3.js (JavaScript) or pandas (Python) libraries to render rich visualizations and perform complex data aggregations.

The Lambda functions query your bot conversation logs using the CloudWatch Logs Insights API. The functions then aggregate the data and output the HTML that is displayed in the dashboard. They obtain bot configuration details (such as intents and utterances) using the Amazon Lex V2 APIs, or extract them dynamically from the query results (such as slots and session attributes). The dashboard uses custom widgets for the heatmaps, the conversation path chart, the add-utterances management form, and the top-N slot and session attribute tables.
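As a rough illustration of the custom widget contract, the toy handler below shows the shape of such a function: CloudWatch invokes the Lambda function with the dashboard's widgetContext and renders whatever HTML the function returns. This is a minimal sketch, not the solution's own widget code, which additionally queries Logs Insights and builds richer visualizations with pandas or D3.js.

```python
# Minimal sketch of a CloudWatch custom widget Lambda handler (Python).
import json

DOCS = "A toy custom widget that echoes the dashboard time range."

def lambda_handler(event, context):
    # CloudWatch sends describe:true when it wants the widget's documentation.
    if event.get("describe"):
        return DOCS

    # widgetContext carries the dashboard name, widget parameters, and time range.
    widget_context = event.get("widgetContext", {})
    time_range = widget_context.get("timeRange", {})

    # Return an HTML fragment; CloudWatch renders it inside the widget.
    return (
        "<h3>Bot activity</h3>"
        f"<p>Dashboard: {widget_context.get('dashboardName', 'n/a')}</p>"
        f"<pre>{json.dumps(time_range, indent=2)}</pre>"
    )
```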

Cost

The Amazon Lex V2 Analytics Dashboard Solution is based on CloudWatch features. See Amazon CloudWatch pricing for cost details.

Clean up

To clean up your resources, you can delete the CloudFormation stack. This permanently removes the dashboard and metrics from your account. For more information, see Deleting a stack on the AWS CloudFormation console.

Summary

In this post, we showed you a solution that provides insights into how users interact with your Amazon Lex chatbot. The solution uses CloudWatch features to create metrics and visualizations that you can use to improve your bot’s user experience.

The Amazon Lex V2 Analytics Dashboard Solution is provided as open source—use it as a starting point for your own solution, and help us make it better by contributing back fixes and features. For expert assistance, AWS Professional Services and other AWS Partners are here to help.

We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the GitHub repository.


About the Author

Oliver Atoa is a Principal Solutions Architect in the AWS Language AI Services team.

Read More

Dexterous robotic hands manipulate thousands of objects with ease

At just one year old, a baby is more dexterous than a robot. Sure, machines can do more than just pick up and put down objects, but we’re not quite there when it comes to replicating a natural pull toward exploratory or sophisticated dexterous manipulation.

Artificial intelligence firm OpenAI gave it a try with Dactyl (meaning “finger,” from the Greek word “daktylos”), using their humanoid robot hand to solve a Rubik’s cube with software that’s a step toward more general AI, and a step away from the common single-task mentality. DeepMind created “RGB-Stacking,” a vision-based system that challenges a robot to learn how to grab items and stack them. 

Scientists from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), in the ever-present quest to get machines to replicate human abilities, created a framework that’s more scaled up: a system that can reorient over 2,000 different objects, with the robotic hand facing both upwards and downwards. This ability to manipulate anything from a cup to a tuna can to a Cheez-It box could help the hand quickly pick-and-place objects in specific ways and locations — and even generalize to unseen objects. 

This deft “handiwork” — which is usually limited to single tasks and upright positions — could be an asset in speeding up logistics and manufacturing, helping with common demands such as packing objects into slots for kitting, or dexterously manipulating a wider range of tools. The team used a simulated, anthropomorphic hand with 24 degrees of freedom, and showed evidence that the system could be transferred to a real robotic system in the future. 

“In industry, a parallel-jaw gripper is most commonly used, partially due to its simplicity in control, but it’s physically unable to handle many tools we see in daily life,” says MIT CSAIL PhD student Tao Chen, member of the MIT Improbable AI Lab and the lead researcher on the project. “Even using a plier is difficult because it can’t dexterously move one handle back and forth. Our system will allow a multi-fingered hand to dexterously manipulate such tools, which opens up a new area for robotics applications.”

This type of “in-hand” object reorientation has been a challenging problem in robotics, due to the large number of motors to be controlled and the frequent change in contact state between the fingers and the objects. And with over 2,000 objects, the model had a lot to learn. 

The problem becomes even trickier when the hand is facing downwards. Not only does the robot need to manipulate the object, it must also counteract gravity so the object doesn’t fall.

The team found that a simple approach could solve complex problems. They used a model-free reinforcement learning algorithm (meaning the system learns from interactions with the environment rather than from a learned model of its dynamics) with deep learning, and something called a “teacher-student” training method.

For this to work, the “teacher” network is trained on information about the object and robot that’s easily available in simulation but not in the real world, such as the location of fingertips or object velocity. To ensure that the robot can work outside of the simulation, the knowledge of the “teacher” is distilled into a “student” policy that relies only on observations that can be acquired in the real world, such as depth images captured by cameras, object pose, and the robot’s joint positions. They also used a “gravity curriculum,” where the robot first learns the skill in a zero-gravity environment and then slowly adapts the controller to the normal gravity condition, which, taken at this gradual pace, significantly improved the overall performance.
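To make the idea concrete, here is a schematic PyTorch sketch (not the authors' code) of teacher-student distillation with a gravity curriculum. The network sizes, observation dimensions, and random tensors standing in for simulator rollouts are all placeholders for illustration.

```python
# Schematic sketch: distill a privileged teacher policy into a deployable student.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 24))  # 24-DoF hand actions
student = nn.Sequential(nn.Linear(96, 128), nn.ReLU(), nn.Linear(128, 24))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    # Gravity curriculum: ramp gravity from 0 to -9.8 m/s^2 over training; a real
    # setup would feed this value to the simulator when collecting rollouts.
    gravity = -9.8 * min(1.0, step / 500)

    # Placeholder batches standing in for simulator observations.
    privileged_obs = torch.randn(32, 64)   # e.g. object velocity, fingertip positions
    student_obs = torch.randn(32, 96)      # e.g. depth features, object pose, joint angles

    with torch.no_grad():
        teacher_actions = teacher(privileged_obs)   # teacher trained beforehand with RL

    # Distillation: the student imitates the teacher from deployable observations.
    loss = nn.functional.mse_loss(student(student_obs), teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```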

While seemingly counterintuitive, a single controller (which serves as the brain of the robot) could reorient a large number of objects it had never seen before, with no knowledge of their shape.

“We initially thought that visual perception algorithms for inferring shape while the robot manipulates the object was going to be the primary challenge,” says MIT Professor Pulkit Agrawal, an author on the paper about the research. “To the contrary, our results show that one can learn robust control strategies that are shape-agnostic. This suggests that visual perception may be far less important for manipulation than what we are used to thinking, and simpler perceptual processing strategies might suffice.” 

Many small, round objects (apples, tennis balls, marbles) had close to 100 percent success rates when reoriented with the hand facing up and down, while the lowest success rates, unsurprisingly, were for more complex objects like a spoon, a screwdriver, or scissors, at closer to 30 percent.

Beyond bringing the system out into the wild, and because success rates varied with object shape, the team notes that training the model on object shape information could improve performance in the future.

Chen wrote a paper about the research alongside MIT CSAIL PhD student Jie Xu and MIT Professor Pulkit Agrawal. The research is funded by the Toyota Research Institute, an Amazon Research Award, and the DARPA Machine Common Sense Program. It will be presented at the 2021 Conference on Robot Learning (CoRL).

Read More

AWS and NVIDIA launch “Hands-on Machine Learning with Amazon SageMaker and NVIDIA GPUs” on Coursera

AWS and NVIDIA are excited to announce the new Hands-on Machine Learning with Amazon SageMaker and NVIDIA GPUs course. The course has four parts, and is designed to help machine learning (ML) enthusiasts quickly learn how to perform modern ML in the AWS Cloud. Sign up for the course today on Coursera.

Machine learning can be complex, tedious, and time-consuming. AWS and NVIDIA provide the fastest, most effective, and easy-to-use ML tools to jump-start your ML project. This course is designed for ML practitioners, including data scientists and developers, who have a working knowledge of ML workflows. In this course, you gain hands-on experience with Amazon SageMaker and Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs.

Course overview

This course helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML within Amazon SageMaker. EC2 instances powered by NVIDIA GPUs offer the highest-performing GPU-based training instances in the cloud for efficient model training and cost-effective model inference hosting. The course includes hands-on labs and quizzes developed specifically for it and hosted by AWS Partner Vocareum.

You’re first given a high-level overview of modern machine learning. Then, in the labs, you dive right in and get up and running with a GPU-powered SageMaker instance. You will learn how to prepare your dataset for model training using GPU-accelerated data prep with the RAPIDS library, how to build a GPU-accelerated tree-based model, how to train this model, and how to deploy and optimize the model for GPU-powered inference. You will also get hands-on experience building, training, and deploying deep learning models for computer vision (CV) and natural language processing (NLP) use cases. After completing this course, you will have the knowledge to build, train, deploy, and optimize ML workflows with GPU acceleration in SageMaker, and you will understand the key SageMaker services applicable to tabular, computer vision, and language ML tasks.

In the first module, you learn the basics of Amazon SageMaker, GPUs in the cloud, and how to spin up an Amazon SageMaker notebook instance. Then you get a tour of Amazon SageMaker Studio, the first fully integrated development environment (IDE) for machine learning, which gives you access to all the capabilities of Amazon SageMaker. This is followed by an introduction to the NVIDIA GPU Cloud (NGC) Catalog and how it can help you simplify and accelerate ML workflows.

In the second module, you build on the knowledge from module 1 and discover how to handle large datasets and build ML models with the NVIDIA RAPIDS framework. In the hands-on lab, you download the Airline Service Quality Performance dataset, and run GPU-accelerated data prep, model training, and model deployment.
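As a flavor of that workflow (this is an illustrative sketch, not the course lab code), the snippet below loads a CSV with RAPIDS cuDF and trains a tree-based model on the GPU with XGBoost. The file path and column names are placeholders.

```python
# Illustrative sketch: GPU data prep with cuDF, GPU tree training with XGBoost.
import cudf
import xgboost as xgb

# GPU-accelerated data prep: read and clean the data without leaving the GPU.
gdf = cudf.read_csv("airline_performance.csv")          # placeholder path
gdf = gdf.dropna()
label = gdf["ArrDelayBinary"]                           # placeholder label column
features = gdf.drop(columns=["ArrDelayBinary"])

# DMatrix accepts cuDF DataFrames directly, so the data stays on the GPU.
dtrain = xgb.DMatrix(features, label=label)
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",       # GPU-accelerated tree construction
    "max_depth": 8,
}
booster = xgb.train(params, dtrain, num_boost_round=100)
booster.save_model("airline_model.json")                # ready to host for inference
```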

In the third module, you gain a brief history of how computer vision (CV) has evolved, learn how to work with image data, and learn how to build end-to-end CV applications using Amazon SageMaker. In the hands-on lab, you download the CUB_200 dataset, and then train and deploy an object detection model on SageMaker.

In the fourth module, you learn about the application of deep learning for natural language processing (NLP). What does it mean to understand languages? What is language modeling? What is the BERT language model, and why are such language models used in many popular services like search, office productivity software, and voice agents? Are NVIDIA GPUs the fastest and the most cost-efficient platform to train and deploy NLP models? In the hands-on lab, you download the SQuAD dataset, and then train and deploy a BERT-based question answering model.

Enroll today

Hands-on Machine Learning with Amazon SageMaker and NVIDIA GPUs is a great way to acquire the toolset needed for modern ML in the cloud. With this course, you can move projects from conceptual phases to production faster by leaving the undifferentiated heavy lifting of building infrastructure to AWS and NVIDIA GPUs, and apply your newfound knowledge to solve new challenges with AI and ML.

Improve your ML skills in the cloud, and start applying them to your own business challenges by enrolling today at Coursera!


About the Authors

Pavan Kumar Sunder is a Solutions Architect Leader with the Envision Engineering team at Amazon Web Services. He provides technical guidance and helps customers accelerate their ability to innovate through showing the art of the possible on AWS. He has built multiple prototypes and reusable solutions around AI/ML, IoT, and robotics for our customers.

Isaac Privitera is a Senior Data Scientist at the Amazon Machine Learning Solutions Lab, where he develops bespoke machine learning and deep learning solutions to address customers’ business problems. He works primarily in the computer vision space, focusing on enabling AWS customers with distributed training and active learning.

Cameron Peron is Senior Marketing Manager for AWS AI/ML Education and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport and spending time with his family and friends, and is an avid fan of Euro-league basketball.

Read More

NVIDIA GTC Sees Spike in Developers From Africa

The appetite for AI and data science is increasing, and nowhere is that more prevalent than in emerging markets.

Registrations for this week’s GTC from African nations tripled compared with the spring edition of the event.

Indeed, Nigeria had the third-most registered attendees among countries in the EMEA region, ahead of France, Italy and Spain. Five other African nations were among the region’s top 15 for registrants: Egypt (No. 6), Tunisia (No. 7), Ghana (No. 9), South Africa (No. 11) and Kenya (No. 12).

The numbers demonstrate growing interest among Africa-based developers in accessing content, information and expertise centered on AI, data science, robotics and high performance computing. Developers are using these technologies as a platform to create innovative applications that address local challenges, such as healthcare and climate change.

Global Conference, Localized Content

Among the speakers at GTC were several members of NVIDIA Emerging Chapters, a new program that enables local communities in emerging economies to build and scale their AI, data science and graphics projects. Such highly localized content empowers developers from these areas and raises awareness of their unique challenges and needs.

For example, tinyML Kenya, a community of machine learning researchers and practitioners, spoke on machine learning as a force for good in emerging markets and its impact on healthcare, education, conservation and climate change. Zindi, Africa’s first data science competition platform, participated in a session about bridging the AI education gap among developers, IT professionals and students on the continent.

Multiple African organizations and universities also spoke at GTC about how developers in the region and emerging markets are using AI to build innovations that address local challenges. Among them were Kenya Adanian Labs, Cadi Ayyad University of Morocco, Data Science Africa, Python Ghana, and Nairobi Women in Machine Learning & Data Science.

Several Africa-based members of NVIDIA Inception, a free program designed to empower cutting-edge startups, spoke about the AI revolution underway on the continent and in other emerging areas. Cyst.ai, minoHealth, Fastagger and ARMA were among the 70+ Inception startups that presented at the conference.

AI was not the only innovation topic for local developers. Top African gaming and animation companies Usiku Games, Leti Arts, NETINFO 3D and HeroSmashers TV also joined the party to discuss how the continent’s burgeoning gaming industry continues to thrive, and the tools game developers need to succeed in a part of the world where access to compute resources is often limited.

Engaging Developers Everywhere

While AI developers and startup founders come from all over the world, developers in emerging areas face unique circumstances and opportunities. This means global representation and localized access become even more important to bolster developer ecosystems in emerging markets.

Through NVIDIA Emerging Chapters, grassroots organizations and communities can provide developers access to the NVIDIA Developer Program and course credits for the NVIDIA Deep Learning Institute, helping bridge new paths to AI development in the region.

Learn more about AI in emerging markets today.

Watch NVIDIA CEO Jensen Huang’s GTC keynote address:

The post NVIDIA GTC Sees Spike in Developers From Africa appeared first on The Official NVIDIA Blog.

Read More

NVIDIA to Build Earth-2 Supercomputer to See Our Future

The earth is warming. The past seven years are on track to be the seven warmest on record. The emissions of greenhouse gases from human activities are responsible for approximately 1.1°C of average warming since the period 1850-1900.

What we’re experiencing is very different from the global average. We experience extreme weather — historic droughts, unprecedented heatwaves, intense hurricanes, violent storms and catastrophic floods. Climate disasters are the new norm.

We need to confront climate change now. Yet, we won’t feel the impact of our efforts for decades. It’s hard to mobilize action for something so far in the future. But we must know our future today — see it and feel it — so we can act with urgency.

To make our future a reality today, simulation is the answer.

To develop the best strategies for mitigation and adaptation, we need climate models that can predict the climate in different regions of the globe over decades.

Unlike predicting the weather, which primarily models atmospheric physics, climate models are multidecade simulations that model the physics, chemistry and biology of the atmosphere, waters, ice, land and human activities.

Climate simulations are configured today at 10- to 100-kilometer resolutions.

But greater resolution is needed to model changes in the global water cycle — water movement from the ocean, sea ice, land surface and groundwater through the atmosphere and clouds. Changes in this system lead to intensifying storms and droughts.

Meter-scale resolution is needed to simulate clouds that reflect sunlight back to space. Scientists estimate that these resolutions will demand millions to billions of times more computing power than what’s currently available. It would take decades to achieve that through the ordinary course of computing advances, which accelerate 10x every five years.
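A rough back-of-the-envelope calculation, using only the figures above, makes that timescale concrete: if conventional computing improves by a factor of 10 every five years, closing a speedup gap of $S$ takes roughly

$$ t \approx 5\,\log_{10} S \ \text{years}, \qquad t\!\left(10^{6}\right) \approx 30 \ \text{years}, \qquad t\!\left(10^{9}\right) \approx 45 \ \text{years}. $$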

For the first time, we have the technology to do ultra-high-resolution climate modeling, to jump to lightspeed and predict changes in regional extreme weather decades out.

We can achieve million-x speedups by combining three technologies: GPU-accelerated computing; deep learning and breakthroughs in physics-informed neural networks; and AI supercomputers, along with vast quantities of observed and model data to learn from.

And with super-resolution techniques, we may have within our grasp the billion-x leap needed to do ultra-high-resolution climate modeling. Countries, cities and towns can get early warnings to adapt and make infrastructures more resilient. And with more accurate predictions, people and nations will act with more urgency.

So, we will dedicate ourselves and our significant resources to direct NVIDIA’s scale and expertise in computational sciences, to join with the world’s climate science community.

NVIDIA this week revealed plans to build the world’s most powerful AI supercomputer dedicated to predicting climate change. Named Earth-2, or E-2, the system would create a digital twin of Earth in Omniverse.

The system would be the climate change counterpart to Cambridge-1, the world’s most powerful AI supercomputer for healthcare research. We unveiled Cambridge-1 earlier this year in the U.K. and it’s being used by a number of leading healthcare companies.

All the technologies we’ve invented up to this moment are needed to make Earth-2 possible. I can’t imagine a greater or more important use.

The post NVIDIA to Build Earth-2 Supercomputer to See Our Future appeared first on The Official NVIDIA Blog.

Read More