Integrate Amazon Lex and Uneeq’s digital human platform

In today’s digital landscape, customers are expecting a high-quality experience that is responsive and delightful. Chatbots and virtual assistants have transformed the customer experience from a point-and-click or a drag-and-drop experience to one that is driven by voice or text. You can create a more engaging experience by further augmenting the interaction with a visual modality.

Uneeq is an AWS Partner that specializes in developing animated visualizations of these voice bots and virtual agents, called digital humans. Uneeq’s digital humans can help provide a next-generation customer experience that is visual, animated, and emotional. Having worked with brands across numerous verticals such as UBS (financial services), Vodafone (telecommunications), and Mentemia (healthcare), Uneeq helps customers enable innovative customer experiences powered by Amazon Lex.

Amazon Lex is a service for building conversational interfaces into any application using voice and text. Amazon Lex provides natural language understanding (NLU) and automatic speech recognition (ASR), enabling customer experiences that are highly engaging through conversational interactions.

In this post, we guide you through the steps required to configure an Amazon Lex V2 chatbot, connect it to Uneeq’s digital human, and manage a conversation.

Overview of solution

This solution uses the following services:

  • Amazon Lex
  • Amazon API Gateway
  • AWS Lambda

The following diagram illustrates the architecture of our solution.

The architecture uses AWS serverless resources for ease of deployment and to minimize the run costs associated with deploying the solution.

The Uneeq digital human interfaces with a simple REST API, configured with Lambda proxy integration, which in turn interacts with a deployed Amazon Lex bot.
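The integration’s Lambda function (deployed later with AWS SAM) is Node.js-based; purely as an illustrative sketch of the kind of call it makes, the following Python handler forwards a user utterance to the Lex V2 runtime with boto3. The request and response body fields (text, sessionId, answer) and the environment variable names are assumptions for this sketch, not the repository’s actual contract with Uneeq.

    import json
    import os

    import boto3

    # Environment variable names are assumptions for this sketch; the deployed
    # function receives the bot details from the SAM parameters.
    lex = boto3.client("lexv2-runtime")
    BOT_ID = os.environ.get("BOT_ID", "REPLACE_ME")
    BOT_ALIAS_ID = os.environ.get("BOT_ALIAS_ID", "REPLACE_ME")
    LOCALE_ID = os.environ.get("LOCALE_ID", "en_AU")

    def handler(event, context):
        """Proxy a user utterance from the REST API to the Lex V2 bot."""
        body = json.loads(event.get("body") or "{}")
        text = body.get("text", "")
        session_id = body.get("sessionId", "demo-session")

        response = lex.recognize_text(
            botId=BOT_ID,
            botAliasId=BOT_ALIAS_ID,
            localeId=LOCALE_ID,
            sessionId=session_id,
            text=text,
        )

        # Join any messages the bot returned so the digital human can speak them.
        answer = " ".join(m.get("content", "") for m in response.get("messages", []))
        return {"statusCode": 200, "body": json.dumps({"answer": answer})}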

After you deploy the bot, you need to configure it with a basic Welcome intent. In the first interaction with Uneeq’s digital human, the Welcome intent determines the initial phrase the Uneeq digital human gives. For example, “Hi, my name is Crissy and I am your digital assistant today. How can I help you?”

You deploy the solution with three high-level steps:

  1. Deploy an Amazon Lex bot.
  2. Deploy the integration, which is a simple API Gateway REST API and Lambda function, using AWS Serverless Application Model (AWS SAM).
  3. Create a Uneeq 14-day free trial account and connect Uneeq’s digital human to the Amazon Lex bot.

Prerequisites

To implement this solution, you need the following prerequisites:

These instructions assume a general working knowledge of the listed AWS services, particularly AWS SAM and AWS CloudFormation.

Deploy an Amazon Lex Bot

For this solution, we use the BookTrip sample bot that is provided in Amazon Lex.

  1. On the Amazon Lex v2 console, choose Bots in the navigation pane.
  2. Choose Create bot.
  3. Select Start with an example.
  4. For Example bot, choose BookTrip.
  5. In the Bot configuration section, enter a bot name and optional description.
  6. Under IAM permissions, select Create a role with basic Amazon Lex permissions.
  7. Because this is a bot for demo purposes, it’s not subject to COPPA, so in the Children’s Online Privacy Protection Act (COPPA) section, select No.
  8. Leave the remainder of the settings as default and choose Next.
  9. Choose your preferred language and voice, which is provided by Amazon Polly.
  10. Choose Done to create your bot.

Edit the BookTrip bot welcome intent

When first initiated, Uneeq’s digital human utters dialog to introduce itself based on a welcome intent defined in the Amazon Lex bot.

  1. To add the welcome intent, browse to the intents for the BookTrip bot just created and create a new intent called Welcome by choosing Add intent.
  2. To configure the welcome intent, in the Closing Response section, enter the initial phrase that you want Uneeq’s digital human to utter. For this post, we use “Hi, my name is Crissy and I am your digital assistant today. How can I help you?”

This is the only configuration required for this intent.

  1. Choose Save intent.
  2. Choose Build to build the bot with the Welcome intent.
  3. Record the bot ID, alias ID, locale ID, and Welcome intent name to use in the next step to deploy the integration.

Deploy the integration using AWS SAM

Browse to the GitHub repo and clone the lexV2 branch. The template.yaml file is the AWS SAM configuration for the application; the swagger.yaml is the OpenAPI configuration for the API.

  1. Deploy this application by following the instructions in the README file.
  2. Make sure your AWS Command Line Interface (AWS CLI) configuration can access an AWS account.
  3. Browse to the root of the cloned repository and install the required dependencies by running the following command:
    cd function && npm install && cd ..

  4. Prior to running the deploy command, upload the swagger.yaml file to an S3 bucket.
  5. Deploy the serverless application by running the following command from the root of the repository, and assign values to the listed parameters:
      1. pLexBotID
      2. pLexBotAliasID
      3. pWelcomeIntentName
      4. pLocaleID
      5. pS3BucketName
    sam deploy --template-file template.yml --s3-bucket %S3BUCKETNAME% --stack-name %STACKNAME% --parameter-overrides pLexBotID=%LexV2BotID% pLexBotAliasID=%AliasID% pWelcomeIntentName=Welcome pLocaleID=en_AU pS3BucketName=%S3BucketName% --capabilities CAPABILITY_NAMED_IAM

  6. Confirm the deployment has been successful by reviewing the output of the AWS SAM deployment.
  7. Take note of the API endpoint URL; you use this for configuring Uneeq’s digital human.

Create a Uneeq trial account and configure Uneeq’s digital human

Let’s start by creating a 14-day free trial account on the Uneeq website.

  1. On the Uneeq website, choose Free Trial.
  2. Enter the required details and verify your email address via a unique code that is sent to the provided email address.
  3. Choose a Uneeq digital human from the three provided to you as part of the free trial.

Uneeq has multiple personas available, but some require a paid subscription.

  1. Choose a background for Uneeq’s digital human.
  2. Enter a name for Uneeq’s digital human.
  3. Choose your preferred language and voice for Uneeq’s digital human.

You can choose Test Voice to hear an example of the voice.

  1. After you create Uneeq’s digital human, browse to the Uneeq dashboard and choose Personas.
  2. Choose the edit icon for Uneeq’s digital human you just created.
  3. In the Conversation settings section, choose Bring Your Own Conversation Platform.
  4. For API URL, enter the URL of your deployed API.
  5. Return to the Personas page and choose Try to start Uneeq’s digital human.

Uneeq’s digital human begins the interaction by uttering the dialog configured in your welcome intent.

For a demonstration of Uneeq’s digital human and Amazon Lex integration, watch Integrating Digital Humans with AWS Lambda – Devs in the Shed Episode 16.

Conclusion

In this post, I implemented a solution that integrates Amazon Lex with Uneeq’s digital human, enhancing the user experience with a visual modality. You can use this solution for multiple use cases by simply configuring it to point to a different Amazon Lex bot.

It’s easy to get started. Sign up for a free trial account with Uneeq’s digital human, and clone the GitHub repo to get started enhancing your customers’ interactions with your business. For more information about Amazon Lex, see Getting started with Amazon Lex and the V2 Developer Guide.


About the Author

Barry Conway is an Enterprise Solutions Architect with years of experience in the technology industry bridging the gap between business and technology. Barry has helped banking, manufacturing, logistics, and retail organizations realize their business goals.

Read More

Easily create and store features in Amazon SageMaker without code

Data scientists and machine learning (ML) engineers often prepare their data before building ML models. Data preparation typically includes data preprocessing and feature engineering. You preprocess data by transforming data into the right shape and quality for training, and you engineer features by selecting, transforming, and creating variables when building a predictive model.

Amazon SageMaker helps you perform these tasks by simplifying feature preparation with Amazon SageMaker Data Wrangler and storage and feature serving with Amazon SageMaker Feature Store. You can prepare your data and engineer features using over 300 built-in transformations with Data Wrangler. Then you can persist those features to a purpose-built feature store for ML with Feature Store. These services help you build automatic and repeatable processes to streamline your data preparation tasks, all without writing code.

We’re excited to announce a new capability that seamlessly integrates Data Wrangler with Feature Store. You can now easily create features with Data Wrangler and store those features in Feature Store with just a few clicks in Amazon SageMaker Studio.

In this post, we demonstrate creating features with Data Wrangler and persisting them in Feature Store using the hotel booking demand dataset. We focus on the data preparation and feature engineering tasks to show how easily you can create and store features in SageMaker without code using Data Wrangler. After the features are stored, they can be used for training and inference by multiple models and teams.

Solution overview

To demonstrate feature engineering and feature storage, we use a hotel booking demand dataset. You can download the dataset and view the full description of each variable. The dataset contains information such as when a hotel booking was made, the booking location, the length of stay, the number of parking spaces, and other features.

Our goal is to engineer features to predict if a user will cancel a booking.

We host the dataset in an Amazon Simple Storage Service (Amazon S3) bucket. We also use a Studio domain to access the native Data Wrangler and Feature Store capabilities. We import the dataset into a Data Wrangler flow and define the data transformation steps we want to apply using the Data Wrangler user interface (UI). We then have SageMaker run our feature engineering steps and store the features in Feature Store.

The following diagram illustrates the solution workflow.

To demonstrate Data Wrangler’s feature engineering steps, we assume we’ve already conducted exploratory data analysis (EDA). EDA helps you understand your data by identifying patterns in it. For example, we might find that customers who book resort hotels tend to stay longer than those who book city hotels, or that customers who stay over the weekend purchase more meals. Because these patterns aren’t evident from data in tables, data scientists use visualization tools to help identify them. EDA is often a necessary step to determine which features to create, delete, and transform.

If you already have features ready to export to Feature Store, you can navigate to the Save features to Feature Store section to learn how you can easily save your prepared features to Feature Store.

Prerequisites

If you want to follow along with this post, you should have the following prerequisites:

Create features with Data Wrangler

To create features with Data Wrangler, complete the following steps:

  1. Enter your Studio domain.
  2. Choose Data Wrangler as your resource to view.
  3. Choose New flow.
  4. Choose Import and import your data.

You can see a preview of the data in the Data Wrangler UI when selecting your dataset. You can also choose a sampling method. Because our dataset is relatively small, we choose not to sample our data. The flow editor now shows two steps in the UI, representing the step you took to import the data and a data validation step Data Wrangler automatically completes for you.

  1. Choose the plus sign next to Data types and choose Add transform.

Assuming we’ve spent time in EDA, we can remove redundant columns that contribute to target leakage. Target leakage occurs when some data in a training dataset is strongly correlated with the target label, but isn’t available in real-world data. After we conduct a target leakage analysis, we determine we should drop redundant columns. Data Wrangler helped identify 10 columns to drop.

  1. Add a step and choose the Drop column transform step.

Additionally, we determine we can remove columns like agent and adults after a multicollinearity analysis. Multicollinearity is the presence of high correlations between two or more independent variables. We usually want to avoid variables that are highly correlated with each other because they can lead to misleading and inaccurate models.
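Data Wrangler surfaces this kind of analysis in its UI. Purely for intuition, a pairwise correlation check could be sketched in pandas as follows; the file name and the 0.8 threshold are arbitrary choices for illustration, not Data Wrangler’s internals.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("hotel_bookings.csv")  # assumed local copy of the dataset

    # Absolute pairwise correlations between numeric columns.
    corr = df.select_dtypes(include="number").corr().abs()

    # Keep only the upper triangle so each pair appears once, then flag pairs
    # above an arbitrary threshold of 0.8.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    print(upper.stack().loc[lambda s: s > 0.8])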

We also want to drop duplicate rows. In our case, nearly 28% of all rows in our dataset are duplicates. Because duplicates may have undesirable effects on our model, we use a transform to remove them.

  1. Add a new transform and choose Manage rows from the list of available transforms.
  2. Choose Drop duplicates on the Transform drop-down menu.

Next, we want to handle missing values. We find that many hotel guests didn’t travel with children, and have a blank value for the children column. We can replace this blank value with 0.

  1. Choose Handle missing as the transform step and Fill missing as the transform type.
  2. Add a transform to fill blank values with the 0 value by choosing children as the input column.

From our EDA, we see that there are many missing values for the country column. However, the data reveals most of the hotel guests are from Europe. We determine that missing country column values can be replaced with the most commonly occurring country—Portugal (PRT).

  1. Choose the Handle missing transform step and choose Fill missing as the transform type.
  2. Choose country as the input column, and enter PRT as the Fill value.

ML algorithms like linear regression, logistic regression, neural networks, and others that use gradient descent as an optimization technique require data to be scaled. Normalization (also known as min-max scaling) is a scaling technique that transforms values to be in the range of 0–1. Standardization is another scaling technique where the values are centered around the mean with unit standard deviation. In our case, we normalize the numeric feature columns to the range [0, 1].

  1. Choose the Process numeric transform step and Scale values as the transform type.
  2. Choose Min-max scaler as the scaler and lead_time, booking_changes, adr, and others as the input columns.
  3. Leave 0 as Min and 1 as Max default values.
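To make the arithmetic behind these two scaling techniques concrete, here is a small pandas sketch (not Data Wrangler’s internals); lead_time is one of the dataset’s numeric columns, and the file name is illustrative.

    import pandas as pd

    df = pd.read_csv("hotel_bookings.csv")  # assumed local copy of the dataset
    col = df["lead_time"].astype(float)

    # Min-max scaling (normalization): values land in [0, 1].
    normalized = (col - col.min()) / (col.max() - col.min())

    # Standardization: zero mean, unit standard deviation.
    standardized = (col - col.mean()) / col.std()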

We also want to handle categorical data by representing it as numeric values. For example, if your categories are Dog and Cat, you may encode this information into two vectors: [1,0] to represent Dog, and [0,1] to represent Cat. For our dataset, we use one-hot encoding, which creates a binary indicator column for each category within the original column.

  1. Choose the One-hot encode transform type from the Encode categorical transform.
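As a rough illustration of what one-hot encoding produces, the Dog/Cat example above looks like this in pandas; the pet column is made up for the example.

    import pandas as pd

    pets = pd.DataFrame({"pet": ["Dog", "Cat", "Dog"]})

    # One-hot encoding: one binary indicator column per category.
    print(pd.get_dummies(pets, columns=["pet"], dtype=int))
    #    pet_Cat  pet_Dog
    # 0        0        1
    # 1        1        0
    # 2        0        1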

ML models are sensitive to the distribution and range of your feature values. Outliers can negatively impact model accuracy and lead to longer training times. For our dataset, we apply the standard deviation numeric outliers transform with a set of configuration values as shown in the following screenshot. We apply this transform on the numeric columns.

  1. Choose the Standard Deviation Numeric Outliers transform type from the Handle outliers transform.
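For intuition, the standard deviation approach can be sketched as clipping values that fall more than a chosen number of standard deviations from the mean. The column, the factor of 4, and the file name below are illustrative choices, not Data Wrangler’s implementation.

    import pandas as pd

    df = pd.read_csv("hotel_bookings.csv")  # assumed local copy of the dataset
    col, k = "adr", 4  # column and number of standard deviations; illustrative

    mean, std = df[col].mean(), df[col].std()
    lower, upper = mean - k * std, mean + k * std

    # Clip outliers to the boundary values instead of dropping the rows.
    df[col] = df[col].clip(lower=lower, upper=upper)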

Lastly, we want to balance the target variable for class imbalance. In Data Wrangler, we can handle class imbalance using three different techniques:

  • Random undersample
  • Random oversample
  • SMOTE
  1. In the Data Wrangler transform pane, choose Balance data as the group and choose Random oversample for the Transform field.

The ratio of positive to negative cases is around 0.38 before balancing.

After oversampling and balancing the dataset, the ratio equates to 1.

Now that we’ve completed our feature engineering tasks, we’re ready to export our features to Feature Store with one click.

Save features to Feature Store

You can easily export your generated features to SageMaker Feature Store by selecting it as the destination.

You can save the features into an existing feature group or create a new one. For this post, we create a new feature group. Studio directs you to a new tab where you can create a new feature group.

  1. Choose the plus sign, choose Export to, and choose SageMaker Feature Store.

  1. Choose Create Feature Group.

  1. Optionally, select Create “EventTime” column.
  2. Choose Next.

  1. Copy the JSON schema, then choose Create.

  1. Provide a feature group name and an optional description for your feature group.
  2. Select a feature group storage configuration that is either online or offline, or both.

Online stores serve features with low millisecond latency for real-time inference, whereas offline stores are ideal for retrieving your features for training models or for batch scoring. Additionally, you can run queries on your offline feature stores by registering your features in an AWS Glue Data Catalog. For more information, see Query Feature Store with Athena and AWS Glue.

  1. Choose Continue.

Next, you specify the feature definitions. You specify the data type (string, integral, fractional) for each feature definition.

  1. Enter the JSON schema from the previous step to define your feature definitions.
  2. Choose Continue.

  1. Next, you specify a record identifier name and a timestamp to uniquely identify a record within a feature group.

The record identifier name must refer to one of the names of a feature defined in the feature group’s feature definition. In our case, we use the existing identifier, distribution-channel, which was in our source dataset, and EventTime.

  1. Choose Continue.

  1. Lastly, apply any relevant tags and review your feature group details.
  2. Choose Create feature group to finalize the process.

  1. After we create our feature group, we can return to the Data Wrangler flow UI.
  2. Choose the plus sign, choose Add destination, and choose SageMaker Feature Store.

  1. We choose the desired destination feature group to ensure that the features we’re storing match the feature group schema.

If the newly created feature group doesn’t show up in the UI, refresh the list to reload the groups.

  1. Choose the message under the Validation column to have Data Wrangler validate the schema of the dataset with the schema of the feature group.

If you missed specifying the event time column, Data Wrangler will notify you of an error and request that you add one to your dataset.

Once validated, Data Wrangler informs you that the data frame matches the feature group schema.

  1. If you enabled both the online and offline stores for the feature group, you can optionally select Write to offline store only to only ingest data to the offline store.

This is helpful for historical data backfilling scenarios.

  1. Choose Add to add another step to our Data Wrangler flow.
  2. With all our steps defined, choose Create job to run our ML workflow from feature engineering to ingesting features into our feature group.

  1. Give the job a name, then provide the job specifications like the type and number of instances.
  2. Choose Run.

Congratulations! You’ve successfully engineered features using Data Wrangler and stored them in a persistent feature store without writing any code. You can easily explore features, see details of your feature group, and update the feature group schema when necessary.
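Once features are in the online store, they can also be retrieved at low latency for inference with the Feature Store runtime API. A minimal boto3 sketch follows; the feature group name and the record identifier value are placeholders for whatever you configured above.

    import boto3

    featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

    # Feature group name and record identifier value are placeholders.
    response = featurestore_runtime.get_record(
        FeatureGroupName="hotel-bookings-feature-group",
        RecordIdentifierValueAsString="TA/TO",
    )

    # Each feature comes back as a {"FeatureName": ..., "ValueAsString": ...} dict.
    for feature in response.get("Record", []):
        print(feature["FeatureName"], "=", feature["ValueAsString"])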

Conclusion

In this post, we created features with Data Wrangler, and easily stored those features in Feature Store. We showed an example workflow for feature engineering in the Data Wrangler UI. Then we saved those features into Feature Store directly from Data Wrangler by creating a new feature group. Finally, we ran a processing job to ingest those features into Feature Store. These services helped us build automatic and repeatable processes to streamline our data preparation tasks, all without writing code.

With this new integration, you can accelerate your ML tasks with a more streamlined experience between feature engineering and feature ingestion. For more information, refer to Get Started with Data Wrangler and Get started with Amazon SageMaker Feature Store.


About the Authors

Peter Chung is a Solutions Architect for AWS, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions in both the public and private sectors. He holds all AWS certifications as well as two GCP certifications. He enjoys coffee, cooking, staying active, and spending time with his family.

Patrick Lin is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is committed to making Amazon SageMaker Data Wrangler the number one data preparation tool for productionized ML workflows. Outside of work, you can find him reading, listening to music, having conversations with friends, and serving at his church.

Ziyao Huang is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is passionate about building great products that make ML easy for customers. Outside of work, Ziyao likes to read and hang out with his friends.

Read More

Stunning Insights from James Webb Space Telescope Are Coming, Thanks to GPU-Powered Deep Learning

NVIDIA GPUs will play a key role interpreting data streaming in from the James Webb Space Telescope, with NASA preparing to release next month the first full-color images from the $10 billion scientific instrument.

The telescope’s iconic array of 18 interlocking hexagonal mirrors, which span a total of 21 feet 4 inches, will be able to peer far deeper into the universe, and deeper into the universe’s past, than any tool to date, unlocking discoveries for years to come.

GPU-powered deep learning will play a key role in several of the highest-profile efforts to process data from the revolutionary telescope positioned a million miles away from Earth, explains UC Santa Cruz Astronomy and Astrophysics Professor Brant Robertson.

“The JWST will really enable us to see the universe in a new way that we’ve never seen before,” said Robertson, who is playing a leading role in efforts to use AI to take advantage of the unprecedented opportunities JWST creates. “So it’s really exciting.”

High-Stakes Science

Late last year, Robertson was among the millions tensely following the runup to the launch of the telescope, developed over the course of three decades, and loaded with instruments that define the leading edge of science.

The JWST’s Christmas Day launch went better than planned, allowing the telescope to slide into a Lagrange point — a kind of gravitational eddy in space that allows an object to “park” indefinitely — and extending the telescope’s usable life to more than 10 years.

“It’s working fantastically,” Robertson reports. “All of the signs are it’s going to be a tremendous facility for science.”

AI Powering New Discoveries

Robertson — who leads the computational astrophysics group at UC Santa Cruz — is among a new generation of scientists across a growing array of disciplines using AI to quickly classify the vast quantities of data — often more than can be sifted in a human lifetime — streaming in from the latest generation of scientific instruments.

“What’s great about AI and machine learning is that you can train a model to actually make those decisions for you in a way that is less hands-on and more based on a set of metrics that you define,” Robertson said.

Simulated image of a portion of the JADES galaxy survey, part of the preparations for galaxy surveys using JWST that UCSC astronomer Brant Robertson and his team have been working on for years. (Image credit: JADES Collaboration)

Working with Ryan Hausen, a Ph.D. student in UC Santa Cruz’s computer science department, Robertson helped create a deep learning framework that classifies astronomical objects, such as galaxies, based on the raw data streaming out of telescopes on a pixel by pixel basis, which they called Morpheus.

It quickly became a key tool for classifying images from the Hubble Space Telescope. Since then the team working on Morpheus has grown considerably, to roughly a half-dozen people at UC Santa Cruz.

Researchers are able to use NVIDIA GPUs to accelerate Morpheus across a variety of platforms — from an NVIDIA DGX Station desktop AI system, to a small computing cluster equipped with several dozen NVIDIA V100 Tensor Core GPUs, to sophisticated simulations running on thousands of GPUs on the Summit supercomputer at Oak Ridge National Laboratory.

A Trio of High-Profile Projects

Now, with the first science data from the JWST due for release July 12, much more’s coming.

“We’ll be applying that same framework to all of the major extragalactic JWST surveys that will be conducted in the first year,” Robertson said.

Robertson is among a team of nearly 50 researchers who will be mapping the earliest structure of the universe through the COSMOS-Webb program, the largest general observer program selected for JWST’s first year.

Simulations by UCSC researchers showed how JWST can be used to map the distribution of galaxies in the early universe. The web-like structure in the background of this image is dark matter, and the yellow dots are galaxies that should be detected in the survey. (Image credit: Nicole Drakos)

Over the course of more than 200 hours, the COSMOS-Webb program will survey half a million galaxies with multiband, high-resolution, near-infrared imaging and an unprecedented 32,000 galaxies in mid-infrared.

“The COSMOS-Webb project is the largest contiguous area survey that will be executed with JWST for the foreseeable future,” Robertson said.

Robertson also serves on the steering committee for the JWST Advanced Deep Extragalactic Survey, or JADES, to produce infrared imaging and spectroscopy of unprecedented depth. Robertson and his team will put Morpheus to work classifying the survey’s findings.

Robertson and his team are also involved with another survey, dubbed PRIMER, to bring AI and machine learning classification capabilities to the effort.

From Studying the Stars to Studying Ourselves

All these efforts promise to help humanity survey — and understand — far more of our universe than ever before. But perhaps the most surprising application Robertson has found for Morpheus is here at home.

“We’ve actually trained Morpheus to go back into satellite data and automatically count up how much sea ice is present in the North Atlantic over time,” Robertson said, adding it could help scientists better understand and model climate change.

As a result, a tool developed to help us better understand the history of our universe may soon help us better predict the future of our own small place in it.

FEATURED IMAGE CREDIT: NASA


The post Stunning Insights from James Webb Space Telescope Are Coming, Thanks to GPU-Powered Deep Learning appeared first on NVIDIA Blog.

Read More

Neural Face Video Compression using Multiple Views

Recent advances in deep generative models led to the development of neural face video compression codecs that use an order of magnitude less bandwidth than engineered codecs. These neural codecs reconstruct the current frame by warping a source frame and using a generative model to compensate for imperfections in the warped source frame. Thereby, the warp is encoded and transmitted using a small number of keypoints rather than a dense flow field, which leads to massive savings compared to traditional codecs. However, by relying on a single source frame only, these methods lead to inaccurate…Apple Machine Learning Research

Collin Stultz named co-director and MIT lead of the Harvard-MIT Program in Health Sciences and Technology

Collin M. Stultz, the Nina T. and Robert H. Rubin Professor in Medical Engineering and Science at MIT, has been named co-director of the Harvard-MIT Program in Health Sciences and Technology (HST), and associate director of MIT’s Institute for Medical Engineering and Science (IMES), effective June 1. IMES is HST’s home at MIT.

Stultz is a professor of electrical engineering and computer science at MIT, a core faculty member in IMES, a member of the HST faculty, and a practicing cardiologist at Massachusetts General Hospital (MGH). He is also a member of the Research Laboratory of Electronics, and an associate member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Anantha P. Chandrakasan, dean of the MIT School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science, praised the appointment, saying “Professor Stultz’s remarkable leadership, commitment to teaching excellence, and unwavering devotion to pursuing advancements in human health, will undoubtedly help to reinforce and bolster the missions of both IMES and HST.”

Stultz is succeeding Emery N. Brown, who was the first to serve as HST’s co-director at MIT following the establishment of IMES in 2012. (Wolfram Goessling is the co-director of HST at Harvard University.) Brown, the Edward Hood Taplin Professor of Medical Engineering and of Computational Neuroscience at MIT, will now focus on the establishment of a new joint center between MIT and MGH that will use the study of anesthesia to design novel approaches to controlling brain states, with a goal of improving anesthesia and intensive care management.

“It was a pleasure and honor for me to shepherd HST for the last 10 years,” Brown says. “I am certain that Collin will be a phenomenal co-director. He is a highly accomplished scientist, a master clinician, and a committed educator.”

George Q. Daley, dean of Harvard Medical School and an HST alumnus, says, “I am thrilled that HST’s new co-director will be a Harvard Medical School alumnus who completed clinical training and practice at our affiliated hospitals. Dr. Stultz’s remarkable expertise in computer science and AI will engender positive change as we reinvigorate this historic Harvard-MIT collaboration and redefine the scope of what it means to be a physician-scientist in the 21st century.”

Elazer R. Edelman, the Edward J. Poitras Professor in Medical Engineering and Science and the director of IMES, also an HST alumnus, lauded the appointment, saying, “We are so excited by the future, using the incredible vision of Professor Stultz, his legacy of accomplishment, his commitment to mentorship, and his innate ability to meld excellence in science and medicine, engineering, and physiology to propel us forward. Everything Professor Stultz has done predicates him and HST for success.”

Goessling says he looks forward to working with Stultz in his new role. “I have known Collin since our residency days at Brigham and Women’s Hospital where we cared for patients together. I am truly excited to work collaboratively and synergistically with him to now take care of our students together, to innovate our education programs and continue the legacy of success for HST.”

Stultz earned his BA magna cum laude in mathematics and philosophy from Harvard University in 1988; a PhD in biophysics from Harvard in 1997; and an MD magna cum laude from Harvard Medical School, also in 1997. Stultz then went on to complete an internship and residency in internal medicine, followed by a fellowship in cardiovascular medicine, at the Brigham and Women’s Hospital before joining the faculty at MIT in 2004.

Stultz once said that his research focus at MIT is twofold: “the study of small things you can’t see with the naked eye, and the study of big things that you can,” and his scientific contributions have similarly spanned a wide range of length scales. As a graduate student in the laboratory of Martin Karplus — winner of the 2013 Nobel Prize in Chemistry — Stultz helped to develop computational methods for designing ligands to flexible protein targets. As a junior faculty member at MIT, his group leveraged computational biophysics and experimental biochemistry to model disordered proteins that play important roles in human disease. More recently, his research has focused on the development and application of machine learning methods that enable health care providers to gain insight into patient-specific physiology, using clinical data that are routinely obtained in both clinical and ambulatory settings. 

Stultz is a member of the American Society for Biochemistry and Molecular Biology, the Federation of American Societies for Experimental Biology, and a fellow of the American Institute for Medical and Biological Engineering. He is a past recipient of an Irving M. London teaching award, a National Science Foundation CAREER Award, and a Burroughs Wellcome Fund Career Award in the Biomedical Sciences, and he is a recent Phi Beta Kappa visiting scholar.

“Following in the footsteps of a scholar as renowned as Emery Brown is daunting; however, I am extraordinarily optimistic about what HMS, HST, and MIT can accomplish in the years to come,” Stultz says. “I look forward to working with Elazer, Anantha, Wolfram, and the leadership at HMS to advance the educational mission of HST on the HMS campus, and throughout the MIT ecosystem.”

Read More

End-to-end Generative Pre-training for Multimodal Video Captioning

Multimodal video captioning systems utilize both the video frames and speech to generate natural language descriptions (captions) of videos. Such systems are stepping stones towards the longstanding goal of building multimodal conversational systems that effortlessly communicate with users while perceiving environments through multimodal input streams.

Unlike video understanding tasks (e.g., video classification and retrieval) where the key challenge lies in processing and understanding multimodal input videos, the task of multimodal video captioning includes the additional challenge of generating grounded captions. The most widely adopted approach for this task is to train an encoder-decoder network jointly using manually annotated data. However, annotating grounded captions for videos is labor intensive and, in many cases, impractical, which limits the availability of large-scale, manually annotated data. Previous approaches, such as VideoBERT and CoMVT, pre-train their models on unlabelled videos by leveraging automatic speech recognition (ASR). However, such models often cannot generate natural language sentences because they lack a decoder, and thus only the video encoder is transferred to the downstream tasks.

In “End-to-End Generative Pre-training for Multimodal Video Captioning”, published at CVPR 2022, we introduce a novel pre-training framework for multimodal video captioning. This framework, which we call multimodal video generative pre-training or MV-GPT, jointly trains a multimodal video encoder and a sentence decoder from unlabelled videos by leveraging a future utterance as the target text and formulating a novel bi-directional generation task. We demonstrate that MV-GPT effectively transfers to multimodal video captioning, achieving state-of-the-art results on various benchmarks. Additionally, the multimodal video encoder is competitive for multiple video understanding tasks, such as VideoQA, text-video retrieval, and action recognition.

Future Utterance as an Additional Text Signal
Typically, each training video clip for multimodal video captioning is associated with two different texts: (1) a speech transcript that is aligned with the clip as a part of the multimodal input stream, and (2) a target caption, which is often manually annotated. The encoder learns to fuse information from the transcript with visual contents, and the target caption is used to train the decoder for generation. However, in the case of unlabelled videos, each video clip comes only with a transcript from ASR, without a manually annotated target caption. Moreover, we cannot use the same text (the ASR transcript) for the encoder input and decoder target, since the generation of the target would then be trivial.

MV-GPT circumvents this challenge by leveraging a future utterance as an additional text signal and enabling joint pre-training of the encoder and decoder. However, training a model to generate future utterances that are often not grounded in the input content is not ideal. So we apply a novel bi-directional generation loss to reinforce the connection to the input.

Bi-directional Generation Loss
The issue of non-grounded text generation is mitigated by formulating a bi-directional generation loss that includes forward and backward generation. Forward generation produces future utterances given visual frames and their corresponding transcripts and allows the model to learn to fuse the visual content with its corresponding transcript. Backward generation takes the visual frames and future utterances to train the model to generate a transcript that contains more grounded text of the video clip. Bi-directional generation loss in MV-GPT allows the encoder and the decoder to be trained to handle visually grounded texts.

Bi-directional generation in MV-GPT. A model is trained with two generation losses. In forward generation, the model generates a future utterance (blue boxes) given the frames and the present utterance (red boxes), whereas the present is generated from the future utterance in backward generation. Two special beginning-of-sentence tokens ([BOS-F] and [BOS-B]) initiate forward and backward generation for the decoder.
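Purely as an illustrative sketch (not the authors’ implementation), the objective can be written as the sum of two cross-entropy terms over a shared encoder-decoder. The model interface, tensor shapes, padding index, and equal weighting below are assumptions.

    import torch.nn.functional as F

    def bidirectional_generation_loss(model, frames, present_utt, future_utt):
        """Sum of forward and backward generation losses (illustrative only).
        `model(visual, text, target)` is a hypothetical encoder-decoder that
        returns token logits of shape (batch, target_len, vocab)."""
        # Forward: predict the future utterance from frames + present utterance.
        fwd_logits = model(visual=frames, text=present_utt, target=future_utt)
        fwd_loss = F.cross_entropy(
            fwd_logits.flatten(0, 1), future_utt.flatten(), ignore_index=0
        )  # ignore_index=0 assumes a padding token id of 0

        # Backward: predict the present utterance from frames + future utterance.
        bwd_logits = model(visual=frames, text=future_utt, target=present_utt)
        bwd_loss = F.cross_entropy(
            bwd_logits.flatten(0, 1), present_utt.flatten(), ignore_index=0
        )

        return fwd_loss + bwd_loss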

Results on Multimodal Video Captioning
We compare MV-GPT to existing pre-training losses using the same model architecture, on YouCook2 with standard evaluation metrics (Bleu-4, Cider, Meteor, and Rouge-L). While all pre-training techniques improve captioning performance, it is critical to pre-train the decoder jointly to improve model performance. We demonstrate that MV-GPT outperforms the previous state-of-the-art joint pre-training method, with relative gains of over 3.5% across all four metrics.

Pre-training Loss    Pre-trained Parts    Bleu-4   Cider   Meteor   Rouge-L
No Pre-training      N/A                  13.25    1.03    17.56    35.48
CoMVT                Encoder              14.46    1.24    18.46    37.17
UniVL                Encoder + Decoder    19.95    1.98    25.27    46.81
MV-GPT (ours)        Encoder + Decoder    21.26    2.14    26.36    48.58
MV-GPT performance across four metrics (Bleu-4, Cider, Meteor and Rouge-L) of different pre-training losses on YouCook2. “Pre-trained parts” indicates which parts of the model are pre-trained — only the encoder or both the encoder and decoder. We reimplement the loss functions of existing methods but use our model and training strategies for a fair comparison.

We transfer a model pre-trained by MV-GPT to four different captioning benchmarks: YouCook2, MSR-VTT, ViTT and ActivityNet-Captions. Our model achieves state-of-the-art performance on all four benchmarks by significant margins. For instance on the Meteor metric, MV-GPT shows over 12% relative improvements in all four benchmarks.

Method           YouCook2   MSR-VTT   ViTT    ActivityNet-Captions
Best Baseline    22.35      29.90     11.00   10.90
MV-GPT (ours)    27.09      38.66     26.75   12.31
Meteor metric scores of the best baseline methods and MV-GPT on four benchmarks.

Results on Non-generative Video Understanding Tasks
Although MV-GPT is designed to train a generative model for multimodal video captioning, we also find that our pre-training technique learns a powerful multimodal video encoder that can be applied to multiple video understanding tasks, including VideoQA, text-video retrieval and action classification. When compared to the best comparable baseline models, the model transferred from MV-GPT shows superior performance in five video understanding benchmarks on their primary metrics — i.e., top-1 accuracy for VideoQA and action classification benchmarks, and recall at 1 for the retrieval benchmark.

Task                   Benchmark        Best Comparable Baseline   MV-GPT
VideoQA                MSRVTT-QA        41.5                       41.7
VideoQA                ActivityNet-QA   38.9                       39.1
Text-Video Retrieval   MSR-VTT          33.7                       37.3
Action Recognition     Kinetics-400     78.9                       80.4
Action Recognition     Kinetics-600     80.6                       82.4
Comparisons of MV-GPT to best comparable baseline models on five video understanding benchmarks. For each dataset we report the widely used primary metric, i.e., MSRVTT-QA and ActivityNet-QA: Top-1 answer accuracy; MSR-VTT: Recall at 1; and Kinetics: Top-1 classification accuracy.

Summary
We introduce MV-GPT, a new generative pre-training framework for multimodal video captioning. Our bi-directional generative objective jointly pre-trains a multimodal encoder and a caption decoder by using utterances sampled at different times in unlabelled videos. Our pre-trained model achieves state-of-the-art results on multiple video captioning benchmarks and other video understanding tasks, namely VideoQA, video retrieval and action classification.

Acknowledgements
This research was conducted by Paul Hongsuck Seo, Arsha Nagrani, Anurag Arnab and Cordelia Schmid.

Read More

Create train, test, and validation splits on your data for machine learning with Amazon SageMaker Data Wrangler

In this post, we talk about how to split a machine learning (ML) dataset into train, test, and validation datasets with Amazon SageMaker Data Wrangler so you can easily split your datasets with minimal to no code.

Data used for ML is typically split into the following datasets:

  • Training – Used to train an algorithm or ML model. The model iteratively uses the data and learns to provide the desired result.
  • Validation – Introduces new data to the trained model. You can use a validation set to periodically measure model performance as training is happening, and also tune any hyperparameters of the model. However, validation datasets are optional.
  • Test – Used on the final trained model to assess its performance on unseen data. This helps determine how well the model generalizes.

Data Wrangler is a capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for ML applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code.

Today, we’re excited to announce a new data transformation to split datasets for ML use cases within Data Wrangler. This transformation splits your dataset into training, test, and optionally validation datasets without having to write any code.

Overview of the split data transformation

The split data transformation includes four commonly used techniques to split the data for training, validating, and testing the model (a conceptual code sketch follows the list):

  • Random split – Splits data randomly into train, test, and, optionally, validation datasets using the percentage specified for each dataset. It ensures that the distribution of the data is similar in all datasets. Choose this option when you don’t need to preserve the order of your input data. For example, consider a movie dataset where the dataset is sorted by genre and you’re predicting the genre of the movie. A random split on this dataset ensures that the distribution of the data includes all genres in all three datasets.
  • Ordered split – Splits data in order, using the percentage specified for each dataset. An ordered split ensures that the data in each split is non-overlapping while preserving the order of the data. When training, we want to avoid past or future information leaking across datasets. The ordered split option prevents data leakage. For example, consider a scenario where you have customer engagement data for the first few months and you want to use this historical data to predict customer engagement in the next month. You can perform this split by providing an optional input column (numeric column). This operation uses the values of a numeric column to ensure that the data in each split doesn’t overlap while preserving the order. This helps avoid data leakage across splits. If no input column is provided, the order of the rows is used, so the data in each split still comes before the data in the next split. This is useful where the rows of the dataset are already ordered (for example, by date) and the model may need to be fit to earlier data and tested on later data.
  • Stratified split – Splits the dataset so that each split is similar with respect to a column specifying different categories for your data, for example, size or country. This split ensures that the train, test, and validation datasets have the same proportions for each category as the input dataset. This is useful with classification problems where we’re trying to ensure that the train and test sets have approximately the same percentage of samples of each target class. Choose this option if you have imbalanced data across different categories and you need to have it balanced across split datasets.
  • Split by key – Takes one or more columns as input (the key) and ensures that no combination of values across the input columns occurs in more than one of the splits (split by key). This is useful to avoid data leakage for unordered data. Choose this option if your data for key columns needs to be in the same split. For example, consider customer transactions split by customer ID; the split ensures that customer IDs don’t overlap across split datasets.
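These options are configured entirely in the Data Wrangler UI. Purely for intuition, a rough scikit-learn sketch of the first three techniques might look like the following; the file name, split percentages, and random seed are illustrative, and this is not what Data Wrangler runs under the hood.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("hotel_bookings.csv")  # assumed local copy of the dataset

    # Random split: 70% train, then split the remaining 30% into 20% test / 10% validation.
    train, rest = train_test_split(df, test_size=0.3, random_state=42)
    test, validation = train_test_split(rest, test_size=1 / 3, random_state=42)

    # Ordered split: keep the existing row order (for example, sorted by date).
    n = len(df)
    train_o = df.iloc[: int(0.7 * n)]
    test_o = df.iloc[int(0.7 * n): int(0.9 * n)]
    val_o = df.iloc[int(0.9 * n):]

    # Stratified split: preserve the is_canceled class proportions in every split.
    train_s, rest_s = train_test_split(
        df, test_size=0.3, stratify=df["is_canceled"], random_state=42
    )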

Solution overview

For this post, we demonstrate how to split data into train, test, and validation datasets using the four new split options in Data Wrangler. We use a hotel booking dataset available publicly on Kaggle, which has the year, month, and date that bookings were made, along with reservation statuses, cancellations, repeat customers, and other features.

Prerequisites

Before getting started, upload the dataset to an Amazon Simple Storage Service (Amazon S3) bucket, then import it into Data Wrangler. For instructions, refer to Import data from Amazon S3.

Random split

After we import the data into Data Wrangler, we start the transformation. We first demonstrate a random split.

  1. On the Data Wrangler console, choose the plus sign and choose Add transform.
  2. To add the split data transformation, choose Add step.

    You’re redirected to the page where all transformations are displayed.
  3. Scroll down the list and choose Split data.

    The split data transformation has a drop-down menu that lists the available transformations to split your data, which include random, ordered, stratified, and split by key. By default, Randomized split is displayed.
  4. Choose the default value Randomized split.
  5. In the Splits section, enter the name Train with a 0.8 split percentage, and Test with a 0.2 percentage.
  6. Choose the plus sign to add an additional split.
  7. Add the Validation split with 0.2, and adjust Train to 0.7 and Test to 0.1.
    The split percentage can be any value you want, provided all three splits sum to 1 (100%). We can also specify optional fields like Error threshold and Random seed. We can achieve an exact split by setting the error threshold to 0. A smaller error threshold can lead to more processing time for splitting the data. This allows you to control the trade-off between time and accuracy of the operation. The Random seed option is for reproducibility. If not specified, Data Wrangler uses a default random seed value. We leave it blank for the purpose of this post.
  8. To preview your data split, choose Preview.

    The preview page displays the data split. You can choose Train, Test, or Validation on the drop-down menu to review the details of each split.
  9. When you’re satisfied with your data split, choose Add to add the transformation to your Data Wrangler flow.

To analyze the train dataset, choose Add analysis.

You can perform a similar analysis on the validation and test datasets.

Ordered split

We now use the hotel bookings dataset to demonstrate an ordered split transformation. The hotel dataset contains rows ordered by date.

  1. Repeat the steps to add a split, and choose Ordered split on the drop-down menu.
  2. Specify your three splits and desired percentages.
  3. Preview your data and choose Add to add the transformation to the Data Wrangler flow.
  4. Use the Add analysis option to verify the splits.

Stratified split

In the hotel booking dataset, we have an is_canceled column, which indicates whether the booking was cancelled or not. We want to use this column to split the data. A stratified split ensures that the train, test, and validation datasets have the same percentage of samples for each value of is_canceled.

  1. Repeat the steps to add a transformation, and choose Stratified split.
  2. Specify your three splits and desired percentages.
  3. For Input column, choose is_canceled.
  4. Preview your data and choose Add to add the transformation to the Data Wrangler flow.
  5. Use the Add analysis option to verify the splits.

Split by key

The split by key transformation splits the data by the key or multiple keys we specify. This split is useful to avoid having the same data in the split datasets created during transformation and to avoid data leakage.
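For intuition only, a split by key can be approximated outside Data Wrangler with scikit-learn’s GroupShuffleSplit, which guarantees that no key value appears in more than one split. The file name and the composite key below are illustrative choices, not Data Wrangler’s implementation.

    import pandas as pd
    from sklearn.model_selection import GroupShuffleSplit

    df = pd.read_csv("hotel_bookings.csv")  # assumed local copy of the dataset

    # Composite key built from two of the columns used later in this walkthrough.
    key = df["arrival_date_year"].astype(str) + "-" + df["arrival_date_month"].astype(str)

    # GroupShuffleSplit guarantees no key value appears in both resulting splits.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    train_idx, test_idx = next(splitter.split(df, groups=key))
    train, test = df.iloc[train_idx], df.iloc[test_idx]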

  1. Repeat the steps to add a transformation, and choose Split by key.
  2. Specify your three splits and desired percentages.
  3. For Key column, we can specify the columns to form the key. For this post, choose the following columns:
    1. is_canceled
    2. arrival_date_year
    3. arrival_date_month
    4. arrival_date_week_number
    5. reservation_status
  4. Preview your data and choose Add to add the transformation to the Data Wrangler flow.
  5. Use the Add analysis option to verify the splits.

Considerations

The node labeled Data types can’t be deleted. Deleting a split node deletes all of its datasets, as well as the downstream datasets and nodes.

Conclusion

In this post, we demonstrated how to split an input dataset into train, test, and validation datasets with Data Wrangler using the split techniques random, ordered, stratified, and split by key.

To learn more about using data flows with Data Wrangler, refer to Create and Use a Data Wrangler Flow. To get started with Data Wrangler, see Prepare ML Data with Amazon SageMaker Data Wrangler.


About the Authors

Gopi Mudiyala is a Senior Technical Account Manager at AWS. He helps customers in the Financial Services Industry with their operations in AWS. As a machine learning specialist, Gopi works to help customers succeed in their ML journey.

Patrick Lin is a Software Development Engineer with Amazon SageMaker Data Wrangler. He is committed to making Amazon SageMaker Data Wrangler the number one data preparation tool for productionized ML workflows. Outside of work, you can find him reading, listening to music, having conversations with friends, and serving at his church.

Xiyi Li is a Front End Engineer at Amazon SageMaker Data Wrangler. She helps support Amazon SageMaker Data Wrangler and is passionate about building products that provide a great user experience. Outside of work, she enjoys hiking and listening to classical music.

Vishaal Kapoor is a Senior Applied Scientist with AWS AI. He is passionate about helping customers understand their data in Data Wrangler. In his spare time, he mountain bikes, snowboards, and spends time with his family.

Read More