Customize pronunciations using Amazon Polly

Amazon Polly breathes life into text by converting it into lifelike speech. This empowers developers and businesses to create applications that can converse in real time, offering a more interactive experience. Text-to-speech (TTS) in Amazon Polly supports a variety of languages and locales, so you can choose the voice and locale that best fit factors such as your users' geographic location and language.

Amazon Polly uses advanced deep learning technologies to synthesize text to speech in real time in various output formats, such as MP3, Ogg Vorbis, PCM, or JSON (for speech marks), across the standard and neural engines. Amazon Polly's support for Speech Synthesis Markup Language (SSML) further extends its ability to customize speech, with options that include controlling speech rate and volume, adding pauses, emphasizing certain words or phrases, and more.

In today’s world, businesses continue to expand across multiple geographic locations, and they’re continuously looking for mechanisms to improve personalized end-user engagement. For instance, you may require accurate pronunciation of certain words in a specific style pertaining to different geographical locations. Your business may also need to pronounce certain words and phrases in certain ways depending on their intended meaning. You can achieve this with the help of SSML tags provided by Amazon Polly.

This post aims to assist you in customizing pronunciation when dealing with a truly global customer base.

Modify pronunciation using phonemes

A phoneme is the smallest unit of speech. The <phoneme> SSML tag in Amazon Polly customizes pronunciation at the phoneme level using the International Phonetic Alphabet (IPA) or the Extended Speech Assessment Methods Phonetic Alphabet (X-SAMPA). X-SAMPA is a representation of the IPA in ASCII encoding. Phoneme tags are fully supported in both the standard and neural TTS engines. For example, the word “lead” can be pronounced as the present tense verb, or it can refer to the chemical element lead. We walk through this example later in this post.

International Phonetic Alphabet

The IPA is used to represent sounds across different languages. For a list of phonemes that Amazon Polly supports, refer to Phoneme and Viseme Tables for Supported Languages.

By default, Amazon Polly chooses a single pronunciation for each word. Let’s use the example of the word “lead,” which is pronounced differently when it refers to the chemical element than when it’s the present tense verb. When we provide the word “lead” as input without any customizing SSML tags, Amazon Polly speaks it in its present tense form, as the following example demonstrates.

<speak>
The default pronunciation by Amazon Polly for L E A D is <break time = "300ms"/> lead,
which is the present tense form.
</speak>

To get the pronunciation of the chemical element lead (which sounds the same as the past tense of the verb), we can use the <phoneme> tag with IPA or X-SAMPA. IPA is generally used to customize the pronunciation of a word in a given language using phonemes:

<speak>
This is the pronunciation using the
<say-as interpret-as="characters">IPA</say-as> attribute
in the <say-as interpret-as="characters">SSML</say-as> tag. 
The verb form for L E A D is <break time="150ms"/> lead.
The chemical element <break time="150ms"/><phoneme alphabet="ipa" ph="lɛd">lead</phoneme> 
<break time="300ms"/>also has an identical spelling.
</speak>
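If you want to test these SSML snippets programmatically rather than in the Amazon Polly console, you can pass them to the SynthesizeSpeech API with TextType set to ssml. The following is a minimal sketch using the AWS SDK for Python (Boto3); the voice, engine, and output file name are illustrative choices, not requirements.

import boto3

polly = boto3.client("polly")

ssml = """
<speak>
The chemical element <break time="150ms"/>
<phoneme alphabet="ipa" ph="lɛd">lead</phoneme>
<break time="300ms"/>also has an identical spelling.
</speak>
"""

# TextType="ssml" tells Amazon Polly to interpret the input as SSML markup
response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",
    OutputFormat="mp3",   # also supports ogg_vorbis, pcm, and json (speech marks)
    VoiceId="Joanna",     # illustrative voice choice
    Engine="neural",      # phoneme tags work on both the standard and neural engines
)

# Save the returned audio stream to a local MP3 file
with open("lead.mp3", "wb") as f:
    f.write(response["AudioStream"].read())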

Modify pronunciation by specifying parts of speech

If we consider the same example of pronouncing “lead,” we can also differentiate between the chemical element and the verb by specifying the parts of speech using the <w> SSML tag.

The <w> tag allows us to customize pronunciation by specifying parts of speech. You can configure the pronunciation in terms of verb (present simple or past tense), noun, adjective, preposition, and determiner. See the following example:

<speak>
The word<p> <say-as interpret-as="characters">lead</say-as></p> 
may be interpreted as either the present simple form <w role="amazon:VB">lead</w>, 
or the chemical element <w role="amazon:SENSE_1">lead</w>.
</speak>

Additionally, you can use the <sub> tag to indicate the pronunciation of acronyms and abbreviations:

<speak>
Polly is an <sub alias="Amazon Web Services">AWS</sub> 
offering that provides a text-to-speech service. 
</speak>

Extended Speech Assessment Methods Phonetic Alphabet

The X-SAMPA transcription scheme is an extension of the various language-specific SAMPA phoneme sets.

The following snippet shows how you can use X-SAMPA to pronounce different variations of the word “lead”:

<speak>
This is the pronunciation using the X-SAMPA attribute, 
in the verb form <break time="1s"/> lead.
The chemical element <break time="1s"/> 
<phoneme alphabet='x-sampa' ph='lEd'>lead</phoneme> <break time="0.5s"/>
also has an identical spelling.
</speak>

The stress mark in IPA is usually represented by the character ˈ. An ordinary apostrophe is often used instead, which can produce a different output than expected. In X-SAMPA, the stress mark is the double quotation mark ("), so enclose the ph attribute value in single quotation marks and specify the phonetic alphabet. See the following example:

<speak>
You say, <phoneme alphabet="ipa" ph="pɪˈkɑːn">pecan</phoneme>. 
</speak>

In the preceding example, the character ˈ marks the stressed syllable. The equivalent X-SAMPA stress mark, a double quotation mark, appears in the following example:

<speak>
You say, <phoneme alphabet='x-sampa' ph='pI"kA:n'>pecan</phoneme>.
</speak>

Modify pronunciations using other SSML tags

You can use the <say-as> tag to modify pronunciation by enabling the characters or spell-out feature. It also refines how Amazon Polly speaks digits, fractions, units, dates, times, addresses, telephone numbers, and cardinal and ordinal numbers, and it can censor the text enclosed within the tag. For more information, refer to Controlling How Special Types of Words Are Spoken. Let’s look at examples of these attributes.

Date

By default, Amazon Polly speaks a date based on how it interprets the raw text. For dates, you can use the date attribute to customize the pronunciation to the required format, such as month-day-year or day-month-year.

Without the date attribute, Amazon Polly provides the following output when speaking out dates:

<speak>
The default pronunciation when using date is 01-11-1996
</speak>

However, if you want the dates spoken in a specific format, the date attribute in the <say-as> tags helps customize the pronunciation:

<speak>
We will see the examples of different date formats using the date SSML tag.
The following date is written in the day-month-year format.
<say-as interpret-as="date" format="dmy">01-11-1995</say-as><break time="500ms"/>
The following date is written in the month-day-year format.
<say-as interpret-as="date" format="mdy">09-24-1995</say-as>
</speak>

Cardinal

This attribute speaks a number in its cardinal form. For example, 124456 is pronounced “one hundred twenty-four thousand four hundred fifty-six”:

<speak> 
The following number is pronounced in its cardinal form.
<say-as interpret-as="cardinal">124456</say-as>
</speak>

Ordinal

This attribute represents a number in its ordinal format. Without the ordinal attribute, the number is pronounced in its numerical form:

<speak>
The following number is pronounced in its numerical form
without the use of any SSML attribute in the say-as tag - 1242
</speak>

If we want to pronounce 1242 as “one thousand two hundred forty second,” we can use the ordinal attribute:

<speak>
The following number is pronounced in its ordinal form.
<say-as interpret-as="ordinal">1242</say-as>
</speak>

Digits

The digits attribute speaks a number as individual digits. For example, 1242 is pronounced as “one two four two”:

<speak>
The following number is pronounced as individual digits.
<say-as interpret-as="digits">1242</say-as>
</speak>

Fraction

The fraction attribute is used to customize the pronunciations in the fractional form:

<speak> 
The following are examples of pronunciations when 
<prosody volume="loud"> fraction</prosody>
is used as an attribute in the say-as tag. 
<break time="500ms"/>Seven one by two is pronounced as
<say-as interpret-as="fraction">7 ½ </say-as>
whereas three by twenty is pronounced as <say-as interpret-as="fraction">3/20</say-as>
</speak>

Time

The time attribute speaks a duration in minutes and seconds:

<speak>
Polly also supports customizing pronunciation in terms of minutes and seconds. 
For example, <say-as interpret-as="time">2'42"</say-as>
</speak>

Expletive

The expletive attribute censors the text enclosed within the tags:

<speak> 
The value that is going to be censored is
<say-as interpret-as="expletive">this is not good</say-as>
You should have heard the beep sound.
</speak>

Telephone

You can use the telephone attribute to speak telephone numbers as phone numbers rather than as standalone digits or a cardinal number:

<speak>
The telephone number is 
<say-as interpret-as="telephone">1800 3000 9009</say-as>
</speak>

Address

The address attribute is used to customize the pronunciation of an address aligning to a specific format:

<speak> 
The address is<break time="1s"/>
<say-as interpret-as="address">440 Terry Avenue North, Seattle
WA 98109 USA</say-as>
</speak>

Lexicons

We’ve looked at some of the SSML tags readily available in Amazon Polly. Other use cases might require a higher degree of control over customized pronunciations. Lexicons help meet this requirement. You can use lexicons when certain words need to be pronounced in a way that is uncommon for the language in question.

Another use case for lexicons is with the use of numeronyms, which are abbreviations formed with the help of numbers. For example, Y2K is pronounced as the “year 2000.” You can use lexicons to customize these pronunciations.

Amazon Polly supports lexicon files in .pls and .xml formats. For more information, see Managing Lexicons.
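As a sketch of how this works with Boto3, the following example uploads a small Pronunciation Lexicon Specification (PLS) lexicon that expands the numeronym Y2K, and then references it at synthesis time. The lexicon name, alias wording, voice, and output file name are illustrative assumptions, not prescribed values.

import boto3

polly = boto3.client("polly")

# Minimal PLS lexicon that expands the numeronym Y2K wherever it appears in the input text
lexicon_content = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Y2K</grapheme>
    <alias>year two thousand</alias>
  </lexeme>
</lexicon>"""

# Upload (or overwrite) the lexicon in your account for the current Region
polly.put_lexicon(Name="numeronyms", Content=lexicon_content)

# Reference the lexicon by name when synthesizing speech
response = polly.synthesize_speech(
    Text="The Y2K bug worried many companies.",
    OutputFormat="mp3",
    VoiceId="Joanna",
    LexiconNames=["numeronyms"],
)

with open("y2k.mp3", "wb") as f:
    f.write(response["AudioStream"].read())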

Conclusion

Amazon Polly SSML tags can help you customize pronunciation in a variety of ways. We hope that this post gives you a head start into the world of speech synthesis and powers your applications to provide more lifelike human interactions.


About the Authors

Abilashkumar P C is a Cloud Support Engineer at AWS. He works with customers, providing technical troubleshooting guidance and helping them run their workloads at scale. Outside of work, he loves driving, following cricket, and reading.

Abhishek Soni is a Partner Solutions Architect at AWS. He works with customers to provide technical guidance for the best outcome of workloads on AWS.


Demystifying machine learning at the edge through real use cases

Edge is a term that refers to a location, far from the cloud or a big data center, where you have a computer device (edge device) capable of running (edge) applications. Edge computing is the act of running workloads on these edge devices. Machine learning at the edge (ML@Edge) is a concept that brings the capability of running ML models locally to edge devices. These ML models can then be invoked by the edge application. ML@Edge is important for many scenarios where raw data is collected from sources far from the cloud. These scenarios may also have specific requirements or restrictions:

  • Low-latency, real-time predictions
  • Poor or nonexistent connectivity to the cloud
  • Legal restrictions that don’t allow sending data to external services
  • Large datasets that need to be preprocessed locally before sending responses to the cloud

The following are some of many use cases that can benefit from ML models running close to the equipment that generates the data used for the predictions:

  • Security and safety – A restricted area where heavy machines operate in an automated port is monitored by a camera. If a person enters this area by mistake, a safety mechanism is activated to stop the machines and protect the human.
  • Predictive maintenance – Vibration and audio sensors collect data from the gearbox of a wind turbine. An anomaly detection model processes the sensor data and identifies anomalies with the equipment. If an anomaly is detected, the edge device can start a contingency measure in real time to avoid damaging the equipment, such as engaging the brakes or disconnecting the generator from the grid.
  • Defect detection in production lines – A camera captures images of products on a conveyor belt, and an image classification model processes the frames. If a defect is detected, the product can be discarded automatically without manual intervention.

Although ML@Edge can address many use cases, there are complex architectural challenges that need to be solved in order to have a secure, robust, and reliable design. In this post, you learn some details about ML@Edge, related topics, and how to use AWS services to overcome these challenges and implement a complete solution for your ML at the edge workload.

ML@Edge overview

ML@Edge is often confused with the Internet of Things (IoT), so it’s important to clarify how ML@Edge differs from IoT and how the two can come together to provide a powerful solution in certain cases.

An edge solution that uses ML@Edge has two main components: an edge application and an ML model (invoked by the application) running on the edge device. ML@Edge is about controlling the lifecycle of one or more ML models deployed to a fleet of edge devices. The ML model lifecycle can start on the cloud side (on Amazon SageMaker, for instance) but normally ends with a standalone deployment of the model on the edge device. Each scenario demands a different ML model lifecycle that can be composed of many stages, such as data collection; data preparation; model building, compilation, and deployment to the edge device; model loading and running; and repeating the lifecycle.

The ML@Edge mechanism is not responsible for the application lifecycle. A different approach should be adopted for that purpose. Decoupling the ML model lifecycle and application lifecycle gives you the freedom and flexibility to keep evolving them at different paces. Imagine a mobile application that embeds an ML model as a resource like an image or XML file. In this case, each time you train a new model and want to deploy it to the mobile phones, you need to redeploy the whole application. This consumes time and money, and can introduce bugs to your application. By decoupling the ML model lifecycle, you publish the mobile app one time and deploy as many versions of the ML model as you need.

But how does IoT relate to ML@Edge? IoT refers to physical objects embedded with technologies like sensors, processing ability, and software. These objects are connected to other devices and systems over the internet or other communication networks in order to exchange data. The following figure illustrates this architecture. The concept was initially created with simple devices in mind that just collect data from the edge, perform simple local processing, and send the result to a more powerful computing unit that runs the analytics processes that help people and companies in their decision-making. The IoT solution is responsible for controlling the edge application lifecycle. For more information about IoT, refer to Internet of things.

If you already have an IoT application, you can add ML@Edge capabilities to make the product more efficient, as shown in the following figure. Keep in mind that ML@Edge doesn’t depend on IoT, but you can combine them to create a more powerful solution. When you do that, you improve the potential of your simple device to generate real-time insights for your business faster than just sending data to the cloud for later processing.

If you’re creating a new edge solution from scratch with ML@Edge capabilities, it’s important to design a flexible architecture that supports both the application and ML model lifecycles. We provide some reference architectures for edge applications with ML@Edge later in this post. But first, let’s dive deeper into edge computing and learn how to choose the correct edge device for your solution, based on the restrictions of the environment.

Edge computing

Depending on how far the device is from the cloud or a big data center (the base), three main characteristics of edge devices need to be considered to maximize the performance and longevity of the system: computing and storage capacity, connectivity, and power consumption. The following diagram shows three groups of edge devices that combine different specifications of these characteristics, depending on how far they are from the base.

The groups are as follows:

  • MECs (Multi-access Edge Computing) – MECs or small data centers, characterized by low or ultra-low latency and high bandwidth, are common environments where ML@Edge can bring benefits without big restrictions when compared to cloud workloads. 5G antennas and servers at factories, warehouses, laboratories, and so on with minimal energy constraints and with good internet connectivity offer different ways to run ML models on GPUs and CPUs, virtual machines, containers, and bare-metal servers.
  • Near edge – This group applies when mobility or data aggregation is a requirement and the devices have some constraints on power consumption and processing power, but still have reasonably reliable connectivity, albeit with higher latency, lower throughput, and higher cost than MECs. Mobile applications, specific boards that accelerate ML models, and simple devices with the capacity to run ML models, covered by wireless networks, are included in this group.
  • Far edge – In this extreme scenario, edge devices have severe power consumption or connectivity constraints. Consequently, processing power is also restricted in many far edge scenarios. Agriculture, mining, surveillance and security, and maritime transportation are some areas where far edge devices play an important role. Simple boards, normally without GPUs or other AI accelerators, are common. They are designed to load and run simple ML models, save the predictions in a local database, and sleep until the next prediction cycle. Devices that need to process real-time data can have large local storage to avoid losing data.

Challenges

It’s common to have ML@Edge scenarios where you have hundreds or thousands (maybe even millions) of devices running the same models and edge applications. When you scale your system, it’s important to have a robust solution that can manage the number of devices that you need to support. This is a complex task and for these scenarios, you need to ask many questions:

  • How do I operate ML models on a fleet of devices at the edge?
  • How do I build, optimize, and deploy ML models to multiple edge devices?
  • How do I secure my model while deploying and running it at the edge?
  • How do I monitor my model’s performance and retrain it, if needed?
  • How do I eliminate the need to install a big framework like TensorFlow or PyTorch on my restricted device?
  • How do I expose one or multiple models with my edge application as a simple API?
  • How do I create a new dataset with the payloads and predictions captured by the edge devices?
  • How do I do all these tasks automatically (MLOps plus ML@Edge)?

In the next section, we provide answers to all these questions through example use cases and reference architectures. We also discuss which AWS services you can combine to build complete solutions for each of the explored scenarios. However, if you want to start with a very simple flow that describes how to use some of the services provided by AWS to create your ML@Edge solution, this is an example:

With SageMaker, you can easily prepare a dataset and build the ML models that are deployed to the edge devices. With Amazon SageMaker Neo, you can compile and optimize the model you trained for the specific edge device you chose. After compiling the model, you only need a light runtime to run it (provided by the service). Amazon SageMaker Edge Manager is responsible for managing the lifecycle of all ML models deployed to your fleet of edge devices. Edge Manager can manage fleets of up to millions of devices. An agent, installed on each of the edge devices, exposes the deployed ML models as an API to the application. The agent is also responsible for collecting metrics, payloads, and predictions that you can use for monitoring or for building a new dataset to retrain the model if needed. Finally, with Amazon SageMaker Pipelines, you can create an automated pipeline with all the steps required to build, optimize, and deploy ML models to your fleet of devices. This automated pipeline can then be triggered by simple events that you define, without human intervention.
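As a rough illustration of that flow, the following Boto3 sketch compiles a trained model with Neo and then packages it for Edge Manager. The bucket names, IAM role ARN, job and model names, target device, and input shape are placeholders that you would replace for your own model and hardware.

import boto3

sm_client = boto3.client("sagemaker")

# 1. Compile the trained model with SageMaker Neo for a specific edge device
sm_client.create_compilation_job(
    CompilationJobName="anomaly-detector-compilation",           # placeholder name
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder role
    InputConfig={
        "S3Uri": "s3://my-ml-bucket/models/model.tar.gz",        # placeholder model artifact
        "DataInputConfig": '{"input0": [1, 3, 224, 224]}',       # example input shape
        "Framework": "PYTORCH",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-ml-bucket/compiled/",
        "TargetDevice": "jetson_xavier",                         # choose your edge hardware
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)

# 2. Package the compiled model so the SageMaker Edge Manager agent can load it
sm_client.create_edge_packaging_job(
    EdgePackagingJobName="anomaly-detector-packaging",
    CompilationJobName="anomaly-detector-compilation",
    ModelName="anomaly-detector",
    ModelVersion="1.0",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    OutputConfig={"S3OutputLocation": "s3://my-ml-bucket/edge-packages/"},
)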

Use case 1

Let’s say an airplane manufacturer wants to detect and track parts and tools in the production hangar. To improve productivity, all the required parts and correct tools need to be available for the engineers at each stage of production. We want to be able to answer questions like: Where is part A? or Where is tool B? We have multiple IP cameras already installed and connected to a local network. The cameras cover the entire hangar and can stream real-time HD video through the network.

AWS Panorama fits nicely in this case. AWS Panorama is an ML appliance and managed service that gives you the ability to add computer vision (CV) to your existing fleet of Internet Protocol (IP) cameras and automate tasks that traditionally require human inspection and monitoring.

In the following reference architecture, we show the major components of the application running on an AWS Panorama Appliance. The Panorama Application SDK makes it easy to capture video from camera streams, perform inference with a pipeline of multiple ML models, and process the results using Python code running inside a container. You can run models from any popular ML library such as TensorFlow, PyTorch, or TensorRT. The results from the model can be integrated with business systems on your local area network, allowing you to respond to events in real time.

The solution consists of the following steps:

  1. Connect and configure an AWS Panorama device to the same local network.
  2. Train an ML model (object detection) to identify parts and tools in each frame.
  3. Build an AWS Panorama Application that gets the predictions from the ML model, applies a tracking mechanism to each object, and sends the results to a real-time database.
  4. The operators can send queries to the database to locate the parts and tools.

Use case 2

For our next use case, imagine we’re creating a dashcam for vehicles capable of supporting the driver in many situations, such as avoiding pedestrians, based on a CV25 board from Ambarella. Hosting ML models on a device with limited system resources can be difficult. In this case, let’s assume we already have a well-established over-the-air (OTA) delivery mechanism in place to deploy the application components needed on the edge device. However, we would still benefit from the ability to do OTA deployment of the model itself, thereby isolating the application lifecycle and model lifecycle.

Amazon SageMaker Edge Manager and Amazon SageMaker Neo fit well for this use case.

Edge Manager makes it easy for ML edge developers to use the same familiar tools in the cloud or on edge devices. It reduces the time and effort required to get models to production, while allowing you to continuously monitor and improve model quality across your device fleet. SageMaker Edge includes an OTA deployment mechanism that helps you deploy models on the fleet independent of the application or device firmware. The Edge Manager agent allows you to run multiple models on the same device. The agent collects prediction data based on the logic that you control, such as intervals, and uploads it to the cloud so that you can periodically retrain your models over time. SageMaker Edge cryptographically signs your models so you can verify that they weren’t tampered with as they move from the cloud to the edge device.
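To give a sense of the fleet side of this, the following Boto3 sketch creates an Edge Manager device fleet and registers a single dashcam into it. The fleet name, role ARN, bucket, device name, and IoT thing name are placeholders for illustration only.

import boto3

sm_client = boto3.client("sagemaker")

# Create a device fleet; Edge Manager stores captured data in the S3 location below
sm_client.create_device_fleet(
    DeviceFleetName="dashcam-fleet",                              # placeholder fleet name
    RoleArn="arn:aws:iam::123456789012:role/EdgeManagerRole",     # placeholder role
    OutputConfig={"S3OutputLocation": "s3://my-ml-bucket/edge-data/"},
)

# Register an individual dashcam as a device in the fleet
sm_client.register_devices(
    DeviceFleetName="dashcam-fleet",
    Devices=[
        {
            "DeviceName": "dashcam-unit-001",
            "IotThingName": "dashcam-unit-001-thing",             # placeholder IoT thing
            "Description": "CV25-based dashcam prototype",
        }
    ],
)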

Neo is a compiler as a service and an especially good fit for this use case. Neo automatically optimizes ML models for inference on cloud instances and edge devices to run faster with no loss in accuracy. You start with an ML model built with one of the supported frameworks and trained in SageMaker or anywhere else. Then you choose your target hardware platform (refer to the list of supported devices). With a single click, Neo optimizes the trained model and compiles it into a package that can be run using the lightweight SageMaker Edge runtime. The compiler uses an ML model to apply the performance optimizations that extract the best available performance for your model on the cloud instance or edge device. You then deploy the model as a SageMaker endpoint or on supported edge devices and start making predictions.

The following diagram illustrates this architecture.

The solution workflow consists of the following steps:

  1. The developer builds, trains, validates, and creates the final model artifact that needs to be deployed to the dashcam.
  2. Invoke Neo to compile the trained model.
  3. The SageMaker Edge agent is installed and configured on the Edge device, in this case the dashcam.
  4. Create a deployment package with a signed model and the runtime used by the SageMaker Edge agent to load and invoke the optimized model.
  5. Deploy the package using the existing OTA deployment mechanism.
  6. The edge application interacts with the SageMaker Edge agent to do inference.
  7. The agent can be configured (if required) to send real-time sample input data from the application for model monitoring and refinement purposes.

Use case 3

Suppose your customer is developing an application that detects anomalies in the mechanisms of a wind turbine (like the gearbox, generator, or rotor). The goal is to minimize the damage on the equipment by running local protection procedures on the fly. These turbines are very expensive and located in places that aren’t easily accessible. Each turbine can be outfitted with an NVIDIA Jetson device to monitor sensor data from the turbine. We then need a solution to capture the data and use an ML algorithm to detect anomalies. We also need an OTA mechanism to keep the software and ML models on the device up to date.

AWS IoT Greengrass V2 along with Edge Manager fit well in this use case. AWS IoT Greengrass is an open-source IoT edge runtime and cloud service that helps you build, deploy, and manage IoT applications on your devices. You can use AWS IoT Greengrass to build edge applications using pre-built software modules, called components, that can connect your edge devices to AWS services or third-party services. This ability of AWS IoT Greengrass makes it easy to deploy assets to devices, including a SageMaker Edge agent. AWS IoT Greengrass is responsible for managing the application lifecycle, while Edge Manager decouples the ML model lifecycle. This gives you the flexibility to keep evolving the whole solution by deploying new versions of the edge application and ML models independently. The following diagram illustrates this architecture.

The solution consists of the following steps:

  1. The developer builds, trains, validates, and creates the final model artifact that needs to be deployed to the wind turbine.
  2. Invoke Neo to compile the trained model.
  3. Create a model component using Edge Manager with AWS IoT Greengrass V2 integration.
  4. Set up AWS IoT Greengrass V2.
  5. Create an inference component using AWS IoT Greengrass V2.
  6. The edge application interacts with the SageMaker Edge agent to do inference.
  7. The agent can be configured (if required) to send real-time sample input data from the application for model monitoring and refinement purposes.
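Once the model and inference components exist (steps 3–5), a Greengrass V2 deployment targeting the turbines' thing group could roll them out alongside the AWS-provided Edge Manager agent component, roughly as in the following sketch. The thing group ARN, custom component name, and component versions are placeholders, not values from this post.

import boto3

gg_client = boto3.client("greengrassv2")

# Deploy the SageMaker Edge Manager agent component plus a custom inference component
# to every device in a thing group (the wind turbines)
gg_client.create_deployment(
    targetArn="arn:aws:iot:us-east-1:123456789012:thinggroup/WindTurbines",   # placeholder
    deploymentName="turbine-anomaly-detection",
    components={
        "aws.greengrass.SageMakerEdgeManager": {"componentVersion": "1.1.0"},  # placeholder version of the AWS-provided agent component
        "com.example.turbine.inference": {"componentVersion": "1.0.0"},        # hypothetical custom inference component
    },
)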

Use case 4

For our final use case, let’s look at a vessel transporting containers, where each container has a couple of sensors and streams a signal to the compute and storage infrastructure deployed locally. The challenge is that we want to know the content of each container, and the condition of the goods based on temperature, humidity, and gases inside each container. We also want to track all the goods in each one of the containers. There is no internet connectivity throughout the voyage, and the voyage can take months. The ML models running on this infrastructure should preprocess the data and generate information to answer all our questions. The data generated needs to be stored locally for months. The edge application stores all the inferences in a local database and then synchronizes the results with the cloud when the vessel approaches the port.

AWS Snowcone and AWS Snowball from the AWS Snow Family could fit very well in this use case.

AWS Snowcone is a small, rugged, and secure edge computing and data migration device. Snowcone is designed to the OSHA standard for a one-person liftable device. Snowcone enables you to run edge workloads using Amazon Elastic Compute Cloud (Amazon EC2) computing, and local storage in harsh, disconnected field environments such as oil rigs, search and rescue vehicles, military sites, or factory floors, as well as remote offices, hospitals, and movie theaters.

Snowball adds more computing when compared to Snowcone and therefore may be a great fit for more demanding applications. The Compute Optimized feature provides an optional NVIDIA Tesla V100 GPU along with EC2 instances to accelerate an application’s performance in disconnected environments. With the GPU option, you can run applications such as advanced ML and full motion video analysis in environments with little or no connectivity.

On top of the EC2 instance, you have the freedom to build and deploy any type of edge solution. For instance, you can use Amazon ECS or another container manager to deploy the edge application, the Edge Manager agent, and the ML model as individual containers. This architecture would be similar to use case 2 (except that it works offline most of the time), with the addition of a container manager tool.

The following diagram illustrates this solution architecture.

To implement this solution, simply order your Snow device from the AWS Management Console and launch your resources.

Conclusion

In this post, we discussed the different aspects of edge that you may choose to work with based on your use case. We also discussed some of the key concepts around ML@Edge and how decoupling the application lifecycle and the ML model lifecycle gives you the freedom to evolve them without any dependency on each other. We emphasized how choosing the right edge device for your workload and asking the right questions during the solution process can help you work backward and narrow down the right AWS services. We also presented different use cases along with reference architectures to inspire you to create your own solutions that will work for your workload.


About the Authors

Dinesh Kumar Subramani is a Senior Solutions Architect with the UKIR SMB team, based in Edinburgh, Scotland. He specializes in artificial intelligence and machine learning. Dinesh enjoys working with customers across industries to help them solve their problems with AWS services. Outside of work, he loves spending time with his family, playing chess and enjoying music across genres.

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has been working on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.


Text summarization with Amazon SageMaker and Hugging Face

In this post, we show you how to implement one of the most downloaded Hugging Face pre-trained models used for text summarization, DistilBART-CNN-12-6, within a Jupyter notebook using Amazon SageMaker and the SageMaker Hugging Face Inference Toolkit. Based on the steps shown in this post, you can try summarizing text from the WikiText-2 dataset managed by fast.ai, available at the Registry of Open Data on AWS.

Global data volumes are growing at zettabyte scale as companies and consumers expand their use of digital products and online services. To better understand this growing data, machine learning (ML) natural language processing (NLP) techniques for text analysis have evolved to address use cases involving text summarization, entity recognition, classification, translation, and more. AWS offers pre-trained AWS AI services that can be integrated into applications using API calls and require no ML experience. For example, Amazon Comprehend can perform NLP tasks such as custom entity recognition, sentiment analysis, key phrase extraction, topic modeling, and more to gather insights from text. It can perform text analysis on a wide variety of languages for its various features.

Text summarization is a helpful technique in understanding large amounts of text data because it creates a subset of contextually meaningful information from source documents. You can apply this NLP technique to longer-form text documents and articles, enabling quicker consumption and more effective document indexing, for example to summarize call notes from meetings.

Hugging Face is a popular open-source library for NLP, with over 49,000 pre-trained models in more than 185 languages with support for different frameworks. AWS and Hugging Face have a partnership that allows a seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK. These capabilities in SageMaker help developers and data scientists get started with NLP on AWS more easily. Processing texts with transformers in deep learning frameworks such as PyTorch is typically a complex and time-consuming task for data scientists, often leading to frustration and lack of efficiency when developing NLP projects. The rise of AI communities like Hugging Face, combined with the power of ML services in the cloud like SageMaker, accelerate and simplify the development of these text processing tasks. SageMaker helps you build, train, deploy, and operationalize Hugging Face models.

Text summarization overview

You can apply text summarization to identify key sentences within a document or identify key sentences across multiple documents. Text summarization can produce two types of summaries: extractive and abstractive. Extractive summaries don’t contain any machine-generated text and are a collection of important sentences selected from the input document. Abstractive summaries contain new human-readable phrases and sentences generated by the text summarization model. Most text summarization systems are based on extractive summarization because accurate abstractive text summarization is difficult to achieve.

Hugging Face has over 400 pre-trained state-of-the-art text summarization models available, implementing different combinations of NLP techniques. These models are trained on different datasets, uploaded and maintained by technology companies and members of the Hugging Face community. You can filter the models by most downloaded or most liked, and directly load them when using the summarization pipeline Hugging Face transformer API. The Hugging Face transformer simplifies the NLP implementation process so that high-performance NLP models can be fine-tuned to deliver text summaries, without requiring extensive ML operation knowledge.

Hugging Face text summarization models on AWS

SageMaker offers business analysts, data scientists, and MLOps engineers a choice of tools to design and operate ML workloads on AWS. These tools provide you with faster implementation and testing of ML models to achieve your optimal outcomes.

Using the SageMaker Hugging Face Inference Toolkit, an open-source library, we outline three different ways to implement and host Hugging Face text summarization models from a Jupyter notebook:

  • Hugging Face summarization pipeline – Create a Hugging Face summarization pipeline using the “summarization” task identifier to use a default text summarization model for inference within your Jupyter notebook. These pipelines abstract away the complex code, offering novice ML practitioners a simple API to quickly implement text summarization without configuring an inference endpoint. The pipeline also allows the ML practitioner to select a specific pre-trained model and its associated tokenizer. Tokenizers prepare text to be ready as an input for the model by splitting text into words or subwords, which then are converted to IDs through a lookup table. For simplicity, the following code snippet provides for the default case when using pipelines. The DistilBART-CNN-12-6 model is one of the most downloaded summarization models on Hugging Face and is the default model for the summarization pipeline. The last line calls the pre-trained model to get a summary for the passed text given the provided two arguments.

    from transformers import pipeline
    
    summarizer = pipeline("summarization")
    summarizer("An apple a day, keeps the doctor away", min_length=5, max_length=20)

  • SageMaker endpoint with pre-trained model – Create a SageMaker endpoint with a pre-trained model from the Hugging Face Model Hub and deploy it on an inference endpoint, such as the ml.m5.xlarge instance in the following code snippet. This method allows experienced ML practitioners to quickly select specific open-source models, fine-tune them, and deploy the models onto high-performing inference instances.

    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker import get_execution_role
    
    role = get_execution_role()
    
    # Hub Model configuration. https://huggingface.co/models
    hub = {
      'HF_MODEL_ID':'sshleifer/distilbart-cnn-12-6',
      'HF_TASK':'summarization'
    }
    
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        transformers_version='4.17.0',
        pytorch_version='1.10.2',
        py_version='py38',
        env=hub,
        role=role,
    )
    
    # deploy model to SageMaker Inference
    predictor = huggingface_model.deploy(initial_instance_count=1,instance_type="ml.m5.xlarge")

  • SageMaker endpoint with a trained model – Create a SageMaker model endpoint with a trained model stored in an Amazon Simple Storage Service (Amazon S3) bucket and deploy it on an inference endpoint. This method allows experienced ML practitioners to quickly deploy their own models stored on Amazon S3 onto high-performing inference instances. The model itself is downloaded from Hugging Face and compressed, and then can be uploaded to Amazon S3. This step is demonstrated in the following code snippet:

    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker import get_execution_role
    
    role = get_execution_role()
    
    # create Hugging Face Model Class
    huggingface_model = HuggingFaceModel(
        transformers_version='4.17.0',
        pytorch_version='1.10.2',
        py_version='py38',
        model_data='s3://my-trained-model/artifacts/model.tar.gz',
        role=role,
    )
    
    # deploy model to SageMaker Inference
    predictor = huggingface_model.deploy(initial_instance_count=1,instance_type="ml.m5.xlarge")

AWS has several resources available to assist you in deploying your ML workloads. The Machine Learning Lens of the AWS Well-Architected Framework recommends best practices for ML workloads, including optimizing resources and reducing cost. These recommended design principles ensure that well-architected ML workloads on AWS are deployed to production. Amazon SageMaker Inference Recommender helps you select the right instance to deploy your ML models at optimal inference performance and cost. Inference Recommender speeds up model deployment and reduces time to market by automating load testing and optimizing model performance across ML instances.

In the next sections, we demonstrate how to load a trained model from an S3 bucket and deploy it to a suitable inference instance.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Load the Hugging Face model to SageMaker for text summarization inference

Use the following code to download the Hugging Face pre-trained text summarization model DistilBART-CNN-12-6 and its tokenizer, and save them locally in SageMaker to your Jupyter notebook directory:

from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

hf_cache_dir = './models/cache/'  # local directory used to cache the downloaded model files

PRE_TRAINED_MODEL_NAME='sshleifer/distilbart-cnn-12-6'

model = BartForConditionalGeneration.from_pretrained(PRE_TRAINED_MODEL_NAME, cache_dir=hf_cache_dir)
model.save_pretrained('./models/bart_model/')

tokenizer = BartTokenizer.from_pretrained(PRE_TRAINED_MODEL_NAME)
tokenizer.save_pretrained('./models/bart_tokenizer/')

Compress the saved text summarization model and its tokenizer into tar.gz format and upload the compressed model artifact to an S3 bucket:

! tar -C models/ -czf model.tar.gz code/ bart_tokenizer/ bart_model/
from sagemaker.s3 import S3Uploader

file_key = 'model.tar.gz'
model_artifact = S3Uploader.upload(file_key,'s3://my-trained-model/artifacts')

Select an inference Docker container image to perform the text summarization inference. Define the Linux OS, PyTorch framework, and Hugging Face Transformers version, and specify the Amazon Elastic Compute Cloud (Amazon EC2) instance type to run the container.

The Docker image is available in the Amazon Elastic Container Registry (Amazon ECR) of the same AWS account, and the link for that container image is returned as a URI.

from sagemaker.image_uris import retrieve

deploy_instance_type = 'ml.m5.xlarge'

pytorch_inference_image_uri = retrieve('huggingface',
                                       region=region,
                                       version='4.6.1',
                                       instance_type=deploy_instance_type,
                                       base_framework_version='pytorch1.8.1',
                                       image_scope='inference')

Define the text summarization model to be deployed by the selected container image performing inference. In the following code snippet, the compressed model uploaded to Amazon S3 is deployed:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker import get_execution_role

role = get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://my-trained-model/artifacts/model.tar.gz", # path to your trained sagemaker model
   image_uri=pytorch_inference_image_uri,
   role=role, # iam role with permissions to create an Endpoint
   transformers_version="4.6.1", # transformers version used
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1, 
   instance_type="ml.m5.xlarge"
)

Test the deployed text summarization model on a sample input:

# example request, you need to define "inputs"
data = {
   "text": "Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days."
}

# request
predictor.predict(data)

Use Inference Recommender to evaluate the optimal EC2 instance for the inference task

Next, create multiple payload samples of input text in JSON format and compress them into a single payload file. These payload samples are used by the Inference Recommender to compare inference performance between different EC2 instance types. Each of the sample payloads must match the JSON format shown earlier. You can get examples from the WikiText-2 dataset managed by fast.ai, available at the Registry of Open Data on AWS.
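A minimal way to produce such a payload archive is sketched below: it writes a few JSON sample files that follow the same {"text": ...} format used earlier and compresses them into payload.tar.gz. The sample sentences, directory name, and file names are placeholders; in practice you would draw the text from a dataset such as WikiText-2.

import json
import tarfile
from pathlib import Path

# Placeholder sample texts; replace with sentences drawn from your own corpus
samples = [
    "Amazon SageMaker helps you build, train, and deploy machine learning models.",
    "Text summarization creates a short, contextually meaningful version of a document.",
    "Inference Recommender compares latency and cost across candidate instance types.",
]

payload_dir = Path("sample-payloads")
payload_dir.mkdir(exist_ok=True)

# Write each sample in the same JSON format expected by the deployed endpoint
for i, text in enumerate(samples):
    (payload_dir / f"payload_{i}.json").write_text(json.dumps({"text": text}))

# Compress all sample files into a single archive for Inference Recommender
with tarfile.open("payload.tar.gz", "w:gz") as tar:
    for json_file in payload_dir.glob("*.json"):
        tar.add(json_file, arcname=json_file.name)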

Upload the compressed text summarization model artifact and the compressed sample payload file to the S3 bucket. We uploaded the model in an earlier step, but for clarity we include the code to upload it again:

bucket = sagemaker.Session().default_bucket()

prefix = "sagemaker/inference-recommender"

model_archive_name = "model.tar.gz"
payload_archive_name = "payload.tar.gz"

sample_payload_url = sagemaker.Session().upload_data(
    payload_archive_name, bucket=bucket, key_prefix=prefix + "/inference"
)
model_url = sagemaker.Session().upload_data(
    model_archive_name, bucket=bucket, key_prefix=prefix + "/model"
)

Review the list of standard ML models available on SageMaker across common model zoos, such as NLP and computer vision. Select an NLP model to perform the text summarization inference:

import boto3
import pandas as pd

inference_client = boto3.client("sagemaker", region)

list_model_metadata_response = inference_client.list_model_metadata()

domains = []
frameworks = []
framework_versions = []
tasks = []
models = []

for model_summary in list_model_metadata_response["ModelMetadataSummaries"]:
    domains.append(model_summary["Domain"])
    tasks.append(model_summary["Task"])
    models.append(model_summary["Model"])
    frameworks.append(model_summary["Framework"])
    framework_versions.append(model_summary["FrameworkVersion"])

data = {
    "Domain": domains,
    "Task": tasks,
    "Framework": frameworks,
    "FrameworkVersion": framework_versions,
    "Model": models,
}

df = pd.DataFrame(data)

pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.colheader_justify", "center")
pd.set_option("display.precision", 3)

display(df.sort_values(by=["Domain", "Task", "Framework", "FrameworkVersion"]))

The following example uses the bert-base-cased NLP model. Register the text summarization model into the SageMaker model registry with the correctly identified domain, framework, and task from the previous step. The parameters for this example are shown at the beginning of the following code snippet.

Note the range of EC2 instance types to be evaluated by Inference Recommender under SupportedRealtimeInferenceInstanceTypes in the following code. Make sure that the service limits for the AWS account allow the deployment of these types of inference nodes.

ml_domain = "NATURAL_LANGUAGE_PROCESSING"
ml_task = "FILL_MASK"
model_name = "bert-base-cased"
dlc_uri = pytorch_inference_image_uri
framework = 'PYTORCH'
framework_version='1.6.0'

inference_client = boto3.client("sagemaker", region)

model_package_group_name = uuid.uuid1()

model_package_group_response = inference_client.create_model_package_group(
    ModelPackageGroupName=str(model_package_group_name), ModelPackageGroupDescription="description"
)

model_package_version_response = inference_client.create_model_package(
    ModelPackageGroupName=str(model_package_group_name),
    ModelPackageDescription="InferenceRecommenderDemo",
    Domain=ml_domain,
    Task=ml_task,
    SamplePayloadUrl=sample_payload_url,
    InferenceSpecification={
        "Containers": [
            {
                "ContainerHostname": "huggingface-pytorch",
                "Image": dlc_uri,
                "ModelDataUrl": model_url,
                "Framework": framework,
                "FrameworkVersion": framework_version,
                "NearestModelName": model_name,
                "Environment": {
                    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
                    "SAGEMAKER_PROGRAM": "inference.py",
                    "SAGEMAKER_REGION": region,
                    "SAGEMAKER_SUBMIT_DIRECTORY": model_url,
                },
            },
        ],
        "SupportedRealtimeInferenceInstanceTypes": [
            "ml.t2.xlarge",
            "ml.c5.xlarge",
            "ml.m5.xlarge",
            "ml.m5d.xlarge",
            "ml.r5.xlarge",
            "ml.inf1.xlarge",
        ],
        "SupportedContentTypes": [
            "application/json",
        ],
        "SupportedResponseMIMETypes": ["application/json"],
    },
)

Create an Inference Recommender default job using the ModelPackageVersion resulting from the previous step. The uuid Python library is used to generate a unique name for the job.

from sagemaker import get_execution_role

client = boto3.client("sagemaker", region)

role = get_execution_role()
default_job = uuid.uuid1()
default_response = client.create_inference_recommendations_job(
    JobName=str(default_job),
    JobDescription="Job Description",
    JobType="Default",
    RoleArn=role,
    InputConfig={"ModelPackageVersionArn": model_package_version_response["ModelPackageArn"]},
)

You can get the status of the Inference Recommender job by running the following code:

inference_recommender_job = client.describe_inference_recommendations_job(
        JobName=str(default_job)
)

When the job status is COMPLETED, compare the inference latency, runtime, and other metrics of the EC2 instance types evaluated by the Inference Recommender default job. Select the suitable node type based on your use case requirements.

data = [
    {**x["EndpointConfiguration"], **x["ModelConfiguration"], **x["Metrics"]}
    for x in inference_recommender_job["InferenceRecommendations"]
]
df = pd.DataFrame(data)
df.drop("VariantName", inplace=True, axis=1)
pd.set_option("max_colwidth", 400)
df.head()

Conclusion

SageMaker offers multiple ways to use Hugging Face models; for more examples, check out the AWS Samples GitHub. Depending on the complexity of the use case and the need to fine-tune the model, you can select the optimal way to use these models. The Hugging Face pipelines can be a good starting point to quickly experiment and select suitable models. When you need to customize and parameterize the selected models, you can download the models and deploy them to customized inference endpoints. To fine-tune the model more for a specific use case, you’ll need to train the model after downloading it.

NLP models in general, including text summarization models, perform better after being trained on a dataset that is specific to the use case. The MLOps and model monitoring features of SageMaker help ensure that the deployed model continues to perform within expectations. In this post, we used Inference Recommender to evaluate the best-suited instance type to deploy the text summarization model. These recommendations can optimize performance and cost for your ML use case.


About the Authors

Dr. Nidal AlBeiruti is a Senior Solutions Architect at Amazon Web Services, with a passion for machine learning solutions. Nidal has over 25 years of experience working in a variety of global IT roles at different levels and verticals. Nidal acts as a trusted advisor for many AWS customers to support and accelerate their cloud adoption journey.

Darren Ko is a Solutions Architect based in London. He advises UK and Ireland SMB customers on rearchitecting and innovating on the cloud. Darren is interested in applications built with serverless architectures and he is passionate about solving sustainability challenges with machine learning.


Take your intelligent search experience to the next level with Amazon Kendra hierarchical facets

Unstructured data continues to grow in many organizations, making it a challenge for users to get the information they need. Amazon Kendra is a highly accurate, intelligent search service powered by machine learning (ML). Amazon Kendra uses deep learning and reading comprehension to deliver precise answers, and returns a list of ranked documents that match the search query for you to choose from. To help users interactively narrow down the list of relevant documents, you can assign metadata at the time of document ingestion to provide filtering and faceting capabilities.

In a search solution with a growing number of documents, simple faceting or filtering isn’t always sufficient to enable users to really pinpoint documents with the information they’re looking for. Amazon Kendra now features hierarchical facets, which give a more granular view of the scope of the search results. Hierarchical facets offer filtering options with more details about the number of results expected for each option, and allow users to further narrow their search, pinpointing their documents of interest quickly.

In this post, we demonstrate what hierarchical facets in Amazon Kendra can do. We first ingest a set of documents, along with their metadata, into an Amazon Kendra index. We then make search queries using both simple and hierarchical facets, and add filtering to get straight to the documents of interest.

Solution overview

Instead of presenting each facet individually as a list, hierarchical facets let you define a parent-child relationship between facets to shape the scope of the search results. With this, you see the number of results that not only have a particular facet, but also have each of the sub-facets. Let’s take the example of a repository of AWS documents of types User_Guides, Reference_Guides, and Release_Notes, regarding compute, storage, and database technologies.

First let’s look at non-hierarchical facets from the response to a search query:

Technology
  Databases:23
  Storage:22
  Compute:15
Document_Type
  User_Guides:37
  Reference_Guides:18
  Release_Notes:5

Here we know the number of search results in each of the technologies, as well as in each of the document types. However, we don’t know, for example, how many results to expect from User_Guides related to Storage, except that it’s at most 22, the smaller of the two counts (User_Guides:37 and Storage:22).

Now let’s look at hierarchical facets from the response to the same search query:

Technology
  Databases:23
    Document_Type
      User_Guides:12
      Reference_Guides:7
      Release_Notes:4
  Storage:22
    Document_Type
      User_Guides:16
      Reference_Guides:6
  Compute:15
    Document_Type
      User_Guides:9
      Reference_Guides:5
      Release_Notes:1

With hierarchical facets, we get more information: the number of results from each document type within each technology. With this additional information, we know that there are 16 results from User_Guides related to Storage.
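To request facets in this nested form, the Facets parameter of the Amazon Kendra Query API accepts a parent facet that itself contains a Facets list. The following sketch shows that structure using the attribute names from this example; kclient, indexid, and kquery are assumed to be defined as in facet-search-query.py, which uses a parameter along these lines for its hierarchical queries.

fac_hierarchical = [
    {
        "DocumentAttributeKey": "Technology",
        "Facets": [
            { "DocumentAttributeKey": "Document_Type" }
        ],
    }
]

# kclient is a Boto3 Amazon Kendra client; indexid and kquery are defined as in facet-search-query.py
response = kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac_hierarchical)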

In the subsequent sections, we use this example to demonstrate how hierarchical facets narrow down search results, along with step-by-step instructions you can follow to try this out in your own AWS account. If you just want to read about this feature without running it yourself, you can refer to the Python script facet-search-query.py used in this post and its output output.txt, and then jump to the section Search and filtering with facets without hierarchy.

Prerequisites

To deploy and experiment with the solution in this post, make sure that you have the following:

Set up the infrastructure and run the Python script to query the Amazon Kendra index

To set up the solution, complete the following steps:

  1. Use the AWS Management Console for Amazon S3 to create an S3 bucket to use as a data source to store the sample documents.
  2. On the AWS Management Console, start CloudShell by choosing the shell icon on the navigation bar.
    Alternatively, you can run the Python script from any computer that has the AWS SDK for Python (Boto3) installed and an AWS account with access to the Amazon Kendra index. Make sure to update Boto3 on your computer. For simplicity, the step-by-step instructions in this post focus on CloudShell.
  3. After CloudShell starts, download facet-search-query.py to your local machine.
  4. Upload the script to your CloudShell by switching to the CloudShell tab, choosing the Actions menu, and choosing Upload file.
  5. Download hierarchical-facets-data.zip to your local machine, unzip it, and upload the entire directory structure to your S3 bucket.
  6. If you’re not using an existing Amazon Kendra index, create a new Amazon Kendra index.
  7. On the Amazon Kendra console, open your index.
  8. In the navigation pane, choose Facet definition.
  9. Choose Add field.
  10. Configure the field Document_Type and choose Add.
  11. Configure the field Technology and choose Add.
  12. Configure your S3 bucket as a data source to the Amazon Kendra index you just created.
  13. Sync the data source and wait for the sync to complete.
  14. Switch to the CloudShell tab.
  15. Update Boto3 by running pip3 install boto3==1.23.1 --upgrade.
    This ensures that CloudShell has a version of Boto3 that supports hierarchical facets.
  16. Edit facet-search-query.py and replace REPLACE-WITH-YOUR-AMAZON-KENDRA-INDEX-ID with your Amazon Kendra index ID.
    You can get the index ID by opening your index details on the Amazon Kendra console.
  17. In the CloudShell prompt, run facet-search-query.py using the command python3 facet-search-query.py | tee output.txt.

If this step fails with the error Unknown parameter in Facets[0]: "Facets", must be one of: DocumentAttributeKey, your CloudShell environment is still using an older version of Boto3. Choose the Actions menu, and choose Delete AWS CloudShell home directory. Then repeat the steps to download facet-search-query.py, update Boto3, edit facet-search-query.py, and run it again. If you have any other data in the CloudShell home directory, back it up before deleting the home directory.

For convenience, all the steps are included in one Python script. You can read facet-search-query.py and experiment by copying parts of it into your own scripts. Open output.txt to review the search results.

Search and filtering with facets without hierarchy

Let’s start by querying with facets having no hierarchy. In this case, the facets parameter used in the query only provides the information that the results in the response should be faceted using two attributes: Technology and Document_Type. See the following code:

fac0 = [
    { "DocumentAttributeKey":"Technology" },
    { "DocumentAttributeKey":"Document_Type" }
]

This is used as a parameter to the query API call:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac0)

The formatted version of the response is as follows:

Query:  How to encrypt data?
Number of results: 62
Document Title:  developerguide
Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Technology
    Databases:23
    Storage:22
    Compute:16
  Document_Type
    User_Guides:37
    Reference_Guides:19
    Release_Notes:5
======================================================================

The first result from the response is from a User_Guide about Databases. The facets below the result show the number of results for Technology and Document_Type present in the response.
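
If you want to format the facets yourself rather than rely on the provided script, you can walk the FacetResults field of the query response. The following is a minimal sketch, assuming the Boto3 Amazon Kendra client kclient and the variables indexid, kquery, and fac0 defined in the script; the nested FacetResults entries only appear when you request hierarchical facets.

def print_facets(facet_results, indent=2):
    """Recursively print FacetResults from an Amazon Kendra query response."""
    for facet in facet_results:
        print(" " * indent + facet["DocumentAttributeKey"])
        for pair in facet.get("DocumentAttributeValueCountPairs", []):
            value = pair["DocumentAttributeValue"]["StringValue"]
            print(" " * (indent + 2) + value + ":" + str(pair["Count"]))
            # Nested facets are only present for hierarchical facet requests
            print_facets(pair.get("FacetResults", []), indent + 4)

response = kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac0)
print("Facets:")
print_facets(response.get("FacetResults", []))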

Let’s narrow down these results to be only from User_Guides and Storage by setting the filter as follows:

att_filter0 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Storage"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "User_Guides"
                }
            }
        }
    ]
}

Now let’s make a query call using the facets without hierarchy and the preceding filter:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac0, AttributeFilter=att_filter0)

A formatted version of the response is as follows:

Query:  How to encrypt data?
Query Filter: Technology: Storage AND Document_Type: User_Guides
Number of results: 18
Document Title:  efs-ug
Document Attributes:
  Document_Type: User_Guides
  Technology: Storage
Document Excerpt:
  ,             "Action": [                 "kms:Describe*",
  "kms:Get*",                 "kms:List*",
  "kms:RevokeGrant"             ],             "Resource": "*"
  }     ] }   Encrypting data in transit You can encrypt data in
  transit using an Amazon EFS file sys
----------------------------------------------------------------------
Facets:
  Technology
    Storage:16
  Document_Type
    User_Guides:16

The response contains 16 results from User_Guides on Storage. Based on the non-hierarchical facets in the response without filters, we only knew to expect fewer than 22 results.

Search and filtering with hierarchical facets with Document_Type as a sub-facet of Technology

Now let's run a query using hierarchical facets, with Document_Type defined as a sub-facet of Technology. This hierarchical relationship is important for a Technology-focused user such as an engineer. Note the nested facets in the following definition. The MaxResults parameter limits the output to the top MaxResults facet values. For our example, there are only three values each for Technology and Document_Type, so this parameter isn't particularly useful here; it becomes relevant when the number of facet values is high.

fac1 = [{
    "DocumentAttributeKey":"Technology",
    "Facets":[{
        "DocumentAttributeKey":"Document_Type",
        "MaxResults": max_results
    }],
}]

The query API call is made as follows:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac1)

The formatted version of the response is as follows:

Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Technology
    Databases:23
      Document_Type
        User_Guides:12
        Reference_Guides:7
        Release_Notes:4
    Storage:22
      Document_Type
        User_Guides:16
        Reference_Guides:6
    Compute:16
      Document_Type
        User_Guides:9
        Reference_Guides:6
        Release_Notes:1
======================================================================

The results are grouped by the Technology facet, followed by Document_Type. In this case, looking at the facets, we know that 16 results are from User_Guides about Storage and 7 are from Reference_Guides related to Databases.

Let’s narrow down these results to be only from Reference_Guides related to Databases using the following filter:

att_filter1 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Databases"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "Reference_Guides"
                }
            }
        }
    ]
}

Now let’s make a query API call using the hierarchical facets with this filter:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac1, AttributeFilter=att_filter1)

The formatted response to this is as follows:

Query:  How to encrypt data?
Query Filter: Technology: Databases AND Document_Type: Reference_Guides
Number of results: 7
Document Title:  redshift-api
Document Attributes:
  Document_Type: Reference_Guides
  Technology: Databases
Document Excerpt:
  ...Constraints: Maximum length of 2147483647.   Required: No
  KmsKeyId   The AWS Key Management Service (KMS) key ID of the
  encryption key that you want to use to encrypt data in the cluster.
  Type: String   Length Constraints: Maximum length of 2147483647.
  Required: No LoadSampleData   A flag...
----------------------------------------------------------------------
Facets:
  Technology
    Databases:7
      Document_Type
        Reference_Guides:7
======================================================================

From the facets of this response, there are seven results, all from Reference_Guides related to Databases, exactly as we knew before making the query.

Search and filtering with hierarchical facets with Technology as a sub-facet of Document_Type

You can choose the hierarchical relationship between different facets at the time of querying. Let’s define Technology as the sub-facet of Document_Type, as shown in the following code. This hierarchical relationship would be important for a Document_Type-focused user such as a technical writer.

fac2 = [{
    "DocumentAttributeKey":"Document_Type",
    "Facets":[{
        "DocumentAttributeKey":"Technology",
        "MaxResults": max_results
    }]
}]

The query API call is made as follows:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac2)

The formatted response to this is as follows:

Query:  How to encrypt data?
Number of results: 62
Document Title:  developerguide
Document Attributes:
  Document_Type: User_Guides
  Technology: Databases
Document Excerpt:
  4. Choose the option that you want for encryption at rest. Whichever
  option you choose, you can't   change it after the cluster is
  created. • To encrypt data at rest in this cluster, choose Enable
  encryption. • If you don't want to encrypt data at rest in this
  cluster, choose Disable encryption.
----------------------------------------------------------------------
Facets:
  Document_Type
    User_Guides:37
      Technology
        Storage:16
        Databases:12
        Compute:9
    Reference_Guides:19
      Technology
        Databases:7
        Compute:6
        Storage:6
    Release_Notes:5
      Technology
        Databases:4
        Compute:1
======================================================================

The results are grouped by their Document_Type, followed by Technology. In other words, reversing the hierarchical relationship transposes the matrix of result counts shown by the preceding facets. Six results are from Reference_Guides related to Compute. Let's define the filter as follows:

att_filter2 = {
    "AndAllFilters": [
        {
            "EqualsTo":{
                "Key": "Document_Type",
                "Value": {
                    "StringValue": "Reference_Guides"
                }
            }
        },
        {
            "EqualsTo":{
                "Key": "Technology",
                "Value": {
                    "StringValue": "Compute"
                }
            }
        }
    ]
}

We use this filter to make the query API call:

kclient.query(IndexId=indexid, QueryText=kquery, Facets=fac2, AttributeFilter=att_filter2)

The formatted response to this is as follows:

Query:  How to encrypt data?
Query Filter: Document_Type: Reference_Guides AND Technology:Compute
Number of results: 7
Document Title:  ecr-api
Document Attributes:
  Document_Type: Reference_Guides
  Technology: Compute
Document Excerpt:
  When you use AWS KMS to encrypt your data, you can either use the
  default AWS managed AWS KMS key for Amazon ECR, or specify your own
  AWS KMS key, which you already created. For more information, see
  Protecting data using server-side encryption with an AWS KMS key
  stored in AWS Key Management Service
----------------------------------------------------------------------
Facets:
  Document_Type
    Reference_Guides:6
      Technology
        Compute:6
======================================================================

The results contain six Reference_Guides related to Compute, exactly as we knew before running the query.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Amazon S3, delete that data source. If you created an Amazon S3 bucket to store the data used, delete that as well.
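
If you prefer to clean up programmatically, the following is a minimal sketch with Boto3; the index ID, data source ID, and bucket name are placeholders you need to replace with your own values, and you should only delete resources you created for this walkthrough.

import boto3

kendra = boto3.client("kendra")

# Placeholder IDs; replace with your own index and data source IDs
index_id = "REPLACE-WITH-YOUR-INDEX-ID"
data_source_id = "REPLACE-WITH-YOUR-DATA-SOURCE-ID"

# Delete only the S3 data source if you added it to an existing index
kendra.delete_data_source(Id=data_source_id, IndexId=index_id)

# Delete the index only if you created it just for this walkthrough
kendra.delete_index(Id=index_id)

# Empty and delete the S3 bucket created for the sample documents
bucket = boto3.resource("s3").Bucket("REPLACE-WITH-YOUR-BUCKET-NAME")
bucket.objects.all().delete()
bucket.delete()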

Conclusion

You can use Amazon Kendra hierarchical facets to define a hierarchical relationship between attributes to provide granular information about the scope of the results in the response to a query. This enables you to make an informed filtering choice to narrow down the search results and find the documents you’re looking for quickly.

To learn more about facets and filters in Amazon Kendra, refer to Filtering queries.

For more information on how you can automatically create, modify, or delete metadata, which you can use for faceting the search results, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Authors

Abhinav Jawadekar is a Principal Solutions Architect focused on Amazon Kendra in the AI/ML language services team at AWS. Abhinav works with AWS customers and partners to help them build intelligent search solutions on AWS.

Ji Kim is a Software Development Engineer at Amazon Web Services and is a member of the Amazon Kendra team.

Read More

Easily customize your notifications while using Amazon Lookout for Metrics

We are excited to announce that you can now add filters to alerts and also edit existing alerts while using Amazon Lookout for Metrics. With this launch, you can add filters to your alert configuration to only get notifications for the anomalies that matter most to you. You can also modify existing alerts as your notification needs evolve.

Lookout for Metrics uses machine learning (ML) to automatically monitor the metrics that are most important to businesses with greater speed and accuracy. The service also makes it easier to diagnose the root cause of anomalies like unexpected dips in revenue, high rates of abandoned shopping carts, spikes in payment transaction failures, increases in new user signups, and many more. Lookout for Metrics goes beyond simple anomaly detection. It allows developers to set up autonomous monitoring for important metrics to detect anomalies and identify their root cause in a matter of a few clicks, using the same technology used by Amazon internally to detect anomalies in its metrics—all with no ML experience required.

Alerts are an optional feature that lets you set up notifications for anomalies in your datasets, which are sent through Amazon Simple Notification Service (Amazon SNS) and AWS Lambda functions. Previously, when you set up an alert, you were notified on all detected anomalies above the severity score you selected, which made it challenging to quickly identify the most relevant anomalies to your business. Now, by implementing filters and edits in the alert system, different business units within your organization are able to specify the types of alerts they receive. Your developers can benefit from this feature by receiving alerts on anomalies related to the development of their service, while your business analysts and business managers can track anomalies related to the status of their business, such as a location that is underperforming. For example, you may set up an alert to get notified when there is a spike or drop in your revenue. But you may only be interested in a specific store location and in a particular product. The filtering capability allows you to get alerted only when a revenue anomaly fits the criteria you have set.

Solution overview

In this post, we demonstrate how to create an alert with filters and how the configured filters publish notifications only for anomalies matching the filter criteria. The alert filters are based on metrics and dimensions that are present in the dataset definition for the anomaly detector. The solution enables you to use alert filters to get targeted notifications for anomalies detected in your data. The following diagram illustrates the solution architecture.

Provision resources with AWS CloudFormation

You can use the provided AWS CloudFormation stack to set up resources for the walkthrough. It contains resources to continuously generate live data and publish it to Amazon S3, create a detector (named TestAlertFilters), and add a dataset (named AlertFiltersDataset) to the detector. Complete the following steps:

  1. Choose Launch Stack:
  2. Choose Next.
  3. Enter a stack name (for example, L4MAlertFiltersStack).
  4. Enter the values for the detector (TestAlertFilters) and dataset (AlertFiltersDataset).
  5. Choose Next.
  6. Leave the settings for Configure stack options at their defaults and choose Next.
  7. Select the acknowledgement check box and choose Create stack.

Activate the detector created by the CloudFormation template

To set up your detector, complete the following steps:

  1. On the Lookout for Metrics console, choose Detectors in the navigation pane.
  2. Select the detector TestAlertFilters and choose View details.
  3. To activate the detector, you can either choose Activate at the top or choose Activate detector under How it works.
  4. Choose Activate to confirm if you want to activate the detector for continuous detection.

A confirmation message shows that the detector is activating. Activation can take up to 1 hour to complete. In the meantime, we can proceed with alert configuration.
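
If you'd rather script this step, you can also activate the detector with Boto3. The following is a minimal sketch; the detector ARN is a placeholder that you can copy from the detector details page.

import boto3

l4m = boto3.client("lookoutmetrics")

# Placeholder ARN; copy the real value from the TestAlertFilters detector details page
detector_arn = "arn:aws:lookoutmetrics:REGION:ACCOUNT_ID:AnomalyDetector:TestAlertFilters"

# Start continuous detection; activation can take up to an hour
l4m.activate_anomaly_detector(AnomalyDetectorArn=detector_arn)

# Optionally check the status until it reaches ACTIVE
print(l4m.describe_anomaly_detector(AnomalyDetectorArn=detector_arn)["Status"])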

Configure your alert

We now configure an alert to get notifications for anomalies detected by the detector. Alert filters are optional configurations, and you can select up to 5 measures and 5 dimensions while adding filters. In this post, we walk through creating an alert with filters. Complete the following steps:

  1. On your detector details page, choose Add alerts.
  2. Confirm your alert name.
    Lookout for Metrics populates the configuration fields with the metrics and dimensions supplied during dataset creation. In this release, the Severity score field is optional, whereas it was previously required. By default, we start with a severity score of 70, which you can change or remove.
  3. To add a measure, choose Add criteria and choose Measure.
  4. For Measure EQUALS, choose the revenue measure.
  5. Choose Add criteria again and choose Dimension.

    You can choose up to 5 dimension filters. For this post, we configure two.
  6. For Dimension, choose the marketplace dimension.
  7. For Equals, add the values US and CA.
  8. Add category as your second dimension with the values fashion and jewellery.
  9. For Severity score, enter 20.
  10. For Channel, choose Amazon SNS.
  11. Choose your SNS topic (for this post, we use the SNS topic to which we already subscribed our email to receive the alert notifications).
  12. Choose your format (for this post, we choose Long Text).
  13. Under Service access, select Use an existing service role and choose your role.
  14. Choose Add alert.

    A message appears when the alert is created successfully.
  15. Select the alert and choose View details.

You can review the alert filters and other details. The Filter criteria explains how the configured filters are used to filter anomalies before publishing alert notifications.
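
You can also create the same alert programmatically. The following is a hedged sketch using the Boto3 create_alert call with the AlertFilters parameter introduced with this launch; the detector, SNS topic, and service role ARNs are placeholders, and you should check the latest Boto3 documentation for the exact parameter shapes.

import boto3

l4m = boto3.client("lookoutmetrics")

# Placeholder ARNs; replace with your own resources
detector_arn = "arn:aws:lookoutmetrics:REGION:ACCOUNT_ID:AnomalyDetector:TestAlertFilters"
sns_topic_arn = "arn:aws:sns:REGION:ACCOUNT_ID:YOUR-TOPIC"
sns_role_arn = "arn:aws:iam::ACCOUNT_ID:role/YOUR-L4M-SNS-ROLE"

l4m.create_alert(
    AlertName="testRevenueForFashionOrJewelleryInUSOrCA",
    AlertSensitivityThreshold=20,  # severity score threshold
    AnomalyDetectorArn=detector_arn,
    Action={
        "SNSConfiguration": {
            "RoleArn": sns_role_arn,
            "SnsTopicArn": sns_topic_arn,
            "SnsFormat": "LONG_TEXT",
        }
    },
    AlertFilters={
        "MetricList": ["revenue"],
        "DimensionFilterList": [
            {"DimensionName": "marketplace", "DimensionValueList": ["US", "CA"]},
            {"DimensionName": "category", "DimensionValueList": ["fashion", "jewellery"]},
        ],
    },
)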

If you want to modify the alert configuration, select the alert on the Alerts page and choose Edit.

Alternatively, you can open the alert details page and choose Edit.

You’re redirected to the Edit page, where you can modify the alert configuration as required. You can modify the same configurations you set when you created the alert, but you can’t change the alert name while editing.

Review and analyze the results

When Lookout for Metrics detects anomalies in your data, it sends a notification if alerts were configured on that detector. If the anomaly group details match the filter criteria (measure filter, dimension filter, and severity score) of the alert, a notification is published.

For this example, we created two alerts on the detector, testAlertWithNoFilters and testRevenueForFashionOrJewelleryInUSOrCA, and injected anomalies in our data. We also enabled email subscription on the SNS topic used for alert notification publishing. The following screenshots show the details for each alert.

The following is an example of an anomaly notification for testRevenueForFashionOrJewelleryInUSOrCA:

{
"Type" : "Notification",
 "MessageId" : "0b0a7bfe-d029-5f4f-b706-20f644793c3d",
 "TopicArn" : "arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic",
 "Message" : "[Amazon LookoutForMetrics] The anomaly detector TestAlertFilters detected 
             an anomaly in revenue with a severity score of 77.3 on May 25, 2022 at 8:05 PM.
             \nAnomalous graphs were detected for the following:\n
             \nrevenue for: jewellery, thirdParty, CA, regular, priority\n
             \nrevenue for: electronics, self, MX, premium, overnight\n
             \nrevenue for: electronics, self, US, regular, overnight\n
             \nTo view the anomaly, visit the Lookout for Metrics console at: 
             https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/anomalies/anomaly/bd0a07e1-c520-46bd-aaa3-dcc00583d707
             \nTo modify settings for this alert: https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/alerts/alertDetails/arn:aws:lookoutmetrics:us-west-2:488415817882:Alert:testRevenueForFashionOrJewelleryInUSOrCA",
 "Timestamp" : "2022-05-25T20:31:12.330Z",
 "SignatureVersion" : "1",
 "Signature" : "pFDZj3TwLrL9rqjkRiVgbWjcrPhxz5PDV485d6NroLXWhrviX7sUEQqOIL5j8YYd0SFBjFEkrZKZ27RSbd+33sRhJ52mmd1eR23cZQP68+iIVdpeWubcPgGnqxoOa3APE1WZr4SmVK/bgJAjX1RXn0rKZvPzwDkxPD2fZB4gnbqPJ8GBw/1dxU5qfJzRpkqc87d1gpvQIwMpb5uUROuPZEQVyaR/By0BTsflkE2Sz2mOeZQkMaXz3q9dwX/qDxyR9q6gNviMagGtOLwtb6StN8/PUYlvK9fCBcJnJxg0bdmMtnXiXWdl1O7J50Wqj4Tkl8amph97UlVAnComoe649g==",
 "SigningCertURL" : "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-7ff5318490ec183fbaddaa2a969abfda.pem",
 "UnsubscribeURL" : "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic:8f24ae74-b160-44c7-8bc9-96a30e27d365"
}

The following is an example of an anomaly notification for testAlertWithNoFilters:

{
 "Type" : "Notification",
 "MessageId" : "fcc70263-f2c1-52ed-81ec-596b8c399b67",
 "TopicArn" : "arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic",
 "Message" : "[Amazon LookoutForMetrics] The anomaly detector TestAlertFilters detected 
             an anomaly in revenue with a severity score of 77.59 on May 25, 2022 at 6:35 PM.
             \nAnomalous graphs were detected for the following:\n
             \nrevenue for: jewellery, self, UK, regular, overnight\n
             \nrevenue for: jewellery, thirdParty, JP, premium, overnight\n
             \nrevenue for: electronics, thirdParty, DE, premium, priority\n
             \nTo view the anomaly, visit the Lookout for Metrics console at: 
             https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/anomalies/anomaly/194c87f4-3312-420c-8920-12fbfc9b1700
             \nTo modify settings for this alert: https://us-west-2.console.aws.amazon.com/lookoutmetrics/home?region=us-west-2#arn:aws:lookoutmetrics:us-west-2:488415817882:AnomalyDetector:TestAlertFilters/alerts/alertDetails/arn:aws:lookoutmetrics:us-west-2:488415817882:Alert:testAlertWithNoFilters",
 "Timestamp" : "2022-05-25T19:00:08.374Z",
 "SignatureVersion" : "1",
 "Signature" : "e4+BHo4eh8wNbfQMaR3L8MWY2wkpqxoxKKrj2h/QROQHvhcnYfucYchjfppgjM8LNIF7Oo4QfuP6qcLj9DlghiMZ80qpzHyAH6vmIDfSjK7Bz23i8rnIMyKJIVRFN8z69YlC9vfsp3MayWyyMJcskeVJ1bzsdkDIeA5gkT1le8yh/9nhbsgwm+bowNjsnl+/sFwk6QZJlplYB27sOqegrm73nH/CrmTe4FcPtekCRysSECwMLKazPJqR1uiGagnWfUeyTptRg9rVQVQJJdmOUwlv8vodR96s52btAegpY4iZZLUJ87vs1PwOwVfTTIHf+pdnwPUuFupzejUEudP7sQ==",
 "SigningCertURL" : "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-7ff5318490ec183fbaddaa2a969abfda.pem",
 "UnsubscribeURL" : "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:us-west-2:488415817882:filterAlertsDemoTopic:8f24ae74-b160-44c7-8bc9-96a30e27d365"
}

We didn't receive the notification for this anomaly through the testRevenueForFashionOrJewelleryInUSOrCA alert because the anomaly group details don't match the filter criteria for the marketplace dimension. For our filter criteria on the measure revenue, the dimension marketplace must equal US or CA, and the dimension category must equal fashion or jewellery, with a severity threshold of 20.

Although the anomaly detected matches the filter criteria for the measure, severity score, and category dimension, it doesn’t match the criteria for the marketplace dimension, so the alert wasn’t published.

Based on the notifications we received, we can confirm that Lookout for Metrics detected anomalies and verified the alert filter-based notifications.

Clean up

After you complete the testing, you can delete the CloudFormation stack created by the template. Deleting the stack cleans up all the resources created for this test. To delete the stack, open the AWS CloudFormation console, select the stack L4MAlertFiltersStack, and choose Delete.

Deletion of the stack doesn’t delete the S3 bucket created by the template because it’s not empty; you have to delete it manually.

Conclusion

You can now easily customize your notification experience by adding filters and editing existing alerts to reduce noise and focus on the metrics that matter the most to your business.

To learn more about this capability, see Working with Alerts. You can use this capability in all Regions where Lookout for Metrics is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Alex Kim is a Sr. Product Manager for AWS AI Services. His mission is to deliver AI/ML solutions to all customers who can benefit from it. In his free time, he enjoys all types of sports and discovering new places to eat.

Utkarsh Dubey is a Software Development Engineer in the Lookout for Metrics team. His interests lie in building scalable distributed systems. In his spare time, he enjoys traveling and catching up with friends.

Read More

Use a pre-signed URL to provide your business analysts with secure access to Amazon SageMaker Canvas

Agility and security have historically been two aspects of IT of paramount importance for any company. With the simplification of access to advanced IT technologies thanks to low-code and no-code (LCNC) tools, an even bigger number of people must be able to access resources without impacting security. For many companies, the solution has been to develop a company web portal, which simplifies access to cloud applications and resources by redirecting to or embedding applications, so that employees have a single point of access to the services they use most.

In this post, we suggest an architecture for a company with an existing web portal to generate a pre-signed URL that redirects to Amazon SageMaker Canvas, a visual point-and-click interface for business analysts to build machine learning (ML) models and generate accurate predictions without writing code or having any previous ML experience. With this approach, business analysts never have to log in via the AWS Management Console.

Solution overview

The solution architecture is composed of three main parts:

  • The company web portal, with its own system for authentication of users and other resources.
  • An AWS Lambda function, responsible for calling the Amazon SageMaker API through the AWS SDK. This function is called directly via its function URL, a simple way to assign an HTTP(S) endpoint to the Lambda function, without the need for a REST API.
  • The Canvas app.

The following diagram illustrates the solution workflow.

The flow has four steps:

  1. The business analyst accesses the company portal, (optionally) authenticates, then chooses to generate a Canvas URL.
  2. The Lambda function receives information about the user from the company portal, and uses it to call SageMaker via an AWS SDK to generate a presigned Canvas URL. For this post, we use the AWS SDK for Python (Boto3).
  3. The generated URL is sent back to the business analyst through the company portal.
  4. The business analyst can then choose that link to access Canvas directly, without having to access the console.

Prerequisites

Before you implement the solution architecture, make sure that you have correctly onboarded to an Amazon SageMaker Studio domain using AWS Identity and Access Management (IAM). For instructions, refer to Onboard to Amazon SageMaker Domain Using IAM. IAM as the authentication method is a strict requirement, because the CreatePresignedDomainUrl API requires it and won't work with AWS Single Sign-On authentication for your domain. Also, make sure you have created at least one user profile for your Studio domain.

Deploy the solution

The first step is to create the Lambda function.

  1. On the Lambda console, choose Create function.
  2. For Name, enter a name (for this post, canvas-presignedURL).
  3. For Runtime, choose Python 3.9.
  4. For Architecture, select your preferred architecture (for this post, we select arm64).
  5. Under Permissions, expand Change default execution role.
  6. Select Create a new role with basic Lambda permissions.
    We change the Lambda permissions in a later step.
  7. Under Advanced settings, select Enable function URL.
  8. For Auth type, select NONE.
    For this post, we don’t provide authentication details to our requests. However, this isn’t a best practice and it’s not advised for production workloads. We suggest using IAM authentication for your Lambda function, or another method for authentication and authorization such as Amazon Cognito.
  9. If your domain runs in a VPC, select Enable VPC to access those private resources.
  10. Choose Create function.
    Function creation takes a few seconds to complete. You can now set up the permissions to run SageMaker calls.
  11. On the Configuration tab, choose Permissions in the left pane.
  12. Choose your role name.

    You’re redirected to the IAM console.
  13. Choose Add permissions.
  14. Choose Create inline policy.
  15. For Service, choose SageMaker.
  16. For Actions, choose CreatePresignedDomainUrl.
  17. For Resources, select Any in this account.
  18. Choose Review.
  19. Enter a name for the policy (for this post, CanvasPresignedURLsFromLambda).
  20. Choose Create policy.
    The policy is now created and assigned to the role. You can close the IAM console tab and return to the Lambda console. Now it's time to change our code base to run a call to SageMaker. We use the Boto3 call create_presigned_domain_url.
  21. On the Code tab, replace the code inside the lambda_function.py file with the following:
    import json
    import boto3
    
    sagemaker = boto3.client('sagemaker')
    SESSION_EXPIRATION_IN_SECONDS = 8*60*60 # the session will be valid for 8 hours
    URL_TIME_TO_LIVE_IN_SECONDS = 60 # the URL is only valid for 60 seconds
    
    def lambda_handler(event, context):
        
        # Parse the event body
        body = json.loads(event['body'])
        
        # Pass the domain ID and user profile name as part of the request
        domain_id = body['domain_id']
        user_profile_name = body['user_profile_name']
        
        # Call the service to create the URL
        response = sagemaker.create_presigned_domain_url(
            DomainId=domain_id,
            UserProfileName=user_profile_name,
            SessionExpirationDurationInSeconds=SESSION_EXPIRATION_IN_SECONDS,
            ExpiresInSeconds=URL_TIME_TO_LIVE_IN_SECONDS
        )
        studio_url = response['AuthorizedUrl']
        
        # Add the redirect to Canvas
        canvas_url = studio_url + '&redirect=Canvas'
        
        # Return to the app
        return {
            'statusCode': 200,
            'body': json.dumps(canvas_url)
        }

    The preceding code consists of three main steps:

    • Parsing the body of the request and retrieving the Studio domain ID and user profile name
    • Calling the API with this information
    • Adding the redirection to Canvas and returning the result

    Now that the function is ready, let’s test it.

  22. Choose Deploy, then choose Test.
  23. In the test event configuration, provide the following event JSON, substituting the correct values:
    {
      "body": "{"domain_id": "<YOUR-DOMAIN-ID>","user_profile_name": "<YOUR-USER-PROFILE>"}"
    }

  24. Save the test event and choose Test again.

Your result should now be available in the body of your response.

You can now test this with your HTTP request tool of choice, such as curl or Postman, before integrating it into your existing company web portal. The following screenshot shows a Postman POST request to the Lambda function URL created in the previous steps, and the response payload containing the pre-signed URL.
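
If you prefer to test from Python rather than Postman or curl, the following is a minimal sketch using only the standard library; the function URL, domain ID, and user profile name are placeholders.

import json
import urllib.request

# Placeholders: your Lambda function URL and Studio domain details
function_url = "https://<YOUR-FUNCTION-URL-ID>.lambda-url.<REGION>.on.aws/"
payload = json.dumps({
    "domain_id": "<YOUR-DOMAIN-ID>",
    "user_profile_name": "<YOUR-USER-PROFILE>",
}).encode("utf-8")

request = urllib.request.Request(
    function_url,
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

with urllib.request.urlopen(request) as response:
    # The function returns the pre-signed Canvas URL as a JSON-encoded string
    canvas_url = json.loads(response.read().decode("utf-8"))
    print(canvas_url)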

The following screenshot shows an example of a (simplified) company web portal that, upon login, generates a pre-signed URL to access Amazon SageMaker Canvas.

Conclusion

In this post, we discussed a solution to help business analysts experience no-code ML via Canvas in a secured and unified way through their company web portal, without the need to allow access via the console. We used a Lambda function to generate a presigned URL, which the business analyst can use directly in their browser.

To make this solution production-ready, we suggest considering how to implement authentication and authorization, either via IAM authentication of Lambda functions with function URLs, or more advanced solutions based on Amazon API Gateway, such as API Gateway Lambda authorizers. For more information, refer to Security and auth model for Lambda function URLs.

If you haven't built your company web portal yet, you might want to check out AWS Amplify Studio, a visual development environment that lets developers easily build and ship complete web and mobile apps in hours instead of weeks. With Amplify Studio, you can quickly build an app backend, create rich user interface (UI) components, and connect a UI to the backend with minimal coding.

To learn more about Canvas, check out Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts.


About the Author

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Read More

Enable business analysts to access Amazon SageMaker Canvas without using the AWS Management Console with AWS SSO

IT has evolved in recent years: thanks to low-code and no-code (LCNC) technologies, an increasing number of people with varying backgrounds require access to tools and platforms that were previously the prerogative of more tech-savvy individuals in the company, such as engineers or developers.

Out of those LCNC technologies, we have recently announced Amazon SageMaker Canvas, a visual point-and-click interface for business analysts to build machine learning (ML) models and generate accurate predictions without writing code or having any previous ML experience.

To enable agility for those new users while ensuring security of the environments, many companies have chosen to adopt single sign-on technology, such as AWS Single Sign-On. AWS SSO is a cloud-based single sign-on service that makes it easy to centrally manage SSO access to all your AWS accounts and cloud applications. It includes a user portal where end-users can find and access all their assigned AWS accounts and cloud applications in one place, including custom applications that support Security Assertion Markup Language (SAML) 2.0.

In this post, we walk you through the necessary steps to configure Canvas as a custom SAML 2.0 application in AWS SSO, so that your business analysts can seamlessly access Canvas with their credentials from AWS SSO or other existing identity providers (IdPs), without the need to do so via the AWS Management Console.

Solution overview

To establish a connection from AWS SSO to the Amazon SageMaker Studio domain app, you must complete the following steps:

  1. Create a user profile in Studio for every AWS SSO user that should access Canvas.
  2. Create a custom SAML 2.0 application in AWS SSO and assign it to the users.
  3. Create the necessary AWS Identity and Access Management (IAM) SAML provider and AWS SSO role.
  4. Map the necessary information from AWS SSO to the SageMaker domain via attribute mappings.
  5. Access the Canvas application from AWS SSO.

Prerequisites

To connect Canvas to AWS SSO, you must have the following prerequisites set up:

Create a Studio domain user profile

In a Studio domain, every user has their own user profile. Studio apps like Studio IDE, RStudio, and Canvas can be created by these user profiles, and are bound to the user profile that created them.

For AWS SSO to access the Canvas app for a given user profile, you have to map the user profile name to the user name in AWS SSO. This way, the AWS SSO user name—and therefore the user profile name—can be passed automatically by AWS SSO to Canvas.

In this post, we assume that AWS SSO users are already available, created during the prerequisites of onboarding to AWS SSO. You need a user profile for each AWS SSO user that you want to onboard to your Studio domain and therefore to Canvas.

To retrieve this information, navigate to the Users page on the AWS SSO console. Here you can see the user name of your user, in our case davide-gallitelli.

With this information, you can now go to your Studio domain and create a new user profile called exactly davide-gallitelli.

If you have another IdP, you can use any information provided by it to name your user profile, as long as it’s unique for your domain. Just make sure you map it correctly according to AWS SSO attribute mapping.
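
If you have many users to onboard, you can also create the matching Studio user profiles programmatically instead of through the console. The following is a minimal sketch, assuming you substitute your own Studio domain ID and the list of AWS SSO user names.

import boto3

sm = boto3.client("sagemaker")

# Placeholder values; replace with your Studio domain ID and your AWS SSO user names
domain_id = "d-xxxxxxxxxxxx"
sso_user_names = ["davide-gallitelli"]

for user_name in sso_user_names:
    # The user profile name must match the AWS SSO user name exactly
    sm.create_user_profile(DomainId=domain_id, UserProfileName=user_name)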

Create the custom SAML 2.0 application in AWS SSO

The next step is to create a custom SAML 2.0 application in AWS SSO.

  1. On the AWS SSO console, choose Applications in the navigation pane.
  2. Choose Add a new application.
  3. Choose Add a custom SAML 2.0 application.
  4. Download the AWS SSO SAML metadata file, which you use during IAM configuration.
  5. For Display name, enter a name, such as SageMaker Canvas followed by your Region.
  6. For Description, enter an optional description.
  7. For Application start URL, leave as is.
  8. For Relay state, enter https://YOUR-REGION.console.aws.amazon.com/sagemaker/home?region=YOUR-REGION#/studio/canvas/open/YOUR-STUDIO-DOMAIN-ID.
  9. For Session duration, choose your session duration. We suggest 8 hours.
    The Session duration value represents the amount of time you want the user session to last before authentication is required again. One hour is the most secure, whereas more time means less need for interaction. We choose 8 hours in this case, equivalent to one work day.
  10. For Application ACS URL, enter https://signin.aws.amazon.com/saml.
  11. For Application SAML audience, enter urn:amazon:webservices.
    After your settings are saved, your application configuration should look similar to the following screenshot.
    You can now assign your users to this application, so that the application appears in their AWS SSO portal after login.
  12. On the Assigned users tab, choose Assign users.
  13. Choose your users.

Optionally, if you want to enable many data scientists and business analysts in your company to use Canvas, the fastest and easiest way is to use AWS SSO groups. To do so, we create two AWS SSO groups: business-analysts and data-scientists. We assign the users to these groups according to their roles, and then give access to the application to both groups.

Configure your IAM SAML provider and AWS SSO role

To configure your IAM SAML provider, complete the following steps:

  1. On the IAM console, choose Identity providers in the navigation pane.
  2. Choose Add provider.
  3. For Provider type, select SAML.
  4. For Provider name, enter a name, such as AWS_SSO_Canvas.
  5. Upload the metadata document you downloaded earlier.
  6. Note the ARN to use in a later step.

    We also need to create a new role for AWS SSO to use to access the application.
  7. On the IAM console, choose Roles in the navigation pane.
  8. Choose Create role.
  9. For Trusted entity type, select SAML 2.0 federation.
  10. For SAML 2.0-based provider, choose the provider you created (AWS_SSO_Canvas).
  11. Don’t select either of the two SAML 2.0 access methods.
  12. For Attribute, choose SAML:sub_type.
  13. For Value, enter persistent.
  14. Choose Next.

    We need to give AWS SSO the permission to create a Studio domain presigned URL, which we need to perform the redirect to Canvas.
  15. On the Permissions policies page, choose Create policy.
  16. On the Create policy tab, choose JSON and enter the following code:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreatePresignedDomainUrlWithPrincipalTag",
                    "sagemaker:CreatePresignedDomainUrl"
                ],
                "Resource": "*"
            }
        ]
    }

  17. Choose Next:Tags and provide tags if needed.
  18. Choose Next:Review.
  19. Name the policy, for example CanvasSSOPresignedURL.
  20. Choose Create policy.
  21. Return to the Add permissions page and search for the policy you created.
  22. Select the policy, then choose Next.
  23. Name the role, for example AWS_SSO_Canvas_Role, and provide an optional description.
  24. On the review page, edit the trust policy to match the following code:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "<ARN OF THE SAML PROVIDER FROM IAM>"
                },
                "Action": [
                    "sts:AssumeRoleWithSAML",
                    "sts:SetSourceIdentity",
                    "sts:TagSession"
                ],
                "Condition": {
                    "StringEquals": {
                        "SAML:sub_type": "persistent",
                        "SAML:aud": "https://signin.aws.amazon.com/saml"
                    }
                }
            }
        ]
    }

  25. Save the changes, then choose Create role.
  26. Note the ARN of this role as well, to use in the following section.

Configure the attribute mappings in AWS SSO

The final step is to configure the attribute mappings. The attributes you map here become part of the SAML assertion that is sent to the application. You can choose which user attributes in your application map to corresponding user attributes in your connected directory. For more information, refer to Attribute mappings.

  1. On the AWS SSO console, navigate to the application you created.
  2. On the Attribute mappings tab, configure the following mappings:
     User attribute in the application -> Maps to this string value or user attribute in AWS SSO
     Subject -> ${user:email}
     https://aws.amazon.com/SAML/Attributes/RoleSessionName -> ${user:email}
     https://aws.amazon.com/SAML/Attributes/PrincipalTag:SageMakerStudioUserProfileName -> ${user:subject}
     https://aws.amazon.com/SAML/Attributes/Role -> <ARN OF THE SAML PROVIDER FROM IAM>, <ARN OF THE CANVAS SSO ROLE FROM IAM>
  3. Choose Save changes.

You’re done!

Access the Canvas application from AWS SSO

On the AWS SSO console, note down the user portal URL. We suggest you log out of your AWS account first, or open an incognito browser window. Navigate to the user portal URL, log in with the credentials you set for the AWS SSO user, then choose your Canvas application.

You’re automatically redirected to the Canvas application.

Conclusion

In this post, we discussed a solution to enable business analysts to experience no-code ML via Canvas in a secured and unified way through a single sign-on portal. To do this, we configured Canvas as a custom SAML 2.0 application within AWS SSO. Business analysts are now one click away from using Canvas and solving new challenges with no-code ML. This enables the security needed by cloud engineering and security teams, while allowing for the agility and independence of business analysts teams. A similar process can be replicated in any IdP by reproducing these steps and adapting them to the specific SSO.

To learn more about Canvas, check out Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts. Canvas also enables easy collaboration with data science teams. To learn more, see Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas. For IT administrators, we suggest checking out Setting up and managing Amazon SageMaker Canvas (for IT administrators).


About the Author

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has fallen in love with it since then.

Read More

Create, train, and deploy a billion-parameter language model on terabytes of data with TensorFlow and Amazon SageMaker

The increasing size of language models has been one of the biggest trends in natural language processing (NLP) in recent years. Since 2018, we’ve seen unprecedented development and deployment of ever-larger language models, including BERT and its variants, GPT-2, T-NLG, and GPT-3 (175 billion parameters).

These models have pushed the boundaries of possible architectural innovations. We face several challenges when training large-scale deep learning models, especially the new wave of generative pre-trained transformers. These challenges include hardware limitations and trade-offs with computation and efficiency. To overcome these challenges, AWS offers a wide range of capabilities for both data and model parallelism.

In this post, we introduce two main approaches: data parallelization and model parallelization using Amazon SageMaker, and discuss their pros and cons.

The model

For the language model, we use Transformers, introduced in the paper Attention Is All You Need. Transformers are deep learning models designed to deliberately avoid the pitfalls of RNNs by relying on a self-attention mechanism to draw global dependencies between input and output. The Transformer model architecture allows for significantly better parallelization and can achieve high performance in relatively short training time. Built on the success of Transformers, BERT, introduced in the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, added bidirectional pre-training for language representation. Inspired by the Cloze task, BERT is pre-trained with masked language modeling (MLM), in which the model learns to recover the original words for randomly masked tokens. The BERT model is also pretrained on the next sentence prediction (NSP) task to predict if two sentences are in correct reading order. Since its advent in 2018, BERT and its variations have been widely used in language models.

We begin by creating two embedding layers for token and positional embedding. The input embeddings are the sum of the token embeddings and position embeddings.

class TokenAndPositionEmbedding(tf.keras.layers.Layer):
    """
    Creates two separate embedding layers: one for tokens and one for token index (positions).
    """
    def __init__(self, maxlen, vocab_size, embed_dim):
        super(TokenAndPositionEmbedding, self).__init__()
        self.token_emb = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)
        self.pos_emb = tf.keras.layers.Embedding(input_dim=maxlen, output_dim=embed_dim)

    def call(self, x):
        maxlen = tf.shape(x)[-1]

        # positions are represented by a token's index
        positions = tf.range(start=0, limit=maxlen, delta=1)
        positions = self.pos_emb(positions)

        # token embedding
        x = self.token_emb(x)

        # return sum as input 
        return x + positions

Then we define a transformer decoder block with two sub-layers: a multi-head self-attention layer, and a simple fully connected feed-forward network followed by layer normalization and dropout:

class TransformerBlock(tf.keras.layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        # self attention layer
        super(TransformerBlock, self).__init__()
        self.att = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim)
        
        # feed forward layer
        self.ffn = [tf.keras.layers.Dense(ff_dim, activation="relu"), tf.keras.layers.Dense(embed_dim)]

        # layer normalization 
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        # dropout 
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)

    def call(self, inputs):
        # getting batch size and seq len from input shape
        input_shape = tf.shape(inputs)
        batch_size = input_shape[0]
        seq_len = input_shape[1]

        # decoder causal mask (causal_attention_mask is a helper assumed to be defined elsewhere in the training script)
        causal_mask = causal_attention_mask(batch_size, seq_len, seq_len, tf.bool)

        # self attention forward pass
        attention_output = self.att(inputs, inputs, attention_mask=causal_mask)

        # dropout, residual connection, and normalization around the attention sub-layer
        attention_output = self.dropout1(attention_output)
        out1 = self.layernorm1(inputs + attention_output)

        # feed-forward layers, dropout and normalization
        ffn_output = self.ffn[0](out1)
        ffn_output = self.ffn[1](ffn_output)
        out2 = self.dropout2(ffn_output)

        return self.layernorm2(out1 + out2)

Finally, we create our language model with the preceding embedding layer and transformer blocks:

class MyModel(tf.keras.Model):
    def __init__(self, maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
        # tf.keras.Model's __init__ doesn't take these hyperparameters; they're used to build the layers below
        super(MyModel, self).__init__()

        # embedding layer
        self.embedding_layer = TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)

        # transformer blocks
        self.transformer_blocks = [
            TransformerBlock(embed_dim, num_heads, feed_forward_dim)
            for i in range(num_layers)
        ]

        # last dense layer
        self.dense = tf.keras.layers.Dense(vocab_size)
        
    def call(self, inputs, training=None):
        x_emb = self.embedding_layer(inputs)
        x = x_emb        
        for transformer_block in self.transformer_blocks:
            x = transformer_block(x)
        outputs = self.dense(x)
        return [outputs, x_emb]


def init_train_settings(maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
    """
    Creates model, optimizer and loss function 
    """
    model = MyModel(maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate) 
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    return model, optimizer, loss_fn

Depending on your hyperparameters, you can scale this model from thousands of parameters to billions of parameters. The primary challenge with billion-parameter models is that you can't fit the model on a single device, and you need to distribute it over several GPUs and nodes for training and inference.
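
To make the scaling concrete, the following sketch builds the model with one small, illustrative set of hyperparameters (not the configuration used in our experiments) and prints its parameter count. It assumes the classes and the init_train_settings function defined above, as well as the causal_attention_mask helper used inside the transformer block.

import tensorflow as tf

# Illustrative hyperparameters, small enough to build on a single machine; increasing
# embed_dim, feed_forward_dim, and num_layers scales the count into the billions
maxlen = 512
vocab_size = 50_000
embed_dim = 256
num_heads = 8
feed_forward_dim = 1024
num_layers = 6
learning_rate = 1e-4

model, optimizer, loss_fn = init_train_settings(
    maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate
)

# Build the model with a dummy batch so the parameter count is available
dummy_batch = tf.random.uniform((1, maxlen), maxval=vocab_size, dtype=tf.int32)
model(dummy_batch)
print(f"Trainable parameters: {model.count_params():,}")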

The dataset

In our experiments, we used the Pile dataset. The Pile is an 800 GiB English text dataset designed for training large-scale language models. It is created from 22 diverse and high-quality datasets, including both established NLP datasets and newly introduced ones.

The dataset is created from a variety of data sources, including books; GitHub repositories; webpages; chat logs; and medical, physics, math, computer science, and philosophy papers. Specifically, it uses the following sources: Pile-CC, PubMed Central, ArXiv, GitHub, the FreeLaw Project, Stack Exchange, the US Patent and Trademark Office, PubMed, Ubuntu, IRC, HackerNews, YouTube, PhilPapers, Books3, Project Gutenberg (PG-19), OpenSubtitles, English Wikipedia, DM Mathematics, EuroParl, the Enron Emails corpus, and NIH ExPorter. It also includes OpenWebText2 and BookCorpus2, which are extensions of the original OpenWebText and BookCorpus datasets, respectively. The diversity in data sources can improve the general cross-domain knowledge and consequently improve downstream generalization capabilities.

The primary challenge with this dataset is the sheer size; the dataset has 825 GiB of text, which translates into 4.2 TiB of preprocessed and compressed datapoints. Similar to the challenges we face with training and hosting the models, training a model with this dataset on a single instance will take a lot of time and isn’t practical.

Our solution is to break down the dataset into approximately 1 GiB chunks of data, load and preprocess the features in TensorFlow Dataset objects, and store them in Amazon Elastic File System (Amazon EFS). TensorFlow datasets provide an easy-to-use and high-performance data pipeline that integrates well with our models. Amazon EFS is an easy-to-use service that enables us to build a shared file system that scales automatically as files are added and deleted. In addition, Amazon EFS is capable of bursting to higher throughput levels when needed, which is critical in our data and model training pipeline.
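
The following is a minimal sketch of this pattern, assuming the EFS file system is mounted at /mnt/efs and using randomly generated token IDs as a stand-in for one preprocessed chunk; paths and shapes are illustrative.

import numpy as np
import tensorflow as tf

# Stand-in for one preprocessed chunk of tokenized sequences
tokens = np.random.randint(0, 50_000, size=(1024, 512), dtype=np.int64)

# Illustrative path; we assume the EFS file system is mounted at /mnt/efs
save_path = "/mnt/efs/pile/chunk-00042"

dataset = tf.data.Dataset.from_tensor_slices(tokens).batch(8)

# Persist the chunk so training workers can stream it back from EFS
# (newer TensorFlow versions also offer dataset.save and tf.data.Dataset.load)
tf.data.experimental.save(dataset, save_path)

# Later, inside the training job, reload the chunk
reloaded = tf.data.experimental.load(save_path)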

Next, we look into distributed training strategies to tackle these challenges.

Distributed training

In this project, we faced two challenges: scaling model size and data volume. Increasing the model size and number of trainable parameters may result in better accuracy, but there’s a limit to the model you can fit into a single GPU memory or even multiple GPUs in a single instance. In addition, bigger model sizes take more time to train.

You can tackle these challenges two different ways: data parallelism and model parallelism. With data parallelism, we perform Stochastic Gradient Descent (SGD) by distributing the records of a mini-batch over different devices to speed up the training. However, data parallel training comes with the extra complexity of averaging the mini-batch gradients across all devices, a step called AllReduce, which becomes harder as the training cluster grows. While using data parallelism, we must be able to fit the model and a single datapoint on a device (CPU or GPU), which is a limiting factor in our experiments because the size of such a large model is much larger than a single GPU's memory.

Another solution is to use model parallelism, which splits the model over multiple devices. Model parallelism is the process of splitting a model up between multiple devices or nodes (such as GPU-equipped instances) and creating an efficient pipeline to train the model across these devices to maximize GPU utilization.

Data parallelization

Parallelizing the data is the most common approach to multiple GPUs or distributed training. You can batch your data, send it to multiple devices (each hosting a replicated model), then aggregate the results. We experimented with two packages for data parallelization: Horovod and the SageMaker distributed data parallel library.

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. To use Horovod, we went through the following process:

  1. Initialize by running hvd.init().
  2. Associate each device with a single process. The first process or worker is associated with the first device, the second process is associated with the second device, and so on.
  3. Adjust the learning rate based on the number of devices.
  4. Wrap the optimizer in hvd.DistributedOptimizer.
  5. Broadcast the initial variable states from the first worker with rank 0 to all other processes. This is necessary to ensure consistent initialization of all workers when training is started with random weights or restored from a checkpoint.
  6. Make sure that only device 0 can save checkpoints to prevent other workers from corrupting them.

The following is the training script:

import tensorflow as tf
import horovod.tensorflow as hvd
# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

# Build the model, loss_fn, and opt (optimizer); per step 3, scale the
# optimizer's learning rate by hvd.size()
...

@tf.function
def training_step(texts, labels, first_batch):
    with tf.GradientTape() as tape:
        predictions = model(texts, training=True)
        loss = loss_fn(labels, predictions[0])

    # Horovod: add Horovod Distributed GradientTape.
    tape = hvd.DistributedGradientTape(tape)

    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

    # Horovod: broadcast initial variable states from rank 0 to all other processes.
    # This is necessary to ensure consistent initialization of all workers when
    # training is started with random weights or restored from a checkpoint.
    #
    # Note: broadcast should be done after the first gradient step to ensure optimizer
    # initialization.
    if first_batch:
        hvd.broadcast_variables(model.variables, root_rank=0)
        hvd.broadcast_variables(opt.variables(), root_rank=0)

    return loss

# Horovod: adjust number of steps based on number of GPUs.
for batch, (texts, labels) in enumerate(dataset.take(10000 // hvd.size())):
    loss = training_step(texts, labels, batch == 0)

    if batch % 10 == 0 and hvd.local_rank() == 0:
        print('Step #%d\tLoss: %.6f' % (batch, loss))

# Horovod: save checkpoints only on worker 0 to prevent other workers from
# corrupting it.
if hvd.rank() == 0:
    checkpoint.save(checkpoint_dir)

The SageMaker data parallel library enables us to scale our training with near-linear efficiency, speeding up training with minimal code changes. The library performs a custom AllReduce operation and optimizes device-to-device communication by fully utilizing the AWS network infrastructure and Amazon Elastic Compute Cloud (Amazon EC2) instance topology. To use the SageMaker data parallel library, we went through the following process (a minimal code sketch follows the steps):

  1. Import the library and initialize it with sdp.init().
  2. Associate each device with a single smdistributed.dataparallel process based on its local rank; sdp.local_rank() gives us the local rank of the device. The leader is rank 0, and workers are ranks 1, 2, 3, and so on.
  3. Adjust the learning rate based on the number of devices.
  4. Wrap tf.GradientTape with DistributedGradientTape to perform AllReduce.
  5. Broadcast the initial model variables from the leader node to all the worker nodes.
  6. Make sure that only device 0 can save checkpoints.
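
The following is a minimal sketch of these steps, not our complete training script; it assumes the same model, loss_fn, opt, checkpoint, and checkpoint_dir objects as the Horovod example, and the API calls shown are from the smdistributed.dataparallel TensorFlow module.

import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

# SMDDP: initialize the library
sdp.init()

# Pin each process to a single GPU based on its local rank
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], 'GPU')

# Build model, loss_fn, and opt as before; per step 3, scale the learning rate by sdp.size()
...

@tf.function
def training_step(texts, labels, first_batch):
    with tf.GradientTape() as tape:
        predictions = model(texts, training=True)
        loss = loss_fn(labels, predictions[0])

    # SMDDP: wrap the tape so gradients are AllReduced across all workers
    tape = sdp.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

    if first_batch:
        # SMDDP: broadcast initial variables from the leader (rank 0) to all workers
        sdp.broadcast_variables(model.variables, root_rank=0)
        sdp.broadcast_variables(opt.variables(), root_rank=0)
    return loss

# Save checkpoints only on the leader to prevent other workers from corrupting them
if sdp.rank() == 0:
    checkpoint.save(checkpoint_dir)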

Model parallelization

We can adjust the hyperparameters to keep the model small enough to train on a single GPU, or we can use model parallelism to split the model between multiple GPUs across multiple instances. Increasing a model’s number of trainable parameters can result in better accuracy, but there’s a limit to the maximum model size you can fit in a single GPU’s memory. We used the SageMaker distributed model parallel library to train our larger models. The steps are as follows:

  1. Import and initialize the library with smp.init().
  2. The Keras model needs to inherit from smp.DistributedModel instead of the Keras Model class.
  3. Set drop_remainder=True in the tf.Dataset.batch() method to ensure that the batch size is always divisible by the number of microbatches.
  4. Random operations in the data pipeline all need to use the same seed, smp.dp_rank(); for example, shuffle(ds, seed=smp.dp_rank()). This ensures consistency of data samples across devices that hold different model partitions (a sketch of this data pipeline follows the training script).
  5. Forward and backward logic needs to be in a step function with smp.step decoration.
  6. Perform postprocessing on the outputs across microbatches using StepOutput methods such as reduce_mean. The smp.step function must have a return value that depends on the output of smp.DistributedModel.

The training script is as follows:

import tensorflow as tf
import gpt_model  # custom module that defines TokenAndPositionEmbedding and TransformerBlock
import smdistributed.modelparallel.tensorflow as smp

# loss_fn, optimizer, train_loss, train_accuracy, text_ds, and epochs are assumed
# to be defined elsewhere in the training script

# SMP: Initialize
smp.init()

# SMP: Define smp.DistributedModel the same way as Keras sub-classing API
class MyModel(smp.DistributedModel):
    def __init__(self, maxlen, vocab_size, embed_dim, num_heads, feed_forward_dim, num_layers, learning_rate):
        super(MyModel, self).__init__()
        
        self.embedding_layer = gpt_model.TokenAndPositionEmbedding(maxlen, vocab_size, embed_dim)
        self.transformer_blocks = [
            gpt_model.TransformerBlock(embed_dim, num_heads, feed_forward_dim)
            for i in range(num_layers)
        ]
        self.dense = tf.keras.layers.Dense(vocab_size)
        
    def call(self, inputs, training=None):
        x_emb = self.embedding_layer(inputs)
        x = x_emb

        for transformer_block in self.transformer_blocks:
            x = transformer_block(x)
        outputs = self.dense(x)
        return [outputs, x_emb]


# SMP: Define smp.step. Return any tensors needed outside
@smp.step
def get_grads(texts, labels):
    predictions = model(texts, training=True)
    loss = loss_fn(labels, predictions[0])
    grads = optimizer.get_gradients(loss, model.trainable_variables)
    return grads, loss, predictions[0]

@tf.function
def train_step(texts, labels):
    gradients, loss, predictions = get_grads(texts, labels)
    # SMP: Accumulate the gradients across microbatches
    gradients = [g.accumulate() for g in gradients]
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    
    # SMP: Average the loss across microbatches
    train_loss(loss.reduce_mean())
    # SMP: Merge predictions across microbatches
    train_accuracy(labels, predictions.merge())
    return loss.reduce_mean()

histories = []

for _ in range(epochs):
    train_loss.reset_states()
    train_accuracy.reset_states()

    for texts, labels in text_ds:
        # Feed the examples from the batch to the model one at a time
        for i in range(128):
            text = tf.expand_dims(texts[0][i], axis=0)
            label = tf.expand_dims(labels[0][i], axis=0)
            train_step(text, label)  
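
Steps 3 and 4 aren’t visible in the script above. The following sketch shows one way they could be applied to a tf.data pipeline; the EFS path, batch size, and shuffle buffer are placeholder values, and it assumes a TF 2.x version where tf.data.experimental.load can infer the element spec from the saved metadata.

# Sketch of the data pipeline prescribed by steps 3 and 4 (placeholder values)
batch_size = 128    # must remain divisible by the number of microbatches
text_ds = (
    tf.data.experimental.load("/mnt/efs/pile_preprocessed/chunk_0", compression="GZIP")
    # SMP: seed random operations with smp.dp_rank() so every model partition
    # sees the same sequence of samples
    .shuffle(10000, seed=smp.dp_rank())
    # SMP: drop_remainder=True keeps every batch divisible by the microbatch count
    .batch(batch_size, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)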

For a detailed guide to enable the TensorFlow training script for the SageMaker distributed model parallel library, refer to Modify a TensorFlow Training Script. For PyTorch, refer to Modify a PyTorch Training Script.

SageMaker Debugger

In the previous sections, we discussed how to optimize training using model and data parallelization techniques. With Amazon SageMaker Debugger, we can now capture performance profiling information from our training runs to determine how much the training has improved. By default, Debugger captures system metrics for each SageMaker training job, such as GPU and CPU utilization, memory, network, and I/O, at a sampling interval of 500 milliseconds. We can access the data as follows:

from smdebug.profiler.analysis.notebook_utils.training_job import TrainingJob

tj = TrainingJob('SMD-MP-demo-2022-01-21-06-43-23-841', 'us-east-1')
tj.wait_for_sys_profiling_data_to_be_available()
system_metrics_reader = tj.get_systems_metrics_reader()

# The framework metrics reader used by the timeline charts below is retrieved the same way
tj.wait_for_framework_profiling_data_to_be_available()
framework_metrics_reader = tj.get_framework_metrics_reader()

Debugger provides utilities to visualize the profiling data in different ways. In the following example, we see the total GPU and CPU utilization as well as the I/O wait time for the multi-GPU training job using Horovod. To generate these graphs, we run the following code:

from smdebug.profiler.analysis.notebook_utils.timeline_charts import TimelineCharts

view_timeline_charts = TimelineCharts(
    system_metrics_reader, 
    framework_metrics_reader,
    select_dimensions=["CPU", "GPU", "I/O"], 
    select_events=["total"],
    show_workers=False           
)

The GPU utilization frequently fluctuates between 0% and 100%, and high I/O wait times combined with low GPU utilization are an indicator of an I/O bottleneck. Furthermore, the total CPU utilization never exceeds 70%, which means that we can improve data preprocessing by increasing the number of worker processes.
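
For example, one common way to put those idle CPU cores to work is to parallelize the preprocessing and prefetch batches in the tf.data pipeline. In the following sketch, raw_ds, preprocess_example, and batch_size are placeholders for our actual dataset, per-record transformation, and batch size.

import tensorflow as tf

def preprocess_example(text, label):
    """Placeholder for the per-record preprocessing work done on the CPU."""
    return text, label

ds = (
    raw_ds  # placeholder: the dataset loaded from Amazon EFS
    # Run the CPU-bound preprocessing with multiple parallel calls
    .map(preprocess_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(batch_size, drop_remainder=True)
    # Overlap preparation of the next batch with training on the current one
    .prefetch(tf.data.AUTOTUNE)
)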

We can improve performance by switching from Horovod to the SageMaker distributed data parallel library. In the following graphs, we can see that GPUs are utilized more efficiently and only dropping to low utilization for short periods of time.

Training infrastructure

For training the models, we used 10 ml.p3.16xlarge instances with a SageMaker training job. SageMaker reduces the time and cost to train and tune machine learning (ML) models without the need to manage infrastructure. With SageMaker, you can easily train and tune ML models using built-in tools to manage and track training experiments, automatically choose optimal hyperparameters, debug training jobs, and monitor the utilization of system resources such as GPUs, CPUs, and network bandwidth. The data was hosted in Amazon EFS, which let storage grow and shrink automatically as we added and removed files, with no need for management or provisioning. Our primary objectives were to improve training speed and reduce costs.
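
As an illustration (not our exact launch code), a training job with these characteristics could be configured with the SageMaker Python SDK roughly as follows. The entry point, IAM role, framework version, EFS file system ID, directory path, and VPC settings are all placeholders.

from sagemaker.tensorflow import TensorFlow
from sagemaker.inputs import FileSystemInput

# Training data stored on the shared Amazon EFS file system (placeholder IDs and paths)
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="EFS",
    directory_path="/pile_preprocessed",
    file_system_access_mode="ro",
)

estimator = TensorFlow(
    entry_point="train.py",                                 # placeholder training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",    # placeholder IAM role
    instance_count=10,
    instance_type="ml.p3.16xlarge",
    framework_version="2.6",
    py_version="py38",
    # EFS access requires the training job to run inside the VPC (placeholder IDs)
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    # Enable the SageMaker distributed data parallel library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit(train_input)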

Model scalability

Although this infrastructure is primarily used for language generation with the GPT architecture and the Pile dataset, you can use these techniques to train large-scale transformer models, which are useful in many domains beyond NLP. Many computer vision tasks are now solved with large-parameter transformer architectures, which have been shown to outperform traditional convolutional neural networks (CNNs) on tasks like representation learning (see Advancing the state of the art in computer vision with self-supervised Transformers and 10x more efficient training) and large-scale mapping of images to text (such as CLIP). Large-parameter models are also breaking new ground in life sciences in fields like protein structure analysis and the analysis of medical image data.

The solutions we detail in this post for distributed training and managing large models should apply to models in any of these domains as well.

Trade-offs

There has been ongoing discussion in the research community about the risks of training large-scale language models, including their financial and environmental costs, and whether enough thought has been put into mitigating those risks. According to a paper published in ACM, training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight. The environmental impact scales with model size, and the ability to efficiently fine-tune such models can potentially curtail emissions significantly. AWS recently launched a new Customer Carbon Footprint Tool, available to all AWS customers at no cost, as part of Amazon’s efforts to increase sustainability and reduce carbon emissions. Running applications on the AWS Cloud can potentially decrease your carbon footprint (compared to the enterprise data centers surveyed in a 2019 report).

Conclusion

This post demonstrated a solution that facilitates the fine-tuning of language models with a billion parameters on the AWS Cloud using SageMaker.

For more information about model parallelism with SageMaker, refer to Train 175+ billion parameter NLP models with model parallel additions and Hugging Face on Amazon SageMaker and How Latent Space used the Amazon SageMaker model parallelism library to push the frontiers of large-scale transformers.

If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon ML Solutions Lab.


About the Authors

Sia Gholami is a Senior Data Scientist at the Amazon ML Solutions Lab, where he builds AI/ML solutions for customers across various industries. He is passionate about natural language processing (NLP) and deep learning. Outside of work, Sia enjoys spending time in nature and playing tennis.

Mehdi Noori is a Manager and a Senior Applied Scientist at the Amazon ML Solutions Lab, where he works with customers across various industries and helps them accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

Muhyun Kim is a data scientist at the Amazon Machine Learning Solutions Lab. He solves customers’ various business problems by applying machine learning and deep learning, and also helps them build ML skills.

Danny Byrd is an Applied Scientist at the Amazon ML Solutions Lab. At the lab, he’s helped customers develop advanced ML solutions in specialties ranging from computer vision to reinforcement learning. He’s passionate about pushing technology forward and unlocking new potential from AWS products along the way.

Francisco Calderon Rodriguez is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he helps solve critical business problems for AWS customers using deep learning. In his spare time, Francisco likes to play music and guitar, play soccer with his daughters, and enjoy time with his family.

Yohei Nakayama is a Deep Learning Architect at the Amazon ML Solutions Lab. He works with customers across different verticals to accelerate their use of artificial intelligence and AWS Cloud services to solve their business challenges. He is interested in applying ML/AI technologies to the space industry.

Nathalie Rauschmayr is a Senior Applied Scientist at AWS, where she helps customers develop deep learning applications.
