
It’s a wrap for Amazon SageMaker Month, 30 days of content, discussions, and news

May 24, 2021

by Shashank Murthy Amazon AWS

Did you miss SageMaker Month? Don’t look any further than this round-up post to get caught up. In this post, we share key highlights and learning materials to accelerate your machine learning (ML) innovation.

On April 20, 2021, we launched the first ever Amazon SageMaker Month, 30 days of hands-on workshops, tech talks, Twitch sessions, blog posts, and playbooks. Our goal with SageMaker Month was to connect you with AWS experts, getting started resources, workshops, and learning content to be successful with ML. The following is a summary of what you can access on-demand to get started on your ML journey with Amazon SageMaker.

Introducing SageMaker Savings Plans

To kick off SageMaker Month, we introduced Amazon SageMaker Savings Plans, a flexible, usage-based pricing model for SageMaker. The goal of SageMaker Savings Plans is to offer you the flexibility to save up to 64% on SageMaker ML instance usage in exchange for a commitment of consistent usage for a 1-year or 3-year term. In addition, to help you save even more, we announced a price drop on SageMaker CPU and GPU instances.

To enable customers to save more on SageMaker, we hosted a SageMaker Friday Twitch session with Greg Coquillo, the second-most influential speaker according to LinkedIn Top Voices 2020: Data Science & AI, along with Julien Simon and Segolene Dessertine-Panhard outlining cost-optimization techniques using SageMaker and SageMaker Savings Plans.

SageMaker Savings Plans enhance the productivity and cost-optimizing capabilities already available in Amazon SageMaker Studio, which can improve your data science team’s productivity up to 10 times. Studio provides a single visual interface where you can perform all your ML development steps. Studio also gives you complete access, control, and visibility into each step required to build, train, and deploy models. To enable your teams to move faster and boost productivity, learn how to customize your Studio notebooks.

Getting started with ML

SageMaker is the most comprehensive ML service, purpose-built for every step of the ML development lifecycle. SageMaker provides all the components used for ML in a single service, so you can prepare data and build, train, and deploy models.

Data preparation is the first step of building an ML model. It’s a time-consuming and involved process that is largely undifferentiated. We hear from our customers that it constitutes up to 80% of their time during ML development. Data preparation has always been considered tedious and resource intensive, due to the inherent nature of data being “dirty” and not ready for ML in its raw form. “Dirty” data could include missing or erroneous values, outliers, and more. Feature engineering is often needed to transform the inputs to deliver more accurate and efficient ML models. To help with feature engineering, Amazon SageMaker Feature Store offers a purpose-built repository to store, update, retrieve, and share ML features within development teams.

Another challenge with data preparation is that it often requires multiple steps. Although most standalone data preparation tools provide data transformation, feature engineering, and visualization, few tools provide built-in model validation. And all of these data preparation steps are considered separate from ML. What’s needed is a framework that provides all these capabilities in one place and is tightly integrated with the rest of the ML pipeline. Most standalone tools for data preparation treat it as an extract, transform, and load (ETL) workload, making it tedious to iteratively prepare data, validate the model on test datasets, deploy it in production, and go back to ingesting new data sources and performing additional feature engineering. Most iterative data preparation is divorced from deployment. Therefore, data preparation modules need curation and integration before they’re deployed in production. These practices in ML are sometimes referred to as MLOps.

To help you overcome these challenges, you can use Amazon SageMaker Data Wrangler, a capability to simplify the process of data preparation, feature engineering, and each step of the data preparation workflow, including data selection, cleansing, and exploration on a single visual interface. As part of SageMaker Month, we created a step-by-step tutorial on how you can prepare data for ML with Data Wrangler. In addition, you can learn how financial customers use SageMaker every day to predict credit risk and approve loans. This example uses Data Wrangler and Amazon SageMaker Clarify to detect bias during the data preparation stage.

Another part of the data preparation stage is labeling data. Data labeling is the task of identifying objects in raw data, such as text, images, and videos, and tagging them with labels that help your ML model make accurate predictions and estimations. For example, in an autonomous vehicle use case, Light Detection and Ranging (LIDAR) devices are commonly used to capture and generate three-dimensional point cloud data, which represents the physical space at a single point in time. For this use case, you need to label data captured in both 2D and 3D spaces to produce highly accurate predictions of vehicles, lanes, and pedestrians. Amazon SageMaker Ground Truth, a fully managed data labeling service, makes it easy to build highly accurate training datasets for ML in 2D and 3D spaces using custom or built-in data labeling workflows. To help you label your data, we created how-to blog posts to showcase how to annotate 3D point cloud data and automate data labeling workflows for an autonomous vehicle use case with Ground Truth.

After you build your ML model, you must train and tune it to achieve the highest accuracy. Improving a model’s performance is an experimental and iterative process. For SageMaker Month, we consolidated a few techniques and best practices on how to train and tune high-quality deep learning models with complete visibility using SageMaker.

When you’re satisfied with your model’s accuracy, understanding how to deploy and manage models at scale is key. For model deployment and management, we showcase an example where an application developer is using SageMaker multi-model endpoints to host thousands of models and pipelines to automate retraining to improve recommendations across different US cities.

When it’s time to deploy your model and make predictions, a process called inference, you can use SageMaker for inference in the cloud or on edge devices. Amazon SageMaker Neo automatically compiles ML models for any ML framework and any target hardware. A Neo-compiled model can run YOLOv4 inference twice as fast. You can also reduce ML inference costs on SageMaker with hardware and software acceleration.

As part of SageMaker Month, we also launched an example use case that shows how you can use Amazon SageMaker Edge Manager, a capability to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices. This blog outlines how to manage and monitor models on edge devices such as wind turbines.

Finally, to bring all our SageMaker capabilities together and help you move from model ideation to production, we created an on-demand introduction to SageMaker workshop similar to the virtual hands-on workshops we conducted live and during recent AWS Summits. It includes everything you need to get started with SageMaker at your own pace.

ML through our Partners

As part of SageMaker Month, we partnered with Tableau and DOMO to empower data and business analysts with ML-powered insights without needing any ML expertise. With the right data readily available, you can use ML and business intelligence (BI) tools to help make predictions needed to automate and speed up critical business processes and workflows.

We partnered with DOMO to enable ML for everyone with SageMaker. Domo AutoML, powered by Amazon SageMaker Autopilot, provides insights to complex business problems and automates the end-to-end decision-making process. This helps organizations improve decision-making and adapt faster to business changes.

We also partnered with Tableau to create a blog post and tech talk that showcases an end-to-end demo and new Quick Start solution that makes it easy for data analysts to use ML models deployed on SageMaker directly in their Tableau dashboards without writing any custom integration code.

What’s next

SageMaker Month focused on cost savings and optimization, getting started with ML, and learning content to accelerate ML innovation. As we wrap up SageMaker Month, we’re excited to share the upcoming and first ever virtual AWS Machine Learning Summit on June 2, 2021. The summit brings together industry-leading scientists, AWS customers, and experts to dive deep into the art, science, and impact of ML. Attend for free, learn about features over 30 sessions, and interact with leaders in a live Q&A.

About the Author

Shashank Murthy is a Senior Product Marketing Manager with AWS Machine Learning. His goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. For fun outside work, Shashank likes to hike the Pacific Northwest, play soccer, and run obstacle course races.

Enhance sports narratives with natural language generation using Amazon SageMaker

May 24, 2021

by Henry Wang Amazon AWS

This blog post was co-authored by Arbi Tamrazian, Director of Data Science and Machine Learning at Fox Sports

FOX Sports is the sports television arm of FOX Network. The company used machine learning (ML) and Amazon SageMaker to streamline the production of relevant in-game storylines for commentators to use during live broadcasts.

“We collaborated with the Amazon Machine Learning Solutions Lab to build a natural language generation (NLG) engine that automatically produces sports narratives for commentators to use during games. Leveraging Amazon SageMaker, the Amazon Machine Learning Solutions Lab developed a model pipeline that generates natural-sounding sports narratives from a ML model trained on billions of English texts and sports stats snippets. In just a few short weeks, the NLG solution achieved BLEU scores above 99% on unseen Fox Sports testing dataset, significantly improving the readability of narratives compared to test benchmarks. Standardizing our ML workloads on Amazon SageMaker will enable our broadcasters to engage fans with pertinent gameday stories, in real-time.” – Arbi Tamrazian, Director of Data Science and Machine Learning, Fox Sports

Objectives

As viewers may have noticed, sports broadcasters are increasingly sharing statistical insights throughout the game to tell a richer story for the audience. Thanks to an abundance of data and advanced stats such as NFL Next Gen Stats powered by AWS, broadcasters can quickly tell stories and make comparisons between teams and players to keep viewers engaged.

Due to the fast-paced nature of many games, broadcasters rely on template-generated narratives to speak about in-game statistics in real time. These rule-based templates “stitch” tabular information and create narratives with fixed sentence structures that sometimes sound rigid and are hard to understand. It’s also becoming harder to build and maintain templates to keep up the pace with the introduction of new statistics.

To improve the broadcasting experience, Fox Sports turned to AWS and its artificial intelligence technologies to convert its real-time data into easy-to-understand narratives for commentators and audiences. The Amazon ML Solutions Lab partnered with Fox Sports to design and implement an end-to-end ML system using natural language generation (NLG), a technique to generate natural language descriptions from structured data. The objective of the partnership is to produce, in a scalable fashion, narratives that sound more natural than those from the rule-based templates. The system enables Fox Sports to expand their rule-based generation engine into an ML solution. The model is trained to understand the semantic meaning of inputs, and can be expanded to new statistics and other sports by fine-tuning with a few hundred sample narratives.

In this post, we walk you through how to fine-tune a pretrained language model to generate sentences similar to those from rule-based templates. In addition, we show how to use different NLG techniques to make the sentences sound more natural, which leads to improved fan experiences and reduced cost in building and maintaining templates.

Template for an ML approach

The first phase of the NLG-based narrative generation solution relies on tabular features, including player and team names, metrics, and game situations. These features are paired with their target sequences, which are generated using predefined rule-based templates. The goal here is to use NLG to take the tabular features and generate candidate narratives containing all the relevant information.

Dataset

To train this model, we use a dataset synthetically generated by Fox Sports using the current rule-based methodology. The dataset is generated by permuting different statistics, feature values, and team and player names, and includes more than 57,000 samples of 8 features. For each sample, we have the narrative generated from a rule-based template as our target. We randomly shuffle and divide the dataset into training, validation, and testing sets based on an 80/10/10 split for training and fine-tuning our models.
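As a rough illustration of the 80/10/10 split described above, the following sketch shuffles a list of samples and slices it into three parts. The function name, the fixed seed, and the 1,000 placeholder samples are assumptions made for this example and are not taken from the Fox Sports pipeline.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle samples and split them into train, validation, and test lists."""
    shuffled = list(samples)
    # A seeded generator keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    # Whatever remains after the first two slices becomes the test set.
    test = shuffled[n_train + n_val:]
    return train, val, test


# Placeholder data: integers stand in for (features, narrative) pairs.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

Shuffling before slicing matters here because the dataset is generated by permuting values, so neighboring rows are likely to be near-duplicates.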

The following table shows examples of the raw data used in this experiment. Each row represents a record, and each column represents the relevant information associated with the record, including the statistic, values for the statistic, situation that the statistic is calculated upon, and more. For this post, we replace actual team and player names with generic names: team Bobcats and player John Peccy.

Statistic	Situation	Value	Time frame	Rank	Rank Order	Population	Team name / Player name
rec_td	stadium_retractable_dome	5	season	7	True	32	Bobcats
qbkd	score_differential_trailing	3	season	2	False	190	John Peccy

For each row, the raw tabular features are concatenated to form a text sequence. The following table shows examples of the text sequences used as input and the associated narrative from the rule-based template as output.

Template input	Template output
rec_td stadium_retractable_dome 5 season 7 TRUE 32 Bobcats	Bobcats’ 5 caught passes for touchdowns when playing in a retractable roof is the 7th highest out of 32 in the NFL this season.
qbkd score_differential_trailing 3 season 2 FALSE 190 John Peccy	John Peccy’s 3 credited QB knockdowns when trailing is the 2nd lowest out of 190 in the NFL this season.
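The concatenation step can be sketched in a few lines of Python. The dictionary keys and the row_to_sequence helper are hypothetical names chosen for this example; only the column order and the resulting text sequence come from the preceding tables.

```python
# Column order used when flattening a record (taken from the raw data table).
FEATURE_ORDER = [
    "statistic", "situation", "value", "time_frame",
    "rank", "rank_order", "population", "name",
]


def row_to_sequence(row):
    """Join the tabular features of one record into a single text sequence."""
    parts = []
    for key in FEATURE_ORDER:
        value = row[key]
        # Booleans appear upper-cased in the template input (TRUE / FALSE).
        if isinstance(value, bool):
            value = str(value).upper()
        parts.append(str(value))
    return " ".join(parts)


row = {
    "statistic": "rec_td",
    "situation": "stadium_retractable_dome",
    "value": 5,
    "time_frame": "season",
    "rank": 7,
    "rank_order": True,
    "population": 32,
    "name": "Bobcats",
}
sequence = row_to_sequence(row)
print(sequence)
```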

Methods and metrics

The task of translating tabular features to natural sentences is a subtask of natural language generation. Because transfer learning has proved effective at this task, we use a language model called T5 (Text-To-Text Transfer Transformer), which was pretrained on the open-source dataset C4 (Colossal Clean Crawled Corpus). T5 achieves state-of-the-art results on many NLP benchmarks and is flexible enough to be fine-tuned for different NLP tasks. To fine-tune the T5 model for Fox Sports, we concatenate the tabular features into a single sequence of text as our training input. Then we use the template-generated statements as labels. For example, the following table is translated into the text sequence Team Bobcats, prss, 4, score_differential_leading, 7.

Team name	Metric	Value	Situation	Rank
Bobcats	prss	4	score_differential_leading	7

The corresponding template statement, “The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest in the NFL this season,” is passed in as the target output. After fine-tuning the T5 model with thousands of such examples, the model is able to generate statements similar to the template. It even works for previously unseen input, making it extensible to new players and newly created metrics.

We use the BLEU (Bilingual Evaluation Understudy) performance metric to quantitatively measure model performance. BLEU measures how closely a generated sentence matches a ground truth sentence by assigning a score from 0–100, with 100 being a perfect match to the ground truth. After fine-tuning on a few thousand sentences, the T5 model achieves a BLEU score above 99 on the test set, an indication that most of the generated sentences are identical to template-generated sentences. It also echoes the usefulness of using pretrained models on abundantly available unlabeled text for different downstream tasks.
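To make the 0–100 scale more concrete, here is a simplified, single-reference sentence-level BLEU in plain Python. It is only a sketch: it assumes whitespace tokenization, n-grams up to length 4, and no smoothing, and the post doesn’t say which BLEU implementation or settings were actually used for the reported scores.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU (0-100) of one candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each n-gram count by how often it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: any empty precision gives a score of 0
        log_precisions.append(math.log(clipped / total))
    # The brevity penalty punishes candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = ("Bobcats' 5 caught passes for touchdowns when playing in a "
             "retractable roof is the 7th highest out of 32 in the NFL this season.")
shorter = "Bobcats' 5 caught passes for touchdowns is the 7th best in the NFL this season."
exact_score = sentence_bleu(reference, reference)
partial_score = sentence_bleu(shorter, reference)
print(exact_score, partial_score)
```

A candidate identical to the template scores 100, which is why a test-set score above 99 suggests that most generated sentences reproduce the template output word for word.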

Improving comprehensibility

The template-generated narratives capture core details, but are repetitive and sometimes difficult to read because they follow the same predefined sentence structure. This leads to confusion for the broadcasters and fans. To address this drawback, we include a second phase of modeling, which employs language models to enhance the readability and comprehensibility of the fine-tuned T5 model’s generated narratives. This step’s objective is to make the narratives sound more natural, allowing commentators to easily communicate the information during live broadcasting.

Language processing methods

One way to replace unnatural words in sentences is through back translation. Back translation is a two-step translation method. It first translates a sentence into another language, and then translates the sentence back to its original language. It’s a technique used mostly for text data augmentation, namely, increasing the variety of original text. For this use case, we find that translation models trained on a large text corpus can help fix mistakes in the original sentence. During back translation, a singular noun may be corrected to a plural. The model may also choose more natural-sounding language. This approach gives us an automatic way to improve readability for our generated sentences.

An alternative natural language processing (NLP) approach to back translation is called paraphrasing, a technique that aims to express semantically similar narratives in different forms. We employ a pretrained T5 model, which is fine-tuned for paraphrasing purposes using the open-sourced paraphraser dataset PAWS. Our paraphrasing model generates several candidates with slightly different content for a given narrative. One major advantage of this technique is that we can choose, from those candidates, the version that best fits Fox Sports’ business needs. An example of the paraphrasing output against a sample sentence is shown in the following table.

Type	Sentence
Original	The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 1	The Bobcats pressing the quarterback 4 times when leading this season is the 7th best out of 32 in the NFL.
Paraphrased 2	The Bobcats’ 4 total times of pressuring quarterback in leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 3	The Bobcats have pressured the quarterback 4 times total when leading—the 7th highest out of 32 in the NFL this season.

Model evaluation

Quantitatively evaluating how natural a sentence sounds is an ongoing challenge in the NLP community. For this project, we use an existing metric called perplexity. Perplexity is a proxy measure of how “surprised” a language model is by a sentence. In other words, it measures how common an evaluation sentence is in the text corpus used to train the language model, which lets us compare the quality of different sentences. A language model such as GPT2 typically assigns a low perplexity score to real and syntactically correct sentences and a high score to fake, incorrect, or highly infrequent sentences. For example, GPT2 assigns a lower score to sentences like “Can you do it?” and a higher score to sentences like “Can you does it?” With this, we can compare the quality of generated sentences sharing similar semantic meanings and output the one with the lowest perplexity score.
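Perplexity is commonly computed as the exponential of the average negative log-probability that a language model assigns to each token. The following sketch shows that calculation on its own. The per-token probabilities are made up for illustration; in practice they would come from a language model such as GPT2.

```python
import math


def perplexity(token_log_probs):
    """Exponential of the average negative log-probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up probabilities: the third token of the second sentence is one
# the model finds very unlikely (think "does" in "Can you does it?").
fluent = [math.log(p) for p in (0.5, 0.4, 0.5, 0.6)]
awkward = [math.log(p) for p in (0.5, 0.4, 0.01, 0.6)]

fluent_ppl = perplexity(fluent)
awkward_ppl = perplexity(awkward)
print(fluent_ppl, awkward_ppl)
```

A single unlikely token is enough to raise the score noticeably, which is what makes perplexity usable for ranking candidates that share the same meaning.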

Architecture

Our final product is an end-to-end ML workflow using SageMaker. To meet Fox Sports’ needs, the workflow ensures that the following two criteria are satisfied:

The end-to-end results must include all the required features defined by a user
The final narrative output of the models shouldn’t be harder to read than the original rule-based template narrative

Our solution consists of two major components:

Replace the current rule-based approach with the fine-tuned T5 model
Enhance the generated narratives through a multi-step ML-based approach

As illustrated in the following figure, the fine-tuned T5 ML model generates the narratives (green blocks). Next, the narratives are passed through the back translation model as an attempt to produce enhanced narratives. If the back translated results include the necessary keywords and their perplexity scores are lower compared to the T5 model outputs, they’re used as the final outputs. Otherwise, we pass the T5 model outputs through the paraphrasing model and apply the same condition check. If none of our enhancement models reduce the perplexity score, we simply output the T5 model outputs. Through this workflow, we ensure all the required features are captured and improve the readability of the sentence when appropriate, maximizing the benefit ML can bring to the existing solution.
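The selection logic of this workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: choose_narrative is a hypothetical helper, the enhancement models and the perplexity scorer are passed in as plain callables, the keyword check is a naive substring match, and the sentences and scores in the example are invented.

```python
def choose_narrative(t5_output, keywords, enhancers, perplexity_fn):
    """Return the first acceptable enhanced narrative, else the T5 output.

    enhancers is an ordered list of callables (for example back translation,
    then paraphrasing), each returning a list of candidate sentences.
    """
    baseline = perplexity_fn(t5_output)
    for enhance in enhancers:
        accepted = [
            candidate for candidate in enhance(t5_output)
            # Keep candidates that retain every required keyword ...
            if all(keyword in candidate for keyword in keywords)
            # ... and that the language model finds less surprising.
            and perplexity_fn(candidate) < baseline
        ]
        if accepted:
            return min(accepted, key=perplexity_fn)
    # No enhancement helped, so fall back to the T5 narrative.
    return t5_output


# Invented sentences and perplexity scores, for illustration only.
toy_scores = {
    "Bobcats 4 pressures rank 7th": 30.0,
    "Bobcats 4 pressure rank 7th overall": 35.0,
    "The Bobcats rank 7th with 4 pressures": 25.0,
    "The Bobcats rank high in pressures": 20.0,
}


def toy_perplexity(sentence):
    return toy_scores[sentence]


def back_translate(sentence):
    return ["Bobcats 4 pressure rank 7th overall"]


def paraphrase(sentence):
    return ["The Bobcats rank 7th with 4 pressures",
            "The Bobcats rank high in pressures"]


t5_output = "Bobcats 4 pressures rank 7th"
keywords = ["Bobcats", "4", "7th"]
result = choose_narrative(t5_output, keywords,
                          [back_translate, paraphrase], toy_perplexity)
print(result)
```

In this toy run, the back-translated sentence is rejected because its perplexity is higher than the baseline, and the second paraphrase is rejected because it drops required keywords even though its score is the lowest.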

Results

With models combined to form the preceding architecture, the output narrative has on average 13% lower perplexity compared to original rule-based, template-generated narratives, and all the information is maintained. Fox Sports can display the narratives to broadcasters and sports fans for more exciting viewing experiences!

Conclusion

The ML Solutions Lab and Fox Sports ML team worked closely to build an end-to-end ML solution that converts in-game tabular stats into natural-sounding narratives. Because the solution is built on top of language models pretrained on a huge text corpus, additional metrics and game situations can be passed in directly to generate the desired outputs. The extensibility also enables the solution to be transferred to other sports by simply fine-tuning the model with sample narratives. These capabilities allow the model to scale and adapt to future business needs.

Around the world, many sports leagues and sports networks like Fox Sports are transforming the fan experience with AWS technology. AWS is helping bring fans closer to the game through partnering with Bundesliga, F1, NFL, NHL, NASCAR, and many others. Visit AWS Sports for more details.

If you’d like help accelerating your use of ML in your products and processes, please contact the ML Solutions Lab program.

About the Authors

Henry Wang is a Data Scientist at Amazon Machine Learning Solutions Lab. Prior to joining AWS, he was a graduate student at Harvard in Computational Science and Engineering, where he worked on healthcare research with reinforcement learning. In his spare time, he enjoys playing tennis and golf, reading, and watching StarCraft II tournaments.

Saman Sarraf is a Data Scientist at the Amazon ML Solutions Lab. His background is in applied machine learning including deep learning, computer vision, and time series data prediction.

Arbi Tamrazian is the Director of Data Science and Machine Learning at FOX where he focuses on building scalable machine learning solutions that can be applied to real-time data feeds and media assets. His main areas of interest are Deep Learning, Computer Vision and Reinforcement Learning.

How lekker got more insights into their customer churn model with Amazon SageMaker Debugger

May 24, 2021

by Steffen Kremers Amazon AWS

With over 400,000 customers, lekker Energie GmbH is a leading supraregional provider of electricity and gas on the German energy market. lekker is customer and service oriented and regularly scores top marks in comparison tests. As one of the most important suppliers of green electricity to private households, the company, with its 220 employees, stands for environmentally and consumer-friendly products.

Germany’s energy market was liberalized in the 1990s. Since then, customers have been free to choose their energy and gas supplier. During the liberalization, the German government standardized the switching processes, so switching your energy or gas supplier is an easy task. That makes it challenging for lekker to keep churn rates low. Preventing existing customers from leaving is several times cheaper than acquiring new ones. The best way to achieve low churn rates is to keep customers satisfied. Knowledge about a customer’s churn risk is helpful information for target-based campaigns, because it allows lekker to focus on customers who are more likely to churn.

This post discusses how lekker used Amazon SageMaker Debugger to get deep insights into their customer churn model. Debugger automatically collects data during model training and provides built-in rules to automatically detect issues in model training.

Data preprocessing

lekker has a wide range of systems with different databases and data structures, and uses Spark and AWS Step Functions to create a data lake on AWS. To prepare data for the churn model, lekker creates a Spark processing job that gathers customer-specific information such as duration, sales channel, consumption, and other information for label creation. lekker distinguishes between active and passive churn. Active churn describes customers canceling their contract. Passive churn describes customers who are no longer in lekker’s delivery area or whose contract was cancelled due to late payment. For the introduced model, lekker uses active churn as a label, which helps better fit marketing expectations for retention campaigns.

Create a customer churn model

Before lekker started with AWS, data came from an Oracle database, which was used as a business intelligence (BI) platform. The BI team and analysts were organized in different departments and had different access rights. Data scientists needed to access data by schema-on-read. Models were trained on local machines or non-scalable servers, and computational restrictions came up quickly. Once a model was trained, monitoring and debugging it was hard, while management’s skepticism of potential closed-box models grew. Model deployment was also difficult because of missing orchestration tools and limited server availability and capacity.

When lekker decided to use SageMaker, most of these problems were solved, because SageMaker offers solutions along the whole machine learning workflow. lekker can now easily scale computing capacity as needed and access all available data on Amazon S3. Their data scientists can now explore and prepare data in the same notebook, and find it easier to create and train models using SageMaker Estimators. Additionally, lekker frequently uses SageMaker automatic model tuning, which finds the best model by running different hyperparameter configurations. This helped raise model quality tremendously. lekker uses Debugger to evaluate and communicate models’ results and get model insights.

Set up training on Amazon SageMaker

To run the XGBoost training on SageMaker, lekker uses the SageMaker Estimator API. It takes the instance type for the model training (ml.m5.4xlarge). It also takes the image URI of the training image and a dictionary for the model hyperparameters. See the following code:

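import sagemaker
from sagemaker.estimator import Estimator

# role, region, and scale_pos_weight are assumed to be defined earlier in the notebook
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type='ml.m5.4xlarge',
    hyperparameters={
        'num_round': '20',
        'rate_drop': '0.3',
        'scale_pos_weight': scale_pos_weight,
        'tweedie_variance_power': '1.4',
        'objective': 'binary:logistic',
    },
    # Built-in XGBoost training image for the given Region
    image_uri=sagemaker.image_uris.retrieve('xgboost', region, version='1.0-1'),
)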

Configure Debugger and rules

lekker uses Debugger in three ways:

Use built-in rules to identify underperforming training jobs
Create automatic visualizations
Collect important metrics from training jobs

The following code shows the Debugger hook configuration to collect metrics such as feature importance and Shapley values from churn model training:

from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

# Save the listed collections every 5 steps
debugger_hook_config = DebuggerHookConfig(
    hook_parameters={'save_interval': '5'},
    collection_configs=[
        CollectionConfig(name="metrics"),
        CollectionConfig(name="feature_importance"),
        CollectionConfig(name="full_shap"),
        CollectionConfig(name="average_shap"),
    ],
)

Debugger provides built-in rules that check for model training issues such as overfitting or loss not decreasing. Those rules run as a SageMaker processing job in a separate container and instance, so the rule analysis doesn’t interfere with the actual training. Users don’t pay to run these built-in rules. lekker frequently uses the loss_not_decreasing and xgboost_report rules. The first rule monitors the loss curves and triggers if loss doesn’t decrease by a certain percentage. The xgboost_report rule captures XGBoost model data and creates a static HTML report with visualizations such as ROC curves, error plots, and more, and provides key insights and recommendations. See the following code:

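from sagemaker.debugger import Rule, rule_configs

save_interval = 5  # same interval as in the Debugger hook configuration

rules = [
    # Flag the job if the loss in the metrics collection stops decreasing
    Rule.sagemaker(
        rule_configs.loss_not_decreasing(),
        rule_parameters={
            "collection_names": "metrics",
            "num_steps": str(save_interval * 2),
        },
    ),
    # Generate the static HTML XGBoost training report
    Rule.sagemaker(rule_configs.create_xgboost_report()),
]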
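To give an intuition for what a loss-not-decreasing check does, the following plain-Python sketch compares each recorded loss with the value recorded num_steps earlier and reports the first point where the drop is smaller than a chosen percentage. This is a simplified illustration written for this post, not Debugger’s implementation; the function name, the min_decrease_percent parameter, and the loss values are assumptions.

```python
def first_stalled_step(losses, num_steps, min_decrease_percent):
    """Return the first index where the loss failed to drop enough, or None."""
    for i in range(num_steps, len(losses)):
        previous, current = losses[i - num_steps], losses[i]
        if previous == 0:
            decrease_percent = 0.0
        else:
            decrease_percent = 100.0 * (previous - current) / abs(previous)
        if decrease_percent < min_decrease_percent:
            return i
    return None


# Invented loss curve that flattens out toward the end.
losses = [1.0, 0.8, 0.6, 0.59, 0.589]
stalled_at = first_stalled_step(losses, num_steps=2, min_decrease_percent=5.0)
print(stalled_at)
```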

After the Debugger hook configuration and list of rules are specified, you start the SageMaker training with estimator.fit(). The fit function takes as input the path to training and validation data in Amazon S3. See the following code:

from sagemaker.inputs import TrainingInput

estimator.fit({
    "train": TrainingInput(model_train_file, content_type="csv"),
    "validation": TrainingInput(model_test_file, content_type="csv"),
})

SageMaker automatically spins up the ml.m5.4xlarge training instance, downloads the training container and datasets, and runs the model training. It also spins up an instance to run the rule analysis as a SageMaker processing job. You can go to SageMaker Studio and check the rule status or check the status from the Python SDK.
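When checking from the Python SDK, the rule statuses are available through estimator.latest_training_job.rule_job_summary(), which returns one dictionary per rule. The helper below is a small sketch that condenses such a list into a name-to-status mapping. The helper name and the sample entries are made up for illustration, so the example runs without a live training job.

```python
def summarize_rule_status(rule_summaries):
    """Map each rule configuration name to its current evaluation status."""
    return {
        summary["RuleConfigurationName"]: summary["RuleEvaluationStatus"]
        for summary in rule_summaries
    }


# With a live training job, the list would come from:
#   estimator.latest_training_job.rule_job_summary()
# The entries below are illustrative stand-ins for that output.
example_summaries = [
    {"RuleConfigurationName": "LossNotDecreasing",
     "RuleEvaluationStatus": "NoIssuesFound"},
    {"RuleConfigurationName": "CreateXgboostReport",
     "RuleEvaluationStatus": "InProgress"},
]
status_by_rule = summarize_rule_status(example_summaries)
print(status_by_rule)
```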

Visualize and perform real-time monitoring

When the training is running, lekker uses Debugger’s open-source smdebug library to fetch and query the data that is uploaded in real time to Amazon S3. The first step is to create a trial object that takes either a local or S3 path:

from smdebug.trials import create_trial

s3_output_path = estimator.latest_job_debugger_artifacts_path()
trial = create_trial(s3_output_path)

Now you can access and query the data. To plot the loss curves, retrieve the metrics collection and the number of recorded steps:

import matplotlib.pyplot as plt

steps = trial.steps()
fig, ax = plt.subplots()
# Plot one curve per tensor in the metrics collection
for tname in trial.collection("metrics").tensor_names:
    data = list(trial.tensor(tname).values().values())
    ax.plot(steps, data, label=tname)
ax.legend()

The following figure shows that train and validation errors fall while training the customer churn model. That’s a sign of a well-trained model, because it shows that the model performs well on the unseen data (validation data). Debugger makes this visualization easy to create.

When the training job has completed, lekker uses the output of the xgboost_report rule to get further insights into the customer churn model. The following figure shows the model’s feature importance for the training job. The most important feature is customer duration (membership in months). lekker offers contracts with a fixed duration, such as 12 or 24 months. If customers cancel their contract, the churn shows at the end of the fixed duration period. That’s why most churn appears at month 12 and 24.

Knowledge about what influences the models’ outcome is important because it helps explain the model. lekker uses SHapley Additive exPlanations (SHAP) values recorded by Debugger during training. SHAP was made for local interpretability of a predictive model. It uses a game theoretic approach to explain the output of machine learning models.
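To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values by brute force for a toy scoring function. The feature names, the `toy_score` function, and its numbers are all hypothetical. This is not how Debugger or XGBoost compute SHAP values in practice (tree models use far more efficient algorithms); it only illustrates the underlying definition, the average marginal contribution of a feature over all orderings.

```python
from itertools import permutations


def shapley_values(features, value_fn):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(features)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        present = set()
        for name in order:
            before = value_fn(present)
            present.add(name)
            totals[name] += value_fn(present) - before
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(present):
    """Hypothetical churn score given the set of features that are 'known'."""
    score = 0.10                      # baseline score with no features
    if "duration" in present:
        score += 0.30
    if "price" in present:
        score += 0.10
    if "duration" in present and "price" in present:
        score += 0.20                 # interaction effect, shared by both
    return score


phi = shapley_values(["duration", "price", "region"], toy_score)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The values sum to the difference between the score with all features and the baseline score, which is the additivity property that makes SHAP plots interpretable.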

In the following figure, blue represents low feature values, red represents high. The x-axis shows the SHAP-value, which describes the impact on the outcome. High values indicate a predicted value increase, low values indicate a decrease. A line’s thickness represents how many customers are at this specific point. In the churn model customers with low duration have low predicted churn probabilities. That’s a result of their contract structure, because customer churn can be determined after 12 months at the earliest.

Users running on Amazon SageMaker can obtain SHAP values for their model either through SageMaker Debugger or SageMaker Clarify. The key difference is that Debugger records those values during training, while Clarify captures them after the model has been trained. Inspecting SHAP values during the training phase helps to further improve the model by identifying and removing irrelevant input features.

Once the model is trained, you can use Clarify to get SHAP values for any dataset. Once you deploy the model as an endpoint, you can use Clarify to monitor the SHAP values for captured data from the endpoint. Another key difference is that Debugger can collect SHAP values during training for XGBoost models whereas Clarify is model agnostic and can work with any model.

Results

With all the tools and services SageMaker provides, lekker was able to raise churn model accuracy by nearly 20%. In addition, the model is more stable than earlier versions. The F1 score rose to over 80% and AUC to 96%.

“Since we got all this information about model insights, we are able to get a clear understanding about what’s happening,” says Steffen Kremers, a data scientist at lekker. “Especially the concept of feature gains, which is fully integrated in the Debugger report, gave us useful information about the most influencing features. Important information for both feature engineering and feature selection.”

Since the churn model was deployed, lekker has moved three more models to SageMaker and integrated them into operations. lekker applied the lessons learned to all of these models and has seen all of them yield better results than before. Once lekker saw the insights ML can bring, they began expanding their ML activities.

Conclusion

This post demonstrated how lekker moved workloads from on premises to SageMaker, and how it helped their data science teams accelerate and innovate faster. lekker extensively uses Debugger to get deeper insights into their models, which help improve and better explain the models. To learn more about Debugger features and how this service can help your business, see Amazon SageMaker Debugger. To learn more about optimizing for customer churn, check out the blog post Preventing customer churn by optimizing incentive programs using stochastic programming.

About the Authors

Steffen Kremers is a data scientist at lekker based in Germany. He accompanies the whole machine learning process – from developing use case ideas to model building up to model deployment.

Nathalie Rauschmayr is an Applied Scientist at AWS, where she helps customers develop deep learning applications.

Lu Huang is a Senior Product Manager on the AWS Deep Engine team, managing Amazon SageMaker Debugger.

Best practices in customer service automation

May 21, 2021

by Arte Merritt Amazon AWS

Chatbots, virtual assistants, and Interactive Voice Response (IVR) systems are key components of successful customer service strategies.

We had the pleasure of hearing from three AWS Contact Center Intelligence (AWS CCI) Partners as part of our Best Practices in Customer Service Automation webinar. They provided valuable insights and tips for building automated customer service solutions.

The panel included:

Brad Beumer, CX and Contact Center Lead, Americas, for UiPath
Pat Higbie, CEO and Co-founder of XAPP AI
Rebecca Owens, Senior Product Manager at Genesys

Why build a chatbot or IVR?

Customers expect great customer service. At the same time, enterprises struggle with the costs and resources necessary to provide high-quality, highly available, live-agent solutions. Automated solutions, like chatbots and IVR, enable enterprises to provide quality support, 24/7, while reducing costs and increasing customer satisfaction.

Although reducing costs is important, a big reason enterprises are implementing automated solutions is to provide a better overall user experience. As Brad Beumer of UiPath points out, it is what customers are asking for. Customers want a 24/7/365 experience, especially for common tasks they can handle on their own without an agent.

Self-serve, automated solutions help take the pressure off live agents. As Rebecca Owens of Genesys mentions, self-service can help handle the upfront tasks, leaving the more complex tasks to the live agents, who are the contact centers’ most valuable assets.

The impact of COVID-19

COVID-19 has had a significant impact on the interest in chatbots. Shelter-in-place rules affected both the consumers’ ability to go into locations, and the live agents’ ability to work in the same contact center. The need for automated solutions skyrocketed. Genesys saw a large increase in call volumes—in some cases, nearly triple the volume.

Chatbots are not only helping consumers during COVID-19, but work-from-home agents as well. As Beumer mentions, automated solutions help offload more of the agents’ tasks and help them with compliance, security, and even VPN issues related to working from home.

COVID-19 resulted in more stress on existing chatbots too. As Pat Higbie of XAPP AI shares, existing chatbots were not set up to handle the additional use cases people wanted them to handle. These are opportunities to take advantage of AI, through tools like Amazon Lex or Amazon Kendra, for chatbots and natural language search, to enable users to get what they need and improve the customer experience.

Five best practices

Building automated solutions is an iterative process. Our panelists provided insights and best practices when facing common issues.

Getting started

Building conversational interfaces can be challenging because it is hard to know all the things a user may request, or even how they pose the request.

Our panelists see three basic use cases:

Task completion – Collecting user information to make an update, like an address change
Information requests – Providing information like delivery status or a bank balance
Efficient routing – Collecting information to route the user to the most appropriate agent

Our panelists recommend getting started with simpler use cases that have a high impact. As Beumer recommends, start with high-volume, low-complexity tasks like password resets or lost credit cards. Owens adds that starting with high-level Natural Language Understanding (NLU) menus to understand user intent and routing them to the right agent is a simple investment with a significant ROI. Afterwards, move to simple task automation and information requests, and then move into the more advanced use cases that were not possible before conversational AI. As Higbie puts it, start with a quick win, like informational chatbots, especially if you have not done this before. The level of complexity can go up quite dramatically, especially with transactional use cases.

As complexity increases, there are opportunities for more advanced use cases, like transactional or even proactive use cases. Owens mentioned an example of using AI to monitor activity on a website and proactively offering a chatbot when needed. For example, if you can predict the likelihood of an ecommerce user having an issue at checkout, a chatbot can proactively offer to help the user, to lead them through completion so the user does not abandon their cart.

Handling fallbacks gracefully

Fallbacks occur when the automated solution cannot understand the user or cannot handle the request. It is important to handle fallbacks gracefully.

In the past with contact centers, users were often routed to an agent when a fallback occurred. Now with AI, you can better understand the user’s intent and context, and either send them to another AI solution, or more efficiently transfer them to an agent, sending the full context so the user does not have to repeat themselves.

Fallbacks are an opportunity to educate users on what they can say and do—to help get users back on the “happy path.” For example, if the user asks for something the chatbot cannot do, have it respond with a list of what it can do. Predefined buttons, referred to as quick replies, can also help let a user know what the chatbot can do.
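The fallback behavior described above can be sketched in a few lines. Everything here is hypothetical (the keyword-based intents, the reply text, the `MAX_FALLBACKS` threshold, and the response shape); a real chatbot would rely on an NLU service rather than keyword matching. The sketch shows the two ideas from the panel: offer quick replies on the first miss, and hand off to an agent with the full conversation history after repeated misses.

```python
# Hypothetical intents and replies, for illustration only.
INTENTS = {
    "password": "I can help you reset your password.",
    "order": "Let me check the status of your order.",
}
QUICK_REPLIES = ["Reset password", "Order status"]
MAX_FALLBACKS = 2  # assumed threshold before handing off to a live agent


def handle_message(text, session):
    """Return a reply, quick replies to show, and optional handoff context."""
    session.setdefault("history", []).append(text)
    for keyword, reply in INTENTS.items():
        if keyword in text.lower():
            session["fallbacks"] = 0
            return {"reply": reply, "quick_replies": [], "handoff": None}

    # Fallback: the request was not understood.
    session["fallbacks"] = session.get("fallbacks", 0) + 1
    if session["fallbacks"] >= MAX_FALLBACKS:
        # Pass the full history so the user does not have to repeat themselves.
        return {"reply": "Let me connect you with an agent.",
                "quick_replies": [],
                "handoff": {"history": list(session["history"])}}
    return {"reply": "Sorry, I didn't get that. Here is what I can help with:",
            "quick_replies": QUICK_REPLIES,
            "handoff": None}


session = {}
first = handle_message("Can you book me a flight?", session)
second = handle_message("flight please", session)
print(first["reply"], first["quick_replies"])
print(second["reply"], second["handoff"])
```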

Supporting multimodal channels

Our panelists see enterprises building automated solutions across multiple channels, including multi-modal text and voice options on the web, IVR, social media, and email. Enterprises are building solutions where their customers are interacting. There are additional factors to consider when supporting multiple channels.

People ask questions differently across channels. As Higbie points out, users communicating via text tend to do so in “keyword style” with incomplete sentences, whereas in voice, they tend to ask the full question.

The way the chatbot responds across channels can be different as well. In text, the chatbot can provide a menu of options for the user to click. With voice, if there are more than three options, it can be difficult for the user to remember.

Regardless of the channel, it is important to understand the user’s intent. As Beumer mentions, if the intent can be understood, the right automation can be triggered.

It can be helpful to have a common interaction model for understanding across channels, but it is important to optimize the actual responses for each particular channel. As Higbie indicates, model management, dialog management, and content management are all needed to handle the complexities in conversational AI.

Keeping context in mind

Context is important—what is known about the user, where they are, or what they are doing can help improve the user experience.

Chatbots and IVRs can connect to backend CRMs to have additional information to personalize and tailor the experience. They can also pass along information gathered from a user to a live agent for more efficient handling so the user does not have to repeat themselves.

In the case of voice, knowing whether the user has been in contact recently can be helpful. While introductory prompts can be great to educate people, if the user contacts you again, it is better to use a tapered approach that reduces some of the default messaging in order to have a quicker opening response.

The context can also be used with proactive solutions that monitor user activity and prompt if help is needed.

Measuring success

Our panelists use a variety of metrics to measure success, such as call deflection rates, self-service containment rates, first response time, and customer satisfaction. The metrics can also be used to calculate operational cost savings by knowing the cost of live agents and the deflection rates.
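As a back-of-the-envelope illustration of that savings calculation, the sketch below multiplies the number of deflected contacts by the cost difference between a live-agent contact and an automated one. The function name and every input figure are hypothetical placeholders, not numbers from the panelists.

```python
def monthly_savings(contacts, deflection_rate, agent_cost, automated_cost):
    """Estimate savings from contacts handled by automation instead of agents."""
    deflected = contacts * deflection_rate
    return deflected * (agent_cost - automated_cost)


# Hypothetical inputs: 10,000 contacts per month, 30% deflected,
# $6.00 per live-agent contact, $0.50 per automated contact.
savings = monthly_savings(
    contacts=10_000,
    deflection_rate=0.30,
    agent_cost=6.00,
    automated_cost=0.50,
)
print(f"Estimated monthly savings: ${savings:,.2f}")
```

A fuller model would also account for contacts that start in the automated channel and still end up with an agent, which is what containment rates capture.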

Customer satisfaction is very important, because one of the goals of automated solutions is to provide a better user experience. One way UiPath measures this is to look at Net Promoter Scores (NPS) before and after an automated solution is launched. Surveys can be used as well, via outbound calls after an interaction to gather customer feedback. With chatbots, you can immediately ask the user whether the response was helpful and take further action depending on the response.

Automated solutions like chatbots and IVRs need continuous optimization. It is difficult to anticipate all the things a user may ask, or how they may ask them. Monitoring the interactions to understand what users are asking for, how the automated solution is responding, and where it needs improvement is important. It is an iterative process.

What the future looks like

Our panelists shared their thoughts on the future of automated solutions.

Owens sees an increase in usage of automated solutions across all channels as chatbot technologies gain momentum and AI is able to handle even more tasks and complexity. Although customer service is heavily voice today, she is seeing a push to digital, and expects the trend to continue. One area of growth is in the expansion of language support in AI beyond English to support worldwide coverage.

Beumer envisions expansion of automated solutions across all channels, for a more consistent user experience. While automation will increase, it remains important to make sure that the handoff from a chatbot to a live agent is seamless.

Higbie sees a lot of exciting opportunity for automated solutions, and believes we are only in the “first inning” of AI automation. Customers will ask for even more than what chatbots currently do, and they will get the responses instantly. Solutions will move more to the proactive side as well. He sees this as a bigger paradigm shift than either web or mobile. It is important to commit now and not be displaced. As he summarizes, enterprises need to get started, get a quick win, and then expand the sophistication of their AI initiatives.

As the underlying technologies continue to evolve, the opportunities for automated chatbots continue to grow. It is exciting to learn from our panelists and see where automated solutions are going in the future.

About AWS Contact Center Intelligence

AWS CCI solutions can quickly and easily add AI and ML to your existing contact center to improve customer satisfaction and reduce costs. AWS CCI covers three key areas of the contact center workflow: self-service automation, real-time analytics with agent assist, and post-call analytics. Each solution is created using a specific combination of AWS AI services, and is available through select AWS Partners. Join the next CCI Webinar, “Banking on Bots”, on May 25, 2021.

About the Author

Arte Merritt leads partnerships for Contact Center Intelligence and Conversational AI. He is a frequent author and speaker in the conversational AI space. He was the co-founder and CEO of the leading analytics platform for conversational interfaces, leading the company to 20,000 customers, 90B messages, and multiple acquisition offers. Previously he founded Motally, a mobile analytics platform he sold to Nokia. Arte has more than 20 years of experience in big data analytics. Arte is an MIT alum.

Implement live customer service chat with two-way translation, using Amazon Connect and Amazon Translate

May 21, 2021

by Bob Strahan Amazon AWS

Many businesses support customers across multiple countries and ethnic communities, and therefore need to provide customer service in a wide variety of local languages. It’s hard to consistently staff contact centers with agents with different language proficiencies. During periods of high call volumes, callers often must wait on hold for an agent who can speak their language.

What if these businesses could implement a system to act as a real-time translator, allowing customers and agents to easily communicate in different languages? With such a system, a customer could message a support agent in their native language, such as French, and the support agent could use their own native language, maybe Italian, to read and respond to the customer’s messages. Deliveroo, an online food delivery company based in England, has implemented a system that does exactly that!

Deliveroo provides food delivery in over 200 locations across Europe, the Middle East, and Asia, serving customers in dozens of languages. Previously, during periods of high demand (such as during televised sporting events, or bad weather) they would ask customers to wait for a native speaker to become available or ask their agents to copy/paste the chats into an online translation service. These approaches were far from ideal, so Deliveroo is now deploying a much better solution that uses Amazon Connect and Amazon Translate to implement scalable live agent chat with built-in automatic two-way translation.

In this post, we share an open-source version of this solution from one of Amazon’s partners, VoiceFoundry. We show you how to install and try the solution, and then how you can customize it to control translations of specific phrases. Finally, we share success stories from our customer, Deliveroo, and leave you with pointers for implementing a similar solution for your own business.

Set up an Amazon Connect test instance and live chat translation

Follow these tutorials to set up an Amazon Connect test instance and experiment with the chat contact feature:

If you have an Amazon Connect test instance and you already know how to use chat contacts, you can skip this step.

Now that you have Amazon Connect chat working, it’s time to install the sample live chat translation solution. My co-author, Dan from VoiceFoundry, has made it easy. Follow the instructions in the project GitHub repository Install Translate CCP Demo for Amazon Connect.

Test the solution

To test the solution, you simulate two roles—the agent and the customer.

As the agent, sign in to your Amazon Connect instance dashboard.
In a separate browser window, open the new web application using the URL created when you installed the solution.

The Amazon Connect Control Panel is displayed on the left, and the new chat translation panel is on the right.

On the Control Panel title bar, change your status from Offline to Available.
Acting as the customer, launch the test chat page from the Amazon Connect dashboard, or using the URL https://<yourConnectInstance>/connect/test-chat.

In a real-world application, you use a customer chat client widget on a website or mobile application. However, for this post, it’s convenient to use the test chat client.

Open the customer test chat widget to initiate contact with the agent.

You hear a ring tone and see a visual indicator on the agent’s control panel as the agent is asked to accept your contact.

As the agent, accept the incoming request to establish contact.

As the customer, enter a message in Spanish into the customer test chat widget. For example, “Hola, necesito ayuda con mi pedido.”

Let’s assume that the agent can’t understand the incoming message in Spanish. Don’t worry—we can use our sample solution. The new web app chat translation panel displays the translation in English, along with the customer’s original message. Now you can understand the phrase “Hi, I need help with my order.”

As the agent, enter a reply in English in the chat translation panel text box, for example “Hi, My name is Bob and I will be happy to help. What is your name and phone number?”

Your reply is automatically translated back to Spanish.

As the customer, verify that you received a reply from the agent in Spanish.

Continue the conversation and observe how the customer can chat entirely in Spanish, and the agent entirely in English. Take a moment to consider how useful this can be.

When you’re done, as the agent, choose End chat and Close contact to end the chat session. As the customer, choose End chat.

Did you notice that the chat translation panel automatically identified the language the customer used—in this case Spanish? You can use any of the languages supported by Amazon Translate. Try the experiment again, this time using a different language for the customer. Have some fun with it—engage friends who are fluent in other languages and communicate with them in their native tongue.
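One way this kind of automatic detection can be done is with the Amazon Translate `translate_text` API, which accepts `SourceLanguageCode="auto"` and reports the detected language in its response. The sketch below is an assumption about the approach, not the sample application's actual code. The function name `translate_for_agent` is hypothetical, and `StubTranslateClient` is an offline stand-in that only mimics the response shape so the example runs without AWS credentials.

```python
def translate_for_agent(client, text, agent_language="en"):
    """Translate a customer message and report the detected source language."""
    response = client.translate_text(
        Text=text,
        SourceLanguageCode="auto",      # ask the service to detect the language
        TargetLanguageCode=agent_language,
    )
    return response["SourceLanguageCode"], response["TranslatedText"]


class StubTranslateClient:
    """Offline stand-in mimicking the response shape; not a real translator."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        return {
            "TranslatedText": "Hi, I need help with my order.",
            "SourceLanguageCode": "es",
            "TargetLanguageCode": TargetLanguageCode,
        }


detected, translated = translate_for_agent(
    StubTranslateClient(), "Hola, necesito ayuda con mi pedido.")
print(f"[{detected}] {translated}")

# With AWS credentials configured, swap in the real client:
# import boto3
# client = boto3.client("translate")
```

The detected language code can then be reused as the target language when translating the agent's replies back to the customer.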

In the sample application, we have assumed that the agent always uses English. A production version of the application would allow the agent to choose their preferred language.

Multi-chat support

Amazon Connect supports up to five concurrent chat sessions per agent. Our sample application allows a single agent to support multiple customer chats in different languages concurrently.

In the following screenshot, agent Bob is now chatting with a new customer, this time in German!

Customize terminology

Let’s say you have a product called Moonlight and Roses. While discussing this product with your Spanish-speaking customer, you enter something like “I see that you ordered Moonlight and Roses on May 13, is that correct?”

Your customer sees the translation “Veo que ordenaste Luz de Luna y Rosas el 13 de mayo, ¿es correcto?”

This is a good literal translation—Luz de Luna y Rosas does mean Moonlight and Roses. But in this case, you want your English product name, Moonlight and Roses, to be translated to the Spanish product name, Moonlight y Roses.

This is where we can use the powerful custom terminology feature in Amazon Translate. Let’s try it. For instructions on updating your custom terminologies, see the GitHub repo.
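For orientation, a custom terminology in Amazon Translate can be supplied as a CSV file whose header row lists the language codes and whose remaining rows pair each source term with its required translation. The sketch below builds that file in memory for the product name above. The helper name and the terminology name `product-names` are hypothetical, and the commented boto3 calls are shown only as an assumption about how you might upload and apply it; follow the GitHub repo for the sample solution's own procedure.

```python
import csv
import io


def build_terminology_csv(source_lang, target_lang, pairs):
    """Build custom terminology CSV bytes: header of language codes, then terms."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([source_lang, target_lang])
    writer.writerows(pairs)
    return buffer.getvalue().encode("utf-8")


terminology_csv = build_terminology_csv(
    "en", "es", [("Moonlight and Roses", "Moonlight y Roses")])
print(terminology_csv.decode("utf-8"))

# With a real boto3 Translate client (not run here):
# client.import_terminology(
#     Name="product-names",
#     MergeStrategy="OVERWRITE",
#     TerminologyData={"File": terminology_csv, "Format": "CSV"},
# )
# client.translate_text(
#     Text="I see that you ordered Moonlight and Roses on May 13.",
#     SourceLanguageCode="en",
#     TargetLanguageCode="es",
#     TerminologyNames=["product-names"],
# )
```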

Now we can validate the solution with another simulated chat between an agent and customer, as in the following screenshot.

Deliveroo use case

Amazon Translate helps Deliveroo’s customers, riders (delivery personnel), and food establishment owners talk to each other across language barriers to deliver hot and tasty food of your choice from your local neighborhood eateries quickly.

This helped support the food delivery industry especially during the COVID-19 pandemic, when going out to restaurants became a hazardous endeavor.

Amy Norris, Product Manager for Deliveroo Customer Care says, “Amazon Translate is fast, accurate, and customizable to ensure that food item names, restaurant names, addresses, and customer names are translated correctly to create trustful conversational connections in uncertain times. By using Amazon Translate, our customer service agents were able to increase their first call resolution to 83% and reduce the average call handling time for their customers by 20%.”

Clean up

When you have finished experimenting with this solution, you can clean up your resources by removing the sample live chat translation application and deleting your test Amazon Connect instance.

Conclusion

The combination of Amazon Connect and Amazon Translate enables a scalable, cost-effective solution for your customer support agents to communicate in real time with customers in their preferred languages. The sample application is provided as open source—you can use it as a starting point for your own solution. AWS Professional Services, VoiceFoundry, and other Amazon partners are here to help as well.

We’d love to hear from you. Let us know what you think in the comments section, or using the issues forum in the sample solution GitHub repository.

About the Authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Daniel Bloy is a practice leader for VoiceFoundry, an Amazon Connect specialty partner.

Reduce ML inference costs on Amazon SageMaker with hardware and software acceleration

May 21, 2021

by Jiacheng Guo Amazon AWS

Amazon SageMaker is a fully managed service that enables data scientists and developers to build, train, and deploy machine learning (ML) models at 50% lower TCO than self-managed deployments on Amazon Elastic Compute Cloud (Amazon EC2). Elastic Inference is a capability of SageMaker that delivers 20% better performance for model inference than AWS Deep Learning Containers on EC2 by accelerating inference through model compilation, model server tuning, and underlying hardware and software acceleration technologies.

Inference is the process of making predictions using a trained ML model. For production ML applications, inference accounts for up to 90% of total compute costs. Hence, when deploying an ML model for inference, accelerating inference performance on low-cost instance types is an effective way to reduce overall compute costs while meeting performance requirements such as latency and throughput. For example, running ML models on GPU-based instances provides good inference performance; however, selecting the right instance size and optimizing GPU utilization is challenging because different ML models require different amounts of compute and memory resources.

Elastic Inference Accelerators (EIA) solve this problem by enabling you to attach the right amount of GPU-powered inference acceleration to any Amazon SageMaker ML instance. You can choose any CPU instance type that best suits your application’s overall compute and memory needs, and separately attach the right amount of GPU-powered inference acceleration needed to satisfy your performance requirements. This allows you to reduce inference costs by using compute resources more efficiently. Along with hardware acceleration, Elastic Inference offers software acceleration through SageMaker Neo, a capability of SageMaker that automatically compiles ML models for any ML framework and to any target hardware. With SageMaker Neo, you don’t need to set up third-party or framework-specific compiler software or tune the model manually for optimizing inference performance. With Elastic Inference, you can combine software and hardware acceleration to get the best inference performance on SageMaker.

This post demonstrates how you can use hardware and software-based inference acceleration to reduce costs and improve latency for pre-trained TensorFlow models on Amazon SageMaker. We show you how to compile a pre-trained TensorFlow ResNet-50 model using SageMaker Neo and how to deploy this model to a SageMaker Endpoint with Elastic Inference.

Setup

First, we need to ensure we have SageMaker Python SDK >=2.32.1 and import necessary Python packages. If you are using SageMaker Notebook Instances, select conda_tensorflow_p36 as your kernel. Note that you may have to restart your kernel after upgrading packages.

import numpy as np
import time
import json
import requests
import boto3
import os
import sagemaker

Next, we’ll get the IAM execution role and a few other SageMaker specific variables from our notebook environment so that SageMaker can access resources in your AWS account. See the documentation for more information on how to set this up.

from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

Get pre-trained model for compilation

SageMaker Neo supports compiling TensorFlow/Keras, PyTorch, ONNX, and XGBoost models. However, only Neo-compiled TensorFlow models are supported on EIA as of this writing. TensorFlow models should be in SavedModel format or frozen graph format. Learn more here.

Import ResNet50 model from Keras

We will import ResNet50 model from Keras applications and create a model artifact model.tar.gz.

import tensorflow as tf
import tarfile

tf.keras.backend.set_image_data_format('channels_last')
pretrained_model = tf.keras.applications.resnet.ResNet50()
saved_model_dir = '1'
tf.saved_model.save(pretrained_model, saved_model_dir)

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add(saved_model_dir)

Upload model artifact to S3

SageMaker Neo expects a path to the model artifact in Amazon S3, so we will upload the model artifact to an S3 bucket.

from sagemaker.utils import name_from_base

prefix = name_from_base('ResNet50')
input_model_path = sess.upload_data(path='model.tar.gz', bucket=bucket, key_prefix=prefix)
print('S3 path for input model: {}'.format(input_model_path))

Compile model for EI Accelerator using SageMaker Neo

Now the model is ready to be compiled by SageMaker Neo. Note that ml_eia2 needs to be set for target_instance_family field in order for the model to be optimized for EI accelerator deployment. If you want to compile your own model for EI accelerator, refer to Neo compilation API. In order to compile the model, you also need to provide the model input_shape and any optional compiler_options to your model. Note that 32-bit floating-point types (FP32) are the default precision mode for ML models. We include this here to be explicit versus compiling with lower precision models. Learn more about advantages of different precision types here.

from sagemaker.tensorflow import TensorFlowModel

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp32"
compiled_model_fp32 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp32"})

Deploy compiled model to an Endpoint with EI Accelerator attached

Deploying a model to a SageMaker Endpoint uses the same deploy function whether or not a model is compiled using SageMaker Neo. The only change required for utilizing EI Accelerator is to provide an accelerator_type parameter, which determines the type of EI accelerator to be attached to your endpoint. All supported types of accelerators can be found here.

predictor_compiled_fp32 = compiled_model_fp32.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

Benchmarking endpoints

Once the endpoint is created, we will benchmark to measure latency. The model expects input shape of 1 x 224 x 224 x 3, so we expand the dog image (224x224x3) with a batch size of 1 to be compatible with the model input. The benchmark first runs a series of 100 warmup inferences, and then runs 1000 inferences to make sure that we get an accurate estimate of latency ignoring startup times. Latency percentiles are reported from these 1000 inferences.

import numpy as np
import matplotlib.image as mpimg

data = mpimg.imread('dog.jpg')
data = np.expand_dims(data, axis=0)
print("Input data shape: {}".format(data.shape))

import time
import numpy as np


def benchmark_sm_endpoint(predictor, input_data):
    print('Doing warmup round of 100 inferences (not counted)')
    for i in range(100):
        output = predictor.predict(input_data)
    time.sleep(3)

    client_times = []
    print('Running 1000 inferences')
    for i in range(1000):
        client_start = time.time()
        # Use the function argument rather than the global `data`
        output = predictor.predict(input_data)
        client_end = time.time()
        # Record client-side latency in milliseconds
        client_times.append((client_end - client_start)*1000)

    print('Client end-to-end latency percentiles:')
    client_avg = np.mean(client_times)
    client_p50 = np.percentile(client_times, 50)
    client_p90 = np.percentile(client_times, 90)
    client_p99 = np.percentile(client_times, 99)
    print('Avg | P50 | P90 | P99')
    print('{:.4f} | {:.4f} | {:.4f} | {:.4f}\n'.format(client_avg, client_p50, client_p90, client_p99))

benchmark_sm_endpoint(predictor_compiled_fp32, data)

From the benchmark above, the output will be similar to the following:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
103.2129 | 124.4727 | 129.1123 | 133.2371

Compile and benchmark model with quantization

Quantization-based model optimizations represent model weights in lower precision (for example, FP16), which increases throughput and lowers latency. Using FP16 precision in particular provides faster performance than FP32 with effectively no drop (<0.1%) in model accuracy. When you enable FP16 precision, SageMaker Neo chooses kernels from both FP16 and FP32 precision. For the ResNet50 model in this post, we are able to compile the model with FP16 quantization by setting the precision_mode under compiler_options.
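To get a feel for how little precision is lost per value, the sketch below rounds a few numbers to IEEE 754 half precision (FP16) and single precision (FP32) using only Python's standard `struct` module. The weight values are hypothetical, and this does not reproduce what SageMaker Neo does during compilation; it only illustrates the rounding error of each format.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical weight values, chosen only for illustration.
weights = [0.1234567, -0.0456789, 0.9876543, 0.0001234]

for w in weights:
    err16 = abs(to_fp16(w) - w) / abs(w)
    err32 = abs(to_fp32(w) - w) / abs(w)
    print(f"{w:+.7f}  fp16 rel. error {err16:.2e}  fp32 rel. error {err32:.2e}")
```

For values in FP16's normal range, the relative rounding error stays at or below 2^-11 (about 0.05%), which is consistent with small accuracy changes after quantization, although the effect on a full model depends on the model itself.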

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp16"
compiled_model_fp16 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp16"})

# Deploy the compiled model to SM endpoint with EI attached
predictor_compiled_fp16 = compiled_model_fp16.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_compiled_fp16, data)

Benchmark data for model compiled with FP16 will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
91.8721 | 112.8929 | 117.7130 | 122.6844

Compare latency with unoptimized model on EIA

The model compiled with FP16 precision mode is faster than the model compiled with FP32. Now let’s measure the latency of an uncompiled model as well.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Deploy the uncompiled model to SM endpoint with EI attached
predictor_uncompiled = tensorflow_model.deploy(initial_instance_count=1,
                                           instance_type='ml.m5.xlarge',
                                           accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_uncompiled, data)

Benchmark data for uncompiled model will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
117.1654 | 137.9665 | 143.5326 | 150.2070

Clean up endpoints

A running endpoint incurs costs. Therefore, we delete the endpoints to release the resources after finishing this example.

session.delete_endpoint(predictor_compiled_fp32.endpoint_name)
session.delete_endpoint(predictor_compiled_fp16.endpoint_name)
session.delete_endpoint(predictor_uncompiled.endpoint_name)

Performance comparison

To understand the performance improvement from model compilation and quantization, you can visualize the differences in percentile latency for models with different optimizations in the following plot. For our model, we find that adding model compilation improves latency by 13.5% compared to the unoptimized model. Adding quantization (FP16) to the compiled model results in a 27.5% improvement in latency compared to the unoptimized model.
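The two percentages can be reproduced from the average latencies in the benchmark outputs above. The sketch below assumes that improvement is measured as the ratio of unoptimized to optimized average latency, minus one. That definition is inferred from the numbers rather than stated in the post.

```python
# Average client latencies (ms), copied from the benchmark outputs in this post.
avg_latency_ms = {
    "uncompiled": 117.1654,
    "compiled_fp32": 103.2129,
    "compiled_fp16": 91.8721,
}


def speedup_percent(baseline_ms, optimized_ms):
    """Assumed definition: (baseline / optimized - 1), expressed as a percentage."""
    return (baseline_ms / optimized_ms - 1.0) * 100.0


baseline = avg_latency_ms["uncompiled"]
fp32_gain = speedup_percent(baseline, avg_latency_ms["compiled_fp32"])
fp16_gain = speedup_percent(baseline, avg_latency_ms["compiled_fp16"])

print("Compiled (FP32) vs. uncompiled: {:.1f}%".format(fp32_gain))
print("Compiled (FP16) vs. uncompiled: {:.1f}%".format(fp16_gain))
```

Measuring the reduction relative to the unoptimized latency instead (1 - optimized / baseline) would give smaller figures, so it's worth stating which definition you use when you report your own results.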

Summary

SageMaker Elastic Inference is an easy-to-use solution for adding model optimizations to improve inference performance on Amazon SageMaker. With Elastic Inference accelerators, you can get GPU inference acceleration and remain more cost-effective than standalone SageMaker GPU instances. With SageMaker Neo, software-based acceleration provided by model optimizations further improves performance (27.5%) over unoptimized models.

If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-ei-feedback@amazon.com.

About the Authors

Jiacheng Guo is a Software Engineer with AWS AI. He is passionate about building high performance deep learning systems with state-of-art techniques. In his spare time, he enjoys drifting on dirt track and playing with his Ragdoll cat.

Santosh Bhavani is a Senior Technical Product Manager with the Amazon SageMaker Elastic Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys traveling, playing tennis, and drinking lots of Pu’er tea.

AWS releases code to help reduce bias in machine learning models

May 21, 2021

by admin Amazon AWS

Open-source library enables optimization of hyperparameters to maximize performance while meeting fairness constraints.

Automate feature engineering pipelines with Amazon SageMaker

May 20, 2021

by Muhammad Khas Amazon AWS

The process of extracting, cleaning, manipulating, and encoding data from raw sources and preparing it to be consumed by machine learning (ML) algorithms is an important, expensive, and time-consuming part of data science. Managing these data pipelines for either training or inference is a challenge for data science teams, however, and can take valuable time away that could be better used towards experimenting with new features or optimizing model performance with different algorithms or hyperparameter tuning.

Many ML use cases such as churn prediction, fraud detection, or predictive maintenance rely on models trained from historical datasets that build up over time. The set of feature engineering steps a data scientist defined and performed on historical data for one time period needs to be applied towards any new data after that period, as models trained from historic features need to make predictions on features derived from the new data. Instead of manually performing these feature transformations on new data as it arrives, data scientists can create a data preprocessing pipeline to perform the desired set of feature engineering steps that runs automatically whenever new raw data is available. Decoupling the data engineering from the data science in this way can be a powerful time-saving practice when done well.

Workflow orchestration tools like AWS Step Functions or Apache Airflow are typically used by data engineering teams to build these kinds of extract, transform, and load (ETL) data pipelines. Although these tools offer comprehensive and scalable options to support many data transformation workloads, data scientists may prefer to use a toolset specific to ML workloads. Amazon SageMaker supports the end-to-end lifecycle for ML projects, including simplifying feature preparation with SageMaker Data Wrangler and storage and feature serving with SageMaker Feature Store.

In this post, we show you how a data scientist working on a new ML use case can use both Data Wrangler and Feature Store to create a set of feature transformations, perform them over a historical dataset, and then use SageMaker Pipelines to automatically transform and store features as new data arrives daily.

For more information about SageMaker Data Wrangler, Feature Store, and Pipelines, we recommend the following resources:

Overview of solution

The following diagram shows an example end-to-end process from receiving a raw dataset to using the transformed features for model training and predictions. This post describes how to set up your architecture such that each new dataset arriving in Amazon Simple Storage Service (Amazon S3) automatically triggers a pipeline that performs a set of predefined transformations with Data Wrangler and stores the resulting features in Feature Store. You can visit our code repo to try it out in your own account.

Before we set up the architecture for automating feature transformations, we first explore the historical dataset with Data Wrangler, define the set of transformations we want to apply, and store the features in Feature Store.

Dataset

To demonstrate feature pipeline automation, we use an example of preparing features for a flight delay prediction model. We use flight delay data from the US Department of Transportation’s Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic US flights. After you try out the approach with this example, you can experiment with the same pattern on your own datasets.

Each record in the flight delay dataset contains information such as:

Flight date
Airline details
Origin and destination airport details
Scheduled and actual times for takeoff and landing
Delay details

Once the features have been transformed, we can use them to train a machine learning model to predict future flight delays.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
A SageMaker Studio domain with the AmazonSageMakerFeatureStoreAccess managed policy attached to the AWS Identity and Access Management (IAM) execution role
An S3 bucket

Upload the historical dataset to Amazon S3

Our code repo provides a link to download the raw flight delay dataset used in this example. The directory flight-delay-data contains two CSV files covering two time periods with the same columns. One file contains flight data from Jan 1, 2020, through March 30, 2020. The second file contains flight data for a single day: March 31, 2020. We use the first file for the initial feature transformations. We use the second file to test our feature pipeline automation. In this example, we store the raw dataset in the default S3 bucket associated with our Studio domain, but this isn’t required.

Feature engineering with Data Wrangler

Whenever a data scientist starts working on a new ML use case, the first step is typically to explore and understand the available data. Data Wrangler provides a fast and easy way to visually inspect datasets and perform exploratory data analysis. In this post, we use Data Wrangler within the Studio IDE to analyze the airline dataset and create the transformations we later automate.

A typical model may have dozens or hundreds of features. To keep our example simple, we show how to create the following feature engineering steps using Data Wrangler:

One-hot encoding the airline carrier column
Adding a record identifier feature and an event timestamp feature, so that we can export to Feature Store
Adding a feature with the aggregate daily count of delays from each origin airport

Data Wrangler walkthrough

To start using Data Wrangler, complete the following steps:

In a Studio domain, on the Launcher tab, choose New data flow.
Import the flight delay dataset jan01_mar30_2020.csv from its location in Amazon S3.

Data Wrangler shows you a preview of the data before importing.

Choose Import dataset.

You’re ready to begin exploring and feature engineering.

Because ML algorithms typically require all input features to be numeric for training and inference, it’s common to transform categorical features into a numerical representation. Here we use one-hot encoding for the airline carrier column, which transforms it into several binary columns, one for each airline carrier present in the data.

Choose the + icon next to the dataset and choose Add Transform.
For the field OP_UNIQUE_CARRIER, select one-hot encoding.
Under Encode Categorical, for Output Style, choose Columns.
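As a rough illustration of what the one-hot step produces, the following standard-library sketch turns a carrier column into one binary column per carrier. The sample records and the OP_UNIQUE_CARRIER_<code> output column naming are assumptions for illustration; the column names that Data Wrangler generates may differ.

```python
def one_hot_encode(rows, column):
    """Replace `column` in each row dict with one binary column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical column being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row["{}_{}".format(column, category)] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Hypothetical sample records; the real dataset has many more columns and carriers.
flights = [
    {"ORIGIN": "SEA", "OP_UNIQUE_CARRIER": "AA"},
    {"ORIGIN": "JFK", "OP_UNIQUE_CARRIER": "DL"},
    {"ORIGIN": "SEA", "OP_UNIQUE_CARRIER": "AA"},
]

encoded_flights = one_hot_encode(flights, "OP_UNIQUE_CARRIER")
for row in encoded_flights:
    print(row)
```

Each record ends up with exactly one carrier column set to 1, which is the numeric representation most ML algorithms expect.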

Feature Store requires a unique RecordIdentifier field for each record ingested into the store, so we add a new column to our dataset, RECORD_ID, which is a concatenation of four fields: OP_CARRIER_FL_NUM, ORIGIN, DEP_TIME, and DEST. It also requires an EventTime feature for each record, so we append a time of day to FL_DATE in a new column called EVENT_TIME. Here we use Data Wrangler’s custom transform option with Pandas:

df['RECORD_ID'] = df['OP_CARRIER_FL_NUM'].astype(str) + df['ORIGIN'] + df['DEP_TIME'].astype(str) + df['DEST']

df['EVENT_TIME'] = df['FL_DATE'].astype(str) + 'T00:00:00Z'

To predict delays for certain flights each day, it’s useful to create aggregated features based on the entities present in the data over different time windows. Providing an ML algorithm with these kinds of features can deliver a powerful signal over and above what contextual information is available for a single record in this raw dataset. Here, we calculate the number of delayed flights from each origin airport over the last day using Data Wrangler’s custom transform option with PySpark SQL:

SELECT *, SUM(ARR_DEL15) OVER w1 AS NUM_DELAYS_LAST_DAY
FROM df WINDOW w1 AS (PARTITION BY ORIGIN ORDER BY
CAST(EVENT_TIME AS timestamp)
RANGE INTERVAL 1 DAY PRECEDING)
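To make the window logic concrete, the following standard-library sketch computes the same kind of trailing one-day aggregate per origin airport. It reflects one reading of the frame (all records for the same origin whose EVENT_TIME falls between one day before the current record and the current record, inclusive) and uses made-up sample rows. It is not the code that Data Wrangler runs.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches the EVENT_TIME strings built earlier


def add_num_delays_last_day(rows):
    """Return copies of rows with NUM_DELAYS_LAST_DAY added.

    For each row, sums ARR_DEL15 over rows with the same ORIGIN whose
    EVENT_TIME lies in [row time - 1 day, row time]. Missing values count as 0.
    """
    by_origin = defaultdict(list)
    for row in rows:
        event_time = datetime.strptime(row["EVENT_TIME"], TIME_FORMAT)
        by_origin[row["ORIGIN"]].append((event_time, row["ARR_DEL15"] or 0))

    result = []
    for row in rows:
        end = datetime.strptime(row["EVENT_TIME"], TIME_FORMAT)
        start = end - timedelta(days=1)
        delays = sum(
            delayed
            for event_time, delayed in by_origin[row["ORIGIN"]]
            if start <= event_time <= end
        )
        result.append(dict(row, NUM_DELAYS_LAST_DAY=delays))
    return result


# Hypothetical sample rows for illustration only.
sample = [
    {"ORIGIN": "SEA", "EVENT_TIME": "2020-01-01T00:00:00Z", "ARR_DEL15": 1},
    {"ORIGIN": "SEA", "EVENT_TIME": "2020-01-01T00:00:00Z", "ARR_DEL15": 0},
    {"ORIGIN": "SEA", "EVENT_TIME": "2020-01-02T00:00:00Z", "ARR_DEL15": 1},
    {"ORIGIN": "SEA", "EVENT_TIME": "2020-01-04T00:00:00Z", "ARR_DEL15": 1},
    {"ORIGIN": "JFK", "EVENT_TIME": "2020-01-02T00:00:00Z", "ARR_DEL15": 1},
]

with_delays = add_num_delays_last_day(sample)
delay_counts = [row["NUM_DELAYS_LAST_DAY"] for row in with_delays]
print(delay_counts)
```

Because every EVENT_TIME is set to midnight, the one-day range covers the flight's own date and the previous date for the same origin.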

In a real use case, we’d likely spend a lot of time at this stage exploring the data, defining transformations, and creating more features. After defining all of the transformations to perform over the dataset, you can export the resulting ML features to Feature Store.

On the Export tab, choose </> under Steps. This displays a list of all the steps you have created.
Choose the last step, then choose Export Step.
On the Export Step drop-down menu, choose Feature Store.

SageMaker generates a Jupyter notebook for you and opens it in a new tab in Studio. This notebook contains everything needed to run the transformations over our historical dataset and ingest the resulting features into Feature Store.

Store features in Feature Store

Now that we’ve defined the set of transformations to apply to our dataset, we need to perform them over the set of historical records and store the results in Feature Store, a purpose-built store for ML features. This lets us easily discover and reuse the features without reproducing the same transformations from the raw dataset as we have done here. For more information about the capabilities of Feature Store, see Understanding the key capabilities of Amazon SageMaker Feature Store.

Running all code cells in the notebook created in the earlier section completes the following:

Creates a feature group
Runs a SageMaker Processing job that uses our historical dataset and defined transformations from Data Wrangler as input
Ingests the newly transformed historical features into Feature Store

Select the kernel Python 3 (Data Science) in the newly opened notebook tab.
Read through and explore the Jupyter notebook.
In the Create FeatureGroup section of the generated notebook, update the following fields for event time and record identifier with the column names we created in the previous Data Wrangler step (if using your own dataset, your names may differ):

record_identifier_name = "RECORD_ID"
event_time_feature_name = "EVENT_TIME"

Choose Run and then choose Run All Cells.

Automate data transformations for future datasets

After the Processing job is complete, we’re ready to create a pipeline that is automatically triggered when new data arrives in Amazon S3. The pipeline reproduces the same set of transformations on the new data and keeps Feature Store refreshed without any manual intervention.

Open a new terminal in Studio and clone our repo by running git clone https://github.com/aws-samples/amazon-sagemaker-automated-feature-transformation.git
Open the Jupyter notebook called automating-feature-transformation-pipeline.ipynb in a new tab

This notebook walks through the process of creating a new pipeline that runs whenever any new data arrives in the designated S3 location.

After running the code in that notebook, we upload one new day’s worth of flight delay data, mar31_2020.csv, to Amazon S3.

A run of our newly created pipeline is automatically triggered to create features from this data and ingest them into Feature Store. You can monitor progress and see past runs on the Pipelines tab in Studio.

Our example pipeline only has one step to perform feature transformations, but you can easily add subsequent steps like model training, deployment, or batch predictions if it fits your particular use case. For a more in-depth look at SageMaker Pipelines, see Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.

We use an S3 event notification with an AWS Lambda function destination to trigger a run of the feature transformation pipeline. You can also start pipeline runs using Amazon EventBridge, which lets your pipelines respond automatically to events such as training job or endpoint status changes, or run your feature pipeline on a specific schedule.
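The trigger described above can be sketched as a small Lambda handler that reads the object location from the S3 event and starts a pipeline run with it. This is a minimal sketch rather than the function from the code repo: the pipeline name and the InputDataUrl parameter name are hypothetical placeholders, and the optional sagemaker_client argument is added here only so the handler can be exercised without AWS credentials.

```python
from urllib.parse import unquote_plus

# Hypothetical names for illustration; use the values defined for your pipeline.
PIPELINE_NAME = "flight-delay-feature-pipeline"
INPUT_PARAMETER_NAME = "InputDataUrl"


def extract_s3_uris(event):
    """Build s3:// URIs from the records of an S3 event notification."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append("s3://{}/{}".format(bucket, key))
    return uris


def lambda_handler(event, context, sagemaker_client=None):
    """Start one pipeline execution per newly uploaded object."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the module loads without boto3 installed

        sagemaker_client = boto3.client("sagemaker")

    execution_arns = []
    for uri in extract_s3_uris(event):
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineParameters=[{"Name": INPUT_PARAMETER_NAME, "Value": uri}],
        )
        execution_arns.append(response["PipelineExecutionArn"])
    return {"started": execution_arns}
```

The Lambda execution role needs permission to call sagemaker:StartPipelineExecution on the pipeline, and the parameter passed here has to match a parameter declared in the pipeline definition.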

Conclusion

In this post, we showed how you can use a combination of Data Wrangler, Feature Store, and Pipelines to transform data as it arrives in Amazon S3 and store the engineered features automatically into Feature Store. We hope you try this solution and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts or on the SageMaker Discussion Forum.

About the Authors

Muhammad Khas is a Solutions Architect working in the Public Sector team at Amazon Web Services. He enjoys supporting customers in using artificial intelligence and machine learning to enhance their decision-making. Outside of work, Muhammad enjoys swimming and horseback riding.

Megan Leoni is an AI/ML Specialist Solutions Architect for AWS, helping customers across Europe, Middle East, and Africa design and implement ML solutions. Prior to joining AWS, Megan worked as a data scientist building and deploying real-time fraud detection models.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Learn how the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge used Transformer algorithms to create music

May 20, 2021

by Paloma Pineda Amazon AWS

AWS is excited to announce the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge, Nari Koizumi. AWS DeepComposer gives developers a creative way to get started with machine learning (ML) by creating an original piece of music in collaboration with artificial intelligence (AI). In June 2020, we launched Chartbusters, a global competition where developers use AWS DeepComposer to create original AI-generated compositions and compete to showcase their ML skills. The Keep Calm and Model On challenge, which ran from December 2020 to January 2021, challenged developers to use the newly launched Transformers algorithm to extend an input melody by up to 30 seconds and create new and interesting musical scores.

We interviewed Nari to learn more about his experience competing in the Keep Calm and Model On Chartbusters challenge, and asked him to tell us more about how he created his winning composition.

Learning about AWS DeepComposer

Nari currently works in the TV and media industry and describes himself as a creator. Before getting started with AWS DeepComposer, Nari had no prior ML experience.

“I have no educational background in machine learning, but I’m an artist and creator. I always look for artificial intelligence services for creative purposes. I’m working on a project, called Project 52, which is about making artwork everyday. I always set a theme each month, and this month’s theme was about composition and audio visualization.”

Nari discovered AWS DeepComposer when he was gathering ideas for his new project.

“I was searching one day for ‘AI composition music’, and that’s how I found out about AWS DeepComposer. I knew that AWS had many, many services and I was surprised that AWS was doing something with entertainment and AI.”

Nari at his work station.

Building in AWS DeepComposer

Nari saw AWS DeepComposer as an opportunity to see how he could combine his creative side with his interest in learning more about AI. To get started, Nari first played around in the AWS DeepComposer Music Studio and used the learning capsules provided to understand the generative AI models offered by AWS DeepComposer.

“I thought AWS DeepComposer was very easy to use and make music. I checked through all the learning capsules and pages to help get started.”

For the Keep Calm and Model On Chartbusters challenge, participants were challenged to use the newly launched Transformers algorithm, which can extend an input melody by up to 30 seconds. The Transformer is a state-of-the-art model for sequential data, used for tasks such as predicting stock prices and natural language tasks such as translation. Learn more about the Transformer technique in the learning capsule provided on the AWS DeepComposer console.

“I used my keyboard and connected it to the Music Studio, and made a short melody and recorded in the Music Studio. What’s interesting is you can extend your own melody using Transformers and it will make a 30-second song from only 5 seconds of input. That was such an interesting moment for me; how I was able to input a short melody, and AI created the rest of the song.”

The Transformers feature used in Nari’s composition in the AWS DeepComposer Music Studio.

After playing around with his keyboard, Nari chose one of the input melodies. The Transformers model allows developers to experiment with parameters such as creative risk, track length, and note length.

“I chose one of the melodies provided, and then played around with a couple parameters. I made seven songs, and tweaked until I liked the final output. You can also export the MIDI file and continue to play around with parts of the song. That was a fun part, because I exported the file and continued to play with the melody to customize with other instruments. It was so much fun playing around and making different sounds.”

Nari composing his melody.

You can listen to Nari’s winning composition “P.S. No. 11 Ext.” on the AWS DeepComposer SoundCloud page. Check out Nari’s Instagram, where he created an audio visualization for one of the tracks he made using AWS DeepComposer.

Conclusion

Nari found competing in the challenge to be a rewarding experience because he was able to go from no experience in ML to developing an understanding of generative AI in less than an hour.

“What’s great about AWS DeepComposer is it’s easy to use. I think AWS has so many services and many can be hard or intimidating to get started with for those who aren’t programmers. When I first found out about AWS DeepComposer, I knew it was exciting. But at the same time, I thought it was AWS and I’m not an engineer and I wasn’t sure if I had the knowledge to get started. But even the setup was super easy, and it took only 15 minutes to get started, so it was very easy to use.”

Nari is excited to see how AI will continue to transform the creative industry.

“Even though I’m not an engineer or programmer, I know that AI has huge potential for creative purposes. I think it’s getting more interesting in creating artwork with AI. There’s so much potential with AI not just within music, but also in the media world in general. It’s a pretty exciting future.”

By participating in the challenge, Nari hopes that he will inspire future participants to get started in ML.

“I’m on the creative side, so I hope I can be a good example that someone who’s not an engineer or programmer can create something with AWS DeepComposer. Try it out, and you can do it!”

Congratulations to Nari for his well-deserved win!

We hope Nari’s story inspired you to learn more about ML and AWS DeepComposer. Check out the new skill-based AWS DeepComposer Chartbusters challenge and start composing today.

About the Authors

Paloma Pineda is a Product Marketing Manager for AWS Artificial Intelligence Devices. She is passionate about the intersection of technology, art, and human centered design. Out of the office, Paloma enjoys photography, watching foreign films, and cooking French cuisine.

Five Carnegie Mellon students named Amazon graduate research fellows

May 20, 2021

by admin Amazon AWS

Amazon Graduate Research Fellows Program supports research in automated reasoning, computer vision, robotics, language technology, machine learning, operations research, and data science.


Copyright © 2019-2025 Vedere AI. All Rights Reserved.