Your Guide to the AWS Machine Learning Summit

We’re about a week away from the AWS Machine Learning Summit and if you haven’t registered yet, you better get on it! On June 2, 2021 (Americas) and June 3, 2021 (Asia-Pacific, Japan, Europe, Middle East, and Africa), don’t miss the opportunity to hear from some of the brightest minds in machine learning (ML) at the free virtual AWS Machine Learning Summit. This Summit, which is open to all, brings together industry luminaries, AWS customers, and leading ML experts to share the latest in ML. You’ll learn about science breakthroughs in ML, how ML is impacting business, best practices in building ML, and how to get started now without prior ML expertise. This post is your guide to navigating the Summit.

The day kicks off with a keynote from ML leaders from across AWS, Amazon, and the industry, including Swami Sivasubramanian, VP of AI and Machine Learning, AWS; Bratin Saha, VP of Machine Learning, AWS; and Yoelle Maarek, VP of Research, Alexa Shopping, who will share a keynote on how we’re applying customer-obsessed science to advance ML. You’ll also hear from Ashok Srivastava, Senior Vice President and Chief Data Officer at Intuit, about how the company is scaling its ML and AI to create new customer experiences.

Next, tune in for an exclusive fireside chat with Andrew Ng, founder and CEO of Landing AI and founder of deeplearning.ai, and Swami Sivasubramanian about the future of ML, the skills that are fundamental for the next generation of ML practitioners, and how we can bridge the gap from proof of concept to production in ML.

From there, pick the track that best matches your interests or mix and match throughout the day. We’ll also have expert Q&A available from 11:00am–3:00pm local time.

The science of machine learning

If you’re an advanced practitioner or just really interested in the science of ML, this track provides a technical deep dive into the groundbreaking work that ML scientists within AWS, Amazon, and beyond are doing to advance the science of ML in areas including computer vision, natural language processing, bias, and more.

Speakers include two Amazon Scholars, Michael Kearns and Kathleen McKeown. Kearns is a professor in the Computer and Information Science department at the University of Pennsylvania, where he holds the National Center Chair. He is co-author of the book “The Ethical Algorithm: The Science of Socially Aware Algorithm Design,” and joined Amazon as a scholar June 2020. McKeown is the Henry and Gertrude Rothschild professor of computer science at Columbia University, and the founding director of the school’s Data Science Institute. She joined Amazon as a scholar in 2019.

You’ll also get an inside look at trends in deep learning and natural language in a powerhouse fireside chat with Amazon distinguished scientists Alex Smola and Bernhard Schölkopf, and Alexa AI Senior Principal Scientist Dilek Hakkani-Tur.

The impact of machine learning

If you’re a technical business leader, you won’t want to miss this track where you’ll learn from AWS customers that are leading the way in ML adoption. Customers including 3M, AstraZeneca, Vanguard, Carbon Lighthouse, ADP, and Bundesliga will share how they’re applying ML to create efficiencies, deliver new revenue streams, and launch entirely new products and business models. You’ll get best practices for scaling ML in an organization and showing impact.

How machine learning is done

If you’re a data scientist or ML developer, join this track for practical deep dives into tools that can speed up the entire ML lifecycle, from building to training to deploying ML models. Sessions include how to choose the right algorithms, more accurate and speedy data prep, model explainability, and more.

Machine learning: No expertise required

If you’re a developer who wants to apply ML and AI to a use case but you don’t have the expertise, this track is for you. Learn how to use AWS AI services and other tools to get started with your ML project right away, for use cases including contact center intelligence, personalization, intelligent document processing, business metrics analysis, computer vision, and more. You’ll also learn from customers like Fidelity on how they’re applying ML to business problems like DevOps.

For more details, visit the website and we’ll see you there!


About the Author

Laura Jones is a product marketing lead for AWS AI/ML where she focuses on sharing the stories of AWS customers and educating organizations on the impact of machine learning. As a Florida native living and surviving in rainy Seattle, she enjoys coffee, attempting to ski, and the great outdoors.

Read More

It’s a wrap for Amazon SageMaker Month, 30 days of content, discussions, and news

Did you miss SageMaker Month? Look no further than this roundup post to get caught up. In this post, we share key highlights and learning materials to accelerate your machine learning (ML) innovation.

On April 20, 2021, we launched the first ever Amazon SageMaker Month, 30 days of hands-on workshops, tech talks, Twitch sessions, blog posts, and playbooks. Our goal with SageMaker Month was to connect you with AWS experts, getting started resources, workshops, and learning content to be successful with ML. The following is a summary of what you can access on-demand to get started on your ML journey with Amazon SageMaker.

Introducing SageMaker Savings Plans

To kick off SageMaker month, we introduced Amazon SageMaker Savings Plans, a flexible, usage-based pricing model for SageMaker. The goal of SageMaker Savings Plans is to offer you the flexibility to save up to 64% on SageMaker ML instance usage in exchange for a commitment of consistent usage for a 1-year or 3-year term. In addition, to help you save even more, we announced a price drop on SageMaker CPU and GPU instances.

To help customers save even more on SageMaker, we hosted a SageMaker Friday Twitch session with Greg Coquillo, named the second-most influential speaker in LinkedIn Top Voices 2020: Data Science & AI, along with Julien Simon and Segolene Dessertine-Panhard, who outlined cost-optimization techniques using SageMaker and SageMaker Savings Plans.

SageMaker Savings Plans enhance the productivity and cost-optimizing capabilities already available in Amazon SageMaker Studio, which can improve your data science team’s productivity up to 10 times. Studio provides a single visual interface where you can perform all your ML development steps. Studio also gives you complete access, control, and visibility into each step required to build, train, and deploy models. To enable your teams to move faster and boost productivity, learn how to customize your Studio notebooks.

Getting started with ML

SageMaker is the most comprehensive ML service, purpose-built for every step of the ML development lifecycle. SageMaker provides all the components used for ML in a single service, so you can prepare data and build, train, and deploy models.

Data preparation is the first step of building an ML model. It’s a time-consuming and involved process that is largely undifferentiated. We hear from our customers that it constitutes up to 80% of their time during ML development. Data preparation has always been considered tedious and resource intensive, due to the inherent nature of data being “dirty” and not ready for ML in its raw form. “Dirty” data could include missing or erroneous values, outliers, and more. Feature engineering is often needed to transform the inputs to deliver more accurate and efficient ML models. To help with feature engineering, Amazon SageMaker Feature Store offers a purpose-built repository to store, update, retrieve, and share ML features within development teams.
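To make this concrete, the following is a minimal, hedged sketch of registering and ingesting engineered features with Feature Store using the SageMaker Python SDK; the feature group name, columns, and values are hypothetical:

import time

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hypothetical engineered features for two customers
features = pd.DataFrame({
    "customer_id": ["c-001", "c-002"],
    "avg_order_value": [42.5, 17.9],
    "orders_last_30d": [3, 1],
    "event_time": [time.time()] * 2,
})
features["customer_id"] = features["customer_id"].astype("string")

feature_group = FeatureGroup(name="customer-features-demo", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=features)

# Create the group: an offline store in S3 plus an online store for low-latency reads
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
while feature_group.describe()["FeatureGroupStatus"] == "Creating":
    time.sleep(5)

# Ingest the features so they can be retrieved and shared across teams
feature_group.ingest(data_frame=features, max_workers=1, wait=True)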

Another challenge with data preparation is that it often requires multiple steps. Although most standalone data preparation tools provide data transformation, feature engineering, and visualization, few tools provide built-in model validation. And all of these data preparation steps are considered separate from ML. What’s needed is a framework that provides all these capabilities in one place and is tightly integrated with the rest of the ML pipeline. Most standalone tools for data preparation treat it as an extract, transform, and load (ETL) workload, making it tedious to iteratively prepare data, validate the model on test datasets, deploy it in production, and go back to ingesting new data sources and performing additional feature engineering. Most iterative data preparation is divorced from deployment. Therefore, data preparation modules need curation and integration before they’re deployed in production. These practices in ML are sometimes referred to as MLOps.

To help you overcome these challenges, you can use Amazon SageMaker Data Wrangler, a capability to simplify the process of data preparation, feature engineering, and each step of the data preparation workflow, including data selection, cleansing, and exploration on a single visual interface. As part of SageMaker Month, we created a step-by-step tutorial on how you can prepare data for ML with Data Wrangler. In addition, you can learn how financial customers use SageMaker every day to predict credit risk and approve loans. This example uses Data Wrangler and Amazon SageMaker Clarify to detect bias during the data preparation stage.

Another part of the data preparation stage is labeling data. Data labeling is the task of identifying objects in raw data, such as text, images, and videos, and tagging them with labels that help your ML model make accurate predictions and estimations. For example, in an autonomous vehicle use case, Light Detection and Ranging (LIDAR) devices are commonly used to capture and generate three-dimensional point cloud data, which is a representation of the physical space at a single point in time. For this use case, you need to label your data captured in both 2D and 3D spaces to produce highly accurate predictions of vehicles, lanes, and pedestrians. Amazon SageMaker Ground Truth, a fully managed data labeling service, makes it easy to build highly accurate training datasets for ML in 2D and 3D spaces using custom or built-in data labeling workflows. To help you label your data, we created how-to blog posts to showcase how to annotate 3D point cloud data and automate data labeling workflows for an autonomous vehicle use case with Ground Truth.

After you build your ML model, you must train and tune it to achieve the highest accuracy. Improving a model’s performance is an experimental and iterative process. For SageMaker Month, we consolidated a few techniques and best practices on how to train and tune high-quality deep learning models with complete visibility using SageMaker.

When you’re satisfied with your model’s accuracy, understanding how to deploy and manage models at scale is key. For model deployment and management, we showcase an example in which an application developer uses SageMaker multi-model endpoints to host thousands of models, along with pipelines that automate retraining, to improve recommendations across different US cities.
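As a rough sketch of the multi-model endpoint pattern (the S3 prefix, model names, and instance type below are placeholders, not the exact setup from that example):

import sagemaker
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.serializers import CSVSerializer

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# All per-city model artifacts live under one S3 prefix (placeholder path)
model_data_prefix = "s3://my-bucket/city-models/"

mme = MultiDataModel(
    name="city-recommendation-models",
    model_data_prefix=model_data_prefix,
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, version="1.0-1"),
    role=role,
    sagemaker_session=session,
)

# A single endpoint hosts every model artifact stored under the prefix
predictor = mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    serializer=CSVSerializer(),
)

# Route a request to one specific model; SageMaker loads it on demand
prediction = predictor.predict("0.5,1.2,3.4", target_model="new-york.tar.gz")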

When it’s time to deploy your model and make predictions, a process called inference, you can use SageMaker for inference in the cloud or on edge devices. Amazon SageMaker Neo automatically compiles ML models for any ML framework and any target hardware. A Neo-compiled model can run YOLOv4 inference up to twice as fast. You can also reduce ML inference costs on SageMaker with hardware and software acceleration.
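For illustration, here is a hedged sketch of compiling a trained PyTorch model with Neo via the SageMaker Python SDK; the model artifact, entry point, input shape, and framework version are assumptions rather than the exact YOLOv4 setup:

import sagemaker
from sagemaker.pytorch import PyTorchModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()
bucket = session.default_bucket()

# Hypothetical trained model artifact (e.g. a YOLOv4 variant exported from PyTorch)
pytorch_model = PyTorchModel(
    model_data=f"s3://{bucket}/models/yolov4/model.tar.gz",
    role=role,
    framework_version="1.6",
    py_version="py3",
    entry_point="inference.py",  # hypothetical inference handler
    sagemaker_session=session,
)

# Neo compiles the model for a specific target instance family
compiled_model = pytorch_model.compile(
    target_instance_family="ml_c5",
    input_shape={"input0": [1, 3, 416, 416]},  # illustrative input tensor shape
    output_path=f"s3://{bucket}/models/yolov4/compiled",
    framework="pytorch",
    framework_version="1.6",
    role=role,
    job_name="yolov4-neo-compilation",
)

# Deploy the compiled artifact like any other SageMaker model
predictor = compiled_model.deploy(initial_instance_count=1, instance_type="ml.c5.xlarge")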

As part of SageMaker Month, we also launched an example use case that shows how you can use Amazon SageMaker Edge Manager, a capability to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices. This blog outlines how to manage and monitor models on edge devices such as wind turbines.

Finally, to bring all our SageMaker capabilities together and help you move from model ideation to production, we created an on-demand introduction to SageMaker workshop similar to the virtual hands-on workshops we conducted live and during recent AWS Summits. It includes everything you need to get started with SageMaker at your own pace.

ML through our Partners

As part of SageMaker Month, we partnered with Tableau and DOMO to empower data and business analysts with ML-powered insights without needing any ML expertise. With the right data readily available, you can use ML and business intelligence (BI) tools to help make predictions needed to automate and speed up critical business processes and workflows.

We partnered with DOMO to enable ML for everyone with SageMaker. Domo AutoML, powered by Amazon SageMaker Autopilot, provides insights to complex business problems and automates the end-to-end decision-making process. This helps organizations improve decision-making and adapt faster to business changes.

We also partnered with Tableau to create a blog post and tech talk that showcases an end-to-end demo and new Quick Start solution that makes it easy for data analysts to use ML models deployed on SageMaker directly in their Tableau dashboards without writing any custom integration code.

What’s next

SageMaker Month focused on cost savings and optimization, getting started with ML, and learning content to accelerate ML innovation. As we wrap up SageMaker Month, we’re excited to share the upcoming, first-ever virtual AWS Machine Learning Summit on June 2, 2021. The summit brings together industry-leading scientists, AWS customers, and experts to dive deep into the art, science, and impact of ML. Attend for free, learn about new features across more than 30 sessions, and interact with leaders in a live Q&A.


About the Author

Shashank Murthy is a Senior Product Marketing Manager with AWS Machine Learning. His goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. For fun outside work, Shashank likes to hike the Pacific Northwest, play soccer, and run obstacle course races.

Read More

Enhance sports narratives with natural language generation using Amazon SageMaker

This blog post was co-authored by Arbi Tamrazian, Director of Data Science and Machine Learning at Fox Sports

FOX Sports is the sports television arm of FOX Network. The company used machine learning (ML) and Amazon SageMaker to streamline the production of relevant in-game storylines for commentators to use during live broadcasts.

“We collaborated with the Amazon Machine Learning Solutions Lab to build a natural language generation (NLG) engine that automatically produces sports narratives for commentators to use during games. Leveraging Amazon SageMaker, the Amazon Machine Learning Solutions Lab developed a model pipeline that generates natural-sounding sports narratives from a ML model trained on billions of English texts and sports stats snippets. In just a few short weeks, the NLG solution achieved BLEU scores above 99% on unseen Fox Sports testing dataset, significantly improving the readability of narratives compared to test benchmarks. Standardizing our ML workloads on Amazon SageMaker will enable our broadcasters to engage fans with pertinent gameday stories, in real-time.” – Arbi Tamrazian, Director of Data Science and Machine Learning, Fox Sports

Objectives

As viewers may have noticed, sports broadcasters are increasingly sharing statistical insights throughout the game to tell a richer story for the audience. Thanks to an abundance of data and advanced stats such as NFL Next Gen Stats powered by AWS, broadcasters can quickly tell stories and make comparisons between teams and players to keep viewers engaged.

Due to the fast-paced nature of many games, broadcasters rely on template-generated narratives to speak about in-game statistics in real time. These rule-based templates “stitch” tabular information and create narratives with fixed sentence structures that sometimes sound rigid and are hard to understand. It’s also becoming harder to build and maintain templates to keep pace with the introduction of new statistics.

To improve the broadcasting experience, Fox Sports turns to AWS and its artificial intelligence technologies to convert their real-time data into easy-to-understand narratives for commentators and audiences. The Amazon ML Solutions Lab partnered with Fox Sports to design and implement an end-to-end ML system using natural language generation (NLG), a technique to generate natural language descriptions from structured data. The objective of the partnership is to produce more natural-sounding narratives compared to the rule-based templates in a scalable fashion. The system enables Fox Sports to expand their rule-based generation engine into an ML solution. The model is trained to understand the semantic meaning of inputs, and can be expanded to new statistics and other sports by fine-tuning with a few hundred sample narratives.

In this post, we walk you through how to fine-tune a pretrained language model to generate sentences similar to those from rule-based templates. In addition, we show how to use different NLG techniques to make the sentences sound more natural, which leads to improved fan experiences and reduced cost in building and maintaining templates.

Template for an ML approach

The first phase of the NLG-based narrative generation solution relies on tabular features, including player and team names, metrics, and game situations. These features are paired with their target sequences, which are generated using predefined rule-based templates. The goal here is to use NLG to take the tabular features and generate candidate narratives containing all the relevant information.

Dataset

To train this model, we use a dataset synthetically generated by Fox Sports using the current rule-based methodology. The dataset is generated by permuting different statistics, feature values, and team and player names, and includes more than 57,000 samples, each with 8 features. For each sample, we have the narrative generated from a rule-based template as our target. We randomly shuffle and divide the dataset into training, validation, and testing sets based on an 80/10/10 split for training and fine-tuning our models.

The following table shows examples of the raw data used in this experiment—each row represents a record, and each column represents the relevant information associated with the record, including the statistic, its value, the situation the statistic is calculated upon, and more. For this post, we replace actual team and player names with generic names: team Bobcats and player John Peccy.

Statistic | Situation | Value | Time frame | Rank | Rank Order | Population | Team name / Player name
rec_td | stadium_retractable_dome | 5 | season | 7 | True | 32 | Bobcats
qbkd | score_differential_trailing | 3 | season | 2 | False | 190 | John Peccy

For each row, the raw tabular features are concatenated to form a text sequence. The following table shows examples of the text sequences used as input and the associated narrative from the rule-based template as output.

Template input | Template output
rec_td stadium_retractable_dome 5 season 7 TRUE 32 Bobcats | Bobcats’ 5 caught passes for touchdowns when playing in a retractable roof is the 7th highest out of 32 in the NFL this season.
qbkd score_differential_trailing 3 season 2 FALSE 190 John Peccy | John Peccy’s 3 credited QB knockdowns when trailing is the 2nd lowest out of 190 in the NFL this season.

Methods and metrics

The task of translating tabular features to natural sentences is a subtask of natural language generation. Because transfer learning has proved effective at this task, we utilize a language model called T5 (Text-To-Text Transfer Transformer), which was pretrained on the open-source dataset C4 (Colossal Clean Crawled Corpus). T5 achieves state-of-the-art results on many NLP benchmarks and is flexible to be fine-tuned to different NLP tasks. To fine-tune the T5 model for Fox Sports, we concatenate the tabular features into a single sequence of text as our training input. Then we use the template-generated statements as labels. For example, the following table is translated into the text sequence Team Bobcats, prss, 4, score_differential_leading, 7.

Team name | Metric | Value | Situation | Rank
Bobcats | prss | 4 | score_differential_leading | 7

The corresponding template statement, “The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest in the NFL this season,” is passed in as the target output. After fine-tuning the T5 model with thousands of such examples, the model is able to generate statements similar to the template. It even works for previously unseen input, making it extensible to fresh players and newly created metrics.
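As a hedged illustration of this fine-tuning step with the Hugging Face Transformers T5 implementation (the model size, hyperparameters, and training loop are assumptions; only a single training step is shown):

import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One concatenated tabular input and its template-generated target
source = "Team Bobcats, prss, 4, score_differential_leading, 7"
target = ("The Bobcats' 4 total times of pressuring the quarterback when leading "
          "is the 7th highest in the NFL this season.")

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# The forward pass returns the cross-entropy loss used for fine-tuning
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
outputs.loss.backward()  # in practice, wrap this in a full training loop with an optimizer

# After fine-tuning, generation produces a narrative for new, unseen inputs
generated = model.generate(inputs.input_ids, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))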

We use the BLEU (Bilingual Evaluation Understudy) metric to quantitatively measure model performance. BLEU measures the matching quality of a generated sentence to a ground truth sentence by assigning a score from 0–100, with 100 being a perfect match to the ground truth. After fine-tuning on a few thousand sentences, the T5 model is able to achieve a BLEU score of above 99 on the test set, an indication that most of the generated sentences are identical to template-generated sentences. It also underscores the value of pretraining models on abundantly available unlabeled text before adapting them to downstream tasks.
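For example, a BLEU score between a generated narrative and its template reference can be computed with the sacrebleu package (scores are on the same 0–100 scale used here):

import sacrebleu

hypotheses = [
    "The Bobcats' 4 total times of pressuring the quarterback when leading "
    "is the 7th highest in the NFL this season."
]
references = [[
    "The Bobcats' 4 total times of pressuring the quarterback when leading "
    "is the 7th highest in the NFL this season."
]]

# corpus_bleu takes a list of hypotheses and a list of reference lists
score = sacrebleu.corpus_bleu(hypotheses, references).score
print(f"BLEU: {score:.1f}")  # identical sentences score 100.0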

Improving comprehensibility

The template-generated narratives capture core details, but are repetitive and sometimes difficult to read because they follow the same predefined sentence structure. This leads to confusion for the broadcasters and fans. To address this drawback, we include a second phase of modeling, which employs language models to enhance the readability and comprehensibility of the fine-tuned T5 model’s generated narratives. This step’s objective is to make the narratives sound more natural, allowing commentators to easily communicate the information during live broadcasting.

Language processing methods

One way to replace unnatural words in sentences is through back translation. Back translation is a two-step translation method. It first translates a sentence into another language, and then translates the sentence back to its original language. It’s a technique used mostly for text data augmentation, namely, increasing the variety of original text. For this use case, we find that translation models trained on a large text corpus can help fix mistakes in the original sentence. During back translation, a singular noun may be corrected to a plural. The model may also choose more natural-sounding language. This approach gives us an automatic way to improve readability for our generated sentences.
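A minimal sketch of back translation using publicly available MarianMT models from Hugging Face; the English-German round trip is an arbitrary choice for illustration:

from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(t, skip_special_tokens=True) for t in generated]

sentence = ["Bobcats' 5 caught passes for touchdowns when playing in a "
            "retractable roof is the 7th highest out of 32 in the NFL this season."]

# Round-trip English -> German -> English; the return trip often smooths phrasing
german = translate(sentence, "Helsinki-NLP/opus-mt-en-de")
back_translated = translate(german, "Helsinki-NLP/opus-mt-de-en")
print(back_translated[0])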

An alternative natural language processing (NLP) approach to back translation is called paraphrasing—a technique that aims to express semantically similar narratives in different forms. We employ a pretrained T5 model, which is fine-tuned for paraphrasing using the open-source paraphrase dataset PAWS. Our paraphrasing model generates several candidates for a given narrative, each with slightly different wording. A major advantage of this technique is that it offers several narratives per input, from which we can choose the version that best fits Fox Sports’ business needs. An example of the paraphrasing output for a sample sentence is shown in the following table.

Type | Sentence
Original | The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 1 | The Bobcats pressing the quarterback 4 times when leading this season is the 7th best out of 32 in the NFL.
Paraphrased 2 | The Bobcats’ 4 total times of pressuring quarterback in leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 3 | The Bobcats have pressured the quarterback 4 times total when leading—the 7th highest out of 32 in the NFL this season.

Model evaluation

Quantitatively evaluating how natural a sentence sounds is an ongoing challenge in the NLP community. For this project, we use an existing metric called perplexity. Perplexity is a proxy measure of how “surprised” a language model is by a sentence. In other words, it measures how common an evaluation sentence is within the text corpus used to train a language model, which lets us compare the quality of different sentences. A language model such as GPT2 typically assigns a low perplexity score to real and syntactically correct sentences and a high score to fake, incorrect, or highly infrequent sentences. For example, GPT2 assigns a lower score to sentences like “Can you do it?” and a higher score to sentences like “Can you does it?” With this, we can compare the quality of generated sentences sharing similar semantic meanings and output the one with the lowest perplexity score.
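A short sketch of scoring candidates by GPT-2 perplexity with Hugging Face Transformers, using the example sentences above:

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence):
    # The language-model loss is the mean negative log-likelihood per token;
    # exponentiating it gives the perplexity
    input_ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

candidates = ["Can you do it?", "Can you does it?"]
print({c: perplexity(c) for c in candidates})  # the grammatical sentence scores lower

# Pick the candidate narrative with the lowest perplexity
best = min(candidates, key=perplexity)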

Architecture

Our final product is an end-to-end ML workflow using SageMaker. To meet Fox Sports’ needs, the workflow ensures that the following two criteria are satisfied:

  • The end-to-end results must include all the required features defined by a user
  • The final narrative output of the models shouldn’t be harder to read than the original rule-based template narrative

Our solution consists of two major components:

  1. Replace the current rule-based approach with the fine-tuned T5 model
  2. Enhance the generated narratives through a multi-step ML-based approach

As illustrated in the following figure, the fine-tuned T5 ML model generates the narratives (green blocks). Next, the narratives are passed through the back translation model as an attempt to produce enhanced narratives. If the back translated results include the necessary keywords and their perplexity scores are lower compared to the T5 model outputs, they’re used as the final outputs. Otherwise, we pass the T5 model outputs through the paraphrasing model and apply the same condition check. If none of our enhancement models reduce the perplexity score, we simply output the T5 model outputs. Through this workflow, we ensure all the required features are captured and improve the readability of the sentence when appropriate, maximizing the benefit ML can bring to the existing solution.
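A simplified sketch of that selection logic follows; the back_translate, paraphrase, and perplexity helpers are placeholders for the models and checks described above:

def select_narrative(t5_output, required_keywords,
                     back_translate, paraphrase, perplexity):
    """Pick the most readable narrative that still contains every required feature.

    back_translate, paraphrase, and perplexity stand in for the models described
    above; required_keywords are the features that must appear in the final
    sentence (player names, values, ranks, and so on).
    """
    def valid(candidate):
        return all(kw.lower() in candidate.lower() for kw in required_keywords)

    baseline_score = perplexity(t5_output)

    # 1. Try back translation first
    bt = back_translate(t5_output)
    if valid(bt) and perplexity(bt) < baseline_score:
        return bt

    # 2. Otherwise try paraphrase candidates, keeping only valid, lower-perplexity ones
    candidates = [p for p in paraphrase(t5_output)
                  if valid(p) and perplexity(p) < baseline_score]
    if candidates:
        return min(candidates, key=perplexity)

    # 3. Fall back to the fine-tuned T5 output
    return t5_output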

Results

With models combined to form the preceding architecture, the output narrative has on average 13% lower perplexity compared to original rule-based, template-generated narratives, and all the information is maintained. Fox Sports can display the narratives to broadcasters and sports fans for more exciting viewing experiences!

Conclusion

The ML Solutions Lab and Fox Sports ML team worked closely to build an end-to-end ML solution that converts in-game tabular stats into natural-sounding narratives. Because the solution is built on top of language models pretrained on a huge text corpus, additional metrics and game situations can be passed in directly to generate the desired outputs. The extensibility also enables the solution to be transferred to other sports by simply fine-tuning the model with sample narratives. These capabilities allow the model to scale and adapt to future business needs.

Around the world, many sports leagues and sports networks like Fox Sports are transforming the fan experience with AWS technology. AWS is helping bring fans closer to the game through partnerships with Bundesliga, F1, NFL, NHL, NASCAR, and many others. Visit AWS Sports for more details.

If you’d like help accelerating your use of ML in your products and processes, please contact the ML Solutions Lab program.


About the Authors

Henry Wang is a Data Scientist at Amazon Machine Learning Solutions Lab. Prior to joining AWS, he was a graduate student at Harvard in Computational Science and Engineering, where he worked on healthcare research with reinforcement learning. In his spare time, he enjoys playing tennis and golf, reading, and watching StarCraft II tournaments.

Saman Sarraf is a Data Scientist at the Amazon ML Solutions Lab. His background is in applied machine learning including deep learning, computer vision, and time series data prediction.

Arbi Tamrazian is the Director of Data Science and Machine Learning at FOX where he focuses on building scalable machine learning solutions that can be applied to real-time data feeds and media assets. His main areas of interest are Deep Learning, Computer Vision and Reinforcement Learning.

Read More

How lekker got more insights into their customer churn model with Amazon SageMaker Debugger

With over 400,000 customers, lekker Energie GmbH is a leading supraregional provider of electricity and gas on the German energy market. lekker is customer and service oriented and regularly scores top marks in comparison tests. As one of the most important suppliers of green electricity to private households, the company, with its 220 employees, stands for environmentally and consumer-friendly products.

Germany’s energy market was liberalized in the 1990s. Since then, customers have had free choice of their energy and gas supplier. During the liberalization, the German government standardized the switching processes, so switching your energy or gas supplier is an easy task. This makes it challenging for lekker to keep churn rates low. Preventing existing customers from leaving is several times cheaper than acquiring new ones. The best way to achieve low churn rates is to keep customers satisfied. Knowledge about a customer’s churn risk is helpful for targeted campaigns, because it allows lekker to focus on customers who are most likely to churn.

This post discusses how lekker used Amazon SageMaker Debugger to get deep insights into their customer churn model. Debugger automatically collects data during model training and provides built-in rules to automatically detect issues in model training.

Data preprocessing

lekker has a wide range of systems with different databases and data structures, and uses Spark and AWS Step Functions to create a data lake on AWS. In preparation for the churn model, lekker creates a Spark processing job that gathers customer-specific information, like contract duration, sales channel, and consumption, for label creation. lekker makes a distinction between active and passive churn. Active churn describes customers canceling their contract. Passive churn describes customers who are no longer in lekker’s delivery area or whose contract was cancelled due to late payment. For the model introduced here, lekker uses active churn as the label, which better fits marketing expectations for retention campaigns.

Create a customer churn model

Before lekker started with AWS, data came from an Oracle database, which was used as a business intelligence (BI) platform. The BI team and analysts were organized in different departments and had different access rights. Data scientists needed to access data by schema-on-read. Models were trained on local machines or non-scalable servers, and computational restrictions came up quickly. After a model was trained, monitoring and debugging were hard to perform, and management’s skepticism of potential closed-box models grew. Model deployment was also difficult, due to missing orchestration tools and limited server availability and capacity.

When lekker decided to use SageMaker, most of these problems were solved, because SageMaker offers solutions along the whole machine learning workflow. lekker can now easily scale computing capacity needs and access all available data on Amazon S3. Their data scientists can now explore and prepare data in the same notebook and find it easier to create and train models using SageMaker Estimators. Additionally, lekker frequently uses SageMaker automatic model tuning, which figures out the best model by running different hyperparameter configurations. This helped raise model quality tremendously. lekker uses Debugger to evaluate and communicate models’ results and get model insights.

Set up training on Amazon SageMaker

To run the XGBoost training on SageMaker, lekker uses the SageMaker Estimator API. It takes the instance type for the model training (ml.m5.4xlarge), the URI of the XGBoost training image, and a dictionary of model hyperparameters. See the following code:

import sagemaker
from sagemaker.estimator import Estimator

role = sagemaker.get_execution_role()
region = sagemaker.Session().boto_region_name

# XGBoost estimator for the churn model training job; the Debugger hook
# configuration and rules defined in the next sections are also passed to it
# (as debugger_hook_config and rules) before training starts
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type='ml.m5.4xlarge',
    hyperparameters={
        'num_round': '20',
        'rate_drop': '0.3',
        'scale_pos_weight': scale_pos_weight,  # computed from the class imbalance
        'tweedie_variance_power': '1.4',
        'objective': 'binary:logistic'
    },
    image_uri=sagemaker.image_uris.retrieve('xgboost', region, version='1.0-1')
)
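The automatic model tuning mentioned earlier can wrap this estimator. The following is a hedged sketch; the search ranges, objective metric, and S3 paths are illustrative rather than lekker's actual configuration:

from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Illustrative search space, not lekker's actual configuration
hyperparameter_ranges = {
    "eta": ContinuousParameter(0.05, 0.5),
    "max_depth": IntegerParameter(3, 10),
    "subsample": ContinuousParameter(0.5, 1.0),
}

tuner = HyperparameterTuner(
    estimator=estimator,                     # the XGBoost estimator defined above
    objective_metric_name="validation:auc",  # requires eval_metric='auc' in the hyperparameters
    objective_type="Maximize",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=4,
)

# Hypothetical S3 locations of the prepared training and validation CSV files
tuner.fit({
    "train": TrainingInput("s3://my-bucket/churn/train.csv", content_type="csv"),
    "validation": TrainingInput("s3://my-bucket/churn/validation.csv", content_type="csv"),
})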

Configure Debugger and rules

lekker uses Debugger in three ways:

  • Use built-in rules to identify underperforming training jobs
  • Create automatic visualizations
  • Collect important metrics from training jobs

The following code shows the Debugger hook configuration to collect metrics such as feature importance and Shapley values from churn model training:

from sagemaker.debugger import CollectionConfig, DebuggerHookConfig

# Save tensors every 5 steps and collect metrics, feature importance, and SHAP values
debugger_hook_config = DebuggerHookConfig(
    hook_parameters={'save_interval': '5'},
    collection_configs=[
        CollectionConfig(name="metrics"),
        CollectionConfig(name="feature_importance"),
        CollectionConfig(name="full_shap"),
        CollectionConfig(name="average_shap"),
    ]
)

Debugger provides built-in rules that check for model training issues such as overfitting or loss not decreasing. Those rules run as a SageMaker processing job in a separate container and instance so the rule analysis doesn’t interfere with the actual training. Users don’t pay to run these built-in rules. lekker frequently uses the loss_not_decreasing and xgboost_report rules. The first rule monitors the loss curves and triggers if loss doesn’t decrease by a certain percentage. The xgboost_report rule captures XGBoost model data and creates a static HTML report with visualizations such as ROC curves, error plots, and more, and provides key insights and recommendations. See the following code:

from sagemaker.debugger import Rule, rule_configs

save_interval = 5  # matches the save interval in the Debugger hook configuration

rules = [
    Rule.sagemaker(
        rule_configs.loss_not_decreasing(),
        rule_parameters={
            "collection_names": "metrics",
            "num_steps": str(save_interval * 2),
        },
    ),
    Rule.sagemaker(rule_configs.create_xgboost_report()),
]

After the Debugger hook configuration and list of rules are specified, you start the SageMaker training with estimator.fit(). The fit function takes as input the paths to the training and validation data in Amazon S3. See the following code:

from sagemaker.inputs import TrainingInput

estimator.fit({
    "train": TrainingInput(model_train_file, content_type="csv"),
    "validation": TrainingInput(model_test_file, content_type="csv"),
})

SageMaker automatically spins up the ml.m5.4xlarge training instance, downloads the training container and datasets, and runs the model training. It also spins up an instance to run the rule analysis as a SageMaker processing job. You can go to SageMaker Studio and check the rule status or check the status from the Python SDK.
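For example, a short sketch of checking the rule status from the Python SDK:

# Summarize the Debugger rules attached to the most recent training job
for summary in estimator.latest_training_job.rule_job_summary():
    print(summary["RuleConfigurationName"], "-", summary["RuleEvaluationStatus"])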

Visualize and perform real-time monitoring

When the training is running, lekker uses Debugger’s open-source smdebug library to fetch and query the data that is uploaded in real time to Amazon S3. The first step is to create a trial object that takes either a local or S3 path:

from smdebug.trials import create_trial

# Create a trial that reads the Debugger tensors the training job writes to S3
s3_output_path = estimator.latest_job_debugger_artifacts_path()
trial = create_trial(s3_output_path)

Now you can access and query the data. To plot the loss curves, you simply retrieve the metrics collection and the recorded steps:

import matplotlib.pyplot as plt

steps = trial.steps()
fig, ax = plt.subplots()
# Plot each recorded metric (train and validation error) over the training steps
for tname in trial.collection("metrics").tensor_names:
    data = [value for value in trial.tensor(tname).values().values()]
    ax.plot(steps, data, label=tname)
ax.legend()

The following figure shows that train and validation errors fall while training the customer churn model. That’s a sign of a well-trained model, because it shows that the model performs well on the unseen data (validation data). Debugger makes this visualization easy to create.

When the training job has completed, lekker uses the output of the xgboost_report rule to get further insights into the customer churn model. The following figure shows the model’s feature importance for the training job. The most important feature is customer duration (membership in months). lekker offers contracts with a fixed duration, such as 12 or 24 months. If customers cancel their contract, the churn shows up at the end of the fixed-duration period. That’s why most churn appears at months 12 and 24.

Knowledge about what influences the model’s outcome is important because it helps explain the model. lekker uses SHapley Additive exPlanations (SHAP) values recorded by Debugger during training. SHAP was designed for local interpretability of a predictive model and uses a game-theoretic approach to explain the output of machine learning models.

In the following figure, blue represents low feature values and red represents high ones. The x-axis shows the SHAP value, which describes the impact on the outcome. High values indicate a predicted value increase, low values indicate a decrease. A line’s thickness represents how many customers are at that specific point. In the churn model, customers with a short duration have low predicted churn probabilities. That’s a result of the contract structure, because customer churn can be determined after 12 months at the earliest.

Users running on Amazon SageMaker can obtain SHAP values for their model either through SageMaker Debugger or SageMaker Clarify. The key difference is that Debugger records those values during training, while Clarify captures them after the model has been trained. Inspecting SHAP values during the training phase helps further improve the model by identifying and removing irrelevant input features.
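As a hedged sketch, the recorded SHAP values can be read back from the same smdebug trial object used earlier; the exact tensor names depend on the model's feature names:

# The "average_shap" collection holds one mean |SHAP| value per input feature
shap_tensor_names = trial.tensor_names(collection="average_shap")
last_step = trial.steps()[-1]

average_shap = {
    name: float(trial.tensor(name).value(last_step))
    for name in shap_tensor_names
}

# Features with near-zero average SHAP values are candidates for removal
for name, value in sorted(average_shap.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.4f}")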

After the model is trained, you can use Clarify to get SHAP values for any dataset. After you deploy the model as an endpoint, you can use Clarify to monitor the SHAP values for data captured from the endpoint. Another key difference is that Debugger can collect SHAP values during training for XGBoost models, whereas Clarify is model agnostic and can work with any model.

Results

With all the tools and services SageMaker provides, lekker was able to raise churn model accuracy by nearly 20%. In addition, the model is more stable than earlier versions: the F1 score rose to over 80% and the AUC to 96%.

“Since we got all this information about model insights, we are able to get a clear understanding about what’s happening,” says Steffen Kremers, a data scientist at lekker. “Especially the concept of feature gains, which is fully integrated in the Debugger report, gave us useful information about the most influencing features. Important information for both feature engineering and feature selection.”

Since the churn model was deployed, lekker has moved three more models to SageMaker and integrated them into operations. lekker transferred the learnings they made to all these models, and have seen that all models yield better results than before. Once lekker saw the insights ML can bring, they began expanding their ML activities.

Conclusion

This post demonstrated how lekker moved workloads from on premises to SageMaker, and how it helped their data science teams accelerate and innovate faster. lekker extensively uses Debugger to get deeper insights into their models, which help improve and better explain the models. To learn more about Debugger features and how this service can help your business, see Amazon SageMaker Debugger. To learn more about optimizing for customer churn, check out the blog post Preventing customer churn by optimizing incentive programs using stochastic programming.


About the Authors

Steffen Kremers is a data scientist at lekker based in Germany. He accompanies the whole machine learning process – from developing use case ideas to model building up to model deployment.

Nathalie Rauschmayr is an Applied Scientist at AWS, where she helps customers develop deep learning applications.

Lu Huang is a Senior Product Manager on the AWS Deep Engine team, managing Amazon SageMaker Debugger.

Read More

Best practices in customer service automation

Chatbots, virtual assistants, and Interactive Voice Response (IVR) systems are key components of successful customer service strategies.

As part of our Best Practices in Customer Service Automation webinar, we had the pleasure of hearing from three AWS Contact Center Intelligence (AWS CCI) Partners, who provided valuable insights and tips for building automated customer-service solutions.

The panel included Brad Beumer of UIPath, Rebecca Owens of Genesys, and Pat Higbie of XAPP AI.

Why build a chatbot or IVR?

Customers expect great customer service. At the same time, enterprises struggle with the costs and resources necessary to provide high-quality, highly available, live-agent solutions. Automated solutions, like chatbots and IVR, enable enterprises to provide quality support, 24/7, while reducing costs and increasing customer satisfaction.

Although reducing costs is important, a big reason enterprises are implementing automated solutions is to provide a better overall user-experience. As Brad Beumer of UIPath points out, it is what customers are asking for. Customers want a 24/7/365 experience—especially for common tasks they can handle on their own without an agent.

Self-serve, automated solutions help take the pressure off live agents. As Rebecca Owens of Genesys mentions, self-service can help handle the upfront tasks, leaving the more complex tasks to the live agents, who are the contact centers’ most valuable assets.

The impact of COVID-19

COVID-19 has had a significant impact on the interest in chatbots. Shelter-in-place rules affected both consumers’ ability to go into physical locations and live agents’ ability to work in the same contact center. The need for automated solutions skyrocketed. Genesys saw a large increase in call volumes—in some cases, nearly triple the volume.

Chatbots are not only helping consumers during COVID-19, but work-from-home agents as well. As Beumer mentions, automated solutions help offload more of the agents’ tasks and help them with compliance, security, and even VPN issues related to working from home.

COVID-19 resulted in more stress on existing chatbots too. As Pat Higbie of XAPP AI shares, existing chatbots were not set up to handle the additional use cases people wanted them to handle. These are opportunities to take advantage of AI, through tools like Amazon Lex or Amazon Kendra, for chatbots and natural language search, to enable users to get what they need and improve the customer experience.

Five best practices

Building automated solutions is an iterative process. Our panelists provided insights and best practices when facing common issues.

Getting started

Building conversational interfaces can be challenging because it is hard to know all the things a user may request, or even how they pose the request.

Our panelists see three basic use cases:

  • Task completion – Collecting user information to make an update, like an address change
  • Information requests – Providing information like delivery status or a bank balance
  • Efficient routing – Collecting information to route the user to the most appropriate agent

Our panelists recommend getting started with simpler use cases that have a high impact. As Beumer recommends, start with high-volume, low-complexity tasks like password resets or lost credit cards. Owens adds that starting with high-level Natural Language Understanding (NLU) menus to understand user intent and routing them to the right agent is a simple investment with a significant ROI. Afterwards, move to simple task automation and information requests, and then move into the more advanced use cases that were not possible before conversational AI. As Higbie puts it, start with a quick win, like informational chatbots, especially if you have not done this before. The level of complexity can go up quite dramatically, especially with transactional use cases.

As complexity increases, there are opportunities for more advanced use cases, like transactional or even proactive use cases. Owens mentioned an example of using AI to monitor activity on a website and proactively offering a chatbot when needed. For example, if you can predict the likelihood of an ecommerce user having an issue at checkout, a chatbot can proactively offer to help the user, to lead them through completion so the user does not abandon their cart.

Handling fallbacks gracefully

Fallbacks occur when the automated solution cannot understand the user or cannot handle the request. It is important to handle fallbacks gracefully.

In the past with contact centers, users were often routed to an agent when a fallback occurred. Now with AI, you can better understand the user’s intent and context, and either send them to another AI solution, or more efficiently transfer them to an agent, sending the full context so the user does not have to repeat themselves.

Fallbacks are an opportunity to educate users on what they can say and do—to help get users back on the “happy path.” For example, if the user asks for something the chatbot cannot do, have it respond with a list of what it can do. Predefined buttons, referred to as quick replies, can also help let a user know what the chatbot can do.

Supporting multimodal channels

Our panelists see enterprises building automated solutions across multiple channels, including multi-modal text and voice options on the web, IVR, social media, and email. Enterprises are building solutions where their customers are interacting. There are additional factors to consider when supporting multiple channels.

People ask questions differently across channels. As Higbie points out, users communicating via text tend to do so in “keyword style” with incomplete sentences, whereas in voice, they tend to ask the full question.

The way the chatbot responds across channels can be different as well. In text, the chatbot can provide a menu of options for the user to click. With voice, if there are more than three options, it can be difficult for the user to remember.

Regardless of the channel, it is important to understand the user’s intent. As Beumer mentions, if the intent can be understood, the right automation can be triggered.

It can be helpful to have a common interaction model for understanding across channels, but it is important to optimize the actual responses for each particular channel. As Higbie indicates, model management, dialog management, and content management are all needed to handle the complexities in conversational AI.

Keeping context in mind

Context is important—what is known about the user, where they are, or what they are doing can help improve the user experience.

Chatbots and IVRs can connect to backend CRMs to have additional information to personalize and tailor the experience. They can also pass along information gathered from a user to a live agent for more efficient handling so the user does not have to repeat themselves.

In the case of voice, knowing if the user has been in recent contact before can be helpful. While introductory prompts can be great to educate people, if the user contacts again, it is better to use a tapered approach that reduces some of the default messaging in order to have a quicker opening response.

The context can also be used with proactive solutions that monitor user activity and prompt if help is needed.

Measuring success

Our panelists use a variety of metrics to measure success, such as call deflection rates, self-service containment rates, first response time, and customer satisfaction. The metrics can also be used to calculate operational cost savings by knowing the cost of live agents and the deflection rates.

Customer satisfaction is very important—one of the goals of automated solutions is to provide a better user experience. One way UIPath does this is to look at Net Promoter Scores (NPS) before and after an automated solution is launched. Surveys can be used as well, via outbound calls after an interaction to gather customer feedback. With chatbots, you can immediately ask the user whether the response was helpful and take further action depending on the response.

Automated solutions like chatbots and IVRs need continuous optimization. It is difficult to anticipate all the things a user may ask, or how they may ask them. Monitoring the interactions to understand what users are asking for, how the automated solution is responding, and where it needs improvement is important. It is an iterative process.

What the future looks like

Our panelists shared their thoughts on the future of automated solutions.

Owens sees an increase in usage of automated solutions across all channels as chatbot technologies gain momentum and AI is able to handle even more tasks and complexity. Although customer service is heavily voice today, she is seeing a push to digital, and expects the trend to continue. One area of growth is in the expansion of language support in AI beyond English to support worldwide coverage.

Beumer envisions expansion of automated solutions across all channels, for a more consistent user experience. While automation will increase, it is important to continue to make sure that when a chatbot hands off to a live agent, that it is done so seamlessly.

Higbie sees a lot of exciting opportunity for automated solutions, and believes we are only in the “first inning” of AI automation. Customers will ask for even more than what chatbots currently do, and they will get the responses instantly. Solutions will move more to the proactive side as well. He sees this as a bigger paradigm shift than either web or mobile. It is important to commit now and not be displaced. As he summarizes, enterprises need to get started, get a quick win, and then expand the sophistication of their AI initiatives.

As the underlying technologies continue to evolve, the opportunities for automated chatbots continue to grow. It is exciting to learn from our panelists and see where automated solutions are going in the future.

About AWS Contact Center Intelligence

AWS CCI solutions can quickly and easily add AI and ML to your existing contact center to improve customer satisfaction and reduce costs. AWS CCI covers three key areas of the contact center workflow: self-service automation, real-time analytics with agent assist, and post-call analytics. Each solution is created using a specific combination of AWS AI services, and is available through select AWS Partners. Join the next CCI Webinar, “Banking on Bots”, on May 25, 2021.


About the Author

Arte Merritt leads partnerships for Contact Center Intelligence and Conversational AI. He is a frequent author and speaker in the conversational AI space. He was the co-founder and CEO of the leading analytics platform for conversational interfaces, leading the company to 20,000 customers, 90B messages, and multiple acquisition offers. Previously he founded Motally, a mobile analytics platform he sold to Nokia. Arte has more than 20 years of experience in big data analytics. Arte is an MIT alum.

Read More

Implement live customer service chat with two-way translation, using Amazon Connect and Amazon Translate

Many businesses support customers across multiple countries and ethnic communities, and therefore need to provide customer service in a wide variety of local languages. It’s hard to consistently staff contact centers with agents with different language proficiencies. During periods of high call volumes, callers often must wait on hold for an agent who can speak their language.

What if these businesses could implement a system to act as a real-time translator, allowing customers and agents to easily communicate in different languages? With such a system, a customer could message a support agent in their native language, such as French, and the support agent could use their own native language, maybe Italian, to read and respond to the customer’s messages. Deliveroo, an online food delivery company based in England, has implemented a system that does exactly that!

Deliveroo provides food delivery in over 200 locations across Europe, the Middle East, and Asia, serving customers in dozens of languages. Previously, during periods of high demand (such as televised sporting events or bad weather), they would ask customers to wait for a native speaker to become available or ask their agents to copy and paste the chats into an online translation service. These approaches were far from ideal, so Deliveroo is now deploying a much better solution that uses Amazon Connect and Amazon Translate to implement scalable live agent chat with built-in automatic two-way translation.

In this post, we share an open-source version of this solution from one of Amazon’s partners, VoiceFoundry. We show you how to install and try the solution, and then how you can customize it to control translations of specific phrases. Finally, we share success stories from our customer, Deliveroo, and leave you with pointers for implementing a similar solution for your own business.

Set up an Amazon Connect test instance and live chat translation

Follow the Amazon Connect tutorials to set up a test instance and experiment with the chat contact feature.

If you have an Amazon Connect test instance and you already know how to use chat contacts, you can skip this step.

Now that you have Amazon Connect chat working, it’s time to install the sample live chat translation solution. My co-author, Dan from VoiceFoundry, has made it easy. Follow the instructions in the project GitHub repository Install Translate CCP Demo for Amazon Connect.

Test the solution

To test the solution, you simulate two roles—the agent and the customer.

  1. As the agent, sign in to your Amazon Connect instance dashboard.
  2. In a separate browser window, open the new web application using the URL created when you installed the solution.

The Amazon Connect Control Panel is displayed on the left, and the new chat translation panel is on the right.

  3. On the Control Panel title bar, change your status from Offline to Available.
  4. Acting as the customer, launch the test chat page from the Amazon Connect dashboard, or use the URL https://<yourConnectInstance>/connect/test-chat.

In a real-world application, you use a customer chat client widget on a website or mobile application. However, for this post, it’s convenient to use the test chat client.

  5. Open the customer test chat widget to initiate contact with the agent.

You hear a ring tone and see a visual indicator on the agent’s control panel as the agent is asked to accept your contact.

  6. As the agent, accept the incoming request to establish contact.

  7. As the customer, enter a message in Spanish into the customer test chat widget. For example, “Hola, necesito ayuda con mi pedido.”

Let’s assume that the agent can’t understand the incoming message in Spanish. Don’t worry—we can use our sample solution. The new web app chat translation panel displays the translation in English, along with the customer’s original message. Now you can understand the phrase “Hi, I need help with my order.”

  8. As the agent, enter a reply in English in the chat translation panel text box, for example “Hi, My name is Bob and I will be happy to help. What is your name and phone number?”

Your reply is automatically translated back to Spanish.

  9. As the customer, verify that you received a reply from the agent in Spanish.

Continue the conversation and observe how the customer can chat entirely in Spanish, and the agent entirely in English. Take a moment to consider how useful this can be.

When you’re done, as the agent, choose End chat and Close contact to end the chat session. As the customer, choose End chat.

Did you notice that the chat translation panel automatically identified the language the customer used—in this case Spanish? You can use any of the languages supported by Amazon Translate. Try the experiment again, this time using a different language for the customer. Have some fun with it—engage friends who are fluent in other languages and communicate with them in their native tongue.

In the sample application, we have assumed that the agent always uses English. A production version of the application would allow the agent to choose their preferred language.

Multi-chat support

Amazon Connect supports up to five concurrent chat sessions per agent. Our sample application allows a single agent to support multiple customer chats in different languages concurrently.

In the following screenshot, agent Bob is now chatting with a new customer, this time in German!

Customize terminology

Let’s say you have a product called Moonlight and Roses. While discussing this product with your Spanish-speaking customer, you enter something like “I see that you ordered Moonlight and Roses on May 13, is that correct?”

Your customer sees the translation “Veo que ordenaste Luz de Luna y Rosas el 13 de mayo, ¿es correcto?”

This is a good literal translation—Luz de Luna y Rosas does mean Moonlight and Roses. But in this case, you want your English product name, Moonlight and Roses, to be translated to the Spanish product name, Moonlight y Roses.

This is where we can use the powerful custom terminology feature in Amazon Translate. Let’s try it. For instructions on updating your custom terminologies, see the GitHub repo.
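
At the API level, custom terminology is a two-step process: import a terminology file, then reference it by name when translating. Here is a minimal boto3 sketch (the terminology name and CSV content are illustrative, not the repo's actual values):

import boto3

translate = boto3.client('translate')

# CSV with a header row of language codes and one row per term to pin
terminology_csv = b"en,es\nMoonlight and Roses,Moonlight y Roses\n"

translate.import_terminology(
    Name='product-names',
    MergeStrategy='OVERWRITE',
    TerminologyData={'File': terminology_csv, 'Format': 'CSV'}
)

# Reference the terminology so the product name is translated the way we want
response = translate.translate_text(
    Text='I see that you ordered Moonlight and Roses on May 13, is that correct?',
    SourceLanguageCode='en',
    TargetLanguageCode='es',
    TerminologyNames=['product-names']
)
print(response['TranslatedText'])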

Now we can validate the solution with another simulated chat between an agent and customer, as in the following screenshot.

Deliveroo use case

Amazon Translate helps Deliveroo’s customers, riders (delivery personnel), and food establishment owners talk to each other across language barriers, so that hot and tasty food can be delivered quickly from local neighborhood eateries.

This was especially important during the COVID-19 pandemic, when going out to restaurants became a hazardous endeavor.

Amy Norris, Product Manager for Deliveroo Customer Care, says, “Amazon Translate is fast, accurate, and customizable to ensure that food item names, restaurant names, addresses, and customer names are translated correctly to create trustful conversational connections in uncertain times. By using Amazon Translate, our customer service agents were able to increase their first call resolution to 83% and reduce the average call handling time for their customers by 20%.”

Clean up

When you have finished experimenting with this solution, you can clean up your resources by removing the sample live chat translation application and deleting your test Amazon Connect instance.

Conclusion

The combination of Amazon Connect and Amazon Translate enables a scalable, cost-effective solution for your customer support agents to communicate in real time with customers in their preferred languages. The sample application is provided as open source—you can use it as a starting point for your own solution. AWS Professional Services, VoiceFoundry, and other Amazon partners are here to help as well.

We’d love to hear from you. Let us know what you think in the comments section, or using the issues forum in the sample solution GitHub repository.


About the Authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Daniel Bloy is a practice leader for VoiceFoundry, an Amazon Connect specialty partner.

Read More

Reduce ML inference costs on Amazon SageMaker with hardware and software acceleration

Amazon SageMaker is a fully managed service that enables data scientists and developers to build, train, and deploy machine learning (ML) models at 50% lower TCO than self-managed deployments on Amazon Elastic Compute Cloud (Amazon EC2). Elastic Inference is a capability of SageMaker that delivers 20% better performance for model inference than AWS Deep Learning Containers on EC2 by accelerating inference through model compilation, model server tuning, and underlying hardware and software acceleration technologies.

Inference is the process of making predictions using a trained ML model. For production ML applications, inference accounts for up to 90% of total compute costs. Hence, when deploying an ML model for inference, accelerating inference performance on low-cost instance types is an effective way to reduce overall compute costs while meeting performance requirements such as latency and throughput. For example, running ML models on GPU-based instances provides good inference performance; however, selecting the right instance size and optimizing GPU utilization is challenging because different ML models require different amounts of compute and memory resources.

Elastic Inference Accelerators (EIA) solve this problem by enabling you to attach the right amount of GPU-powered inference acceleration to any Amazon SageMaker ML instance. You can choose any CPU instance type that best suits your application’s overall compute and memory needs, and separately attach the right amount of GPU-powered inference acceleration needed to satisfy your performance requirements. This allows you to reduce inference costs by using compute resources more efficiently. Along with hardware acceleration, Elastic Inference offers software acceleration through SageMaker Neo, a capability of SageMaker that automatically compiles ML models for any ML framework and to any target hardware. With SageMaker Neo, you don’t need to set up third-party or framework-specific compiler software or tune the model manually for optimizing inference performance. With Elastic Inference, you can combine software and hardware acceleration to get the best inference performance on SageMaker.

This post demonstrates how you can use hardware and software-based inference acceleration to reduce costs and improve latency for pre-trained TensorFlow models on Amazon SageMaker. We show you how to compile a pre-trained TensorFlow ResNet-50 model using SageMaker Neo and how to deploy this model to a SageMaker Endpoint with Elastic Inference.

Setup

First, we need to ensure we have SageMaker Python SDK >= 2.32.1 and import the necessary Python packages. If you are using SageMaker Notebook Instances, select conda_tensorflow_p36 as your kernel. Note that you may have to restart your kernel after upgrading packages.
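
For example, from a notebook cell you can upgrade the SDK with something like the following (a sketch; pin the version however your environment requires):

import sys

# Upgrade the SageMaker Python SDK in the current kernel, then restart the kernel
!{sys.executable} -m pip install --upgrade "sagemaker>=2.32.1"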

import numpy as np
import time
import json
import requests
import boto3
import os
import sagemaker

Next, we’ll get the IAM execution role and a few other SageMaker-specific variables from our notebook environment so that SageMaker can access resources in your AWS account. See the documentation for more information on how to set this up.

from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

Get pre-trained model for compilation

SageMaker Neo supports compiling TensorFlow/Keras, PyTorch, ONNX, and XGBoost models. However, only Neo-compiled TensorFlow models are supported on EIA as of this writing. TensorFlow models should be in SavedModel format or frozen graph format. For more information, see the SageMaker Neo documentation.

Import ResNet50 model from Keras

We will import the ResNet50 model from Keras applications and create a model artifact, model.tar.gz.

import tensorflow as tf
import tarfile

tf.keras.backend.set_image_data_format('channels_last')
pretrained_model = tf.keras.applications.resnet.ResNet50()
saved_model_dir = '1'
tf.saved_model.save(pretrained_model, saved_model_dir)

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add(saved_model_dir)

Upload model artifact to S3

SageMaker Neo expects a path to the model artifact in Amazon S3, so we will upload the model artifact to an S3 bucket.

from sagemaker.utils import name_from_base

prefix = name_from_base('ResNet50')
input_model_path = sess.upload_data(path='model.tar.gz', bucket=bucket, key_prefix=prefix)
print('S3 path for input model: {}'.format(input_model_path))

Compile model for EI Accelerator using SageMaker Neo

Now the model is ready to be compiled by SageMaker Neo. Note that ml_eia2 needs to be set for the target_instance_family field in order for the model to be optimized for EI accelerator deployment. If you want to compile your own model for the EI accelerator, refer to the Neo compilation API. To compile the model, you also need to provide the model input_shape and any optional compiler_options. Note that 32-bit floating point (FP32) is the default precision mode for ML models; we include it here to be explicit, as opposed to compiling with lower precision. Learn more about the advantages of different precision types in the documentation.

from sagemaker.tensorflow import TensorFlowModel

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp32"
compiled_model_fp32 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp32"})

Deploy compiled model to an Endpoint with EI Accelerator attached

Deploying a model to a SageMaker Endpoint uses the same deploy function whether or not a model is compiled using SageMaker Neo. The only change required for utilizing EI Accelerator is to provide an accelerator_type parameter, which determines the type of EI accelerator to be attached to your endpoint. All supported accelerator types are listed in the Elastic Inference documentation.

predictor_compiled_fp32 = compiled_model_fp32.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

Benchmarking endpoints

Once the endpoint is created, we will benchmark to measure latency. The model expects input shape of 1 x 224 x 224 x 3, so we expand the dog image (224x224x3) with a batch size of 1 to be compatible with the model input. The benchmark first runs a series of 100 warmup inferences, and then runs 1000 inferences to make sure that we get an accurate estimate of latency ignoring startup times. Latency percentiles are reported from these 1000 inferences.

import numpy as np
import matplotlib.image as mpimg

data = mpimg.imread('dog.jpg')
data = np.expand_dims(data, axis=0)
print("Input data shape: {}".format(data.shape))

import time
import numpy as np


def benchmark_sm_endpoint(predictor, input_data):
    print('Doing warmup round of 100 inferences (not counted)')
    for i in range(100):
      output = predictor.predict(input_data)
    time.sleep(3)

    client_times = []
    print('Running 1000 inferences')
    for i in range(1000):
      client_start = time.time()
      output = predictor.predict(input_data)
      client_end = time.time()
      client_times.append((client_end - client_start)*1000)

    print('Client end-to-end latency percentiles:')
    client_avg = np.mean(client_times)
    client_p50 = np.percentile(client_times, 50)
    client_p90 = np.percentile(client_times, 90)
    client_p99 = np.percentile(client_times, 99)
    print('Avg | P50 | P90 | P99')
    print('{:.4f} | {:.4f} | {:.4f} | {:.4f}\n'.format(client_avg, client_p50, client_p90, client_p99))
    
benchmark_sm_endpoint(predictor_compiled_fp32, data)

From the benchmark above, the output will be similar to the following:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
103.2129 | 124.4727 | 129.1123 | 133.2371

Compile and benchmark model with quantization

Quantization-based model optimizations represent model weights in lower precision (for example, FP16), which increases throughput and offers lower latency. Using FP16 precision in particular provides faster performance than FP32 with effectively no drop (<0.1%) in model accuracy. When you enable FP16 precision, SageMaker Neo chooses kernels from both FP16 and FP32 precision. For the ResNet50 model in this post, we compile the model with FP16 quantization by setting the precision_mode under compiler_options.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp16"
compiled_model_fp16 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp16"})

# Deploy the compiled model to SM endpoint with EI attached
predictor_compiled_fp16 = compiled_model_fp16.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_compiled_fp16, data)

Benchmark data for the model compiled with FP16 will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
91.8721 | 112.8929 | 117.7130 | 122.6844

Compare latency with unoptimized model on EIA

We can see that the model compiled with FP16 precision mode is faster than the model compiled with FP32. Now let’s get the latency for an uncompiled model as well.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Deploy the uncompiled model to SM endpoint with EI attached
predictor_uncompiled = tensorflow_model.deploy(initial_instance_count=1,
                                           instance_type='ml.m5.xlarge',
                                           accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_uncompiled, data)

Benchmark data for the uncompiled model will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
117.1654 | 137.9665 | 143.5326 | 150.2070

Clean up endpoints

Having an endpoint running incurs costs. Therefore, we delete the endpoints to release the resources after finishing this example.

sess.delete_endpoint(predictor_compiled_fp32.endpoint_name)
sess.delete_endpoint(predictor_compiled_fp16.endpoint_name)
sess.delete_endpoint(predictor_uncompiled.endpoint_name)

Performance comparison

To understand the performance improvement from model compilation and quantization, compare the latency percentiles of the three endpoint variants. For our model, we find that adding model compilation improves latency by 13.5% compared to the unoptimized model. Adding quantization (FP16) to the compiled model results in a 27.5% improvement in latency compared to the unoptimized model.
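
The following is a small matplotlib sketch, using the latency numbers reported earlier, that you can use to visualize the comparison yourself:

import numpy as np
import matplotlib.pyplot as plt

labels = ['Avg', 'P50', 'P90', 'P99']
uncompiled    = [117.1654, 137.9665, 143.5326, 150.2070]
compiled_fp32 = [103.2129, 124.4727, 129.1123, 133.2371]
compiled_fp16 = [91.8721, 112.8929, 117.7130, 122.6844]

x = np.arange(len(labels))
width = 0.25

# Grouped bar chart of client end-to-end latency percentiles (milliseconds)
plt.bar(x - width, uncompiled, width, label='Uncompiled + EIA')
plt.bar(x, compiled_fp32, width, label='Neo FP32 + EIA')
plt.bar(x + width, compiled_fp16, width, label='Neo FP16 + EIA')
plt.xticks(x, labels)
plt.ylabel('Latency (ms)')
plt.legend()
plt.show()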

Summary

SageMaker Elastic Inference is an easy-to-use solution for adding model optimizations to improve inference performance on Amazon SageMaker. With Elastic Inference accelerators, you can get GPU inference acceleration and remain more cost-effective than standalone SageMaker GPU instances. With SageMaker Neo, software-based acceleration provided by model optimizations further improves performance (27.5%) over unoptimized models.

If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-ei-feedback@amazon.com.


About the Authors

Jiacheng Guo is a Software Engineer with AWS AI. He is passionate about building high-performance deep learning systems with state-of-the-art techniques. In his spare time, he enjoys drifting on dirt tracks and playing with his Ragdoll cat.

Santosh Bhavani is a Senior Technical Product Manager with the Amazon SageMaker Elastic Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys traveling, playing tennis, and drinking lots of Pu’er tea.

Read More

Automate feature engineering pipelines with Amazon SageMaker

The process of extracting, cleaning, manipulating, and encoding data from raw sources and preparing it to be consumed by machine learning (ML) algorithms is an important, expensive, and time-consuming part of data science. However, managing these data pipelines for either training or inference is a challenge for data science teams, and it can take valuable time away from experimenting with new features or optimizing model performance with different algorithms or hyperparameter tuning.

Many ML use cases such as churn prediction, fraud detection, or predictive maintenance rely on models trained from historical datasets that build up over time. The set of feature engineering steps a data scientist defined and performed on historical data for one time period needs to be applied to any new data after that period, because models trained on historical features need to make predictions on features derived from the new data. Instead of manually performing these feature transformations on new data as it arrives, data scientists can create a data preprocessing pipeline to perform the desired set of feature engineering steps that runs automatically whenever new raw data is available. Decoupling the data engineering from the data science in this way can be a powerful time-saving practice when done well.

Workflow orchestration tools like AWS Step Functions or Apache Airflow are typically used by data engineering teams to build these kinds of extract, transform, and load (ETL) data pipelines. Although these tools offer comprehensive and scalable options to support many data transformation workloads, data scientists may prefer to use a toolset specific to ML workloads. Amazon SageMaker supports the end-to-end lifecycle for ML projects, including simplifying feature preparation with SageMaker Data Wrangler and storage and feature serving with SageMaker Feature Store.

In this post, we show you how a data scientist working on a new ML use case can use both Data Wrangler and Feature Store to create a set of feature transformations, perform them over a historical dataset, and then use SageMaker Pipelines to automatically transform and store features as new data arrives daily.

For more information about SageMaker Data Wrangler, Feature Store, and Pipelines, see the Amazon SageMaker documentation.

Overview of solution

The following diagram shows an example end-to-end process from receiving a raw dataset to using the transformed features for model training and predictions. This post describes how to set up your architecture such that each new dataset arriving in Amazon Simple Storage Service (Amazon S3) automatically triggers a pipeline that performs a set of predefined transformations with Data Wrangler and stores the resulting features in Feature Store. You can visit our code repo to try it out in your own account.

Before we set up the architecture for automating feature transformations, we first explore the historical dataset with Data Wrangler, define the set of transformations we want to apply, and store the features in Feature Store.

Dataset

To demonstrate feature pipeline automation, we use an example of preparing features for a flight delay prediction model. We use flight delay data from the US Department of Transportation’s Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic US flights. After you try out the approach with this example, you can experiment with the same pattern on your own datasets.

Each record in the flight delay dataset contains information such as:

  • Flight date
  • Airline details
  • Origin and destination airport details
  • Scheduled and actual times for takeoff and landing
  • Delay details

Once the features have been transformed, we can use them to train a machine learning model to predict future flight delays.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Upload the historical dataset to Amazon S3

Our code repo provides a link to download the raw flight delay dataset used in this example. The directory flight-delay-data contains two CSV files covering two time periods with the same columns. One file contains flight data from Jan 1, 2020, through March 30, 2020. The second file contains flight data for a single day: March 31, 2020. We use the first file for the initial feature transformations. We use the second file to test our feature pipeline automation. In this example, we store the raw dataset in the default S3 bucket associated with our Studio domain, but this isn’t required.
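
If you want to do this from a Studio notebook, a minimal sketch looks like the following (the local path assumes you downloaded the file into the flight-delay-data directory from the repo):

import sagemaker

sess = sagemaker.Session()

# Upload the historical flight delay file to the default SageMaker bucket
raw_data_s3_uri = sess.upload_data(
    path='flight-delay-data/jan01_mar30_2020.csv',
    key_prefix='flight-delay-data'
)
print(raw_data_s3_uri)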

Feature engineering with Data Wrangler

Whenever a data scientist starts working on a new ML use case, the first step is typically to explore and understand the available data. Data Wrangler provides a fast and easy way to visually inspect datasets and perform exploratory data analysis. In this post, we use Data Wrangler within the Studio IDE to analyze the airline dataset and create the transformations we later automate.

A typical model may have dozens or hundreds of features. To keep our example simple, we show how to create the following feature engineering steps using Data Wrangler:

  • One-hot encoding the airline carrier column
  • Adding a record identifier feature and an event timestamp feature, so that we can export to Feature Store
  • Adding a feature with the aggregate daily count of delays from each origin airport

Data Wrangler walkthrough

To start using Data Wrangler, complete the following steps:

  1. In a Studio domain, on the Launcher tab, choose New data flow.
  2. Import the flight delay dataset jan01_mar30_2020.csv from its location in Amazon S3.

Data Wrangler shows you a preview of the data before importing.

  3. Choose Import dataset.

You’re ready to begin exploring and feature engineering.

Because ML algorithms typically require all input features to be numeric for training and inference, it’s common to transform categorical features into a numerical representation. Here we use one-hot encoding for the airline carrier column, which transforms it into several binary columns, one for each airline carrier present in the data.
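
Conceptually, the transform we configure next is equivalent to something like this pandas sketch (shown only for illustration; the actual step is done in the Data Wrangler UI):

import pandas as pd

# Load the raw flight delay data (illustrative path)
df = pd.read_csv('flight-delay-data/jan01_mar30_2020.csv')

# One-hot encode the airline carrier column into one binary column per carrier
carrier_dummies = pd.get_dummies(df['OP_UNIQUE_CARRIER'], prefix='OP_UNIQUE_CARRIER')
df = pd.concat([df.drop(columns=['OP_UNIQUE_CARRIER']), carrier_dummies], axis=1)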

  4. Choose the + icon next to the dataset and choose Add Transform.
  5. For the field OP_UNIQUE_CARRIER, select one-hot encoding.
  6. Under Encode Categorical, for Output Style, choose Columns.

Feature Store requires a unique RecordIdentifier field for each record ingested into the store, so we add a new column to our dataset, RECORD_ID, which is a concatenation of four fields: OP_CARRIER_FL_NUM, ORIGIN, DEP_TIME, and DEST. Feature Store also requires an EventTime feature for each record, so we add a timestamp to FL_DATE in a new column called EVENT_TIME. Here we use Data Wrangler’s custom transform option with Pandas:

# Build a unique record identifier from flight number, origin, departure time, and destination
df['RECORD_ID'] = df['OP_CARRIER_FL_NUM'].astype(str) + df['ORIGIN'] + df['DEP_TIME'].astype(str) + df['DEST']

# Derive the event timestamp required by Feature Store from the flight date
df['EVENT_TIME'] = df['FL_DATE'].astype(str) + 'T00:00:00Z'

To predict delays for certain flights each day, it’s useful to create aggregated features based on the entities present in the data over different time windows. Providing an ML algorithm with these kinds of features can deliver a powerful signal over and above what contextual information is available for a single record in this raw dataset. Here, we calculate the number of delayed flights from each origin airport over the last day using Data Wrangler’s custom transform option with PySpark SQL:

SELECT *, SUM(ARR_DEL15) OVER w1 AS NUM_DELAYS_LAST_DAY
FROM df WINDOW w1 AS (PARTITION BY ORIGIN ORDER BY
CAST(EVENT_TIME AS timestamp)
RANGE INTERVAL 1 DAY PRECEDING)

In a real use case, we’d likely spend a lot of time at this stage exploring the data, defining transformations, and creating more features. After defining all of the transformations to perform over the dataset, you can export the resulting ML features to Feature Store.

  7. On the Export tab, choose </> under Steps. This displays a list of all the steps you have created.
  8. Choose the last step, then choose Export Step.
  9. On the Export Step drop-down menu, choose Feature Store.

SageMaker generates a Jupyter notebook for you and opens it in a new tab in Studio. This notebook contains everything needed to run the transformations over our historical dataset and ingest the resulting features into Feature Store.

Store features in Feature Store

Now that we’ve defined the set of transformations to apply to our dataset, we need to perform them over the set of historical records and store them in Feature Store, a purpose-built store for ML features, so that we can easily discover and reuse them without needing to reproduce the same transformations from the raw dataset as we have done here. For more information about the capabilities of Feature Store, see Understanding the key capabilities of Amazon SageMaker Feature Store.

Running all code cells in the notebook created in the earlier section completes the following:

  • Creates a feature group
  • Runs a SageMaker Processing job that uses our historical dataset and defined transformations from Data Wrangler as input
  • Ingests the newly transformed historical features into Feature Store

To run the notebook, complete the following steps:

  1. Select the kernel Python 3 (Data Science) in the newly opened notebook tab.
  2. Read through and explore the Jupyter notebook.
  3. In the Create FeatureGroup section of the generated notebook, update the following fields for event time and record identifier with the column names we created in the previous Data Wrangler step (if using your own dataset, your names may differ):

record_identifier_name = "RECORD_ID"
event_time_feature_name = "EVENT_TIME"

  4. Choose Run and then choose Run All Cells.
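
For reference, the core of what the generated notebook does boils down to something like the following condensed sketch using the SageMaker Python SDK (the feature group name and the transformed_df DataFrame are illustrative; the generated notebook differs in the details):

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

feature_group = FeatureGroup(name='flight-delay-features', sagemaker_session=sess)

# Infer feature definitions from the transformed DataFrame produced by the Data Wrangler flow
feature_group.load_feature_definitions(data_frame=transformed_df)

feature_group.create(
    s3_uri=f's3://{sess.default_bucket()}/flight-delay-feature-store',
    record_identifier_name='RECORD_ID',
    event_time_feature_name='EVENT_TIME',
    role_arn=role,
    enable_online_store=True
)

# Ingest the transformed historical records into the feature group
feature_group.ingest(data_frame=transformed_df, max_workers=3, wait=True)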

Automate data transformations for future datasets

After the Processing job is complete, we’re ready to create a pipeline that is automatically triggered when new data arrives in Amazon S3. The pipeline reproduces the same set of transformations on the new data and keeps the Feature Store up to date, without any manual intervention needed.

  1. Open a new terminal in Studio and clone our repo by running git clone https://github.com/aws-samples/amazon-sagemaker-automated-feature-transformation.git
  2. Open the Jupyter notebook called automating-feature-transformation-pipeline.ipynb in a new tab.

This notebook walks through the process of creating a new pipeline that runs whenever any new data arrives in the designated S3 location.

  3. After running the code in that notebook, we upload one new day’s worth of flight delay data, mar31_2020.csv, to Amazon S3.

A run of our newly created pipeline is automatically triggered to create features from this data and ingest them into Feature Store. You can monitor progress and see past runs on the Pipelines tab in Studio.

Our example pipeline only has one step to perform feature transformations, but you can easily add subsequent steps like model training, deployment, or batch predictions if it fits your particular use case. For a more in-depth look at SageMaker Pipelines, see Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.
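
As an illustration only, a pipeline with an additional training step might be sketched like this (the processor, estimator, inputs, and names are hypothetical placeholders, not values from the sample repo):

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

# feature_processor, estimator, processing_inputs, processing_outputs,
# train_data_s3_uri, and role are assumed to be defined elsewhere
transform_step = ProcessingStep(
    name='FeatureTransformation',
    processor=feature_processor,
    inputs=processing_inputs,
    outputs=processing_outputs,
)

train_step = TrainingStep(
    name='TrainModel',
    estimator=estimator,
    inputs={'train': TrainingInput(s3_data=train_data_s3_uri)},
    depends_on=['FeatureTransformation'],
)

pipeline = Pipeline(
    name='feature-and-training-pipeline',
    steps=[transform_step, train_step],
)

# Create or update the pipeline definition, then start a run
pipeline.upsert(role_arn=role)
pipeline.start()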

We use an S3 event notification with an AWS Lambda function destination to trigger a run of the feature transformation pipeline. You can also schedule pipeline runs using Amazon EventBridge, which lets you trigger pipelines in response to events such as training job or endpoint status changes, or run your feature pipeline on a specific schedule.
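
A minimal sketch of such a Lambda handler might look like the following (the pipeline name and parameter are illustrative; the function in the repo may differ):

import boto3

sm_client = boto3.client('sagemaker')

def lambda_handler(event, context):
    # Triggered by the S3 event notification when a new raw data file lands
    record = event['Records'][0]
    new_object = 's3://{}/{}'.format(record['s3']['bucket']['name'], record['s3']['object']['key'])

    # Start a run of the feature transformation pipeline for the new file
    response = sm_client.start_pipeline_execution(
        PipelineName='feature-transformation-pipeline',
        PipelineParameters=[
            {'Name': 'InputDataUrl', 'Value': new_object}   # assumes the pipeline defines this parameter
        ],
    )
    return {'pipelineExecutionArn': response['PipelineExecutionArn']}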

Conclusion

In this post, we showed how you can use a combination of Data Wrangler, Feature Store, and Pipelines to transform data as it arrives in Amazon S3 and store the engineered features automatically into Feature Store. We hope you try this solution and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts or on the SageMaker Discussion Forum.


About the Authors

Muhammad Khas is a Solutions Architect working in the Public Sector team at Amazon Web Services. He enjoys supporting customers in using artificial intelligence and machine learning to enhance their decision-making. Outside of work, Muhammad enjoys swimming and horseback riding.

Megan Leoni is an AI/ML Specialist Solutions Architect for AWS, helping customers across Europe, Middle East, and Africa design and implement ML solutions. Prior to joining AWS, Megan worked as a data scientist building and deploying real-time fraud detection models.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Read More