Run distributed hyperparameter and neural architecture tuning jobs with Syne Tune

Today we announce the general availability of Syne Tune, an open-source Python library for large-scale distributed hyperparameter and neural architecture optimization. It provides implementations of several state-of-the-art global optimizers, such as Bayesian optimization, Hyperband, and population-based training. Additionally, it supports constrained and multi-objective optimization, and allows you to bring your own global optimization algorithm.

With Syne Tune, you can run hyperparameter and neural architecture tuning jobs locally on your machine or remotely on Amazon SageMaker by changing just one line of code. The former is a well-suited backend for smaller workloads and fast experimentation on local CPUs or GPUs. The latter is well-suited for larger workloads, which come with a substantial amount of implementation overhead. Syne Tune makes it easy to use SageMaker as a backend to reduce wall clock time by evaluating a large number of configurations on parallel Amazon Elastic Compute Cloud (Amazon EC2) instances, while taking advantage of SageMaker’s rich set of functionalities (including pre-built Docker deep learning framework images, EC2 Spot Instances, experiment tracking, and virtual private networks).

By open-sourcing Syne Tune, we hope to create a community that brings together academic and industrial researchers in machine learning (ML). Our goal is to create synergies between these two groups by enabling academics to easily validate small-scale experiments at larger scale, and industrial practitioners to use a broader set of state-of-the-art optimizers.

In this post, we discuss hyperparameter and architecture optimization in ML, and show you how to launch tuning experiments on your local machine and also on SageMaker for large-scale experiments.

Hyperparameter and architecture optimization in machine learning

Every ML algorithm comes with a set of hyperparameters that control the training algorithm or the architecture of the underlying statistical model. Typical examples of such hyperparameters for deep neural networks are the learning rate or the number of units per layer. Setting these hyperparameters correctly is crucial to obtaining top-notch predictive performance.

To overcome the daunting process of trial and error, hyperparameter and architecture optimization aims to automatically find the specific configuration that maximizes the validation performance of our ML algorithm. Arguably, the easiest method to solve this global optimization problem is random search, where configurations are sampled from a predefined probability distribution. A more sample-efficient technique is Bayesian optimization, which maintains a probabilistic model of the objective function (here, the validation performance) to guide the search toward the global optimum in a sequential manner.
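
To make the idea concrete, the following is a minimal, framework-agnostic sketch of random search over a toy search space; the ranges and the objective function are placeholders standing in for training and validating a real model.

import math
import random

# Toy search space: ranges are illustrative only.
search_space = {
    "lr": (1e-5, 1e-1),          # sampled log-uniformly
    "dropout_rate": (0.0, 0.9),  # sampled uniformly
}

def sample_config():
    lo, hi = search_space["lr"]
    lr = 10 ** random.uniform(math.log10(lo), math.log10(hi))
    dropout_rate = random.uniform(*search_space["dropout_rate"])
    return {"lr": lr, "dropout_rate": dropout_rate}

def validation_score(config):
    # Stand-in for training a model with this configuration and
    # measuring its validation accuracy.
    return random.random()

best_config, best_score = None, float("-inf")
for _ in range(50):  # evaluate 50 randomly sampled configurations
    config = sample_config()
    score = validation_score(config)
    if score > best_score:
        best_config, best_score = config, score

print(best_config, best_score)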

Unfortunately, with ever-increasing dataset sizes and ever-deeper models, training deep neural networks can be prohibitively slow to tune. Recent advances in hyperparameter optimization, such as Hyperband or MoBster, early stop the evaluation of configurations that are unlikely to achieve a good performance and reallocate the resources that would have been consumed to the evaluation of other candidate configurations. You can obtain further gains by using distributed resources to parallelize the tuning process. Because the time to train a deep neural network can vary widely across hyperparameter and architecture configurations, optimal resource allocation requires our optimizer to asynchronously decide which configuration to run next by taking the pending evaluation of other configurations into account. Next, we see how this works in practice and how we can run this either on a local machine or on SageMaker.

Tune hyperparameters with Syne Tune

We now detail how to tune hyperparameters with Syne Tune. First, you need a script that takes hyperparameters as arguments and reports results as soon as they are observed. Let’s look at a simplified example of a script that exposes the learning rate, dropout rate, and momentum as hyperparameters, and reports the validation accuracy after each training epoch:

from argparse import ArgumentParser
from syne_tune.report import Reporter

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--lr', type=float)
    parser.add_argument('--dropout_rate', type=float)
    parser.add_argument('--momentum', type=float)
    parser.add_argument('--epochs', type=int)  # needed by the training loop below

    args, _ = parser.parse_known_args()
    report = Reporter()

    for epoch in range(1, args.epochs + 1):
        # ... train model and get validation accuracy        
        val_acc = compute_accuracy()
        
        # Feed the score back to Syne Tune.
        report(epoch=epoch, val_acc=val_acc)

The important part is the call to report. It transmits results to a scheduler, which decides whether to continue the evaluation of a configuration (a trial) and can later use this data to select new configurations. For our running example, we train a computer vision model adapted from the SageMaker examples on GitHub.

We define the search space for the hyperparameters (dropout, learning rate, momentum) that we want to optimize by specifying the ranges:

from syne_tune.search_space import loguniform, uniform

max_epochs = 27
config_space = {
    "epochs": max_epochs,
    "lr": loguniform(1e-5, 1e-1),
    "momentum": uniform(0.8, 1.0),
    "dropout_rate": loguniform(1e-5, 1.0),
}

We also specify the scheduler we want to use, Hyperband in our case:

from syne_tune.optimizer.schedulers.hyperband import HyperbandScheduler

scheduler = HyperbandScheduler(
    config_space,
    max_t=max_epochs,
    resource_attr='epoch',
    searcher='random',
    metric="val_acc",
    mode="max",
)

Hyperband is a method that randomly samples configurations and early stops trials that aren’t performing well enough after a few epochs. We use this particular scheduler for our example, but many others are available; for example, switching to searcher='bayesopt' enables us to use MoBster, which uses a surrogate model to sample new configurations to evaluate.

We’re now ready to define and launch a hyperparameter tuning job. First, we define the number of workers that evaluate trials concurrently and how long the optimization should run in seconds. Importantly, we use the local backend to evaluate our training script “train_cifar100.py” (see the full code). This means that the tuning happens on the local machine with one Python subprocess per worker. See the following code:

from syne_tune.backend.local_backend import LocalBackend
from syne_tune.tuner import Tuner
from syne_tune.stopping_criterion import StoppingCriterion

tuner = Tuner(
    backend=LocalBackend(entry_point="train_cifar100.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=7200),
    n_workers=4,
)

tuner.run()

As soon as the tuning starts, Syne Tune outputs the following line:

INFO:syne_tune.tuner:results of trials will be saved on /home/ec2-user/syne-tune/train-cifar100-2021-11-05-13-29-01-468

The log of the trials is stored in the aforementioned folder for further analysis. At any time during the tuning job, we can easily get the results obtained so far by calling load_experiment("train-cifar100-2021-11-05-15-22-27-531") and plotting the best result obtained since the start of the tuning job:

from syne_tune.experiments import load_experiment
tuning_experiment = load_experiment("train-cifar100-2021-11-05-15-22-27-531")
tuning_experiment.plot()

The following graph shows our results.

More fine-grained information is available if desired; the results obtained during tuning are stored as well as the scheduler and tuner state—namely, the state of the optimization process. For instance, we can plot the metric obtained for each trial over time (recall that we run four trials asynchronously). In the following figure, each trace represents the evaluation of a configuration as a function of the wall clock time; a dot is a trial stopped after one epoch.

We clearly see the effect of early stopping—only the most promising configurations are evaluated fully and poor performing configurations are stopped early, often after just evaluating a single epoch.
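
If you want to reproduce this kind of per-trial plot yourself, the following is a minimal sketch, assuming the object returned by load_experiment exposes its results as a pandas DataFrame and that it contains columns like trial_id, st_tuner_time, and val_acc; these names are assumptions and may differ across Syne Tune versions, so check the columns of your own results first.

import matplotlib.pyplot as plt
from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment("train-cifar100-2021-11-05-15-22-27-531")
results = tuning_experiment.results  # one row per reported (trial, epoch) result

# Plot validation accuracy over wall clock time, one trace per trial.
for trial_id, trial_results in results.groupby("trial_id"):
    plt.plot(trial_results["st_tuner_time"], trial_results["val_acc"], marker="o")

plt.xlabel("wall clock time (s)")
plt.ylabel("validation accuracy")
plt.show()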

We can also easily switch to another scheduler, for example, random search or MoBster:

from syne_tune.optimizer.schedulers.fifo import FIFOScheduler

# Random search
scheduler = FIFOScheduler(
    config_space,
    searcher='random',
    metric="val_acc",
    mode="max",
)

# MoBster (model-based Hyperband)
scheduler = HyperbandScheduler(
    config_space,
    max_t=max_epochs,
    resource_attr='epoch',
    searcher='bayesopt',
    metric="val_acc",
    mode="max",
)

If we then run the same code with the new schedulers, we can compare all three methods. We see in the following figure that Hyperband only continues well-performing trials, and early stops poorly performing configurations.

Therefore, Hyperband evaluates many more configurations than random search (see the following figure), which uses resources to evaluate every configuration until the end. This can lead to drastic speedups of the tuning process in practice.

MoBster further improves over Hyperband by using a probabilistic surrogate model of the objective function.

The following figure shows all configurations that Hyperband samples during the tuning job.

In comparison, MoBster samples more promising configurations around the well-performing range (brighter color being better) of the search space instead of sampling them uniformly at random like Hyperband.

Run large-scale tuning jobs with Syne Tune and SageMaker

The previous example showed how to tune hyperparameters on a local machine. Sometimes, we need more powerful machines or a large number of workers, which motivates the use of a cloud infrastructure. Syne Tune provides a very simple way to run tuning jobs on SageMaker. Let’s look at how this can be achieved.

We first upload the CIFAR-100 dataset to Amazon Simple Storage Service (Amazon S3) so that it’s available on the EC2 instances:

import sagemaker

sagemaker_session = sagemaker.Session()
bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-pytorch-cnn-cifar100"
role = sagemaker.get_execution_role()
inputs = sagemaker_session.upload_data(path="data", bucket=bucket, key_prefix="data/cifar100")

Next, we specify that we want trials to be run on the SageMaker backend. We use the SageMaker framework (PyTorch) in this particular example because we have a PyTorch training script, but you can use any SageMaker framework (such as XGBoost, TensorFlow, Scikit-learn, or Hugging Face).

A SageMaker framework is a Python wrapper that allows you to run ML code easily by providing a pre-made Docker image that works seamlessly on CPU and GPU for many framework versions. In this particular example, all we need to do is instantiate the PyTorch wrapper with our training script:

from sagemaker.pytorch import PyTorch
from syne_tune.backend.sagemaker_backend.sagemaker_utils import get_execution_role
from syne_tune.backend.sagemaker_backend.sagemaker_backend import SagemakerBackend

backend = SagemakerBackend(
    sm_estimator=PyTorch(
        entry_point="./train_cifar100.py",
        instance_type="ml.g4dn.xlarge",
        instance_count=1,
        role=get_execution_role(),
        framework_version='1.7.1',
        py_version='py3',
    ),
    inputs=inputs,
)

We can now run our tuning job again, but this time we use 20 workers, each having their own GPU:

tuner = Tuner(
    backend=backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=7200, max_cost=20.0),
    n_workers=20,
    tuner_name="cifar100-on-sagemaker"
)

tuner.run()

After each instance initiates a training job, you see the status update as in the local case. An important difference from the local backend is that the total estimated dollar cost is displayed, as well as the cost per worker.

trial_id      status  iter  dropout_rate  epochs        lr  momentum  epoch  val_acc  worker-time  worker-cost
        0  InProgress     1      0.003162      30  0.001000  0.900000    1.0   0.4518         50.0     0.010222
        1  InProgress     1      0.037723      30  0.000062  0.843500    1.0   0.1202         50.0     0.010222
        2  InProgress     1      0.000015      30  0.000865  0.821807    1.0   0.4121         50.0     0.010222
        3  InProgress     1      0.298864      30  0.006991  0.942469    1.0   0.2283         49.0     0.010018
        4  InProgress     0      0.000017      30  0.028001  0.911238      -        -                  -
        5  InProgress     0      0.000144      30  0.000080  0.870546      -        -            -            -
6 trials running, 0 finished (0 until the end), 387.53s wallclock-time, 0.04068444444444444$ estimated cost

Because we specified max_wallclock_time=7200 and max_cost=20.0, the tuning job stops when the wall clock time or the estimated cost exceeds the specified bound. In addition to being estimated, the cost can also be treated as an objective and optimized with our multi-objective optimizers (see the GitHub repo for an example). As shown in the following figures, the SageMaker backend allows you to evaluate many more hyperparameter and architecture configurations in the same wall clock time than the local backend does and, as a result, increases the likelihood of finding a better configuration.

Conclusion

In this post, we saw how to use Syne Tune to launch tuning experiments on your local machine and also on SageMaker for large-scale experiments. To learn more about the library, check out our GitHub repo for documentation and examples that show, for instance, how to run model-based Hyperband, tune multiple objectives, or run with your own scheduler. We look forward to your contributions and seeing how this solution can address everyday tuning of ML pipelines and models.


About the Author

David Salinas is a Sr Applied Scientist at AWS.

Aaron Klein is an Applied Scientist at AWS.

Matthias Seeger is a Principal Applied Scientist at AWS.

Cedric Archambeau is a Principal Applied Scientist at AWS and Fellow of the European Lab for Learning and Intelligent Systems.

Read More

Your guide to AI and ML at AWS re:Invent 2021

It’s almost here! Only 9 days until AWS re:Invent 2021, and we’re very excited to share some highlights you might enjoy this year. The AI/ML team has been working hard to serve up some amazing content and this year, we have more session types for you to enjoy. Back in person, we now have chalk talks, workshops, builders’ sessions, and our traditional breakout sessions. Last year we hosted the first-ever machine learning (ML) keynote, and we are continuing the tradition. We also have more interactive and fun events happening with our AWS DeepRacer League and AWS BugBust Challenge. There are over 200 AI/ML sessions, including breakout sessions with customers such as Aon Corporation, Qualtrics, Shutterstock, and Bloomberg.

To help you plan your agenda for this year’s re:Invent, here are some highlights of the AI/ML track. You can also get the scoop from some of our AI/ML Community Heroes. So buckle up, and start registering for your favorite sessions.

Swami Sivasubramanian keynote

Wednesday, December 1, 8:30 am PT

Join Swami Sivasubramanian, Vice President, Machine Learning, AWS, for an exploration of what it takes to put data into action with an end-to-end data strategy, including the latest news on databases, analytics, and ML.

AI/ML leadership session with Bratin Saha

Wednesday, December 1, 4:00 pm PT

With the rise in compute power and data proliferation, ML has moved from the periphery to being a core part of businesses and organizations across industries. AWS customers use ML and AI services to make accurate predictions, get deeper insights from their data, reduce operational overhead, improve customer experiences, and create entirely new lines of business. In this session, hear from Bratin Saha, Vice President, Machine Learning, AWS, and explore how AWS services can help you move from idea to production with ML.

AI/ML session preview

Here’s a preview of some of the different sessions we’re offering this year by session type. You can always log in to the event portal to favorite or register for any of these sessions, or search the catalog for over 200 other sessions available.

Breakout sessions

Prepare data for ML with ease, speed, and accuracy (AIM319)

Join this session to learn how to prepare data for ML in minutes using Amazon SageMaker. SageMaker offers tools to simplify data preparation so that you can label, prepare, and understand your data. Walk through a complete data-preparation workflow, including how to label training datasets using SageMaker Ground Truth, as well as how to extract data from multiple data sources, transform it using the prebuilt visualization templates in SageMaker Data Wrangler, and create model features. Also, learn how to improve efficiency by using SageMaker Feature Store to create a repository to store, retrieve, and share features.

Achieve high performance and cost-effective model deployment (AIM408)

To maximize your ML investments, high-performance and cost-effective techniques are needed to scale model deployments. In this session, learn about the deployment options available in Amazon SageMaker, including optimized infrastructure choices; real-time, asynchronous, and batch inference; multi-container endpoints; multi-model endpoints; auto scaling; model monitoring; and CI/CD integration for your ML workloads. Discover how to choose the right inference option for your ML use case. Then, hear from Goldman Sachs about how they use SageMaker for fast, low-latency, and scalable deployments to provide relevant research content recommendations for their clients.

Implementing MLOps practices with Amazon SageMaker, featuring Vanguard (AIM320)

Implementing MLOps practices helps data scientists and operations engineers collaborate to prepare, build, train, deploy, and manage models at scale. During this session, explore the breadth of MLOps features in Amazon SageMaker that help you provision consistent model development environments, automate ML workflows, implement CI/CD pipelines for ML, monitor models in production, and standardize model governance capabilities. Then, hear from Vanguard as they share their journey enabling MLOps to achieve ML at scale for their polyglot model development platforms using SageMaker features, including SageMaker projects, SageMaker Pipelines, SageMaker Model Registry, and SageMaker Model Monitor.

Enhancing the customer experience with Amazon Personalize (AIM204)

Personalizing content for a customer online is key to breaking through the noise. Yet, brands face challenges that often prevent them from providing these seamless, relevant experiences. Learn how easy it is to use Amazon Personalize to tailor product and content recommendations to ensure that your users are getting the content they want, leading to increased engagement and retention.

AI/ML for sustainability innovation: Insight at the edge (AIM207)

As climate change, wildlife conservation, public health, racial and economic equity, and new energy solutions become increasingly interdependent, scalable solutions are needed for actionable analysis at the intersection of these fields. In this session, learn how the power of AI/ML and IoT can be brought as close as possible to the challenging edge environments that provide data to create these insights. Also learn how AWS puts AI/ML in the hands of the largest-scale fisheries on the planet, and how organizations can leverage data to support more sustainable, resilient supply chains.

Get started with AWS computer vision services (AIM202)

This session provides an overview of AWS computer vision services and demonstrates how these pretrained and customizable ML capabilities can help you get started quickly—no ML expertise required. Learn how to deploy these models onto the device of your choice to run an inference locally or use cloud APIs for your specific computing needs. Learn first-hand how Shutterstock uses AWS computer vision services to create performance at scale for media analysis, content moderation, and quality inspection use cases.

Chalk talk sessions

Build an ML-powered demand planning system using Amazon Forecast (AIM310)

This chalk talk explores how you can use Amazon Forecast to build an ML-powered, fully automated demand planning system for your business or your multi-tenant SaaS platform without needing any ML expertise. Forecast automatically generates highly accurate forecasts using ML, explains the drivers behind those forecasts, and keeps your ML models always up to date to capture new trends.

Hello, is it conversational AI you’re looking for? (AIM305)

Customers calling in for support expect a personalized experience and a quick resolution to their issue. With chatbots, you can provide automated and human-like conversational experiences for your customers. In this chalk talk, discuss strategies to design personalized experiences using Amazon Lex and Amazon Polly. Explore how to design conversation paths, customize responses, integrate with your applications, and enable self-service use cases to scale your customer support functions.

Harness the power of ML to protect your business with Amazon Fraud Detector (AIM308)

How does more than 20 years of Amazon experience fighting fraud translate into an AI service that can help companies detect more online fraud faster? In this session, learn how Amazon Fraud Detector transforms raw data into highly accurate ML-based fraud detection models. Then, discover how the service does data preparation and validation, feature engineering, data enrichment, and model training and tuning. Finally, with actual customer examples across a wide range of industries and fraud use cases, find out how the service makes deployment easy.

Deep learning applications with PyTorch (AIM404)

By using PyTorch in Amazon SageMaker, you have a flexible deep learning framework combined with a fully managed ML solution that allows you to transition seamlessly from research prototyping to production deployment. In this session, hear from the PyTorch team on the latest features and library releases. Also, learn how to develop with PyTorch using SageMaker for key use cases, such as using a BERT model for natural language processing (NLP) and instance segmentation for fine-grained computer vision with distributed training and model parallelism.

Explore, analyze, and process data using Jupyter notebooks (AIM324)

Before using a dataset to train a model, you need to explore, analyze, and preprocess it. During this chalk talk, learn how to use Amazon SageMaker to complete these tasks in a Jupyter notebook environment.

Machine learning at the edge with Amazon SageMaker (AIM410)

More ML models are being deployed on edge devices such as robots and smart cameras. In this chalk talk, dive into building computer vision (CV) applications at the edge for predictive maintenance, industrial IoT, and more. Learn how to operate and monitor multiple models across a fleet of devices. Also walk through the process to build and train CV models with Amazon SageMaker and how to package, deploy, and manage them with SageMaker Edge Manager. The chalk talk also covers edge device setup and MLOps lifecycle with over-the-air model updates and data capture to the cloud.

Builders’ sessions

Build and deploy a custom computer vision model in 60 minutes (AIM314)

Amazon Rekognition Custom Labels is an automated ML feature that enables customers to quickly train their own custom models for detecting business-specific objects and scenes from images—no ML expertise is required. In this builders’ session, learn how to use Amazon Rekognition Custom Labels to build and deploy your own computer vision model and push it to an application to showcase inference on images from a camera feed. Bring your laptop and an AWS account.

Easily label training data for machine learning at scale (AIM406)

Join this session to learn how to create high-quality labels while also reducing your data labeling costs by up to 70%. This builders’ session walks through the different workflow options in Amazon SageMaker Ground Truth, such as automatic labeling and assistive labeling features like auto-segmentation and image label verification. It also details how to build highly accurate training datasets for company brand logos, so you can build an ML model for company brand protection.

Workshop sessions

Develop your ML project with Amazon SageMaker (AIM402)

In this workshop, learn how to develop a full ML project end to end with Amazon SageMaker. Start with data exploration and analysis, data cleansing, and feature engineering with SageMaker Data Wrangler. Then, store features in SageMaker Feature Store, extract features for training with SageMaker Processing, train a model with SageMaker training, and then deploy it with SageMaker hosting. Also, learn how to use SageMaker Studio as an IDE and SageMaker Pipelines for orchestrating the ML workflow.

End-to-end 3D machine learning on Amazon SageMaker (AIM414)

As lidar sensors become more accessible and cost-effective, customers increasingly use point cloud data in new spaces like autonomous driving, robotics, and augmented reality. The growing availability of lidar sensors has increased use of point cloud data for ML tasks like 3D object detection, segmentation, object synthesis, and reconstruction. This workshop features Amazon SageMaker Ground Truth and explains how to ingest raw 3D point cloud data, label it, train a 3D object detection model, and deploy the model. The model in this session will be trained on an autonomous vehicle dataset.

AI workflow automation for document processing (AIM316)

Mortgage packets have hundreds of documents in various layouts and formats. With ML, you can set up a document-processing pipeline to automate mortgage application workflows like extracting text from W2s, paystubs, and deeds; classifying documents; or using custom entity recognition to pull out specific data points. In this workshop, learn various ways to use optical character recognition (OCR), NLP, and human-in-the-loop services to build a document-processing pipeline to automate mortgage applications—saving time, reducing manual effort, and improving ROI for your organization.

Boost the value of your media content with ML-powered search (AIM315)

Consumers rely on content not only to entertain but also to educate and facilitate purchasing decisions. To meet this demand, media content production is exploding. However, the process of producing, distributing, and monetizing this content is often complex, expensive, and time-consuming. Applying artificial intelligence and ML capabilities like image and video analysis, audio transcription, machine translation, and text analytics can solve many of these problems. In this workshop, utilize ML to extract detailed metadata from content and make it available for search, discovery, and editing use cases.

Instantly detect and diagnose anomalies within your business data (AIM302)

Anomalies in business data often indicate potential issues or even opportunities. ML can help you detect anomalies and then act on them proactively. In this workshop, learn how Amazon Lookout for Metrics automatically detects anomalies across thousands of metrics in near-real time and reduces false alarms.

Join the first annual AWS BugBust re:Invent Challenge and help set a Guinness record

The largest code fixing challenge is here! Python and Java developers of all skill levels can compete to fix software bugs, earn points, and win an array of prizes including Amazon Echo Dots, hoodies, and the grand prize of $1,500 USD. As you bust bugs, you also become part of an attempt to set the record for the largest bug fixing challenge with the Guinness World Records. All registered participants who fix even one bug will receive exclusive prizes and a certificate from AWS and Guinness to commemorate their contribution. Let the bug busting begin! You can join the challenge virtually or in-person at the AWS BugBust Hub in the main expo. Register now for free.

AWS DeepRacer: The fastest way to get rolling with machine learning

Developers of all skill levels from beginners to experts can get hands-on with ML by using AWS DeepRacer to train models in a cloud-based 3D racing simulator. Racers from virtually anywhere in the world can compete in the AWS DeepRacer League, the first global autonomous racing league driven by reinforcement learning. The race is on now! Sign in to AWS DeepRacer and compete in the AWS re:Invent Open for prizes and glory now through December 31, 2021. Tune in to the AWS DeepRacer League Championships on Twitch November 19 and 22 to see the 40 fastest developers of the 2021 season compete live. Learn from the best as they vie for a chance to advance to the Championship Cup Finale during Swami Sivasubramanian’s keynote on December 1, where they will race for their shot at $20,000 USD in cash prizes and the right to hoist the Championship Cup!

For those attending re:Invent in Las Vegas, don’t miss out on the opportunity to take your model from Sim2Real (simulation to reality) on the AWS DeepRacer Speedway inside the content hub at Caesar’s Forum. Upload your model and race a 1/18th scale autonomous RC car on a physical track. Stop by Tuesday afternoon to participate in the livestreamed wildcard race for a chance to win a trip back for re:Invent 2022. No model? No problem! The all-new AWS DeepRacer Arcade is available in the expo, where you can literally get in the driver’s seat and take the wheel in this educational racing game. Take a spin on the virtual track and then compete against a featured AWS DeepRacer autonomous model in this arcade racing experience, with prizes and giveaways galore. Shift into the fast lane on your ML learning journey with AWS DeepRacer.

Head over to the re:Invent portal to build your schedule so you’re ready to hit the ground running. Be sure to stop by and talk to our experts at the AI/ML booth, or chat with the speakers after sessions. We can’t wait to see you in Las Vegas!


About the Authors

Andrea Youmans is a Product Marketing Manager on the AI Services team at AWS. Over the past 10 years she has worked in the technology and telecommunications industries, focused on developer storytelling and marketing campaigns. In her spare time, she enjoys heading to the lake with her husband and Aussie dog Oakley, tasting wine and enjoying a movie from time to time.

Read More

AWS AI/ML Community attendee guides to AWS re:Invent 2021

The AWS AI/ML Community has compiled a series of session guides to AWS re:Invent 2021 to help you get the most out of re:Invent this year. They covered four distinct categories relevant to AI/ML. With a number of our guide authors attending re:Invent virtually, you will find a balance between virtually accessible sessions and sessions available in-person.

The AWS AI/ML Community is a vibrant group of developers, data scientists, researchers, and business decision-makers that dive deep into artificial intelligence and machine learning (ML) concepts, contribute with real-world experiences, and collaborate on building projects together.

Community guides for developers new to machine learning

From AWS ML Hero Mike Chambers: AWS re:Invent 2021: How To, tips, and my session selection (video). In this video—which should be required viewing for anyone new to re:Invent—Mike dives deep, beyond simply recommending sessions, with loads of tips and advice for how to make the most of your re:Invent experience—in-person or virtual.

AWS ML Hero Cyrus Wong’s top five sessions AI/ML newbies should attend! For folks new to ML on AWS, spend your time learning and making use of Amazon AI/ML services with Cyrus’s top five re:Invent sessions.

AWS re:Invent 2021: How to maximize your in-person learning experience as a new Machine Learning practitioner, from AWS ML Community Builder Martin Paradesi. For those attending re:Invent in-person this year, check out Martin’s guide for five sessions curated for new ML practitioners.

From our new Egypt-based AWS ML Hero Salah Elhossiny: Top 5 AWS ML Sessions to Attend at AWS re:Invent 2021. For those new to AWS ML, spend your time learning and using Amazon SageMaker with the best five AWS re:Invent sessions to help you get started quickly!

Community guides for AI/ML developers

AWS ML Hero Juv Chan’s top five recommendations for AI/ML builders and architects. Juv, a Sr. Cloud AI Engineer/Architect, ML Hero, and re:Invent Championship Cup 2019 finalist, shares his top five session picks and can’t-miss photos from re:Invent 2019.

Top 5 Sessions for AI/ML Developers at AWS re:Invent 2021, from AWS ML Community Builder Brooke Jamieson. For those attending re:Invent virtually this year, check out Brooke’s guide.

AWS ML Hero Tomasz Ptak’s AWS re:Invent 2021 schedule. Tomasz shares his session picks plus tips and advice for making the most of your re:Invent experience.

Production-grade ML re:Invent 2021 sessions guide, from AWS ML Community Builder Kyle Gallatin. Kyle shares five ML talks skewed toward his interests in scalable, production-grade ML.

Community guides for MLOps developers

AWS ML Hero Rustem Feyzkhanov’s top MLOps breakout sessions to look forward to at re:Invent 2021. Rustem shares seven sessions to help you stay in the loop of MLOps in the AWS Cloud.

AWS ML Community Builder Phil Basford’s must-see sessions. For those interested in MLOps, ML architecture, edge computing, or data analytics, see Phil’s guide and his tips on how to have fun in Vegas and at home for those attending virtually.

Community guides for ML data scientists

AWS ML Hero’s Philipp Schmid’s remote guide for your virtual re:Invent 2021, focused on NLP and machine learning. Attending remote from Germany, Hugging Face ML engineer and AWS ML Hero Philipp Schmid offers an in-depth guide.

AWS ML Community Builder Pier Paolo Ippolito’s top five suggestions for ML data scientists. Pier, a data scientist at SAS and editor at Towards Data Science, shares his top five picks curated for technical ML builders.

Other AWS ML Community guides worth exploring

AWS ML Hero Kesha Williams’s Machine Learning Attendee Guide 2021. The official AWS Hero guide from Kesha dives deep across all session categories. Check this guide out for a full walkthrough of how to build your schedule, and the ultimate deep dive into Kesha’s ML session picks.

Lastly, we have a unique in-depth guide from AWS ML Community Builder Janos Tolgyesi. Learn how to fight climate change with ML skills and make the Earth a better place with ML at re:Invent 2021. Janos shares his session picks and a bonus session suggestion for those interested in beer, plus personalized recommendations!

Whether you’re attending in-person or virtually this year, we hope these recommendations and advice from the AWS ML Community help you make the most of your re:Invent experience. Have a great re:Invent!


About the Author

Paxton Hall is a Marketing Program Manager for the AWS AI/ML Community on the AI/ML Education team at AWS. He has worked in retail and experiential marketing for the past 7 years, focused on developing communities and marketing campaigns. Out of the office, he’s passionate about public lands access and conservation, and enjoys backcountry skiing, climbing, biking, and hiking throughout Washington’s Cascade mountains.

Read More

Understand drivers that influence your forecasts with explainability impact scores in Amazon Forecast

We’re excited to launch explainability impact scores in Amazon Forecast, which help you understand the factors that impact your forecasts for specific items and time durations of interest. Forecast is a managed service for developers that uses machine learning (ML) to generate more accurate demand forecasts, without requiring any ML experience. To increase forecast model accuracy, you can add additional information or attributes such as price, promotion, category details, holidays, or weather information to your forecasting model, but you may not know how each attribute influences your forecast. With today’s launch, you can now understand how each attribute impacts your forecasted values using the explainability feature, which we discuss in this post.

ML-based forecasting models, which are more accurate than heuristic rules or human judgment, can drive significant improvement in revenue and customer experience. However, business leaders often lose trust in ML systems when forecasted numbers differ drastically from their intuition. Because demand planning decisions have a high impact on the business, leaders may end up overriding forecasts when they are expected to take model predictions at face value, without understanding why those forecasts were generated and what factors are pushing them higher or lower. This can compromise forecast accuracy, and you may lose the benefit of ML forecasting.

Amazon Forecast now provides explainability, which gives you item-level insights across your preferred time duration. Having a certain level of understanding on why a particular forecast value is high or low at a particular time is helpful for decision-making and building trust and confidence in your ML solutions. Explainability reports include impact scores, which help you understand how each attribute in your training data contributes to either increasing or decreasing your forecasted values for specific items. In addition, you can choose to understand explainability for your entire forecast horizon or for specific time durations. Explainability removes the need of running multiple manual analyses to understand past sales and external variable trends to explain forecast results.

How to interpret explainability impact scores

Explainability helps you better understand how the attributes in your datasets, such as price, category, or holidays, impact your forecast values. Forecast uses a metric called impact scores to quantify the relative impact of each attribute and determine whether they generally increase or decrease forecast values.

Impact scores measure the relative impact attributes have on forecast values. For example, if the price attribute has an impact score that is twice as large as the brand_id attribute, you can conclude that the price of an item has twice the impact on forecast values as the product brand. Impact scores also provide information on whether an attribute increases or decreases the forecasted value. A negative impact score reflects that the attribute tends to decrease the value of the forecast.

Impact scores measure the relative impact of attributes to each other, not the absolute impact. If an attribute has a low impact score, that doesn’t necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor. If you change attributes in your predictor, the impact scores may differ, and the attribute with the low impact score may have a higher score relative to other attributes. Also, you can’t use impact scores to determine whether particular attributes improve the model accuracy or not. You should use accuracy metrics such as weighted quantile loss and others provided by Forecast to assess predictor accuracy.

The following example explainability report graph shows the relative impact of different attributes on the forecasted value of item_id 1 across all the time points in the forecast horizon. The relative impact is in the following order: Price has the highest impact, followed by StoreLocation, then Promo and Holiday_US. Price has the highest influence on item_id 1 and tends to increase the forecast value. StoreLocation has the second highest impact on item_id 1 but tends to decrease the forecast value. Because Promo has an impact score close to 0.2, Price has five times more impact than Promo on the forecasted value of item_id 1, and both attributes tend to increase the forecast value. Holiday_US has an impact score of 0, which means that this attribute doesn’t increase or decrease the forecast value for item_id 1 relative to other attributes.

The following image shows an example of the explainability report export file with the impact scores for specific time series and time points as well as aggregated scores across those time series and time points.

Generate explainability impact scores

In this section, we walk through how to generate explainability impact scores for your forecasts using the Forecast console. To use the new CreateExplainability API, refer to the notebook in our GitHub repo or review Forecast Explainability.
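
For reference, the following is a rough boto3 sketch of what a CreateExplainability call might look like; the names, ARNs, S3 paths, and schema are placeholders, and you should confirm the exact parameters and date formats in the Forecast API reference before using them.

import boto3

forecast = boto3.client("forecast")

response = forecast.create_explainability(
    ExplainabilityName="demand-explainability",  # hypothetical name
    ResourceArn="arn:aws:forecast:us-east-1:123456789012:forecast/demand-forecast",  # your forecast ARN
    ExplainabilityConfig={
        "TimeSeriesGranularity": "SPECIFIC",  # score only the time series you upload
        "TimePointGranularity": "SPECIFIC",   # score only a specific time duration
    },
    DataSource={
        "S3Config": {
            "Path": "s3://your-bucket/time-series-list.csv",
            "RoleArn": "arn:aws:iam::123456789012:role/ForecastRole",
        }
    },
    Schema={"Attributes": [{"AttributeName": "item_id", "AttributeType": "string"}]},
    StartDateTime="2021-11-01T00:00:00",
    EndDateTime="2021-11-07T00:00:00",
)
print(response["ExplainabilityArn"])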

  1. On the Forecast console, create a dataset group. Upload your historical demand dataset as target time series followed by related time series or item metadata that you want to use for more accurate forecasting and for which you’re interested in seeing explainability impact scores.

  2. In the navigation pane, under your dataset, choose Predictors.
  3. Choose Train new predictor.

Forecast uses AutoPredictor as the default training option. No further action is needed from you, but remember that only forecasts generated from a model trained with AutoPredictor are eligible for generating explainability impact scores later.

  4. Now that your model is trained, choose Forecasts in the navigation pane.
  5. Choose Create a forecast.
  6. Select your trained predictor to create a forecast.
  7. Choose Insights in the navigation pane.
  8. Choose Create explainability.

  9. Choose the forecast that you want to generate explainability impact scores for.
  10. Choose if you want to see impact scores for all the time points in the forecast horizon or only for a specific time duration.

You can specify up to 500 consecutive time points per explainability report.

  11. Upload the list of specific time series for which you want to see explainability impact scores.

A time series is a unique combination of item ID and dimension. You can specify up to 50 time series per Forecast explainability.

  12. Specify the schema of the CSV file that you have uploaded.
  13. Choose Create explainability.

It takes less than an hour to generate the explainability impact scores.

  14. When the job status is active, choose the explainability job to view the impact score.

Here you can review the explainability impact score graph. You can use the controls at the top of the graph to drill down to specific time series or time points or view at an aggregated level.

  15. To export all the impact scores, choose Create explainability export in the Explainability exports section.
  16. Provide the export details and choose Create explainability export.

The export is saved in an Amazon Simple Storage Service (Amazon S3) bucket that you specify.

  17. When the export is complete, navigate to your S3 bucket to review the explainability report CSV file.

The following is an example of your explainability export CSV file. Depending on how large your dataset is, multiple files may be exported.

Aggregate explainability impact scores for category-level analysis

You may want to review explainability for a group of items together, which can have more than 50 items. For example, a grocery retailer might be interested in understanding what is driving the forecasts for all their fruits and vegetables, and this category may consist of more than 50 SKUs in their data. However, Forecast lets you specify up to 50 time series per Forecast explainability job. If you have more than 50 time series, you need to run the explainability job multiple times with different items in each job and then combine them.

The explainability export file provides two types of impact scores: normalized impact scores and raw impact scores. Raw impact scores are based on Shapley values and aren’t scaled or bounded. Normalized impact scores scale the raw scores to a value between -1 and 1. Raw impact scores are useful for combining and comparing scores across different explainability resources. To analyze a category, aggregate the raw impact scores of all its time series across the multiple explainability jobs, then compare the aggregated scores to find the relative influence of each attribute. You can view an example of how to do so by following the notebook in our GitHub repo.
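
As a sketch of that aggregation, assuming each export file contains one row per time series and attribute with a raw impact score column; the column names used below are assumptions and may differ from your actual export.

import glob
import pandas as pd

# Load the export CSVs produced by several explainability jobs (paths are placeholders).
frames = [pd.read_csv(path) for path in glob.glob("explainability-exports/*.csv")]
scores = pd.concat(frames, ignore_index=True)

# Aggregate the raw (unscaled) impact scores per attribute across all time series
# in the category, then rank attributes by their relative influence.
aggregated = (
    scores.groupby("attribute_name")["raw_impact_score"]  # hypothetical column names
    .mean()
    .sort_values(ascending=False)
)
print(aggregated)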

Conclusion

Forecast now provides explainability for specific items and time durations of interest. With the explainability feature, you can understand how each attribute impacts your forecasted values. To learn more, review Forecast Explainability and the notebook in our GitHub repo. If you are interested in aggregated explainability for all your items at the predictor level, review our blog on using the CreateAutoPredictor API here. Explainability is available in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Namita Das is a Sr. Product Manager for Amazon Forecast. Her current focus is to democratize machine learning by building no-code/low-code ML services. On the side, she frequently advises startups and loves training her dog with new tricks.

Dima Fayyad is a Software Development Engineer on the Amazon Forecast team. She is passionate about machine learning and AI and is currently working on large-scale distributed systems in the forecasting space. In her free time, she enjoys exploring different cuisines, traveling, and skiing.

Youngsuk Park is a Machine Learning Scientist at AWS AI and Amazon Forecast. His research lies in the interplay between machine learning, optimization, and decision-making, with over 10 publications in top-notch ML/AI venues. Before joining AWS, he obtained a PhD from Stanford University.

Shannon Killingsworth is a UX Designer for Amazon Forecast. His current work is creating console experiences that are usable by anyone, and integrating new features into the console experience. In his spare time, he is a fitness and automobile enthusiast.

Read More

New Amazon Forecast API that creates up to 40% more accurate forecasts and provides explainability

We’re excited to announce a new forecasting API for Amazon Forecast that generates up to 40% more accurate forecasts and helps you understand which factors, such as price, holidays, weather, or item category, are most influencing your forecasts. Forecast uses machine learning (ML) to generate more accurate demand forecasts, without requiring any ML experience. Forecast brings the same technology used at Amazon to developers as a fully managed service, removing the need to manage resources.

With today’s launch, Forecast can now produce up to 40% more accurate forecasts by using a combination of ML algorithms that are best suited for your data. In many scenarios, ML experts train separate models for different parts of their dataset to improve forecasting accuracy. This process of segmenting your data and applying different algorithms can be very challenging for non-ML experts. Forecast uses ML to learn not only the best algorithm for each item, but the best ensemble of algorithms for each item, leading to up to 40% better accuracy on forecasts.

To further increase forecast model accuracy, you can add additional information or attributes such as price, promotion, category details, holidays, or weather information, but you may not know how each attribute influences your forecast. Forecasting is mission critical, and therefore having a certain level of attribute explainability is helpful for decision-making. With today’s launch, Forecast now helps you understand and explain how your forecasting model is making predictions by providing explainability reports after your model has been trained. Explainability reports include impact scores, so you can understand how each attribute in your training data contributes to either increasing or decreasing your forecasted values. By understanding how your model makes predictions, you can make more informed business decisions. For example, you can verify that your model is behaving as expected by confirming that attributes with a high impact score represent a valid signal for predictions in your business problem.

You can bring in your recent data to incorporate the latest insights before forecasting for the next period. However, in doing so, you have to train your entire forecasting model again, which is a time-consuming process. Most Forecast customers deploy their forecasting workflow within their operations, such as an inventory management solution, and run it at a set cadence. Because retraining on the entire dataset can be time-consuming, customer operations may get delayed. With today’s launch, you can save up to 50% of retraining time by choosing to incrementally retrain your models with the new information that you have added.

To get more accurate forecasts, faster retraining, and explainability, use the new experience through the AWS Management Console or the CreateAutoPredictor API. This launch is accompanied with new pricing, which you can review at Amazon Forecast pricing.

Interpreting model explainability

Explainability helps you better understand how the attributes in your datasets, such as price, category, or holidays, impact your forecast values. Forecast uses a metric called impact scores to quantify the relative impact of each attribute and determine whether they generally increase or decrease forecast values.

Impact scores measure the relative impact attributes have on forecast values. For example, if the price attribute has an impact score that is twice as large as the brand_id attribute, you can conclude that the price of an item has twice the impact on forecast values as the product brand. Impact scores also provide information on whether an attribute increases or decreases the forecasted value. A negative impact score reflects that the attribute tends to decrease the value of the forecast.

Impact scores measure the relative impact of attributes to each other, not the absolute impact. If an attribute has a low impact score, that doesn’t necessarily mean that it has a low impact on forecast values; it means that it has a lower impact on forecast values than other attributes used by the predictor. If you change attributes in your predictor, the impact scores may differ, and the attribute with the low impact score may have a higher score relative to other attributes. Also, you can’t use impact scores to determine whether particular attributes improve the model accuracy or not. You should use accuracy metrics such as weighted quantile loss and others provided by Forecast to assess predictor accuracy.

In the following graph, we take an example of a predictor where the relative impact of attributes is as follows: US holidays, promos, weather, price, and category. US holidays has the highest impact on the forecast values. US holidays tend to increase the forecasted value. Category has the lowest impact on the forecast values, and this attribute tends to decrease the forecast value.

Train a new predictor with the new Forecast API

In this section, we walk through how to train a new predictor using the newly launched forecasting API through the console. To use the new CreateAutoPredictor API directly, refer to the notebook in our GitHub repo or review Training Predictors.
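
For reference, a minimal boto3 sketch of the corresponding CreateAutoPredictor call might look like the following; the names and the dataset group ARN are placeholders, and you should verify the parameters against the Forecast API reference.

import boto3

forecast = boto3.client("forecast")

response = forecast.create_auto_predictor(
    PredictorName="demand-auto-predictor",  # hypothetical name
    ForecastHorizon=14,                     # how far into the future to forecast
    ForecastFrequency="D",                  # forecasting frequency (daily here)
    ForecastTypes=["0.5", "0.9"],           # quantiles to forecast
    DataConfig={
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/demand"
    },
    ExplainPredictor=True,  # also generate the predictor explainability report
)
print(response["PredictorArn"])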

  1. On the Forecast console, create a dataset group and upload your historical demand dataset as target time series followed by any related time series or item metadata that you want to use for more accurate forecasting.
  2. In the navigation pane, under your dataset, choose Predictors.
  3. Choose Train new predictor.
  4. In the Predictor settings section, enter a name for your predictor, how long in the future you want to forecast with the forecasting frequency, and the number of quantiles you want to forecast for.
  5. AutoPredictor is enabled by default; no further action is needed from you.
  6. For Optimization metric, you can choose an accuracy metric for AutoPredictor to optimize when tuning the model. We leave this as the default for our walkthrough.
  7. To get the predictor explainability report, select Enable predictor explainability.
  8. Under the input data configuration, you can add local weather information and national holidays for more accurate demand forecasts.
  9. In the Attribute configuration section, you can choose filling options for missing values.
  10. Choose Start to start training your predictor.
  11. After your predictor is trained, choose your predictor on the Predictors page.

On the predictor’s details page, you can view the overall predictor accuracy metrics and the explainability impact score.

  12. Now that your model is trained, choose Forecasts in the navigation pane.
  13. Choose Create a forecast.
  14. For Predictor, choose your trained predictor to create a forecast.

Retrain your predictor with new data

We now walk through how to use the Forecast console to retrain your predictor when you have new data for the same forecasting problem. You can also follow the notebook in our GitHub repo to learn how to use the CreateAutoPredictor API for retraining your predictor.
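
As a rough sketch of the API-based flow (the names and ARN are placeholders; check the notebook and the API reference for the exact parameters), retraining is again a CreateAutoPredictor call that references the existing predictor:

import boto3

forecast = boto3.client("forecast")

# Retrain on the freshly imported data by referencing the existing AutoPredictor;
# the source predictor's configuration is copied over to the new predictor.
response = forecast.create_auto_predictor(
    PredictorName="demand-auto-predictor-retrained",  # hypothetical name
    ReferencePredictorArn="arn:aws:forecast:us-east-1:123456789012:predictor/demand-auto-predictor",
)
print(response["PredictorArn"])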

Before you retrain your predictor, you have to re-import your dataset with the latest available historical observations.

  1. On the Forecast console, under your dataset group in the navigation pane, choose Datasets.

In our example, we only update the target time series data. You can follow the same steps to update the related time series data as well.

  2. Choose the dataset name to view the details.
  3. In the Dataset imports section, choose Create dataset import.
  4. Provide the Amazon Simple Storage Service (Amazon S3) location of your dataset and complete importing your data.
  5. After your dataset has been imported, choose Predictors in the navigation pane.
  6. Select the predictor for which AutoPredictor enabled is True.

Only predictors with AutoPredictor enabled are eligible to be retrained.

  7. On the Predictor actions menu, choose Retrain.
  8. Enter a new name for the retrained predictor and choose Retrain predictor.

All the predictor configuration from the source predictor is automatically copied over to the new predictor that you retrain.

You’re redirected to the predictor details page where you can review the predictor settings.

  9. Now that your model is trained, choose Forecasts in the navigation pane.
  10. Choose Create a forecast.
  11. Choose your trained predictor to create a forecast.

Upgrade your existing legacy predictor to AutoPredictor

You can easily move your existing predictors to AutoPredictor to take advantage of more accurate forecasts by using a predictor that selects the best ensemble of algorithms for each item, faster retraining, and predictor explainability. Forecast takes the old predictor as a reference and creates a new AutoPredictor. You can follow the notebook in our GitHub repo to do the same through the CreateAutoPredictor API.

  1. On the Forecast console, choose a dataset group for which you have previously trained a predictor.
  2. In the navigation pane, under your dataset, choose Predictors.

An Upgrade link is next to any legacy predictor for which AutoPredictor is False.

  3. Select your predictor and on the Predictor actions menu, choose Upgrade.
  4. Enter the name of the new predictor.

All the predictor configurations from the old predictor are automatically copied over to train the new AutoPredictor.

You’re redirected to the predictor details page where you can review the predictor settings.

  5. Now that your model is trained, choose Forecasts in the navigation pane.
  6. Choose Create a forecast.
  7. Choose your trained predictor to create a forecast.

Conclusion

To get more accurate forecasts, faster retraining, and explainability, you can follow the steps mentioned in this post or follow the notebook in our GitHub repo. If you want to upgrade your existing forecasting models to the new CreateAutoPredictor API, you can do so with one click either through the console or as shown in the notebook in our GitHub repo. To learn more, review Training Predictors. We recommend reviewing the pricing for using these new features. All these new capabilities are available in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Namita Das is a Sr. Product Manager for Amazon Forecast. Her current focus is to democratize machine learning by building no-code/low-code ML services. On the side, she frequently advises startups and loves training her dog with new tricks.

Jitendra Bangani is an Engineering Manager at AWS, leading a growing team of curious and driven engineers for Amazon Forecast. He started his career at Amazon as an intern in 2013; since then he has helped build engaging shopping experiences, hyperscale distributed systems, and autonomous AI services that delight Amazon and AWS customers.

Hilaf Hasson is a Machine Learning Scientist at AWS, and currently leads the R&D team of scientists working on Amazon Forecast. Before joining AWS, he held multiple faculty positions, including as an Assistant Professor of Mathematics at Stanford University.

 Adarsh Singh works as a Software Development Engineer in the Amazon Forecast team. In his current role, he focuses on engineering problems and building scalable distributed systems that provide the most value to end users. In his spare time, he enjoys watching anime and playing video games.

Chinmay Bapat is a Sr. Software Development Engineer in the Amazon Forecast team. His interests lie in the applications of machine learning and building scalable distributed systems. Outside of work, he enjoys playing board games and cooking.

Read More

Next Gen Stats Decision Guide: Predicting fourth-down conversion

It is fourth-and-one on the Texans’ 36-yard line with 3:21 remaining on the clock in a tie game. Should the Colts’ head coach Frank Reich send out kicker Rodrigo Blankenship to attempt a 54-yard field goal or rely on his offense to convert a first down? Frank chose to go for it, leading to a first-down conversion and an eventual touchdown to seal the win. Was this the optimal call or a gamble that ended up working? Through a collaboration between the NFL’s Next Gen Stats team and AWS, NFL fans can now get an answer to this question.

Like the Colts-Texans example, the decision of what to do on a fourth down late in the game can be the difference between a win and a loss. While it can be tempting to focus on fourth-downs late in the game, even fourth-down decisions that occur early in the game can be important. Fourth-down decisions early in the game can have reverberating effects that compound over the course of a game or season. Head coaches who consistently make the right call on the fourth down put their teams in the best possible position to win, but how does a coach know what the right call is? What factors do they have to weigh, and how can a computer give fans insights into this complicated decision-making process?

The problem can be represented as a tree of choices and their respective potential outcomes. On any fourth down, a team has three main options: punt, kick a field goal, or go for it. If a team punts, their opponent generally gains possession of the ball at some point farther down the field. On a field goal attempt, the two main outcomes are the offensive team either makes the field goal or misses the field goal. If they make the field goal, they gain three points. If they miss the field goal, the defense gains possession of the ball at the location of the attempt. Similarly, if a team chooses to go for it, there are two main outcomes. Either the team gains enough yards for a first-down (or potentially a touchdown), or the defense gains possession of the ball at the end of the play.

When coaches decide what to do on a fourth-down, they must weigh all the potential outcomes and the impact of these outcomes on the odds of winning the game. To help fans understand a coach’s decision, the NFL and AWS partnered to create the Next Gen Stats Decision Guide. The Next Gen Stats Decision Guide is a suite of machine learning (ML) models designed to determine the optimal fourth-down call. The decision guide does this by predicting the odds of each potential fourth-down outcome and the resulting odds of winning the game. By comparing the odds of winning the game for each fourth-down choice, the Next Gen Stats Decision Guide provides a data-driven answer to that optimal fourth-down call.

Going back to Frank Reich’s decision, the Colts needed 0.25 yards to gain a first down. What is the probability that they convert? As shown in the following figure, our fourth-down conversion probability model predicts an 81% chance. When paired with the updated win probability of 75% if they convert, we get an expected win probability of 69%. However, if they choose to kick a field goal, the chance of making the field goal is around 42%. Paired with the win probability of 71% if successful, we get an expected win probability of 56%. Based on these expected probabilities, the Next Gen Stats Decision Guide recommends going for it with a 13% difference.

In addition to fourth-down decisions, coaches must decide what to do after scoring a touchdown. The team can kick an extra point (+1 point) or elect to attempt a two-point conversion (+2 points). The application of the Next Gen Stats Decision Guide to fourth-down plays and after-touchdown plays has been presented before, and is a good primer for this discussion. In this post, we focus on the models that determine the probability of converting on fourth down. We share how we engineered the features, developed the ML model, and chose the metrics used to evaluate the quality of its predictions.

Go-for-it model

If a team chooses to go for it on a fourth-down, the team must gain enough yards to make a first-down on that single play. This means that not all fourth-downs are equal. Some require the offense to gain less than a yard, while others may occasionally require the offense to gain more than 10 yards. The location on the field, time left on the clock, and relative strengths of the teams are among the important parameters in understanding the odds of success. In building the Go-for-it model, we examine these and other factors to determine which features are most important in constructing a performant model.

Problem formulation

The odds of converting on a fourth down can be formulated as a multi-class classification problem. In this formulation, each class represents the offense gaining some number of yards on the play. The probability of each class is used as the odds that the team will gain that number of yards on the play. The following histogram shows the yards gained on third- and fourth-down plays from 2016–2020. An initial approach might be to make each class in the model represent an integer number of yards gained, but the histogram shows that this approach will be difficult. Classes in the long tail of the graph (roughly 40–100 yards) occur infrequently, and this sort of class imbalance can be difficult to account for in model training.

To combat the potential class imbalance, we used an unequal distribution of yards to classes. Instead of each yard gained being an individual class, we used 17 different classes to encompass all the potential outcomes shown in the graph.

As shown in the following table, we use one class for all negative or zero-yards-gained results. For 1–15 yards gained, we use one class for each potential outcome. The reason for this breakdown is that 88% of fourth-down plays have between 1 and 15 yards to go. This enables the model to capture a large majority of fourth-down situations with high fidelity. To address plays with more than 15 yards to go, we employ a decay factor to represent the decreasing probability of gaining more yards on a single play.

Yards                   | Model Classes (17)
Less than or equal to 0 | 0
1–15 yards              | 1–15 (15 classes)
16+ yards               | 16

The following equation shows the decay factor used, where the probability of converting ($P_{\mathrm{conversion}}$) is the probability of getting 16 or more yards ($P_{16+}$) divided by the actual distance needed for a first down ($d$) minus 15 yards:

$$P_{\mathrm{conversion}} = \frac{P_{16+}}{d - 15}$$
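
To make the formulation concrete, the following sketch shows one way to map the model's 17 class probabilities to a conversion probability for a given distance. The handling of distances of 15 yards or less (summing the probabilities of gaining at least the required yards) is our assumption based on the class definitions above; only the decay factor for distances beyond 15 yards comes directly from the equation.

    import numpy as np

    def conversion_probability(class_probs, yards_to_go):
        """class_probs: length-17 vector from the multi-class model.
        Index 0 -> 0 or fewer yards gained, 1-15 -> exact yards gained, 16 -> 16+ yards."""
        class_probs = np.asarray(class_probs)
        if yards_to_go <= 15:
            needed = int(np.ceil(yards_to_go))  # e.g. 0.25 yards to go -> needs class 1 or higher
            return class_probs[needed:].sum()
        # Beyond 15 yards to go, apply the decay factor: P(16+) / (d - 15)
        return class_probs[16] / (yards_to_go - 15)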

Features

Just as a coach needs to consider many factors when deciding what to do in a game, the conversion probability models also have many potential features to use. Part of the modeling process involved determining which features to incorporate into the model. We used feature importance measures like correlation to help us identify several high-value features (see the following table). These features include the actual yards-to-go, the Vegas spread, and the historical aggregations of expected points added (EPA) by team and quarterback.

The actual yards-to-go is arguably the most important feature for this model, aligning with general football knowledge. The more yards a team needs to gain, the less likely the team is to achieve that outcome. What makes the actual yards-to-go metric even more valuable in this model is that it is derived from the NGS tracking data. Traditional NFL datasets often represent the yards-to-go as an integer, which obscures the variable nature of the game. With the NGS tracking data, we can get a measurement of the football’s location with sub-foot accuracy. This allows our model to understand the difference between fourth and inches versus fourth and 1 yard.

Although the actual yards-to-go is a clear metric to provide the model, some information is harder to quantify immediately and provide to the model. For example, a coach understands the unique skillsets of their team and the opposition, both on that day and historically. To assess coaching decisions, the model needs a way to use similar information. The Vegas lines are a useful condensation of vast amounts of situational and historical knowledge about the teams into a small set of numbers. Specifically, the point spread and the total points lines capture information about prevailing beliefs regarding the relative strengths of the teams, and the model found these values useful.

Input Features      | Description
actualYardsToGo     | The yards to go as measured using NGS tracking data between the ball at snap and the yards-to-go marker
isCalledPass        | Is the play predicted to be a pass or a rush?
totalLine           | The closing spread line for the game
possessionTeamLine  | The number of points the possession team is favored by according to Vegas
possessionTeamTotal | The number of total points the possession team is expected to score as indicated by the Vegas total and spread lines
offEpa              | A team offense's average expected points added per play over the last X number of plays in similar situations
defEpa              | A team defense's average expected points added allowed per play over the last X number of plays in similar situations
qbEpa               | A team offense's average expected points added per play over the last X number of plays when the quarterback on the field attempted a pass, run, or was sacked
qbSuccessEpa        | Quarterback success EPA for the last N similar plays

Similar to how the Vegas lines provide game-level insight into relative team strengths, we can use EPA values to provide insight into relative team strengths at a more granular level. These EPA values, calculated using other NGS models, provide insight into how the team has performed in similar situations in the past. The EPA models can be broken down by the offense, defense, and quarterback. This provides the model with information about how successful the respective teams have been in the past in addition to how successful the current quarterback has been. The following figure shows the relative importance of the features after HPO. As discussed earlier, this feature importance makes intuitive sense.
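
The EPA values themselves come from other NGS models, but the aggregation into features is a straightforward rolling average. The sketch below illustrates the idea for the offense-side feature; the column names, window size, and the definition of "similar situations" are assumptions for illustration, not the production feature pipeline.

    import pandas as pd

    def rolling_offense_epa(plays: pd.DataFrame, window: int = 100) -> pd.Series:
        """Average EPA per play for each offense over its previous `window` plays.
        Expects one row per play with (at least) 'gameId', 'playId', 'offenseTeam', 'epa'."""
        plays = plays.sort_values(["gameId", "playId"])
        return (
            plays.groupby("offenseTeam")["epa"]
            # shift(1) excludes the current play so its own outcome doesn't leak into the feature
            .transform(lambda s: s.shift(1).rolling(window, min_periods=10).mean())
        )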

Model training

To train the model, we used all the data from third- and fourth-down plays from 2016–2019 regular seasons as the training set. We held out the data from 2020 for the testing set.

For the model architecture, we compared a handful of different models, including XGBoost, PyTorch Tabular, and AutoML-based models. Of these options, the XGBoost model provided the best results. Its predictions can also be explained by using Shapley Additive exPlanations (SHAP) feature importance measures. Because our goal is to optimize for conversion probabilities, we used the Brier score (a probabilistic loss function) to measure the performance of our models. The Brier score measures the mean squared difference between the predicted probabilities assigned to the possible outcomes and the actual outcomes. A lower Brier score is better.
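
For reference, the following is a minimal multi-class Brier score computation of the kind described above (the mean squared difference between the predicted probability vector and the one-hot encoded outcome); the exact variant used in the project may differ.

    import numpy as np

    def multiclass_brier(y_true, y_prob):
        """y_true: integer class labels, shape (n,). y_prob: predicted probabilities, shape (n, k)."""
        y_true = np.asarray(y_true)
        y_prob = np.asarray(y_prob)
        one_hot = np.eye(y_prob.shape[1])[y_true]
        return float(np.mean(np.sum((y_prob - one_hot) ** 2, axis=1)))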

To optimize our models, we used Amazon SageMaker hyperparameter optimization (HPO) to fine-tune XGBoost parameters like learning rate, max depth, subsamples, alpha, and gamma. The SageMaker-managed HPO service helped us run multiple experiments in parallel to identify optimal hyperparameter configurations. Each experiment took only a few minutes because tuning jobs are distributed across 10 instances. In addition, we used SageMaker features, including automatic early stopping and warm starting from previous tuning jobs. This combined with custom metrics improved the performance of the model within minutes. Examples of various SageMaker-based HPO tuning jobs are available on GitHub.
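
The following sketch shows what such a tuning job can look like with the SageMaker Python SDK and the built-in XGBoost algorithm. The S3 paths, instance type, parameter ranges, and job counts are placeholders, and we tune against the built-in validation:mlogloss metric for brevity; tuning directly against a custom Brier metric, as the team did, requires a training script that emits that metric.

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()  # assumes you run this inside SageMaker

    # Built-in XGBoost container; bucket and prefix names are placeholders.
    xgb_image = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.3-1")
    estimator = Estimator(
        image_uri=xgb_image,
        role=role,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        output_path="s3://my-bucket/ngs-go-for-it/output",
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(
        objective="multi:softprob", num_class=17, num_round=300, eval_metric="mlogloss"
    )

    tuner = HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:mlogloss",  # stand-in for the custom Brier metric
        objective_type="Minimize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),      # learning rate
            "max_depth": IntegerParameter(3, 10),
            "subsample": ContinuousParameter(0.5, 1.0),
            "alpha": ContinuousParameter(0, 10),
            "gamma": ContinuousParameter(0, 5),
        },
        max_jobs=40,
        max_parallel_jobs=10,          # run tuning jobs across 10 instances in parallel
        early_stopping_type="Auto",    # stop unpromising training jobs early
    )
    tuner.fit({
        "train": TrainingInput("s3://my-bucket/ngs-go-for-it/train.csv", content_type="csv"),
        "validation": TrainingInput("s3://my-bucket/ngs-go-for-it/validation.csv", content_type="csv"),
    })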

Go-for-it model results

After training and HPO, the XGBoost model achieved a Brier score of 0.21. In addition to the Brier score, we examined the model predictions to ensure they were recreating known aspects of the game. For example, the odds of converting on a fourth-down play decrease as the number of yards needed for a first down increases. The following figure shows the model's predicted conversion probabilities as a function of the yards-to-go. We can observe two key trends. First, as expected, the conversion probability decreases as the yards-to-go increases. Second, a team is generally better off running the ball in short yards-to-go situations and passing the ball in long yards-to-go situations.

For the Next Gen Stats Decision Guide, it's not sufficient for the model to make correct predictions. It must also assign valid probabilities to those predictions. To examine the validity of the model probabilities, we compare the probabilities against the aggregate play outcomes, as shown in the following graph. The model predictions were binned into 10%-wide categories from 0–90%. For each bin, the fraction of plays that were converted was calculated (bar height). For an ideal model, each bin height should sit roughly at the midpoint of its bin (solid line). The graph shows that when the model provides a conversion probability between 0–60%, the actual aggregate outcomes of these plays closely match the model's predictions. For model predictions between 60–90%, the model appears to slightly underestimate the offense's probability of converting (most notably between 60–70%). In situations where the agreement is poor, we can use postprocessing techniques to increase the agreement between play outcomes and the model probabilities. For an example with deep learning models, see Quantifying uncertainty in deep learning systems.
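
A sketch of the binning used for this check is shown below; the bin width and output format are illustrative. For the postprocessing mentioned above, a monotone calibrator such as scikit-learn's IsotonicRegression fit on held-out predictions is one common option.

    import numpy as np

    def calibration_table(pred_probs, converted, n_bins=10):
        """Compare mean predicted conversion probability with the observed conversion
        rate in 10%-wide bins (pred_probs in [0, 1], converted as 0/1 outcomes)."""
        pred_probs = np.asarray(pred_probs, dtype=float)
        converted = np.asarray(converted, dtype=float)
        bins = np.clip((pred_probs * n_bins).astype(int), 0, n_bins - 1)
        rows = []
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                rows.append({
                    "bin": f"{b * 100 // n_bins}-{(b + 1) * 100 // n_bins}%",
                    "mean_predicted": pred_probs[mask].mean(),
                    "observed_rate": converted[mask].mean(),
                    "plays": int(mask.sum()),
                })
        return rows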

ML production pipeline

For the model in production, we used SageMaker for preprocessing, training, and postprocessing. The model is hosted on Amazon Elastic Kubernetes Service (Amazon EKS) for a highly scalable, available, and secure production deployment. The following figure shows a high-level diagram of the production pipeline. All steps are automated and require minimal maintenance.

Summary

AWS and the NFL NGS team jointly developed the Next Gen Stats Decision Guide, which helps fans understand the choices coaches make at pivotal moments in the game. The odds of converting on a fourth-down play are a key component of the Next Gen Stats Decision Guide. In this post, we provided insight into how AWS helped the NFL create the model powering fourth-down conversions and discussed methods to assess model performance.

The NGS team will be hosting these models as part of the 2021 NFL season. Keep an eye out for the Next Gen Stats Decision Guide during the next NFL game.

You can find full examples of creating custom training jobs, implementing HPO, and deploying models on SageMaker at the AWS Labs GitHub repo. If you would like us to help and accelerate your use of ML, contact the Amazon ML Solutions Lab program.


About the Authors

Selvan Senthivel is a Senior ML Engineer with Amazon ML Solutions Lab team at AWS, focusing on helping customers on Machine Learning, Deep Learning problems and end-to-end ML solutions. He was the founding engineering lead of Amazon Comprehend Medical service and contributed to the design/architecture of multiple AWS AI services.

Lin Lee Cheong is a Senior Scientist and Manager with the Amazon ML Solutions Lab team at Amazon Web Services. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.

Tyler Mullenbach is a Principal Data Science Manager with AWS Professional Services. He leads a global team of data science consultants focusing on helping customers turn their data into insights and bring ML models to production.

Ankit Tyagi is a Senior Software Engineer with the NFL’s Next Gen Stats team. He focuses on backend data pipelines and machine learning for delivering stats to fans. Outside of work, you can find him playing tennis, experimenting with brewing beer, or playing guitar.

Mike Band is the Lead Analyst for NFL’s Next Gen Stats. He contributes to the ideation, development, and communication of advanced football performance metrics for the NFL Media Group, NFL Broadcast Partners, and fans.

Juyoung Lee is a Senior Software Engineer with the NFL’s Next Gen Stats. Her work focuses on designing and developing machine learning models to create stats for fans. On her spare time, she enjoys being active by playing Ultimate Frisbee and doing CrossFit.

Michael Schaefer was the Director of Product and Analytics for NFL’s Next Gen Stats. His work focuses on the design and execution of statistics, applications, and content delivered to NFL Media, NFL Broadcaster Partners, and fans.

Michael Chi is the Director of Technology for NFL’s Next Gen Stats. He is responsible for all technical aspects of the platform which is used by all 32 clubs, NFL Media and Broadcast Partners. In his free time, he enjoys being outdoors and spending time with his family.

Read More

Chain custom Amazon SageMaker Ground Truth jobs for image processing

Amazon SageMaker Ground Truth supports many different types of labeling jobs, including several image-based labeling workflows like image-level labels, bounding box-specific labels, or pixel-level labeling. For situations not covered by these standard approaches, Ground Truth also supports custom image-based labeling, which allows you to create a labeling workflow with a completely unique UI and associated processing. Beyond that, you can chain different Ground Truth labeling jobs together so that the output of one job acts as the input to another job, to allow even more flexibility in a labeling workflow by breaking the job into multiple stages.

In this post, we show how to chain two custom Ground Truth jobs together to perform advanced image manipulations, including isolating portions of images, and de-skewing images that were photographed from an angle. Additionally, we demonstrate several techniques for augmenting source images, which are helpful for situations where you have a limited number of source images.

Extracting regions of an image

Suppose we’re tasked with creating a machine learning (ML) model that processes an image of a shelving unit and determines whether any of the bins in that shelving unit need restocking. Due to the size of the storage room, a single camera is used to capture images of several shelving units, each from a different angle. The following image is an example of such a shelving unit.

Figure 1: A shelving unit with many bins full, photographed from an angle

For training or inference, we need images of individual bins, rather than the overall shelving unit. The model we're developing takes an image of a single bin and returns a classification of Empty or Full. This classification feeds into an automated restocking system, allowing us to maintain stock levels at the bin level without the trouble of someone physically checking the levels.

Unfortunately, because the shelf images are taken at an angle, each bin is skewed and has a different size and shape. Because any bin images extracted from the main image are rectangular, the extracted images include undesirable content, as shown in the following image of two adjoining bins.

Figure 2: A closeup of a single bin, which shows two adjoining bins

In this example, we’ve isolated a rectangular region that bounds a given bin, but because the image was taken from an angle, portions of the bins on the left and right are also partially included. Because a rectangular section includes information from other bins, an image like this performs poorly when used for training or for inference.

To solve this, we can select a non-rectangular section of the original image and warp it to create a new image. The following image demonstrates the results of a warp transformation applied to the original image.

Figure 3: Original shelving unit with just the bins isolated, and the image warped to make it orthogonal

This warping accomplishes two tasks. First, we’ve selected just the shelving unit, cropping out the nearby walls, floor, and any other irrelevant areas near the edges of the shelves. Second, the warping of the image results in each bin being more rectangular than the original version.

This warped image doesn’t have any new content—it’s just a distortion of the original image. But by performing this warping, each bin can be selected using a rectangular bounding box, which provides needed consistency, no matter what position a bin is in. Compare the following two bin images: the image on the left is extracted from the original image, and the image on the right is the same bin, extracted from the de-skewed image.

Figure 4: A single bin from the original image (left) compared with the bin from the warped image (right)

The bottom opening of the bin was originally at an angle, and now it’s horizontal. Overall, we’ve reduced the amount of the bin shown, and increased the proportion of the contents of the bin within the image. This improves our ML training process, because each bin image has less superfluous content.

Ground Truth jobs

Each custom Ground Truth labeling job is defined with a web-based user interface and two associated AWS Lambda functions (for more information, see Processing with AWS Lambda). One function runs prior to each image displayed by the UI, and the other runs after the user finishes the labeling job for all the images. Ground Truth offers several pre-made user interfaces (like bounding box-based selection), but you can also create your own custom UI if needed, as we do for this example.

When Ground Truth jobs are chained together, the output of one job is used as the input of another job. For this task, we use two chained jobs to process our images, as illustrated in the following diagram.

Figure 5: Architecture diagram showing two chained Ground Truth jobs, each with a Pre- and Post- UI Lambda function

Images that need to be labeled are stored in Amazon Simple Storage Service (Amazon S3). The first Ground Truth job retrieves images from Amazon S3 and displays them one at a time, waiting for the user to specify the four corners of the shelving unit within the image, using a custom UI. When that step is complete, the post-UI Lambda function uses the corner coordinates to warp or de-skew each image, which is then saved to the same S3 bucket that the original image resides in. Note that this step isn't necessary during inference: for a situation where the camera is in a fixed location, you can save those corner coordinates for later use during inference.
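
The de-skew itself is a standard four-point perspective transform. The following is a minimal sketch with OpenCV; the output size is arbitrary here, and the repo's implementation may differ in details such as corner ordering.

    import cv2
    import numpy as np

    def deskew(image, corners, out_w=1200, out_h=900):
        """Warp the quadrilateral given by `corners` (top-left, top-right,
        bottom-right, bottom-left, in pixel coordinates) into a flat out_w x out_h image."""
        src = np.array(corners, dtype=np.float32)
        dst = np.array([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
                       dtype=np.float32)
        matrix = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, matrix, (out_w, out_h))

    # Usage: deskewed = deskew(cv2.imread("shelf.jpg"), [(x1, y1), (x2, y2), (x3, y3), (x4, y4)])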

After the first Ground Truth job has de-skewed the source image, the second job uses simple bounding boxes to label each bin within the de-skewed image. The post-UI Lambda function then extracts the individual bin images, augments them with rotations, flipping, and color and brightness alterations, and writes the resulting data to Amazon S3, where it can be used for model training or other purposes.

You can find example code and deployment instructions in the GitHub repo.

Custom user interface

From a labeler’s perspective, after they log in and select a job, they use the custom UI to select the four corners of the shelving unit.

Figure 6: The custom Ground Truth UI for the first labeling job

For custom Ground Truth user interfaces, a set of custom tags is available, known as Crowd tags. These tags include bounding boxes, lines, points, and other user interface elements that you can use to build a labeling UI. In this case, we use the crowd-polygon tag, which is displayed as a yellow polygon.

After the labeler draws a polygon with four corners on the UI for all source images, they exit the UI by choosing Done. At this point, the post-UI Lambda function is run and each de-skewed image is saved to Amazon S3. When the function is complete, control is passed to the next chained Ground Truth job.

Generally, chained Ground Truth jobs reuse an output manifest file as the input manifest file for the next (chained) labeling job. In this case, we created a new image, so we modify the pre-UI Lambda function so it passes in the correct (de-skewed) file name, rather than the original, skewed image file name.
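
A minimal sketch of that modified pre-UI (pre-annotation) Lambda function might look like the following. The response keys follow the custom labeling workflow contract; the naming convention for the de-skewed file is hypothetical, and the actual code in the repo may differ.

    def lambda_handler(event, context):
        """Pre-annotation Lambda for the second (chained) job: point the task at the
        de-skewed image written by the first job instead of the original source image."""
        source_ref = event["dataObject"]["source-ref"]              # e.g. s3://bucket/images/shelf_001.jpg
        deskewed_ref = source_ref.replace(".jpg", "_deskewed.jpg")  # assumed naming convention
        return {
            "taskInput": {"taskObject": deskewed_ref},
            "isHumanAnnotationRequired": "true",
        }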

The second job in the chain uses the bounding box-based labeling functionality that is built in to Ground Truth. The bounding boxes don’t cover the entire contents of each bin, but they do cover the openings of the bins. This provides enough data to create a model to detect whether a bin is full or empty.

Figure 7: De-skewed image with bounding boxes from the second chained Ground Truth labeling job

After the labeler selects all the bins, they exit the UI by choosing Done. At this point, the post-UI Lambda function runs and crops out each bin image, makes variations of it for image augmentation purposes, and saves the variations into a folder structure in Amazon S3 based on classification. The top level of the folder structure is named training_data, with two subfolders: empty and full. Each subfolder contains images of bins that are either empty or full, suitable for use in model training.

Image augmentation

Image augmentation is a technique sometimes used in image-based ML workloads. It’s especially helpful when the number of source images is low or when they offer limited variety. Typically, image augmentation is performed by taking a source image and creating multiple variants of it, altering factors like brightness, contrast, and coloring, and even cropping or rotating the image. These variations help the resulting model be more robust and capable of handling images that are dissimilar to the original training images.

In this example, we use image augmentation methods in the post-UI Lambda function of the second Ground Truth job. The labeler has specified the bounding boxes for each bin image in the Ground Truth UI, and that data is used to extract portions of the overall image. Those extracted portions are of the individual bins, and these smaller images are used as input into our image augmentation process.

In our case, we create 14 variants of each bin image, with variations of brightness, contrast, and sharpness, as well as horizontal flipping combined with these variations. With this approach, a single source image of a shelving unit with 24 bins generates 14 variants for each bin image, for a total of 336 images that can be used for training a model. The following shows an original bin image (upper left) and each of its variants.
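
A sketch of that augmentation step with Pillow is shown below. The enhancement factors are illustrative; combining the original with two brightness, two contrast, and two sharpness variants, then mirroring each, yields the 14 images per bin described above.

    from PIL import Image, ImageEnhance, ImageOps

    def augment_bin(bin_image: Image.Image):
        """Return 14 variants of a cropped bin image: the original plus six
        brightness/contrast/sharpness variants, and a horizontal flip of each."""
        variants = [bin_image]
        for enhancer_cls in (ImageEnhance.Brightness, ImageEnhance.Contrast, ImageEnhance.Sharpness):
            for factor in (0.7, 1.3):  # illustrative enhancement factors
                variants.append(enhancer_cls(bin_image).enhance(factor))
        return variants + [ImageOps.mirror(img) for img in variants]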

Conclusion

Custom Ground Truth jobs provide a great deal of flexibility, and using them with images allows advanced functionality like cropping and de-skewing images, as well as performing custom image augmentation. The supplied Crowd HTML tags support many different labeling approaches like polygons, lines, text boxes, modal alerts, key point placement, and others. Combined with the power of pre-UI and post-UI Lambda functions, a custom Ground Truth job allows you to construct complex labeling jobs to support a wide variety of use cases, and combining different custom jobs by chaining them together provides even more options.

You can use the GitHub repo associated with this post as a starting point for your own chained image labeling jobs. You can also extend the code to support additional image augmentation methods (like cropping or rotating the source images), or modify it to fit your particular use case.

To learn more about chained Ground Truth jobs, see Chaining Labeling Jobs.

For more information about the Crowd tags you can use in the Ground Truth UI, see Crowd HTML Elements Reference.


About the Author

Greg Sommerville is a Senior Prototyping Architect on the AWS Envision Engineering Americas Prototyping team, where he helps AWS customers implement innovative solutions to challenging problems with machine learning, IoT and serverless technologies. He lives in Ann Arbor, Michigan and enjoys practicing yoga, catering to his dogs, and playing poker.

Read More

Accelerate data preparation using Amazon SageMaker Data Wrangler for diabetic patient readmission prediction

Patient readmission to the hospital after prior visits for the same disease places an additional burden on healthcare providers, the health system, and patients. Machine learning (ML) models, if built and trained properly, can help us understand the reasons for readmission and predict readmission accurately. ML could allow providers to create better treatment plans and care, which would translate to a reduction of both cost and mental stress for patients. However, ML is complex, which has limited its adoption by organizations that don’t have the resources to recruit a team of data engineers and scientists to build ML workloads. In this post, we show you how to build an ML model based on the XGBoost algorithm to predict diabetic patient readmission easily and quickly with the graphical interface of Amazon SageMaker Data Wrangler.

Data Wrangler is an Amazon SageMaker Studio feature designed to allow you to explore and transform tabular data for ML use cases without coding. Data Wrangler is the fastest and easiest way to prepare data for ML. It gives you the ability to use a visual interface to access data and perform exploratory data analysis (EDA) and feature engineering. It also seamlessly operationalizes your data preparation steps by allowing you to export your data flow into Amazon SageMaker Pipelines, a Data Wrangler job, Python file, or Amazon SageMaker Feature Store.

Data Wrangler comes with over 300 built-in transforms and custom transformations using either Python, PySpark, or SparkSQL runtime. It also comes with built-in data analysis capabilities for charts (such as scatter plot or histogram) and time-saving model analysis capabilities such as feature importance, target leakage, and model explainability.

In this post, we explore the key capabilities of Data Wrangler using the UCI diabetic patient readmission dataset. We showcase how you can build ML data transformation steps without writing sophisticated code, and how to create a model training, feature store, or ML pipeline with reproducibility for a diabetic patient readmission prediction use case.

We also have published a related GitHub project repo that includes the end-to-end ML workflow steps and relevant assets, including Jupyter notebooks.

We walk you through the following high-level steps:

  • Studio prerequisites and input dataset setup
  • Design your Data Wrangler flow file
  • Create processing and training jobs for model building
  • Host a trained model for real-time inference

Studio prerequisites and input dataset setup

To use Studio and Studio notebooks, you must complete the Studio onboarding process. Although you can choose from a few authentication methods, the simplest way to create a Studio domain is to follow the Quick start instructions. The Quick start uses the same default settings as the standard Studio setup. You can also choose to onboard using AWS Single Sign-On (AWS SSO) for authentication (see Onboard to Amazon SageMaker Studio Using AWS SSO).

Dataset

The patient readmission dataset captures 10 years (1999–2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes with about 100,000 observations.

You can start by downloading the public dataset and uploading it to an Amazon Simple Storage Service (Amazon S3) bucket. For demonstration purposes, we split the dataset into four tables based on feature categories: diabetic_data_hospital_visits.csv, diabetic_data_demographic.csv, diabetic_data_labs.csv, and diabetic_data_medication.csv. Review and run the code in datawrangler_workshop_pre_requisite.ipynb. If you leave everything at its default inside the notebook, the CSV files will be available in s3://sagemaker-${region}-${account_number}/sagemaker/demo-diabetic-datawrangler/.

Design your Data Wrangler flow file

To get started, on the Studio File menu, choose New, then choose Data Wrangler Flow.

This launches a Data Wrangler instance and configures it with the Data Wrangler app. The process takes a few minutes to complete.

Load the data from Amazon S3 into Data Wrangler

To load the data into Data Wrangler, complete the following steps:

  1. On the Import tab, choose Amazon S3 as the data source.
  2. Choose Add data source.

You could also import data from Amazon Athena, Amazon Redshift, or Snowflake. For more information about the currently supported import sources, see Import.

  1. Select the CSV files from the bucket s3://sagemaker-${region}-${account_number}/sagemaker/demo-diabetic-datawrangler/ one at a time.
  2. Choose Import for each file.

When the import is complete, data in an S3 bucket is available inside Data Wrangler for preprocessing.

Join the CSV files

Now that we have imported multiple CSV source datasets, let’s join them into a consolidated dataset.

  1. On the Data flow tab, for Data types, choose the plus sign.
  2. On the menu, choose Join.
  3. Choose the diabetic_data_hospital_visits.csv dataset as the Right dataset.
  4. Choose Configure to set up the join criteria.
  5. For Name, enter a name for the join.
  6. For Join type, choose a join type (for this post, Inner).
  7. Choose the columns for Left and Right.
  8. Choose Apply to preview the joined dataset.
  9. Choose Add to add it to the data flow file.

Built-in analysis

Before we apply any transformations on the input source, let’s perform a quick analysis of the dataset. Data Wrangler provides several built-in analysis types, like histogram, scatter plot, target leakage, bias report, and quick model. For more information about analysis types, see Analyze and Visualize.

Target leakage

Target leakage occurs when information in an ML training dataset is strongly correlated with the target label, but isn’t available when the model is used for prediction. You might have a column in your dataset that serves as a proxy for the column you want to predict with your model. For classification tasks, Data Wrangler calculates the prediction quality metric of ROC-AUC, which is computed individually for each feature column via cross-validation to generate a target leakage report.
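
Data Wrangler produces this report for you, but the underlying idea can be approximated in a few lines: fit a small model on each feature by itself and cross-validate its ROC-AUC against the target. The sketch below is a rough illustration only (it binarizes the label and assumes missing values are already handled), not Data Wrangler's actual implementation.

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    def per_feature_auc(df: pd.DataFrame, target: str = "readmitted") -> pd.Series:
        """Cross-validated ROC-AUC of each feature on its own; values near 1.0 hint at
        leakage, values near 0.5 mean the feature is uninformative by itself."""
        y = (df[target] != "NO").astype(int)  # binary stand-in for the readmission label
        scores = {}
        for col in df.columns.drop(target):
            x = df[[col]].copy()
            if x[col].dtype == object:
                x[col] = x[col].astype("category").cat.codes  # simple encoding for strings
            scores[col] = cross_val_score(
                DecisionTreeClassifier(max_depth=3), x, y, cv=3, scoring="roc_auc"
            ).mean()
        return pd.Series(scores).sort_values(ascending=False)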

  1. On the Data Flow tab, for Join, choose the plus sign.
  2. Choose Add analysis.
  3. For Analysis type, choose Target Leakage.
  4. For Analysis name, enter a name.
  5. For Max features, enter 50.
  6. For Problem Type, choose classification.
  7. For Target, choose readmitted.
  8. Choose Preview to generate the report.

As shown in the preceding screenshot, there is no indication of target leakage in our input dataset. However, a few features like encounter_id_1, encounter_id_0, weight, and payer_code are marked as possibly redundant, with a predictive ability (ROC-AUC) of 0.5. This means these features by themselves don’t provide any useful information towards predicting the target. Before making the decision to drop these uninformative features, you should consider whether they could add value when used in tandem with other features. For our use case, we keep them as is and move to the next step.

  1. Choose Save to save the analysis into your Data Wrangler data flow file.

Bias report

AI/ML systems are only as good as the data we put into them. ML-based systems are more accessible than ever before, and with the growth of adoption throughout various industries, further questions arise surrounding fairness and how it is ensured across these ML systems. Understanding how to detect and avoid bias in ML models is imperative and complex. With the built-in bias report in Data Wrangler, data scientists can quickly detect bias during the data preparation stage of the ML workflow. Bias report analysis uses Amazon SageMaker Clarify to perform bias analysis.

To generate a bias report, you must specify the target column that you want to predict and a facet or column that you want to inspect for potential biases. For example, we can generate a bias report on the gender feature for Female values to see whether there is any class imbalance.
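
Under the hood, one of the simplest pretraining metrics Clarify reports is class imbalance (CI) for the chosen facet. Assuming the joined data is loaded in a pandas DataFrame named df, the facet-level computation looks like this sketch:

    # Class imbalance (CI) for the gender facet: (n_a - n_d) / (n_a + n_d),
    # where n_d is the Female (facet) count and n_a is everyone else.
    n_d = (df["gender"] == "Female").sum()
    n_a = (df["gender"] != "Female").sum()
    class_imbalance = (n_a - n_d) / (n_a + n_d)  # values near 0 indicate balanced representation
    print(f"CI for gender=Female: {class_imbalance:.3f}")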

  1. On the Analysis tab, choose Create new analysis.
  2. For Analysis type, choose Bias Report.
  3. For Analysis name, enter a name.
  4. For Select the column your model predicts, choose readmitted.
  5. For Predicted value, enter NO.
  6. For Column to analyze for bias, choose gender.
  7. For Column value to analyze for bias, choose Female.
  8. Leave remaining settings at their default.
  9. Choose Check for bias to generate the bias report.

As shown in the bias report, there is no significant bias in our input dataset, which means the dataset has a fair amount of representation by gender. For our dataset, we can move forward with a hypothesis that there is no inherent bias in our dataset. However, based on your use case and dataset, you might want to run similar bias reporting on other features of your dataset to identify any potential bias. If any bias is detected, you can consider applying a suitable transformation to address that bias.

  1. Choose Save to add this report to the data flow file.

Histogram

In this section, we use a histogram to gain insights into the target label patterns inside our input dataset.

  1. On the Analysis tab, choose Create new analysis.
  2. For Analysis type, choose Histogram.
  3. For Analysis name, enter a name.
  4. For X axis, choose readmitted.
  5. For Color by, choose race.
  6. For Facet by, choose gender.
  7. Choose Preview to generate a histogram.

This ML problem is a multi-class classification problem. However, we can observe a major target class imbalance between patients readmitted <30 days, readmitted >30 days, and NO readmission. We can also see that the class proportions are similar across gender and race. To improve our potential model predictability, we can merge <30 and >30 into a single positive class. Merging the target labels this way turns our ML problem into a binary classification problem. As we demonstrate in the next section, we can do this easily by adding the respective transformations.

Transformations

When it comes to training an ML model for structured or tabular data, decision tree-based algorithms are considered best in class. This is due to their use of ensemble tree methods, which boost weak learners through gradient-based optimization.

For our medical source dataset, we use the SageMaker built-in XGBoost algorithm because it’s one of the most popular decision tree-based ensemble ML algorithms. The XGBoost algorithm accepts only numerical values as input, so as a prerequisite we must apply categorical feature transformations to our source dataset.

Data Wrangler comes with over 300 built-in transforms, which require no coding. Let’s use built-in transforms to apply a few key transformations and prepare our training dataset.

Handle missing values

To address missing values, complete the following steps:

  1. Switch to the Data tab to bring up all the built-in transforms.
  2. Expand Handle missing in the list of transforms.
  3. For Transform, choose Impute.
  4. For Column type, choose Numeric.
  5. For Input column, choose diag_1.
  6. For Imputing strategy, choose Mean.
  7. By default, the operation is performed in place, but you can provide an optional Output column name, which creates a new column with imputed values. For this post, we go with the default in-place update.
  8. Choose Preview to preview the results.
  9. Choose Add to include this transformation step into the data flow file.
  10. Repeat these steps for the diag_2 and diag_3 features and impute missing values.

Search and edit features with special characters

Because our source dataset has features with special characters, we need to clean them before training. Let’s use the search and edit transform.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose race.
  4. For Pattern, enter ?.
  5. For Replacement string, choose Other.
  6. Leave Output column blank for in-place replacements.
  7. Choose Preview.
  8. Choose Add to add the transform to your data flow.
  9. Repeat the same steps for other features to replace weight and payer_code with 0 and medical_specialty with Other.

One-hot encoding for categorical features

To use one-hot encoding for categorical features, complete the following steps:

  1. Expand Encode categorical in the list of transforms.
  2. For Transform, choose One-hot encode.
  3. For Input column, choose race.
  4. For Output style, choose Columns.
  5. Choose Preview.
  6. Choose Add to add the change to the data flow.
  7. Repeat these steps for age and medical_specialty_filler to one-hot encode those categorical features as well.

Ordinal encoding for categorical features

To use ordinal encoding for categorical features, complete the following steps:

  1. Expand Encode categorical in the list of transforms.
  2. For Transform, choose Ordinal encode.
  3. For Input column, choose gender.
  4. For Invalid handling strategy, choose Keep.
  5. Choose Preview.
  6. Choose Add to add the change to the data flow.

Custom transformations: Add new features to your dataset

If we decide to store our transformed features in Feature Store, a prerequisite is to insert the eventTime feature into the dataset. We can easily do that using a custom transformation.

  1. Expand Custom Transform in the list of transforms.
  2. Choose Python (Pandas) and enter the following line of code:
    # Table is available as variable `df`
    import time
    df['eventTime'] = time.time()

  3. Choose Preview to view the results.
  4. Choose Add to add the change to the data flow.

Transform the target label

The target label readmitted has three classes: NO readmission, readmitted <30 days, and readmitted >30 days. We saw in our histogram analysis that there is a strong class imbalance because the majority of the patients weren’t readmitted. We can combine the latter two classes into a positive class to denote the patients being readmitted, and turn the classification problem into a binary case instead of multi-class. Let’s use the search and edit transform to convert these string values to binary values.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose readmitted.
  4. For Pattern, enter >30|<30.
  5. For the Replacement string, enter 1.

This converts all the values that have either >30 or <30 values to 1.

  1. Choose Preview to view the results.
  2. Choose Add to add this transform to the data flow.

Let’s repeat the same steps to convert NO values to 0.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose readmitted.
  4. For Pattern, enter NO.
  5. For Replacement string, enter 0.
  6. Choose Preview to review the converted column.
  7. Choose Add to add the transform to our data flow.

Now our target label readmitted is ready for ML training.
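
If you were reproducing this label transform outside Data Wrangler, the equivalent pandas logic is a one-liner (column and class values as used in this post):

    # Merge the two readmission classes into a positive label and encode NO as 0.
    df["readmitted"] = df["readmitted"].map({"<30": 1, ">30": 1, "NO": 0})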

Position the target label as the first column for the XGBoost algorithm

Because we’re going to use the XGBoost built-in SageMaker algorithm to train the model, the algorithm assumes that the target label is in the first column. Let’s position the target label as such in order to use this algorithm.

  1. Expand Manage columns in the list of transforms.
  2. For Transform, choose Move column.
  3. For Move type, choose Move to start.
  4. For Column to move, choose readmitted.
  5. Choose Preview.
  6. Choose Add to add the change to your data flow.

Drop redundant columns

Next, we drop any redundant columns.

  1. Expand Manage columns in the list of transforms.
  2. For Transform, choose Drop column.
  3. For Column to drop, choose encounter_id_0.
  4. Choose Preview.
  5. Choose Add to add the changes to the flow file.
  6. Repeat these steps for the other redundant columns: patient_nbr_0, encounter_id_1, and patient_nbr_1.

At this stage, we have run a few analyses and applied a few transformations to our raw input dataset. If you want to preserve the transformed state of the input dataset, like a checkpoint, you can do so by choosing Export data. This option allows you to persist the transformed dataset to an S3 bucket.

Quick Model analysis

Now that we have applied transformations to our initial dataset, let’s explore the Quick Model analysis feature. Quick model helps you quickly evaluate the training dataset and produce importance scores for each feature. A feature importance score indicates how useful a feature is at predicting a target label. The feature importance score is between 0–1; a higher number indicates that the feature is more important to the whole dataset. Because our use case relates to the classification problem type, the quick model also generates an F1 score for the current dataset.

  1. Switch back to the Analysis tab and choose Create new analysis to bring up the built-in analyses.
  2. For Analysis type, choose Quick Model.
  3. Enter a name for your analysis.
  4. For Label, choose readmitted.
  5. Choose Preview and wait for the model to be trained and the results to appear.

The resulting quick model F1 score shows 0.618 (your generated score might be different) with the transformed dataset. Data Wrangler performs several steps to generate the F1 score, including preprocessing, training, evaluating, and finally calculating feature importance. For more details about these steps, see Quick Model.

With the quick model analysis feature, data scientists can iterate through applicable transformations until they reach the desired transformed dataset, one that can potentially lead to better accuracy and meet business expectations.

  1. Choose Save to add the quick model analysis to the data flow.

Export options

We’re now ready to export our data flow for further processing.

  1. Navigate back to the data flow designer by choosing Back to data flow at the top left.
  2. On the Export tab, choose Steps to reveal the Data Wrangler flow steps.
  3. Choose the last step to mark it with a check.
  4. Choose Export step to reveal the export options.

As of this writing, you have four export options:

  • Save to S3 – Save the data to an S3 bucket using a SageMaker processing job
  • Pipeline – Export a Jupyter notebook that creates a SageMaker pipeline with your data flow
  • Python Code – Export your data flow to Python code
  • Feature Store – Export a Jupyter notebook that creates a Feature Store feature group and adds features to an offline or online feature store
  1. Choose Save to S3 to generate a fully implemented Jupyter notebook that creates a processing job using your data flow file.

Run processing and training jobs for model building

In this section, we show how to run processing and training jobs using the generated Jupyter notebook from Data Wrangler.

Submit a processing job

We’re now ready to submit a SageMaker processing job using our data flow file.

Run all the cells up to and including the Create Processing Job cell inside the exported notebook.

The cell Create Processing Job triggers a new SageMaker processing job by provisioning managed infrastructure and running the required Data Wrangler Docker container on that infrastructure.

You can check the status of the submitted processing job by running the next cell Job Status & S3 Output Location.

You can also check the status of the submitted processing job on the SageMaker console.

Train a model with SageMaker

Now that the data has been processed, let’s train a model using the data. The same notebook has sample steps to train a model using the SageMaker built-in XGBoost algorithm. Because our use case is a binary classification ML problem, we need to change the objective to binary:logistic inside the sample training steps.
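
In the generated notebook, this amounts to changing the hyperparameters on the estimator before calling fit. The sketch below assumes the notebook's estimator and input variables; the names here are placeholders and may differ from the sample.

    # xgb_estimator, train_input, and validation_input are created earlier in the exported notebook.
    xgb_estimator.set_hyperparameters(
        objective="binary:logistic",  # binary readmission label instead of the multi-class default
        eval_metric="auc",
        num_round=100,
    )
    xgb_estimator.fit({"train": train_input, "validation": validation_input})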

Now we’re ready to run our training job using the SageMaker managed infrastructure. Run the cell Start the Training Job.

You can monitor the status of the submitted training job on the SageMaker console, on the Training jobs page.

Host a trained model for real-time inference

We now use another notebook available on GitHub under the project folder hosting/Model_deployment_Steps.ipynb. This is a simple notebook with two cells: the first cell has code for deploying your model to a persistent endpoint. You need to update model_url with your training job output S3 model artifact.

The second cell in the notebook runs inference on the sample test file under test_data/test_data_UCI_sample.csv. As you can see, we can generate predictions for the synthetic observations inside the CSV file. That concludes the ML workflow.
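
For reference, the two cells boil down to something like the following sketch. The image URI version, endpoint name, and instance type are placeholders, and the test CSV must contain feature columns only (no target), matching what the model was trained on.

    import boto3
    import sagemaker
    from sagemaker.model import Model

    session = sagemaker.Session()
    image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.3-1")

    # Cell 1: deploy the trained model artifact (update model_url with your training job output).
    model_url = "s3://<your-bucket>/<training-job-name>/output/model.tar.gz"
    model = Model(image_uri=image_uri, model_data=model_url,
                  role=sagemaker.get_execution_role(), sagemaker_session=session)
    model.deploy(initial_instance_count=1, instance_type="ml.m5.large",
                 endpoint_name="diabetic-readmission-endpoint")

    # Cell 2: run inference against the sample test file.
    runtime = boto3.client("sagemaker-runtime")
    with open("test_data/test_data_UCI_sample.csv") as f:
        payload = f.read()
    response = runtime.invoke_endpoint(EndpointName="diabetic-readmission-endpoint",
                                       ContentType="text/csv", Body=payload)
    print(response["Body"].read().decode())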

Clean up

After you have experimented with the steps in this post, perform the following cleanup steps to stop incurring charges:

  1. On the SageMaker console, under Inference in the navigation pane, choose Endpoints.
  2. Select your hosted endpoint.
  3. On the Actions menu, choose Delete.
  4. On the SageMaker Studio Control Panel, navigate to your SageMaker user profile.
  5. Under Apps, locate your Data Wrangler app and choose Delete app.

Conclusion

In this post, we explored Data Wrangler capabilities using a public medical dataset related to patient readmission and demonstrated how to perform feature transformations using built-in transforms and quick analysis. We showed how to generate, without much coding, the required steps to trigger data processing and ML training. This no-code/low-code capability of Data Wrangler accelerates training data preparation and increases data scientist agility with faster iterative data preparation. In the end, we hosted our trained model and ran inferences against synthetic test data. We encourage you to check out our GitHub repository to get hands-on practice and find new ways to improve model accuracy! To learn more about SageMaker, visit the SageMaker Development Guide.


About the Authors

Shyam Namavaram is a Senior Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud-native applications. He passionately works with customers accelerating their AI/ML adoption by providing technical guidance and helping them innovate and build secure cloud solutions on AWS. He specializes in AI/ML, containers, and analytics technologies. Outside of work, he loves playing sports and exploring nature with trekking.

Michael Hsieh is a Senior AI/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of Amazon ML offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the great nature the region has to offer, such as the hiking trails, the scenery, kayaking in the SLU, and the sunset at Shilshole Bay.

Read More