OpenAI’s API Now Available with No Waitlist

OpenAI is committed to the safe deployment of AI. Since the launch of our API, we’ve made deploying applications faster and more streamlined while adding new safety features. Our progress with safeguards makes it possible to remove the waitlist for GPT-3. Starting today, developers in supported countries can sign up and start experimenting with our API right away.

Improvements to our API over the past year include the Instruct Series models that adhere better to human instructions, specialized endpoints for more truthful question-answering, and a free content filter to help developers mitigate abuse. Our work also allows us to review applications before they go live, monitor for misuse, support developers as their product scales, and better understand the effects of this technology.

Other changes include an improved Playground, which makes it easy to prototype with our models, an example library with dozens of prompts to get developers started, and Codex, a new model that translates natural language into code.

Tens of thousands of developers are already taking advantage of powerful AI models through our platform. We believe that by opening access to these models via an easy-to-use API, more developers will find creative ways to apply AI to a large number of useful applications and open problems.

To ensure API-backed applications are built responsibly, we provide tools and help developers use best practices so they can bring their applications to production quickly and safely. As our systems evolve and we work to improve the capabilities of our safeguards, we expect to continue streamlining the process for developers, refining our Usage Guidelines, and allowing even more use cases over time.

As another step in this direction, we are also updating our Content Guidelines to clarify what kind of content our API can be used to generate. Our policies have always prohibited the use of our API in ways that do not adhere to the principles described in our charter, and content like hate speech remains prohibited.

To help developers ensure their applications are used for their intended purpose, prevent potential misuse, and adhere to our content guidelines, we offer developers a free content filter. We are currently testing targeted filters for specific content categories with some customers.

We are also prohibiting certain types of content on our API, like adult content, where our system is not currently able to reliably discern harmful from acceptable use. We are continually working to make our content filters more robust and we intend to allow acceptable use within some categories as our system improves.

We’re excited to have the safeguards in place to open up GPT-3 for more developers. As our safeguards continue to improve, we will expand how the API can be used while further improving the experience for our users. Sign up today and try it out.

OpenAI

A GFN Thursday Deal: Get ‘Crysis Remastered’ Free With Any Six-Month GeForce NOW Membership

You’ve reached your weekly gaming checkpoint. Welcome to a positively packed GFN Thursday.

This week delivers a sweet deal for gamers ready to upgrade their PC gaming from the cloud. With any new, paid six-month Priority or GeForce NOW RTX 3080 subscription, members will receive Crysis Remastered for free for a limited time.

Gamers and music lovers alike can get hyped for an awesome entertainment experience playing Core this week and visiting the digital world of Oberhasli. There, they’ll enjoy the anticipated deadmau5 “Encore” concert and event.

And what kind of GFN Thursday would it be without new games? We’ve got eight new titles joining the GeForce NOW library this week.

GeForce NOW Can Run Crysis … And So Can You

Crysis Remastered with RTX ON on GeForce NOW
When your reflection looks this good, you can’t help but stop and admire it. We won’t judge.

But can it run Crysis? GeForce NOW sure can.

For a limited time, get a copy of Crysis Remastered free with select GeForce NOW memberships. Purchase a six-month Priority membership, or the new GeForce NOW RTX 3080 membership, and get a free redeemable code for Crysis Remastered on the Epic Games Store.

Current monthly Founders and Priority members are eligible by upgrading to a six-month membership. Founders, exclusively, can upgrade to a GeForce NOW RTX 3080 membership and receive 10 percent off the subscription price and no risk to their current Founders benefits. They can revert back to their original Founders plan and retain “Founders for Life” pricing, as long as they remain in consistent good standing on any paid membership plan.

This special bonus also applies to existing GeForce NOW RTX 3080 members and preorders, as a thank you for being among the first to upgrade to the next generation in cloud gaming. Current members on the GeForce NOW RTX 3080 plan will receive game codes in the days ahead; while members who have preordered but haven’t been activated yet, will receive their game code when their GeForce NOW RTX 3080 service is activated. Please note, terms and conditions apply.

Stream Crytek’s classic first-person shooter, remastered with graphics optimized for a new generation of hardware and complete with stunning support for RTX ON and DLSS. GeForce NOW members can experience the first game in the Crysis series — or 1,000+ more games — across nearly all of their devices, turning even a Mac or a mobile device into the ultimate gaming rig.

The mission starts here.

Experience the deadmau5 Encore in Core

This GFN Thursday brings Core and deadmau5 to the cloud. From shooters, survival and action adventure to MMORPGs, platformers and party games, Core is a multiverse of exciting gaming entertainment with over 40,000 free-to-play, Unreal-powered games and worlds.

This week, members can visit the fully immersive digital world of Oberhasli — designed with the vision of the legendary producer, musician and DJ, deadmau5 — and enjoy an epic “Encore” concert and event. Catch the deadmau5 performance, with six showings running from Friday, Nov. 19, to Saturday, Nov. 20. The concert becomes available every hour, on the hour, the following week.

Tomorrow, come to the world of Oberhasli, designed by deadmau5, and experience the ‘Encore’ concert in Core.

The fun continues with three games inspired by deadmau5’s music — Hop ‘Til You Drop, Mau5 Hau5 and Ballon Royale — set throughout 19 dystopian worlds featured in the official When The Summer Dies music video. Party on with exclusive deadmau5 skins, emotes and mounts, and interact with other fans while streaming the exclusive, interactive and must-experience deadmau5 performance celebrating the launch of Core on GeForce NOW with the “Encore” concert this week.

A New Challenge Calls

Icarus Beta on GeForce NOW
Explore a savage alien wilderness in the aftermath of terraforming gone wrong — even on a low-powered laptop.

It wouldn’t be GFN Thursday without a new set of games coming to the cloud. Get ready to grind one of the newest joining the GeForce NOW library this week:

  • Combat Mission Cold War (New release on Steam, Nov. 16)
  • The Last Stand: Aftermath (New release on Steam, Nov. 16)
  • Myth of Empires (New release on Steam, Nov. 18)
  • Icarus (Beta weekend on Steam, Nov. 19)
  • Assassin’s Creed: Syndicate Gold Edition (Ubisoft Connect)
  • Core (Epic Games Store)
  • Lost Words: Beyond the Page (Steam)
  • World of Tanks (Steam)

We make every effort to launch games on GeForce NOW as close to their release as possible, but, in some instances, games may not be available immediately.

Update on ‘Bright Memory: Infinite’

Bright Memory: Infinite was added last week, but during onboarding it was discovered that enabling RTX in the game requires an upcoming operating system upgrade to GeForce NOW servers. We expect the update to be complete in December and will provide more information here when it happens.

GeForce NOW Coming to LG Smart TVs

We’re working with LG Electronics to add support for GeForce NOW to LG TVs, starting with a beta release of the app in the LG Content Store for select 2021 LG OLED, QNED MiniLED and NanoCell models. If you have one of the supported TVs, check it out and share feedback to help us improve the experience.

And finally, here’s our question for the week:

𝙡𝙚𝙩’𝙨 𝙨𝙚𝙩𝙩𝙡𝙚 𝙩𝙝𝙞𝙨 𝙤𝙣𝙘𝙚 𝙖𝙣𝙙 𝙛𝙤𝙧 𝙖𝙡𝙡:

who’s the tougher enemy?

👽 aliens or zombies 🧟‍♂️

🌩 NVIDIA GeForce NOW (@NVIDIAGFN) November 17, 2021

Let us know on Twitter or in the comments below.

The post A GFN Thursday Deal: Get ‘Crysis Remastered’ Free With Any Six-Month GeForce NOW Membership appeared first on The Official NVIDIA Blog.

Read More

Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets



Fig. 1: The BRIDGE dataset contains 7200 demonstrations of kitchen-themed manipulation tasks across 71 tasks in 10 domains. Note that any GIF compression artifacts in this animation are not present in the dataset itself.

When we apply robot learning methods to real-world systems, we must usually collect new datasets for every task, every robot, and every environment. This is not only costly and time-consuming, but it also limits the size of the datasets that we can use, and this, in turn, limits generalization: if we train a robot to clean one plate in one kitchen, it is unlikely to succeed at cleaning any plate in any kitchen. In other fields, such as computer vision (e.g., ImageNet) and natural language processing (e.g., BERT), the standard approach to generalization is to utilize large, diverse datasets, which are collected once and then reused repeatedly. Since the dataset is reused for many models, tasks, and domains, the up-front cost of collecting such large reusable datasets is worth the benefits. Thus, to obtain truly generalizable robotic behaviors, we may need large and diverse datasets, and the only way to make this practical is to reuse data across many different tasks, environments, and labs (i.e. different background lighting conditions, etc.).

How to Train State-Of-The-Art Models Using TorchVision’s Latest Primitives

A few weeks ago, TorchVision v0.11 was released packed with numerous new primitives, models and training recipe improvements which allowed achieving state-of-the-art (SOTA) results. The project was dubbed “TorchVision with Batteries Included” and aimed to modernize our library. We wanted to enable researchers to reproduce papers and conduct research more easily by using common building blocks. Moreover, we aspired to provide the necessary tools to Applied ML practitioners to train their models on their own data using the same SOTA techniques as in research. Finally, we wanted to refresh our pre-trained weights and offer better off-the-shelf models to our users, hoping that they would build better applications.

Though there is still much work to be done, we wanted to share with you some exciting results from the above work. We will showcase how one can use the new tools included in TorchVision to achieve state-of-the-art results on a highly competitive and well-studied architecture such as ResNet50 [1]. We will share the exact recipe used to improve our baseline by over 4.5 accuracy points to reach a final top-1 accuracy of 80.7% and share the journey for deriving the new training process. Moreover, we will show that this recipe generalizes well to other model variants and families. We hope that the above will influence future research for developing stronger generalizable training methodologies and will inspire the community to adopt and contribute to our efforts.

The Results

Using our new training recipe found on ResNet50, we’ve refreshed the pre-trained weights of the following models:

Model Accuracy@1 Accuracy@5
ResNet50 80.674 95.166
ResNet101 81.728 95.670
ResNet152 82.042 95.926
ResNeXt50-32x4d 81.116 95.478

Note that the accuracy of all models except RetNet50 can be further improved by adjusting their training parameters slightly, but our focus was to have a single robust recipe which performs well for all.

There are currently two ways to use the latest weights of the model.

Using the Multi-pretrained weight API

We are currently working on a new prototype mechanism which will extend the model builder methods of TorchVision to support multiple weights. Along with the weights, we store useful meta-data (such as the labels, the accuracy, links to recipe etc) and the preprocessing transforms necessary for using the models. Example:

  from PIL import Image
  from torchvision import prototype as P
  img = Image.open("test/assets/encode_jpeg/grace_hopper_517x606.jpg")
   
  # Initialize model
  weights = P.models.ResNet50Weights.ImageNet1K_RefV2
  model = P.models.resnet50(weights=weights)
  model.eval()
   
  # Initialize inference transforms
  preprocess = weights.transforms()
   
  # Apply inference preprocessing transforms
  batch = preprocess(img).unsqueeze(0)
  prediction = model(batch).squeeze(0).softmax(0)
   
  # Make predictions
  label = prediction.argmax().item()
  score = prediction[label].item()
   
  # Use meta to get the labels
  category_name = weights.meta['categories'][label]
  print(f"{category_name}: {100 * score}%")

Using the legacy API

Those who don’t want to use a prototype API have the option of accessing the new weights via the legacy API using the following approach:

  from torchvision.models import resnet
   
  # Overwrite the URL of the previous weights
  resnet.model_urls["resnet50"] = "https://download.pytorch.org/models/resnet50-f46c3f97.pth"
   
  # Initialize the model using the legacy API
  model = resnet.resnet50(pretrained=True)
   
  # TODO: Apply preprocessing + call the model
  # ...

The Training Recipe

Our goal was to use the newly introduced primitives of TorchVision to derive a new strong training recipe which achieves state-of-the-art results for the vanilla ResNet50 architecture when trained from scratch on ImageNet with no additional external data. Though by using architecture specific tricks [2] one could further improve the accuracy, we’ve decided not to include them so that the recipe can be used in other architectures. Our recipe heavily focuses on simplicity and builds upon work by FAIR [3], [4], [5], [6], [7]]. Our findings align with the parallel study of Wightman et al. [7], who also report major accuracy improvements by focusing on the training recipes.

Without further ado, here are the main parameters of our recipe:

  # Optimizer & LR scheme
  ngpus=8,
  batch_size=128,  # per GPU

  epochs=600, 
  opt='sgd',  
  momentum=0.9,

  lr=0.5, 
  lr_scheduler='cosineannealinglr', 
  lr_warmup_epochs=5, 
  lr_warmup_method='linear', 
  lr_warmup_decay=0.01, 


  # Regularization and Augmentation
  weight_decay=2e-05, 
  norm_weight_decay=0.0,

  label_smoothing=0.1, 
  mixup_alpha=0.2, 
  cutmix_alpha=1.0, 
  auto_augment='ta_wide', 
  random_erase=0.1, 


  # EMA configuration
  model_ema=True, 
  model_ema_steps=32, 
  model_ema_decay=0.99998, 


  # Resizing
  interpolation='bilinear', 
  val_resize_size=232, 
  val_crop_size=224, 
  train_crop_size=176,

Using our standard training reference script, we can train a ResNet50 using the following command:

torchrun --nproc_per_node=8 train.py --model resnet50 --batch-size 128 --lr 0.5 
--lr-scheduler cosineannealinglr --lr-warmup-epochs 5 --lr-warmup-method linear 
--auto-augment ta_wide --epochs 600 --random-erase 0.1 --weight-decay 0.00002 
--norm-weight-decay 0.0 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 
--train-crop-size 176 --model-ema --val-resize-size 232

Methodology

There are a few principles we kept in mind during our explorations:

  1. Training is a stochastic process and the validation metric we try to optimize is a random variable. This is due to the random weight initialization scheme employed and the existence of random effects during the training process. This means that we can’t do a single run to assess the effect of a recipe change. The standard practice is doing multiple runs (usually 3 to 5) and studying the summarization stats (such as mean, std, median, max, etc).
  2. There is usually a significant interaction between different parameters, especially for techniques that focus on Regularization and reducing overfitting. Thus changing the value of one can have effects on the optimal configurations of others. To account for that one can either adopt a greedy search approach (which often leads to suboptimal results but tractable experiments) or apply grid search (which leads to better results but is computationally expensive). In this work, we used a mixture of both.
  3. Techniques that are non-deterministic or introduce noise usually require longer training cycles to improve model performance. To keep things tractable, we initially used short training cycles (small number of epochs) to decide which paths can be eliminated early and which should be explored using longer training.
  4. There is a risk of overfitting the validation dataset [8] because of the repeated experiments. To mitigate some of the risk, we apply only training optimizations that provide a significant accuracy improvements and use K-fold cross validation to verify optimizations done on the validation set. Moreover we confirm that our recipe ingredients generalize well on other models for which we didn’t optimize the hyper-parameters.

Break down of key accuracy improvements

As discussed in earlier blogposts, training models is not a journey of monotonically increasing accuracies and the process involves a lot of backtracking. To quantify the effect of each optimization, below we attempt to show-case an idealized linear journey of deriving the final recipe starting from the original recipe of TorchVision. We would like to clarify that this is an oversimplification of the actual path we followed and thus it should be taken with a grain of salt. 

Cumulative Accuracy Improvements for ResNet50

In the table below, we provide a summary of the performance of stacked incremental improvements on top of Baseline. Unless denoted otherwise, we report the model with best Acc@1 out of 3 runs:

  Accuracy@1 Accuracy@5 Incremental Diff Absolute Diff
ResNet50 Baseline 76.130 92.862 0.000 0.000
+ LR optimizations 76.494 93.198 0.364 0.364
+ TrivialAugment 76.806 93.272 0.312 0.676
+ Long Training 78.606 94.052 1.800 2.476
+ Random Erasing 78.796 94.094 0.190 2.666
+ Label Smoothing 79.114 94.374 0.318 2.984
+ Mixup 79.232 94.536 0.118 3.102
+ Cutmix 79.510 94.642 0.278 3.380
+ Weight Decay tuning 80.036 94.746 0.526 3.906
+ FixRes mitigations 80.196 94.672 0.160 4.066
+ EMA 80.450 94.908 0.254 4.320
+ Inference Resize tuning * 80.674 95.166 0.224 4.544

*The tuning of the inference size was done on top of the last model. See below for details.

Baseline

Our baseline is the previously released ResNet50 model of TorchVision. It was trained with the following recipe:

  # Optimizer & LR scheme
  ngpus=8,
  batch_size=32,  # per GPU

  epochs=90, 
  opt='sgd',  
  momentum=0.9,

  lr=0.1, 
  lr_scheduler='steplr', 
  lr_step_size=30, 
  lr_gamma=0.1, 


  # Regularization
  weight_decay=1e-4,


  # Resizing
  interpolation='bilinear', 
  val_resize_size=256, 
  val_crop_size=224, 
  train_crop_size=224,

Most of the above parameters are the defaults on our training scripts. We will start building on top of this baseline by introducing optimizations until we gradually arrive at the final recipe.

LR optimizations

There are a few parameter updates we can apply to improve both the accuracy and the speed of our training. This can be achieved by increasing the batch size and tuning the LR. Another common method is to apply warmup and gradually increase our learning rate. This is beneficial especially when we use very high learning rates and helps with the stability of the training in the early epochs. Finally, another optimization is to apply Cosine Schedule to adjust our LR during the epochs. A big advantage of cosine is that there are no hyper-parameters to optimize, which cuts down our search space.

Here are the additional optimizations applied on top of the baseline recipe. Note that we’ve run multiple experiments to determine the optimal configuration of the parameters:

  batch_size=128,  # per GPU

  lr=0.5, 
  lr_scheduler='cosineannealinglr', 
  lr_warmup_epochs=5, 
  lr_warmup_method='linear', 
  lr_warmup_decay=0.01,

The above optimizations increase our top-1 Accuracy by 0.364 points comparing to the baseline. Note that in order to combine the different LR strategies we use the newly introduced SequentialLR scheduler.

TrivialAugment

The original model was trained using basic augmentation transforms such as Random resized crops and horizontal flips. An easy way to improve our accuracy is to apply more complex “Automatic-Augmentation” techniques. The one that performed best for us is TrivialAugment [9], which is extremely simple and can be considered “parameter free”, which means it can help us cut down our search space further.

Here is the update applied on top of the previous step:

auto_augment='ta_wide',

The use of TrivialAugment increased our top-1 Accuracy by 0.312 points compared to the previous step.

Long Training

Longer training cycles are beneficial when our recipe contains ingredients that behave randomly. More specifically as we start adding more and more techniques that introduce noise, increasing the number of epochs becomes crucial. Note that at early stages of our exploration, we used relatively short cycles of roughly 200 epochs which was later increased to 400 as we started narrowing down most of the parameters and finally increased to 600 epochs at the final versions of the recipe.

Below we see the update applied on top of the earlier steps:

epochs=600,

This further increases our top-1 Accuracy by 1.8 points on top of the previous step. This is the biggest increase we will observe in this iterative process. It’s worth noting that the effect of this single optimization is overstated and somehow misleading. Just increasing the number of epochs on top of the old baseline won’t yield such significant improvements. Nevertheless the combination of the LR optimizations with strong Augmentation strategies helps the model benefit from longer cycles. It’s also worth mentioning that the reason we introduce the lengthy training cycles so early in the process is because in the next steps we will introduce techniques that require significantly more epochs to provide good results.

Random Erasing

Another data augmentation technique known to help the classification accuracy is Random Erasing [10], [11]]. Often paired with Automatic Augmentation methods, it usually yields additional improvements in accuracy due to its regularization effect. In our experiments we tuned only the probability of applying the method via a grid search and found that it’s beneficial to keep its probability at low levels, typically around 10%. 

Here is the extra parameter introduced on top of the previous:

random_erase=0.1,

Applying Random Erasing increases our Acc@1 by further 0.190 points.

Label Smoothing

A good technique to reduce overfitting is to stop the model from becoming overconfident. This can be achieved by softening the ground truth using Label Smoothing [12]. There is a single parameter which controls the degree of smoothing (the higher the stronger) that we need to specify. Though optimizing it via grid search is possible, we found that values around 0.05-0.15 yield similar results, so to avoid overfitting it we used the same value as on the paper that introduced it.

Below we can find the extra config added on this step:

label_smoothing=0.1,

We use PyTorch’s newly introduced CrossEntropyLoss label_smoothing parameter and that increases our accuracy by an additional 0.318 points.

Mixup and Cutmix

Two data augmentation techniques often used to produce SOTA results are Mixup and Cutmix [13], [14]]. They both provide strong regularization effects by softening not only the labels but also the images. In our setup we found it beneficial to apply one of them randomly with equal probability. Each is parameterized with a hyperparameter alpha, which controls the shape of the Beta distribution from which the smoothing probability is sampled. We did a very limited grid search, focusing primarily on common values proposed on the papers. 

Below you will find the optimal values for the alpha parameters of the two techniques:

mixup_alpha=0.2, 
cutmix_alpha=1.0,

Applying mixup increases our accuracy by 0.118 points and combining it with cutmix improves it by additional 0.278 points.

Weight Decay tuning

Our standard recipe uses L2 regularization to reduce overfitting. The Weight Decay parameter controls the degree of the regularization (the larger the stronger) and is applied universally to all learned parameters of the model by default. In this recipe, we apply two optimizations to the standard approach. First we perform grid search to tune the parameter of weight decay and second we disable weight decay for the parameters of the normalization layers. 

Below you can find the optimal configuration of weight decay for our recipe:

weight_decay=2e-05, 
norm_weight_decay=0.0,

The above update improves our accuracy by a further 0.526 points, providing additional experimental evidence for a known fact that tuning weight decay has significant effects on the performance of the model. Our approach for separating the Normalization parameters from the rest was inspired by ClassyVision’s approach.

FixRes mitigations

An important property identified early in our experiments is the fact that the models performed significantly better if the resolution used during validation was increased from the 224×224 of training. This effect is studied in detail on the FixRes paper 5 and two mitigations are proposed: a) one could try to reduce the training resolution so that the accuracy on the validation resolution is maximized or b) one could fine-tune the model on a two-phase training so that it adjusts on the target resolution. Since we didn’t want to introduce a 2-phase training, we went for option a). This means that we reduced the train crop size from 224 and used grid search to find the one that maximizes the validation on resolution of 224×224.

Below you can see the optimal value used on our recipe:

val_crop_size=224, 
train_crop_size=176,

The above optimization improved our accuracy by an additional 0.160 points and sped up our training by 10%. 

It’s worth noting that the FixRes effect still persists, meaning that the model continues to perform better on validation when we increase the resolution. Moreover, further reducing the training crop-size actually hurts the accuracy. This intuitively makes sense because one can only reduce the resolution so much before critical details start disappearing from the picture. Finally, we should note that the above FixRes mitigation seems to benefit models with similar depth to ResNet50. Deeper variants with larger receptive fields seem to be slightly negatively affected (typically by 0.1-0.2 points). Hence we consider this part of the recipe optional. Below we visualize the performance of the best available checkpoints (with the full recipe) for models trained with 176 and 224 resolution:

Best ResNet50 trained with 176 Resolution
Best ResNet50 trained with 224 Resolution

Exponential Moving Average (EMA)

EMA is a technique that allows one to push the accuracy of a model without increasing its complexity or inference time. It performs an exponential moving average on the model weights and this leads to increased accuracy and more stable models. The averaging happens every few iterations and its decay parameter was tuned via grid search. 

Below you can see the optimal values for our recipe:

model_ema=True, 
model_ema_steps=32, 
model_ema_decay=0.99998,

The use of EMA increases our accuracy by 0.254 points comparing to the previous step. Note that TorchVision’s EMA implementation is build on top of PyTorch’s AveragedModel class with the key difference being that it averages not only the model parameters but also its buffers. Moreover, we have adopted tricks from Pycls which allow us to parameterize the decay in a way that doesn’t depend on the number of epochs.

Inference Resize tuning

Unlike all other steps of the process which involved training models with different parameters, this optimization was done on top of the final model. During inference, the image is resized to a specific resolution and then a central 224×224 crop is taken from it. The original recipe used a resize size of 256, which caused a similar discrepancy as the one described on the FixRes paper [5]. By bringing this resize value closer to the target inference resolution, one can improve the accuracy. To select the value we run a short grid search between interval [224, 256] with step of 8. To avoid overfitting, the value was selected using half of the validation set and confirmed using the other half.

Below you can see the optimal value used on our recipe:

--val-resize-size 232

The above is the final optimization which improved our accuracy by 0.224 points. It’s worth noting that the optimal value for ResNet50 works also best for ResNet101, ResNet152 and ResNeXt50, which hints that it generalizes across models:

ResNet50 Inference Resize
ResNet101 Inference Resize
Best ResNet50 trained with 224 Resolution

Optimizations that were tested but not adopted

During the early stages of our research, we experimented with additional techniques, configurations and optimizations. Since our target was to keep our recipe as simple as possible, we decided not to include anything that didn’t provide a significant improvement. Here are a few approaches that we took but didn’t make it to our final recipe:

  • Optimizers: Using more complex optimizers such as Adam, RMSProp or SGD with Nesterov momentum didn’t provide significantly better results than vanilla SGD with momentum.
  • LR Schedulers: We tried different LR Scheduler schemes such as StepLR and Exponential. Though the latter tends to work better with EMA, it often requires additional hyper-parameters such as defining the minimum LR to work well. Instead, we just use cosine annealing decaying the LR up to zero and choose the checkpoint with the highest accuracy.
  • Automatic Augmentations: We’ve tried different augmentation strategies such as AutoAugment and RandAugment. None of these outperformed the simpler parameter-free TrivialAugment.
  • Interpolation: Using bicubic or nearest interpolation didn’t provide significantly better results than bilinear.
  • Normalization layers: Using Sync Batch Norm didn’t yield significantly better results than using the regular Batch Norm.

Acknowledgements

We would like to thank Piotr Dollar, Mannat Singh and Hugo Touvron for providing their insights and feedback during the development of the recipe and for their previous research work on which our recipe is based on. Their support was invaluable for achieving the above result. Moreover, we would like to thank Prabhat Roy, Kai Zhang, Yiwen Song, Joel Schlosser, Ilqar Ramazanli, Francisco Massa, Mannat Singh, Xiaoliang Dai, Samuel Gabriel and Allen Goodman for their contributions to the Batteries Included project.

References

  1. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. “Deep Residual Learning for Image Recognition”.
  2. Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. “Bag of Tricks for Image Classification with Convolutional Neural Networks”
  3. Piotr Dollár, Mannat Singh, Ross Girshick. “Fast and Accurate Model Scaling”
  4. Tete Xiao, Mannat Singh, Eric Mintun, Trevor Darrell, Piotr Dollár, Ross Girshick. “Early Convolutions Help Transformers See Better”
  5. Hugo Touvron, Andrea Vedaldi, Matthijs Douze, Hervé Jégou. “Fixing the train-test resolution discrepancy
  6. Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou. “Training data-efficient image transformers & distillation through attention”
  7. Ross Wightman, Hugo Touvron, Hervé Jégou. “ResNet strikes back: An improved training procedure in timm”
  8. Benjamin Recht, Rebecca Roelofs, Ludwig Schmidt, Vaishaal Shankar. “Do ImageNet Classifiers Generalize to ImageNet?”
  9. Samuel G. Müller, Frank Hutter. “TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation”
  10. Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, Yi Yang. “Random Erasing Data Augmentation”
  11. Terrance DeVries, Graham W. Taylor. “Improved Regularization of Convolutional Neural Networks with Cutout”
  12. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna. “Rethinking the Inception Architecture for Computer Vision”
  13. Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz. “mixup: Beyond Empirical Risk Minimization”
  14. Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo. “CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features”

Read More

Accelerate data preparation using Amazon SageMaker Data Wrangler for diabetic patient readmission prediction

Patient readmission to hospital after prior visits for the same disease results in an additional burden on healthcare providers, the health system, and patients. Machine learning (ML) models, if built and trained properly, can help understand reasons for readmission, and predict readmission accurately. ML could allow providers to create better treatment plans and care, which would translate to a reduction of both cost and mental stress for patients. However, ML is a complex technique that has been limiting organizations that don’t have the resources to recruit a team of data engineers and scientists to build ML workloads. In this post, we show you how to build an ML model based on the XGBoost algorithm to predict diabetic patient readmission easily and quickly with a graphical interface from Amazon SageMaker Data Wrangler.

Data Wrangler is an Amazon SageMaker Studio feature designed to allow you to explore and transform tabular data for ML use cases without coding. Data Wrangler is the fastest and easiest way to prepare data for ML. It gives you the ability to use a visual interface to access data and perform exploratory data analysis (EDA) and feature engineering. It also seamlessly operationalizes your data preparation steps by allowing you to export your data flow into Amazon SageMaker Pipelines, a Data Wrangler job, Python file, or Amazon SageMaker Feature Store.

Data Wrangler comes with over 300 built-in transforms and custom transformations using either Python, PySpark, or SparkSQL runtime. It also comes with built-in data analysis capabilities for charts (such as scatter plot or histogram) and time-saving model analysis capabilities such as feature importance, target leakage, and model explainability.

In this post, we explore the key capabilities of Data Wrangler using the UCI diabetic patient readmission dataset. We showcase how you can build ML data transformation steps without writing sophisticated coding, and how to create a model training, feature store, or ML pipeline with reproducibility for a diabetic patient readmission prediction use case.

We also have published a related GitHub project repo that includes the end-to-end ML workflow steps and relevant assets, including Jupyter notebooks.

We walk you through the following high-level steps:

  • Studio prerequisites and input dataset setup
  • Design your Data Wrangler flow file
  • Create processing and training jobs for model building
  • Host a trained model for real-time inference

Studio prerequisites and input dataset setup

To use Studio and Studio notebooks, you must complete the Studio onboarding process. Although you can choose from a few authentication methods, the simplest way to create a Studio domain is to follow the Quick start instructions. The Quick start uses the same default settings as the standard Studio setup. You can also choose to onboard using AWS Single Sign-On (AWS SSO) for authentication (see Onboard to Amazon SageMaker Studio Using AWS SSO).

Dataset

The patient readmission dataset captures 10 years (1999–2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes with about 100,000 observations.

You can start by downloading the public dataset and uploading it to an Amazon Simple Storage Service (Amazon S3) bucket. For demonstration purposes, we split the dataset into four tables based on feature categories: diabetic_data_hospital_visits.csv, diabetic_data_demographic.csv, diabetic_data_labs.csv, and diabetic_data_medication.csv. Review and run the code in datawrangler_workshop_pre_requisite.ipynb. If you leave everything at its default inside the notebook, the CSV files will be available in s3://sagemaker-${region}-${account_number}/sagemaker/demo-diabetic-datawrangler/.

Design your Data Wrangler flow file

To get started – on the Studio File menu, choose New, and choose Data Wrangler Flow.

This launches a Data Wrangler instance and configures it with the Data Wrangler app. The process takes a few minutes to complete.

Load the data from Amazon S3 into Data Wrangler

To load the data into Data Wrangler, complete the following steps:

  1. On the Import tab, choose Amazon S3 as the data source.
  2. Choose Add data source.

You could also import data from Amazon Athena, Amazon Redshift, or Snowflake. For more information about the currently supported import sources, see Import.

  1. Select the CSV files from the bucket s3://sagemaker-${region}-${account_number}/sagemaker/demo-diabetic-datawrangler/ one at a time.
  2. Choose Import for each file.

When the import is complete, data in an S3 bucket is available inside Data Wrangler for preprocessing.

Join the CSV files

Now that we have imported multiple CSV source dataset, let’s join them for a consolidated dataset.

  1. On the Data flow tab, for Data types, choose the plus sign.
  2. On the menu, choose Join.
  3. Choose the diabetic_data_hospital_visits.csv dataset as the Right dataset.
  4. Choose Configure to set up the join criteria.
  5. For Name, enter a name for the join.
  6. For Join type¸ choose a join type (for this post, Inner).
  7. Choose the columns for Left and Right.
  8. Choose Apply to preview the joined dataset.
  9. Choose Add to add it to the data flow file.

Built-in analysis

Before we apply any transformations on the input source, let’s perform a quick analysis of the dataset. Data Wrangler provides several built-in analysis types, like histogram, scatter plot, target leakage, bias report, and quick model. For more information about analysis types, see Analyze and Visualize.

Target leakage

Target leakage occurs when information in an ML training dataset is strongly correlated with the target label, but isn’t available when the model is used for prediction. You might have a column in your dataset that serves as a proxy for the column you want to predict with your model. For classification tasks, Data Wrangler calculates the prediction quality metric of ROC-AUC, which is computed individually for each feature column via cross-validation to generate a target leakage report.

  1. On the Data Flow tab, for Join, choose the plus sign.
  2. Choose Add analysis.
  3. For Analysis type, choose Target Leakage.
  4. For Analysis name¸ enter a name.
  5. For Max features, enter 50.
  6. For Problem Type¸ choose classification.
  7. For Target, choose readmitted.
  8. Choose Preview to generate the report.

As shown in the preceding screenshot, there is no indication of target leakage in our input dataset. However, a few features like encounter_id_1, encounter_id_0, weight, and payer_code are marked as possibly redundant with 0.5 predictive ability of ROC. This means these features by themselves aren’t providing any useful information towards predicting the target. Before making the decision to drop these uninformative features, you should consider whether these could add value when used in tandem with other features. For our use case, we keep them as is and move to the next step.

  1. Choose Save to save the analysis into your Data Wrangler data flow file.

Bias report

AI/ML systems are only as good as the data we put into them. ML-based systems are more accessible than ever before, and with the growth of adoption throughout various industries, further questions arise surrounding fairness and how it is ensured across these ML systems. Understanding how to detect and avoid bias in ML models is imperative and complex. With the built-in bias report in Data Wrangler, data scientists can quickly detect bias during the data preparation stage of the ML workflow. Bias report analysis uses Amazon SageMaker Clarify to perform bias analysis.

To generate a bias report, you must specify the target column that you want to predict and a facet or column that you want to inspect for potential biases. For example, we can generate a bias report on the gender feature for Female values to see whether there is any class imbalance.

  1. On the Analysis tab, choose Create new analysis.
  2. For Analysis type¸ choose Bias Report.
  3. For Analysis name, enter a name.
  4. For Select the column your model predicts, choose readmitted.
  5. For Predicted value, enter NO.
  6. For Column to analyze for bias, choose gender.
  7. For Column value to analyze for bias, choose Female.
  8. Leave remaining settings at their default.
  9. Choose Check for bias to generate the bias report.

As shown in the bias report, there is no significant bias in our input dataset, which means the dataset has a fair amount of representation by gender. For our dataset, we can move forward with a hypothesis that there is no inherent bias in our dataset. However, based on your use case and dataset, you might want to run similar bias reporting on other features of your dataset to identify any potential bias. If any bias is detected, you can consider applying a suitable transformation to address that bias.

  1. Choose Save to add this report to the data flow file.

Histogram

In this section, we use a histogram to gain insights into the target label patterns inside our input dataset.

  1. On the Analysis tab, choose Create new analysis.
  2. For Analysis type¸ choose Histogram.
  3. For Analysis name¸ enter a name.
  4. For X axis, choose readmitted.
  5. For Color by, choose race.
  6. For Facet by, choose gender.
  7. Choose Preview to generate a histogram.

This ML problem is a multi-class classification problem. However, we can observe a major target class imbalance between patients readmitted <30 days, >30 days, and NO readmission. We can also see that these two classifications are proportionate across gender and race. To improve our potential model predictability, we can merge <30 and >30 into a single positive class. This merge of target label classification turns our ML problem into a binary classification. As we demonstrate in the next section, we can do this easily by adding respective transformations.

Transformations

When it comes to training an ML model for structured or tabular data, decision tree-based algorithms are considered best in class. This is due to their inherent technique of applying ensemble tree methods in order to boost weak learners using the gradient descent architecture.

For our medical source dataset, we use the SageMaker built-in XGBoost algorithm because it’s one of the most popular decision tree-based ensemble ML algorithms. The XGBoost algorithm can only accept numerical values as input, therefore as a prerequisite we must apply categorical feature transformations on our source dataset.

Data Wrangler comes with over 300 built-in transforms, which require no coding. Let’s use built-in transforms to apply a few key transformations and prepare our training dataset.

Handle missing values

To address missing values, complete the following steps:

  1. Switch to Data tab to bring up all the built-in transforms
  2. Expand Handle missing in the list of transforms.
  3. For Transform, choose Impute.
  4. For Column type¸ choose Numeric.
  5. For Input column, choose diag_1.
  6. For Imputing strategy, choose Mean.
  7. By default, the operation is performed in-place, but you can provide an optional Output column name, which creates a new column with imputed values. For our blog we go with default in-place update.
  8. Choose Preview to preview the results.
  9. Choose Add to include this transformation step into the data flow file.
  10. Repeat these steps for the diag_2 and diag_3 features and impute missing values.

Search and edit features with special characters

Because our source dataset has features with special characters, we need to clean them before training. Let’s use the search and edit transform.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose race.
  4. For Pattern, enter ?.
  5. For Replacement string¸ choose Other.
  6. Leave Output column blank for in-place replacements.
  7. Choose Preview.
  8. Choose Add to add the transform to your data flow.
  9. Repeat the same steps for other features to replace weight and payer_code with 0 and medical_specialty with Other.

One-hot encoding for categorical features

To use one-hot encoding for categorical features, complete the following steps:

  1. Expand Encode categorical in the list of transforms.
  2. For Transform, choose One-hot encode.
  3. For Input column, choose race.
  4. For Output style, choose Columns.
  5. Choose Preview.
  6. Choose Add to add the change to the data flow.
  7. Repeat these steps for age and medical_specialty_filler to one-hot encode those categorical features as well.

Ordinal encoding for categorical features

To use ordinal encoding for categorical features, complete the following steps:

  1. Expand Encode categorical in the list of transforms.
  2. For Transform, choose Ordinal encode.
  3. For Input column, choose gender.
  4. For Invalid handling strategy, choose Keep.
  5. Choose Preview.
  6. Choose Add to add the change to the data flow.

Custom transformations: Add new features to your dataset

If we decide to store our transformed features in Feature Store, a prerequisite is to insert the eventTime feature into the dataset. We can easily do that using a custom transformation.

  1. Expand Custom Transform in the list of transforms.
  2. Choose Python (Pandas) and enter the following line of code:
    # Table is available as variable `df`
    import time
    df['eventTime'] = time.time()

  3. Choose Preview to view the results.
  4. Choose Add to add the change to the data flow.

Transform the target Label

The target label readmitted has three classes: NO readmission, readmitted <30 days, and readmitted >30 days. We saw in our histogram analysis that there is a strong class imbalance because the majority of the patients didn’t readmit. We can combine the latter two classes into a positive class to denote the patients being readmitted, and turn the classification problem into a binary case instead of multi-class. Let’s use the search and edit transform to convert string values to binary values.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose readmitted.
  4. For Pattern, enter >30|<30.
  5. For the Replacement string, enter 1.

This converts all the values that have either >30 or <30 values to 1.

  1. Choose Preview to view the results.
  2. Choose Add to add this transform to the data flow.

Let’s repeat the same steps to convert NO values to 0.

  1. Expand Search and edit in the list of transforms.
  2. For Transform, choose Find and replace substring.
  3. For Input column, choose readmitted.
  4. For Pattern, enter NO.
  5. For Replacement string, enter 0.
  6. Choose Preview to review the converted column.
  7. Choose Add to add the transform to our data flow.

Now our target label readmitted is ready for ML training.

Position the target label as the first column to utilize XGBoost algorithm

Because we’re going to use the XGBoost built-in SageMaker algorithm to train the model, the algorithm assumes that the target label is in the first column. Let’s position the target label as such in order to use this algorithm.

  1. Expand Manage columns in the list of transforms.
  2. For Transform, choose Move column.
  3. For Move type, choose Move to start.
  4. For Column to move, choose readmitted.
  5. Choose Preview.
  6. Choose Add to add the change to your data flow.

Drop redundant columns

Next, we drop any redundant columns.

  1. Expand Manage columns in the list of transforms.
  2. For Transform, choose Drop column.
  3. For Column to drop, choose encounter_id_0.
  4. Choose Preview.
  5. Choose Add to add the changes to the flow file.
  6. Repeat these steps for the other redundant columns: patient_nbr_0, encounter_id_1, and patient_nbr_1.

At this stage, we have done a few analyses and applied a few transformations on our raw input dataset. If we choose to preserve the transformed state of the input dataset, like checkpoint, you can do so by choosing Export data. This option allows you to persist the transformed dataset to an S3 bucket.

Quick Model analysis

Now that we have applied transformations to our initial dataset, let’s explore the Quick Model analysis feature. Quick model helps you quickly evaluate the training dataset and produce importance scores for each feature. A feature importance score indicates how useful a feature is at predicting a target label. The feature importance score is between 0–1; a higher number indicates that the feature is more important to the whole dataset. Because our use case relates to the classification problem type, the quick model also generates an F1 score for the current dataset.

  1. Switch back to Analysis Tab and click Create new analysis to bring-up built-in analysis
  2. For Analysis type, choose Quick Model.
  3. Enter a name for your analysis.
  4. For Label, choose readmitted.
  5. Choose Preview and wait for the model to be trained and the results to appear.

The resulting quick model F1 score shows 0.618 (your generated score might be different) with the transformed dataset. Data Wrangler performs several steps to generate the F1 score, including preprocessing, training, evaluating, and finally calculating feature importance. For more details about these steps, see Quick Model.

With the quick model analysis feature, data scientists can iterate through applicable transformations until they have their desired transformed dataset that can potentially lead to better business accuracy and expectations.

  1. Choose Save to add the quick model analysis to the data flow.

Export options

We’re now ready to export our data flow for further processing.

  1. Navigate back to data flow designer by clicking Back to data flow on the top left
  2. On the Export tab, choose Steps to reveal the Data Wrangler flow steps.
  3. Choose the last step to mark it with a check.
  4. Choose Export step to reveal the export options.

As of this writing, you have four export options:

  • Save to S3 – Save the data to an S3 bucket using a SageMaker processing job
  • Pipeline – Export a Jupyter notebook that creates a SageMaker pipeline with your data flow
  • Python Code – Export your data flow to Python code
  • Feature Store – Export a Jupyter notebook that creates a Feature Store feature group and adds features to an offline or online feature store
  1. Choose Save to S3 to generate a fully implemented Jupyter notebook that creates a processing job using your data flow file.

Run processing and training jobs for model building

In this section, we show how to run processing and training jobs using the generated Jupyter notebook from Data Wrangler.

Submit a processing job

We’re now ready to submit a SageMaker processing job using our data flow file.

Run all the cells up to and including the Create Processing Job cell inside the exported notebook.

The cell Create Processing Job triggers a new SageMaker processing job by provisioning managed infrastructure and running the required Data Wrangler Docker container on that infrastructure.

You can check the status of the submitted processing job by running the next cell Job Status & S3 Output Location.

You can also check the status of the submitted processing job on the SageMaker console.

Train a model with SageMaker

Now that the data has been processed, let’s train a model using the data. The same notebook has sample steps to train a model using the SageMaker built-in XGBoost algorithm. Because our use case is a binary classification ML problem, we need to change the objective to binary:logistic inside the sample training steps.

Now we’re ready to run our training job using the SageMaker managed infrastructure. Run the cell Start the Training Job.

You can monitor the status of the submitted training job on the SageMaker console, on the Training jobs page.

Host a trained model for real-time inference

We now use another notebook available on GitHub under the project folder hosting/Model_deployment_Steps.ipynb. This is a simple notebook with two cells: the first cell has code for deploying your model to a persistent endpoint. You need to update model_url with your training job output S3 model artifact.

The second cell in the notebook runs inference on the sample test file under test_data/test_data_UCI_sample.csv. As you can see, we are able to generate predictions for our synthetic observations inside csv file. That concludes the ML workflow.

Clean up

After you have experimented with the steps in this post, perform the following cleanup steps to stop incurring charges:

  1. On the SageMaker console, under Inference in the navigation pane, choose Endpoints.
  2. Select your hosted endpoint.
  3. On the Actions menu, choose Delete.
  4. On the SageMaker Studio Control Panel, navigate to your SageMaker user profile.
  5. Under Apps, locate your Data Wrangler app and choose Delete app.

Conclusion

In this post, we explored Data Wrangler capabilities using a public medical dataset related to patient readmission and demonstrated how to perform feature transformations using built-in transforms and quick analysis. We showed how, without much coding, to generate the required steps to trigger data processing and ML training. This no-code/low-code capability of Data Wrangler accelerates training data preparation and increases data scientist agility with faster iterative data preparation. In the end, we hosted our trained model and ran inferences against synthetic test data. We encourage you to check out our GitHub repository to get hands-on practice and find new ways to improve model accuracy! To learn more about SageMaker, visit the SageMaker Development Guide.


About the Authors

Shyam Namavaram is a Senior Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud-native applications. He passionately works with customers accelerating their AI/ML adoption by providing technical guidance and helping them innovate and build secure cloud solutions on AWS. He specializes in AI/ML, containers, and analytics technologies. Outside of work, he loves playing sports and exploring nature with trekking.

Michael Hsieh is a Senior AI/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of Amazon ML offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the great nature the region has to offer, such as the hiking trails, scenery kayaking in the SLU, and the sunset at the Shilshole Bay.

Read More

Predicting Text Readability from Scrolling Interactions

Posted by Sian Gooding, Intern, Google Research

Illiteracy affects at least 773 million people globally, both young and old. For these individuals, reading information from unfamiliar sources or on unfamiliar topics can be extremely difficult. Unfortunately, these inequalities have been further magnified by the global pandemic as a result of unequal access to education in reading and writing. In fact, UNESCO reports that over 100 million children are falling behind the minimum proficiency level in reading due to COVID-related school closures.

With increasing world-wide access to technology, reading on a device, such as a tablet or phone, has largely taken the place of traditional formats. This provides a unique opportunity to observe reading interactions, e.g., how a reader scrolls through a text, which can inform our understanding of what can make text difficult to read. This understanding is crucial when designing educational applications for low-proficiency readers and language learners, because it can be used to match learners with appropriately leveled texts as well as to support readers in understanding texts beyond their reading level.

In “Predicting Text Readability from Scrolling Interactions”, presented at CoNLL 2021, we show that data from on-device reading interactions can be used to predict how readable a text is. This novel approach provides insights into subjective readability — whether an individual reader has found a text accessible — and demonstrates that existing readability models can be improved by including feedback from scroll-based reading interactions. In order to encourage research in this area and to help enable more personalized tools for language learning and text simplification, we are releasing the dataset of reading interactions generated from our scrolling behavior–based readability assessment of English-language texts.

Understanding Text Difficulty
There are multiple aspects of a text that impact how difficult it is to read, including the vocabulary level, the syntactic structure, and overall coherence. Traditional machine learning approaches to measure readability have exclusively relied on such linguistic features. However, using these features alone does not work well for online content, because such content often contains abbreviations, emojis, broken text, and short passages, which detrimentally impact the performance of readability models.

To address this, we investigated whether aggregate data about the reading interactions of a group can be used to predict how difficult a text is, as well as how reading interactions may differ based on a readers’ understanding. When reading on a device, readers typically interact with text by scrolling in a vertical fashion, which we hypothesize can be used as a coarse proxy for reading comprehension. With this in mind, we recruited 518 paid participants and asked them to read English-language texts of different difficulty levels. We recorded the reading interactions by measuring different features of the participants’ scrolling behavior, such as the speed, acceleration and number of times areas of text were revisited. We then used this information to produce a set of features for a readability classifier.

Predicting Text Difficulty from Scrolling Behavior
We investigated which types of scrolling behaviors were most impacted by text difficulty and tested the significance using linear mixed effect models. In our set up, we have repeated measures, as multiple participants read the same texts and each participant reads more than one text. Using linear mixed-effect models gives us a higher confidence that the differences in interactions we are observing are because of the text difficulty, and not other random effects.

Our results showed that multiple reading behaviors differed significantly based on the text level, for example, the average, maximum and minimum acceleration of scrolling. We found the most significant features to be the total read time and the maximum reading speeds.

We then used these features as inputs to a machine learning algorithm. We designed and trained a support vector machine (i.e., a binary classifier) to predict whether a text is either advanced or elementary based only on scrolling behaviors as individuals interacted with it. The dataset on which the model was trained contains 60 articles, each of which were read by an average of 17 participants. From these interactions we produced aggregate features by taking the mean of the significant measures across participants.

 

We measured the accuracy of the approach using a metric called f-score, which measures how accurate the model is at classifying a text as either “easy” or “difficult” (where 1.0 reflects perfect classification accuracy). We are able to achieve an f-score of 0.77 on this task, using interaction features alone. This is the first work to show that it is possible to predict the readability of a text using only interaction features.

Improving Readability Models
In order to demonstrate the value of applying readability measures from scrolling behaviors to existing readability models, we integrated scroll-based features into the state-of-the-art automated readability assessment tool, which was released as part of the OneStopEnglish corpus. We found that the addition of interaction features improves the f-score of this model from 0.84 to 0.88. In addition, we were able to significantly outperform this system by using interaction information with simple vocabulary features, such as the number of words in the text, achieving an impressive f-score of 0.96.

In our study, we recorded comprehension scores to evaluate the understanding and readability of text for individuals. Participants were asked three questions per article to assess the reader’s understanding of what they had read. The interaction features of an individual’s scrolling behavior was represented as a high dimensional vector. To explore this data, we visualized the reading interaction features for each participant using t-distributed stochastic neighbor embeddings, which is a statistical method for visualizing high-dimensional data. The results revealed clusters in the comprehension score based on how well individuals understood the text. This shows that there is implicit information in reading interactions about the likelihood that an individual has understood a given text. We refer to this phenomenon as subjective readability. This information can be very useful for educational applications or for simplifying online content.

Plot showing t-SNE projection of scroll interactions in 2-dimensions. The color of each data point corresponds to the comprehension score. Clusters of comprehension scores indicate that there are correlations between reading behaviors and comprehension.

Finally, we investigated the extent to which reading interactions vary across audiences. We compared the average scrolling speed across different reader groups, covering reading proficiency and the reader’s first language. We found that the speed distribution varies depending on the proficiency and first language of the audience. This supports the case that first language and proficiency alter the reading behaviors of audiences, which allows us to contextualize the reading behavior of groups and better understand which areas of text may be harder for them to read.

Histogram showing the average speeds of scrolling (in vertical pixels per millisecond) across readers of different proficiency levels (beginner, intermediate and advanced), with lines showing the smoothed trend for each group. A higher average scroll speed indicates faster reading times. For example, a more challenging text that corresponds to slower scroll speeds by advanced readers is associated with higher scroll speeds by beginners because they engage with the text only superficially.

Histogram showing the average speeds of scrolling (in vertical pixels per millisecond) across audiences by first language of the readers, Tamil or English, with lines showing the smoothed trend for each group. A higher average scroll speed indicates faster reading times. Dark blue bars are where the histograms overlap.

Conclusion
This work is the first to show that reading interactions, such as scrolling behavior, can be used to predict the readability of text, which can yield numerous benefits. Such measures are language agnostic, unobtrusive, and robust to noisy text. Implicit user feedback allows insight into readability at an individual level, thereby allowing for a more inclusive and personalisable assessment of text difficulty. Furthermore, being able to judge the subjective readability of text benefits language learning and educational apps. We conducted a 518 participant study to investigate the impact of text readability on reading interactions and are releasing a novel dataset of the associated reading interactions. We confirm that there are statistically significant differences in the way that readers interact with advanced and elementary texts, and that the comprehension scores of individuals correlate with specific measures of scrolling interaction. For more information our conference presentation is available to view.

Acknowledgements
We thank our collaborators Yevgeni Berzak, Tony Mak and Matt Sharifi, as well as Dmitry Lagun and Blaise Aguera y Arcas for their helpful feedback on the paper.

Read More

Use Amazon SageMaker ACK Operators to train and deploy machine learning models

AWS recently released the new Amazon SageMaker Operators for Kubernetes using the AWS Controllers for Kubernetes (ACK). ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like databases or message queues simply by using the Kubernetes API. The new SageMaker ACK Operators make it easier for machine learning (ML) developers and data scientists who use Kubernetes as their control plane to train, tune, and deploy ML models in Amazon SageMaker without signing in to the SageMaker console.

Kubernetes and SageMaker

Building scalable ML workflows involves many iterative steps, including sourcing and preparing data, building ML models, training and evaluating these models, deploying them to production, and monitoring workloads after deployment.

SageMaker is a fully managed service designed and optimized specifically for managing these ML workflows. It removes the undifferentiated heavy lifting of infrastructure management and eliminates the need to invest in IT and DevOps to manage clusters for ML model building, training, and inference. Compute resources are only provisioned when requested, scaled as needed, and shut down automatically when jobs complete, thereby providing near 100% utilization. SageMaker provides many performance and cost optimizations for distributed training, spot training, automatic model tuning, inference latency, and multi-model endpoints.

Many AWS customers who have portability requirements implement a hybrid cloud approach, or implement on-premises and use Kubernetes, an open-source, general-purpose container orchestration system, to set up repeatable ML pipelines running training and inference workloads. However, to support ML workloads, these developers still need to write custom code to optimize the underlying ML infrastructure, provide high availability and reliability, provide data science productivity tools, and comply with appropriate security and regulatory requirements. Kubernetes customers therefore want to use fully managed ML services such as SageMaker for cost-optimized and managed infrastructure, but want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines to retain standardization and portability.

To address this need, AWS allows you to train, tune, and deploy models in SageMaker by using the new SageMaker ACK Operators, which includes a set of custom resource definitions for SageMaker resources that extends the Kubernetes API. With the SageMaker ACK Operators, you can take advantage of fully managed SageMaker infrastructure, tools, and optimizations natively from Kubernetes.

How did we get here?

In late 2019, AWS introduced the SageMaker Operators for Kubernetes to enable developers and data scientists to manage the end-to-end SageMaker training and production lifecycle using Kubernetes as the control plane. SageMaker operators were installed from the GitHub repo by downloading a YAML configuration file that configured your Kubernetes cluster with the custom resource definitions and operator controller service.

In 2020, AWS introduced ACK to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a common controller runtime, a code generator, and a set of AWS service-specific controllers, one of which is the SageMaker controller.

Going forward, new functionality will be added to the SageMaker Operators for Kubernetes through the ACK project.

How does ACK work?

The following diagram illustrates how ACK works.

In this example, Alice is a Kubernetes user. She wants to run model training on SageMaker from within the Kubernetes cluster using the Kubernetes API. Alice issues a call to kubectl apply, passing in a file that describes a Kubernetes custom resource describing her SageMaker training job. kubectl apply passes this file, called a manifest, to the Kubernetes API server running in the Kubernetes controller node (Step 1 in the workflow diagram).

The Kubernetes API server receives the manifest with the SageMaker training job specification and determines whether Alice has permissions to create a custom resource of kind sageMaker.services.k8s.aws/TrainingJob, and whether the custom resource is properly formatted (Step 2).

If Alice is authorized and the custom resource is valid, the Kubernetes API server writes (Step 3) the custom resource to its etcd data store and then responds back (Step 4) to Alice that the custom resource has been created.

The SageMaker controller, which is running on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified (Step 5) that a new custom resource of kind SageMaker.services.k8s.aws/TrainingJob has been created.

The SageMaker controller then communicates (Step 6) with the SageMaker API, calling the SageMaker CreateTrainingJob API to create the training job in AWS. After communicating with the SageMaker API, the SageMaker controller calls the Kubernetes API server to update (Step 7) the custom resource’s status with information it received from SageMaker. The SageMaker controller therefore provides the same information to the developers that they would have received using the AWS SDK. This results in a better and consistent developer experience.

Machine learning use case

For this post, we follow the SageMaker example provided in the following notebook. However, you can reuse the components in this example with your preference of SageMaker built-in or custom algorithms and your own datasets.

We use the Abalone dataset originally from the UCI data repository [1]. In the libsvm converted version, the nominal feature (male/female/infant) has been converted into a real valued feature. The age of abalone is to be predicted from eight physical measurements. This dataset is already processed and stored in Amazon Simple Storage Service (Amazon S3). We train an XGBoost model on the UCI Abalone dataset to replicate the flow in the example Jupyter notebook.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.

An existing Amazon Elastic Kubernetes Service (Amazon EKS) cluster. It should be Kubernetes version 1.16+. For automated cluster creation using eksctl, see Getting started with Amazon EKS – eksctl and create your cluster with Amazon EC2 Linux managed nodes.

Install the following tools on the client machine used to access your Kubernetes cluster (you can use AWS Cloud9, a cloud-based integrated development environment (IDE) for the Kubernetes cluster setup):

  • kubectl – A command line tool for working with Kubernetes clusters.
  • Helm version 3.7+ – A tool for installing and managing Kubernetes applications.
  • AWS Command Line Interface (AWS CLI) – A command line tool for interacting with AWS services.
  • eksctl – A command line tool for working with Amazon EKS clusters that automates many individual tasks.
  • yq – A command line YAML processor. (For Linux environments, use the wget plain binary installation).

Set up IAM role-based authentication for the controller Pod

IAM roles for service accounts (IRSA) allows fine-grained roles at the Kubernetes Pod level by combining an OpenID Connect (OIDC) identity provider with Kubernetes service account annotations. In this section, we associate the Amazon EKS cluster with an OIDC provider and create an AWS Identity and Access Management (IAM) role that is assumed by the ACK controller Pod via its service account to access AWS services.

Create a cluster and OIDC ID provider

Make sure you’re connected to the right cluster. Substitute the values for CLUSTER_NAME and CLUSTER_REGION below:

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

# Set the cluster name, region where the cluster exists
export CLUSTER_NAME=<CLUSTER_NAME>
export CLUSTER_REGION=<CLUSTER_REGION>
export RANDOM_VAR=$RANDOM

aws eks update-kubeconfig --name $CLUSTER_NAME --region $CLUSTER_REGION
kubectl config get-contexts 

# Ensure cluster has compute
kubectl get nodes

Set up the OIDC ID provider (IdP) in AWS and associate it with your Amazon EKS cluster:

eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} 
--region ${CLUSTER_REGION} --approve

Get the identity issuer URL by running the following code:

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
OIDC_PROVIDER_URL=$(aws eks describe-cluster --name $CLUSTER_NAME --region $CLUSTER_REGION --query "cluster.identity.oidc.issuer" --output text | cut -c9-)

Set up an IAM role

Next, let’s set up the IAM role that defines the access to the SageMaker and Application Auto Scaling services. For this, we also need to have an IAM trust policy in place, allowing the specified Kubernetes service account (for example, ack-sagemaker-controller) to assume the IAM role.

Create a file named trust.json and insert the following trust relationship code block required for IAM role:

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::'$AWS_ACCOUNT_ID':oidc-provider/'$OIDC_PROVIDER_URL'"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "'$OIDC_PROVIDER_URL':aud": "sts.amazonaws.com",
          "'$OIDC_PROVIDER_URL':sub": [
            "system:serviceaccount:ack-system:ack-sagemaker-controller",
            "system:serviceaccount:ack-system:ack-applicationautoscaling-controller"
          ]
        }
      }
    }
  ]
}
' > ./trust.json

Updating an Application Auto Scaling Scalable Target requires additional permissions. First, create a service-linked role for Application Auto Scaling.

aws iam create-service-linked-role --aws-service-name sagemaker.application-autoscaling.amazonaws.com

Create a file named pass_role_policy.json to create the policy required for the IAM role.

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::'$AWS_ACCOUNT_ID':role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint"
    }
  ]
}
' > ./pass_role_policy.json

Run the following command to create a role with the trust relationship defined in trust.json. This trust relationship is required so that Amazon EKS (via a webhook) can inject the necessary environment variables and mount volumes into the Pod that are required by the AWS SDK to assume this role.

OIDC_ROLE_NAME=ack-controller-role-$CLUSTER_NAME

aws iam create-role --role-name $OIDC_ROLE_NAME --assume-role-policy-document file://trust.json

# Attach the AmazonSageMakerFullAccess Policy to the Role. This policy provides full access to 
# Amazon SageMaker. Also provides select access to related services (e.g., Application Autoscaling,
# S3, ECR, CloudWatch Logs).
aws iam attach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

# Attach the iam:PassRole policy required for updating ApplicationAutoscaling ScalableTarget
aws iam put-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy" --policy-document file://pass_role_policy.json

export IAM_ROLE_ARN_FOR_IRSA=$(aws iam get-role --role-name $OIDC_ROLE_NAME --output text --query 'Role.Arn')
echo $IAM_ROLE_ARN_FOR_IRSA

Install SageMaker and Application Auto Scaling controllers

Choose an AWS Region for the SageMaker and automatic scaling resources we create in this post. For convenience, we recommend using us-east-1:

export SERVICE_REGION="us-east-1"
# Namespace for controller
export ACK_K8S_NAMESPACE="ack-system"

Now, let’s install the SageMaker and Application Auto Scaling controller using the following helper script. This script pulls the helm charts from ACK’s public Amazon Elastic Container Registry (Amazon ECR) repository and configures the values of the AWS account, default Region for resources to be created, and IAM role (created in previous step) in the service account to be used by the controller Pod to assume the role. Create a file named install-controllers.sh and insert the following code block:

#!/usr/bin/env bash

# Deploy ACK Helm Charts
export HELM_EXPERIMENTAL_OCI=1
export ACK_K8S_NAMESPACE=${ACK_K8S_NAMESPACE:-"ack-system"}

function install_ack_controller() {
    local service="$1"
    local release_version="$2"
    local chart_export_path=/tmp/chart
    local chart_ref=$service-chart
    local chart_repo=public.ecr.aws/aws-controllers-k8s/$chart_ref
    local chart_package=$chart_ref-$release_version.tgz
    
    # Download helm chart
    mkdir -p $chart_export_path
    helm pull oci://"$chart_repo" --version "$release_version" -d $chart_export_path
    tar xvf "$chart_export_path"/"$chart_package" -C "$chart_export_path"

    # Update the values in helm chart
    pushd $chart_export_path/$service-chart
        yq e '.aws.region = env(SERVICE_REGION)' -i values.yaml 
        yq e '.serviceAccount.annotations."eks.amazonaws.com/role-arn" = env(IAM_ROLE_ARN_FOR_IRSA)' -i values.yaml
    popd

    # Create a namespace and install the helm chart
    helm install -n $ACK_K8S_NAMESPACE --create-namespace ack-$service-controller $chart_export_path/$service-chart
}

install_ack_controller "sagemaker" "v0.3.0"
install_ack_controller "applicationautoscaling" "v0.2.0"

Run the script:

chmod +x install-controllers.sh
./install-controllers.sh

The output contains the following:

Pulled: public.ecr.aws/aws-controllers-k8s/sagemaker-chart:v0.3.0
...

NAME: ack-sagemaker-controller
LAST DEPLOYED: Tue Nov 16 01:53:34 2021
NAMESPACE: ack-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Pulled: public.ecr.aws/aws-controllers-k8s/applicationautoscaling-chart:v0.2.0
...

NAME: ack-applicationautoscaling-controller
LAST DEPLOYED: Tue Nov 16 01:53:35 2021
NAMESPACE: ack-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

Next, we run the following commands to verify custom resource definitions were applied and controller Pods are running:

kubectl get crds | grep "services.k8s.aws"

The output of the command should contain a number of custom resource definitions related to SageMaker (such as trainingjobs or endpoint) and Application Auto Scaling (such as scalingpolicies and scalabletargets):

# Get pods in controller namespace
kubectl get pods -n $ACK_K8S_NAMESPACE

We see one controller Pod per service running in the ack-system namespace:

NAME                                                     READY   STATUS    RESTARTS   AGE
ack-applicationautoscaling-controller-7479dc78dd-ts9ng   1/1     Running   0          4m52s
ack-sagemaker-controller-788858fc98-6fgr6                1/1     Running   0          4m56s

Prepare SageMaker resources

Next, we create an S3 bucket and IAM role for SageMaker.

To train a model with SageMaker, we need an S3 bucket to store the dataset and artifacts from the training process. We simply use the preprocessed dataset at s3://SageMaker-sample-files/datasets/tabular/uci_abalone[1].

Let’s create a variable for the S3 bucket:

export SAGEMAKER_BUCKET=ack-sagemaker-bucket-$RANDOM_VAR

Create a file named create-bucket.sh and insert the following code block:

printf '
#!/usr/bin/env bash
# create bucket
if [[ $SERVICE_REGION != "us-east-1" ]]; then
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION" --create-bucket-configuration LocationConstraint="$SERVICE_REGION"
else
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION"
fi
# sync dataset
aws s3 sync s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train s3://"$SAGEMAKER_BUCKET"/datasets/tabular/uci_abalone/train
aws s3 sync s3://sagemaker-sample-files/datasets/tabular/uci_abalone/validation s3://"$SAGEMAKER_BUCKET"/datasets/tabular/uci_abalone/validation
' > ./create-bucket.sh

Run the script to create the S3 bucket and copy the dataset:

chmod +x create-bucket.sh
./create-bucket.sh

The SageMaker training job that we run later in the post needs an IAM role to access Amazon S3 and SageMaker. Run the following commands to create a SageMaker execution IAM role that is used by SageMaker to access AWS resources:

export SAGEMAKER_EXECUTION_ROLE_NAME=ack-sagemaker-execution-role-$RANDOM_VAR

TRUST="{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

SAGEMAKER_EXECUTION_ROLE_ARN=$(aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn')

echo $SAGEMAKER_EXECUTION_ROLE_ARN

Note down the execution role ARN to use in later steps.

Train an XGBoost model

Now, we create a training.yaml file to specify the parameters for a SageMaker training job. SageMaker training jobs enable remote training of ML models. You can customize each training job to run your own ML scripts with custom architectures, data loaders, hyperparameters, and more. To submit a SageMaker training job, we require a job name. Let’s create that variable first:

export JOB_NAME=ack-xgboost-training-job-$RANDOM_VAR

In the following code, we create a training.yaml file that contains the hyperparameters for the training job as well as the location of the training and validation data. It’s also where we specify the Amazon ECR image used for training.

Note: If your $SERVICE_REGION isn’t us-east-1, change the following image URI. For the XGBoost algorithm version 1.2-1 Region-specific image URI, see Docker Registry Paths and Example Code.

export XGBOOST_IMAGE=683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1

printf '
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: '$JOB_NAME'
spec:
  # Name that will appear in SageMaker console
  trainingJobName: '$JOB_NAME'
  hyperParameters: 
    max_depth: "5"
    gamma: "4"
    eta: "0.2"
    min_child_weight: "6"
    subsample: "0.7"
    objective: "reg:linear"
    num_round: "50"
    verbosity: "2"
  algorithmSpecification:
    trainingImage: '$XGBOOST_IMAGE'
    trainingInputMode: File
  roleARN: '$SAGEMAKER_EXECUTION_ROLE_ARN'
  outputDataConfig:
    # The output path of our model
    s3OutputPath: s3://'$SAGEMAKER_BUCKET'
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m4.xlarge
    volumeSizeInGB: 5
  stoppingCondition:
    maxRuntimeInSeconds: 3600
  inputDataConfig:
    - channelName: train
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          # The input path of our train data 
          s3URI: s3://'$SAGEMAKER_BUCKET'/datasets/tabular/uci_abalone/train/abalone.train
          s3DataDistributionType: FullyReplicated
      contentType: text/libsvm
      compressionType: None
    - channelName: validation
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          # The input path of our validation data 
          s3URI: s3://'$SAGEMAKER_BUCKET'/datasets/tabular/uci_abalone/validation/abalone.validation
          s3DataDistributionType: FullyReplicated
      contentType: text/libsvm
      compressionType: None 
' > ./training.yaml

Now, we can create the training job:

kubectl apply -f training.yaml

You should see the following output:

trainingjob.sagemaker.services.k8s.aws/ack-xgboost-training-job-7420 created

You can watch the status of the training job. It takes a few minutes for STATUS to show as Completed.

kubectl get trainingjob.sagemaker --watch
NAME                            SECONDARYSTATUS   STATUS
ack-xgboost-training-job-7420   Starting          InProgress
ack-xgboost-training-job-7420   Downloading       InProgress
ack-xgboost-training-job-7420   Training          InProgress
ack-xgboost-training-job-7420   Completed         Completed

Deploy the results of the SageMaker training job

To deploy the model, we need to specify a model name, an endpoint config name, and an endpoint name:

export MODEL_NAME=ack-xgboost-model-$RANDOM_VAR
export ENDPOINT_CONFIG_NAME=ack-xgboost-endpoint-config-$RANDOM_VAR
export ENDPOINT_NAME=ack-xgboost-endpoint-$RANDOM_VAR

We deploy this model on a c5.large instance type. In the following .yaml file, we define the model, the endpoint config, and the endpoint:

printf '
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: '$MODEL_NAME'
spec:
  modelName: '$MODEL_NAME'
  primaryContainer:
    containerHostname: xgboost
    # The source of the model data
    modelDataURL: s3://'$SAGEMAKER_BUCKET'/'$JOB_NAME'/output/model.tar.gz
    image: '$XGBOOST_IMAGE'
  executionRoleARN: '$SAGEMAKER_EXECUTION_ROLE_ARN'
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: EndpointConfig
metadata:
  name: '$ENDPOINT_CONFIG_NAME'
spec:
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
  productionVariants:
  - modelName: '$MODEL_NAME'
    variantName: AllTraffic
    instanceType: ml.c5.large
    initialInstanceCount: 1
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Endpoint
metadata:
  name: '$ENDPOINT_NAME'
spec:
  endpointName: '$ENDPOINT_NAME'
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
' > ./deploy.yaml

Now, the endpoint is ready to be deployed:

kubectl apply -f deploy.yaml

You should see the following output:

model.sagemaker.services.k8s.aws/ack-xgboost-model-7420 created
endpointconfig.sagemaker.services.k8s.aws/ack-xgboost-endpoint-config-7420 created
endpoint.sagemaker.services.k8s.aws/ack-xgboost-endpoint-7420 created

We can observe that the model and endpoint config were created. Deploying the endpoint may take some time:

kubectl describe models.sagemaker
kubectl describe endpointconfigs.sagemaker
kubectl describe endpoints.sagemaker

We can watch this process using the following command:

kubectl get endpoints.sagemaker --watch

After some time, the STATUS changes to InService:

NAME                        STATUS
ack-xgboost-endpoint-7420   Creating         
ack-xgboost-endpoint-7420   InService        

This indicates the deployed endpoint is ready for use.

Verify the inference capabilities of the trained model

We invoke the model endpoint using Python to emulate a typical use case. We reuse the code in SageMaker example notebook.

We first download the test set from Amazon S3. Then we load a single sample from the test set and use it to invoke the endpoint we deployed in the previous section. Download the test file with the following code:

pip install boto3 numpy
aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/test/abalone.test abalone.test
head -1 abalone.test > abalone.single.test

Use the Python interpreter to test inference. The Python interpreter is usually installed as /usr/local/bin/python<version> on those machines where it’s available; putting /usr/local/bin in your Unix/Linux shell’s search path makes it possible to start it by entering the Python command.

Create a file named predict.py and insert the following code block:

printf '
import sys
import math
import json
import boto3
import numpy as np
import os

region = os.environ.get("SERVICE_REGION")
endpoint_name = os.environ.get("ENDPOINT_NAME")

runtime_client = boto3.client("runtime.sagemaker", region_name=region)

file_name = "abalone.single.test"
with open(file_name, "r") as f:
    payload = f.read().strip()

response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="text/x-libsvm", Body=payload
)

result = response["Body"].read().decode("utf-8").split(",")
result = [math.ceil(float(i)) for i in result]
label = payload.strip(" ").split()[0]
print("Label: " + label)
print("Prediction:" + str(result[0]))
' > ./predict.py
python predict.py

Running this sample should give us the following result:

Label: 12
Prediction: 13

The age of the abalone that is provided in the test example is estimated to be 13 by the ML model. The actual age was 12. This suggests that our ML model has been trained and provides reasonable predictions. However, the experienced ML user may realize that we haven’t performed hyperparameter tuning and other methods of increasing accuracy yet, which is outside the scope of this post.

Dynamically scale the endpoint according to the load

SageMaker ACK Operators support custom resource definitions for automatic scaling (using ScalableTarget and ScalingPolicy) for your hosted models. The following resources adjust the number of instances (minimum 1 to maximum 20) provisioned for a model in response to changes in metric SageMakerVariantInvocationsPerInstancetracking, which is the average number of times per minute that each instance for a variant is invoked:

printf '
apiVersion: applicationautoscaling.services.k8s.aws/v1alpha1
kind: ScalableTarget
metadata:
  name: ack-scalable-target-predfined
spec:
  maxCapacity: 20
  minCapacity: 1
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
---
apiVersion: applicationautoscaling.services.k8s.aws/v1alpha1
kind: ScalingPolicy
metadata:
  name: ack-scaling-policy-predefined
spec:
  policyName: ack-scaling-policy-predefined
  policyType: TargetTrackingScaling
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
  targetTrackingScalingPolicyConfiguration:
    targetValue: 60
    scaleInCooldown: 700
    scaleOutCooldown: 300
    predefinedMetricSpecification:
        predefinedMetricType: SageMakerVariantInvocationsPerInstance
 ' > ./scale-endpoint.yaml

Apply with the following code:

kubectl apply -f scale-endpoint.yaml

You should see the following output:

scalabletarget.applicationautoscaling.services.k8s.aws/ack-scalable-target-predfined created
scalingpolicy.applicationautoscaling.services.k8s.aws/ack-scaling-policy-predefined created

We can observe that scalingpolicy was created:

kubectl describe scalingpolicy.applicationautoscaling

The output of scalingpolicy looks like the following:

Status:
  Ack Resource Metadata:
    Arn:               arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:b33d12b8-aa81-4cb8-855e-c7b6dcb9d6e7:resource/SageMaker/endpoint/ack-xgboost-endpoint/variant/AllTraffic:policyName/ack-scaling-policy-predefined
    Owner Account ID:  123456789012
  Alarms:
    Alarm ARN:   arn:aws:cloudwatch:us-east-1:123456789012:alarm:TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmHigh-966b8232-a9b9-467d-99f3-95436f5c0383
    Alarm Name:  TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmHigh-966b8232-a9b9-467d-99f3-95436f5c0383
    Alarm ARN:   arn:aws:cloudwatch:us-east-1:123456789012:alarm:TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmLow-71e39f85-1afb-401d-9703-b788cdc10a93
    Alarm Name:  TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmLow-71e39f85-1afb-401d-9703-b788cdc10a93

Clean up

Run the following commands to delete the resources created in this post:

kubectl delete -f scale-endpoint.yaml
kubectl delete -f deploy.yaml
kubectl delete -f training.yaml

Create a file named uninstall-controller.sh and insert the following code block required for deleting the controller and custom resource definitions:

printf '
#!/usr/bin/env bash

# Uninstall Controller

export HELM_EXPERIMENTAL_OCI=1
export ACK_K8S_NAMESPACE=${ACK_K8S_NAMESPACE:-"ack-system"}

function uninstall_ack_controller() {
   local service="$1"
   local chart_export_path=/tmp/chart
   
   helm uninstall -n $ACK_K8S_NAMESPACE ack-$service-controller
   kubectl delete -f $chart_export_path/ack-$service-controllerchart/crds
}

uninstall_ack_controller "sagemaker"
uninstall_ack_controller "applicationautoscaling"
' > ./uninstall-controller.sh

Run the following commands to uninstall the controller and custom resource definitions, and delete the namespace, IAM roles, and S3 bucket you created:

# uninstall controller and remove CRDs
chmod +x uninstall-controller.sh
./uninstall-controller.sh

# Delete controller namespace
kubectl delete namespace $ACK_K8S_NAMESPACE

# Delete S3 bucket
aws s3 rb s3://$SAGEMAKERageMaker_BUCKET --force

# Delete SageMaker execution role
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam delete-role --role-name $SAGEMAKER_EXECUTION_ROLE_NAME

# Delete application autoscaling service linked role
aws iam delete-service-linked-role --role-name AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint

# Delete IAM role created for IRSA
aws iam detach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam delete-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy"
aws iam delete-role --role-name $OIDC_ROLE_NAME

Conclusion

SageMaker ACK Operators provide engineering teams with a native Kubernetes experience for creating and interacting with the ML jobs on SageMaker, either with the Kubernetes API or with Kubernetes command line utilities such as kubectl. You can build automation, tooling, and custom interfaces for data scientists in Kubernetes by using these controllers—all without building, maintaining, or optimizing ML infrastructure. Data scientists and developers familiar with Kubernetes can compose and interact with fully managed SageMaker training, tuning, and inference jobs, as you would with Kubernetes jobs running locally. Logs from SageMaker jobs stream back to Kubernetes, allowing you to natively view logs for your model training, tuning, and prediction jobs in the command line.

ACK is a community-driven project and will soon include service controllers for other AWS service APIs.

Links

[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.


About the Authors

Kanwaljit Khurmi is a Senior Solutions Architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Suraj Kota is a Software Engineer specialized in Machine Learning infrastructure. He builds tools to easily get started and scale machine learning workload on AWS. He worked on the AWS Deep Learning Containers, Deep Learning AMI, SageMaker Operators for Kubernetes, and other open source integrations like Kubeflow.

Archis Joglekar is an AI/ML Partner Solutions Architect in the Emerging Technologies team. He is interested in performant, scalable deep learning and scientific computing using the building blocks at AWS. His past experiences range from computational physics research to machine learning platform development in academia, national labs, and startups. His time away from the computer is spent playing soccer and with friends and family.

Read More