Improve multi-hop reasoning in LLMs by learning from rich human feedback

Improve multi-hop reasoning in LLMs by learning from rich human feedback

Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical explanations, which poses a significant obstacle to establishing trust with users. In this post, we show how to incorporate human feedback on the incorrect reasoning chains for multi-hop reasoning to improve performance on these tasks. Instead of collecting the reasoning chains from scratch by asking humans, we instead learn from rich human feedback on model-generated reasoning chains using the prompting abilities of the LLMs. We collect two such datasets of human feedback in the form of (correction, explanation, error type) for StrategyQA and Sports Understanding datasets, and evaluate several common algorithms to learn from such feedback. Our proposed methods perform competitively to chain-of-thought prompting using the base Flan-T5, and ours is better at judging the correctness of its own answer.

Solution overview

With the onset of large language models, the field has seen tremendous progress on various natural language processing (NLP) benchmarks. Among them, the progress has been striking on relatively simpler tasks such as short context or factual question answering, compared to harder tasks that require reasoning such as multi-hop question answering. The performance of certain tasks using LLMs may be similar to random guessing at smaller scales, but improves significantly at larger scales. Despite this, the prompting abilities of LLMs have the potential to provide some relevant facts required to answer the question.

However, those models may not reliably generate correct reasoning chains or explanations. Those confident but nonsensical explanations are even more prevalent when LLMs are trained using Reinforcement Learning from Human Feedback (RLHF), where reward hacking may occur.

Motivated by this, we try to address the following research question: can we improve reasoning of LLMs by learning from human feedback on model-generated reasoning chains? The following figure provides an overview of our approach: we first prompt the model to generate reasoning chains for multi-hop questions, then collect diverse human feedback on these chains for diagnosis and propose training algorithms to learn from the collected data.

We collect diverse human feedback on two multi-hop reasoning datasets, StrategyQA and Sports Understanding from BigBench. For each question and model-generated reasoning chain, we collect the correct reasoning chain, the type of error in the model-generated reasoning chain, and a description (in natural language) of why that error is presented in the provided reasoning chain. The final dataset contains feedback for 1,565 samples from StrategyQA and 796 examples for Sports Understanding.

We propose multiple training algorithms to learn from the collected feedback. First, we propose a variant of self-consistency in chain-of-thought prompting by considering a weighted variant of it that can be learned from the feedback. Second, we propose iterative refinement, where we iteratively refine the model-generated reasoning chain until it’s correct. We demonstrate empirically on the two datasets that fine-tuning an LLM, namely Flan-T5 using the proposed algorithms, performs comparably to the in-context learning baseline. More importantly, we show that the fine-tuned model is better at judging if its own answer is correct compared to the base Flan-T5 model.

Data collection

In this section, we describe the details of the feedback we collected and the annotation protocol followed during data collection. We collected feedback for model generations based on two reasoning-based datasets: StrategyQA and Sports Understanding from BigBench. We used GPT-J to generate the answer for StrategyQA and Flan-T5 to generate the answer for the Sports Understanding dataset. In each case, the model was prompted with k in-context examples containing question, answer, and explanation, followed by the test question.

The following figure shows the interface we used. Annotators are given the question, the model-generated answer, and the explanation split into steps.

For each question, we collected the following feedback:

  • Subquestions – The annotators decompose the original question into simpler subquestions required to answer the original question. This task was added after a pilot where we found that adding this task helps prepare the annotators and improve the quality of the rest of the tasks.
  • Correction – Annotators are provided with a free-form text box pre-filled with the model-generated answer and explanation, and asked to edit it to obtain the correct answer and explanation.
  • Error type – Among the most common types of error we found in the model generations (Factual Error, Missing Facts, Irrelevant Facts, and Logical Inconsistency), annotators were asked to pick one or more of the error types that apply to the given answer and explanation.
  • Error description – The annotators were instructed to not only classify the errors but also give a comprehensive justification for their categorization, including pinpointing the exact step where the mistake occurred and how it applies to the answer and explanation provided.

We used Amazon SageMaker Ground Truth Plus in our data collection. The data collection took place over multiple rounds. We first conducted two small pilots of 30 examples and 200 examples, respectively, after which the annotator team was given detailed feedback on the annotation. We then conducted the data collection over two batches for StrategyQA, and over one batch for Sports Understanding, giving periodic feedback throughout—a total of 10 annotators worked on the task over a period of close to 1 month.

We gathered feedback on a total of 1,565 examples for StrategyQA and 796 examples for Sports Understanding. The following table illustrates the percentage of examples that were error-free in the model generation and the proportion of examples that contained a specific error type. It’s worth noting that some examples may have more than one error type.

Error Type StrategyQA Sports Understanding
None 17.6% 31.28%
Factual Error 27.6% 38.1%
Missing Facts 50.4% 46.1%
Irrelevant Facts 14.6% 3.9%
Logical Inconsistency 11.2% 5.2%

Learning algorithms

For each question q, and model-generated answer and explanation m, we collected the following feedback: correct answer and explanation c, type of error present in m (denoted by t), and error description d, as described in the previous section.

We used the following methods:

  • Multitask learning – A simple baseline to learn from the diverse feedback available is to treat each of them as a separate task. More concretely, we fine-tune Flan-T5 (text to text) with the objective maximize p(c|q) + p(t|q, m) + p(d|q, m). For each term in the objective, we use a separate instruction appropriate for the task (for example, “Predict error in the given answer”). We also convert the categorical variable t into a natural language sentence. During inference, we use the instruction for the term p(c|q) (“Predict the correct answer for the given question”) to generate the answer for the test question.
  • Weighted self-consistency – Motivated by the success of self-consistency in chain-of-thought prompting, we propose a weighted variant of it. Instead of treating each sampled explanation from the model as correct and considering the aggregate vote, we instead first consider whether the explanation is correct and then aggregate accordingly. We first fine-tune Flan-T5 with the same objective as in multitask learning. During inference, given a test question q, we sample multiple possible answers with the instruction for p(c|q)): a1, a2, .., an. For each sampled answer ai, we use the instruction for the term p(t|q, m) (“Predict error in the given answer”) to identify if it contains error ti = argmax p(t|q, a_i). Each answer ai is assigned a weight of 1 if it’s correct, otherwise it’s assigned a weight smaller than 1 (tunable hyperparameter). The final answer is obtained by considering a weighted vote over all the answers a1 to an.
  • Iterative refinement – In the previous proposed methods, the model directly generates the correct answer c conditioned on the question q. Here we propose to refine the model-generated answer m to obtain the correct answer for a given question. More specifically, we first fine-tune Flan-T5 (text to text with the objective) with maximize p(t; c|q, m), where ; denotes the concatenation (error type t followed by the correct answer c). One way to view this objective is that the model is first trained to identify the error in given generation m, and then to remove that error to obtain the correct answer c. During inference, we can use the model iteratively until it generates the correct answer—given a test question q, we first obtain the initial model generation m (using pre-trained Flan-T5). We then iteratively generate the error type ti and potential correct answer ci until ti = no error (in practice, we set a maximum number of iterations to a hyperparameter), in which case the final correct answer will be ci-1 (obtained from p(ti ; ci | q, ci-1)).

Results

For both datasets, we compare all the proposed learning algorithms with the in-context learning baseline. All models are evaluated on the dev set of StrategyQA and Sports Understanding. The following table shows the results.

Method StrategyQA Sports Understanding
Flan-T5 4-shot Chain-of-Thought In-Context Learning 67.39 ± 2.6% 58.5%
Multitask Learning 66.22 ± 0.7% 54.3 ± 2.1%
Weighted Self Consistency 61.13 ± 1.5% 51.3 ± 1.9%
Iterative Refinement 61.85 ± 3.3% 57.0 ± 2.5%

As observed, some methods perform comparable to the in-context learning baseline (multitask for StrategyQA, and iterative refinement for Sports Understanding), which demonstrates the potential of gathering ongoing feedback from humans on model outputs and using that to improve language models. This is different from recent work such as RLHF, where the feedback is limited to categorical and usually binary.

As shown in the following table, we investigate how models adapted with human feedback on reasoning mistakes can help improve the calibration or the awareness of confidently wrong explanations. This is evaluated by prompting the model to predict if its generation contains any errors.

Method Error Correction StrategyQA
Flan-T5 4-shot Chain-of-Thought In-Context Learning No 30.17%
Multitask Finetuned Model Yes 73.98%

In more detail, we prompt the language model with its own generated answer and reasoning chain (for which we collected feedback), and then prompt it again to predict the error in the generation. We use the appropriate instruction for the task (“Identify error in the answer”). The model is scored correctly if it predicts “no error” or “correct” in the generation if the annotators labeled the example as having no error, or if it predicts any of the error types in the generation (along with “incorrect” or “wrong”) when the annotators labeled it as having an error. Note that we don’t evaluate the model’s ability to correctly identify the error type, but rather if an error is present. The evaluation is done on a set of 173 additional examples from the StrategyQA dev set that were collected, which aren’t seen during fine-tuning. Four examples out of these are reserved for prompting the language model (first row in the preceding table).

Note that we do not show the 0-shot baseline result because the model is unable to generate useful responses. We observe that using human feedback for error correction on reasoning chains can improve the model’s prediction of whether it makes errors or not, which can improve the awareness or calibration of the wrong explanations.

Conclusion

In this post, we showed how to curate human feedback datasets with fine-grained error corrections, which is an alternative way to improve the reasoning abilities of LLMs. Experimental results corroborate that human feedback on reasoning errors can improve performance and calibration on challenging multi-hop questions.

If you’re looking for human feedback to improve your large language models, visit Amazon SageMaker Data Labeling and the Ground Truth Plus console.


About the Authors

Erran Li is the applied science manager at humain-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning, and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber working on machine learning for autonomous driving, machine learning systems and strategic initiatives of AI. He started his career at Bell Labs and was adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems and adversarial machine learning. He has a PhD in computer science at Cornell University. He is an ACM Fellow and IEEE Fellow.

Nitish Joshi was an applied science intern at AWS AI, Amazon. He is a PhD student in computer science at New York University’s Courant Institute of Mathematical Sciences advised by Prof. He He. He works on Machine Learning and Natural Language Processing, and he was affiliated with the Machine Learning for Language (ML2) research group. He was broadly interested in robust language understanding: both in building models which are robust to distribution shifts (e.g. through human-in-the-loop data augmentation) and also in designing better ways to evaluate/measure the robustness of models. He has also been curious about the recent developments in in-context learning and understanding how it works.

Kumar Chellapilla is a General Manager and Director at Amazon Web Services and leads the development of ML/AI Services such as human-in-loop systems, AI DevOps, Geospatial ML, and ADAS/Autonomous Vehicle development. Prior to AWS, Kumar was a Director of Engineering at Uber ATG and Lyft Level 5 and led teams using machine learning to develop self-driving capabilities such as perception and mapping. He also worked on applying machine learning techniques to improve search, recommendations, and advertising products at LinkedIn, Twitter, Bing, and Microsoft Research.

Read More

How to extend the functionality of AWS Trainium with custom operators

How to extend the functionality of AWS Trainium with custom operators

Deep learning (DL) is a fast-evolving field, and practitioners are constantly innovating DL models and inventing ways to speed them up. Custom operators are one of the mechanisms developers use to push the boundaries of DL innovation by extending the functionality of existing machine learning (ML) frameworks such as PyTorch. In general, an operator describes the mathematical function of a layer in a deep learning model. A custom operator allows developers to build their own mathematical functions for a layer in the deep learning model.

AWS Trainium and AWS Inferentia2, which are purpose built for DL training and inference, extend their functionality and performance by supporting custom operators (or CustomOps, for short). AWS Neuron, the SDK that supports these accelerators, uses the standard PyTorch interface for CustomOps. Developers can easily get started with their existing code when using Trainium-based Amazon EC2 Trn1 instances or Inferentia2-based Amazon EC2 Inf2 instances. In this post, we cover the benefits of CustomOps, their efficient implementation on Trainium, and examples to get you started with CustomOps on Trainium-powered Trn1 instances.

To follow along, familiarity with core AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) is implied, and basic familiarity with deep learning, PyTorch, and C++ would be helpful.

Custom operators in PyTorch and their benefits

CustomOps for PyTorch originated in version 1.10, called PyTorch C++ Frontend, and provided an easy-to-use mechanism to register CustomOps written in C++. The following are some of the benefits that CustomOps provide:

  • Performance optimization – CustomOps can be optimized for specific use cases, leading to faster model runs and improved performance.
  • Improved model expressiveness – With CustomOps, you can express complex computations that aren’t easily expressible using the built-in operators provided by PyTorch.
  • Increased modularity – You can use CustomOps as building blocks to create more complex models by creating C++ libraries of reusable components. This makes the development process easier and more modular, and facilitates rapid experimentation.
  • Increased flexibility – CustomOps enables operations beyond the built-in operators—that is, they provide a flexible way to define complex operations that aren’t implemented using the standard ones.

Trainium support for custom operators

Trainium (and AWS Inferentia2) supports CustomOps in software through the Neuron SDK and accelerates them in hardware using the GPSIMD engine (General Purpose Single Instruction Multiple Data engine). Let’s look at how these enable efficient CustomOps implementation and provide increased flexibility and performance when developing and innovating DL models.

Neuron SDK

The Neuron SDK helps developers train models on Trainium and deploy models on the AWS Inferentia accelerators. It integrates natively with frameworks, such as PyTorch and TensorFlow, so you can continue using your existing workflows and application code to train models on Trn1 instances.

The Neuron SDK uses the standard PyTorch interface for CustomOps. Developers can use the standard programming interface in PyTorch to write CustomOps in C++ and extend Neuron’s official operator support. Neuron then compiles these CustomOps to run efficiently on the GPSIMD engine, which is described in more detail in the following section. This makes it easy to implement new experimental CustomOps and accelerate them on purpose-built hardware, without any intimate knowledge of this underlying hardware.

General Purpose Single Instruction Multiple Data engine

At the core of Trainium optimizations resides the NeuronCore architecture, a fully independent, heterogeneous compute-unit with four main engines: tensor, vector, scalar, and the GPSIMD engine. The scalar and vector engines are highly parallelized and optimized for floating-point operations. The tensor engine is based on a power-optimized, systolic-array supporting mixed precision computation.

The GPSIMD engine is a general-purpose Single Instruction Multiple Data (SIMD) engine designed for running and accelerating CustomOps. This engine consists of eight fully programmable 512-bit wide general-purpose processors, which can run straight-line C-code and have direct inline access to the other NeuronCore-v2 engines, as well as the embedded SRAM and HBM memories. Together, these capabilities help run CustomOps efficiently on Trainium.

Take for example operators such as TopK, LayerNorm, or ZeroCompression, which read data from memory and only use it for a minimal number of ALU calculations. Regular CPU systems are completely memory bound for these calculations, and performance is limited by the time required to move the data into the CPU. In Trainium, the GP-SIMD engines are tightly coupled with the on-chip caches using a high bandwidth streaming interface, which can sustain 2 TB/sec of memory bandwidth. Therefore, CustomOps like these can be run really fast on Trainium.

Neuron SDK custom operators in practice

For this post, we assume a DLAMI (refer to instructions for either Ubuntu or Amazon Linux) is being used to instantiate an EC2 Trn1 instance (either 2x.large or 32x.large). Note all necessary software, drivers, and tools have already been installed on the DLAMIs, and only the activation of the Python environment is needed to start working with the tutorial. We reference the CustomOps functionality available in Neuron as “Neuron CustomOps.”

Similar to the process of PyTorch integration with C++ code, Neuron CustomOps requires a C++ implementation of an operator via a NeuronCore-ported subset of the Torch C++ API . The C++ implementation of the operator is called the kernel function, and the port of the C++ API contains everything required for CustomOps development and model integration, specifically tensor and scalar classes in c10 (a namespace used for low-level C++ code across different PyTorch libraries), and a subset of ATen operators (or Automatic Tensor, the C++ library that provides the core tensor operations used in PyTorch).

The torch.h header needs to be included when defining the kernel for you to have access to a NeuronCore-ported subset of the Pytorch C++ API:

#include <torch/torch.h>

Neuron CustomOps also require a shape function. The shape function has the same function signature as the kernel function, but doesn’t perform any computations. It only defines the shape of the output tensor but not the actual values.

Neuron CustomOps are grouped into libraries, and macros are used to register them with the NEURON_LIBRARY scope from within the shape function. The function will be run on the host at compilation time and will require the register.h header from the torchneuron library:

#include "torchneuron/register.h"

Finally, the custom library is built by calling the load API. If supplying the build_directory parameter, the library file will be stored in the indicated directory:

import torch_neuronx
from torch_neuronx.xla_impl import custom_op

custom_op.load(
name=name,# this is the name for the library(i.e, 'relu')
compute_srcs=['CustomOP.cpp'],
shape_srcs=['shape.cpp'],
build_directory*=*os.getcwd()
)

To use the CustomOp from a PyTorch model, simply load the library by calling the load_library API and call the Neuron CustomOp in the same manner that CustomOps are called in PyTorch via the torch.ops namespace. The format is usually torch.ops.<library_name>.<operator_name>. See the following code:

import torch
import torch_neuronx
from torch_neuronx.xla_impl import custom_op

custom_op.load_library('/home/user/libmy_ops.so')
out_tensor = torch.ops.my_lib.my_op(in_tensor)

Note that the custom_op.load API builds the C++ library, whereas the custom_op.load_library API loads an already-built library file.

Example: Neuron CustomOps in MLP training

To get started, perform the following steps:

  1. Create and launch your EC2 Trn1 instance. Be sure that you use a DLAMI image (either Ubuntu or Amazon Linux, pre-installed with all necessary Neuron software) and that you have specified a root volume size of 512 GB.
  2. After your instance is up and running, SSH to your instance.
  3. Install PyTorch Neuron (torch-neuronx) on your running Trn1 instance. For instructions, refer to Neuron Custom C++ Operators in MLP Training.
  4. Download the sample code from the GitHub repository.

Now that your environment is set up, continue through this post as we describe the implementation of a typical C++ CustomOp in Neuron in the form of Relu forward and backward functions to be used on a simple multilayer perceptron (MLP) model. The steps are described in the AWS Neuron Documentation.

The example code from the repository shows two folders:

  • ./customop_mlp/PyTorch – Contains the Relu code that will be compiled for a CPU
  • ./customop_mlp/neuron – Contains the Relu code that will be compiled for Trainium

Develop a Neuron CustomOp: The kernel function

The host or dev environment for the development of the kernel function (the Neuron CustomOp) can run PyTorch 1.13 and a C++17 compatible compiler in a Linux environment. This is the same as developing any C++ function for PyTorch, and the only libraries that need to be present in the development environment are those for PyTorch and C++. In the following example, we create a relu.cpp file with the custom Relu forward and backward functions:

#include <stdint.h>
#include <stdlib.h>
#include <torch/torch.h>

torch::Tensor relu_forward(const torch::Tensor& t_in) {
torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);
auto t_in_acc = t_in.accessor<float, 2>();
auto t_out_acc = t_out.accessor<float, 2>();
auto shape = t_in.sizes();
for (int i = 0; i < shape[0]; i++) {
for (int j = 0; j < shape[1]; j++) {
t_out_acc[i][j] = t_in_acc[i][j] > 0.0 ? t_in_acc[i][j] : 0.0;
}
}
return t_out;
}

torch::Tensor relu_backward(const torch::Tensor& t_grad, const torch::Tensor& t_in) {
torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);
auto t_in_acc = t_in.accessor<float, 2>();
auto t_grad_acc = t_grad.accessor<float, 2>();
auto t_out_acc = t_out.accessor<float, 2>();
auto shape = t_in.sizes();
for (int i = 0; i < shape[0]; i++) {
for (int j = 0; j < shape[1]; j++) {
t_out_acc[i][j] = t_in_acc[i][j] > 0.0 ? t_grad_acc[i][j] : 0.0;
}
}
return t_out;
}

When developing a Neuron CustomOp for Neuron, make sure you take into account the currently supported features and APIs. For more information, refer to Custom Operators API Reference Guide [Experimental].

Build and register the Neuron CustomOp: The shape function

The build for the Neuron CustomOp and runtime environment is the Trn1 instance where the training will take place, and the Neuron CustomOp will be compiled and registered as a neuronx-cc library and interpreted by the Neuron runtime to run on the highly optimized GP-SIMD engine.

To build and register the Neuron CustomOp, we need to create a shape function (shape.cpp) that will define the input and output tensors and register the operators: the relu_fwd_shape and relu_bwd_shape functions. See the following code:

#include <stdint.h>
#include <stdlib.h>
#include <torch/torch.h>
#include "torchneuron/register.h"

torch::Tensor relu_fwd_shape(torch::Tensor t_in) {
torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);
return t_out;
}

torch::Tensor relu_bwd_shape(torch::Tensor t_grad, torch::Tensor t_in) {
torch::Tensor t_out = torch::zeros(t_in.sizes(), torch::kFloat);
return t_out;
}

NEURON_LIBRARY(my_ops, m) {
m.def("relu_forward", &relu_fwd_shape, "relu_forward");
m.def("relu_backward", &relu_bwd_shape, "relu_backward");
}

The relu_fwd_shape and relu_bwd_shape functions define the shape of the output tensor (to be the same size as the input tensor). Then we register the functions in the NEURON_LIBRARY scope.

In the ./customop_ml/neuron repository example, we have a build.py script to run the build and registration of the CustomOp, by simply calling the load function from the torch_neuronx.xla_impl package:

import os
import torch_neuronx
from torch_neuronx.xla_impl import custom_op

custom_op.load(
name='relu',
compute_srcs=['relu.cpp'],
shape_srcs=['shape.cpp'],
build_directory=os.getcwd()
)

In the build_directory, we should find the librelu.so library ready to be loaded and used in training our model.

Build the MLP model with the Neuron CustomOp

In this section, we go through the steps to build the MLP model with the Neuron CustomOp.

Define the Relu class

For a detailed explanation of how to train an MLP model, refer to Multi-Layer Perceptron Training Tutorial.

After we build the CustomOp, we create a Python package called my_ops.py, where we define a Relu PyTorch class, inheriting from the torch autograd function. The autograd function implements automatic differentiation, so that it can be used in a training loop.

First we load the librelu.so library, then we define the new class with the forward and backward functions defined with static method decorators. In this way, the methods can be called directly when we define the model. See the following code:

import torch
import torch_neuronx
from torch_neuronx.xla_impl import custom_op

custom_op.load_library('librelu.so')

class Relu(torch.autograd.Function):
@staticmethod
def forward(ctx, input):
ctx.save_for_backward(input)
return torch.ops.my_ops.relu_forward(input)

@staticmethod
def backward(ctx, grad):
input, = ctx.saved_tensors
return torch.ops.my_ops.relu_backward(grad, input), None

Examine the MLP model

Now we’re ready to write our multilayer perceptron model with our Neuron CustomOp by importing the my_ops package where we have defined the Relu class:

import torch
import torch.nn as nn
from torch.nn import functional as F
import my_ops

# Declare 3-layer MLP for MNIST dataset
class MLP(nn.Module):
def __init__(self, input_size = 28 * 28, output_size = 10, layers = [120, 84]):
super(MLP, self).__init__()
self.fc1 = nn.Linear(input_size, layers[0])
self.fc2 = nn.Linear(layers[0], layers[1])
self.fc3 = nn.Linear(layers[1], output_size)

def forward(self, x):
f1 = self.fc1(x)
r1 = my_ops.Relu.apply(f1)
f2 = self.fc2(r1)
r2 = my_ops.Relu.apply(f2)
f3 = self.fc3(r2)
return torch.log_softmax(f3, dim=1)

Run the training script

Now we can train our model by using the train.py provided script:

import os
import time
import torch
from model import MLP

from torchvision.datasets import mnist
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

# XLA imports
import torch_xla.core.xla_model as xm

# Global constants
EPOCHS = 4
WARMUP_STEPS = 2
BATCH_SIZE = 32

# Load MNIST train dataset
train_dataset = mnist.MNIST(root='./MNIST_DATA_train',
train=True, download=True, transform=ToTensor())

def main():
# Prepare data loader
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE)

# Fix the random number generator seeds for reproducibility
torch.manual_seed(0)

# XLA: Specify XLA device (defaults to a NeuronCore on Trn1 instance)
device = 'xla'

# Move model to device and declare optimizer and loss function
model = MLP().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.NLLLoss()

# Run the training loop
print('----------Training ---------------')
model.train()
for epoch in range(EPOCHS):
start = time.time()
for idx, (train_x, train_label) in enumerate(train_loader):
optimizer.zero_grad()
train_x = train_x.view(train_x.size(0), -1)
train_x = train_x.to(device)
train_label = train_label.to(device)
output = model(train_x)
loss = loss_fn(output, train_label)
loss.backward()
optimizer.step()
xm.mark_step() # XLA: collect ops and run them in XLA runtime
if idx < WARMUP_STEPS: # skip warmup iterations
start = time.time()
# Compute statistics for the last epoch
interval = idx - WARMUP_STEPS # skip warmup iterations
throughput = interval / (time.time() - start)
print("Train throughput (iter/sec): {}".format(throughput))
print("Final loss is {:0.4f}".format(loss.detach().to('cpu')))

# Save checkpoint for evaluation
os.makedirs("checkpoints", exist_ok=True)
checkpoint = {'state_dict': model.state_dict()}
# XLA: use xm.save instead of torch.save to ensure states are moved back to cpu
# This can prevent "XRT memory handle not found" at end of test.py execution
xm.save(checkpoint,'checkpoints/checkpoint.pt')

print('----------End Training ---------------')

if __name__ == '__main__':
main()

By sending the model to the xla device, the model and Relu custom operator are compiled to be run by the Neuron runtime using the optimized Trainium hardware.

In this example, we showed how to create a custom Relu operator that takes advantage of the hardware engine (GP-SIMD) available on the Trainium ML accelerator chip. The result is a trained PyTorch model that can now be deployed for inferencing.

Conclusion

Modern state-of-the-art model architectures require an increasing number of resources from engineering staff (data scientists, ML engineers, MLOps engineers, and others) to actual infrastructure including storage, compute, memory, and accelerators. These requirements increase the cost and complexity of developing and deploying deep learning models. Trainium accelerators deliver a high-performance, low-cost solution for DL training in the cloud. The use of Trainium is facilitated by the Neuron SDK, which includes a deep learning compiler, runtime, and tools that are natively integrated into popular frameworks such as PyTorch and TensorFlow. (Note that at the time of writing, the Neuron SDK 2.9 only supports PyTorch for the development of custom operators.)

As demonstrated in this post, Trainium not only provides the means to train your models performantly and efficiently, but also offers the ability to customize your operators to add flexibility and expressiveness to both training and experimentation.

For more information, refer to the GitHub repo.


About the Authors

Lorea Arrizabalaga is a Solutions Architect aligned to the UK Public Sector, where she helps customers design ML solutions with Amazon SageMaker. She is also part of the Technical Field Community dedicated to hardware acceleration and helps with testing and benchmarking AWS Inferentia and AWS Trainium workloads.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt Amazon EC2 accelerated computing infrastructure for their machine learning needs.

Read More

Deliver your first ML use case in 8–12 weeks

Deliver your first ML use case in 8–12 weeks

Do you need help to move your organization’s Machine Learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of the ML projects make it to production.

This post describes how to implement your first ML use case using Amazon SageMaker in just 8–12 weeks by leveraging a methodology called Experience-based Acceleration (EBA).

Challenges

Customers may face several challenges when implementing machine learning (ML) solutions.

  • You may struggle to connect your ML technology efforts to your business value proposition, making it difficult for IT and business leadership to justify the investment it requires to operationalize models.
  • You may often select low-value use cases as proof of concept rather than solving a meaningful business or customer problem.
  • You may have gaps in skills and technologies, including operationalizing ML solutions, implementing ML services, and managing ML projects for rapid iterations.
  • Ensuring data quality, governance, and security may slow down or stall ML projects.

Solution overview: Machine Learning Experience-based Acceleration (ML EBA)

Machine learning EBA is a 3-day, sprint-based, interactive workshop (called a party) that uses SageMaker to accelerate business outcomes by guiding you through an accelerated and a prescriptive ML lifecycle. It starts with identifying business goals and ML problem framing, and takes you through data processing, model development, production deployment, and monitoring.

The following visual illustrates a sample ML lifecycle.

Sample Machine Learning Lifecycle

Two primary customer scenarios apply. The first is by using low-code or no-code ML services such as Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon SageMaker JumpStart to help data analysts prepare data, build models, and generate predictions. The second is by using SageMaker to help data scientists and ML engineers build, train, and deploy custom ML models.

We recognize that customers have different starting points. If you’re starting from scratch, it’s often simpler to begin with low-code or no-code solutions and gradually transition to developing custom models. In contrast, if you have an existing on-premises ML infrastructure, you can begin directly by using SageMaker to alleviate challenges with your current solution.

Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for a continued ML adoption. This party steers you to solve a compelling business problem as opposed to thinking in terms of data and ML technology environments. Additionally, the party gets you started on driving material business value from untapped data.

ML EBA helps you to think big, start small, and scale fast. Although it creates a minimum viable ML model in 3 days, there are 4–6 weeks of preparation leading up to the EBA. Furthermore, you spend 4–6 weeks post-EBA to fine-tune the model with additional feature engineering and hyperparameter optimization before production deployment.

Let’s dive into what the whole process looks like and how you can use the ML EBA methodology to address the common blockers.

EBA prep (4–6 weeks)

In this section, we detail the 4–6 weeks of preparation leading up to the EBA.

6 weeks before the party: Problem framing and qualification

The first step is to frame and qualify the ML problem, which includes the following:

  • Identify the right business outcome – You must have a clear understanding of the problem you are trying to solve and the desired outcome you hope to achieve through the use of ML. You must be able to measure the business value gained against specific objectives and success criteria. Furthermore, you must be able to identify what should be observed, and what should be predicted. AWS works with you to help answer the following important questions before embarking on the ML EBA:
    • Does the ML use case solve a meaningful business problem?
    • Is it important enough to get the attention of business leadership?
    • Do you already have data to solve the ML use case?
    • Can the use case eventually be operationalized into production?
    • Does it really require ML?
    • Are there organizational processes in place for the business to use the model’s output?

The AI Use Case Explorer is a good starting point to explore the right use cases by industry, business function, or desired business outcome and discover relevant customer success stories.

  • Executive sponsorship – To help you move faster than you would have organically, AWS meets with the executive sponsor to confirm buy-in, remove internal obstacles, and commit resources. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.
  • Meeting you where you are at in your ML journey – AWS assesses your current state—people, process, and technology. We help you detail requirements and dependencies; specifically, what teams and data are required to begin the journey successfully. Additionally, we provide recommendations on the technical path: starting with low-code or no-code services, or building a custom model using SageMaker.

5 weeks before the party: Workstream configuration and transition into action

The next step is to identify the teams needed to support the EBA effort. Commonly, the work is split up between the following workstreams:

  • Cloud engineering (infrastructure and security) – Focuses on verifying that the AWS accounts and infrastructure are set up and secure ahead of EBA. This includes AWS Identity and Access Management (IAM) or single sign-on (SSO) access, security guardrails, Amazon SageMaker Studio provisioning, automated stop/start to save costs, and Amazon Simple Storage Service (Amazon S3) set up.
  • Data engineering – Identifies the data sources, sets up data ingestion and pipelines, and prepares data using Data Wrangler.
  • Data science – The heart of ML EBA and focuses on feature engineering, model training, hyperparameter tuning, and model validation.
  • MLOps engineering – Focuses on automating the DevOps pipelines for operationalizing the ML use case. This may often be the same team as cloud engineering.
  • Leadership team – Responsible for orchestrating the effort, removing blockers, aligning with the executive sponsors, and is ultimately accountable for delivering the expected outcomes.

After these efforts have been completed, we must transition into action. A standard baseline 4-week timeline should be strictly adhered to make sure the EBA stays on track. Experienced AWS subject matter experts will guide and coach you through this preparation leading up to the EBA party.

4 weeks before the party: Inspire builders and curate a technical plan

Every customer is different; AWS helps you curate a technical plan of activities to be completed in the next 4 weeks leading up to the party.

AWS conducts Immersion Days to inspire your builders and build momentum for the party. An Immersion Day is a half or full day workshop with the right mix of presentation, hands-on labs, and Q&A to introduce AWS services or solutions. AWS will help you select the right Immersion Days from the AI/ML Workshops catalog.

We recognize that every builder in your organization is at a different level. We recommend that your builders use the ML ramp-up guide resources or digital or classroom training to start where they are at and build the necessary skills for the party.

3 weeks before the party: Tech prep focused on cloud and data engineering

Your cloud and data engineering teams should work on the following with guidance from AWS:

  • Create AWS accounts with network and security set up
  • Set up Amazon SageMaker Studio
  • Create Amazon S3 buckets to store data
  • Identify data sources (or producers)
  • Integrate external sources to dump data into S3 buckets

2 weeks before the party: Tech prep focused on data science

Your data science team should work on the following with guidance from AWS:

1 week before the party: Assess readiness (go/no-go)

AWS works with you to assess go/no-go readiness for technical activities, skills, and momentum for the party. Then we solidify the scope for the 3-day party, prioritizing progress over perfection.

EBA (3-day party)

Although the EBA party itself is customized for your organization, the recommended agenda for the 3 days is shown in the following table. You will learn by doing during the EBA with guidance from AWS subject matter experts.

. Day 1 Day 2 Day 3
Data Science

AM: Try AutoPilot or JumpStart models.

PM: Pick 1–2 models based on AutoPilot outcomes to experiment further.

Improve model accuracy:

  • In-depth feature engineering (example, PCA)
  • Hyperparameter optimization (HPO)

Quality assurance and validation with test data.

Deploy to production (inference endpoint).

Monitoring setup (model, data drift).

Data Engineering Explore using feature store for future ML use cases. Create a backlog of items for data governance and associated guardrails.
Cloud/MLOps Engineering Evaluate the MLOps framework solution library. Assess if this can be used for a repeatable MLOps framework. Identify gaps and create a backlog of things to enhance the solution library or create your own MLOps framework. Implement backlog items to create a repeatable MLOps framework. Continue implementing backlog items to create a repeatable MLOps framework.

Post-EBA

ML involves extensive experimentation, and it’s common to not reach your desired model accuracy during the 3-day EBA. Therefore, creating a well-defined backlog or path to production is essential, including improving model accuracy through experimentation, feature engineering, hyperparameter optimization, and production deployment. AWS will continue to assist you through production deployment.

Conclusion

By complementing ML EBA methodology with SageMaker, you can achieve the following results:

  • Move from pilot to production value in 8-12 weeks – Bring together business and technology teams to deploy the first ML use case to production in 8-12 weeks.
  • Build the organizational capability to speed up and scale ML across lines of business – The ML EBA inspires and up-skills builders with real work experience. It establishes a successful working model (a collaboration and iteration model) to sustain and scale ML initiatives across lines of business. It also creates reusable assets to speed up and scale ML in a repeatable way.
  • Reduce technical debt, pain points, and cost from existing on-premises ML models – The on-premises solutions may have challenges related to higher costs, inability to scale infrastructure, undifferentiated infrastructure management, and lack of advanced feature sets such as hyperparameter optimization, explainability for predictions, and more. Adoption of AWS ML services such as SageMaker reduces these issues.

Contact your AWS account team (Account Manager or Customer Solutions Manager) to learn more and get started.


About the Authors

Ritesh Shah is Senior Customer Solutions Manager at Amazon Web Services. He helps large US-Central enterprises accelerate their cloud-enabled transformation and build modern cloud-native solutions. He is passionate about accelerating customers’ ML journeys. In his free time, Ritesh enjoys spending time with his daughter, cooking, and learning something new, while also evangelizing cloud and ML. Connect with him on LinkedIn.

Nicholaus Lawson is a Solution Architect at AWS and part of the AIML specialty group. He has a background in software engineering and AI research. Outside of work, Nicholaus is often coding, learning something new, or woodworking. Connect with him on LinkedIn.

Read More

Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes

Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes

We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated developer environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically carry out several iterations of experimentation in data processing and training models while working on any ML problem. They want to run this ML code and carry out the experimentation with ease of use and minimal code change. Amazon SageMaker Model Training helps data scientists run fully managed large-scale training jobs on AWS’s compute infrastructure. SageMaker Training also helps data scientists with advanced tools such as Amazon SageMaker Debugger and Profiler to debug and analyze their large-scale training jobs.

For customers with small budgets, small teams, and tight timelines, every single new concept and line of code rewritten to run on SageMaker makes them less productive towards their core tasks, namely data processing and training ML models. They want to write code once in the framework of their choice and be able to move seamlessly from running code in their notebooks or laptops to running code at scale using SageMaker capabilities.

With this new capability of the SageMaker Python SDK, data scientists can onboard their ML code to the SageMaker Training platform in a few minutes. You just need to add a single line of code to your ML code, and SageMaker intelligently comprehends your code along with the datasets and workspace environment setup and runs it as a SageMaker Training job. You can then take advantage of the key capabilities of the SageMaker Training platform, like the ability to scale jobs easily, and other associated tools like Debugger and Profiler. In this release, you can run your local machine learning (ML) Python code as a single-node Amazon SageMaker training job or multiple parallel jobs. Distributed training jobs(across multiple nodes) are not supported by remote functions.

In this post, we show you how to use this new capability to run local ML code as a SageMaker Training job.

Solution overview

You can now run your ML code written in your IDE or notebook as a SageMaker Training job by annotating the function, which acts as an entry point to the user’s code base, with a simple decorator. Upon invocation, this capability automatically takes a snapshot of all the associated variables, functions, packages, environment variables, and other runtime requirements from your ML code, serializes them, and submits them as a SageMaker Training job. It integrates with the recently announced SageMaker Python SDK feature for setting default values for parameters. This capability simplifies the SageMaker constructs that you need to learn to be able to run code using SageMaker Training. Data scientists can write, debug, and iterate their code in any preferred IDE (such as Amazon SageMaker Studio, notebooks, VS Code, or PyCharm). When ready, you can annotate your Python function with the @remote decorator and run it as a SageMaker job at scale.

This capability takes familiar open-source Python objects as arguments and outputs. Furthermore, you don’t need to understand container lifecycle management and can simply run your workloads across different compute contexts (such as a local IDE, Studio, or training jobs) with minimal configuration overheads. To run any local code as a SageMaker Training job, this capability infers the configurations required to run jobs, such as the AWS Identity and Access Management (IAM) role, encryption key, and network configuration, from the Studio or IDE settings (which can be the default settings) and passes them to the platform by default. You have the flexibility to customize your runtime in the SageMaker managed infrastructure using the inferred configuration or override them at the SDK-level by passing them as arguments to the decorator.

This new capability of the SageMaker Python SDK transforms your ML code in an existing workspace environment and any associated data processing code and datasets into a SageMaker Training job. This capability looks for ML code wrapped inside a @remote decorator and automatically translates it into a job that runs in either Studio or a local IDE such as PyCharm.

In the following sections, we walk through the features of this new capability and how to launch python functions as SageMaker Training jobs.

Prerequisites

To use this new SageMaker Python SDK capability and run the code associated with this post, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources
  • An IAM role to access SageMaker
  • Access to Studio or a SageMaker notebook instance or an IDE such as PyCharm

Use the SDK from Studio and SageMaker notebooks

You can use this capability from Studio by launching a notebook and wrapping your code with a @remote decorator inside the notebook. You first need to import the remote function using the following code:

from sagemaker.remote_function import remote

When you use the decorator function, this capability will automatically interpret the function of your code and run it as a SageMaker Training job.

You can also use this capability from a SageMaker notebook instance. You first need to start a notebook instance, open Jupyter or Jupyter Lab on it, and launch a notebook. Then import the remote function as shown in the preceding code and wrap your code with the @remote decorator. We include an example of how to use the decorator function and the associated settings later in this post.

Use the SDK from your local environment

You can also use this capability from your local IDE. As a prerequisite, you must have the AWS Command Line Interface (AWS CLI), SageMaker Python SDK, and AWS SDK for Python (Boto3) installed in your local environment. You need to import these libraries in your code, set the SageMaker session, specify settings, and decorate your function with the @remote decorator. In the following example code, we run a simple divide function as a SageMaker Training job:

import boto3
import sagemaker
from sagemaker.remote_function import remote

sm_session = sagemaker.Session(boto_session=boto3.session.Session(region_name="us-west-2"))
settings = dict(
    sagemaker_session=sm_session,
    role=<IAM_ROLE_NAME>
    instance_type="ml.m5.xlarge",
)
@remote(**settings)
def divide(x, y):
    return x / y
if __name__ == "__main__":
    print(divide(2, 3.0))

We can use a similar methodology to run advanced functions as training jobs, as shown in the next section.

Launch Python functions as SageMaker jobs

The new SageMaker Python SDK feature allows you to run Python functions as SageMaker Training jobs. Any Python code, ML training code developed by data scientists using their preferred local IDEs (PyCharm, VS Code), SageMaker notebooks, or Studio notebooks can be launched as a managed SageMaker job.

In ML workloads using this capability, associated datasets, dependencies, and workspace environment setups are serialized using the ML code and run as a SageMaker job synchronously and asynchronously.

You can add a @remote decorator annotation to any Python code including a local ML processing or training function to launch it as a managed SageMaker Training job, thereby taking advantage of the scale, performance, and cost benefits of SageMaker. This can be achieved with minimal code changes by adding a decorator to the Python function code. Invocation to the decorated function is run synchronously, and the function run waits until the SageMaker job is complete.

In the following example, we use the @remote decorator to launch SageMaker jobs in decorator mode using an ml.m5.large instance. SageMaker uses training jobs to launch this function as a managed job.

from sagemaker.remote_function import remote
from numpy as np

@remote(instance_type="ml.m5.large")
def matrix_multiply(a, b):
    return np.matmul(a, b)

a = np.array([[1, 0], [0, 1]])
b = np.array([1, 2])

assert matrix_multiply(a, b) == np.array([1,2])

You can also use decorator mode to launch SageMaker jobs, Python packages, and dependencies. You can include environment variables such as VPC, subnets, and security groups to launch SageMaker training jobs in the environment.yml file. This allows ML engineers and admins to configure these environment variables so data scientists can focus on ML model building and iterate faster. See the following code:

from sagemaker.remote_function import remote

@remote(instance_type="ml.g4dn.xlarge",dependencies = "./environment.yml")
def train_hf_model(
    train_input_path,test_input_path,s3_output_path = None,
    *,epochs = 1, train_batch_size = 32, eval_batch_size = 64,
    warmup_steps = 500,learning_rate = 5e-5
    ):  
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ... <TRUCNATED>
    return os.path.join(s3_output_path, model_dir), eval_result

You can use RemoteExecutor to launch Python functions as SageMaker jobs asynchronously. The executor asynchronously polls SageMaker Training jobs to update the status of the job. The RemoteExecutor class is an implementation of the concurrent.futures.Executor, which is used to submit SageMaker Training jobs asynchronously. See the following code:

from sagemaker.remote_function import RemoteExecutor

def train_hf_model(
    train_input_path,test_input_path,s3_output_path = None,
    *,epochs = 1, train_batch_size = 32, eval_batch_size = 64,
    warmup_steps = 500,learning_rate = 5e-5
    ):  
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ...<TRUNCATED>
    return os.path.join(s3_output_path, model_dir), eval_result


with RemoteExecutor(instance_type="ml.g4dn.xlarge", dependencies = './requirements.txt') as e:
    future = e.submit(divide, train_input_path,test_input_path,s3_output_path,
                      epochs, train_batch_size, eval_batch_size,warmup_steps,learning_rate)

Customize the runtime environment

Decorator mode and RemoteExecutor allow you to define and customize the runtime environments for the SageMaker job. The runtime dependencies, including Python packages and environment variables for SageMaker jobs, can be specified to customize the runtime. In order to run local Python code as SageMaker managed jobs, the Python package and dependencies need to be made available to SageMaker. ML engineers or data science administrators can configure networking and security configurations such as VPC, subnets, and security groups for SageMaker jobs, so data scientists can use these centrally managed configurations while launching SageMaker jobs. You can use either a requirements.txt file or a Conda environment.yaml file.

When dependencies are defined with requirements.txt, the packages will be installed using pip in the job runtime. If the image used for running the job comes with Conda environments, packages will be installed in the Conda environment declared to use for jobs. The following code shows an example requirements.txt file:

datasets
transformers
torch
scikit-learn
s3fs==0.4.2
sagemaker>=2.148.0

You can pass your Conda environment.yaml file to create the Conda environment you would like your code to run in during the training job. If the image used for running the job declares a Conda environment to run the code under, we will update the declared Conda environment with the given specification. The following code is an example of a Conda environment.yaml file:

name: sagemaker_example
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas
  - pip:
      - sagemaker

Alternatively, you can set dependencies=”auto_capture” to let the SageMaker Python SDK capture the installed dependencies in the active Conda environment. You must have an active Conda environment for auto_capture to work. Note that there are prerequisites for auto_capture to work; we recommend that you pass in your dependencies as a requirement.txt or Conda environment.yml file as described in the previous section.

For more details, refer to Run your local code as a SageMaker Training job.

Configurations for SageMaker jobs

Infrastructure-related settings can be offloaded to a configuration file that admin users could help set up. You only need to set it up one time. Infrastructure settings cover the network configuration, IAM roles, Amazon Simple Storage Service (Amazon S3) folder for input, output data, and tags. Refer to Configuring and using defaults with the SageMaker Python SDK for more details.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: path/to/requirements.txt
        EnvironmentVariables: {"EnvVarKey": "EnvVarValue"}
        ImageUri: 366666666666.dkr.ecr.us-west-2.amazonaws.com/my-image:latest
        InstanceType: ml.m5.large
        RoleArn: arn:aws:iam::366666666666:role/MyRole
        S3KmsKeyId: somekmskeyid
        S3RootUri: s3://my-bucket/my-project
        SecurityGroupIds:
          - sg123
        Subnets:
          - subnet-1234
        Tags:
          - {"Key": "someTagKey", "Value": "someTagValue"}
        VolumeKmsKeyId: somekmskeyid

Implementation

Deep learning models like PyTorch or TensorFlow can also be run within Studio by running the code as a training job within the notebook. To showcase this capability in Studio, you can clone this repo into your Studio and run the notebook located in the GitHub repository.

This example demonstrates an end-to-end binary text classification use case. We are using the Hugging Face transformers and datasets library to fine-tune a pre-trained transformer on binary text classification. In particular, the pre-trained model will be fine-tuned using the IMDb dataset.

When you clone the repository, you should locate the following files:

  • config.yaml – Most of the decorator arguments can be offloaded to the configuration file in order to separate out the infrastructure-related settings from the code base
  • huggingface.ipynb – This contains the code to train a pre-trained HuggingFace model, which will be fine-tuned using the IMDB dataset
  • requirements.txt – This file contains all the dependencies to run the function that will be used in this notebook for running the code and running the training remotely in a GPU instance as a training job

When you open the notebook, you will be prompted to set up the notebook environment. You can select the Data Science 3.0 image with the Python 3 kernel and ml.m5.large as the fast launch instance type for running the notebook code. This instance type is significantly faster in spinning up an environment.

The training job will be run in an ml.g4dn.xlarge instance as defined in the config.yaml file:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <IAM_ROLE_ARN>
        InstanceType: ml.g4dn.xlarge
        Dependencies: ./requirements.txt

The requirements.txt file dependencies to run the function for training the Hugging Face model include the following:

datasets
transformers
torch
scikit-learn
# lock s3fs to this specific version as more recent ones introduce dependency on aiobotocore, which is not compatible with botocore
s3fs==0.4.2
sagemaker>=2.148.0,<3

The Hugging Face notebook showcases how to run the training remotely via the @remote function, which is run synchronously. Therefore, the function run for training the model will wait until the SageMaker Training job is complete. The training will be run remotely with a GPU instance wherein the instance type is defined in the preceding configuration file.

from sagemaker.remote_function import remote

@remote(s3_root_uri=s3_root_folder, keep_alive_period_in_seconds=600)
def train_hf_model(
    train_input_path,
    test_input_path,
    s3_output_path = None,
    *,
    epochs = 1,
    train_batch_size = 32,
    eval_batch_size = 64,
    warmup_steps = 500,
    learning_rate = 5e-5
):  
    model_dir = 'model'

    train_dataset = load_from_disk(train_input_path, keep_in_memory=True)
    test_dataset = load_from_disk(test_input_path, keep_in_memory=True)
    
    model_name = 'distilbert-base-uncased'
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    
    training_args = TrainingArguments(
        output_dir=model_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=train_batch_size,
        per_device_eval_batch_size=eval_batch_size,
        warmup_steps=warmup_steps,
        evaluation_strategy="epoch",
        logging_dir="logs/",
        learning_rate=float(learning_rate),
    )

    # create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )
    
    print("Starting model training..")
    trainer.train()
        
    trainer.save_model(model_dir)

After you run the training job, you can run the rest of the cells in the notebook to inspect the evaluation metrics and classify the text on our trained model.

You can also view the training job status that got remotely triggered in the GPU instance on the SageMaker dashboard by navigating back to the SageMaker console.

As soon as the training job is complete, it continues to run the instructions in the notebook for evaluation and classification. Similar jobs can be trained and run via the remote executor function embedded within Studio notebooks to carry out the runs asynchronously.

Integration with SageMaker experiments inside a @remote function

You can pass your experiment name, run name, and other parameters into your remote function to create a SageMaker experiments run. The following code example imports the experiment name, the name of the run, and the parameters to log for each run:

from sagemaker.remote_function import remote
from sagemaker.experiments.run import Run
# Define your remote function
@remote
def train(value_1, value_2, exp_name, run_name):
...
...
#Creates the experiment
with Run(
  experiment_name=exp_name,
  run_name=run_name,
  sagemaker_session=sagemaker_session
) as run:
...
...
#Define values for the parameters to log
run.log_parameter("param_1", value_1)
run.log_parameter("param_2", value_2)
...
...
#Define metrics to log
run.log_metric("metric_a", 0.5)
run.log_metric("metric_b", 0.1)

# Invoke your remote function
train(1.0, 2.0, "my-exp-name", "my-run-name")  

In the preceding example, the parameters p1 and p2 are logged over time inside a training loop. Common parameters may include batch size or epochs. In the example, the metrics A and B are logged for a run over time inside a training loop. Common metrics may include accuracy or loss. For more information, see Create an Amazon SageMaker Experiment.

Conclusion

In this post, we introduced a new SageMaker Python SDK capability that enables data scientists to run their ML code in their preferred IDE as SageMaker Training jobs. We discussed the prerequisites needed to use this capability along with its features. We also showed how to use this capability in Studio, SageMaker notebook instances, and your local IDE. In addition, we provided sample code examples to demonstrate how to use this capability. As a next step, we recommend trying this capability in your IDE or SageMaker by following the code examples referenced in this post.


About the Authors

Dipankar Patro is a Software Development Engineer at AWS SageMaker, innovating and building MLOps solutions to help customers adopt AI/ML solutions at scale. He has an MS in Computer Science and his areas of interest are Computer Security, Distributed Systems and AI/ML.

Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Manoj Ravi is a Senior Product Manager for Amazon SageMaker. He is passionate about building next-gen AI products and works on software and tools to make large-scale machine learning easier for customers. He holds an MBA from Haas School of Business and a Masters in Information Systems Management from Carnegie Mellon University. In his spare time, Manoj enjoys playing tennis and pursuing landscape photography.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping.

Read More

Perform intelligent search across emails in your Google workspace using the Gmail connector for Amazon Kendra

Perform intelligent search across emails in your Google workspace using the Gmail connector for Amazon Kendra

Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message content, or even email attachments. Performing an intelligent search on email interactions with coworkers can help find answers to questions, thereby improving employee productivity and enhancing the overall customer experience for the organization.

Amazon Kendra is a highly accurate and intelligent search service that allows your users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. You can now use the Gmail connector for Amazon Kendra to index emails and email attachments in Gmail, and search for answers to your questions on this content using intelligent search in Amazon Kendra, powered by machine learning (ML).

This post walks you through the process of configuring the Gmail connector for Amazon Kendra for your organization’s Google Workspace, allowing you to index emails based on a defined scope and take advantage of the intelligent search capabilities of Amazon Kendra.

Solution overview

A data source is a data repository or location that Amazon Kendra connects to and indexes your documents or content. After you create an Amazon Kendra index, you can create one or many data sources and configure them to start ingesting documents from the data source. In our solution, we ingest emails and attachments from Gmail by configuring the new Gmail data source connector to filter for emails that meet a certain filter criterion. After the configuration is complete, we can synchronize the data source to index the documents, allowing you to perform intelligent search on the Amazon Kendra index.

Prerequisites

To enable the Gmail connector for Amazon Kendra, you need the following:

  • An AWS account
  • A Google Workspace account and an organization for your business with one or many users that have access to Gmail
  • Administrator account credentials to Google Workspace and the Google Cloud console

Configure Google Workspace

To enable Amazon Kendra to access and index emails from Gmail accounts within the organization and perform intelligent search on them, it’s essential to configure your organization’s Google Workspace. In the steps that follow, we create a service account that the Gmail connector uses to index emails. The service account is provided with authorization scopes to allow access to certain Gmail APIs. The authorization scopes express the permissions you request users to authorize for your app and are applicable for all emails within your organization’s Google Workspace.

  1. Log in to your organization’s Google Cloud account.
  2. Create a new project with an appropriate name and assign it to your organization. In our example, we name the project KendraGmailConnector.
  3. Choose Create.

  1. Monitor the progress of creation of the new project on the Notifications menu on the top right of the Google Cloud console.

  1. After the project is created, choose the options menu, choose API & Services¸ and choose Library to view the API Library.

  1. On the API Library, search for Admin SDK API and choose Enable. The Admin SDK API enables managing the Google Workspace account resources and audit usage.

  1. Similarly, search for Gmail API on the API Library page and choose Enable. The Gmail API can help in viewing and managing Gmail mailbox data like threads, messages, and labels.

We now create a service account, which the Gmail connector for Amazon Kendra uses to access your organization’s emails based on the allowed API scope.

  1. On the options menu, choose IAM & Admin, then choose Service Accounts.

  1. Choose Create service account.

  1. Enter a name for your service account. For this post, we name our service account AmazonKendraGmailConnector.
  2. Enter your service account ID and account description.
  3. Skip the optional steps Grant this service account access to project and Grant users access to this service account and choose Done.

  1. Choose the service account you created to open the service account details page.
  2. Note the unique ID for the service account (also known as a client ID), to use in a later step.

Next, we create keys for the service account, which allows it to be used by the Gmail connector for Amazon Kendra.

  1. On the Keys tab, choose Add key.

  1. For Key type, select JSON.
  2. Choose Create.

This step downloads the private key to your computer, which must be kept safe to allow configuration on the Amazon Kendra console.

  1. Choose Close.

The following screenshot shows an example of the credentials JSON file.

  1. On the Details tab, expand the Advanced settings section.
  2. Under Domain-wide delegation, choose View Google Workspace admin console.

Granting access to the service account via a domain-wide delegation to your organization’s data must be done with caution, and can be reversed by disabling or deleting the service account or removing access through the Google Workspace admin console.

  1. Log in to the admin console using your Google Workspace admin credentials.
  2. In the navigation pane, under Security, choose Access and data control, then choose API controls.
  3. In the Domain-wide delegation section, choose Manage domain-wide delegation.

  1. Choose Add new.

This brings up the Add a new client ID dialog.

  1. Enter the unique ID for the service account you created earlier, and enter the following scopes to allow the service account to access the emails from Gmail:
    1. https://www.googleapis.com/auth/gmail.readonly
    2. https://www.googleapis.com/auth/admin.directory.user.readonly
  2. Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace admin console.

Configure the Gmail connector for Amazon Kendra

In this section, we walk through the configuration steps for the Gmail connector for Amazon Kendra:

  1. On the Amazon Kendra console, create a new index or open an existing index. For this post, we use the existing index EnterpriseKendraIndex.

  1. Under Data management in the navigation pane, choose Data sources.
  2. Choose Add data source.

  1. On the list of data sources, find the Gmail connector and choose Add connector.

  1. On the Specify data source details page, complete the following steps:
    1. For Data source name, enter a name.
    2. For Description, enter an optional description.
    3. Leave the language as the default setting, English (en).

    Amazon Kendra supports a select set of languages with full semantic search. These languages include Spanish, Japanese, French, and others. For more information, see Adding documents in languages other than English.

    1. Add any tags to the index, then choose Next.

Next, we create an AWS Secrets Manager secret to store the Gmail authentication details, and use the values in the credentials JSON file that we downloaded earlier.

  1. On the Define access and security page, complete the following steps:
      1. In the Authentication section, choose Create and add new secret, which opens the Create an AWS Secrets Manager secret dialog.
      2. For Secret name, enter a name.
      3. For Client email, enter the client email ID from the credentials JSON file.
      4. For Admin account email, enter the admin email for the Google Cloud console.
      5. For Private key, enter the private key from the credentials JSON file.
      6. Choose Save to return to the Define access and security page.

      1. In the Configure VPC and security group section, you can choose a VPC and the subnets that will contain the data source and security group that will grant access to the host. For our configuration, we choose No VPC.
      2. In the IAM role section, choose Create a new role and enter a role name.
      3. Choose Next.

  1. On the Configure sync settings page, set the following parameters to sync all emails and email attachments sent from the admin email address:
    1. In the Sync scope section, select Message attachments.
    2. Under Additional configuration, configure filters for the emails to ingest into the Amazon Kendra index:
      1. For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
      2. For Email domains, enter the email from domains, email to domains, subject, CC, and BCC emails you wish to include or exclude in your index. For this post, we set the email from domain as the admin email address.
      3. For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects.
      4. For Labels, add regular expression patterns to include or exclude certain labels or attachment types (up to 100 patterns).
      5. For Attachments, add regular expression patterns to include or exclude certain attachments (up to 100 patterns).

    1. In the Sync mode section, you can either specify a full sync to sync and index all contents in all entities regardless of the previous sync status, or only sync new, modified, or deleted content. For this post, we select Full sync.

    1. Lastly, we set an appropriate frequency for the sync. For this post, we choose Run on demand.
    2. Choose Next.
  1. On the Set field mappings page, you associate or create a mapping of the required data source fields with fields in your index. You can also create mappings for custom index fields. You can specify mapping for both messages and message attachments. For this post, we add field mappings in the Message section:
    1. Select the Gmail field mappings subject, from, and to.
    2. Choose Next.

  1. On the Review and create page, review all the steps and choose Add data source to create your Gmail connector data source.
  2. After the data source is created, on the Data sources page, select the data source (kendra-gmail-connector) and choose Sync now.

The amount of time the sync takes depends on the number of the emails that match the sync scope and the size of attachments that need be indexed. You can check the status of the sync operation for the Gmail data source if you choose the data source and scroll down to the Sync run history section. Choose the status of the individual sync to view more details.

This section shows the start and end times of the sync and also the number of documents that were added, deleted, failed, or modified during the sync. A status of Completed denotes a sync where there are no failures. In cases where a document being ingested is blank, the sync status is set to Completed with Errors with the number of failed documents listed as Failed, as shown in the following screenshot. In case of a sync failure, you can investigate the reason by either choosing the number of failed documents or by choosing the entry in the Details column, which brings up the Amazon CloudWatch logs. In the following example, two documents failed ingestion because they were blank.

After the sync is successful, you can perform a search on the Amazon Kendra index.

Search indexed content

To search on the indexed content, choose Search indexed content in the navigation pane on the Amazon Kendra console.

On the search console, enter any natural language question. In our example, we ask “What is SageMaker.” Amazon Kendra performs an intelligent search on the emails ingested into the index based on the scope of the sync and finds an answer, as shown in the following screenshot.

In this example, the Document fields section shows the field mappings that we specified while configuring our data source connector.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Gmail connector, delete the added data source.

Conclusion

In this post, we showed how organizations can now use the Gmail connector for Amazon Kendra to allow users to perform intelligent search on emails and email attachments, thereby improving employee productivity and customer satisfaction.

Additionally, we walked through how to define field mappings to the Amazon Kendra data source, allowing users to refine their search results.

To learn more about the Gmail connector for Amazon Kendra, refer to Gmail data source connector for Amazon Kendra.


About the Author

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia, and works closely with power and utilities customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.

Read More

Amazon SageMaker Data Wrangler for dimensionality reduction

Amazon SageMaker Data Wrangler for dimensionality reduction

In the world of machine learning (ML), the quality of the dataset is of significant importance to model predictability. Although more data is usually better, large datasets with a high number of features can sometimes lead to non-optimal model performance due to the curse of dimensionality. Analysts can spend a significant amount of time transforming data to improve model performance. Additionally, large datasets are costlier and take longer to train. If time is a constraint, model performance may be limited as a result.

Dimension reduction techniques can help reduce the size of your data while maintaining its information, resulting in quicker training times, lower cost, and potentially higher-performing models.

Amazon SageMaker Data Wrangler is a purpose-built data aggregation and preparation tool for ML. Data Wrangler simplifies the process of data preparation and feature engineering like data selection, cleansing, exploration, and visualization from a single visual interface. Data Wrangler has more than 300 preconfigured data transformations that can effectively be used in transforming the data. In addition, you can write custom transformation in PySpark, SQL, and pandas.

Today, we’re excited to add a new transformation technique that is commonly used in the ML world to the list of Data Wrangler pre-built transformations: dimensionality reduction using Principal Component Analysis. With this new feature, you can reduce the high number of dimensions in your datasets to one that can be used with popular ML algorithms with just a few clicks on the Data Wrangler console. This can have significant improvements in your model performance with minimal effort.

In this post, we provide an overview of this new feature and show how to use it in your data transformation. We will show how to use dimensionality reduction on large sparse datasets.

Overview of Principal Component Analysis

Principal Component Analysis (PCA) is a method by which the dimensionality of features can be transformed in a dataset with many numerical features into one with fewer features while still retaining as much information as possible from the original dataset. This is done by finding a new set of features called components, which are composites of the original features that are uncorrelated with one another. Several features in a dataset often have less impact on the final result and may increase the processing time of ML models. It can become difficult for humans to understand and solve such high-dimensional problems. Dimensionality reduction techniques like PCA can help solve this for us.

Solution overview

In this post, we show how you can use the dimensionality reduction transform in Data Wrangler on the MNIST dataset to reduce the number of features by 85% and still achieve similar or better accuracy than the original dataset. The MNIST (Modified National Institute of Standards and Technology) dataset, which is the de facto “hello world” dataset in computer vision, is a dataset of handwritten images. Each row of the dataset corresponds to a single image that is 28 x 28 pixels, for a total of 784 pixels. Each pixel is represented by a single feature in the dataset with a pixel value ranging from 0–255.

To learn more about the new dimensionality reduction feature, refer to Reduce Dimensionality within a Dataset.

Prerequisites

This post assumes that you have an Amazon SageMaker Studio domain set up. For details on how to set it up, refer to Onboard to Amazon SageMaker Domain Using Quick setup.

To get started with the new capabilities of Data Wrangler, open Studio after upgrading to the latest release and choose the File menu, New, and Flow, or choose New data flow from the Studio launcher.

Perform a Quick Model analysis

The dataset we use in this post contains 60,000 training examples and labels. Each row consists of 785 values: the first value is the label (a number from 0–9) and the remaining 784 values are the pixel values (a number from 0–255). First, we perform a Quick Model analysis on the raw data to get performance metrics and compare them with the model metrics post-PCA transformations for evaluation. Complete the following steps:

  1. Download the MNIST dataset training dataset.
  2. Extract the data from the .zip file and upload into an Amazon Simple Storage Service (Amazon S3) bucket.
  3. In Studio, choose New and Data Wrangler Flow to create a new Data Wrangler flow.
    Data Wrangler Flow
  4. Choose Import data to load the data from Amazon S3.
    Import Data
  5. Choose Amazon S3 as your data source.
    Choose S3 data connection
  6. Select the dataset uploaded to your S3 bucket.
  7. Leave the default settings and choose Import.
    Import from S3

After the data is imported, Data Wrangler automatically validates the datasets and detects the data types for all the columns based on its sampling. In the MNIST dataset, because all the columns are long, we leave this step as is and go back to the data flow.

  1. Choose Data flow at the top of the Data types page to return to the main data flow.
    Dataset Data Types

The flow editor now shows two blocks showcasing that the data was imported from a source and the data types recognized. You can also edit the data types if needed.

After confirming that the data quality is acceptable, we go back to the data flow and use Data Wrangler’s Data Quality and Insights Report. This report performs an analysis on the imported dataset and provides information about missing values, outliers, target leakage, imbalanced data, and a Quick Model analysis. Refer to Get Insights On Data and Data Quality for more information.

For this analysis, we only focus on the Quick Model part of the Data Quality report.

  1. Choose the plus sign next to Data types, then choose Add analysis.
  2. For Analysis type¸ choose Data Quality And Insights Report.
  3. For Target column, choose label.
  4. For Problem type, select Classification (this step is optional).
  5. Choose Create.
    Create Analysis

For this post, we use the Data Quality and Insights Report to show how the model performance is mostly preserved using PCA. We recommend that you use a deep learning-based approach for better performance.

The following screenshot shows a summary of the dataset from the report. Fortunately, we don’t have any missing values. The time taken for the report to generate depends on the size of the dataset, number of features, and the instance size used by Data Wrangler.

Data Quality Summary

The following screenshot shows how the model performed on the raw dataset. Here we notice that the model has an accuracy of 93.7% utilizing 784 features.

Quick Model Before PCA

Use the Data Wrangler dimensionality reduction transform

Now let’s use the Data Wrangler dimensionality reduction transform to reduce the number of features in this dataset.

  1. On the data flow page, choose the plus sign next to Data types, then choose Add transform.
  2. Choose Add step.
    Add Step
  3. Choose Dimensionality Reduction.
    Dimensionality Reduction

If you don’t see the dimensionality reduction option listed, you need to update Data Wrangler. For instructions, refer to Update Data Wrangler.

  1. Configure the key variables that go into PCA:
    1. For Transform, choose the dimensionality reduction technique that you want to use. For this post, we choose Principal component analysis.
    2. For Input Columns, choose the columns that you want to include in the PCA analysis. For this example, we choose all the features except the target column label (you can also use the Select all feature to select all features and deselect features not needed). These columns need to be of numeric data type.
    3. For Number of principal components, specify the number of target dimensions.
    4. For Variance threshold percentage, specify the percentage of variation in the data that you want to explain by the principal components. The default value is 95; for this post, we use 80.
    5. Select Center to center the data with the mean before scaling.
    6. Select Scale to scale the data with the unit standard deviation.
      PCA gives more emphasis to variables with high variance. Therefore, if the dimensions are not scaled, we will get inconsistent results. For example, the value for one variable might lie in the range of 50–100, and another variable is 5–10. In this case, PCA will give more weight to the first variable. Such issues can be resolved by scaling the dataset before applying PCA.
    7. For Output Format, specify if you want to output components into separate columns or vectors. For this post, we choose Columns.
    8. For Output column, enter a prefix for column names generated by PCA. For this post, we enter PCA80_.
  2. Choose Preview to preview the data, then choose Update.

After applying PCA, the number of columns will be reduced from 784 to 115—this is an 85% reduction in the number of features.

We can now use the transformed dataset and generate another Data Quality and Insights Report as shown in the following screenshot to observe the model performance.

Quick Model After PCA

We can see in the second analysis that the model performance has improved and accuracy increased to 91.8% compared to the first Quick Model report. PCA reduced the number of features in our dataset by 85% while maintaining the model accuracy at similar levels.

Based on the Quick Model analysis from the report, model performance is at 91.8%. With PCA, we reduced the columns by 85% while still maintaining the model accuracy at similar levels. For better results, you can try deep learning models, which might offer even better performance.

We found the following comparison in training time using Amazon SageMaker Autopilot with and without PCA dimensionality reduction:

  • With PCA dimensional reduction – 25 minutes
  • Without PCA dimensional reduction – 45 minutes

Operationalizing PCA

As data changes over time, it’s often desirable to retrain our parameters to new unseen data. Data Wrangler offers this capability through the use of refitting parameters. For more information on refitting trained parameters, refer to Refit trained parameters on large datasets using Amazon SageMaker Data Wrangler.

Previously, we applied PCA to a sample of the MNIST dataset containing 50,000 sample rows. Consequently, our flow file contains a model that has been trained on this sample and used for all created jobs unless we specify that we want to relearn those parameters.

To refit your model parameters on the MNIST training dataset, complete the following steps:

  1. Create a destination for our flow file in Amazon S3 so we can create a Data Wrangler processing job.
    Data Wrangler Processing Job
  2. Create a job and select Refit to learn new training parameters.

The Trained parameters section shows that there are 784 parameters. That is one parameter for each column because we excluded the label column in our PCA reduction.

Note that if we don’t select Refit in this step, the trained parameters learned during interactive mode will be used.

Create Job

  1. Create the job.
    Job Created
  2. Choose the processing job link to monitor the job and find the location of the resulting flow file on Amazon S3.
    Processing Job Flow

This flow file contains the model learned on the entire MNIST train dataset.

  1. Load this file into Data Wrangler.

Clean up

To clean up the environment so you don’t incur additional charges, delete the datasets and artifacts in Amazon S3. Additionally, delete the data flow file in Studio and shut down the instance it runs on. Refer to Shut Down Data Wrangler for more information.

Conclusion

Dimensionality reduction is a great technique to remove the unwanted variables from a model. It can be used to reduce the model complexity and noise in the data, thereby mitigating the common problem of overfitting in machine learning and deep learning models. In this blog we demonstrated that by reducing the number of features, we were still able to accomplish similar or higher accuracy for our models.

For more information about using PCA, refer to Principal Component Analysis (PCA) Algorithm. To learn more about the dimensionality reduction transform, refer to Reduce Dimensionality within a Dataset.


About the authors

Adeleke Coker is a Global Solutions Architect with AWS. He works with customers globally to provide guidance and technical assistance in deploying production workloads at scale on AWS. In his spare time, he enjoys learning, reading, gaming and watching sport events.

Abigail is a Software Development Engineer at Amazon SageMaker. She is passionate about helping customers prepare their data in DataWrangler and building distributed machine learning systems. In her free time, Abigail enjoys traveling, hiking, skiing, and baking.

Vishaal Kapoor is a Senior Applied Scientist with AWS AI. He is passionate about helping customers understand their data in Data Wrangler. In his spare time, he mountain bikes, snowboards, and spends time with his family.

Raviteja Yelamanchili is an Enterprise Solutions Architect with Amazon Web Services based in New York. He works with large financial services enterprise customers to design and deploy highly secure, scalable, reliable, and cost-effective applications on the cloud. He brings over 11+ years of risk management, technology consulting, data analytics, and machine learning experience. When he is not helping customers, he enjoys traveling and playing PS5.

Read More

Identify objections in customer conversations using Amazon Comprehend to enhance customer experience without ML expertise

Identify objections in customer conversations using Amazon Comprehend to enhance customer experience without ML expertise

According to a PWC report, 32% of retail customers churn after one negative experience, and 73% of customers say that customer experience influences their purchase decisions. In the global retail industry, pre- and post-sales support are both important aspects of customer care. Numerous methods, including email, live chat, bots, and phone calls, are used to provide customer assistance. Since conversational AI has improved in recent years, many businesses have adopted cutting-edge technologies like AI-powered chatbots and AI-powered agent support to improve customer service while increasing productivity and lowering costs.

Amazon Comprehend is a fully managed and continuously trained natural language processing (NLP) service that can extract insight about the content of a document or text. In this post, we explore how AWS customer Pro360 used the Amazon Comprehend custom classification API, which enables you to easily build custom text classification models using your business-specific labels without requiring you to learn machine learning (ML), to improve customer experience and reduce operational costs.

Pro360: Accurately detect customer objections in chatbots

Pro360 is a marketplace that aims to connect specialists with industry-specific talents with potential clients, allowing them to find new opportunities and expand their professional network. It allows customers to communicate directly with experts and negotiate a customized price for their services based on their individual requirements. Pro360 charges clients when successful matches occur between specialists and clients.

Pro360 had to deal with a problem related to unreliable charges that led to consumer complaints and reduced trust with the brand. The problem was that it was difficult to understand the customer’s objective during convoluted conversations filled with multiple aims, courteous denials, and indirect communication. Such conversations were leading to erroneous charges that reduced customer satisfaction. As an example, a customer may start a conversation and stop immediately, or end the conversation by politely declining by saying “I am busy” or “Let me chew on it.” Also, due to cultural differences, some customers might not be used to expressing their intentions clearly, particularly when they want to say “no.” This made it even more challenging.

To solve this problem, Pro360 initially added options and choices for the customer, such as “I would like more information” or “No, I have other options.” Instead of typing their own question or query, the customer simply chooses the options provided. Nonetheless, the problem was still not solved because customers preferred to speak plainly and in their own natural language while interacting with the system. Pro360 identified that the problem was a result of rules-based systems, and that moving to an NLP-based solution would result in a better understanding of customer intent, and lead to better customer satisfaction.

Custom classification is a feature of Amazon Comprehend, which allows you to develop your own classifiers using small datasets. Pro360 utilized this feature to build a model with 99.2% accuracy by training on 800 data points and testing on 300 data points. They followed a three-step approach to build and iterate the model to achieve their desired level of accuracy from 82% to 99.3%. Firstly, Pro360 defined two classes, reject and non-reject, that they wanted to use for classification. Secondly, they removed irrelevant emojis and symbols such as ~ and ... and identified negative emojis to improve the model’s accuracy. Lastly, they defined three additional content classifications to improve the misidentification rate, including small talk, ambiguous response, and reject with a reason, to further iterate the model.

In this post, we share how Pro360 utilized Amazon Comprehend to track down consumer objections during discussions and used a human-in-the-loop (HITL) mechanism to incorporate customer feedback into the model’s improvement and accuracy, demonstrating the ease of use and efficiency of Amazon Comprehend.

“Initially, I believed that implementing AI would be costly. However, the discovery of Amazon Comprehend enables us to efficiently and economically bring an NLP model from concept to implementation in a mere 1.5 months. We’re grateful for the support provided by the AWS account team, solution architecture team, and ML experts from the SSO and service team.”

– LC Lee, founder and CEO of Pro360.

Solution overview

The following diagram illustrates the solution architecture covering real-time inference, feedback workflow, and human review workflow, and how those components contribute to the Amazon Comprehend training workflow.

solution architecture

In the following sections, we walk you through each step in the workflow.

Real-time text classification

To use Amazon Comprehend custom classification in real time, you need to deploy an API as the entry point and call an Amazon Comprehend model to conduct real-time text classification. The steps are as follows:

  1. The client side calls Amazon API Gateway as the entry point to provide a client message as input.
  2. API Gateway passes the request to AWS Lambda and calls the API from Amazon DynamoDB and Amazon Comprehend in Steps 3 and 4.
  3. Lambda checks the current version of the Amazon Comprehend endpoint that stores data in DynamoDB, and calls an Amazon Comprehend endpoint to get real-time inference.
  4. Lambda, with a built-in rule, checks the score to determine whether it’s under the threshold or not. It then stores that data in DynamoDB and waits for human approval to confirm the evaluation result.

Feedback workflow

When the endpoint returns the classification result to the client side, the application prompts the end-user with a hint to get their feedback, and stores the data in the database for the next round (the training workflow). The steps for the feedback workflow are as follows:

  1. The client side sends the user feedback by calling API Gateway.
  2. API Gateway bypasses the request to Lambda. Lambda checks the format and stores it in DynamoDB.
  3. The user feedback from Lambda is stored in DynamoDB and will be used for the next training process.

Human review workflow

The human review process helps us clarify data with a confidence score below the threshold. This data is valuable for improving the Amazon Comprehend model, and is added to the next iteration of retraining. We used Elastic Load Balancing as the entry point to conduct this process because the Pro360 system is built on Amazon Elastic Complute Cloud (Amazon EC2). The steps for this workflow are as follows:

  1. We use an existing API on the Elastic Load Balancer as the entry point.
  2. We use Amazon EC2 as the compute resource to build a front-end dashboard for the reviewer to tag the input data with lower confidence scores.
  3. After the reviewer identifies the objection from the input data, we store the result in a DynamoDB table.

Amazon Comprehend training workflow

To start the training the Amazon Comprehend model, we need to prepare the training data. The following steps show you how to train the model:

  1. We use AWS Glue to conduct extract, transform, and load (ETL) jobs and merge the data from two different DynamoDB tables and store it in Amazon Simple Storage Service (Amazon S3).
  2. When the Amazon S3 training data is ready, we can trigger AWS Step Functions as the orchestration tool to run the training job, and we pass the S3 path into the Step Functions state machine.
  3. We invoke a Lambda function to validate that the training data path exists, and then trigger an Amazon Comprehend training job.
  4. After the training job starts, we use another Lambda function to check the training job status. If the training job is complete, we get the model metric and store it in DynamoDB for further evaluation.
  5. We check the performance of the current model with a Lambda model selection function. If the current version’s performance is better than the original one, we deploy it to the Amazon Comprehend endpoint.
  6. Then we invoke another Lambda function to check the endpoint status. The function updates information in DynamoDB for real-time text classification when the endpoint is ready.

Summary and next steps

In this post, we showed how Amazon Comprehend enables Pro360 to build an AI-powered application without ML expert practitioners, which is able to increase the accuracy of customer objection detection. Pro360 was able to build a custom-purposed NLP model in just 1.5 months, and now is able to identify 90% of customer polite rejections and detect customer intent with 99.2% overall accuracy. This solution not only enhances the customer experience, increasing 28.5% retention rate growth, but also improves financial outcomes, decreasing the operation cost by 8% and reducing the workload for customer service agents.

However, identifying customer objections is just the first step in improving the customer experience. By continuing to iterate on the customer experience and accelerate revenue growth, the next step is to identify the reasons for customer objections, such as lack of interest, timing issues, or influence from others, and to generate the appropriate response to increase the sales conversion rate.

To use Amazon Comprehend to build custom text classification models, you can access the service through the AWS Management Console. To learn more about how to use Amazon Comprehend, check out Amazon Comprehend developer resources.


About the Authors

Ray Wang is a Solutions Architect at AWS. With 8 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, and machine learning. As a hungry go-getter, he passed all 12 AWS certificates to make his technical field not only deep but wide. He loves to read and watch sci-fi movies in his spare time.

Josie Cheng is a HKT AI/ML Go-To-Market at AWS. Her current focus is on business transformation in retail and CPG through data and ML to fuel tremendous enterprise growth. Before joining AWS, Josie worked for Amazon Retail and other China and US internet companies as a Growth Product Manager.

Shanna Chang is a Solutions Architect at AWS. She focuses on observability in modern architectures and cloud-native monitoring solutions. Before joining AWS, she was a software engineer. In her spare time, she enjoys hiking and watching movies.

Wrick Talukdar is a Senior Architect with the Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.

Read More

Create SageMaker Pipelines for training, consuming and monitoring your batch use cases

Create SageMaker Pipelines for training, consuming and monitoring your batch use cases

Batch inference is a common pattern where prediction requests are batched together on input, a job runs to process those requests against a trained model, and the output includes batch prediction responses that can then be consumed by other applications or business functions. Running batch use cases in production environments requires a repeatable process for model retraining as well as batch inference. That process should also include monitoring that model to measure performance over time.

In this post, we show how to create repeatable pipelines for your batch use cases using Amazon SageMaker Pipelines, Amazon SageMaker model registry, SageMaker batch transform jobs, and Amazon SageMaker Model Monitor. This solution highlights the ability to use the fully managed features within SageMaker MLOps to reduce operational overhead through fully managed and integrated capabilities.

Solution overview

There are multiple scenarios for performing batch inference. In some cases, you may be retraining your model every time you run batch inference. Alternatively, you may be training your model less frequently than you are performing batch inference. In this post, we focus on the second scenario. For this example, let’s assume you have a model that is trained periodically, roughly one time per month. However, batch inference is performed against the latest model version on a daily basis. This is a common scenario, in which the model training lifecycle is different than the batch inference lifecycle.

The architecture supporting the introduced batch scenario contains two separate SageMaker pipelines, as shown in the following diagram.

We use the first pipeline to train the model and baseline the training data. We use the generated baseline for ongoing monitoring in the second pipeline. The first pipeline includes the steps needed to prepare data, train the model, and evaluate the performance of the model. If the model performs acceptably according to the evaluation criteria, the pipeline continues with a step to baseline the data using a built-in SageMaker Pipelines step. For the data drift Model Monitor type, the baselining step uses a SageMaker managed container image to generate statistics and constraints based on your training data. This baseline is then used to monitor for signals of data drift during batch inference. Finally, the first pipeline completes when a new model version is registered into the SageMaker model registry. At this point, the model can be approved automatically, or a secondary manual approval can be required based on a peer review of model performance and any other identified criteria.

In the second pipeline, the first step queries the model registry for the latest approved model version and runs the data monitoring job, which compares the data baseline generated from the first pipeline with the current input data. The final step in the pipeline is performing batch inference against the latest approved model.

The following diagram illustrates the solution architecture for each pipeline.

For our dataset, we use a synthetic dataset from a telecommunications mobile phone carrier. This sample dataset contains 5,000 records, where each record uses 21 attributes to describe the customer profile. The last attribute, Churn, is the attribute that we want the ML model to predict. The target attribute is binary, meaning the model predicts the output as one of two categories (True or False).

The following GitHub repo contains the code for demonstrating the steps performed in each pipeline. It contains three notebooks: to perform the initial setup, to create the model train and baseline pipeline, and create the batch inference and Model Monitor pipeline. The repository also includes additional Python source code with helper functions, used in the setup notebook, to set up required permissions.

|-Custom_IAM_policies
	| |—Custom_IAM_roles_policy
	| |—Custom_Lambda_policy
|— pipeline_scripts
	| |— evaluate.py
	| |— preprocessing.py
|— 0.Setup.ipynb
|— 1.SageMakerPipeline-BaselineData-Train.ipynb
|— 2.SageMakerPipeline-ModelMonitoring-DataQuality-BatchTransform.ipynb
|— iam_helper.py
|— lambda_getapproved_model.py

Prerequisites

The following screenshot lists some permission policies that are required by the SageMaker execution role for the workflow. You can enable these permission policies through AWS Identity and Access Management (IAM) role permissions.

AmazonSageMaker-ExecutionPolicy-<...> is the execution role associated with the SageMaker user and has the necessary Amazon Simple Storage Service (Amazon S3) bucket policies. Custom_IAM_roles_policy and Custom_Lambda_policy are two custom policies created to support the required actions for the AWS Lambda function. To add the two custom policies, go to the appropriate role (associated with your SageMaker user) in IAM, click on Add permissions and then Create inline policy. Then, choose JSON inside Create policy, add the policy code for first custom policy and save the policy. Repeat the same for the second custom policy.

0.Setup.ipynb is a prerequisite notebook required before running notebooks 1 and 2. The code sets up the S3 paths for pipeline inputs, outputs, and model artifacts, and uploads scripts used within the pipeline steps. This notebook also uses one of the provided helper functions, create_lambda_role, to create a Lambda role that is used in notebook 2, 2.SageMakerPipeline-ModelMonitoring-DataQuality-BatchTransform.ipynb. See the following code:

# Create Lambda execution role for Lambda Function using helper function
from iam_helper import create_lambda_role

lambda_role = create_lambda_role("Lambda-SageMaker-GetModelRole")
print('Lambda Role:', lambda_role)

After you’ve successfully completed all of the tasks in the setup notebook, you’re ready to build the first pipeline to train and baseline the model.

Pipeline 1: Train and baseline pipeline

In this section, we take a deep dive into the SageMaker pipeline used to train and baseline the model. The necessary steps and code are in the 1.SageMakerPipeline-BaselineData-Train.ipynb notebook. This pipeline takes the raw customer churn data as input, and then performs the steps required to prepare the data, train the model, evaluate the model, baseline the model, and register the model in the model registry.

To build a SageMaker pipeline, you configure the underlying job (such as SageMaker Processing), configure the pipeline steps to run the job, and then configure and run the pipeline. We complete the following steps:

  1. Configure the model build pipeline to prepare the data, train the model, and evaluate the model.
  2. Configure the baseline step for the data drift with Model Monitor.
  3. Configure steps to package the model and register the model version.
  4. Configure a conditional step to evaluate model performance.

Configure the model build pipeline

The model build pipeline is a three-step process:

  1. Prepare the data.
  2. Train the model.
  3. Evaluate the model.

To prepare the data, we configure a data processing step. This step runs a SageMaker Processing job, using the built-in ProcessingStep, to prepare the raw data on input for training and evaluation.

To train the model, we configure a training job step. This step runs a SageMaker Training job, using the built-in TrainingStep. For this use case, we perform binary classification using XGBoost. The output of this step is a model artifact, model.tar.gz, stored in Amazon S3.

The last step is responsible for evaluating model performance using the test holdout dataset. This step uses the built-in ProcessingStep with the provided code, evaluation.py, to evaluate performance metrics (accuracy, area under curve).

Configure the baseline step

To monitor the model and data, a baseline is required.

Monitoring for data drift requires a baseline of training data. The baseline step uses Pipelines’ built-in QualityCheckStep. This step automatically runs a SageMaker Processing job that uses the Model Monitor pre-built container image. We use this same container image for the baselining as well as the model monitoring; however, the parameters used during configuration of this step direct the appropriate behavior. In this case, we are baselining the data, so we need to ensure that the quality_check_config parameter is using DataQualityCheckConfig, which identifies the S3 input and output paths. We’re also setting register_new_baseline and skip_check to true. When these values are both set to true, it tells SageMaker to run this step as a baseline job and create a new baseline. To get a better understanding of the parameters that control the behavior of the SageMaker pre-built container image, refer to Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Model Building Pipelines.

See the following code:

# Configure the Data Quality Baseline Job

# Configure the transient compute environment
check_job_config = CheckJobConfig(
    role=role_arn,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    volume_size_in_gb=120,
    sagemaker_session=session,
)

# Configure the data quality check input (training data), dataset format, and S3 output path
data_quality_check_config = DataQualityCheckConfig(
    baseline_dataset=data_preparation_step.properties.ProcessingOutputConfig.Outputs['train'].S3Output.S3Uri,
    dataset_format=DatasetFormat.csv(header=False, output_columns_position="START"),
    output_s3_uri=Join(on='/', values=['s3:/', bucket, bucket_prefix, ExecutionVariables.PIPELINE_EXECUTION_ID, 'dataqualitycheckstep'])
)

# Configure Pipeline Step - 'QualityCheckStep'
baseline_model_data_step = QualityCheckStep(
        name="DataQualityCheckStep",
        # skip_check, indicates a baselining job
        skip_check=True,
        register_new_baseline=True,
        quality_check_config=data_quality_check_config,
        check_job_config=check_job_config,
        model_package_group_name=model_package_group_name
    )

This step generates two JSON files as output:

  • statistics.json – Contains calculated statistics for each feature of the training dataset
  • constraints.json – Suggests data constraints based on the statistics collected

These constraints can also be modified and are used to detect signals of drift during model monitoring.

Configure steps to package and register the model version

Next, we configure the steps to package for deployment and register the model in the model registry using two additional pipeline steps.

The package model step packages the model for use with the SageMaker batch transform deployment option. model.create() creates a model entity, which will be included in the custom metadata registered for this model version and later used in the second pipeline for batch inference and model monitoring. See the following code:

# Configure step to package model for inference using Model object, model.create(

step_args = model.create()
    instance_type="ml.m5.large",
    accelerator_type="ml.eia1.medium",
)

create_model_step = ModelStep(
    name="CustomerChurnCreateModel",
    step_args=step_args,
)

The register model step registers the model version and associated metadata to the SageMaker model registry. This includes model performance metrics as well as metadata for the data drift baseline, including the Amazon S3 locations of the statistics and constraints files produced through the baselining step. You’ll also notice the additional custom metadata noted customer_metadata_properties pulling the model entity information that will be used later in the inference pipeline. The ability to provide custom metadata within the model registry is a great way to incorporate additional metadata that should be collected that isn’t explicitly defined in native SageMaker parameters. See the following code:

# Configure step to register model version using metadata and Model object: model.register()
model_registry_args = model.register(
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.t2.medium', 'ml.m5.xlarge'],
    transform_instances=['ml.m5.xlarge'],
    model_package_group_name=model_package_group_name,
    customer_metadata_properties={"ModelName": create_model_step.properties.ModelName},
    drift_check_baselines=drift_check_baselines,
    approval_status='PendingManualApproval',
    model_metrics=model_metrics
)

register_step = ModelStep(
    name='RegisterModel',
    step_args=model_registry_args
)

Configure a conditional step to evaluate model performance

The conditional step, ConditionStep, compares model accuracy against an identified threshold and checks the quality of the trained model.

It reads the evaluation.json file and checks if the model accuracy, or whatever objective metric you are optimizing for, meets the criteria you’ve defined. In this case, the criteria is defined using one of the built-in conditions, ConditionGreaterThanOrEqualTo. If the condition is satisfied, the pipeline continues to baseline the data and perform subsequent steps in the pipeline. The pipeline stops if the condition is not met. Because the condition explicitly calls out the next steps in the pipeline, we have to ensure those steps are configured prior to configuring our conditional step. See the following code:

condition_step = ConditionStep(
    name='PerformanceConditionalCheck',
    conditions=[cond_gte],
    if_steps=[baseline_model_data_step,create_model_step, register_step],
    else_steps=[],
)

Define, create, and start the SageMaker pipeline

At this point, all the steps of the train and baseline pipeline are defined and configured. Now it’s time to define, create, and start the pipeline.

First, we define the pipeline, Pipeline(), providing a pipeline name and a list of steps previously configured to include in the pipeline. Next, we create the pipeline using training_pipeline.upsert(). Finally, we start the pipeline using training_pipeline.start(). See the following code:

step_list = [
             data_preparation_step,
             training_step,
             evaluation_step,
             condition_step]

training_pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        input_data,
      ],
    steps=step_list
)

When the pipeline starts running, you can visualize its status on Studio. The following diagram shows which steps from the pipeline process relate to the steps of the pipeline directed acyclic graph (DAG). After the train and baseline pipeline run successfully, it registers the trained model as part of the model group in the model registry. The pipeline is currently set up to register the model in a Pending state, which requires a manual approval. Optionally, you can configure the model registration step to automatically approve the model in the model registry. The second pipeline will pull the latest approved model from the registry for inference.

In Studio, you can choose any step to see its key metadata. As an example, the data quality check step (baseline step) within the pipeline DAG shows the S3 output locations of statistics.json and constraints.json in the Reports section. These are key files calculated from raw data used as a baseline.

After the pipeline has run, the baseline (statistics and constraints) for data quality monitoring can be inspected, as shown in the following screenshots.

Pipeline 2: Batch inference and Model Monitor pipeline

In this section, we dive into the second pipeline used for monitoring the new batch input data for signals of data drift and running batch inference using SageMaker Pipelines. The necessary steps and code are within 2.SageMakerPipeline-ModelMonitoring-DataQuality-BatchTransform.ipynb. This pipeline includes the following steps:

  1. A Lambda step to retrieve the latest approved model version and associated metadata from the model registry.
  2. A Model Monitor step to detect signals of data drift using the new input data and the baseline from Pipeline 1.
  3. A batch transform step to process the batch input data against the latest approved model.

Configure a Lambda Step

Before we start the model monitoring and batch transform job, we need to query the model registry to get the latest approved model that we will use for batch inference.

To do this, we use a Lambda step, which allows us to include custom logic within our pipeline. The lambda_getapproved_model.py Lambda function queries the SageMaker model registry for a specific model package group provided on input to identify the latest approved model version and return related metadata. The output includes metadata created from our first pipeline:

  • Model package ARN
  • Packaged model name
  • S3 URI for statistics baseline
  • S3 URI for constraints baseline

The output is then used as input in the next step in the pipeline, which performs batch monitoring and scoring using the latest approved model.

To create and run the Lambda function as part of the SageMaker pipeline, we need to add the function as a LambdaStep in the pipeline:

lambda_getmodel_step = LambdaStep(
    name="LambdaStepGetApprovedModel",
    lambda_func=func,
    inputs={
        "model_package_group_name": model_package_group_name
     },
    outputs=[output_param_1, output_param_2,output_param_3,output_param_4,output_param_5])

Configure the data monitor and batch transform steps

After we create the Lambda step to get the latest approved model, we can create the MonitorBatchTransformStep. This native step orchestrates and manages two child tasks that are run in succession. The first task includes the Model Monitor job that runs a Processing job using a built-in container image used to monitor the batch input data and compare it against the constraints from the previously generated baseline from Pipeline 1. In addition, this step kicks off the batch transform job, which processes the input data against the latest approved model in the model registry.

This batch deployment and data quality monitoring step takes the S3 URI of the batch prediction input data on input. This is parameterized to allow for each run of the pipeline to include a new input dataset. See the following code:

transform_input_param = ParameterString(   
	name="transform_input",
    default_value=batch_prediction_data,
)

Next, we need to configure the transformer for the batch transform job that will process the batch prediction requests. In the following code, we pass in the model name that was pulled from the custom metadata of the model registry, along with other required parameters:

transformer = Transformer(
    model_name=lambda_getmodel_step.properties.Outputs["modelName"],
    instance_count=1,
    instance_type="ml.m5.xlarge",
    accept="text/csv",
    assemble_with="Line",
    output_path=batch_transform_output_path,
    sagemaker_session=pipeline_session,
)

transform_arg = transformer.transform(
    transform_input_param,
    content_type="text/csv",
    split_type="Line",
    input_filter="$[1:]",
)

The data quality monitor accepts the S3 URI of the baseline statistics and constraints for the latest approved model version from the model registry to run the data quality monitoring job during the pipeline run. This job compares the batch prediction input data with the baseline data to identify any violations signaling potential data drift. See the following code:

job_config = CheckJobConfig(role=role)
data_quality_config = DataQualityCheckConfig(
    baseline_dataset=transform_input_param,
    dataset_format=DatasetFormat.csv(header=False),
    output_s3_uri=batch_monitor_reports_output_path,
)

Next, we use MonitorBatchTransformStep to run and monitor the transform job. This step runs a batch transform job using the transformer object we configured and monitors the data passed to the transformer before running the job.

Optionally, you can configure the step to fail if a violation to data quality is found by setting the fail_on_violation flag to False.

See the following code:

from sagemaker.workflow.monitor_batch_transform_step import MonitorBatchTransformStep

transform_and_monitor_step = MonitorBatchTransformStep(
    name="MonitorCustomerChurnDataQuality",
    transform_step_args=transform_arg,
    monitor_configuration=data_quality_config,
    check_job_configuration=job_config,
    monitor_before_transform=True,
    # if violation is detected in the monitoring, you can skip it and continue running batch transform
    fail_on_violation=False,
    supplied_baseline_statistics=lambda_getmodel_step.properties.Outputs["s3uriStatistics"],
    supplied_baseline_constraints=lambda_getmodel_step.properties.Outputs["s3uriConstraints"],
)

Define, create, and start the pipeline

After we define the LambdaStep and MonitorBatchTransformStep, we can create the SageMaker pipeline.

See the following code:

from sagemaker.workflow.pipeline import Pipeline

pipeline_name = 'sagemaker-batch-inference-monitor'

batch_monitor_pipeline = Pipeline(
    name=pipeline_name,
    parameters=[transform_input_param],
    steps=[
        lambda_getmodel_step,
        transform_and_monitor_step
    ],
)

We can now use the upsert() method, which will create or update the SageMaker pipeline with the configuration we specified:

batch_monitor_pipeline.upsert(role_arn=role)

Although there are multiple ways to start a SageMaker pipeline, when the pipeline has been created, we can run the pipeline using the start() method.

Note that in order for the LambdaStep to successfully retrieve an approved model, the model that was registered as part of Pipeline 1 needs to have an Approved status. This can be done in Studio or using Boto3. Refer to Update the Approval Status of a Model for more information.

execution = batch_monitor_pipeline.start()

To run the SageMaker pipeline on a schedule or based on an event, refer to Schedule a Pipeline with Amazon EventBridge.

Review the Model Monitor reports

Model Monitor uses a SageMaker Processing job that runs the DataQuality check using the baseline statistics and constraints. The DataQuality Processing job emits a violations report to Amazon S3 and also emits log data to Amazon CloudWatch Logs under the log group for the corresponding Processing job. Sample code for querying Amazon CloudWatch logs is provided in the notebook.

We’ve now walked you through how to create the first pipeline for model training and baselining, as well as the second pipeline for performing batch inference and model monitoring. This allows you to automate both pipelines while incorporating the different lifecycles between training and inference.

To further mature this reference pattern, you can identify a strategy for feedback loops, providing awareness and visibility of potential signals of drift across key stakeholders. At a minimum, it’s recommended to automate exception handling by filtering logs and creating alarms. These alarms may need additional analysis by a data scientist, or you can implement additional automation supporting an automatic retraining strategy using new ground truth data by integrating the model training and baselining pipeline with Amazon EventBridge. For more information, refer to Amazon EventBridge Integration.

Clean up

After you run the baseline and batch monitoring pipelines, make sure to clean up any resources that won’t be utilized, either programmatically via the SageMaker console, or through Studio. In addition, delete the data in Amazon S3, and make sure to stop any Studio notebook instances to not incur any further charges.

Conclusion

In this post, you learned how to create a solution for a batch model that is trained less frequently than batch inference is performed against that trained model using SageMaker MLOps features, including Pipelines, the model registry, and Model Monitor. To expand this solution, you could incorporate this into a custom SageMaker project that also incorporates CI/CD and automated triggers using standardized MLOps templates. To dive deeper into the solution and code shown in this demo, check out the GitHub repo. Also, refer to Amazon SageMaker for MLOps for examples related to implementing MLOps practices with SageMaker.


About the Authors

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She has been in technology for 24 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background into the domain of MLOps to help customers deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee is a co-creator and instructor of the Practical Data Science specialization on Coursera. She is also the Co-Director of Women In Big Data (WiBD), Denver chapter. In her spare time, she likes to spend time with her family, friends, and overactive dogs.

Sovik Kumar Nath is an AI/ML solution architect with AWS. He has experience in designs and solutions for machine learning, business analytics within financial, operational, and marketing analytics; healthcare; supply chain; and IoT. Outside work, Sovik enjoys traveling and watching movies.

Marc Karp is a ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Read More

Improved ML model deployment using Amazon SageMaker Inference Recommender

Improved ML model deployment using Amazon SageMaker Inference Recommender

Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. With advancements in hardware design, a wide range of CPU- and GPU-based infrastructures are available to help you speed up inference performance. Also, you can build these ML systems with a combination of ML models, tasks, frameworks, libraries, tools, and inference engines, making it important to evaluate the ML system performance for the best possible deployment configurations. You need recommendations on finding the most cost-effective ML serving infrastructure and the right combination of software configuration to achieve the best price-performance to scale these applications.

Amazon SageMaker Inference Recommender is a capability of Amazon SageMaker that reduces the time required to get ML models in production by automating load testing and model tuning across SageMaker ML instances. In this post, we highlight some of the recent updates to Inference Recommender:

  • SageMaker Python SDK support for running Inference Recommender
  • Inference Recommender usability improvements
  • New APIs that provide flexibility in running Inference Recommender
  • Deeper integration with Amazon CloudWatch for logging and metrics

Credit card fraud detection use case

Any fraudulent activity that is not detected and mitigated immediately can cause significant financial loss. Particularly, credit card payment fraud transactions need to be identified right away to protect the individual’s and company’s financial health. In this post, we discuss a credit card fraud detection use case, and learn how to use Inference Recommender to find the optimal inference instance type and ML system configurations that can detect fraudulent credit card transactions in milliseconds.

We demonstrate how to set up Inference Recommender jobs for a credit card fraud detection use case. We train an XGBoost model for a classification task on a credit card fraud dataset. We use Inference Recommender with a custom load to meet inference SLA requirements to satisfy peak concurrency of 30,000 transactions per minute while serving predictions results in less than 100 milliseconds. Based on Inference Recommender’s instance type recommendations, we can find the right real-time serving ML instances that yield the right price-performance for this use case. Finally, we deploy the model to a SageMaker real-time endpoint to get prediction results.

The following table summarizes the details of our use case.

Model Framework XGBoost
Model Size 10 MB
End-to-End Latency 100 milliseconds
Invocations per Second 500 (30,000 per minute)
ML Task Binary Classification
Input Payload 10 KB

We use a synthetically created credit card fraud dataset. The dataset contains 28 numerical features, time of the transaction, transaction amount, and class target variables. The class column corresponds to whether or not a transaction is fraudulent. The majority of data is non-fraudulent (284,315 samples), with only 492 samples corresponding to fraudulent examples. In the data, Class is the target classification variable (fraudulent vs. non-fraudulent) in the first column, followed by other variables.

In the following sections, we show how to use Inference Recommender to get ML hosting instance type recommendations and find optimal model configurations to achieve better price-performance for your inference application.

Which ML instance type and configurations should you select?

With Inference Recommender, you can run two types of jobs: default and advanced.

The default Instance Recommender job runs a set of load tests to recommended the right ML instance types for any ML use case. SageMaker real-time deployment supports a wide range of ML instances to host and serve the credit card fraud detection XGBoost model. The default job can run a load test on a selection of instances that you provide in the job configuration. If you have an existing endpoint for this use case, you can run this job to find the cost-optimized performant instance type. Inference Recommender will compile and optimize the model for a specific hardware of inference endpoint instance type using Amazon SageMaker Neo. It’s important to note that not all compilation results in improved performance. Inference Recommender will report compilation details when the following conditions are met:

  • Successful compilation of the model using Neo. There could be issues in the compilation process such as invalid payload, data type, or more. In this case, compilation information is not available.
  • Successful inference using the compiled model that shows performance improvement, which appears in the inference job response.

An advanced job is a custom load test job that allows you to perform extensive benchmarks based on your ML application SLA requirements, such as latency, concurrency, and traffic pattern. You can configure a custom traffic pattern to simulate credit card transactions. Additionally, you can define the end-to-end model latency to predict if a transaction is fraudulent and define the maximum concurrent transactions to the model for prediction. Inference Recommender uses this information to run a performance benchmark load test. The latency, concurrency, and cost metrics from the advanced job help you make informed decisions about the ML serving infrastructure for mission-critical applications.

Solution overview

The following diagram shows the solution architecture for training an XGBoost model on the credit card fraud dataset, running a default job for instance type recommendation, and performing load testing to decide the optimal inference configuration for the best price-performance.

The diagram shows the following steps:

  1. Train an XGBoost model to classify credit card transactions as fraudulent or legit. Deploy the trained model to a SageMaker real-time endpoint. Package the model artifacts and sample payload (.tar.gz format), and upload them to Amazon Simple Storage Service (Amazon S3) so Inference Recommender can use these when the job is run. Note that the training step in this post is optional.
  2. Configure and run a default Inference Recommender job on a list of supported instance types to find the right ML instance type that gives the best price-performance for this use case.
  3. Optionally, run a default Inference Recommender job on an existing endpoint.
  4. Configure and run an advanced Inference Recommender job to perform a custom load test to simulate user interactions with the credit card fraud detection application. This helps you find the right configurations to satisfy latency, concurrency, and cost for this use case.
  5. Analyze the default and advanced Inference Recommender job results, which include ML instance type recommendation latency, performance, and cost metrics.

A complete example is available in our GitHub notebook.

Prerequisites

To use Inference Recommender, make sure to meet the prerequisites.

Python SDK support for Inference Recommender

We recently released Python SDK support for Inference Recommender. You can now run default and advanced jobs using a single function: right_size. Based on the parameters of the function call, Inference Recommender infers if it should run default or advanced jobs. This greatly simplifies the use of Inference Recommender using the Python SDK. To run the Inference Recommender job, complete the following steps:

  1. Create a SageMaker model by specifying the framework, version, and image scope:
    model = Model(
        model_data=model_url,
        role=role,
        image_uri = sagemaker.image_uris.retrieve(framework="xgboost", 
        region=region, 
        version="1.5-1", 
        py_version="py3", 
        image_scope='inference'),
        sagemaker_session=sagemaker_session
        )

  2. Optionally, register the model in the SageMaker model registry. Note that parameters such as domain and task during model package creation are also optional parameters in the recent release.
    model_package = model.register(
        content_types=["text/csv"],
        response_types=["text/csv"],
        model_package_group_name=model_package_group_name,
        image_uri=model.image_uri,
        approval_status="Approved",
        framework="XGBOOST"
    )

  3. Run the right_size function on the supported ML inference instance types using the following configuration. Because XGBoost is a memory-intensive algorithm, we provide ml.m5 type instances to get instance type recommendations. You can call the right_size function on the model registry object as well.
    model.right_size(
        sample_payload_url=sample_payload_url,
        supported_content_types=["text/csv"],
        supported_instance_types=["ml.m5.large", 
                                  "ml.m5.xlarge", 
                                  "ml.m5.2xlarge", 
                                  "ml.m5.4xlarge", 
                                  "ml.m5.12xlarge"],
        framework="XGBOOST",
        job_name="credit-card-fraud-default-job"
    )
    INFO:sagemaker:Advance Job parameters were not specified. Running Default job...

  4. Define additional parameters to the right_size function to run an advanced job and custom load test on the model:
    1. Configure the traffic pattern using the phases parameter. In the first phase, we start the load test with two initial users and create two new users for every minute for 2 minutes. In the following phase, we start the load test with six initial users and create two new users for every minute for 2 minutes. Stopping conditions for the load tests are p95 end-to-end latency of 100 milliseconds and concurrency to support 30,000 transactions per minute or 500 transactions per second.
    2. We tune the endpoint against the environment variable OMP_NUM_THREADS with values [3,4,5] and we aim to limit the latency requirement to 100 milliseconds and achieve max concurrency of 30,000 invocations per minute. The goal is to find which value for OMP_NUM_THREADS provides the best performance.
from sagemaker.parameter import CategoricalParameter 
from sagemaker.inference_recommender.inference_recommender_mixin import (  
    Phase,  
    ModelLatencyThreshold 
) 
hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.m5.4xlarge"]), 
        "OMP_NUM_THREADS": CategoricalParameter(["3", "4", "6"]),
    } 
] 
phases = [ 
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=2), 
    Phase(duration_in_seconds=120, initial_number_of_users=6, spawn_rate=2) 
] 
model_latency_thresholds = [ 
    ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100) 
]

model.right_size( 
    sample_payload_url=sample_payload_url, 
    supported_content_types=["text/csv"], 
    framework="XGBOOST", 
    job_duration_in_seconds=7200, 
    hyperparameter_ranges=hyperparameter_ranges, 
    phases=phases, # TrafficPattern 
    max_invocations=30000, # StoppingConditions 
    model_latency_thresholds=model_latency_thresholds,
    job_name="credit-card-fraud-advanced-job"
)
INFO:sagemaker:Advance Job parameters were specified. Running Advanced job...

Run Inference Recommender jobs using the Boto3 API

You can use the Boto3 API to launch Inference Recommender default and advanced jobs. You need to use the Boto3 API (create_inference_recommendations_job) to run Inference Recommender jobs on an existing endpoint. Inference Recommender infers the framework and version from the existing SageMaker real-time endpoint. The Python SDK doesn’t support running Inference Recommender jobs on existing endpoints.

The following code snippet shows how to create a default job:

sagemaker_client.create_inference_recommendations_job(
    JobName = "credit-card-fraud-default-job",
    JobType = 'Default',
    RoleArn = <ROLE_ARN>,
    InputConfig = {
        'ModelPackageVersionArn': <MODEL_PACKAGE_ARN>, #optional
        'Endpoints': ['EndpointName': <ENDPOINT_POINT>]
    }
)

Later in this post, we discuss the parameters needed to configure an advanced job.

Configure a traffic pattern using the TrafficPattern parameter. In the first phase, we start a load test with two initial users (InitialNumberOfUsers) and create two new users (SpawnRate) for every minute for 2 minutes (DurationInSeconds). In the following phase, we start the load test with six initial users and create two new users for every minute for 2 minutes. Stopping conditions (StoppingConditions) for the load tests are p95 end-to-end latency (ModelLatencyThresholds) of 100 milliseconds (ValueInMilliseconds) and concurrency to support 30,000 transactions per minute or 500 transactions per second (MaxInvocations). See the following code:

env_parameter_ranges = [{"Name": "OMP_NUM_THREADS", "Value": ["3", "4", "5"]}]

sagemaker_client.create_inference_recommendations_job(JobName=load_test_job_name,
        JobType='Advanced', RoleArn=role_arn, InputConfig={
    'ModelPackageVersionArn': model_package_arn, #optional
    'JobDurationInSeconds': 7200,
    'TrafficPattern': {'TrafficType': 'PHASES',
                       'Phases': [
                       {'InitialNumberOfUsers': 2,
                       'SpawnRate': 2, 
                       'DurationInSeconds': 120
                       },
                       {'InitialNumberOfUsers': 6, 
                       'SpawnRate': 6,
                       'DurationInSeconds': 120
                       }]},
    'ResourceLimit': {'MaxNumberOfTests': 10, 'MaxParallelOfTests': 3},
    'EndpointConfigurations': [{'InstanceType': 'ml.m5.4xlarge'
                                'EnvironmentParameterRanges': 
                                {'CategoricalParameterRanges': env_parameter_ranges}
                                }],
    }, StoppingConditions={'MaxInvocations': 30000,
                           'ModelLatencyThresholds': 
                           [{'Percentile': 'P95',
                            'ValueInMilliseconds': 100
                           }]})

Inference Recommender job results and metrics

The results of the default Inference Recommender job contain a list of endpoint configuration recommendations, including instance type, instance count, and environment variables. The results contain configurations for SAGEMAKER_MODEL_SERVER_WORKERS and OMP_NUM_THREADS associated with the latency, concurrency, and throughput metrics. OMP_NUM_THREADS is the model server tunable environment parameter. As shown in the details in the following table, with an ml.m5.4xlarge instance with SAGEMAKER_MODEL_SERVER_WORKERS=3 and OMP_NUM_THREADS=3, we got a throughput of 32,628 invocations per minute and model latency under 10 milliseconds. ml.m5.4xlarge had 100% improvement in latency, an approximate 115% increase in concurrency compared to the ml.m5.xlarge instance configuration. Also, it was 66% more cost-effective compared to the ml.m5.12xlarge instance configurations while achieving comparable latency and throughput.

Instance Type Initial Instance Count OMP_NUM_THREADS Cost Per Hour Max Invocations Model Latency CPU Utilization Memory Utilization SageMaker Model Server Workers
ml.m5.xlarge 1 2 0.23 15189 18 108.864 1.62012 1
ml.m5.4xlarge 1 3 0.922 32628 9 220.57001 0.69791 3
ml.m5.large 1 2 0.115 13793 19 106.34 3.24398 1
ml.m5.12xlarge 1 4 2.765 32016 4 215.32401 0.44658 7
ml.m5.2xlarge 1 2 0.461 32427 13 248.673 1.43109 3

We have included CloudWatch helper functions in the notebook. You can use the functions to get detailed charts of your endpoints during the load test. The charts have details on invocation metrics like invocations, model latency, overhead latency, and more, and instance metrics such as CPUUtilization and MemoryUtilization. The following example shows the CloudWatch metrics for our ml.m5.4xlarge model configuration.

You can visualize Inference Recommender job results in Amazon SageMaker Studio by choosing Inference Recommender under Deployments in the navigation pane. With a deployment goal for this use case (high latency, high throughput, default cost), the default Inference Recommender job recommended an ml.m5.4xlarge instance because it provided the best latency performance and throughput to support a maximum 34,600 invocations per minute (576 TPS). You can use these metrics to analyze and find the best configurations that satisfy latency, concurrency, and cost requirements of your ML application.

We recently introduced ListInferenceRecommendationsJobSteps, which allows you to analyze subtasks in an Inference Recommender job. The following code snippet shows how to use the list_inference_recommendations_job_steps Boto3 API to get the list of subtasks. This can help with debugging Inference Recommender job failures at the step level. This functionality is not supported in the Python SDK yet.

sm_client = boto3.client("sagemaker", region_name=region)
list_job_steps_response = sm_client.list_inference_recommendations_job_steps
                          (JobName='<JOB_NAME>')
print(list_job_steps_response)

The following code shows the response:

{
    "Steps": [
        {
            "StepType": "BENCHMARK",
            "JobName": "SMPYTHONSDK-<JOB_NAME>",
            "Status": "COMPLETED",
            "InferenceBenchmark": {
                "Metrics": {
                    "CostPerHour": 1.8359999656677246,
                    "CostPerInference": 1.6814110495033674e-06,
                    "MaxInvocations": 18199,
                    "ModelLatency": 40,
                    "CpuUtilization": 106.06400299072266,
                    "MemoryUtilization": 0.3920480012893677
                },
                "EndpointConfiguration": {
                    "EndpointName": "sm-epc-<ENDPOINTNAME>",
                    "VariantName": "sm-epc-<VARIANTNAME>",
                    "InstanceType": "ml.c5.9xlarge",
                    "InitialInstanceCount": 1
                },
                "ModelConfiguration": {
                    "EnvironmentParameters": [
                        {
                            "Key": "SAGEMAKER_MODEL_SERVER_WORKERS",
                            "ValueType": "String",
                            "Value": "1"
                        },
                        {
                            "Key": "OMP_NUM_THREADS",
                            "ValueType": "String",
                            "Value": "28"
                        }
                    ]
                }
            }
        },
     ...... <TRUNCATED>
    "ResponseMetadata": {
        "RequestId": "<RequestId>",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "<x-amzn-requestid>",
            "content-type": "application/x-amz-json-1.1",
            "content-length": "1443",
            "date": "Mon, 20 Feb 2023 16:53:30 GMT"
        },
        "RetryAttempts": 0
    }
}

Run an advanced Inference Recommender job

Next, we run an advanced Inference Recommender job to find optimal configurations such as SAGEMAKER_MODEL_SERVER_WORKERS and OMP_NUM_THREADS on an ml.m5.4xlarge instance type. We set the hyperparameters of the advanced job to run a load test on different combinations:

hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.m5.4xlarge"]), 
        "OMP_NUM_THREADS": CategoricalParameter(["3", "4", "6"]),
    } 
]

You can view the advanced Inference Recommender job results on the Studio console, as shown in the following screenshot.

Using the Boto3 API or CLI commands, you can access all the metrics from the advanced Inference Recommender job results. InitialInstanceCount is the number of instances that you should provision in the endpoint to meet ModelLatencyThresholds and MaxInvocations mentioned in StoppingConditions. The following table summarizes our results.

Instance Type Initial Instance Count OMP_NUM_THREADS Cost Per Hour Max Invocations Model Latency CPU Utilization Memory Utilization
ml.m5.2xlarge 2 3 0.922 39688 6 86.732803 3.04769
ml.m5.2xlarge 2 4 0.922 42604 6 177.164993 3.05089
ml.m5.2xlarge 2 5 0.922 39268 6 125.402 3.08665
ml.m5.4xlarge 2 3 1.844 38174 4 102.546997 2.68003
ml.m5.4xlarge 2 4 1.844 39452 4 141.826004 2.68136
ml.m5.4xlarge 2 5 1.844 40472 4 107.825996 2.70936

Clean up

Follow the instructions in the notebook to delete all the resources created as part of this post to avoid incurring additional charges.

Summary

Finding the right ML serving infrastructure, including instance type, model configurations, and auto scaling polices, can be tedious. This post showed how you can use the Inference Recommender Python SDK and Boto3 APIs to launch default and advanced jobs to find the optimal inference infrastructure and configurations. We also discussed the new improvements to Inference Recommender, including Python SDK support and usability improvements. Check out our GitHub repository to get started.


About the Authors

Shiva Raaj Kotini works as a Principal Product Manager in the AWS SageMaker inference product portfolio. He focuses on model deployment, performance tuning, and optimization in SageMaker for inference.

John Barboza is a Software Engineer at AWS. He has extensive experience working on distributed systems. His current focus is on improving the SageMaker inference experience. In his spare time, he enjoys cooking and biking.

Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like Amazon EMR, Amazon EFA, and Amazon RDS. Currently, he is focused on improving the SageMaker inference experience. In his spare time, he enjoys hiking and marathons.

Ram Vegiraju is an ML Architect with the SageMaker service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Vikram Elango is an Sr. AIML Specialist Solutions Architect at AWS, based in Virginia USA. He is currently focused on Generative AI, LLMs, prompt engineering, large model inference optimization and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design, thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking and camping with his family.

Read More