Amazon SageMaker built-in LightGBM now offers distributed training using Dask

Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.

Starting today, the SageMaker LightGBM algorithm offers distributed training using the Dask framework for both tabular classification and regression tasks. It’s available through the SageMaker Python SDK. The supported data formats are CSV and Parquet. We conducted extensive benchmarking experiments on four publicly available datasets with various settings to validate its performance.

Customers are increasingly interested in training models on large datasets with SageMaker LightGBM, which can take a day or even longer. In these cases, you might be able to speed up the process by distributing training over multiple machines or processes in a cluster. This post discusses how SageMaker LightGBM helps you set up and launch distributed training, without the expense and difficulty of directly managing your training clusters.

Problem statement

Machine learning has become an essential tool for extracting insights from large amounts of data. From image and speech recognition to natural language processing and predictive analytics, ML models have been applied to a wide range of problems. As datasets continue to grow in size and complexity, traditional training methods can become increasingly time-consuming and resource-intensive. This is where distributed training comes into play.

Distributed training is a technique that allows for the parallel processing of large amounts of data across multiple machines or devices. By splitting the data and training multiple models in parallel, distributed training can significantly reduce training time and improve the performance of models on big data. In recent years, distributed training has been a popular mechanism in training deep neural networks for use cases such as large language models (LLMs), image generation and classification, and text generation tasks using frameworks like PyTorch, TensorFlow, and MXNet. In this post, we discuss how distributed training can be applied to tabular data (a common type of data found in many industries such as finance, healthcare, and retail) using Dask and the LightGBM algorithm for tasks such as regression and classification.

Dask is an open-source parallel computing library that allows for distributed parallel processing of large datasets in Python. It’s designed to work with the existing Python and data science ecosystem, such as NumPy and Pandas. When it comes to distributed training, Dask can be used to parallelize the data loading, preprocessing, and model training tasks, and it integrates well with popular ML algorithms like LightGBM. LightGBM is a gradient boosting framework that uses tree-based learning algorithms and is designed to be efficient and scalable for training large models on big data. As of v3.2.0, LightGBM is integrated with Dask, combining these two powerful libraries to allow distributed learning across multiple machines that produces a single model.

How distributed training works

Distributed training for tree-based algorithms is a technique that is used when the dataset is too large to be processed on a single instance or when the computational resources of a single instance are not sufficient to train the tree-based model in a reasonable amount of time. It allows a model to be trained across multiple instances or machines, rather than on a single machine. This is done by dividing the dataset into smaller subsets, called chunks, and distributing them among the available instances. Each instance then trains a model on its assigned chunk of data, and the results are later combined using aggregation algorithms to form a single model.

In tree-based models like LightGBM, the main computational cost is in the building of the tree structure. This is typically done by sorting and selecting subsets of the data.

Now, let’s explore how LightGBM does the parallel training. LightGBM can use three types of parallelism:

  • Data parallelism – This is the most basic form of parallelism. The data is divided horizontally into smaller subsets and distributed among multiple instances. Each instance constructs its local histogram, the histograms are merged, and then a split is performed using a reduce-scatter algorithm. A histogram in a local instance is constructed by dividing the subset of the local data into discrete bins and counting the number of data points in each bin. This histogram-based algorithm helps speed up the training and reduces memory usage.
  • Feature parallelism – In feature parallelism, each machine is responsible for training on a subset of the features of the model, rather than a subset of the data. This can be useful when working with datasets that have a large number of features, because it allows for more efficient use of resources. It works by finding the best local split point in each instance, then communicating the best split to the other instances. The LightGBM implementation maintains all features of the data on every machine to reduce the cost of communicating the best splits.
  • Voting parallelism – In voting parallelism, the data is divided into smaller subsets and distributed among multiple machines. Each machine trains a model on its assigned subset of data, and the results are later combined to form a single, larger model. However, instead of using the gradients from all the machines to update the model parameters, a voting mechanism is used to decide which gradients to use. This can be useful when working with datasets that have a lot of noise or outliers, because it can help reduce the impact of these on the final model. At the time of writing this post, LightGBM integration with Dask only supports data and voting parallelism types.
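To make the tree_learner setting concrete, the following is a minimal local sketch of LightGBM’s Dask integration, independent of SageMaker. It assumes lightgbm>=3.2 and dask[distributed] are installed, and the random dataset is purely illustrative:

import dask.array as da
from dask.distributed import Client, LocalCluster
from lightgbm import DaskLGBMClassifier

# Start a local Dask cluster; on SageMaker this setup is handled for you.
cluster = LocalCluster(n_workers=2, threads_per_worker=2)
client = Client(cluster)

# Dask arrays are split into chunks that are distributed across the workers.
X = da.random.random((100_000, 20), chunks=(10_000, 20))
y = da.random.randint(0, 2, size=(100_000,), chunks=(10_000,))

# tree_learner selects the parallelism type; 'data' and 'voting' are supported with Dask.
model = DaskLGBMClassifier(tree_learner="data", n_estimators=50)
model.fit(X, y)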

SageMaker will automatically set up and manage a Dask cluster when using multiple instances with the LightGBM built-in container.

Solution overview

When a training job using LightGBM is started with multiple instances, we first create a Dask cluster. One instance acts as the Dask scheduler, and the remaining instances have Dask workers, where each worker has multiple threads. Each worker in the cluster has part of the data to perform the distributed computations, as illustrated in the following figure.

Enable distributed training

The requirements for the input data are as follows:

  • The supported input data format for training can be either CSV or Parquet. You are allowed to put more than one data file under both train and validation channels. If multiple files are identified, the algorithm will concatenate all of them as the training or validation data. The name of the data file can be any string as long as it ends with .csv or .parquet.
  • For each data file, the algorithm requires that the target variable is in the first column and that it should not have a header record. This follows the convention of the SageMaker XGBoost algorithm.
  • If your predictors include categorical features, you can provide a JSON file named cat_index.json in the same location as your training data. This file should contain a Python dictionary, where the key can be any string and the value is a list of unique integers. Each integer in the value list should indicate the column index of the corresponding categorical features in your data file. The index starts with value 1, because value 0 corresponds to the target variable. The cat_index.json file should be put under the training data directory, as shown in the following example.
  • Distributed training supports CPU instance types only.

Let’s use data in CSV format as an example. The train and validation data can be structured as follows:

-- training_dataset_s3_path
    -- data_1.csv
    -- data_2.csv
    -- data_3.csv
    -- cat_index.json
    
-- validation_dataset_s3_path
    -- data_1.csv
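
For illustration, the following hypothetical snippet writes a cat_index.json file declaring that the second and fifth columns of the data files hold categorical features. The dictionary key can be any string, and the indexes start at 1 because column 0 is the target variable:

import json

# Hypothetical example: columns 2 and 5 of each data file are categorical.
cat_index = {"cat_index_list": [2, 5]}

with open("cat_index.json", "w") as f:
    json.dump(cat_index, f)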

You can specify the input type to be either text/csv or application/x-parquet:

from sagemaker.inputs import TrainingInput

content_type = "text/csv" # or "application/x-parquet"

train_input = TrainingInput(
    training_dataset_s3_path, content_type=content_type
)

validation_input = TrainingInput(
    validation_dataset_s3_path, content_type=content_type
)

Before distributed training, you can retrieve the default hyperparameters of LightGBM and override them with custom values:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for LightGBM
hyperparameters = hyperparameters.retrieve_default(
    model_id=train_model_id, model_version=train_model_version
)

# [Optional] Override default hyperparameters with custom values
hyperparameters[
    "num_boost_round"
] = "500" 

hyperparameters["tree_learner"] = "voting" ### specify either 'data' or 'voting' parallelism for distributed training. Unfortnately, for dask lightgbm, the 'feature' is not supported. See github issue: https://github.com/microsoft/LightGBM/issues/3834

To enable distributed training, you can simply specify the argument instance_count in the class sagemaker.estimator.Estimator to be more than 1. The rest of the work is taken care of under the hood. See the following example code:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

training_job_name = name_from_base("sagemaker-built-in-distributed-lgb")

# Create SageMaker Estimator instance
tabular_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=4, ### select the instance count you would like to use for distributed training
    volume_size=30, ### volume_size (int or PipelineVariable): Size in GB of the storage volume to use for storing input and output data during training (default: 30).
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Launch a SageMaker Training job by passing s3 path of the training data
tabular_estimator.fit(
    {
        "train": train_input,
        "validation": validation_input,
    }, logs=True, job_name=training_job_name
)

The following screenshots show a successful training job log from the notebook. The logs from different Amazon Elastic Compute Cloud (Amazon EC2) machines are marked by different colors.

The distributed training is also compatible with SageMaker automatic model tuning. For details, see the example notebook.
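
As a hedged sketch (not the exact configuration from the example notebook), a tuning job can wrap the same estimator. The hyperparameter ranges, objective metric name, and metric regex below are illustrative assumptions and depend on the LightGBM container version:

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Illustrative ranges; adjust to the hyperparameters you actually want to tune.
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.01, 0.3),
    "num_boost_round": IntegerParameter(100, 1000),
}

tuner = HyperparameterTuner(
    estimator=tabular_estimator,
    objective_metric_name="rmse",  # assumed metric name emitted in the training logs
    objective_type="Minimize",
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=[{"Name": "rmse", "Regex": "rmse: ([0-9\\.]+)"}],  # assumed log pattern
    max_jobs=10,
    max_parallel_jobs=2,
)

tuner.fit({"train": train_input, "validation": validation_input})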

Benchmarking

We conducted benchmarking experiments to validate the performance of distributed training in SageMaker LightGBM on four different publicly available datasets for regression, binary, and multi-class classification tasks. The experiment details are as follows:

  • Each dataset is split into training, validation, and test data following the 80/10/10 split rule. For each dataset and instance type and count, we train LightGBM on the training data; record metrics such as billable time (per instance), total runtime, average training loss at the end of the last built tree over all instances, and validation loss at the end of the last built tree; and evaluate its performance on the hold-out test data.
  • For each trial, we use the exact same set of hyperparameter values, with the number of trees being 500 except for the lending dataset. For the lending dataset, we use 100 as the number of trees because it’s sufficient to get optimal results on the hold-out test data.
  • Each number presented in the table is averaged over three trials.
  • Because each model is trained with one fixed set of hyperparameter values, the evaluation metric numbers on the hold-out test data can be further improved with hyperparameter optimization.

Billable time refers to the absolute wall-clock time. The total runtime is the elapsed time running the distributed training, which includes the billable time and the time to spin up instances and install dependencies. For the validation loss at the end of the last built tree, we didn’t average over all the instances as we did for the training loss, because all of the validation data is assigned to a single instance and therefore only that instance has the validation loss metric. Out of Memory (OOM) means the training job hit an out-of-memory error on that dataset. The loss functions and evaluation metrics used are binary and multi-class logloss, L2, accuracy, F1, ROC AUC, F1 macro, F1 micro, R2, MAE, and MSE.

The expectation is that as the instance count increases, the billable time (per instance) and total runtime decreases, while the average training loss and validation loss at the end of the last built tree and evaluation scores on the hold-out test data remain the same.

We conducted three experiments:

  • Benchmark on three publicly available datasets using CSV as the input data format
  • Benchmark on a different dataset using Parquet as the input data format
  • Compare the model performance on different instance types given a certain instance count

The datasets we used are lending club loan data, ad-tracking fraud detection data, code data, and NYC taxi data. The data statistics are presented as follows.

| Dataset | Size | Number of Examples | Number of Features | Problem Type |
| --- | --- | --- | --- | --- |
| lending club loan | ~10 GB | 1,439,141 | 955 | Binary classification |
| ad-tracking fraud detection | ~10 GB | 145,716,493 | 9 | Binary classification |
| code | ~10 GB | 18,268,221 | 9 | Multi-class classification (number of classes in target: 10) |
| NYC taxi | ~0.5 GB | 83,601,440 | 8 | Regression |

The following tables contain the benchmarking results for the first three datasets using CSV as the data input format. For demonstration purposes, we removed the categorical features for the lending club loan data. The data statistics are shown in the preceding table. The experiment results matched our expectations.

lending club loan (training and validation loss: binary logloss):

| Instance Count (m5.2xlarge) | Billable Time per Instance (seconds) | Total Runtime (seconds) | Average Training Loss over all Instances at the End of the Last Built Tree | Validation Loss at the End of the Last Built Tree | Accuracy (%) | F1 (%) | ROC AUC (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Out of Memory | | | | | | |
| 2 | Out of Memory | | | | | | |
| 4 | 461 | 614 | 0.034 | 0.039 | 98.9 | 96.6 | 99.7 |
| 6 | 375 | 561 | 0.034 | 0.039 | 98.9 | 96.6 | 99.7 |
| 8 | 359 | 549 | 0.034 | 0.039 | 98.9 | 96.7 | 99.7 |
| 10 | 338 | 522 | 0.036 | 0.037 | 98.9 | 96.6 | 99.7 |

ad-tracking fraud detection (training and validation loss: binary logloss):

| Instance Count (m5.2xlarge) | Billable Time per Instance (seconds) | Total Runtime (seconds) | Average Training Loss over all Instances at the End of the Last Built Tree | Validation Loss at the End of the Last Built Tree | Accuracy (%) | F1 (%) | ROC AUC (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Out of Memory | | | | | | |
| 2 | Out of Memory | | | | | | |
| 4 | 2649 | 2773 | 0.038 | 0.039 | 99.8 | 43.2 | 80.4 |
| 6 | 1926 | 2047 | 0.039 | 0.04 | 99.8 | 44.5 | 79.7 |
| 8 | 1677 | 1780 | 0.04 | 0.04 | 99.8 | 45.3 | 79.2 |
| 10 | 1595 | 1744 | 0.04 | 0.041 | 99.8 | 43 | 79.3 |

code (training and validation loss: multiclass logloss):

| Instance Count (m5.2xlarge) | Billable Time per Instance (seconds) | Total Runtime (seconds) | Average Training Loss over all Instances at the End of the Last Built Tree | Validation Loss at the End of the Last Built Tree | Accuracy (%) | F1 Macro (%) | F1 Micro (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 5329 | 5414 | 0.937 | 0.947 | 65.6 | 59.3 | 65.6 |
| 2 | 3175 | 3294 | 0.94 | 0.942 | 65.5 | 59 | 65.5 |
| 4 | 2593 | 2695 | 0.937 | 0.942 | 65.6 | 59.3 | 65.6 |
| 8 | 2253 | 2377 | 0.938 | 0.943 | 65.6 | 59.3 | 65.6 |
| 10 | 2160 | 2285 | 0.937 | 0.942 | 65.6 | 59.3 | 65.6 |

The following table contains the benchmarking results using NYC taxi data with Parquet as the input data format. For the NYC taxi data, we use the yellow trip taxi records from 2009–2022. We follow the example notebook to conduct feature processing. The processed data takes 8.5 GB of disk space when saved in CSV format, and only 0.55 GB when saved in Parquet format.

A similar pattern shown in the preceding table is observed. As the instance count increases, the billable time (per instance) and total runtime decreases, while the average training loss and validation loss at the end of the last built tree and evaluation scores on the hold-out test data remain the same.

NYC taxi (training and validation loss: L2):

| Instance Count (m5.4xlarge) | Billable Time per Instance (seconds) | Total Runtime (seconds) | Average Training Loss over all Instances at the End of the Last Built Tree | Validation Loss at the End of the Last Built Tree | R2 (%) | MSE | MAE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 951 | 1036 | 6.543 | 6.543 | 54.7 | 42.8 | 2.7 |
| 2 | 635 | 727 | 6.545 | 6.545 | 54.7 | 42.8 | 2.7 |
| 4 | 501 | 628 | 6.637 | 6.639 | 53.4 | 44.1 | 2.8 |
| 6 | 435 | 552 | 6.74 | 6.74 | 52 | 45.4 | 2.8 |
| 8 | 410 | 510 | 6.919 | 6.924 | 52.3 | 44.9 | 2.9 |

We also conducted benchmarking experiments to compare performance across different instance types using the code dataset. For a given instance count, as the instance type becomes larger, the billable time and total runtime decrease.

| Instance Count | ml.m5.2xlarge Billable Time per Instance (seconds) | ml.m5.2xlarge Total Runtime (seconds) | ml.m5.4xlarge Billable Time per Instance (seconds) | ml.m5.4xlarge Total Runtime (seconds) | ml.m5.12xlarge Billable Time per Instance (seconds) | ml.m5.12xlarge Total Runtime (seconds) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 5329 | 5414 | 2793 | 2904 | 1302 | 1394 |
| 2 | 3175 | 3294 | 1911 | 2000 | 1006 | 1098 |
| 4 | 2593 | 2695 | 1451 | 1557 | 891 | 973 |

Conclusion

With the power of Dask’s distributed computing framework and LightGBM’s efficient gradient boosting algorithm, data scientists and developers can train models on large datasets faster and more efficiently than using traditional single-node methods. The SageMaker LightGBM algorithm makes the process of setting up distributed training using the Dask framework for both tabular classification and regression tasks much easier. The algorithm is now available through the SageMaker Python SDK. The supported data format can be either CSV or Parquet. Extensive benchmarking experiments were conducted on four publicly available datasets with various settings to validate its performance.

You can bring your own dataset and try these new algorithms on SageMaker, and check out the example notebook to use the built-in algorithms available on GitHub.


About the authors

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Will Badr is a Principal AI/ML Specialist SA who works as part of the global Amazon Machine Learning team. Will is passionate about using technology in innovative ways to positively impact the community. In his spare time, he likes to go diving, play soccer and explore the Pacific Islands.

Dr. Li Zhang is a Principal Product Manager-Technical for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms, a service that helps data scientists and machine learning practitioners get started with training and deploying their models, and uses reinforcement learning with Amazon SageMaker. His past work as a principal research staff member and master inventor at IBM Research has won the test of time paper award at IEEE INFOCOM.

Read More

Build a water consumption forecasting solution for a water utility agency using Amazon Forecast

Amazon Forecast is a fully managed service that uses machine learning (ML) to generate highly accurate forecasts, without requiring any prior ML experience. Forecast is applicable in a wide variety of use cases, including estimating supply and demand for inventory management, travel demand forecasting, workforce planning, and computing cloud infrastructure usage.

You can use Forecast to seamlessly conduct what-if analyses up to 80% faster to analyze and quantify the potential impact of business levers on your demand forecasts. A what-if analysis helps you investigate and explain how different scenarios might affect the baseline forecast created by Forecast. With Forecast, there are no servers to provision or ML models to build manually. Additionally, you only pay for what you use, and there is no minimum fee or upfront commitment. To use Forecast, you only need to provide historical data for what you want to forecast, and, optionally, any additional data that you believe may impact your forecasts.

Water utility providers have several forecasting use cases, but primary among them is predicting water consumption in an area or building to meet the demand. Also, it’s important for utility providers to forecast the increased consumption demand because of more apartments added in a building or more houses in the area. Predicting water consumption accurately is critical to avoid any service interruptions to the customer.

This post explores using Forecast to address this use case by using historical time series data.

Solution overview

Water is a natural resource and very critical to industry, agriculture, households, and our lives. Accurate water consumption forecasting is critical to make sure that an agency can run day-to-day operations efficiently. Water consumption forecasting is particularly challenging because demand is dynamic, and seasonal weather changes can have an impact. Predicting water consumption accurately is important so customers don’t face any service interruptions and in order to provide a stable service while maintaining low prices. Improved forecasting enables you to plan ahead to structure more cost-effective future contracts. The following are the two most common use cases:

  • Better demand management – As a utility provider agency, you need to find a balance between water demand and supply. The agency collects information like number of people living in an apartment and number of apartments in a building before providing service. As a utility agency, you must balance aggregate supply and demand. You need to store sufficient water in order to meet the demand. Moreover, demand forecasting has become more challenging for the following reasons:
    • The demand isn’t stable at all times and varies throughout the day. For example, water consumption at midnight is much less compared to in the morning.
    • Weather can also have an impact on the overall consumption. For example, water consumption is higher in the summer than the winter in the northern hemisphere, and the other way around in the southern hemisphere.
    • There may not be enough rainfall, water storage (lakes, reservoirs), or water filtering capacity. During the summer, supply can’t always keep up with demand, so water agencies have to forecast carefully and acquire water from other sources, which may be more expensive. Therefore, it’s critical for utility agencies to find alternative water sources like harvesting rainwater, capturing condensation from air handling units, or reclaiming wastewater.
  • Conducting a what-if analysis for increased demand – Demand for water is rising due to multiple reasons. This includes a combination of population growth, economic development, and changing consumption patterns. Let’s imagine a scenario where an existing apartment building builds an extension and the number of households and people increase by a certain percentage. Now you need to do an analysis to forecast the supply for increased demand. This also helps you make a cost-effective contract for increased demand.

Forecasting can be challenging because you first need accurate models to forecast demand and then a quick and simple way to reproduce the forecast across a range of scenarios.

This post focuses on a solution to perform water consumption forecasting and a what-if analysis. This post doesn’t consider weather data for model training. However, you can add weather data, given its correlation to water consumption.

Prerequisites

Before getting started, we set up our resources. For this post, we use the us-east-1 Region.

  1. Create an Amazon Simple Storage Service (Amazon S3) bucket for storing the historical time series data. For instructions, refer to Create your first S3 bucket.
  2. Download data files from the GitHub repo and upload to the newly created S3 bucket.
  3. Create a new AWS Identity and Access Management (IAM) role. For instructions, see Set Up Permissions for Amazon Forecast. Be sure to provide the name of your S3 bucket.

Create a dataset group and datasets

This post demonstrates two use cases related to water demand forecast: forecasting the water demand based on past water consumption, and conducting a what-if analysis for increased demand.

Forecast can accept three types of datasets: target time series (TTS), related time series (RTS), and item metadata (IM). Target time series data defines the historical demand for the resources you’re predicting. The target time series dataset is mandatory. A related time series dataset includes time-series data that isn’t included in a target time series dataset and might improve the accuracy of your predictor.

In our example, the target time series dataset contains item_id and timestamp dimensions, and the complementary related time series dataset includes no_of_consumer. An important note with this dataset: the TTS ends on 2023-01-01, and the RTS ends on 2023-01-15. When performing what-if scenarios, it’s important to manipulate RTS variables beyond your known time horizon in TTS.

To conduct a what-if analysis, we need to import two CSV files representing the target time series data and the related time series data. Our example target time series file contains the item_id, timestamp, and demand, and our related time series file contains the item_id, timestamp, and no_of_consumer.

To import your data, complete the following steps:

  1. On the Forecast console, choose View dataset groups.

  2. Choose Create dataset group.

  3. For Dataset group name, enter a name (for this post, water_consumption_datasetgroup).
  4. For Forecasting domain, choose a forecasting domain (for this post, Custom).
  5. Choose Next.
  6. On the Create target time series dataset page, provide the dataset name, frequency of your data, and data schema.
  7. On the Dataset import details page, enter a dataset import name.
  8. For Import file type, select CSV and enter the data location.
  9. Choose the IAM role you created earlier as a prerequisite.
  10. Choose Start.

You’re redirected to the dashboard that you can use to track progress.

  1. To import the related time series file, on the dashboard, choose Import.
  2. On the Create related time series dataset page, provide the dataset name and data schema.
  3. On the Dataset import details page, enter a dataset import name.
  4. For Import file type, select CSV and enter the data location.
  5. Choose the IAM role you created earlier.
  6. Choose Start.
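
If you prefer to script these steps rather than use the console, a minimal boto3 sketch of the target time series import might look like the following. The bucket path, role ARN, and resource names are placeholders, and the schema must match your CSV columns:

import boto3

forecast = boto3.client("forecast", region_name="us-east-1")

dataset_group = forecast.create_dataset_group(
    DatasetGroupName="water_consumption_datasetgroup", Domain="CUSTOM"
)

dataset = forecast.create_dataset(
    DatasetName="water_consumption_tts",
    Domain="CUSTOM",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",
    Schema={
        "Attributes": [
            {"AttributeName": "item_id", "AttributeType": "string"},
            {"AttributeName": "timestamp", "AttributeType": "timestamp"},
            {"AttributeName": "target_value", "AttributeType": "float"},
        ]
    },
)

# Attach the dataset to the dataset group.
forecast.update_dataset_group(
    DatasetGroupArn=dataset_group["DatasetGroupArn"],
    DatasetArns=[dataset["DatasetArn"]],
)

forecast.create_dataset_import_job(
    DatasetImportJobName="water_tts_import",
    DatasetArn=dataset["DatasetArn"],
    DataSource={
        "S3Config": {
            "Path": "s3://your-bucket/target_time_series.csv",  # placeholder
            "RoleArn": "arn:aws:iam::123456789012:role/ForecastRole",  # placeholder
        }
    },
    TimestampFormat="yyyy-MM-dd",
)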

Train a predictor

Next, we train a predictor.

  1. On the dashboard, choose Start under Train a predictor.
  2. On the Train predictor page, enter a name for your predictor.
  3. Specify how long in the future you want to forecast and at what frequency.
  4. Specify the number of quantiles you want to forecast for.

Forecast uses AutoPredictor to create predictors. For more information, refer to Training Predictors.

  1. Choose Create.

Create a forecast

After our predictor is trained (this can take approximately 3.5 hours), we create a forecast. You will know that your predictor is trained when you see the View predictors button on your dashboard.

  1. Choose Start under Generate forecasts on the dashboard.
  2. On the Create a forecast page, enter a forecast name.
  3. For Predictor, choose the predictor that you created.
  4. Optionally, specify the forecast quantiles.
  5. Specify the items to generate a forecast for.
  6. Choose Start.

Query your forecast

You can query a forecast using the Query forecast option. By default, the complete range of the forecast is returned. You can request a specific date range within the complete forecast. When you query a forecast, you must specify filtering criteria. A filter is a key-value pair. The key is one of the schema attribute names (including forecast dimensions) from one of the datasets used to create the forecast. The value is a valid value for the specified key. You can specify multiple key-value pairs. The returned forecast will only contain items that satisfy all the criteria.

  1. Choose Query forecast on the dashboard.
  2. Provide the filter criteria for start date and end date.
  3. Specify your forecast key and value.
  4. Choose Get Forecast.

The following screenshot shows the forecasted water consumption for an apartment (item ID A_10001) using the forecast model.
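
The same query can also be issued programmatically through the forecastquery API. The following is a hedged sketch in which the forecast ARN and date range are placeholders:

import boto3

forecastquery = boto3.client("forecastquery", region_name="us-east-1")

# Returns the predictions for each forecast quantile for item A_10001
# within the requested date range.
response = forecastquery.query_forecast(
    ForecastArn="arn:aws:forecast:us-east-1:123456789012:forecast/water_demand_forecast",  # placeholder
    StartDate="2023-01-02T00:00:00",
    EndDate="2023-01-15T00:00:00",
    Filters={"item_id": "A_10001"},
)
print(response["Forecast"]["Predictions"])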

Create a what-if analysis

At this point, we have created our baseline forecast and can now conduct a what-if analysis. Let’s imagine a scenario where an existing apartment building adds an extension, and the number of households and people increases by 20%. Now you need to do an analysis to forecast increased supply based on increased demand.

There are three stages to conducting a what-if analysis: setting up the analysis, creating the what-if forecast by defining what is changed in the scenario, and comparing the results.

  1. To set up your analysis, choose Explore what-if analysis on the dashboard.
  2. Choose Create.
  3. Enter a unique name and choose the baseline forecast.
  4. Choose the items in your dataset you want to conduct a what-if analysis for. You have two options:
    • Select all items is the default, which we choose in this post.
    • If you want to pick specific items, choose Select items with a file and import a CSV file containing the unique identifier for the corresponding item and any associated dimensions.
  5. Choose Create what-if analysis.

Create a what-if forecast

Next, we create a what-if forecast to define the scenario we want to analyze.

  1. In the What-if forecast section, choose Create.
  2. Enter a name of your scenario.
  3. You can define your scenario through two options:
    • Use transformation functions – Use the transformation builder to transform the related time series data you imported. For this walkthrough, we evaluate how the demand for an item in our dataset changes when the number of consumers increases by 20% compared to the baseline forecast.
    • Define the what-if forecast with a replacement dataset – Replace the related time series dataset you imported.

For our example, we create a scenario where we increase no_of_consumer by 20% applicable to item ID A_10001, and no_of_consumer is a feature in the dataset. You need this analysis to forecast and meet the water supply for increased demand. This analysis also helps you make a cost-effective contract based on the water demand forecast.

  1. For What-if forecast definition method, select Use transformation functions.
  2. Choose Multiply as our operator, no_of_consumer as our time series, and enter 1.2.
  3. Choose Add condition.
  4. Choose Equals as the operation and enter A_10001 for item_id.
  5. Choose Create.
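
For reference, the same scenario can be defined through the Forecast API. The following hedged sketch uses the CreateWhatIfForecast operation with a placeholder what-if analysis ARN; it mirrors the console steps above by multiplying no_of_consumer by 1.2 for item_id A_10001:

import boto3

forecast = boto3.client("forecast", region_name="us-east-1")

forecast.create_what_if_forecast(
    WhatIfForecastName="water_demand_increase_20_percent",
    WhatIfAnalysisArn="arn:aws:forecast:us-east-1:123456789012:what-if-analysis/water_demand_whatif_analyis",  # placeholder
    TimeSeriesTransformations=[
        {
            # Multiply no_of_consumer by 1.2 ...
            "Action": {
                "AttributeName": "no_of_consumer",
                "Operation": "MULTIPLY",
                "Value": 1.2,
            },
            # ... only for rows where item_id equals A_10001.
            "TimeSeriesConditions": [
                {
                    "AttributeName": "item_id",
                    "AttributeValue": "A_10001",
                    "Condition": "EQUALS",
                }
            ],
        }
    ],
)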

Compare the forecasts

We can now compare the what-if forecast with the baseline forecast, comparing a 20% increase in consumers with the baseline demand.

  1. On the analysis insights page, navigate to the Compare what-if forecasts section.
  2. For item_id, enter the item to analyze (in our scenario, enter A_10001).
  3. For What-if forecasts, choose water_demand_whatif_analyis.
  4. Choose Compare what-if.
  5. You can choose the baseline forecast for the analysis.

The following graph shows the resulting demand for our scenario. The red line shows the forecast of future water consumption for 20% increased population. The P90 forecast type indicates the true value is expected to be lower than the predicted value 90% of the time. You can use this demand forecast to effectively manage water supply for increased demand and avoid any service interruptions.

Export your data

To export your data to CSV, complete the following steps:

  1. Choose Create export.
  2. Enter a name for your export file (for this post, water_demand_export).
  3. Specify the scenarios to be exported by selecting the scenarios on the What-If Forecast drop-down menu.

You can export multiple scenarios at once in a combined file.

  1. For Export location, specify the Amazon S3 location.
  2. To begin the export, choose Create Export.
  3. To download the export, navigate to S3 file path location on the Amazon S3 console, select the file, and choose Download.

The export file will contain the timestamp, item_id, and forecasts for each quantile for all scenarios selected (including the base scenario).

Clean up the resources

To avoid incurring future charges, remove the resources created by this solution:

  1. Delete the Forecast resources you created.
  2. Delete the S3 bucket.

Conclusion

In this post, we showed you how easy it is to use Forecast and its underlying system architecture to predict water demand using water consumption data. A what-if scenario analysis is a critical tool to help navigate through the uncertainties of business. It provides foresight and a mechanism to stress-test ideas, leaving businesses more resilient, better prepared, and in control of their future. Other utility providers like electricity or gas providers can use Forecast to build solutions and meet utility demand in a cost-effective way.

The steps in this post demonstrated how to build the solution on the AWS Management Console. To directly use Forecast APIs for building the solution, follow the notebook in our GitHub repo.

We encourage you to learn more by visiting the Amazon Forecast Developer Guide and try out the end-to-end solution enabled by these services with a dataset relevant to your business KPIs.


About the Author

Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.

Read More

Best Egg achieved three times faster ML model training with Amazon SageMaker Automatic Model Tuning

This post is co-authored by Tristan Miller from Best Egg.

Best Egg is a leading financial confidence platform that provides lending products and resources focused on helping people feel more confident as they manage their everyday finances. Since March 2014, Best Egg has delivered $22 billion in consumer personal loans with strong credit performance, welcomed almost 637,000 members to the recently launched Best Egg Financial Health platform, and empowered over 180,000 cardmembers who carry the new Best Egg Credit Card in their wallet.

Amazon SageMaker is a fully managed machine learning (ML) service providing various tools to build, train, optimize, and deploy ML models. SageMaker provides automated model tuning, which manages the undifferentiated heavy lifting of provisioning and managing compute infrastructure to run several iterations and select the optimized model candidate from training.

To help you efficiently tune your required hyperparameters and determine the best-performing model, this post will discuss how Best Egg used SageMaker hyperparameter tuning with warm pools and achieved a three-fold improvement in model training time.

Use case overview

Risk credit analysts use credit rating models when lending or offering a credit card to customers by taking a variety of user attributes into account. This statistical model generates a final score, or Good Bad Indicator (GBI), which determines whether to approve or reject a credit application. ML insights facilitate decision-making. To assess the risk of credit applications, ML uses various data sources, thereby predicting the risk that a customer will be delinquent.

The challenge

A significant problem in the financial sector is that there is no universally accepted method or structure for dealing with the overwhelming array of possibilities that must be considered at any one time. It’s difficult to standardize the tools that teams use in order to promote transparency and tracking across the board. The application of ML can help those in the finance industry make better judgments regarding pricing, risk management, and consumer behavior. Data scientists train multiple ML algorithms to examine millions of consumer data records, identify anomalies, and evaluate if a person is eligible for credit.

SageMaker can run automated hyperparameter tuning based on multiple optimization techniques such as grid search, Bayesian, random search, and Hyperband. Automatic model tuning makes it easy to zero in on the optimal model configuration, freeing up time and money for better use elsewhere in the financial sector. As part of hyperparameter tuning, SageMaker runs several iterations of the training code on the training dataset with various hyperparameter combinations. SageMaker then determines the best model candidate with the optimal hyperparameters based on the objective metric configured.

Best Egg was able to automate hyperparameter tuning with the automated hyperparameter optimization (HPO) feature of SageMaker and parallelize it. However, each hyperparameter tuning job could take hours, and selecting the best model candidate took many hyperparameter tuning jobs run over the course of several days. Hyperparameter tuning jobs could be slow due to the nature of the iterative tasks that HPO runs under the hood. Every time a training job is initiated, new resource provisioning occurs, which consumes a significant amount of time before the training actually begins. This is a common problem that data scientists face when training their models. Time efficiency was a major pain point because these long-running training jobs were impeding productivity and data scientists were stuck on these jobs for hours.

Solution overview

The following diagram represents the different components used in this solution.

The Best Egg data science team uses Amazon SageMaker Studio for building and running Jupyter notebooks. SageMaker processing jobs run feature engineering pipelines on the input dataset to generate features. Best Egg trains multiple credit models using classification and regression algorithms. The data science team must sometimes work with limited training data in the order of tens of thousands of records given the nature of their use cases. Best Egg runs SageMaker training jobs with automated hyperparameter tuning powered by Bayesian optimization. To reduce variance, Best Egg uses k-fold cross validation as part of their custom container to evaluate the trained model.

The trained model artifact is registered and versioned in the SageMaker model registry. Inference is run in two ways—real time and batch—based on the user requirements. The trained model artifact is hosted on a SageMaker real-time endpoint using the built-in auto scaling and load balancing features. The model is also scored through batch transform jobs scheduled on a daily basis. The whole pipeline is orchestrated through Amazon SageMaker Pipelines, consisting of a sequence of steps such as a processing step for feature engineering, a tuning step for training and automated model tuning, and a model step for registering the artifact.

With respect to the core problem of long-running hyperparameter tuning jobs, Best Egg explored the recently released warm pools feature managed by SageMaker. SageMaker Managed Warm Pools allows you to retain and reuse provisioned infrastructure after the completion of a training job to reduce latency for repetitive workloads, such as iterative experimentation or consecutively running jobs where specific job configuration parameters like instance type or count match with the previous runs. This allowed Best Egg to reuse the existing infrastructure for their repetitive training jobs without wasting time on infrastructure provisioning.

Deep dive into model tuning and benefits of warm pools

SageMaker Automated Model Tuning leverages Warm Pools by default for any tuning job as of August 2022 (announcement). This makes it straightforward to reap the benefits of Warm Pools as you just need to launch a tuning job and SageMaker Automatic Model Tuning will automatically use Warm Pools between subsequent training jobs launched as part of the tuning. When each training job completes, the provisioned resources are kept alive in a warm pool so that the next training job launched as part of the tuning will start on the same pool with minimal startup overhead.
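
Tuning jobs get this behavior automatically, but standalone training jobs can opt in by setting a keep-alive period on the estimator. The following is a minimal sketch in which the image URI, role, and S3 path are placeholders:

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder
    role="<execution-role-arn>",  # placeholder
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # Keep the provisioned instance alive for up to 30 minutes after the job
    # completes so that the next matching job can reuse it.
    keep_alive_period_in_seconds=1800,
)

estimator.fit({"train": "s3://your-bucket/train/"})  # placeholder S3 path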

The following workflow depicts a series of training job runs using warm pools.

  1. After the first training job is complete, the instances used for training are retained in the warm pool cluster.
  2. The next training job triggered will use the instance in the warm pool to run, eliminating the cold start time needed to prepare the instance to start up.
  3. Likewise, if more training jobs come in with an instance type, instance count, volume, and networking configuration that matches the warm pool cluster resources, the matched instances are used to run the jobs.
  4. Once the training job is completed, the instances will be retained in the warm pool waiting for new jobs.
  5. The maximum length of time that a warm pool cluster can continue running consecutive training jobs is 7 days.
    • As long as the cluster is healthy and the warm pool is within the specified time duration, the warm pool status is Available.
    • The warm pool stays Available until it identifies a matching training job for reuse. If the warm pool status is Terminated, then this is the end of the warm pool lifecycle.

The following diagram illustrates this workflow.

How Best Egg benefitted: Improvements and data points

Best Egg noticed that with warm pools, their training jobs on SageMaker were running faster by a factor of 3. In one credit model project, the best model was selected from eight different HPO jobs, each of which had 40 iterations with five parallel jobs at a time. Each iteration took about 1 minute to compute, whereas without warm pools they typically took 5 minutes each. In total, the process took 2 hours of computation time, with additional input from the data scientist adding up to about half a business day. Without warm pools, we estimate that the computation would have taken 6 hours alone, likely spread out over the course of 2–3 business days.

Summary

In conclusion, this post discussed elements of Best Egg’s business and the company’s ML landscape. We reviewed how Best Egg was able to speed up its model training and tuning by enabling warm pools for their hyperparameter tuning jobs on SageMaker. We also explained how simple it is to implement warm pools for your training jobs with a simple configuration. At AWS, we recommend our readers start exploring warm pools for iterative and repetitive training jobs.


About the Authors

Tristan Miller is a Lead Data Scientist at Best Egg. He builds and deploys ML models to make important underwriting and marketing decisions. He develops bespoke solutions to address specific problems, as well as automation to increase efficiency and scale. He is also a skilled origamist.

Valerio Perrone is an Applied Science Manager at AWS. He leads the science and engineering team owning the service for automatic model tuning across Amazon SageMaker. Valerio’s expertise lies in developing algorithms for large-scale machine learning and statistical models, with a focus on data-driven decision making and the democratization of artificial intelligence.

Ganapathi Krishnamoorthi is a Senior ML Solutions Architect at AWS. Ganapathi provides prescriptive guidance to startup and enterprise customers, helping them design and deploy cloud applications at scale. He is specialized in machine learning and is focused on helping customers use AI/ML for their business outcomes. When not at work, he enjoys exploring the outdoors and listening to music.

Ajjay Govindaram is a Sr. Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Hariharan Suresh is a Senior Solutions Architect at AWS. He is passionate about databases, machine learning, and designing innovative solutions. Prior to joining AWS, Hariharan was a product architect, core banking implementation specialist, and developer, and worked with BFSI organizations for over 11 years. Outside of technology, he enjoys paragliding and cycling.

Read More

Build a loyalty points anomaly detector using Amazon Lookout for Metrics

Today, gaining customer loyalty cannot be a one-off thing. A brand needs a focused and integrated plan to retain its best customers—put simply, it needs a customer loyalty program. Earn and burn programs are one of the main paradigms. A typical earn and burn program rewards customers after a certain number of visits or spend.

For example, a fast food chain has launched its earn and burn loyalty pilot program in some locations. They are looking to use the loyalty program to make their customer experience more personal. After testing, they want to expand it to more locations across different countries. The program allows customers to earn points for every dollar that they spend. They can redeem the points toward different rewards options. They also give points to new customers to attract them. They test the redeem pattern every month to check the performance of the loyalty program at different locations. Identifying redeem pattern anomalies is crucial in order to take corrective action in time and ensure the overall success of the program. Customers have different earn and redeem patterns at different locations based on their spend and choice of food. Therefore, the process of identifying an anomaly and quickly diagnosing the root cause is difficult, costly, and error-prone.

This post shows you how to use an integrated solution with Amazon Lookout for Metrics to break these barriers by quickly and easily detecting anomalies in the key performance indicators (KPIs) of your interest.

Lookout for Metrics automatically detects and diagnoses anomalies (outliers from the norm) in business and operational data. You don’t need ML experience to use Lookout for Metrics. It’s a fully managed machine learning (ML) service that uses specialized ML models to detect anomalies based on the characteristics of your data. For example, trends and seasonality are two characteristics of time series metrics in which threshold-based anomaly detection doesn’t work. Trends are continuous variations (increases or decreases) in a metric’s value. On the other hand, seasonality is periodic patterns that occur in a system, usually rising above a baseline and then decreasing again.

In this post, we demonstrate a common loyalty points earn and burn scenario, in which we detect anomalies in the customer’s earn and redeem pattern. We show you how to use these managed services from AWS to help find anomalies. You can apply this solution to other use cases such as detecting anomalies in air quality, traffic patterns, and power consumption patterns, to name a few.

Solution overview

This post demonstrates how you can set up anomaly detection on a loyalty points earn and redeem pattern using Lookout for Metrics. The solution allows you to download relevant datasets and set up anomaly detection to detect earn and redeem patterns.

Let’s see how a loyalty program typically works, as shown in the following diagram.

Customers earn points for the money they spend on the purchase. They can redeem the accumulated points in exchange for discounts, rewards, or incentives.

Building this system requires three simple steps:

  1. Create an Amazon Simple Storage Service (Amazon S3) bucket and upload your sample dataset.
  2. Create a detector for Lookout for Metrics.
  3. Add a dataset and activate the detector to detect anomalies on historical data.

Then you can review and analyze the results.

Create an S3 bucket and upload your sample dataset

Download the file loyalty.csv and save it locally. Then continue through the following steps:

  1. On the Amazon S3 console, create an S3 bucket to upload the loyalty.csv file.

This bucket name needs to be unique, and the bucket must be in the same Region where you’re using Lookout for Metrics.

  1. Open the bucket you created.
  2. Choose Upload.

  1. Choose Add files and choose the loyalty.csv file.
  2. Choose Upload.

Create a detector

A detector is a Lookout for Metrics resource that monitors a dataset and identifies anomalies at a predefined frequency. Detectors use ML to find patterns in data and distinguish between expected variations in data and legitimate anomalies. To improve its performance, a detector learns more about your data over time.

In our use case, the detector analyzes daily data. To create the detector, complete the following steps:

  1. On the Lookout for Metrics console, choose Create detector.
  2. Enter a name and optional description for the detector.
  3. For Interval, choose 1 day intervals.
  4. Choose Create.

Your data is encrypted by default with a key that AWS owns and manages for you. You can also configure if you want to use a different encryption key from the one that is used by default.

Now let’s point this detector to the data that you want it to run anomaly detection on.

Create a dataset

A dataset tells the detector where to find your data and which metrics to analyze for anomalies. To create a dataset, complete the following steps:

  1. On the Lookout for Metrics console, navigate to your detector.
  2. Choose Add a dataset.

  1. For Name, enter a name (for example, loyalty-point-anomaly-dataset).
  2. For Timezone, choose as applicable.
  3. For Datasource, choose your data source (for this post, Amazon S3).
  4. For Detector mode, select your mode (for this post, Backtest).

With Amazon S3, you can create a detector in two modes:

  • Backtest – This mode is used to find anomalies in historical data. It needs all records to be consolidated in a single file. We use this mode with our use case because we want to detect anomalies in a customer’s historical loyalty points redeem pattern in different locations.
  • Continuous – This mode is used to detect anomalies in live data.
  1. Enter the S3 path for the live S3 folder and path pattern.
  2. Choose Detect format settings.
  3. Leave all default format settings as is and choose Next.

Configure measures, dimensions, and timestamps

Measures define KPIs that you want to track anomalies for. You can add up to five measures per detector. The fields that are used to create KPIs from your source data must be of numeric format. Currently, KPIs can be defined by aggregating records within the time interval using SUM or AVERAGE.

Dimensions give you the ability to slice and dice your data by defining categories or segments. This allows you to track anomalies for a subset of the whole set of data for which a particular measure is applicable.

In our use case, we add two measures, which calculate the sum of points earned and points redeemed within the 1-day interval, and one dimension (the location) across which those measures are tracked.

Every record in the dataset must have a timestamp. The following configuration allows you to choose the field that represents the timestamp value and also the format of the timestamp.

The next page allows you to review all the details you added and then choose Save and activate to create the detector.

The detector then begins learning the data in the data source. At this stage, the status of the detector changes to Initializing.

It’s important to note the minimum amount of data that is required before Lookout for Metrics can start detecting anomalies. For more information about requirements and limits, see Lookout for Metrics quotas.

With minimal configuration, you have created your detector, pointed it at a dataset, and defined the metrics that you want Lookout for Metrics to find anomalies in.
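
For reference, a rough boto3 equivalent of this console setup is shown below. The role ARN, bucket path, and the measure, dimension, and timestamp column names are placeholders that must match your loyalty.csv file:

import boto3

lookoutmetrics = boto3.client("lookoutmetrics", region_name="us-east-1")

detector = lookoutmetrics.create_anomaly_detector(
    AnomalyDetectorName="loyalty-point-anomaly-detector",
    AnomalyDetectorConfig={"AnomalyDetectorFrequency": "P1D"},
)

lookoutmetrics.create_metric_set(
    AnomalyDetectorArn=detector["AnomalyDetectorArn"],
    MetricSetName="loyalty-point-anomaly-dataset",
    MetricList=[
        {"MetricName": "points_earned", "AggregationFunction": "SUM"},  # placeholder column
        {"MetricName": "points_redeemed", "AggregationFunction": "SUM"},  # placeholder column
    ],
    DimensionList=["location"],  # placeholder column
    TimestampColumn={"ColumnName": "timestamp", "ColumnFormat": "yyyy-MM-dd"},  # placeholder
    MetricSetFrequency="P1D",
    MetricSource={
        "S3SourceConfig": {
            "RoleArn": "arn:aws:iam::123456789012:role/LookoutForMetricsRole",  # placeholder
            "HistoricalDataPathList": ["s3://your-bucket/loyalty.csv"],  # placeholder
            "FileFormatDescriptor": {"CsvFormatDescriptor": {"ContainsHeader": True}},
        }
    },
)

# Backtest mode: run anomaly detection on the historical data.
lookoutmetrics.back_test_anomaly_detector(AnomalyDetectorArn=detector["AnomalyDetectorArn"])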

Review and analyze the results

When the backtesting job is complete, you can see all the anomalies that Lookout for Metrics detected in the last 30% of your historical data. From here, you can begin to unpack the kinds of results you will see from Lookout for Metrics in the future when you start getting the new data.

Lookout for Metrics provides a rich UI experience for users who want to use the AWS Management Console to analyze the anomalies being detected. It also provides the capability to query the anomalies via APIs.

Let’s look at an example anomaly detected from our loyalty points anomaly detector use case. The following screenshot shows an anomaly detected in loyalty points redemption at a specific location on the designated time and date with a severity score of 91.

It also shows the percentage contribution of the dimension towards the anomaly. In this case, 100% contribution comes from the location ID A-1002 dimension.

Clean up

To avoid incurring ongoing charges, delete the following resources created in this post:

  • Detector
  • S3 bucket
  • IAM role

Conclusion

In this post, we showed you how to use Lookout for Metrics to remove the undifferentiated heavy lifting involved in managing the end-to-end lifecycle of building ML-powered anomaly detection applications. This solution can help you accelerate your ability to find anomalies in key business metrics and allow you to focus your efforts on growing and improving your business.

We encourage you to learn more by visiting the Amazon Lookout for Metrics Developer Guide and trying out the end-to-end solution enabled by these services with a dataset relevant to your business KPIs.


About the Author

Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.

Read More

Explain text classification model predictions using Amazon SageMaker Clarify

Model explainability refers to the process of relating the prediction of a machine learning (ML) model to the input feature values of an instance in humanly understandable terms. This field is often referred to as explainable artificial intelligence (XAI). Amazon SageMaker Clarify is a feature of Amazon SageMaker that enables data scientists and ML engineers to explain the predictions of their ML models. It uses model-agnostic methods like SHapley Additive exPlanations (SHAP) for feature attribution. Apart from supporting explanations for tabular data, Clarify also supports explainability for both computer vision (CV) and natural language processing (NLP) using the same SHAP algorithm.

In this post, we illustrate the use of Clarify for explaining NLP models. Specifically, we show how you can explain the predictions of a text classification model that has been trained using the SageMaker BlazingText algorithm. This helps you understand which parts or words of the text are most important for the predictions made by the model. Among other things, these observations can be used to improve processes like data acquisition (to reduce bias in the dataset) and model validation (to ensure that models are performing as intended), and to earn trust with all stakeholders when the model is deployed. This can be a key requirement in many application domains like sentiment analysis, legal reviews, medical diagnosis, and more.

We also provide a general design pattern that you can use while using Clarify with any of the SageMaker algorithms.

Solution overview

SageMaker algorithms have fixed input and output data formats. For example, the BlazingText algorithm container accepts inputs in JSON format. But customers often require specific formats that are compatible with their data pipelines. We present a couple of options that you can follow to use Clarify.

Option A

In this option, we use the inference pipeline feature of SageMaker hosting. An inference pipeline is a SageMaker model composed of a sequence of containers that process inference requests. The following diagram illustrates an example.

Clarify job invokes inference pipeline with one container handling the format of data and the other container holding the model.

You can use inference pipelines to deploy a combination of your own custom models and SageMaker built-in algorithms packaged in different containers. For more information, refer to Hosting models along with pre-processing logic as serial inference pipeline behind one endpoint. Because Clarify supports only CSV and JSON Lines as input, you need to complete the following steps:

  1. Create a model and a container to convert the data from CSV (or JSON Lines) to JSON.
  2. After the model training step with the BlazingText algorithm, directly deploy the model. This will deploy the model using the BlazingText container, which accepts JSON as input. When using a different algorithm, SageMaker creates the model using that algorithm’s container.
  3. Use the preceding two models to create a PipelineModel. This chains the two models in a linear sequence and creates a single model. For an example, refer to Inference pipeline with Scikit-learn and Linear Learner.

With this solution, we have successfully created a single model whose input is compatible with Clarify and can be used by it to generate explanations.
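
The following is a minimal sketch of step 3, assuming the format-conversion model and the BlazingText model have already been created as SageMaker Python SDK model objects (named preprocessor_model and blazingtext_model here); the names and instance type are illustrative, not taken from the referenced example.

from sagemaker.pipeline import PipelineModel

# Chain the CSV/JSON Lines-to-JSON converter with the BlazingText model.
# Clarify sends CSV (or JSON Lines) to the pipeline; the first container
# converts it to the JSON format that the BlazingText container expects.
pipeline_model = PipelineModel(
    name="blazingtext-clarify-pipeline",  # illustrative name
    role=role,                            # SageMaker execution role
    models=[preprocessor_model, blazingtext_model],
)

# Deploying registers the chained containers as a single model behind one
# endpoint, which Clarify can then reference in its ModelConfig.
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")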

Option B

This option demonstrates how you can integrate the use of different data formats between Clarify and SageMaker algorithms by bringing your own container for hosting the SageMaker model. The following diagram illustrates the architecture and the steps that are involved in the solution:

The steps are as follows:

  1. Use the BlazingText algorithm via the SageMaker Estimator to train a text classification model.
  2. After the model is trained, create a custom Docker container that can be used to create a SageMaker model and optionally deploy the model as a SageMaker model endpoint.
  3. Configure and create a Clarify job to use the hosting container for generating an explainability report.
  4. The custom container accepts the inference request as a CSV and enables Clarify to generate explanations.

It should be noted that this solution demonstrates the idea of obtaining offline explanations using Clarify for a BlazingText model. For more information about online explainability, refer to Online Explainability with SageMaker Clarify.

The rest of this post explains each of the steps in the second option.

Train a BlazingText model

We first train a text classification model using the BlazingText algorithm. In this example, we use the DBpedia Ontology dataset. DBpedia is a crowd-sourced initiative to extract structured content using information from various Wikimedia projects like Wikipedia. Specifically, we use the DBpedia ontology dataset as created by Zhang et al. It is constructed by selecting 14 non-overlapping classes from DBpedia 2014. The fields contain an abstract of a Wikipedia article and the corresponding class. The goal of a text classification model is to predict the class of an article given its abstract.

A detailed step-by-step process for training the model is available in the following notebook. After you have trained the model, take note of the Amazon Simple Storage Service (Amazon S3) URI path where the model artifacts are stored. For a step-by-step guide, refer to Text Classification using SageMaker BlazingText.

Deploy the trained BlazingText model using your own container on SageMaker

With Clarify, there are two options to provide the model information:

  • Create a SageMaker model without deploying it to an endpoint – When a SageMaker model is provided to Clarify, it creates an ephemeral endpoint using the model.
  • Create a SageMaker model and deploy it to an endpoint – When an endpoint is made available to Clarify, it uses the endpoint for obtaining explanations. This avoids the creation of an ephemeral endpoint and can reduce the runtime of a Clarify job.

In this post, we use the first option with Clarify. We use the SageMaker Python SDK for this purpose. For other options and more details, refer to Create your endpoint and deploy your model.

Bring your own container (BYOC)

We first build a custom Docker image that is used to create the SageMaker model. You can use the files and code in the source directory of our GitHub repository.

The Dockerfile describes the image we want to build. We start with a standard Ubuntu installation and then install Scikit-learn. We also clone fasttext and install the package. It’s used to load the BlazingText model for making predictions. Finally, we add the code that implements our algorithm in the form of the preceding files and set up the environment in the container. The entire Dockerfile is provided in our repository and you can use it as it is. Refer to Use Your Own Inference Code with Hosting Services for more details on how SageMaker interacts with your Docker container and its requirements.

Furthermore, predictor.py contains the code for loading the model and making the predictions. It accepts input data as a CSV, which makes it compatible with Clarify.
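
The actual predictor.py is available in the repository; purely to illustrate the pattern, a stripped-down handler could look like the following sketch, in which the model path inside the container, the CSV handling, and the response fields are assumptions rather than the exact implementation.

# Hypothetical, simplified sketch of predictor.py; see the GitHub repository
# for the actual implementation used in this post.
import csv
import io
import json

import fasttext
import flask

MODEL_PATH = "/opt/ml/model/model.bin"   # assumed artifact location inside the container
model = fasttext.load_model(MODEL_PATH)  # BlazingText supervised models can be loaded with fastText

app = flask.Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: SageMaker calls this to verify the container is ready.
    return flask.Response(status=200)


@app.route("/invocations", methods=["POST"])
def invocations():
    # Clarify sends one or more rows of CSV text; score each row.
    rows = csv.reader(io.StringIO(flask.request.data.decode("utf-8")))
    predictions = []
    for row in rows:
        labels, probabilities = model.predict(row[0])
        predictions.append({
            "label": list(labels),
            "prob": [float(p) for p in probabilities],
        })
    # Return JSON Lines so that ModelPredictedLabelConfig can extract
    # the "label" and "prob" fields.
    body = "\n".join(json.dumps(p) for p in predictions)
    return flask.Response(response=body, status=200, mimetype="application/jsonlines")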

After you have the Dockerfile, build the Docker container and upload it to Amazon Elastic Container Registry (Amazon ECR). You can find the step-by-step process in the form of a shell script in our GitHub repository, which you can use to create and upload the Docker image to Amazon ECR.

Create the BlazingText model

The next step is to create a model object from the SageMaker Python SDK Model class that can be deployed to an HTTPS endpoint. We configure Clarify to use this model for generating explanations. For the code and other requirements for this step, refer to Deploy your trained SageMaker BlazingText Model using your own container in Amazon SageMaker.
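
As an illustration of what this step can look like (not the exact code from the referenced post), the following sketch registers a SageMaker model from the custom Amazon ECR image and the trained BlazingText artifacts without deploying a persistent endpoint; the image URI, artifact path, and model name are placeholders.

import sagemaker
from sagemaker.model import Model

sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()      # SageMaker execution role
model_name = "blazingtext-byoc-clarify"    # illustrative name, referenced later in ModelConfig

byoc_model = Model(
    image_uri="<ECR URI of the custom container image>",
    model_data="<S3 URI of the trained BlazingText model.tar.gz>",
    role=role,
    sagemaker_session=sagemaker_session,
)

# Register the model with SageMaker without deploying it; Clarify creates an
# ephemeral endpoint from this model when the processing job runs.
container_def = byoc_model.prepare_container_def(instance_type="ml.m5.xlarge")
sagemaker_session.create_model(model_name, role, container_def)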

Configure Clarify

Clarify NLP is compatible with regression and classification models. It helps you understand which parts of the input text influence the predictions of your model. Clarify supports 62 languages and can handle text with multiple languages. We use the SageMaker Python SDK to define the three configurations that are used by Clarify for creating the explainability report.

First, we need to create the processor object and also specify the location of the input dataset that will be used for the predictions and the feature attribution:

import sagemaker
from sagemaker import clarify

sagemaker_session = sagemaker.Session()
clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
)
file_path = "<location of the input dataset>"

DataConfig

Here, you should configure the location of the input data, the feature column, and where you want the Clarify job to store the output. This is done by passing the relevant arguments while creating a DataConfig object:

explainability_output_path = "s3://{}/{}/clarify-text-explainability".format(
    sagemaker_session.default_bucket(), "explainability"
)

explainability_data_config = clarify.DataConfig(
    s3_data_input_path=file_path,
    s3_output_path=explainability_output_path,
    headers=["Review Text"],
    dataset_type="text/csv",
)

ModelConfig

With ModelConfig, you should specify information about your trained model. Here, we specify the name of the BlazingText SageMaker model that we created in a prior step and also set other parameters like the Amazon Elastic Compute Cloud (Amazon EC2) instance type and the format of the content:

model_config = clarify.ModelConfig(
    model_name=model_name,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="application/jsonlines",
    content_type="text/csv",
    endpoint_name_prefix=None,
)

SHAPConfig

This is used to inform Clarify about how to obtain the feature attributions. TextConfig is used to specify the granularity of the text and the language. In our dataset, because we want to break down the input text into words and the language is English, we set these values to token and English, respectively. Depending on the nature of your dataset, you can set granularity to sentence or paragraph. The baseline is set to a special token. This means that Clarify will drop subsets of the input text and replace them with values from the baseline while obtaining predictions for computing the SHAP values. This is how it determines the effect of the tokens on the model’s predictions and in turn identifies their importance. The number of samples that are to be used in the Kernel SHAP algorithm is determined by the value of the num_samples argument. Higher values result in more robust feature attributions, but that can also increase the runtime of the job. Therefore, you need to make a trade-off between the two. See the following code:

shap_config = clarify.SHAPConfig(
    baseline=[["<UNK>"]],
    num_samples=1000,
    agg_method="mean_abs",
    save_local_shap_values=True,
    text_config=clarify.TextConfig(granularity="token", language="english"),
)

For more information, see Feature Attributions that Use Shapley Values and Amazon AI Fairness and Explainability Whitepaper.

ModelPredictedLabelConfig

For Clarify to extract a predicted label or predicted scores or probabilities, this config object needs to be set. See the following code:

from sagemaker.clarify import ModelPredictedLabelConfig
modellabel_config = ModelPredictedLabelConfig(probability="prob", label="label")

For more details, refer to the documentation in the SDK.

Run a Clarify job

After you create the different configurations, you’re now ready to trigger the Clarify processing job. The processing job validates the input and parameters, creates the ephemeral endpoint, and computes local and global feature attributions using the SHAP algorithm. When that’s complete, it deletes the ephemeral endpoint and generates the output files. See the following code:

clarify_processor.run_explainability(
    data_config=explainability_data_config,
    model_config=model_config,
    explainability_config=shap_config,
    model_scores=modellabel_config,
)

The runtime of this step depends on the size of the dataset and the number of samples that are generated by SHAP.

Visualize the results

Finally, we show a visualization of the results from the local feature attribution report that was generated by the Clarify processing job. The output is in JSON Lines format, and with some processing you can plot the scores for the tokens in the input text like the following example. Higher bars have more impact on the target label. Furthermore, positive values are associated with higher predictions in the target variable and negative values with lower predictions. In this example, the model makes a prediction for the input text “Wesebach is a river of Hesse Germany.” The predicted class is Natural Place, and the scores indicate that the model found the word “river” to be the most informative for this prediction. This is intuitive for a human, and by examining more samples, you can determine whether the model is learning the right features and behaving as expected.
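
As a hedged sketch of that post-processing (the exact schema of the local explanations file is described in the Clarify documentation), the following assumes you have already parsed one record of the local SHAP output into a list of (token, attribution) pairs and simply plots them; the values shown are illustrative, not actual Clarify output.

import matplotlib.pyplot as plt

# Illustrative token attributions for one input record; replace with values
# parsed from the Clarify local explanations output under the S3 output path.
token_attributions = [
    ("Wesebach", 0.02), ("is", 0.01), ("a", 0.00),
    ("river", 0.61), ("of", 0.03), ("Hesse", 0.18), ("Germany", 0.12),
]

tokens = [t for t, _ in token_attributions]
scores = [s for _, s in token_attributions]

plt.figure(figsize=(8, 3))
plt.bar(range(len(tokens)), scores)
plt.xticks(range(len(tokens)), tokens, rotation=45, ha="right")
plt.ylabel("SHAP attribution")
plt.title("Token-level attributions for the predicted class")
plt.tight_layout()
plt.show()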

Conclusion

In this post, we explained how you can use Clarify to explain predictions from a text classification model that was trained using SageMaker BlazingText. Get started with explaining predictions from your text classification models using the sample notebook Text Explainability for SageMaker BlazingText.

We also discussed a more generic design pattern that you can use when using Clarify with SageMaker built-in algorithms. For more information, refer to What Is Fairness and Model Explainability for Machine Learning Predictions. We also encourage you to read the Amazon AI Fairness and Explainability Whitepaper, which provides an overview on the topic and discusses best practices and limitations.


About the Authors

Pinak Panigrahi works with customers to build machine learning driven solutions to solve strategic business problems on AWS. When not occupied with machine learning, he can be found taking a hike, reading a book or catching up with sports.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing, and Artificial Intelligence. He focuses on Deep learning including NLP and Computer Vision domains. He helps customers achieve high performance model inference on SageMaker.

Upscale images with Stable Diffusion in Amazon SageMaker JumpStart

In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Today, we announce a new feature that lets you upscale images (resize images without losing quality) with Stable Diffusion models in JumpStart. An image that is low resolution, blurry, and pixelated can be converted into a high-resolution image that appears smoother, clearer, and more detailed. This process, called upscaling, can be applied to both real images and images generated by text-to-image Stable Diffusion models. This can be used to enhance image quality in various industries such as ecommerce and real estate, as well as for artists and photographers. Additionally, upscaling can improve the visual quality of low-resolution images when displayed on high-resolution screens.

Stable Diffusion uses an AI algorithm to upscale images, eliminating the need for manual work that may require manually filling gaps in an image. It has been trained on millions of images and can accurately predict high-resolution images, resulting in a significant increase in detail compared to traditional image upscalers. Additionally, unlike non-deep-learning techniques such as nearest neighbor, Stable Diffusion takes into account the context of the image, using a textual prompt to guide the upscaling process.

In this post, we provide an overview of how to deploy and run inference with the Stable Diffusion upscaler model in two ways: via JumpStart’s user interface (UI) in Amazon SageMaker Studio, and programmatically through JumpStart APIs available in the SageMaker Python SDK.

Solution overview

The following images show examples of upscaling performed by the model. On the left is the original low-resolution image enlarged to match the size of the image generated by the model. On the right is the image generated by the model.

The first generated image is the result of low resolution cat image and the prompt “a white cat.”

The second generated image is the result of low resolution butterfly image and the prompt “a butterfly on a green leaf.”

Running large models like Stable Diffusion requires custom inference scripts. You have to run end-to-end tests to make sure that the script, the model, and the desired instance work together efficiently. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. You can access these scripts with one click through the Studio UI or with very few lines of code through the JumpStart APIs.

The following sections provide an overview of how to deploy the model and run inference using either the Studio UI or the JumpStart APIs.

Note that by using this model, you agree to the CreativeML Open RAIL++-M License.

Access JumpStart through the Studio UI

In this section, we demonstrate how to deploy JumpStart models through the Studio UI. The following video shows how to find the pre-trained Stable Diffusion upscaler model on JumpStart and deploy it. The model page contains valuable information about the model and how to use it. For inference, we use the ml.p3.2xlarge instance type because it provides the GPU acceleration needed for low inference latency at a low price point. After you configure the SageMaker hosting instance, choose Deploy. It will take 5–10 minutes until the endpoint is up and running and ready to respond to inference requests.

Video: stable diffusion upscaling.mov

To accelerate the time to inference, JumpStart provides a sample notebook that shows how to run inference on the newly created endpoint. To access the notebook in Studio, choose Open Notebook in the Use Endpoint from Studio section of the model endpoint page.

Use JumpStart programmatically with the SageMaker SDK

You can use the JumpStart UI to deploy a pre-trained model interactively in just a few clicks. However, you can also use JumpStart models programmatically by using APIs that are integrated into the SageMaker Python SDK.

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and run inference on the deployed endpoint, all using the SageMaker Python SDK. The following examples contain code snippets. For the full code with all of the steps in this demo, see the Introduction to JumpStart – Enhance image quality guided by prompt example notebook.

Deploy the pre-trained model

SageMaker utilizes Docker containers for various build and runtime tasks. JumpStart utilizes the SageMaker Deep Learning Containers (DLCs) that are framework-specific. We first fetch any additional packages, as well as scripts to handle training and inference for the selected task. Then the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. This allows multiple pre-trained models to be used with a single inference script. The following code illustrates this process:

from sagemaker import image_uris, model_uris, script_uris

model_id, model_version = "model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16", "*"
inference_instance_type = "ml.p3.2xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

# Retrieve the pre-trained model artifacts uri
base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

Next, we provide those resources into a SageMaker model instance and deploy an endpoint:

from sagemaker.model import Model
from sagemaker.predictor import Predictor

# Create the SageMaker model instance (aws_role and endpoint_name are defined earlier in the notebook)
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model - note that we need to pass the Predictor class when we deploy the model through the Model class,
# in order to run inference through the SageMaker API
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

After our model is deployed, we can get predictions from it in real time!

Input format

The endpoint accepts a low-resolution image as raw RGB values or a base64 encoded image. The inference handler decodes the image based on content_type:

  • For content_type = “application/json”, the input payload must be a JSON dictionary with the raw RGB values, a textual prompt, and other optional parameters
  • For content_type = “application/json;jpeg”, the input payload must be a JSON dictionary with the base64 encoded image, a textual prompt, and other optional parameters

Output format

The following code examples give you a glimpse of what the outputs look like. Similarly to the input format, the endpoint can respond with the raw RGB values of the image or a base64 encoded image. This can be specified by setting accept to one of the two values:

  • For accept = “application/json”, the endpoint returns a JSON dictionary with the RGB values of the image
  • For accept = “application/json;jpeg”, the endpoint returns a JSON dictionary with the JPEG image as base64-encoded bytes

Note that sending or receiving the payload with the raw RGB values may hit default limits for the input payload and the response size. Therefore, we recommend using the base64 encoded image by setting content_type = “application/json;jpeg” and accept = “application/json;jpeg”.

The following code is an example inference request:

content_type = "application/json;jpeg"
accept = "application/json;jpeg"

# We recommend rescaling the low-resolution image so that both height and width are powers of 2.
# For example: original_image = Image.open('low_res_image.jpg'); rescaled_image = original_image.resize((128, 128)); rescaled_image.save('rescaled_image.jpg')
with open(low_res_img_file_name, 'rb') as f:
    low_res_image_bytes = f.read()

encoded_image = base64.b64encode(bytearray(low_res_image_bytes)).decode()

payload = {"prompt": "a cat", "image": encoded_image, "num_inference_steps": 50, "guidance_scale": 7.5}

def query(model_predictor, payload, content_type, accept):
    """Query the model predictor."""
    query_response = model_predictor.predict(
        payload,
        {
            "ContentType": content_type,
            "Accept": accept,
        },
    )
    return query_response

The endpoint response is a JSON object containing the generated images and the prompt:

def parse_response(query_response):
    """Parse response and return the generated images and prompt."""
    response_dict = json.loads(query_response)
    return response_dict["generated_images"], response_dict["prompt"]
    
query_response = query(model_predictor, json.dumps(payload).encode('utf-8'), content_type, accept)
generated_images, prompt = parse_response(query_response)

Supported parameters

Stable Diffusion upscaling models support many parameters for image generation:

  • image – A low resolution image.
  • prompt – A prompt to guide the image generation. It can be a string or a list of strings.
  • num_inference_steps (optional) – The number of denoising steps during image generation. More steps lead to a higher-quality image. If specified, it must be a positive integer. Note that more inference steps will lead to a longer response time.
  • guidance_scale (optional) – A higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
  • negative_prompt (optional) – This guides the image generation away from this prompt. If specified, it must be a string or a list of strings, and it is used together with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if the prompt is a list of strings, then the negative_prompt must also be a list of strings.
  • seed (optional) – This fixes the randomized state for reproducibility. If specified, it must be an integer. Whenever you use the same prompt with the same seed, the resulting image will always be the same.
  • noise_level (optional) – This adds noise to latent vectors before upscaling. If specified, it must be an integer.

You can recursively upscale an image by invoking the endpoint repeatedly to get higher and higher quality images.
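
For instance, a simple loop along the following lines could feed each upscaled output back in as the next input. This sketch reuses the query and parse_response helpers defined earlier and assumes the response images are returned as base64-encoded JPEG strings (accept = "application/json;jpeg"); the prompt and the number of passes are illustrative.

import json

current_image = encoded_image  # base64-encoded low-resolution image from the earlier example
for _ in range(2):             # two upscaling passes, as an example
    payload = {
        "prompt": "a cat",
        "image": current_image,
        "num_inference_steps": 50,
        "guidance_scale": 7.5,
    }
    response = query(
        model_predictor,
        json.dumps(payload).encode("utf-8"),
        "application/json;jpeg",
        "application/json;jpeg",
    )
    generated_images, _ = parse_response(response)
    # Assumed: each generated image is a base64-encoded JPEG string that can be resubmitted directly.
    current_image = generated_images[0]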

Image size and instance types

Images generated by the model can be up to four times the size of the original low-resolution image. Furthermore, the model’s memory requirement (GPU memory) grows with the size of the generated image. Therefore, if you’re upscaling an already high-resolution image or are recursively upscaling images, select an instance type with a large GPU memory. For instance, ml.g5.2xlarge has more GPU memory than the ml.p3.2xlarge instance type we used earlier. For more information on different instance types, refer to Amazon EC2 Instance Types.

Upscaling images piece by piece

To decrease memory requirements when upscaling large images, you can break the image into smaller sections, known as tiles, and upscale each tile individually. After the tiles have been upscaled, they can be blended together to create the final image. This method requires adapting the prompt for each tile so the model can understand the content of the tile and avoid creating strange images. The style part of the prompt should remain consistent for all tiles to make blending easier. When using higher denoising settings, it’s important to be more specific in the prompt because the model has more freedom to adapt the image. This can be challenging when the tile contains only background or isn’t directly related to the main content of the picture.
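
The following sketch illustrates the tiling idea with Pillow. It crops the low-resolution image into fixed-size tiles, sends each tile through a hypothetical upscale_tile helper (which would wrap the endpoint call shown earlier, with a per-tile prompt), and pastes the upscaled tiles onto a canvas four times the original size. It deliberately ignores tile overlap and border blending, which a production implementation would need to handle.

from PIL import Image

TILE = 128   # tile edge length in the low-resolution image
SCALE = 4    # the upscaler produces images up to 4x the input size


def upscale_tile(tile_image, prompt):
    """Hypothetical helper: encode tile_image, call the upscaling endpoint as
    shown earlier, and return the upscaled tile as a PIL image."""
    raise NotImplementedError


low_res = Image.open("low_res_image.jpg")
width, height = low_res.size
canvas = Image.new("RGB", (width * SCALE, height * SCALE))

for top in range(0, height, TILE):
    for left in range(0, width, TILE):
        tile = low_res.crop((left, top, min(left + TILE, width), min(top + TILE, height)))
        # Adapt the content part of the prompt per tile, but keep the style part
        # consistent across tiles to make blending easier.
        upscaled = upscale_tile(tile, prompt="a white cat, detailed photo")
        canvas.paste(upscaled, (left * SCALE, top * SCALE))

canvas.save("upscaled_image.jpg")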

Limitations and bias

Even though Stable Diffusion has impressive performance in upscaling, it suffers from several limitations and biases. These include but are not limited to:

  • The model may not generate accurate faces or limbs because the training data doesn’t include sufficient images with these features
  • The model was trained on the LAION-5B dataset, which has adult content and may not be fit for product use without further considerations
  • The model may not work well with non-English languages because the model was trained on English language text
  • The model can’t generate good text within images

For more information on limitations and bias, refer to the Stable Diffusion upscaler model card.

Clean up

After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped. The code to clean up the endpoint is available in the associated notebook.

Conclusion

In this post, we showed how to deploy a pre-trained Stable Diffusion upscaler model using JumpStart. We showed code snippets in this post—the full code with all of the steps in this demo is available in the Introduction to JumpStart – Enhance image quality guided by prompt example notebook. Try out the solution on your own and send us your comments.

To learn more about the model and how it works, see the following resources:

To learn more about JumpStart, check out the following blog posts:


About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on Natural Language Processing (NLP), Large Language Models (LLMs), and Generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps customers succeed in their AI/ML journey on AWS and has worked with organizations in many industries, including Insurance, Financial Services, Media and Entertainment, Healthcare, Utilities, and Manufacturing. In his spare time, Heiko travels as much as possible.

Cohere brings language AI to Amazon SageMaker

This is a guest post by Sudip Roy, Manager of Technical Staff at Cohere.

It’s an exciting day for the development community. Cohere’s state-of-the-art language AI is now available through Amazon SageMaker. This makes it easier for developers to deploy Cohere’s pre-trained generation language model to Amazon SageMaker, an end-to-end machine learning (ML) service. Developers, data scientists, and business analysts use Amazon SageMaker to build, train, and deploy ML models quickly and easily using its fully managed infrastructure, tools, and workflows.

At Cohere, the focus is on language. The company’s mission is to enable developers and businesses to add language AI to their technology stack and build game-changing applications with it. Cohere helps developers and businesses automate a wide range of tasks, such as copywriting, named entity recognition, paraphrasing, text summarization, and classification. The company builds and continually improves its general-purpose large language models (LLMs), making them accessible via a simple-to-use platform. Companies can use the models out of the box or tailor them to their particular needs using their own custom data.

Developers using SageMaker will have access to Cohere’s Medium generation language model. The Medium generation model excels at tasks that require fast responses, such as question answering, copywriting, or paraphrasing. The Medium model is deployed in containers that enable low-latency inference on a diverse set of hardware accelerators available on AWS, providing different cost and performance advantages for SageMaker customers.

“Amazon SageMaker provides the broadest and most comprehensive set of services that eliminate heavy lifting from each step of the machine learning process. We’re excited to offer Cohere’s general purpose large language model with Amazon SageMaker. Our joint customers can now leverage the broad range of Amazon SageMaker services and integrate Cohere’s model with their applications for accelerated time-to-value and faster innovation.”

-Rajneesh Singh, General Manager AI/ML at Amazon Web Services.

“As Cohere continues to push the boundaries of language AI, we are excited to join forces with Amazon SageMaker. This partnership will allow us to bring our advanced technology and innovative approach to an even wider audience, empowering developers and organizations around the world to harness the power of language AI and stay ahead of the curve in an increasingly competitive market.”

-Saurabh Baji, Senior Vice President of Engineering at Cohere.

The Cohere Medium generation language model, available through SageMaker, provides developers with three key benefits:

  • Build, iterate, and deploy quickly – Cohere empowers any developer (no NLP, ML, or AI expertise required) to quickly get access to a pre-trained, state-of-the-art generation model that understands context and semantics at unprecedented levels. This high-quality, large language model reduces the time-to-value for customers by providing an out-of-the-box solution for a wide range of language understanding tasks.
  • Private and secure – With SageMaker, customers can spin up containers serving Cohere’s models without having to worry about their data leaving these self-managed containers.
  • Speed and accuracy – Cohere’s Medium model offers customers a good balance across quality, cost, and latency. Developers can easily integrate the Cohere Generate endpoint into apps using a simple API and SDK.

Get started with Cohere in SageMaker

Developers can use the visual interface of the SageMaker JumpStart foundation models to test Cohere’s models without writing a single line of code. You can evaluate the model on your specific language understanding task and learn the basics of using generative language models. See Cohere’s documentation and blog for various tutorials and tips-and-tricks related to language modeling.

Deploy the SageMaker endpoint using a notebook

Cohere has packaged Medium models, along with an optimized, low-latency inference framework, in containers that can be deployed as SageMaker inference endpoints. Cohere’s containers can be deployed on a range of different instances (including ml.p3.2xlarge, ml.g5.xlarge, and ml.g5.2xlarge) that offer different cost/performance trade-offs. These containers are currently available in two Regions: us-east-1 and eu-west-1. Cohere intends to expand its offering in the near future, including adding to the number and size of models available, the set of supported tasks (such as the endpoints built on top of these models), the supported instances, and the available Regions.

To help developers get started quickly, Cohere has provided Jupyter notebooks that make it easy to deploy these containers and run inference on the deployed endpoints. With the preconfigured set of constants in the notebook, deploying the endpoint can be easily done with only a couple of lines of code as shown in the following example:
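
The exact code lives in Cohere’s notebooks; as a rough sketch of the pattern, deploying an AWS Marketplace model package with the SageMaker Python SDK looks roughly like the following, where the model package ARN, role, instance type, and endpoint name are placeholders.

import sagemaker
from sagemaker import ModelPackage

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Placeholder: the actual model package ARN for your Region comes from
# Cohere's AWS Marketplace listing and example notebook.
model_package_arn = "arn:aws:sagemaker:us-east-1:<account>:model-package/<cohere-medium-package>"

model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=session,
)

# Deploy on one of the supported instance types (for example, ml.g5.xlarge).
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",
    endpoint_name="cohere-medium",  # illustrative endpoint name
)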

After the endpoint is deployed, users can use Cohere’s SDK to run inference. The SDK can be installed from PyPI, or from the source code in Cohere’s public SDK GitHub repository.

After the endpoint is deployed, users can use the Cohere Generate endpoint to accomplish multiple generative tasks, such as text summarization, long-form content generation, entity extraction, or copywriting. The Jupyter notebook and GitHub repository include examples demonstrating some of these use cases.

Conclusion

The availability of Cohere natively on SageMaker via the AWS Marketplace represents a major milestone in the field of NLP. The Cohere model’s ability to generate high-quality, coherent text makes it a valuable tool for anyone working with text data.

If you’re interested in using Cohere for your own SageMaker projects, you can now access it on SageMaker JumpStart. Additionally, you can reference Cohere’s GitHub notebook for instructions on deploying the model and accessing it from the Cohere Generate endpoint.


About the authors

Sudip Roy is Manager of Technical Staff at Cohere, a provider of cutting-edge natural language processing (NLP) technology. Sudip is an accomplished researcher who has published and served on program committees for top conferences like NeurIPS, MLSys, OOPSLA, SIGMOD, VLDB, and SIGKDD, and his work has earned Outstanding Paper awards from SIGMOD and MLSys.

Karthik Bharathy is the product leader for the Amazon SageMaker team with over a decade of product management, product strategy, execution, and launch experience.

Karl Albertsen leads product, engineering, and science for Amazon SageMaker Algorithms and JumpStart, SageMaker’s machine learning hub. He is passionate about applying machine learning to unlock business value.

How CCC Intelligent Solutions created a custom approach for hosting complex AI models using Amazon SageMaker

This post is co-written by Christopher Diaz, Sam Kinard, Jaime Hidalgo, and Daniel Suarez from CCC Intelligent Solutions.

In this post, we discuss how CCC Intelligent Solutions (CCC) combined Amazon SageMaker with other AWS services to create a custom solution capable of hosting the types of complex artificial intelligence (AI) models envisioned. CCC is a leading software-as-a-service (SaaS) platform for the multi-trillion-dollar property and casualty insurance economy powering operations for insurers, repairers, automakers, part suppliers, lenders, and more. CCC cloud technology connects more than 30,000 businesses digitizing mission-critical workflows, commerce, and customer experiences. A trusted leader in AI, Internet of Things (IoT), customer experience, and network and workflow management, CCC delivers innovations that keep people’s lives moving forward when it matters most.

The challenge

CCC processes more than $1 trillion in claims transactions annually. As the company continues to evolve to integrate AI into its existing and new product catalog, this requires sophisticated approaches to train and deploy multi-modal machine learning (ML) ensemble models for solving complex business needs. These are a class of models that encapsulate proprietary algorithms and subject matter domain expertise that CCC has honed over the years. These models should be able to ingest new layers of nuanced data and customer rules to create single prediction outcomes. In this blog post, we will learn how CCC leveraged Amazon SageMaker hosting and other AWS services to deploy or host multiple multi-modal models into an ensemble inference pipeline.

As shown in the following diagram, an ensemble is a collection of two or more models that are orchestrated to run in a linear or nonlinear fashion to produce a single prediction. When stacked linearly, the individual models of an ensemble can be directly invoked for predictions and later consolidated for unification. At times, ensemble models can also be implemented as a serial inference pipeline.

For our use case, the ensemble pipeline is strictly nonlinear, as depicted in the following diagram. Nonlinear ensemble pipelines are, in theory, directed acyclic graphs (DAGs). For our use case, this DAG pipeline had both independent models that are run in parallel (Services B, C) and other models that use predictions from previous steps (Service D).

A practice that comes out of the research-driven culture at CCC is the continuous review of technologies that can be leveraged to bring more value to customers. As CCC faced this ensemble challenge, leadership launched a proof-of-concept (POC) initiative to thoroughly assess the offerings from AWS to discover, specifically, whether Amazon SageMaker and other AWS tools could manage the hosting of individual AI models in complex, nonlinear ensembles.

Ensemble explained: In this context, an ensemble is a group of 2 or more AI models that work together to produce 1 overall prediction.

Questions driving the research

Can Amazon SageMaker be used to host complex ensembles of AI models that work together to provide one overall prediction? If so, can SageMaker offer other benefits out of the box, such as increased automation, reliability, monitoring, automatic scaling, and cost-saving measures?

Finding alternative ways to deploy CCC’s AI models using the technological advancements from cloud providers will allow CCC to bring AI solutions to market faster than its competition. Additionally, having more than one deployment architecture provides flexibility when finding the balance between cost and performance based on business priorities.

Based on our requirements, we finalized the following list of features as a checklist for a production-grade deployment architecture:

  • Support for complex ensembles
  • Guaranteed uptime for all components
  • Customizable automatic scaling for deployed AI models
  • Preservation of AI model input and output
  • Usage metrics and logs for all components
  • Cost-saving mechanisms

With a majority of CCC’s AI solutions relying on computer vision models, a new architecture was required to support image and video files that continue to increase in resolution. There was a strong need to design and implement this architecture as an asynchronous model.

After cycles of research and initial benchmarking efforts, CCC determined SageMaker was a perfect fit to meet a majority of their production requirements, especially the guaranteed uptime SageMaker provides for most of its inference components. The default feature of Amazon SageMaker Asynchronous Inference endpoints saving input/output in Amazon S3 simplifies the task of preserving data generated from complex ensembles. Additionally, with each AI model being hosted by its own endpoint, managing automatic scaling policies at the model or endpoint level becomes easier. This simplified management also offers a potential cost-saving benefit: development teams can allocate more time to fine-tuning scaling policies that minimize the over-provisioning of compute resources.

Having decided to proceed with using SageMaker as the pivotal component of the architecture, we also realized SageMaker can be part of an even larger architecture, supplemented with many other serverless AWS-managed services. This choice was needed to facilitate the higher-order orchestration and observability needs of this complex architecture.

Firstly, to remove payload size limitations and greatly reduce timeout risk during high-traffic scenarios, CCC implemented an architecture that runs predictions asynchronously using SageMaker Asynchronous Inference endpoints coupled with other AWS-managed services as the core building blocks. Additionally, the user interface for the system follows the fire-and-forget design pattern. In other words, once a user has uploaded their input to the system, nothing more needs to be done. They will be notified when the prediction is available. The figure below illustrates a high-level overview of our asynchronous event-driven architecture. In the upcoming section, let us do a deep dive into the execution flow of the designed architecture.

Step-by-step solution

Step 1

A client makes a request to the AWS API Gateway endpoint. The content of the request contains the name of the AI service from which they need a prediction and the desired method of notification.

This request is passed to a Lambda function called New Prediction, whose main tasks are to:

  • Check if the requested service by the client is available.
  • Assign a unique prediction ID to the request. This prediction ID can be used by the user to check the status of the prediction throughout the entire process.
  • Generate an Amazon S3 pre-signed URL that the user will need to use in the next step to upload the input content of the prediction request.
  • Create an entry in Amazon DynamoDB with the information of the received request.

The Lambda function will then return a response through the API Gateway endpoint with a message that includes the prediction ID assigned to the request and the Amazon S3 pre-signed URL.
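
A condensed sketch of the core of that Lambda function is shown below; the bucket, table, and field names are illustrative rather than CCC’s actual implementation, and the service-availability check is reduced to a comment.

import json
import uuid

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("predictions")       # illustrative table name
UPLOAD_BUCKET = "prediction-input-bucket"   # illustrative bucket name


def handler(event, context):
    body = json.loads(event["body"])
    service_name = body["service"]
    callback = body["callback"]             # for example, a webhook URL

    # (Omitted) Check that the requested AI service is available.

    prediction_id = str(uuid.uuid4())

    # Pre-signed URL the client uses to upload the prediction input to Amazon S3.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": f"inputs/{prediction_id}"},
        ExpiresIn=3600,
    )

    # Record the request so that later steps can look up the service and callback.
    table.put_item(Item={
        "prediction_id": prediction_id,
        "service": service_name,
        "callback": callback,
        "status": "pending",
    })

    return {
        "statusCode": 200,
        "body": json.dumps({"prediction_id": prediction_id, "upload_url": upload_url}),
    }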

Step 2

The client securely uploads the prediction input content to an S3 bucket using the pre-signed URL generated in the previous step. Input content depends on the AI service and can be composed of images, tabular data, or a combination of both.

Step 3

The S3 bucket is configured to trigger an event when the user uploads the input content. This notification is sent to an Amazon SQS queue and handled by a Lambda function called Process Input. The Process Input Lambda will obtain the information related to that prediction ID from DynamoDB to get the name of the service to which the request is to be made.

This service can either be a single AI model, in which case the Process Input Lambda will make a request to the SageMaker endpoint that hosts that model (Step 3-A), or it can be an ensemble AI service, in which case the Process Input Lambda will make a request to the AWS Step Functions state machine that hosts the ensemble logic (Step 3-B).

In either option (single AI model or ensemble AI service), when the final prediction is ready, it will be stored in the appropriate S3 bucket, and the caller will be notified via the method specified in Step 1 (more details about notifications in Step 4).

Step 3-A

If the prediction ID is associated to a single AI model, the Process Input Lambda will make a request to the SageMaker endpoint that serves the model. In this system, two types of SageMaker endpoints are supported:

  • Asynchronous: The Process Input Lambda makes the request to the SageMaker asynchronous endpoint. The immediate response includes the S3 location where SageMaker will save the prediction output. This request is asynchronous, following the fire-and-forget pattern, and does not block the execution flow of the Lambda function.
  • Synchronous: The Process Input Lambda makes the request to the SageMaker synchronous endpoint. Because it is a synchronous request, Process Input waits for the response, and once obtained, it stores it in S3 in the same way that a SageMaker asynchronous endpoint would.

In both cases (synchronous or asynchronous endpoints), the prediction is processed in an equivalent way, storing the output in an S3 bucket. When the asynchronous SageMaker endpoint completes a prediction, an Amazon SNS event is triggered. This behavior is also replicated for synchronous endpoints with additional logic in the Lambda function.
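
As a rough sketch of these calls (endpoint names and the input location are illustrative), the Process Input Lambda would use the SageMaker runtime API along these lines:

import json

import boto3

sm_runtime = boto3.client("sagemaker-runtime")

# Asynchronous endpoint: the input already sits in S3, and SageMaker responds
# immediately with the S3 location where the prediction output will be written.
async_response = sm_runtime.invoke_endpoint_async(
    EndpointName="model-a-async-endpoint",
    InputLocation="s3://prediction-input-bucket/inputs/<prediction_id>",
    ContentType="application/json",
)
output_location = async_response["OutputLocation"]

# Synchronous endpoint: the Lambda waits for the response and then stores it
# in S3 itself, mirroring the asynchronous behavior.
sync_response = sm_runtime.invoke_endpoint(
    EndpointName="model-b-sync-endpoint",
    Body=json.dumps({"input": "..."}).encode("utf-8"),
    ContentType="application/json",
)
prediction = sync_response["Body"].read()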

Step 3-B

If the prediction ID is associated with an AI ensemble, the Process Input Lambda will make the request to the step function associated with that AI ensemble. As mentioned above, an AI ensemble is an architecture based on a group of AI models working together to generate a single overall prediction. The orchestration of an AI ensemble is done through a step function.

The step function has one step per AI service that comprises the ensemble. Each step will invoke a Lambda function that will prepare its corresponding AI service’s input using different combinations of the output content from AI service calls of previous steps. It then makes a call to each AI service, which in this context can either be a single AI model or another AI ensemble.

The same Lambda function, called GetTransformCall, is used to handle the intermediate predictions of an AI ensemble throughout the step function, but with different input parameters for each step. This input includes the name of the AI service to be called. It also includes the mapping definition to construct the input for the specified AI service. This is done using a custom syntax that the Lambda can decode, which, in summary, is a JSON dictionary where the values should be replaced with the content from the previous AI predictions. The Lambda will download these previous predictions from Amazon S3.

In each step, the GetTransformCall Lambda reads from Amazon S3 the previous outputs that are needed to build the input of the specified AI service. It will then invoke the New Prediction Lambda code previously used in Step 1 and provide the service name, callback method (“step function”), and token needed for the callback in the request payload, which is then saved in DynamoDB as a new prediction record. The Lambda also stores the created input of that stage in an S3 bucket. Depending on whether that stage is a single AI model or an AI ensemble, the Lambda makes a request to a SageMaker endpoint or a different step function that manages an AI ensemble that is a dependency of the parent ensemble.

Once the request is made, the step function enters a pending state until it receives the callback token indicating it can move to the next stage. The action of sending a callback token is performed by a Lambda function called notifications (more details in Step 4) when the intermediate prediction is ready. This process is repeated for each stage defined in the step function until the final prediction is ready.

Step 4

When a prediction is ready and stored in the S3 bucket, an SNS notification is triggered. This event can be triggered in different ways depending on the flow:

  1. Automatically when a SageMaker asynchronous endpoint completes a prediction.
  2. As the very last step of the step function.
  3. By Process Input or GetTransformCall Lambda when a synchronous SageMaker endpoint has returned a prediction.

For the second and third cases, we create an SNS message similar to the one that SageMaker sends automatically in the first case.

A Lambda function called notifications is subscribed to this SNS topic. The notifications Lambda will get the information related to the prediction ID from DynamoDB, update the entry with status value to “completed” or “error,” and perform the necessary action depending on the callback mode saved in the database record.

If this prediction is an intermediate prediction of an AI ensemble, as described in Step 3-B, the callback mode associated with this prediction will be “step function,” and the database record will have a callback token associated with the specific step in the step function. The notifications Lambda will make a call to the AWS Step Functions API using the method “SendTaskSuccess” or “SendTaskFailure.” This will allow the step function to continue to the next step or exit.
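
In the notifications Lambda, that callback is a single Step Functions API call, roughly as sketched below; the task token and the output location come from the DynamoDB record and the stored prediction.

import json

import boto3

sfn = boto3.client("stepfunctions")


def resume_step_function(task_token, prediction_s3_uri, succeeded=True):
    """Signal the waiting state machine so it can move to the next stage or exit."""
    if succeeded:
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"prediction_output": prediction_s3_uri}),
        )
    else:
        sfn.send_task_failure(
            taskToken=task_token,
            error="PredictionError",
            cause="The intermediate prediction failed.",
        )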

If the prediction is the final output of the step function and the callback mode is “Webhook” [or email, message brokers (Kafka), etc.], then the notifications Lambda will notify the client in the specified way. At any point, the user can request the status of their prediction. The request must include the prediction ID that was assigned in Step 1 and point to the correct URL within API Gateway to route the request to the Lambda function called results.

The results Lambda will make a request to DynamoDB, obtaining the status of the request and returning the information to the user. If the status of the prediction is error, then the relevant details on the failure will be included in the response. If the prediction status is success, an S3 pre-signed URL will be returned for the user to download the prediction content.

Outcomes

Preliminary performance testing results are promising and support the case for CCC to extend the implementation of this new deployment architecture.

Notable observations:

  • Tests reveal strength in processing batch or concurrent requests with high throughput and a 0 percent failure rate during high traffic scenarios.
  • Message queues provide stability within the system during sudden influxes of requests until scaling triggers can provision additional compute resources. When increasing traffic by 3x, average request latency only increased by 5 percent.
  • The price of stability is increased latency due to the communication overhead between the various system components. When user traffic is above the baseline threshold, the added latency can be partially mitigated by providing more compute resources if performance is a higher priority over cost.
  • SageMaker’s asynchronous inference endpoints allow the instance count to be scaled to zero while keeping the endpoint active to receive requests. This functionality enables deployments to continue running without incurring compute costs and scale up from zero when needed in two scenarios: service deployments used in lower test environments and those that have minimal traffic without requiring immediate processing.

Conclusion

As observed during the POC process, the innovative design jointly created by CCC and AWS provides a solid foundation for using Amazon SageMaker with other AWS managed services to host complex multi-modal AI ensembles and orchestrate inference pipelines effectively and seamlessly. By leveraging Amazon SageMaker’s out-of-the-box functionalities like Asynchronous Inference, CCC has more opportunities to focus on specialized business-critical tasks. In the spirit of CCC’s research-driven culture, this novel architecture will continue to evolve as CCC leads the way forward, alongside AWS, in unleashing powerful new AI solutions for clients.

For detailed steps on how to create, invoke, and monitor asynchronous inference endpoints, refer to the documentation, which also contains a sample notebook to help you get started. For pricing information, visit Amazon SageMaker Pricing.

For examples on using asynchronous inference with unstructured data such as computer vision and natural language processing (NLP), refer to Run computer vision inference on large videos with Amazon SageMaker asynchronous endpoints and Improve high-value research with Hugging Face and Amazon SageMaker asynchronous inference endpoints, respectively.


About the Authors

Christopher Diaz is a Lead R&D Engineer at CCC Intelligent Solutions. As a member of the R&D team, he has worked on a variety of projects ranging from ETL tooling, backend web development, collaborating with researchers to train AI models on distributed systems, and facilitating the delivery of new AI services between research and operations teams. His recent focus has been on researching cloud tooling solutions to enhance various aspects of the company’s AI model development lifecycle. In his spare time, he enjoys trying new restaurants in his hometown of Chicago and collecting as many LEGO sets as his home can fit. Christopher earned his Bachelor of Science in Computer Science from Northeastern Illinois University.

Emmy Award winner Sam Kinard is a Senior Manager of Software Engineering at CCC Intelligent Solutions. Based in Austin, Texas, he wrangles the AI Runtime Team, which is responsible for serving CCC’s AI products at high availability and large scale. In his spare time, Sam enjoys being sleep deprived because of his two wonderful children. Sam has a Bachelor of Science in Computer Science and a Bachelor of Science in Mathematics from the University of Texas at Austin.

Jaime Hidalgo is a Senior Systems Engineer at CCC Intelligent Solutions. Before joining the AI research team, he led the company’s global migration to Microservices Architecture, designing, building, and automating the infrastructure in AWS to support the deployment of cloud products and services. Currently, he builds and supports an on-premises data center cluster built for AI training and also designs and builds cloud solutions for the company’s future of AI research and deployment.

Daniel Suarez is a Data Science Engineer at CCC Intelligent Solutions. As a member of the AI Engineering team, he works on the automation and preparation of AI Models in the production, evaluation, and monitoring of metrics and other aspects of ML operations. Daniel received a Master’s in Computer Science from the Illinois Institute of Technology and a Master’s and Bachelor’s in Telecommunication Engineering from Universidad Politecnica de Madrid.

Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.

Justin McWhirter is a Solutions Architect Manager at AWS. He works with a team of amazing Solutions Architects who help customers have a positive experience while adopting the AWS platform. When not at work, Justin enjoys playing video games with his two boys, ice hockey, and off-roading in his Jeep.
