Accelerate your career with ML skills through the AWS Machine Learning Engineer Scholarship

Amazon Web Services and Udacity are partnering to educate developers of all skill levels on machine learning (ML) concepts through the AWS Machine Learning Engineer Scholarship program. The program offers free enrollment in the AWS Machine Learning Foundations course and awards 325 scholarships to the AWS Machine Learning Engineer Nanodegree program (a $2,000 USD value), delivered through Udacity.

Machine learning will not only change the way we work and live, but also open pathways to millions of new jobs; the World Economic Forum estimates that 97 million new roles may be created in AI and ML by 2025. Breaking into an ML career, however, is often hampered by the high cost of traditional education, rigorous content, and a lack of real-world application that bridges theory and practice. AWS is invested in addressing these challenges by providing free educational content and hands-on learning, such as exploring reinforcement learning concepts with AWS DeepRacer, along with a community of learner support from technical experts and like-minded peers.

“The AWS Machine Learning Engineer Nanodegree Program gave me a solid footing in understanding the foundational building blocks of Machine Learning workflows,” said Jikmyan Mangut Sunday, an AWS Machine Learning Scholarship alumnus. “This shaped my knowledge of the fundamental concepts in building state-of-the-art Machine Learning models. Udacity curated learning materials that were easy to grasp and applicable to every field of endeavor; my learning experience was challenging and fun-filled.”

AWS is also collaborating with Girls in Tech and the National Society of Black Engineers to provide scholarships to women and underrepresented groups in tech. Organizations like these aim to inspire, support, train, and empower people from underrepresented groups to pursue careers in tech. In partnership, AWS will help provide access and resources to programs such as the AWS Machine Learning Engineer Scholarship program to increase diversity and talent in technical roles.

“Tech needs representation from women, BIPOC, and other marginalized communities in every aspect of our industry,” says Adriana Gascoigne, founder and CEO of Girls in Tech. “Girls in Tech applauds our collaborator AWS, as well as Udacity, for breaking down the barriers that so often leave women behind in tech. Together, we aim to give everyone a seat at the table.”

Open pathways to new career opportunities

Learners in the program apply theory through hands-on work with a suite of AWS ML services, including AWS DeepRacer, Amazon SageMaker, and AWS DeepComposer. Because many people struggle to get started with machine learning, the scholarship program provides easy-to-follow modules that learners can work through at their own pace. Throughout the course, learners have access to a supportive online community for technical assistance through Udacity tutors.

“Before taking the program, the many tools provided by AWS seemed frustrating, but now I have a good grasp of them. I learned how to organize my code and work in a professional setting,” said Kariem Gazer, AWS Machine Learning Scholarship alumnus. “The organized modules, follow-up quizzes, and personalized feedback all made the learning experience smoother and more concrete.”

Gain ML skills beyond the classroom

The AWS Machine Learning Engineer Scholarship program is open to all developers interested in expanding their ML skills and expertise through AWS curated content and services. Applicants 18 years of age or older are invited to register for the program. All applicants will have immediate classroom access to the free AWS ML Foundations course upon application completion.

Phase 1: AWS Machine Learning Foundations Course

  • Learn object-oriented programming skills, including writing clean and modularized code and understanding the fundamental aspects of ML.
  • Learn reinforcement learning with AWS DeepRacer and generative AI with AWS DeepComposer.
  • Take advantage of support through the Discourse Tech community with technical moderators.
  • Receive a certificate for course completion and take an online assessment quiz to receive a full scholarship to the AWS Machine Learning Engineer Nanodegree program.
  • Dedicate 3–5 hours a week to the course and work towards earning one of the follow-up Nanodegree program scholarships.

Phase 2: Full scholarship to the AWS Machine Learning Engineer Udacity Nanodegree ($2,000 USD value)

  • Learn advanced ML techniques and algorithms, including how to package and deploy your models to a production environment.
  • Acquire practical experience such as using Amazon SageMaker to prepare you for a career in ML.
  • Take advantage of community support through a learner connect program for technical assistance and learner engagement.
  • Dedicate 5–10 hours a week to the course to earn a Udacity Nanodegree certificate.

Program dates

  • June 21, 2022: Scholarship applications open and students are automatically enrolled in the AWS Machine Learning Foundations Course (Phase 1)
  • July 21, 2022: Scholarship applications close
  • November 23, 2022: AWS Machine Learning Foundations Course (Phase 1) ends
  • December 6, 2022: AWS Machine Learning Engineer Scholarship winners announced
  • December 8, 2022: AWS Machine Learning Engineer Nanodegree (Phase 2) opens
  • March 22, 2023: AWS Machine Learning Engineer Nanodegree (Phase 2) closes

Connect with the ML community and take the next step

Connect with experts and like-minded aspiring ML developers on the AWS Machine Learning Discord and enroll today in the AWS Machine Learning Engineer Scholarship program.


About the Author

Anastacia Padilla is a Product Marketing Manager for AWS AI & ML Education. She spends her time building and evangelizing offerings for the aspiring ML developer community to upskill students and underrepresented groups in tech. She is focused on democratizing AI & ML education to be accessible to all who want to learn.

Read More

Identify mangrove forests using satellite image features using Amazon SageMaker Studio and Amazon SageMaker Autopilot – Part 2

Mangrove forests are an important part of a healthy ecosystem, and human activities are one of the major reasons for their gradual disappearance from coastlines around the world. Using a machine learning (ML) model to identify mangrove regions from a satellite image gives researchers an effective way to monitor the size of the forests over time. In Part 1 of this series, we showed how to gather satellite data in an automated fashion and analyze it in Amazon SageMaker Studio with interactive visualization. In this post, we show how to use Amazon SageMaker Autopilot to automate the process of building a custom mangrove classifier.

Train a model with Autopilot

Autopilot provides a balanced way of building several models and selecting the best one. It creates multiple combinations of different data preprocessing techniques and ML models with minimal effort, while still giving the data scientist complete control over these component steps, if desired.

You can use Autopilot through one of the AWS SDKs (details available in the API reference guide for Autopilot) or through Studio. We use Autopilot in our Studio solution, following the steps outlined in this section:

  1. On the Studio Launcher page, choose the plus sign for New Autopilot experiment.
  2. For Connect your data, select Find S3 bucket, and enter the bucket name where you kept the training and test datasets.
  3. For Dataset file name, enter the name of the training data file you created in the Prepare the training data section in Part 1.
  4. For Output data location (S3 bucket), enter the same bucket name you used in step 2.
  5. For Dataset directory name, enter a folder name under the bucket where you want Autopilot to store artifacts.
  6. For Is your S3 input a manifest file?, choose Off.
  7. For Target, choose label.
  8. For Auto deploy, choose Off.
  9. Under the Advanced settings, for Machine learning problem type, choose Binary Classification.
  10. For Objective metric, choose AUC.
  11. For Choose how to run your experiment, choose No, run a pilot to create a notebook with candidate definitions.
  12. Choose Create Experiment.

    For more information about creating an experiment, refer to Create an Amazon SageMaker Autopilot experiment. It may take about 15 minutes to run this step.
  13. When complete, choose Open candidate generation notebook, which opens a new notebook in read-only mode.
  14. Choose Import notebook to make the notebook editable.
  15. For Image, choose Data Science.
  16. For Kernel, choose Python 3.
  17. Choose Select.

This auto-generated notebook has detailed explanations and provides complete control over the actual model building task to follow. A customized version of the notebook, where a classifier is trained using Landsat satellite bands from 2013, is available in the code repository under notebooks/mangrove-2013.ipynb.

The model building framework consists of two parts: feature transformation as part of the data processing step, and hyperparameter optimization (HPO) as part of the model selection step. All the necessary artifacts for these tasks were created during the Autopilot experiment and saved in Amazon Simple Storage Service (Amazon S3). The first notebook cell downloads those artifacts from Amazon S3 to the local Amazon SageMaker file system for inspection and any necessary modification. There are two folders: generated_module and sagemaker_automl, where all the Python modules and scripts necessary to run the notebook are stored. The various feature transformation steps like imputation, scaling, and PCA are saved as generated_modules/candidate_data_processors/dpp*.py.
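
As a quick sanity check, you can list the downloaded feature transformation scripts from the notebook before editing anything. The following is a minimal sketch; the glob pattern allows for either the singular or plural folder name, depending on your notebook version:

import glob

# Print the candidate data processor scripts generated by Autopilot.
for path in sorted(glob.glob("generated_module*/candidate_data_processors/dpp*.py")):
    print(path)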

Autopilot creates three different models based on the XGBoost, linear learner, and multi-layer perceptron (MLP) algorithms. A candidate pipeline consists of one of the feature transformation options, known as data_transformer, and an algorithm. A pipeline is a Python dictionary and can be defined as follows:

candidate1 = {
    "data_transformer": {
        "name": "dpp5",
        "training_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
            "volume_size_in_gb":  50
        },
        "transform_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
        },
        "transforms_label": True,
        "transformed_data_format": "application/x-recordio-protobuf",
        "sparse_encoding": True
    },
    "algorithm": {
        "name": "xgboost",
        "training_resource_config": {
            "instance_type": "ml.m5.4xlarge",
            "instance_count": 1,
        },
    }
}

In this example, the pipeline transforms the training data according to the script in generated_modules/candidate_data_processors/dpp5.py and builds an XGBoost model. This is where Autopilot provides complete control to the data scientist, who can pick the automatically generated feature transformation and model selection steps or build their own combination.

You can now add the pipeline to a pool for Autopilot to run the experiment as follows:

from sagemaker_automl import AutoMLInteractiveRunner, AutoMLLocalCandidate

automl_interactive_runner = AutoMLInteractiveRunner(AUTOML_LOCAL_RUN_CONFIG)
automl_interactive_runner.select_candidate(candidate1)

This is an important step where you can decide to keep only a subset of candidates suggested by Autopilot, based on subject matter expertise, to reduce the total runtime. For now, keep all Autopilot suggestions, which you can list as follows:

automl_interactive_runner.display_candidates()
Candidate Name       Algorithm       Feature Transformer
dpp0-xgboost         xgboost         dpp0.py
dpp1-xgboost         xgboost         dpp1.py
dpp2-linear-learner  linear-learner  dpp2.py
dpp3-xgboost         xgboost         dpp3.py
dpp4-xgboost         xgboost         dpp4.py
dpp5-xgboost         xgboost         dpp5.py
dpp6-mlp             mlp             dpp6.py

The full Autopilot experiment is done in two parts. First, you need to run the data transformation jobs:

automl_interactive_runner.fit_data_transformers(parallel_jobs=7)

This step should complete in about 30 minutes for all the candidates, if you make no further modifications to the dpp*.py files.

The next step is to build the best set of models by tuning the hyperparameters for the respective algorithms. The hyperparameters are usually divided into two parts: static and tunable. The static hyperparameters remain unchanged throughout the experiment for all candidates that share the same algorithm. These hyperparameters are passed to the experiment as a dictionary. If you choose to pick the best XGBoost model by maximizing AUC from three rounds of a five-fold cross-validation scheme, the dictionary looks like the following code:

{
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    '_kfold': 5,
    '_num_cv_round': 3,
} 

For the tunable hyperparameters, you need to pass another dictionary with ranges and scaling type:

{
    'num_round': IntegerParameter(64, 1024, scaling_type='Logarithmic'),
    'max_depth': IntegerParameter(2, 8, scaling_type='Logarithmic'),
    'eta': ContinuousParameter(1e-3, 1.0, scaling_type='Logarithmic'),
    ...
}

The complete set of hyperparameters is available in the mangrove-2013.ipynb notebook.

To create an experiment where all seven candidates can be tested in parallel, create a multi-algorithm HPO tuner:

multi_algo_tuning_parameters = automl_interactive_runner.prepare_multi_algo_parameters(
    objective_metrics=ALGORITHM_OBJECTIVE_METRICS,
    static_hyperparameters=STATIC_HYPERPARAMETERS,
    hyperparameters_search_ranges=ALGORITHM_TUNABLE_HYPERPARAMETER_RANGES)

The objective metrics are defined independently for each algorithm:

ALGORITHM_OBJECTIVE_METRICS = {
    'xgboost': 'validation:auc',
    'linear-learner': 'validation:roc_auc_score',
    'mlp': 'validation:roc_auc',
}

Trying all possible values of hyperparameters for all the experiments is wasteful; you can adopt a Bayesian strategy to create an HPO tuner:

from sagemaker.tuner import HyperparameterTuner

multi_algo_tuning_inputs = automl_interactive_runner.prepare_multi_algo_inputs()
base_tuning_job_name = "{}-tuning".format(AUTOML_LOCAL_RUN_CONFIG.local_automl_job_name)

tuner = HyperparameterTuner.create(
    base_tuning_job_name=base_tuning_job_name,
    strategy='Bayesian',
    objective_type='Maximize',
    max_parallel_jobs=10,
    max_jobs=50,
    **multi_algo_tuning_parameters,
)

In the default setting, Autopilot runs 250 tuning jobs to pick the best model. For this use case, it’s sufficient to set max_jobs=50 to save time and resources, without any significant penalty in terms of picking the best set of hyperparameters. Finally, submit the HPO job as follows:

tuner.fit(inputs=multi_algo_tuning_inputs, include_cls_metadata=None)

The process takes about 80 minutes on ml.m5.4xlarge instances. You can monitor progress on the SageMaker console by choosing Hyperparameter tuning jobs under Training in the navigation pane.
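
If you prefer to track progress programmatically instead of on the console, the following is a minimal sketch using the boto3 SageMaker client; it reuses the tuning job name held by the tuner object:

import boto3

sm_client = boto3.client("sagemaker")

# Check the overall tuning status and how many training jobs have completed so far.
desc = sm_client.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuner.latest_tuning_job.name
)
print(desc["HyperParameterTuningJobStatus"], desc["TrainingJobStatusCounters"])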

You can visualize a host of useful information, including the performance of each candidate, by choosing the name of the job in progress.

Finally, compare the model performance of the best candidates as follows:

from sagemaker.analytics import HyperparameterTuningJobAnalytics

SAGEMAKER_SESSION = AUTOML_LOCAL_RUN_CONFIG.sagemaker_session
SAGEMAKER_ROLE = AUTOML_LOCAL_RUN_CONFIG.role

tuner_analytics = HyperparameterTuningJobAnalytics(
    tuner.latest_tuning_job.name, sagemaker_session=SAGEMAKER_SESSION)

df_tuning_job_analytics = tuner_analytics.dataframe()

df_tuning_job_analytics.sort_values(
    by=['FinalObjectiveValue'],
    inplace=True,
    ascending=False if tuner.objective_type == "Maximize" else True)

# select the columns to display and rename
select_columns = ["TrainingJobDefinitionName", "FinalObjectiveValue", "TrainingElapsedTimeSeconds"]
rename_columns = {
	"TrainingJobDefinitionName": "candidate",
	"FinalObjectiveValue": "AUC",
	"TrainingElapsedTimeSeconds": "run_time"  
}

# Show top 5 model performances
df_tuning_job_analytics.rename(columns=rename_columns)[rename_columns.values()].set_index("candidate").head(5)
candidate           AUC        run_time (s)
dpp6-mlp            0.96008    2711.0
dpp4-xgboost        0.95236    385.0
dpp3-xgboost        0.95095    202.0
dpp4-xgboost        0.95069    458.0
dpp3-xgboost        0.95015    361.0

The top-performing model, based on MLP, is marginally better than the XGBoost models with various choices of data processing steps, but it also takes much longer to train. You can find important details about the MLP model training, including the combination of hyperparameters used, as follows:

df_tuning_job_analytics.loc[df_tuning_job_analytics.TrainingJobName==best_training_job].T.dropna() 
TrainingJobName               mangrove-2-notebook-211021-2016-012-500271c8
TrainingJobStatus             Completed
FinalObjectiveValue           0.96008
TrainingStartTime             2021-10-21 20:22:55+00:00
TrainingEndTime               2021-10-21 21:08:06+00:00
TrainingElapsedTimeSeconds    2711
TrainingJobDefinitionName     dpp6-mlp
dropout_prob                  0.415778
embedding_size_factor         0.849226
layers                        256
learning_rate                 0.00013862
mini_batch_size               317
network_type                  feedforward
weight_decay                  1.29323e-12

Create an inference pipeline

To generate inferences on new data, you need to construct an inference pipeline on SageMaker that hosts the best model and can be invoked later. The SageMaker pipeline model requires three containers as its components: data transformation, algorithm, and inverse label transformation (if numerical predictions need to be mapped to non-numerical labels). For brevity, only part of the required code is shown in the following snippet; the complete code is available in the mangrove-2013.ipynb notebook:

from sagemaker.estimator import Estimator
from sagemaker import PipelineModel
from sagemaker_automl import select_inference_output

…
# Final pipeline model
model_containers = [best_data_transformer_model, best_algo_model]
if best_candidate.transforms_label:
    model_containers.append(best_candidate.get_data_transformer_model(
        transform_mode="inverse-label-transform",
        role=SAGEMAKER_ROLE,
        sagemaker_session=SAGEMAKER_SESSION))

# select the output type
model_containers = select_inference_output("BinaryClassification", model_containers, output_keys=['predicted_label'])

After the model containers are built, you can construct and deploy the pipeline as follows:

from sagemaker import PipelineModel

pipeline_model = PipelineModel(
    name="mangrove-automl-2013",
    role=SAGEMAKER_ROLE,
    models=model_containers,
    vpc_config=AUTOML_LOCAL_RUN_CONFIG.vpc_config)

pipeline_model.deploy(initial_instance_count=1,
                      instance_type='ml.m5.2xlarge',
                      endpoint_name=pipeline_model.name,
                      wait=True)

The endpoint deployment takes about 10 minutes to complete.
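
You can also confirm the endpoint status programmatically; the following is a minimal sketch using the boto3 SageMaker client:

import boto3

sm_client = boto3.client("sagemaker")

# The endpoint is ready to serve requests once the status reaches InService.
status = sm_client.describe_endpoint(EndpointName=pipeline_model.name)["EndpointStatus"]
print(status)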

Get inference on the test dataset using an endpoint

After the endpoint is deployed, you can invoke it with a payload of features B1–B7 to classify each pixel in an image as either mangrove (1) or other (0):

import boto3
sm_runtime = boto3.client('runtime.sagemaker')

pred_labels = []
with open(local_download, 'r') as f:
    for i, row in enumerate(f):
        payload = row.rstrip('\n')
        x = sm_runtime.invoke_endpoint(EndpointName=inf_endpt,
                                       ContentType="text/csv",
                                       Body=payload)
        pred_labels.append(int(x['Body'].read().decode().strip()))

Complete details on postprocessing the model predictions for evaluation and plotting are available in notebooks/model_performance.ipynb.

Get inference on the test dataset using a batch transform

Now that you have created the best-performing model with Autopilot, you can use it for inference. To get inferences on large datasets, it’s more efficient to use a batch transform. Let’s generate predictions on the entire dataset (training and test) and append the results to the features, so that we can perform further analysis, for instance comparing predicted vs. actual values and examining the distribution of features among the predicted classes.

First, we create a manifest file in Amazon S3 that points to the locations of the training and test data from the previous data processing steps:

import boto3
data_bucket = <Name of the S3 bucket that has the training data>
prefix = "LANDSAT_LC08_C01_T1_SR/Year2013"
manifest = "[{{"prefix": "s3://{}/{}/"}},n"train.csv",n"test.csv"n]".format(data_bucket, prefix)
s3_client = boto3.client('s3')
s3_client.put_object(Body=manifest, Bucket=data_bucket, Key=f"{prefix}/data.manifest")

Now we can create a batch transform job. Because our input train and test datasets have the label as the last column, we need to drop it during inference. To do that, we pass InputFilter in the DataProcessing argument. The filter "$[:-2]" tells the job to drop the last column. The predicted output is then joined with the source data for further analysis.

In the following code, we construct the arguments for the batch transform job and then pass to the create_transform_job function:

from time import gmtime, strftime

batch_job_name = "Batch-Transform-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
output_location = "s3://{}/{}/batch_output/{}".format(data_bucket, prefix, batch_job_name)
input_location = "s3://{}/{}/data.manifest".format(data_bucket, prefix)

request = {
    "TransformJobName": batch_job_name,
    "ModelName": pipeline_model.name,
    "TransformOutput": {
        "S3OutputPath": output_location,
        "Accept": "text/csv",
        "AssembleWith": "Line",
    },
    "TransformInput": {
        "DataSource": {"S3DataSource": {"S3DataType": "ManifestFile", "S3Uri": input_location}},
        "ContentType": "text/csv",
        "SplitType": "Line",
        "CompressionType": "None",
    },
    "TransformResources": {"InstanceType": "ml.m4.xlarge", "InstanceCount": 1},
    "DataProcessing": {"InputFilter": "$[:-2]", "JoinSource": "Input"}
}

sagemaker = boto3.client("sagemaker")
sagemaker.create_transform_job(**request)
print("Created Transform job with name: ", batch_job_name)

You can monitor the status of the job on the SageMaker console.
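
Alternatively, you can poll the job status with the same boto3 client that created it; the following is a minimal sketch:

import time

# Poll until the batch transform job finishes.
while True:
    status = sagemaker.describe_transform_job(TransformJobName=batch_job_name)["TransformJobStatus"]
    print(status)
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)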

Visualize model performance

You can now visualize the performance of the best model on the test dataset, consisting of regions from India, Myanmar, Cuba, and Vietnam, as a confusion matrix. The model has a high recall value for pixels representing mangroves, but only about 75% precision. The precision for non-mangrove (other) pixels stands at 99%, with an 85% recall. You can tune the probability cutoff of the model predictions to adjust the respective values depending on the particular use case.
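
The following is a minimal sketch of how such a confusion matrix and the precision/recall figures can be computed with scikit-learn, assuming y_true (a hypothetical variable) holds the ground-truth test labels and pred_labels holds the predictions collected earlier:

from sklearn.metrics import classification_report, confusion_matrix

# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, pred_labels))
print(classification_report(y_true, pred_labels, target_names=["other", "mangrove"]))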

It’s worth noting that the results are a significant improvement over the built-in smileCart model.

Visualize model predictions

Finally, it’s useful to observe the model performance on specific regions on the map. In the following image, the mangrove area in the India-Bangladesh border is depicted in red. Points sampled from the Landsat image patch belonging to the test dataset are superimposed on the region, where each point is a pixel that the model determines to be representing mangroves. The blue points are classified correctly by the model, whereas the black points represent mistakes by the model.

The following image shows only the points that the model predicted to not represent mangroves, with the same color scheme as the preceding example. The gray outline is the part of the Landsat patch that doesn’t include any mangroves. As is evident from the image, the model doesn’t make any mistake classifying points on water, but faces a challenge when distinguishing pixels representing mangroves from those representing regular foliage.

The following image shows model performance on the Myanmar mangrove region.

In the following image, the model does a better job identifying mangrove pixels.

Clean up

The SageMaker inference endpoint continues to incur cost if left running. Delete the endpoint as follows when you’re done:

sagemaker.delete_endpoint(EndpointName=pipeline_model.name)

Conclusion

This series of posts provided an end-to-end framework for data scientists to solve GIS problems. Part 1 showed the ETL process and a convenient way to visually interact with the data. Part 2 showed how to use Autopilot to automate building a custom mangrove classifier.

You can use this framework to explore new satellite datasets containing a richer set of bands useful for mangrove classification and explore feature engineering by incorporating domain knowledge.


About the Authors

Andrei Ivanovic is an incoming Master’s of Computer Science student at the University of Toronto and a recent graduate of the Engineering Science program at the University of Toronto, majoring in Machine Intelligence with a Robotics/Mechatronics minor. He is interested in computer vision, deep learning, and robotics. He did the work presented in this post during his summer internship at Amazon.

David Dong is a Data Scientist at Amazon Web Services.

Arkajyoti Misra is a Data Scientist at Amazon LastMile Transportation. He is passionate about applying Computer Vision techniques to solve problems that help the earth. He loves to work with non-profit organizations and is a founding member of ekipi.org.

Read More

Identify mangrove forests using satellite image features using Amazon SageMaker Studio and Amazon SageMaker Autopilot – Part 1

The increasing ubiquity of satellite data over the last two decades is helping scientists observe and monitor the health of our constantly changing planet. By tracking specific regions of the Earth’s surface, scientists can observe how regions like forests, water bodies, or glaciers change over time. One such region of interest for geologists is mangrove forests. These forests are essential to the overall health of the planet and are one of the many areas across the world that are impacted by human activities. In this post, we show how to get access to satellite imagery data containing mangrove forests and how to visually interact with the data in Amazon SageMaker Studio. In Part 2 of this series, we show how to train a machine learning (ML) model using Amazon SageMaker Autopilot to identify those forests from a satellite image.

Overview of solution

A large number of satellites orbit the Earth, scanning its surface on a regular basis. Typical examples of such satellites are Landsat, Sentinel, CBERS, and MODIS, to name a few. You can access both recent and historical data captured by these satellites at no cost from multiple providers like USGS EarthExplorer, Land Viewer, or Copernicus Open Access Hub. Although they provide an excellent service to the scientific community by making their data freely available, it takes a significant amount of effort to gain familiarity with the interfaces of the respective providers. Additionally, such data from satellites is made available in different formats and may not comply with the standard Geographical Information Systems (GIS) data formatting. All of these challenges make it extremely difficult for newcomers to GIS to prepare a suitable dataset for ML model training.

Platforms like Google Earth Engine (GEE) and Earth on AWS make a wide variety of satellite imagery data available in a single portal that eases searching for the right dataset and standardizes the ETL (extract, transform, and load) component of the ML workflow in a convenient, beginner-friendly manner. GEE additionally provides a coding platform where you can programmatically explore the dataset and build a model in JavaScript. The Python API for GEE lacks the maturity of its JavaScript counterpart; however, that gap is sufficiently bridged by the open-sourced project geemap.

In this series of posts, we present a complete end-to-end example of building an ML model in the GIS space to detect mangrove forests from satellite images. Our goal is to provide a template solution that ML engineers and data scientists can use to explore and interact with the satellite imagery, make the data available in the right format for building a classifier, and have the option to validate model predictions visually. Specifically, we walk through the following:

  • How to download satellite imagery data to a Studio environment
  • How to interact with satellite data and perform exploratory data analysis in a Studio notebook
  • How to automate training an ML model in Autopilot

Build the environment

The solution presented in this post is built in a Studio environment. To configure the environment, complete the following steps:

  1. Add a new SageMaker domain user and launch the Studio app. (For instructions, refer to Get Started.)
  2. Open a new Studio notebook by choosing the plus sign under Notebook and compute resources (make sure to choose the Data Science SageMaker image).
  3. Clone the mangrove-landcover-classification Git repository, which contains all the code used for this post. (For instructions, refer to Clone a Git Repository in SageMaker Studio).
  4. Open the notebook notebooks/explore_mangrove_data.ipynb.
  5. Run the first notebook cell to pip install all the required dependencies listed in the requirements.txt file in the root folder.
  6. Open a new Launcher tab and open a system terminal found in the Utilities and files section.
  7. Install the Earth Engine API:
    pip install earthengine-api

  8. Authenticate Earth Engine:
    earthengine authenticate

  9. Follow the Earth Engine link in the output and sign up as a developer so that you can access GIS data from a notebook.

Mangrove dataset

The Global Mangrove Forest Distribution (GMFD) is one of the most cited datasets used by researchers in the area. The dataset, which contains labeled mangrove regions at a 30-meter resolution from around the world, is curated from more than 1,000 Landsat images obtained from the USGS EROS Center. One of the disadvantages of using the dataset is that it was compiled in 2000. In the absence of a newer dataset that is as comprehensive as the GMFD, we decided to use it because it serves the purpose of demonstrating an ML workload in the GIS space.

Given the visual nature of GIS data, it’s critical for ML practitioners to be able to interact with satellite images in an interactive manner with full map functionalities. Although GEE provides this functionality through a browser interface, it’s only available in JavaScript. Fortunately, the open-sourced project geemap aids data scientists by providing those functionalities in Python.

Go back to the explore_mangrove_data.ipynb notebook you opened earlier and follow the remaining cells to understand how to use simple interactive maps in the notebook.

  1. Start by importing Earth Engine and initializing it:
    import ee
    import geemap.eefolium as geemap
    ee.Initialize()

  2. Now import the satellite image collection from the database:
    mangrove_images_landsat = ee.ImageCollection('LANDSAT/MANGROVE_FORESTS')

  3. Extract the image from the collection, which contains just one image:
    mangrove_images_landsat = mangrove_images_landsat.first()

  4. To visualize the data on a map, you first need to instantiate a map through geemap:
    mangrove_map = geemap.Map()

  5. Next, define some parameters that make it easy to visualize the data on a world map:
    mangrovesVis = {
        'min': 0,
        'max': 1.0,
        'palette': ['d40115'],
    }

  6. Now add the data as a layer on the map instantiated earlier with the visualization parameters:
    mangrove_map.addLayer(mangrove_images_landsat, mangrovesVis, 'Mangroves')

You can add as many layers as you want to the map and then interactively turn them on or off for a cleaner view when necessary. Because mangrove forests aren’t everywhere on the earth, it makes sense to center the map to a coastal region with known mangrove forests and then render the map on the notebook as follows:

mangrove_map.setCenter(-81, 25, 9)
mangrove_map

The latitude and longitude chosen here, 25 degrees north and 81 degrees west, respectively, correspond to the gulf coast of Florida, US. The map is rendered at a zoom level of 9, where a higher number provides a more closeup view.

You can obtain some useful information about the dataset by accessing the associated metadata as follows:

geemap.image_props(mangrove_images_landsat).getInfo()

You get the following output:

{'IMAGE_DATE': '2000-01-01',
 'NOMINAL_SCALE': 30.359861978395436,
 'system:asset_size': '41.133541 MB',
 'system:band_names': ['1'],
 'system:id': 'LANDSAT/MANGROVE_FORESTS/2000',
 'system:index': '2000',
 'system:time_end': '2001-01-01 00:00:00',
 'system:time_start': '2000-01-01 00:00:00',
 'system:version': 1506796895089836
}

Most of the fields in the metadata are self-explanatory, except for the band names. The next section discusses this field in more detail.

Landsat dataset

The following image is a satellite image of an area at the border of French Guiana and Suriname, where mangrove forests are common. The left image shows a raw satellite image of the region; the image on the right depicts the GMFD data superimposed on it. Pixels representing mangroves are shown in red. It’s quite evident from the side-by-side comparison that there is no straightforward visual cue in either structure or color in the underlying satellite image that distinguishes mangroves from the surrounding region. In the absence of any such distinguishing pattern in the images, it poses a considerable challenge even for state-of-the-art deep learning-based classifiers to identify mangroves accurately. Fortunately, satellite images are captured at a range of wavelengths on the electromagnetic spectrum, part of which falls outside the visible range. Additionally, they also contain important measurements like surface reflectance. Therefore, researchers in the field have traditionally relied upon these measurements to build ML classifiers.

Unfortunately, apart from marking whether or not an individual pixel represents mangroves, the GMFD dataset doesn’t provide any additional information. However, other datasets can provide a host of features for every pixel that can be utilized to train a classifier. In this post, you use the USGS Landsat 8 dataset for that purpose. The Landsat 8 satellite was launched in 2013 and orbits the Earth every 99 minutes at an altitude of 705 km, capturing images covering a 185 km x 180 km patch on the Earth’s surface. It captures nine spectral bands, or portions of the electromagnetic spectrum sensed by a satellite, ranging from ultra blue to shortwave infrared. Therefore, the images available in the Landsat dataset are a collection of image patches containing multiple bands, with each patch time stamped by the date of collection.

To get a sample image from the Landsat dataset, you need to define a point of interest:

point = ee.Geometry.Point([<longitude>, <latitude>])

Then you filter the image collection by the point of interest, a date range, and optionally by the bands of interest. Because the images collected by the satellites are often obscured by cloud cover, it’s absolutely necessary to extract images with the minimum amount of cloud cover. Fortunately, the Landsat dataset already comes with a cloud detector. This streamlines the process of accessing all available images over several months, sorting them by amount of cloud cover, and picking the one with minimum cloud cover. For example, you can perform the entire process of extracting a Landsat image patch from the northern coast of the continent of South America in a few lines of code:

point = ee.Geometry.Point([-53.94, 5.61])
image_patch = ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') \
    .filterBounds(point) \
    .filterDate('2016-01-01', '2016-12-31') \
    .select('B[1-7]') \
    .sort('CLOUD_COVER') \
    .first()

When specifying a region using a point of interest, that region doesn’t necessarily have to be centered on that point. The extracted image patch simply contains the point somewhere within it.

Finally, you can plot the image patch over a map by specifying proper plotting parameters based on a few of the chosen bands:

vis_params = {
    'min': 0,
    'max': 3000,
    'bands': ['B5', 'B4', 'B3']
}
landsat = geemap.Map()
landsat.centerObject(point, 8)
landsat.addLayer(image_patch, vis_params, "Landsat-8")
landsat

The following is a sample image patch collected by Landsat 8 showing in false color the Suriname-French Guiana border region. The mangrove regions are too tiny to be visible at the scale of the image.

As usual, there is a host of useful metadata available for the extracted image:

geemap.image_props(image_patch).getInfo()

{'CLOUD_COVER': 5.76,
 'CLOUD_COVER_LAND': 8.93,
 'EARTH_SUN_DISTANCE': 0.986652,
 'ESPA_VERSION': '2_23_0_1a',
 'GEOMETRIC_RMSE_MODEL': 9.029,
 'GEOMETRIC_RMSE_MODEL_X': 6.879,
 'GEOMETRIC_RMSE_MODEL_Y': 5.849,
 'IMAGE_DATE': '2016-11-27',
 'IMAGE_QUALITY_OLI': 9,
 'IMAGE_QUALITY_TIRS': 9,
 'LANDSAT_ID': 'LC08_L1TP_228056_20161127_20170317_01_T1',
 'LEVEL1_PRODUCTION_DATE': 1489783959000,
 'NOMINAL_SCALE': 30,
 'PIXEL_QA_VERSION': 'generate_pixel_qa_1.6.0',
 'SATELLITE': 'LANDSAT_8',
 'SENSING_TIME': '2016-11-27T13:52:20.6150480Z',
 'SOLAR_AZIMUTH_ANGLE': 140.915802,
 'SOLAR_ZENITH_ANGLE': 35.186565,
 'SR_APP_VERSION': 'LaSRC_1.3.0',
 'WRS_PATH': 228,
 'WRS_ROW': 56,
 'system:asset_size': '487.557501 MB',
 'system:band_names': ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7'],
 'system:id': 'LANDSAT/LC08/C01/T1_SR/LC08_228056_20161127',
 'system:index': 'LC08_228056_20161127',
 'system:time_end': '2016-11-27 13:52:20',
 'system:time_start': '2016-11-27 13:52:20',
 'system:version': 1522722936827122}

The preceding image isn’t free from clouds, which is confirmed by the metadata suggesting a 5.76% cloud cover. Compared to a single binary band available from the GMFD image, the Landsat image contains the bands B1–B7.

ETL process

To summarize, you need to work with two distinct datasets to train a mangrove classifier. The GMFD dataset provides only the coordinates of pixels belonging to the minority class (mangrove). The Landsat dataset, on the other hand, provides band information for every pixel in a collection of patches, each patch covering roughly a 180 km2 area on the Earth’s surface. You now need to combine these two datasets to create the training dataset containing pixels belonging to both the minority and majority classes.

It’s wasteful to have a training dataset covering the entire surface of the Earth, because the mangrove regions cover a tiny fraction of the surface area. Because these regions are generally isolated from one another, an effective strategy is to create a set of points, each representing a specific mangrove forest on the earth’s surface, and collect the Landsat patches around those points. Subsequently, pixels can be sampled from each Landsat patch and a class—either mangrove or non-mangrove—can be assigned to it depending on whether the pixel appears in the GMFD dataset. The full labeled dataset can then be constructed by aggregating points sampled from this collection of patches.

The following table shows a sample of the regions and the corresponding coordinates to filter the Landsat patches.

region           longitude   latitude
Mozambique1      36.2093     -18.7423
Mozambique2      34.7455     -20.6128
Nigeria1         5.6116      5.3431
Nigeria2         5.9983      4.5678
Guinea-Bissau    -15.9903    12.1660

Due to the larger expanse of mangrove forests in Mozambique and Nigeria, two points each are required to capture the respective regions in the preceding table. The full curated list of points is available on GitHub.

To sample points representing both classes, you have to create a binary mask for each class first. The minority class mask for a Landsat patch is simply the intersection of pixels in the patch and the GMFD dataset. The mask for the majority class for the patch is simply the inverse of the minority class mask. See the following code:

mangrove_mask = image_patch.updateMask(mangrove_images_landsat.eq(1))
non_mangrove_mask = image_patch.updateMask(mangrove_mask.unmask().Not())

Use these two masks for the patch and create a set of labeled pixels by randomly sampling pixels from the respective masks:

mangrove_training_pts = mangrove_mask.sample(**{
    'region': mangrove_mask.geometry(),
    'scale': 30,
    'numPixels': 100000,
    'seed': 0,
    'geometries': True
})
non_mangrove_training_pts = non_mangrove_mask.sample(**{
    'region': non_mangrove_mask.geometry(),
    'scale': 30,
    'numPixels': 50000,
    'seed': 0,
    'geometries': True
})

numPixels is the number of samples drawn from the entire patch, and the sampled point is retained in the collection only if it falls in the target mask area. Because the mangrove region is typically a small fraction of the Landsat image patch, you need to use a larger value of numPixels for the mangrove mask compared to that for the non-mangrove mask. You can always look at the size of the two classes as follows to adjust the corresponding numPixels values:

mangrove_training_pts.size().getInfo(), non_mangrove_training_pts.size().getInfo()
(900, 49500)

In this example, the mangrove region is a tiny fraction of the Landsat patch because only 900 points were sampled from 100,000 attempts. Therefore, you should probably increase the value for numPixels for the minority class to restore balance between the two classes.

It’s a good idea to visually verify that the sampled points from the two respective sets indeed fall in the intended region in the map:

# define the point of interest
suriname_lonlat = [-53.94, 5.61]
suriname_point = ee.Geometry.Point(suriname_lonlat)
training_map = geemap.Map()
training_map.setCenter(*suriname_lonlat, 13)

# define visualization parameters
vis_params = {
    'min': 0,
    'max': 100,
    'bands': ['B4']
}

# define colors for the two set of points
mangrove_color = 'eb0000'
non_mangrove_color = '1c5f2c'

# create legend for the map
legend_dict = {
    'Mangrove Point': mangrove_color,
    'Non-mangrove Point': non_mangrove_color
}

# add layers to the map
training_map.addLayer(mangrove_mask, vis_params, 'mangrove mask', True)
training_map.addLayer(mangrove_training_pts, {'color': mangrove_color}, 'Mangrove Sample')
training_map.addLayer(non_mangrove_mask, {}, 'non mangrove mask', True)
training_map.addLayer(non_mangrove_training_pts, {'color': non_mangrove_color}, 'non mangrove training', True)
training_map.add_legend(legend_dict=legend_dict)

# display the map
training_map

Sure enough, as the following image shows, the red points representing mangrove pixels fall in the white regions and the green points representing a lack of mangroves fall in the gray region. The maps.ipynb notebook walks through the process of generation and visual inspection of sampled points on a map.

Now you need to convert the sampled points into a DataFrame for ML model training, which can be accomplished by the ee_to_geopandas module of geemap:

from geemap import ee_to_geopandas
mangrove_gdf = ee_to_geopandas(mangrove_training_pts)
                    geometry    B1    B2    B3    B4    B5    B6    B7
0  POINT (-53.95268 5.73340)   251   326   623   535  1919   970   478
1  POINT (-53.38339 5.55982)  4354  4483  4714  4779  5898  4587  3714
2  POINT (-53.75469 5.68400)  1229  1249  1519  1455  3279  1961  1454
3  POINT (-54.78127 5.95457)   259   312   596   411  3049  1644   740
4  POINT (-54.72215 5.97807)   210   279   540   395  2689  1241   510

The pixel coordinates at this stage are still represented as a Shapely geometry point. In the next step, you have to convert those into latitudes and longitudes. Additionally, you need to add labels to the DataFrame, which for the mangrove_gdf should all be 1, representing the minority class. See the following code:

mangrove_gdf["lon"] = mangrove_gdf["geometry"].apply(lambda p: p.x)
mangrove_gdf["lat"] = mangrove_gdf["geometry"].apply(lambda p: p.y)
mangrove_gdf["label"] = 1 
mangrove_gdf = mangrove_gdf.drop("geometry", axis=1)
print(mangrove_gdf.head())

     B1    B2    B3    B4    B5    B6    B7        lon       lat  label
0   251   326   623   535  1919   970   478 -53.952683  5.733402      1
1  4354  4483  4714  4779  5898  4587  3714 -53.383394  5.559823      1
2  1229  1249  1519  1455  3279  1961  1454 -53.754688  5.683997      1
3   259   312   596   411  3049  1644   740 -54.781271  5.954568      1
4   210   279   540   395  2689  1241   510 -54.722145  5.978066      1

Similarly, create another DataFrame, non_mangrove_gdf, using sampled points from the non-mangrove part of the Landsat image patch and assigning label=0 to all those points. A training dataset for the region is created by appending mangrove_gdf and non_mangrove_gdf.
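
A minimal sketch of that step, mirroring the mangrove_gdf preparation above, looks like the following (the combined DataFrame name region_gdf is just illustrative):

import pandas as pd

non_mangrove_gdf = ee_to_geopandas(non_mangrove_training_pts)
non_mangrove_gdf["lon"] = non_mangrove_gdf["geometry"].apply(lambda p: p.x)
non_mangrove_gdf["lat"] = non_mangrove_gdf["geometry"].apply(lambda p: p.y)
non_mangrove_gdf["label"] = 0
non_mangrove_gdf = non_mangrove_gdf.drop("geometry", axis=1)

# Combine both classes into a single labeled dataset for the region.
region_gdf = pd.concat([mangrove_gdf, non_mangrove_gdf], ignore_index=True)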

Exploring the bands

Before diving into building a model to classify pixels in an image representing mangroves or not, it’s worth looking into the band values associated with those pixels. There are seven bands in the dataset, and the kernel density plots in the following figure show the distribution of those bands extracted from the 2015 Landsat data for the Indian mangrove region. The distribution of each band is broken down into two groups: pixels representing mangroves, and pixels representing other surface features like water or cultivated land.

One important aspect of building a classifier is to understand how these distributions vary over different regions of the Earth. The following figure shows the kernel density plots for bands captured in the same year from the Miami area of the US in 2015. The apparent similarity of the density profiles indicate that it may be possible to build a universal mangrove classifier that can be generalized to predict new areas excluded from the training set.

The plots shown in both figures are generated from band values that represent minimum cloud coverage, as determined by the built-in Earth Engine algorithm. Although this is a very reasonable approach, because different regions on the Earth have varying amounts of cloud coverage on the specific date of data collection, there exist alternative ways to capture the band values. For example, it’s also useful to calculate the median from a simple composite and use it for model training, but those details are beyond the scope of this post.
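
A minimal sketch of such kernel density plots, assuming the labeled band values for a region are available in a DataFrame like the region_gdf built earlier, could look like the following:

import matplotlib.pyplot as plt
import seaborn as sns

bands = ["B1", "B2", "B3", "B4", "B5", "B6", "B7"]
fig, axes = plt.subplots(1, len(bands), figsize=(28, 4))
for ax, band in zip(axes, bands):
    # Overlay the distributions for mangrove (label=1) and non-mangrove (label=0) pixels.
    sns.kdeplot(data=region_gdf, x=band, hue="label", common_norm=False, ax=ax)
    ax.set_title(band)
plt.tight_layout()
plt.show()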

Prepare the training data

There are two main strategies to split the labeled dataset into training and test sets. In the first approach, datasets corresponding to the different regions can be combined into a single DataFrame and then split into training and test sets while preserving the fraction of the minority class. The alternative approach is to train a model on a subset of the regions and treat the remaining regions as part of the test set. One of the critical questions we want to address here is how good a model trained in a certain region generalizes over other regions previously unseen. This is important because mangroves from different parts of the world can have some local characteristics, and one way to judge the quality of a model is to investigate how reliable it is in predicting mangrove forests from the satellite image of a new region. Therefore, although splitting the dataset using the first strategy would likely improve the model performance, we follow the second approach.

As indicated earlier, the mangrove dataset was broken down into geographical regions and four of those, Vietnam2, Myanmar3, Cuba2, and India, were set aside to create the test dataset. The remaining 21 regions made up the training set. The dataset for each region was created by setting numPixels=10000 for mangrove and numPixels=1000 for the non-mangrove regions in the sampling process. The larger value of numPixels for mangroves ensures a more balanced dataset, because mangroves usually cover a small fraction of the satellite image patches. The resulting training data ended up having a 75/25 split between the majority and minority classes, whereas the split was 69/31 for the test dataset. The regional datasets as well as the training and test datasets were stored in an Amazon Simple Storage Service (Amazon S3) bucket. The complete code for generating the training and test sets is available in the prep_mangrove_dataset.ipynb notebook.
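
The following is a minimal sketch of the region-based splitting strategy, assuming each regional dataset was saved to Amazon S3 as <region>.csv; the bucket, prefix, and file layout here are illustrative, and the actual code is in the prep_mangrove_dataset.ipynb notebook:

import pandas as pd

test_regions = {"Vietnam2", "Myanmar3", "Cuba2", "India"}
all_regions = ["Mozambique1", "Mozambique2", "Nigeria1", "Nigeria2", "Guinea-Bissau",
               "Vietnam2", "Myanmar3", "Cuba2", "India"]  # abbreviated; 25 regions in total

# Reading directly from S3 with pandas requires the s3fs package.
frames = {r: pd.read_csv(f"s3://<bucket>/<prefix>/{r}.csv") for r in all_regions}
train_df = pd.concat([df for r, df in frames.items() if r not in test_regions], ignore_index=True)
test_df = pd.concat([df for r, df in frames.items() if r in test_regions], ignore_index=True)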

Train a model with smileCart

One of the few built-in models GEE provides is a classification and regression tree-based algorithm (smileCart) for quick classification. These built-in models allow you to quickly train a classifier and perform inference, at the cost of detailed model tuning and customization. Even with this downside, using smileCart still provides a beginner-friendly introduction to land cover classification, and therefore can serve as a baseline.

To train the built-in classifier, you need to provide two pieces of information: the satellite bands to use as features and the column representing the label. Additionally, you have to convert the training and test datasets from Pandas DataFrames to GEE feature collections. Then you instantiate the built-in classifier and train the model. The following is a high-level version of the code; you can find more details in the smilecart.ipynb notebook:

bands = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7']
label = 'label'

# Train a CART classifier with default parameters.
classifier = ee.Classifier.smileCart().train(train_set_pts, label, bands)

# Inference on test set
result_featurecollection = test_set_pts.select(bands).classify(classifier)

Both train_set_pts and test_set_pts are FeatureCollections, a common GEE data structure, containing the train dataset and test dataset, respectively. The model prediction generates the following confusion matrix on the test dataset.

The model doesn’t predict mangroves very well, but this is a good starting point, and the result will act as a baseline for the custom models you build in Part 2 of this series.

Conclusion

This concludes the first part of a two-part post, in which we show the ETL process for building a mangrove classifier based on features extracted from satellite images. We showed how to automate the process of gathering satellite images and visualize it in Studio for detailed exploration. In Part 2 of the post, we show how to use AutoML to build a custom model in Autopilot that performs better than the built-in smileCart model.


About the Authors

Andrei Ivanovic is an incoming Master’s of Computer Science student at the University of Toronto and a recent graduate of the Engineering Science program at the University of Toronto, majoring in Machine Intelligence with a Robotics/Mechatronics minor. He is interested in computer vision, deep learning, and robotics. He did the work presented in this post during his summer internship at Amazon.

David Dong is a Data Scientist at Amazon Web Services.

Arkajyoti Misra is a Data Scientist at Amazon LastMile Transportation. He is passionate about applying Computer Vision techniques to solve problems that help the earth. He loves to work with non-profit organizations and is a founding member of ekipi.org.

Read More

Build an appointment scheduler interface integrated with Meta using Amazon Lex and Amazon Connect

This blog post is co-written with Nick Vargas and Anna Schreiber from Accenture.

Scheduling customer appointments is often a manual and labor-intensive process. You can utilize advances in self-service technology to automate appointment scheduling.

In this blog post, we show you how to build a self-service appointment scheduling solution with Amazon Lex and Amazon Connect. This solution allows users to create appointments via Meta Messenger and receive appointment confirmations through an SMS message. It also provides a web-based dashboard so that employees can call users with a single click at the scheduled time.

Amazon Lex integrates with Meta Messenger and can be used to enable chat conversations. Amazon Lex is a fully managed artificial intelligence (AI) service with natural language understanding (NLU) to design, build, test, and deploy conversational interfaces in applications.

Solution overview

The architecture diagram below shows a high-level overview of the interaction between the different AWS components and services. The solution consists of three primary components: customer interaction through Meta Messenger, appointment scheduling enabled by Amazon Lex with SMS confirmation, and a customer outbound dialer from Amazon Connect. The outbound dialer makes it easy to place an outbound call to the customer from a simple UI.

This post uses the following sample bot conversation:

User: I would like to book an appointment.
Agent: What appointment can I get you? You can say Billing, General or Offers.
User: Billing
Agent: What’s your first name?
User: Sameer
Agent: What is your phone number with country code?
User: +10001234567
Agent: When should I schedule your Billing appointment?
User: Next week Tuesday
Agent: At what time should I schedule the Billing appointment?
User: 9:00 am
Agent: Sameer, 09:00 is available, should I go ahead and book your appointment?
User: Yes
Agent: Thanks Sameer, your appointment is confirmed for 09:00, and we have texted the details to your phone number.

For the scheduler and customer notification component, an AWS Lambda handler processes the scheduling request. The appointment information is then saved to an Amazon DynamoDB table. When the information is saved successfully, a notification confirming the appointment details is sent to the customer via SMS using Amazon Pinpoint.
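
The following is a minimal sketch of that save-and-notify step, assuming the table name created by the CloudFormation stack, hypothetical appointment field names, and placeholder values for the Amazon Pinpoint project ID and origination number; the actual handler in the repository also builds the Amazon Lex response, which is omitted here:

import boto3

dynamodb = boto3.resource("dynamodb")
pinpoint = boto3.client("pinpoint")

def save_and_notify(appointment):
    # Persist the appointment details collected by the Lex bot.
    dynamodb.Table("AppointmentSchedulerTable").put_item(Item=appointment)

    # Send an SMS confirmation to the customer through Amazon Pinpoint.
    pinpoint.send_messages(
        ApplicationId="<AppointmentSchedulerPinpointApp project ID>",
        MessageRequest={
            "Addresses": {appointment["phone"]: {"ChannelType": "SMS"}},
            "MessageConfiguration": {
                "SMSMessage": {
                    "Body": "Your {} appointment is confirmed for {} at {}.".format(
                        appointment["type"], appointment["date"], appointment["time"]
                    ),
                    "MessageType": "TRANSACTIONAL",
                    "OriginationNumber": "<toll-free number requested in Pinpoint>",
                }
            },
        },
    )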

A React.js application displays the saved customer appointments from the database in a calendar view. This makes it easy for employees to identify the customers who need to be called. Clicking the call button on a calendar entry immediately places an outbound call request to connect the customer with the employee using Amazon Connect.
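
Under the hood, the outbound call can be placed with a single Amazon Connect API call. The following is a minimal sketch of what the AppointmentSchedulerOutboundContact function might do, with placeholder IDs that you would look up in your own Connect instance:

import boto3

connect = boto3.client("connect")

def start_outbound_call(destination_phone_number):
    # Initiates a call that runs the OutboundCall contact flow and connects an agent.
    response = connect.start_outbound_voice_contact(
        DestinationPhoneNumber=destination_phone_number,
        InstanceId="<Amazon Connect instance ID>",
        ContactFlowId="<OutboundCall contact flow ID>",
        SourcePhoneNumber="<phone number claimed in Amazon Connect>",
    )
    return response["ContactId"]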

Prerequisites

For this project, you should have the following prerequisites:

  • Downloaded the code files from the GitHub repository.
    The repository contains:

    • The React app files, located under the UI directory
    • The Amazon Connect contact flows, located under backend/connect/contact_flows. There are four contact flows for this demo, with file names AgentWhisper, CustomerWaiting, InboundCall, and OutboundCall.
    • A zip file for an Amazon Lex bot, located in the backend/lex directory with file name AppointmentSchedulerBot.zip.
  • npm installed on your local machine. Refer to the instructions for installing Node.js and npm on your machine.

The deployment of this solution is automated where possible using AWS CloudFormation; however, some configurations and steps in the deployment are manual.

Deploy the solution

To set up the required infrastructure for the appointment scheduler demo app in your AWS account, complete the following steps:

  1. Sign in to the AWS Management Console.
  2. Choose Launch Stack:
  3. On the Create Stack page, under Specify template, choose Upload a template file.
  4. Choose the AppointmentsSchedulerCFTemplate file that you downloaded from GitHub.
  5. Choose Next.
  6. For Stack name, enter a unique name for the stack, such as AppointmentSchedulerDemo.
  7. Choose Next, and then choose Next on the Configure stack options page.
  8. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources and choose Create.
    The stack generates the following resources:
    • The DynamoDB table AppointmentSchedulerTable
    • The Amazon Pinpoint app AppointmentSchedulerPinpointApp
    • Two AWS Identity and Access Management (IAM) policies:
      • AppointmentSchedulerPinpointPolicy
      • AppointmentSchedulerDynamoApiPolicy
    • Two IAM roles:
      • AppointmentsLambdaRole
      • OutboundContactLambdaRole
    • Two Lambda functions:
      • AppointmentScheduler
      • AppointmentSchedulerOutboundContact
    • The Amazon API Gateway instance Appointments
    • An Amazon CloudFront distribution
    • The Amazon Simple Storage Service (Amazon S3) bucket appointment-scheduler-website

Configure the Amazon Pinpoint app

To configure the Amazon Pinpoint app, complete the following steps:

  1. Go to the Pinpoint console.
  2. Navigate to the AppointmentSchedulerPinpointApp deployed above.
  3. On the left menu under Settings click SMS and Voice.
  4. Under Number settings click Request Phone Number.
  5. Select your country of origin, choose Toll-free, and click Next, then Request.

The Amazon Lex bot for this post has one intent, MakeAppointment, which asks the user the series of questions in the preceding example to elicit the appointment type, date, time, name, and phone number of the customer.

AppointmentTypeValue is the only custom slot type for this bot and takes one of three values: Billing, General, or Offers. The Name, Phone, Date, and Time slots each use the built-in slot type provided by Amazon Lex.

Deploy the Amazon Lex bot

To deploy the bot, first import the Amazon Lex bot (AppointmentSchedulerLex.zip) into your account.

  1. Sign in to the Amazon Lex V2 console.
  2. If this is your first time using Amazon Lex, you will see the Welcome page; choose Create Bot.
  3. When presented with the Create your bot page, scroll to the bottom of the page and select Cancel. If this is not your first time using Amazon Lex, skip this step.
  4. Choose Actions, then Import.
  5. Enter AppointmentSchedulerBot for the bot’s name then choose the .zip archive to import.
  6. Under IAM permissions, choose Create a role with basic Amazon Lex permissions.
  7. Under COPPA, choose No.
  8. Click Import.
  9. Open the bot by clicking on the bot’s name.
  10. Under Deployment on the left menu, click Aliases, select TestBotAlias and click English (US) under Languages. Choose the AppointmentScheduler Lambda function and click Save.
  11. Under Bot Versions on the left menu, select Intents and at the bottom right-hand side of the page, click Build.
  12. [Optional] Once the build has completed, click Test to test the bot using the window that appears on the right (click on the microphone icon to speak to your bot or type in the text box).

Set up an Amazon Connect Instance

To set up your Amazon Connect instance and contact flows, complete the following steps:

  1. Set up an Amazon Connect instance.
    1. Go to the Amazon Connect console.
    2. If this is the first time you have used the Amazon Connect console, you will see the Welcome page; choose Get Started.
    3. If this is not the first time you are using Amazon Connect, click Add an instance.
    4. For Identity management, select Store users in Amazon Connect.
    5. For Access URL, type a unique name for your instance, for example, AppointmentSchedulerDemo, then choose Next.
    6. On the Add administrator page, add a new administrator account for Amazon Connect. Use this account to log in to your instance later using the unique access URL. Click Next step.
    7. On the next two pages – Telephony Options and Data storage – accept the default settings and choose Next step.
    8. On the Review and Create page, choose Create instance.
  2. Add the Amazon Lex bots to your newly created Amazon Connect instance.
    1. Select the Instance Alias of the instance you just created.
    2. Choose Contact flows.
    3. Under Amazon Lex, use the drop-down to select the AppointmentSchedulerBot and the default alias.
    4. Choose + Add Amazon Lex Bot. If the name of your bot does not appear in the list, reload the page.
  3. Log in to the instance and claim a phone number
    1. Click on the Login URL for your Connect Instance.
    2. Enter the Administrator credentials you entered upon creation of the instance. This will open the Connect Console.
    3. From the Dashboard, under Explore your channels of communication select View phone numbers on the right.
    4. Click Claim a number.
    5. Choose a Country and leave the default type of DID (Direct Inward Dialing), choose a Phone Number from the dropdown list, and click Next.
    6. Click Save.
  4. Add the OutboundQueue
    1. From the navigation menu on the left, choose Queues from the Routing menu.
    2. Click Add New Queue.
    3. Name the Queue OutboundQueue, use the dropdown to set the Hours of operation to Basic Hours and use the dropdown for Outbound caller ID number to select the phone number you claimed earlier.
    4. Click Add new queue.
    5. From the navigation menu on the left, choose Routing Profiles from the Users menu.
    6. Click Basic Routing Profile. Under Routing profile queues, add OutboundQueue and click Save.
  5. Add the phone number to BasicQueue
    1. From the navigation menu on the left, choose Queues from the Routing menu.
    2. Click on BasicQueue.
    3. In the Outbound caller ID number field, add the phone number that you claimed earlier.
    4. Click Save on the top right corner.
  6. Import the InboundCall contact flow
    1. From the navigation menu on the left, choose Contact Flows from the Routing menu.
    2. Choose Create contact flow.
    3. On the right-hand side of the page, click on the down arrow and click Import flow (beta).
    4. Find the InboundCall file and choose Import.
    5. Click Publish.
  7. Then, associate this flow with the phone number.
    1. From the navigation menu on the left, choose Phone Numbers from the Routing menu.
    2. Choose the phone number we created earlier.
    3. Under the Contact flow/IVR section, select the InboundCall flow.
    4. Click Save.
  8. Import the AgentWhisper, CustomerWaiting, and OutboundCall contact flows
    1. From the left navigation menu, choose Contact Flows under Routing.
    2. Click Create Agent Whisper flow.
    3. On the right-hand side of the page, click on the down arrow and click Import flow (beta).
    4. Find the AgentWhisper file and choose Import.
    5. Click Publish.
    6. Navigate back to the Contact Flows list and click the down arrow next to Create contact flow.
    7. Click Create Customer Queue Flow.
    8. On the right-hand side of the page, click on the down arrow and click Import flow (beta).
    9. Find the CustomerWaiting file and choose Import.
    10. Click Publish.
    11. Navigate back to the Contact Flows list and click the down arrow next to Create contact flow.
    12. Choose Create contact flow.
    13. On the right-hand side of the page, click on the down arrow and click Import flow (beta).
    14. Find the OutboundCall file from the GitHub repository you downloaded earlier and choose Import.
    15. Click Publish.

Edit the Lambda functions

  1. Go to the Lambda console.
  2. Click on the AppointmentScheduler function.
  3. Click Configuration, then choose Environment Variables from the left menu.
  4. Click Edit. Replace the values with your Amazon Pinpoint project ID and the toll-free number you requested earlier. Click Save.
  5. Return to the Lambda console and click on the AppointmentSchedulerOutboundContact function.
  6. Repeat steps 3 and 4, replacing the values for CONTACT_FLOW, INSTANCE_ID, and QUEUE_ID with the correct values. Click Save when done. (A scripted alternative using boto3 is sketched after this list.)
    1. To find the contact flow ID, navigate to the OutboundCall contact flow in the Amazon Connect console and click the arrow next to Show additional flow information. The contact flow ID is the last value after contact-flow/.
    2. To find the instance ID, navigate to the Amazon Connect console and click your instance alias. The instance ID is the last value in the instance ARN after instance/.
    3. To find the queue ID, navigate to OutboundQueue in the Amazon Connect console and click the arrow next to Show additional queue information. The queue ID is the last value after queue/.
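
If you prefer to script this step rather than use the Lambda console, the same update can be made with boto3, as in the following sketch (the placeholder values are the IDs located in the preceding sub-steps).

import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="AppointmentSchedulerOutboundContact",
    Environment={
        "Variables": {
            "CONTACT_FLOW": "<contact-flow-id>",  # last value after contact-flow/ in the flow ARN
            "INSTANCE_ID": "<instance-id>",       # last value after instance/ in the instance ARN
            "QUEUE_ID": "<queue-id>",             # last value after queue/ in the queue ARN
        }
    },
)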

The Amazon Lex bot and Amazon Connect instance are now ready to go. Next, we deploy the UI.

Edit the API Gateway route

  1. Go to the API Gateway console.
  2. Click the instance named Appointments.
  3. Under the Resources section, click the POST method belonging to the /outcall resource.
  4. Click Integration Request.
  5. Click the edit icon to the right of the Lambda Function field, and then click the check mark icon that appears to the right of the text field.
  6. Click OK to add a permission to the Lambda function.

Deploy the UI

  1. Configure the UI before deployment
    1. In your preferred code editor, open the ui folder from the downloaded code files.
    2. Replace <your-api-ID> and <region> with your API ID (accessible under the ID column of the API Gateway Console) and the region of your deployed resources in the following lines: 103, 168, 310, 397, 438, 453.
    3. Replace <your-instance-name> with your Amazon Connect instance name on lines 172 and 402.
    4. [Optional] Add an app logo in the index.js file (line 331) and in the index.html file (line 5).
    5. In a terminal, navigate to the ui folder of the downloaded project.
    6. Run npm install. This will take a few minutes to complete.
    7. Run npm run-script build. This will generate a build folder in the ui directory.
  2. Add the code files to the S3 bucket:
    1. Go to the S3 Console.
    2. Search for the bucket deployed with the CloudFormation Stack, appointment-scheduler-website-<random_id>.
    3. Drag and drop the contents of the build folder in the ui directory created in the last step into the bucket.
    4. Click Upload.

      You should now be able to access the application from the CloudFront Distribution.
  3. Add the CloudFront Distribution as an approved origin.
      1. Go to the Amazon Connect console.
      2. Select the Instance Alias of your instance.
      3. Choose Approved origins.
      4. Click + Add origin and enter the URL of your CloudFront Distribution.
      5. Click Add.
  4. Now navigate to your CloudFront Distribution URL plus index.html (for example, https://<DistributionDomainName>.cloudfront.net/index.html).

Clean up

Once finished with this solution, make sure to clean up your AWS environment so as not to incur unwanted charges.

  1. Go to the S3 console and empty the bucket created by the CloudFormation template (appointment-scheduler-website).
  2. Go to the CloudFormation console and delete your stack. Ensure that all resources associated with this stack are deleted successfully.
  3. Go to the Amazon Connect console and delete your instance.
  4. Go to the Amazon Lex console and delete the bot you created.

Conclusion

For this post, Accenture and AWS collaborated to develop a machine learning solution that highlights the use of AWS services to build an automated appointment scheduler. The solution demonstrates how straightforward it is to build an appointment scheduling application on AWS. Amazon Lex’s support for third-party messaging services such as Meta Messenger extends the potential reach of the solution across multiple channels. Customer notification via SMS is implemented with minimal effort using Amazon Pinpoint. With Amazon Connect, an outbound dialer is seamlessly integrated with the calendar view web application, enabling employees to connect to customers immediately with a simple click-to-call button.

You can accelerate innovation with the Accenture AWS Business Group (AABG). You can learn from the resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. The AABG helps customers ideate and innovate cloud solutions through rapid prototype development. Connect with our team at accentureaws@amazon.com to learn how to use machine learning in your products and services.


About the Authors

Sameer Goel is a Sr. Solutions Architect in the Netherlands, who drives customer success by building prototypes on cutting-edge initiatives. Prior to joining AWS, Sameer graduated with a master’s degree from Boston, with a concentration in data science. He enjoys building and experimenting with AI/ML projects on Raspberry Pi.

Nick Vargas is a Manager and Technology Architect at Accenture. He leads the project delivery for a rapid prototyping team within the Accenture AWS Business Group (AABG). He enjoys his morning walks with his dog Bingo, traveling, going to the beach, and hiking.

Anna Schreiber is part of a prototyping team within Accenture’s AWS Business Group (AABG). As a Senior AWS Developer, she has worked on several high-profile proof of concepts that help bring the client’s vision to life. When not working, she enjoys cooking, crafting, and playing fetch with her corgi Gimli.

Read More

How to scale machine learning inference for multi-tenant SaaS use cases

This post is co-written with Sowmya Manusani, Sr. Staff Machine Learning Engineer at Zendesk

Zendesk is a SaaS company that builds support, sales, and customer engagement software for everyone, with simplicity as the foundation. It thrives on helping over 170,000 companies worldwide serve their hundreds of millions of customers efficiently. The Machine Learning team at Zendesk is responsible for empowering Customer Experience teams to achieve their best. By combining the power of data and people, Zendesk delivers intelligent products that make their customers more productive by automating manual work.

Zendesk has been building ML products since 2015, including Answer Bot, Satisfaction Prediction, Content Cues, Suggested Macros, and many more. In the last few years, with the growth in deep learning, especially in NLP, they saw a lot of opportunity to automate workflows and assist agents in supporting their customers with Zendesk solutions. Zendesk currently uses TensorFlow and PyTorch to build deep learning models.

Customers like Zendesk have built successful, high-scale software as a service (SaaS) businesses on Amazon Web Services (AWS). A key driver for a successful SaaS business model is the ability to apply multi-tenancy in the application and infrastructure. This enables cost and operational efficiencies because the application only needs to be built once, but it can be used many times and the infrastructure can be shared. We see many customers build secure, cost-efficient, multi-tenant systems on AWS at all layers of the stack, from compute, storage, database, to networking, and now we’re seeing customers needing to apply it to machine learning (ML).

Making the difficult tradeoff between model reuse and hyper-personalization

Multi-tenancy for SaaS businesses typically means that a single application is reused between many users (SaaS customers). This creates cost efficiencies and lowers operational overhead. However, machine learning models sometimes need to be personalized to a high degree of specificity (hyper-personalized) to make accurate predictions. This means the “build once, use many times” SaaS paradigm can’t always be applied to ML if models have specificity. Take, for example, the use case of customer support platforms. The language that users include in a support ticket varies depending on whether it’s a ride-share issue (“ride took too long”) or a clothing purchase issue (“discoloration when washed”). In this use case, improving the accuracy of predicting the best remediation action may require training a natural language processing (NLP) model on a dataset specific to a business domain or an industry vertical. Zendesk faced exactly this challenge when trying to leverage ML in their solutions. They needed to create thousands of highly customized ML models, each tailored for a specific customer. To solve the challenge of deploying thousands of models cost-effectively, Zendesk turned to Amazon SageMaker.

In this post, we show how to use some of the newer features of Amazon SageMaker, a fully managed machine learning service, to build a multi-tenant ML inference capability. We also share a real-world example of how Zendesk successfully achieved the same outcome by deploying a happy medium between being able to support hyper-personalization in their ML models and the cost-efficient, shared use of infrastructure using SageMaker multi-model endpoints (MME).

SageMaker multi-model endpoints

SageMaker multi-model endpoints enable you to deploy multiple models behind a single inference endpoint that may contain one or more instances. Each instance is designed to load and serve multiple models up to its memory and CPU capacity. With this architecture, a SaaS business can break the linearly increasing cost of hosting multiple models and achieve reuse of infrastructure consistent with the multi-tenancy model applied elsewhere in the application stack.

The following diagram illustrates the architecture of a SageMaker multi-model endpoint.

The SageMaker multi-model endpoint dynamically loads models from Amazon Simple Storage Service (Amazon S3) when invoked, instead of downloading all the models when the endpoint is first created. As a result, an initial invocation to a model might see higher inference latency than the subsequent inferences, which are completed with low latency. If the model is already loaded on the container when invoked, then the download step is skipped and the model returns the inferences with low latency. For example, assume you have a model that is only used a few times a day. It is automatically loaded on demand, while frequently accessed models are retained in memory and invoked with consistently low latency.
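
For illustration, the following minimal sketch shows how a client invokes a multi-model endpoint with boto3. The endpoint name, model artifact key, and payload are placeholders; TargetModel selects which model archive under the endpoint’s S3 prefix is loaded (on first use) and invoked.

import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="suggested-macros-mme",        # hypothetical multi-model endpoint name
    TargetModel="customer-1234/model.tar.gz",   # per-customer model artifact under the MME S3 prefix
    ContentType="application/json",
    Body=json.dumps({"ticket_text": "My order arrived damaged"}),
)

prediction = json.loads(response["Body"].read())
print(prediction)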

Let’s take a closer look at how Zendesk used SageMaker MME to achieve cost-effective, hyper-scale ML deployment with their Suggested Macros ML feature.

Why Zendesk built hyper-personalized models

Zendesk’s customers are spread globally in different industry verticals with different support ticket semantics. Therefore, to serve their customers best, they often have to build personalized models that are trained on customer-specific support ticket data to correctly identify intent, macros, and more.

In October 2021, they released a new NLP ML feature, Suggested Macros, which recommends macros (predefined actions) based on thousands of customer-specific model predictions. Zendesk’s ML team built a TensorFlow-based NLP classifier model trained from the previous history of ticket content and macros per customer. With these models available, a macro prediction is recommended whenever an agent views the ticket (as shown in the following screenshot), which assists the agent in serving customers quickly. Because macros are specific to customers, Zendesk needs customer-specific models to serve accurate predictions.

Under the hood of Zendesk’s Suggested Macros

Suggested Macros models are NLP-based neural nets that are around 7–15 MB in size. The main challenge is to put thousands of these models in production with cost-efficient, reliable, and scalable solutions.

Each model has different traffic patterns, with a minimum of two requests per second and a peak of hundreds of requests per second, serving millions of predictions per day with a model latency of approximately 100 milliseconds when the model is available in memory. SageMaker endpoints are deployed in multiple AWS Regions, serving thousands of requests per minute per endpoint.

With its ability to host multiple models on a single endpoint, SageMaker helped Zendesk reduce deployment overhead and create a cost-effective solution when compared to deploying a single-model endpoint per customer. The tradeoff here is less control on per-model management; however, this is an area where Zendesk is collaborating with AWS to improve multi-model endpoints.

One of the SageMaker multi-model features is lazy loading of models, that is, models are loaded into memory when invoked for the first time. This is to optimize memory utilization; however, it causes response time spikes on first load, which can be seen as a cold start problem. For Suggested Macros, this was a challenge; however, Zendesk overcame this by implementing a preloading functionality on top of the SageMaker endpoint provisioning to load the models into memory before serving production traffic. Secondly, MME unloads infrequently used models from memory, so to achieve consistent low latency on all the models and avoid “noisy neighbors” impacting other less active models, Zendesk is collaborating with AWS to add new features, discussed later in the post, to enable more explicit per-model management. Additionally, as an interim solution, Zendesk has right-sized the MME fleet to minimize too many models unloading. With this, Zendesk is able to serve predictions to all their customers with low latency, around 100 milliseconds, and still achieve 90% cost savings when compared to dedicated endpoints.

On right-sizing MME, Zendesk observed during load testing that having a higher number of smaller instances (bias on horizontal scaling) behind MME was a better choice than having fewer larger memory instances (vertical scaling). Zendesk observed that bin packing too many models (beyond 500 TensorFlow models in their case) on a single large memory instance didn’t work well because memory is not the only resource on an instance that can be a bottleneck. More specifically, they observed that TensorFlow spawned multiple threads (3 x total instance vCPUs) per model, so loading over 500 models on a single instance caused kernel level limits to be breached on the max number of threads that could be spawned on an instance. Another issue with using fewer, larger instances occurred when Zendesk experienced throttling (as a safety mechanism) on some instances behind MME because the unique model invocation per second rate exceeded what the Multi Model Server (MMS) on a single instance could safely handle without browning out the instance. This was another issue that was resolved with the use of more and smaller instances.

From the observability perspective, which is a crucial component of any production application, Amazon CloudWatch metrics like invocations, CPU utilization, memory utilization, and multi-model-specific metrics like loaded models in memory, model loading time, model load wait time, and model cache hit are informative. Specifically, the breakdown of model latency helped Zendesk understand the cold start problem and its impact.

Under the hood of MME auto scaling

Behind each multi-model endpoint, there are model hosting instances, as depicted in the following diagram. These instances load and evict multiple models to and from memory based on the traffic patterns to the models.

SageMaker continues to route inference requests for a model to the instance where the model is already loaded such that the requests are served from cached model copy (see the following diagram, which shows the request path for the first prediction request vs. the cached prediction request path). However, if the model receives many invocation requests, and there are additional instances for the multi-model endpoint, SageMaker routes some requests to another instance to accommodate the increase. To take advantage of automated model scaling in SageMaker, make sure you have instance auto scaling set up to provision additional instance capacity. Set up your endpoint-level scaling policy with either custom parameters or invocations per minute (recommended) to add more instances to the endpoint fleet.
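
As a rough illustration, the following sketch registers an endpoint variant with Application Auto Scaling and attaches a target tracking policy on the invocations-per-instance metric. The endpoint and variant names, capacities, and target value are placeholders to adapt to your own traffic.

import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/suggested-macros-mme/variant/AllTraffic"  # hypothetical names

# Allow SageMaker to scale the variant between 2 and 10 instances
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Scale out when average invocations per instance per minute exceed the target
autoscaling.put_scaling_policy(
    PolicyName="mme-invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)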

Use cases best suited for MME

SageMaker multi-model endpoints are well suited for hosting a large number of similar models that you can serve through a shared serving container and don’t need to access all the models at the same time. MME is best suited for models that are similar in size and invocation latencies. Some variation in model size is acceptable; for example, Zendesk’s models range from 10–50 MB, which works fine, but variations in size that are a factor of 10, 50, or 100 times greater aren’t suitable. Larger models may cause a higher number of loads and unloads of smaller models to accommodate sufficient memory space, which can result in added latency on the endpoint. Differences in performance characteristics of larger models could also consume resources like CPU unevenly, which could impact other models on the instance.

MME is also designed for co-hosting models that use the same ML framework because they use the shared container to load multiple models. Therefore, if you have a mix of ML frameworks in your model fleet (such as PyTorch and TensorFlow), SageMaker dedicated endpoints or multi-container hosting is a better choice. Finally, MME is suited for applications that can tolerate an occasional cold start latency penalty because infrequently used models can be off-loaded in favor of frequently invoked models. If you have a long tail of infrequently accessed models, a multi-model endpoint can efficiently serve this traffic and enable significant cost savings.

Summary

In this post, you learned how SaaS and multi-tenancy relate to ML and how SageMaker multi-model endpoints enable multi-tenancy and cost-efficiency for ML inference. You learned about Zendesk’s multi-tenanted use case of per-customer ML models and how they hosted thousands of ML models in SageMaker MME for their Suggested Macros feature and achieved 90% cost savings on inference when compared to dedicated endpoints. Hyper-personalization use cases can require thousands of ML models, and MME is a cost-effective choice for this use case. We will continue to make enhancements in MME to enable you to host models with low latency and with more granular controls for each personalized model. To get started with MME, see Host multiple models in one container behind one endpoint.


About the Authors

Syed Jaffry is a Sr. Solutions Architect with AWS. He works with a range of companies from mid-sized organizations to large enterprises, financial services to ISVs, in helping them build and operate secure, resilient, scalable, and high performance applications in the cloud.

Sowmya Manusani is a Senior Staff Machine Learning Engineer at Zendesk. She works on productionalizing NLP-based Machine Learning features that focus on improving Agent productivity for thousands of Zendesk Enterprise customers. She has experience with building automated training pipelines for thousands of personalized models and serving them using secure, resilient, scalable, and high-performance applications. In her free time, she likes to solve puzzles and try painting.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and making machine learning more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

Deepti Ragha is a Software Development Engineer in the Amazon SageMaker team. Her current work focuses on building features to host machine learning models.

Read More

How Mantium achieves low-latency GPT-J inference with DeepSpeed on Amazon SageMaker

Mantium is a global cloud platform provider for building AI applications and managing them at scale. Mantium’s end-to-end development platform enables enterprises and businesses of all sizes to build AI applications and automation faster and easier than what has been traditionally possible. With Mantium, technical and non-technical teams can prototype, develop, test, and deploy AI applications, all with a low-code approach. Through automatic logging, monitoring, and safety features, Mantium also releases software and DevOps engineers from spending their time reinventing the wheel. At a high level, Mantium delivers:

  • State-of-the-art AI – Experiment and develop with an extensive selection of open-source and private large language models with a simple UI or API.
  • AI process automation – Easily build AI-driven applications with a growing library of integrations and Mantium’s graphical AI Builder.
  • Rapid deployment – Shorten the production timeline from months to weeks or even days with one-click deployment. This feature turns AI applications into shareable web apps with one click.
  • Safety and regulation – Ensure safety and compliance with governance policies and support for human-in-the-loop processes.

With the Mantium AI Builder, you can develop sophisticated workflows that integrate external APIs, logic operations, and AI models. The following screenshot shows an example of the Mantium AI app, which chains together a Twilio input, governance policy, AI block (which can rely on an open-source model like GPT-J) and Twilio output.

To support this app, Mantium provides comprehensive and uniform access to not only model APIs from AI providers like Open AI, Co:here, and AI21, but also state-of-the-art open source models. At Mantium, we believe that anyone should be able to build modern AI applications that they own, end-to-end, and we support this by providing no-code and low-code access to performance-optimized open-source models.

For example, one of Mantium’s core open-source models is GPT-J, a state-of-the-art natural language processing (NLP) model developed by EleutherAI. With 6 billion parameters, GPT-J is one of the largest and best-performing open-source text generation models. Mantium users can integrate GPT-J into their AI applications via Mantium’s AI Builder. In the case of GPT-J, this involves specifying a prompt (a natural language representation of what the model should do) and configuring some optional parameters.

For example, the following screenshot shows an abbreviated demonstration of a sentiment analysis prompt that produces explanations and sentiment predictions. In this example, the author wrote that the “food was wonderful” and that their “service was extraordinary.” Therefore, this text expresses positive sentiment.

However, one challenge with open-source models is that they’re rarely designed for production-grade performance. In the case of large models like GPT-J, this can make production deployment impractical and even infeasible, depending on the use case.

To ensure that our users have access to best-in-class performance, we’re always looking for ways to decrease the latency of our core models. In this post, we describe the results of an inference optimization experiment in which we use DeepSpeed’s inference engine to increase GPT-J’s inference speed by approximately 116%. We also describe how we have deployed the Hugging Face Transformers implementation of GPT-J with DeepSpeed in our Amazon SageMaker inference endpoints.

Overview of the GPT-J model

GPT-J is a generative pretrained (GPT) language model and, in terms of its architecture, it’s comparable to popular, private, large language models like Open AI’s GPT-3. As noted earlier, it consists of approximately 6 billion parameters and 28 layers, each of which consists of a feedforward block and a self-attention block. When it was first released, GPT-J was one of the first large language models to use rotary embeddings, a new position encoding strategy that unifies absolute and relative position encoders. It also employs an innovative parallelization strategy in which the attention and feedforward computations are combined in a single layer, which minimizes communication overhead.

Although GPT-J might not quite qualify as large by today’s standards—large models typically consist of more than 100 billion parameters—it’s still impressively performant, and with some prompt engineering or minimal fine-tuning, you can use it to solve many problems. Furthermore, its relatively modest size means that you can deploy it more rapidly and at a much lower cost than larger models.

That said, GPT-J is still pretty big. For example, training GPT-J in FP32 with full weight updates and the Adam optimizer requires over 200 GB memory: 24 GB for the model parameters, 24 GB for the gradients, 24 GB for Adam’s squared gradients, 24 GB for the optimizer states, and the additional memory requirements for loading training batches and storing activations. Of course, training in FP16 reduces these memory requirements almost by half, but a memory footprint of over 100 GB still necessitates innovative training strategies. For instance, in collaboration with SageMaker, Mantium’s NLP team developed a workflow for training (fine-tuning) GPT-J using the SageMaker distributed model parallel library.

In contrast, serving GPT-J for inference has much lower memory requirements—in FP16, model weights occupy less than 13 GB, which means that inference can easily be conducted on a single 16 GB GPU. However, inference with out-of-the-box implementations of GPT-J, such as the Hugging Face Transformers implementation that we use, is relatively slow. To support use cases that require highly responsive text-generation, we’ve focused on reducing GPT-J’s inference latency.
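
As a quick back-of-the-envelope check of those figures (using an approximate parameter count and decimal gigabytes):

# Rough memory arithmetic for GPT-J (~6 billion parameters)
params = 6.05e9

fp32_copy_gb = params * 4 / 1e9      # ~24 GB per full-precision copy
                                     # (weights, gradients, and each optimizer state)
fp16_weights_gb = params * 2 / 1e9   # ~12 GB, so FP16 inference fits on a 16 GB GPU

print(f"FP32 copy: {fp32_copy_gb:.1f} GB, FP16 weights: {fp16_weights_gb:.1f} GB")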

Response latency challenges of GPT-J

Response latency is a core obstacle for the generative pretrained transformers (GPTs) such as GPT-J that power modern text generation. GPT models generate text through sequences of inference steps. At each inference step, the model is given text as input, and, conditional on this input, it samples a word from its vocabulary to append to the text. For example, given the sequence of tokens “I need an umbrella because it’s,” a high-likelihood next token might be “raining.” However, it could also be “sunny” or “bound,” which could be the first step toward a text sequence like “I need an umbrella because it’s bound to start raining.”
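
The following sketch makes this loop explicit with the Hugging Face Transformers implementation of GPT-J (in practice, the library's generate() method wraps a loop like this). It is illustrative only and assumes a GPU with enough memory for the FP16 weights.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).to("cuda").eval()

input_ids = tokenizer("I need an umbrella because it's", return_tensors="pt").input_ids.to("cuda")

with torch.no_grad():
    for _ in range(20):  # generating 20 tokens requires 20 full inference steps
        logits = model(input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice for simplicity
        input_ids = torch.cat([input_ids, next_token], dim=-1)

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))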

Scenarios like this raise some interesting challenges for deploying GPT models because real-world use cases might involve tens, hundreds, or even thousands of inference steps. For example, generating a 1,000-token response requires 1,000 inference steps! Accordingly, although a model might offer inference speeds that seem fast enough in isolation, it’s easy for latency to reach untenable levels when long texts are generated. We observed an average latency of 280 milliseconds per inference step on a V100 GPU. This might seem fast for a 6.7 billion parameter model, but with such latencies, it takes approximately 30 seconds to generate a 500-token response, which isn’t ideal from a user experience perspective.

Optimizing inference speeds with DeepSpeed Inference

DeepSpeed is an open-source deep-learning optimization library developed by Microsoft. Although it primarily focuses on optimizing the training of large models, DeepSpeed also provides an inference optimization framework that supports a select set of models, including BERT, Megatron, GPT-Neo, GPT2, and GPT-J. DeepSpeed Inference facilitates high-performance inference with large Transformer-based architectures through a combination of model parallelism, inference-optimized CUDA kernels, and quantization.

To boost inference speed with GPT-J, we use DeepSpeed’s inference engine to inject optimized CUDA kernels into the Hugging Face Transformers GPT-J implementation.
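
Conceptually, the integration looks like the following sketch; exact argument names can vary across DeepSpeed versions, so treat it as illustrative rather than as our production code.

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)

# replace_with_kernel_inject swaps supported modules for DeepSpeed's fused CUDA kernels
ds_engine = deepspeed.init_inference(
    model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module  # the optimized model, placed on the GPU by DeepSpeed

input_ids = tokenizer("I need an umbrella because it's", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))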

To evaluate the speed benefits of DeepSpeed’s inference engine, we conducted a series of latency tests in which we timed GPT-J under various configurations. Specifically, we varied whether or not DeepSpeed was used, hardware, output sequence length, and input sequence length. We focused on both output and input sequence length, because they both affect inference speed. To generate an output sequence of 50 tokens, the model must perform 50 inference steps. Furthermore, the time required to perform an inference step depends on the size of the input sequence—larger inputs require more processing time. Although the effect of output sequence size is much larger than the effect of input sequence size, it’s still necessary to account for both factors.

In our experiment, we used the following design:

  • DeepSpeed inference engine – On, off
  • Hardware – T4 (ml.g4dn.2xlarge), V100 (ml.p3.2xlarge)
  • Input sequence length – 50, 200, 500, 1000
  • Output sequence length – 50, 100, 150, 200

In total, this design has 64 combinations of these four factors, and for each combination, we ran 20 latency tests. Each test was run on a pre-initialized SageMaker inference endpoint, ensuring that our latency tests reflect production times, including API exchanges and preprocessing.

Our tests demonstrate that DeepSpeed’s GPT-J inference engine is substantially faster than the baseline Hugging Face Transformers PyTorch implementation. The following figure illustrates the mean text generation latencies for GPT-J with and without DeepSpeed acceleration on ml.g4dn.2xlarge and ml.p3.2xlarge SageMaker inference endpoints.

On the ml.g4dn.2xlarge instance, which is equipped with a 16 GB NVIDIA T4 GPU, we observed a mean latency reduction of approximately 24% [Standard Deviation (SD) = 0.05]. This corresponded to an increase from a mean 12.5 (SD = 0.91) tokens per second to a mean 16.5 (SD = 2.13) tokens per second. Notably, DeepSpeed’s acceleration effect was even stronger on the ml.p3.2xlarge instance, which is equipped with an NVIDIA V100 GPU. On that hardware, we observed a 53% (SD = .07) mean latency reduction. In terms of tokens per second, this corresponded to an increase from a mean 21.9 (SD = 1.97) tokens per second to a mean 47.5 (SD = 5.8) tokens per second.

We also observed that the acceleration offered by DeepSpeed attenuated slightly on both hardware configurations as the size of the input sequences grew. However, across all conditions, inference with DeepSpeed’s GPT-J optimizations was still substantially faster than the baseline. For example, on the g4dn instance, the maximum and minimum latency reductions were 31% (input sequence size = 50) and 15% (input sequence size = 1000), respectively. And on the p3 instance, the maximum and minimum latency reductions were 62% (input sequence size = 50) and 40% (input sequence size = 1000), respectively.

Deploying GPT-J with DeepSpeed on a SageMaker inference endpoint

In addition to dramatically increasing text generation speeds for GPT-J, DeepSpeed’s inference engine is simple to integrate into a SageMaker inference endpoint. Before adding DeepSpeed to our inference stack, our endpoints were running on a custom Docker image based on an official PyTorch image. SageMaker makes it very easy to deploy custom inference endpoints, and integrating DeepSpeed was as simple as including the dependency and writing a few lines of code. An open-source guide to the workflow for deploying GPT-J with DeepSpeed is available on GitHub.
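
For readers unfamiliar with custom-container deployment, the general shape of such a deployment with the SageMaker Python SDK looks something like the following sketch. The image URI, model artifact location, and endpoint name are placeholders, and Mantium's actual workflow (linked above) may differ.

import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()  # assumes execution inside a SageMaker environment

model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/gptj-deepspeed:latest",  # custom image with DeepSpeed installed
    model_data="s3://<bucket>/gptj/model.tar.gz",
    role=role,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    endpoint_name="gptj-deepspeed-endpoint",
)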

Conclusion

Mantium is dedicated to leading innovation so that everyone can quickly build with AI. From AI-driven process automation to stringent safety and compliance settings, our complete platform provides all the tools necessary to develop and manage robust, responsible AI applications at scale and lowers the barrier to entry. SageMaker helps companies like Mantium get to market quickly.

To learn how Mantium can help you build complex AI-driven workflows for your organization, visit www.mantiumai.com.


About the authors

Joe Hoover is a Senior Applied Scientist on Mantium’s AI R&D team. He is passionate about developing models, methods, and infrastructure that help people solve real-world problems with cutting-edge NLP systems. In his spare time, he enjoys backpacking, gardening, cooking, and hanging out with his family.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Sunil Padmanabhan is a Startup Solutions Architect at AWS. As a former startup founder and CTO, he is passionate about machine learning and focuses on helping startups leverage AI/ML for their business outcomes and design and deploy ML/AI solutions at scale.

Read More