A version of the BERT language model that’s 20 times as fast

December 3, 2020

by admin Amazon AWS

Determining the optimal architectural parameters reduces network size by 84% while improving performance on natural-language-understanding tasks.

Performing simulations at scale with Amazon SageMaker Processing and R on RStudio

December 2, 2020

by Michael Hsieh Amazon AWS

Statistical analysis and simulation are prevalent techniques employed in various fields, such as healthcare, life science, and financial services. The open-source statistical language R and its rich ecosystem with more than 16,000 packages has been a top choice for statisticians, quant analysts, data scientists, and machine learning (ML) engineers. RStudio is an integrated development environment (IDE) designed for data science and statistics for R users. However, an RStudio IDE hosted on a single machine used for day-to-day interactive statistical analysis isn’t suited for large-scale simulations that can require scores of GB of RAM (or more). This is especially difficult for scientists wanting to run analyses locally on a laptop or a team of statisticians developing in RStudio on one single instance.

In this post, we show you a solution that allows you to offload a resource-intensive Monte Carlo simulation to more powerful machines, while still being able to develop your scripts in your RStudio IDE. This solution takes advantage of Amazon SageMaker Processing.

Amazon SageMaker and SageMaker Processing

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality ML artifacts. Running workloads on SageMaker is easy. When you’re ready to fit a model in SageMaker, simply specify the location of your data in Amazon Simple Storage Service (Amazon S3) and indicate the type and quantity of SageMaker ML instances you need. SageMaker sets up a distributed compute cluster, performs the training, outputs the result to Amazon S3, and tears down the cluster when complete.

SageMaker Processing allows you to quickly and easily perform preprocessing and postprocessing on data using your own scripts or ML models. This use case fits the pattern of many R and RStudio users, who frequently perform custom statistical analysis using their own code. SageMaker Processing uses the AWS Cloud infrastructure to decouple the development of the R script from its deployment, and gives you flexibility to choose the instance it’s deployed on. You’re no longer limited to the RAM and disk space limitations of the machine you develop on; you can deploy the simulation on a larger instance of your choice.

Another major advantage of using SageMaker Processing is that you’re only billed (per second) for the time and resources you use. When your script is done running, the resources are shut down and you’re no longer billed beyond that time.

Statisticians and data scientists using the R language can access SageMaker features and the capability to scale their workload via the Reticulate library, which provides an R interface to the SageMaker Python SDK library. Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability. The Reticulate package provides an R interface to make API calls to SageMaker with the SageMaker Python SDK. We use Reticulate to interface SageMaker Python SDK in this post.

Alternatively, you can access SageMaker and other AWS services via Paws. Paws isn’t an official AWS SDK, but it covers most of the same functionality as the official SDKs for other languages. For more information about accessing AWS resources using Paws, see Getting started with R on Amazon Web Services.

In this post, we demonstrate how to run non-distributed, native R programs with SageMaker Processing. If you have distributed computing use cases using Spark and SparkR within RStudio, you can use Amazon EMR to power up your RStudio instance. To learn more, see the following posts:

Use case

In many use cases, more powerful compute instances are desired for developers conducting analyses on RStudio. For this post, we consider the following use case: the statisticians in your team have developed a Monte Carlo simulation in the RStudio IDE. The program requires some R libraries and it runs smoothly with a small number of iterations and computations. The statisticians are cautious about running a full simulation because RStudio is running on an Amazon Elastic Compute Cloud (Amazon EC2) instance shared by 10 other statisticians on the same team. You’re all running R analyses at a certain scale, which makes the instance very busy most of the time. If anyone starts a full-scale simulation, it slows everyone’s RStudio session and possibly freezes the RStudio instance.

Even for a single user, running a large-scale simulation on a small- or a mid-sized EC2 instance is a problem that this solution can solve.

To walk through the solution for this use case, we designed a Monte Carlo simulation: given an area of certain width and length and a certain number of people, the simulation randomly places the people in the area and calculates the number of social distancing violations; each time a person is within 6 units of another, two violations are counted (because each person is violating the social distance rules). The simulation then calculates the average violation per person. For example, if there are 10 people and two violations, the average is 0.2. How many violations occur is also a function of how the people in the area are positioned. People can be bunched together, causing many violations, or spread out, causing fewer violations. The simulation performs many iterations of this experiment, randomly placing people in the area for each iteration (this is the characteristic that makes it a Monte Carlo simulation).
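As a rough illustration of the experiment described above, the following R sketch places people uniformly at random and counts violations. It is not the repository's Social_Distancing_Simulations.R script; the function names, the uniform placement, and the reading of "within 6 units" as a Euclidean distance of at most 6 are assumptions made for this example.

```r
# Minimal sketch of the social distancing experiment (not the repository script).
# Assumptions: people are placed uniformly at random, and "within 6 units"
# means a Euclidean distance of at most 6.

# Count violations for one placement. Each close pair contributes two
# violations because the full (symmetric) distance matrix is scanned.
count_violations <- function(x, y, min_distance = 6) {
  distances <- as.matrix(dist(cbind(x, y)))
  diag(distances) <- Inf  # a person never violates the rule with themselves
  sum(distances <= min_distance)
}

# Average violations per person over many random placements.
simulate_mean_violations <- function(x_length, y_length, num_people, max_iterations) {
  per_iteration <- vapply(seq_len(max_iterations), function(i) {
    x <- runif(num_people, min = 0, max = x_length)
    y <- runif(num_people, min = 0, max = y_length)
    count_violations(x, y) / num_people
  }, numeric(1))
  mean(per_iteration)
}

# Small demonstration run; the parameters here are arbitrary.
set.seed(42)
simulate_mean_violations(x_length = 100, y_length = 100,
                         num_people = 50, max_iterations = 20)
```

Under the uniform-placement assumption, a back-of-the-envelope estimate that ignores edge effects is (num_people - 1) * pi * 6^2 / (x_length * y_length) violations per person. For 1,000 people in a 1,000 by 1,000 area, that is roughly 0.11.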

Solution overview

With a couple of lines of R code and a Docker container that encapsulates the runtime dependencies, you can dispatch the simulation program to the fully managed SageMaker compute infrastructure with the desired compute resources at scale. You can interactively submit the simulation script from within RStudio hosted on an EC2 instance to SageMaker Processing with a user-defined Docker container hosted on Amazon Elastic Container Registry (Amazon ECR) and data located on Amazon S3 (we discuss Docker container basics in the Building an R container in RStudio IDE and hosting it in Amazon ECR section). SageMaker Processing takes care of provisioning the infrastructure, running the simulation, reading and saving the data to Amazon S3, and tearing down the compute, without any manual attention to the infrastructure.

The following diagram illustrates this solution architecture.

Deploying the resources

We first deploy an RStudio Server on an EC2 instance inside a VPC using an AWS CloudFormation template, which is largely based on the post Using R with Amazon SageMaker with some modifications. In addition to the RStudio Server, we install the Docker engine, SageMaker Python SDK, and Reticulate as part of the deployment. To deploy your resources, complete the following steps:

Download the CloudFormation template
On the AWS CloudFormation console, choose Template is ready.
Choose Upload a template file.
Choose Choose file.
Upload the provided ec2_ubuntu_rstudio_sagemaker.yaml template.

The template is designed to work in the following Regions:

us-east-1
us-east-2
us-west-2
eu-west-1

In the YAML file, you can change the instance type to a different instance. For this workload, we recommend an instance no smaller than a t3.xlarge for running RStudio smoothly.

Choose Next.

For Stack name, enter a name for your stack.
For AcceptRStudioLicenseAndInstall, review and accept the AGPL v3 license for installing RStudio Server on Amazon EC2.
For KeyName, enter an Amazon EC2 key pair that you have previously generated to access an Amazon EC2 instance.

For instructions on creating a key pair, see Amazon EC2 key pairs and Linux instances.

Choose Next.
In the Configure stack options section, keep every option at its default value.
Choose Next.
Review the stack details and choose Create stack.

Stack creation takes about 15-20 minutes to complete.

When stack creation is complete, go to the stack’s Outputs tab on the AWS CloudFormation console to find the RStudio IDE login URL: ec2-xx-xxx-xxx-xxx.us-west-2.compute.amazonaws.com:8787.
Copy the URL and enter it into your preferred browser.

You should then see the RStudio sign-in page, as in the following screenshot.

This setup is for demonstration purposes. Hosting RStudio on a public-facing EC2 instance with simple login credentials is not a security best practice.

You now clone the code repository via the command-line terminal in the RStudio IDE.

Switch to the Terminal tab and run the following command:

git clone https://github.com/aws-samples/amazon-sagemaker-statistical-simulation-rstudio.git

This repository contains the relevant scripts needed to run the simulations and the files to create our Docker container.

Running small simulations locally

On the R console (Console tab), enter the following code to set the working directory to the correct location and install some dependencies:

setwd("~/amazon-sagemaker-statistical-simulation-rstudio/Submit_SageMaker_Processing_Job/")
install.packages(c('doParallel'))

For illustration purposes, we run small simulations on the development machine (the EC2 instance that RStudio is installed on). You can also find the following code in the script Local_Simulation_Commands.R.

On the R Console, we run a very small simulation with 10 iterations:

# takes about: 5.5 seconds
max_iterations <- 10
x_length <- 1000
y_length <- 1000
num_people <- 1000

local_simulation <- 1 # we are running the simulation locally on 1 vCPU
cmd = paste('Rscript Social_Distancing_Simulations.R --args',paste(x_length),paste(y_length), paste(num_people),paste(max_iterations),paste(local_simulation), sep = ' ')
result = system(cmd)

The result is a mean number of 0.11 violations per person, and the time it took to calculate this result was about 5.5 seconds on a t3.xlarge (the precise number of violations per person and time it takes to perform the simulation may vary).
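The command above passes five values to the script after a literal --args marker. The following sketch shows one way a script could read them; the helper name parse_simulation_args and the argument order are assumptions based on the command shown, not the repository's actual parsing code. The helper drops a literal --args entry if one appears among the trailing arguments, so it works whether or not the marker is passed through.

```r
# Hypothetical helper: turn the trailing command-line arguments into named values.
# Assumption: the five values arrive in the same order used in the command above.
parse_simulation_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  args <- args[args != "--args"]  # drop a literal "--args" marker if present
  if (length(args) != 5) {
    stop("expected 5 arguments: x_length y_length num_people max_iterations local_simulation")
  }
  values <- suppressWarnings(as.numeric(args))
  if (anyNA(values)) {
    stop("all arguments must be numeric")
  }
  list(
    x_length = values[1],
    y_length = values[2],
    num_people = values[3],
    max_iterations = values[4],
    local_simulation = values[5] == 1  # 1 means run locally on a single vCPU
  )
}

# Example with the values used in the small local run.
params <- parse_simulation_args(c("--args", "1000", "1000", "1000", "10", "1"))
str(params)
```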

You can experiment with running this simulation with different numbers of iterations. Every tenfold increase in iterations corresponds to approximately a tenfold increase in the time needed for the simulation. To test this, we ran this simulation with 10,000 iterations, and it finished after 1 hour and 33 minutes. Clearly, a better approach is needed for developers. (If you’re interested in running these, you can find the code in Local_Simulation_Commands.R.)

Building an R container in RStudio IDE and hosting it in Amazon ECR

SageMaker Processing runs your R scripts with a Docker container in a remote compute infrastructure. In this section, we provide an introduction to Docker, how to build a Docker container image, and how to host it in AWS to use in SageMaker Processing.

Docker is a software platform that allows you to build once and deploy applications quickly into any compute environment. Docker packages software into standardized units called containers that have everything the software needs to run, including libraries, system tools, code, and runtime. Docker containers provide isolation and portability for your workload.

A Docker image is a read-only template that defines your container. The image contains the code to run, including any libraries and dependencies your code needs. Docker builds images by reading the instructions from a Dockerfile, which is a text document that contains all the commands you can call on the command line to assemble an image. You can build your Docker images from scratch or base them on other Docker images that you or others have built.

Images are stored in repositories that are indexed and maintained by registries. An image can be pushed into or pulled out of a repository using its registry address, which is similar to a URL. AWS provides Amazon ECR, a fully managed Docker container registry that makes it easy to store, manage, and deploy Docker container images.

Suppose that the Social_Distancing_Simulations.R script was originally developed with R 3.4.1 Single Candle from a version of RStudio on Ubuntu 16.04. The program uses the library doParallel for parallelism purposes. We want to run the simulation using a remote compute cluster exactly as developed. We need to either install all the dependencies on the remote cluster, which can be difficult to scale, or build a Docker image that has all the dependencies installed in the layers and run it anywhere as a container.

In this section, we create a Docker image that contains an R interpreter and the libraries the simulation depends on, and push the image to Amazon ECR. With this image, we can run our R script the same way on any machine that has a Docker engine, whether or not a compatible version of R or the required R packages is installed on the host system. The following Dockerfile describes the runtime requirements and how the container should be executed.

#### Dockerfile
FROM ubuntu:16.04

RUN apt-get -y update && apt-get install -y --no-install-recommends \
    wget \
    r-base \
    r-base-dev \
    apt-transport-https \
    ca-certificates

RUN R -e "install.packages(c('doParallel'), repos='https://cloud.r-project.org')"

ENTRYPOINT ["/usr/bin/Rscript"]

Each line is an instruction to create a layer for the image:

FROM creates a layer from the ubuntu:16.04 Docker image
RUN runs shell command lines to create a new layer
ENTRYPOINT allows you to configure how a container can be run as an executable

The Dockerfile describes which dependencies (Ubuntu 16.04, r-base, and doParallel) to include in the container image.

Next, we need to build a Docker image from the Dockerfile, create an ECR repository, and push the image to the repository for later use. The provided shell script build_and_push_docker.sh performs all these actions. In this section, we walk through the steps in the script.

Execute the main script build_and_push_docker.sh that we prepared for you in the terminal:

cd /home/ubuntu/amazon-sagemaker-statistical-simulation-rstudio/
sh build_and_push_docker.sh r_simulation_container v1

The shell script takes two input arguments: a name for the container image and repository, followed by a tag name. You can replace the name r_simulation_container with something else if you want. The tag v1 identifies the version of the container image. You can change that as well. If you do so, remember to change the corresponding repository and image name later.

If all goes well, you should see lots of actions and output indicating that Docker is building and pushing the layers to the repository, followed by a message like the following:

v1: digest: sha256:91adaeb03ddc650069ba8331243936040c09e142ee3cd360b7880bf0779700b1 size: 1573

You may receive warnings regarding storage of credentials. These warnings don’t interfere with pushing the container to ECR, but can be fixed. For more information, see Credentials store.

In the script, the docker build command builds the image and its layers following the instructions in the Dockerfile:

#### In build_and_push_docker.sh
docker build -t $image_name .

The following commands interact with Amazon ECR to create a repository:

#### In build_and_push_docker.sh
# Get the AWS account ID
account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

# Define the full image name on Amazon ECR
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${image_name}:${tag}"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${image_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${image_name}" > /dev/null
fi

# Get a login password from ECR and pipe it to docker login
aws ecr get-login-password --region ${region} \
  | docker login \
      --username AWS \
      --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com

Finally, the script tags the image and pushes it to the ECR repository:

#### In build_and_push_docker.sh
# Tag and push the local image to Amazon ECR
docker tag ${image_name} ${fullname}
docker push ${fullname}

At this point, we have created a container and pushed it to a repository in Amazon ECR. We can confirm it exists on the Amazon ECR console.

Copy and save the URI for the image; we need it in a later step.
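The image URI follows the same pattern as the fullname variable in build_and_push_docker.sh. If you prefer to assemble it in R rather than copy it from the console, a small helper such as the following could do it. The function name build_image_uri is hypothetical, and the account ID below is a placeholder, not a real account.

```r
# Hypothetical helper that mirrors the fullname variable in build_and_push_docker.sh:
# <account>.dkr.ecr.<region>.amazonaws.com/<image_name>:<tag>
build_image_uri <- function(account_id, region, image_name, tag) {
  sprintf("%s.dkr.ecr.%s.amazonaws.com/%s:%s", account_id, region, image_name, tag)
}

# "123456789012" is a placeholder account ID; substitute your own.
container <- build_image_uri("123456789012", "us-west-2", "r_simulation_container", "v1")
print(container)
```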

We can use this image repeatedly to run any R scripts that use doParallel. If you have other dependencies, whether native R packages that can be downloaded and installed from CRAN (the Comprehensive R Archive Network) with install.packages() or packages that have other runtime dependencies, you can add them to the image in the same way. For instance, RStan, a probabilistic package that implements full Bayesian statistical inference via Markov Chain Monte Carlo and depends on Stan and C++, can be installed into a Docker image by translating its installation instructions into a Dockerfile.

Modifying your R script for SageMaker Processing

Next, we need to modify the existing simulation script so it can talk to the resources available to the running container in the SageMaker Processing compute infrastructure. The resources we need to make the script aware of are typically input and output data from S3 buckets. The SageMaker Processing API allows you to specify where the input data is and how it should be mapped to the container so you can access it programmatically in the script.

For example, in the following diagram, if you specify the input data s3://bucket/path/to/input_data to be mapped to /opt/ml/processing/input, you can access your input data within the script and container in /opt/ml/processing/input. SageMaker Processing manages the data transfer between the S3 buckets and the container. Similarly, for output, if you need to persist any artifacts, you can save them to /opt/ml/processing/output within the script. The files are then available in s3://bucket/path/to/output_data.

The only change for the Social_Distancing_Simulations.R script is where the output file gets written to. Instead of a file path on the local EC2 instance, we change it to write to /opt/ml/processing/output/output_result_mean.txt.
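To make the change concrete, the following sketch writes a result to the output directory with the directory exposed as a parameter, so the same code can be tried locally against a temporary folder. The helper name write_simulation_mean is hypothetical and this is not the repository's code; only the default directory and the file name come from the description above.

```r
# Hypothetical helper: write the simulation mean where SageMaker Processing
# expects output artifacts. The default path is the one described above.
write_simulation_mean <- function(mean_value,
                                  output_dir = "/opt/ml/processing/output") {
  # Create the directory if it is missing (useful when testing outside the container).
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(output_dir, "output_result_mean.txt")
  writeLines(format(mean_value), out_file)
  invisible(out_file)
}

# Local trial run against a temporary directory instead of /opt/ml/processing/output.
demo_file <- write_simulation_mean(0.11, output_dir = tempdir())
readLines(demo_file)
```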

Submitting your R script to SageMaker Processing

Very large simulations may be slow on a local machine. As we saw earlier, doing 10,000 iterations of the social distancing simulation takes about 1 hour and 33 minutes on the local machine using 1 vCPU. Now we’re ready to run the simulation with SageMaker Processing.

With SageMaker Processing, we can use the remote compute infrastructure to run the simulation and free up the local compute resources. SageMaker spins up a Processing infrastructure, takes your script, copies your input data from Amazon S3 (if any), and pulls the container image from Amazon ECR to perform the simulation.

SageMaker fully manages the underlying infrastructure for a Processing job. Cluster resources are provisioned for the duration of your job, and cleaned up when a job is complete. The output of the Processing job is stored in the S3 bucket you specified. You can treat your RStudio instance as a launching station to submit simulations to remote compute with various parameters or input datasets.

The complete SageMaker API is accessible through the Reticulate library, which provides an R interface to make calls to the SageMaker Python SDK. To orchestrate these steps, we use another R script.

Copy the following code into the RStudio console. Set the variable container to the URI of the container with the tag (remember to include the tag, and not just the container). It should look like XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/r_simulation_container:v1. You can retrieve this URI from the Amazon ECR console by choosing the r_simulation_container repository and copying the relevant URI from the Image URI field (this code is also in the SageMaker_Processing_SDS.R script):

library(reticulate)

use_python('/usr/bin/python') # this is where we installed the SageMaker Python SDK
sagemaker <- import('sagemaker')
session <- sagemaker$Session()
bucket <- session$default_bucket()
role_arn <- sagemaker$get_execution_role()

## using r_container
container <- 'URI TO CONTAINER AND TAG' # can be found under $ docker images. Remember to include the tag

# one single run
processor <- sagemaker$processing$ScriptProcessor(role = role_arn,
                                                  image_uri = container,
                                                  command = list('/usr/bin/Rscript'),
                                                  instance_count = 1L,
                                                  instance_type = 'ml.m5.4xlarge',
                                                  volume_size_in_gb = 5L,
                                                  max_runtime_in_seconds = 3600L,
                                                  base_job_name = 'social-distancing-simulation',
                                                  sagemaker_session = session)

max_iterations <- 10000
x_length <- 1000
y_length <- 1000
num_people <- 1000

is_local <- 0 #we are going to run this simulation with SageMaker processing
result=processor$run(code = 'Social_Distancing_Simulations.R',
              outputs=list(sagemaker$processing$ProcessingOutput(source='/opt/ml/processing/output')),
              arguments = list('--args', paste(x_length), paste(y_length), paste(num_people), paste(max_iterations),paste(is_local)),
              wait = TRUE,
              logs = TRUE)

In the preceding code, we’re off-loading the heavy simulation work to a remote, larger EC2 instance (instance_type = 'ml.m5.4xlarge'). Not only do we not consume any local compute resources, but we also have an opportunity to optimally choose a right-sized instance to perform the simulation on a per-job basis. The machine that we run this simulation on is a general purpose instance with 64 GB RAM and 16 virtual CPUs. The simulation runs faster in the right-sized instance. For example, when we used the ml.m5.4xlarge (64 GB RAM and 16 vCPUs), the simulation took 10.8 minutes. By way of comparison, we performed this exact same simulation on the local development machine using only 1 vCPU and the exact same simulation took 93 minutes.

If you want to run another simulation that is more complex, with more iterations or with a larger dataset, you don’t need to stop and change your EC2 instance type. You can use the instance_type argument to switch to a larger instance with more RAM or virtual CPUs, or to a compute optimized instance, such as ml.c5.4xlarge, for high performance at a lower price per unit of compute.

We configured our job (by setting wait = TRUE) to run synchronously. The R interpreter is busy until the simulation is complete, even though the job is running on remote compute. In many cases (such as simulations that last many hours) it’s more useful to set wait = FALSE to run the job asynchronously. This allows you to proceed with your script and perform other tasks within RStudio while the heavy-duty simulation occurs via the SageMaker Processing job.
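When a job is submitted with wait = FALSE, you need some way to check on it later without blocking the console. The sketch below is one possible shape for that check. It assumes that processor$latest_job$describe() returns a list containing a ProcessingJobStatus element, which matches the fields of the DescribeProcessingJob API response. The helper names are hypothetical, and the describe call is passed in as a function so the logic can be tried without AWS access.

```r
# Statuses after which a processing job no longer changes state.
is_terminal_status <- function(status) {
  status %in% c("Completed", "Failed", "Stopped")
}

# One non-blocking status check. describe_fn is any function that returns a
# list with a ProcessingJobStatus element.
check_processing_job <- function(describe_fn) {
  status <- describe_fn()$ProcessingJobStatus
  list(status = status, finished = is_terminal_status(status))
}

# Intended use after processor$run(..., wait = FALSE) (requires AWS access):
# check_processing_job(function() processor$latest_job$describe())

# Offline demonstration with a stand-in for the describe call.
fake_describe <- function() list(ProcessingJobStatus = "InProgress")
check_processing_job(fake_describe)
```

Because each check returns immediately, you can call it whenever convenient and continue working in RStudio between calls.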

You can inspect and monitor the progress of your jobs on the Processing jobs page on the SageMaker console (you can also monitor jobs via API calls).

The following screenshot shows the details of our job.

The Monitoring section provides links to Amazon CloudWatch logs for your jobs. This important feature allows you to monitor the logs in near-real time as the job runs, and take necessary action if errors or bugs are detected.

Because logs are reported in near-real time, you don’t have to wait until an entire job is complete to detect problems; you can rely on the emitted logs.

For more information about how SageMaker Processing runs your container image and simulation script, see How Amazon SageMaker Processing Runs Your Processing Container Image.

Accessing simulation results from your R script

Your processing job writes its results to Amazon S3; you can control what is written and in what format it’s written. The Docker container on which the processing job runs writes out the results to the /opt/ml/processing/output directory; this is copied over to Amazon S3 when the processing job is complete. In the Social_Distancing_Simulations.R script, we write the mean of the entire simulation run (this number corresponds to the mean number of violations per person in the room). To access those results, enter the following code (this code is also in SageMaker_Processing_SDS.R script):

get_job_results <- function(session,processor){
    #get the mean results of the simulation
    the_bucket=session$default_bucket()
    job_name=processor$latest_job$job_name
    cmd=paste0('aws s3 cp s3://',the_bucket,"/",job_name,"/output/output-1/output_result_mean.txt .")
    system(cmd)
    my_data <- read.delim('output_result_mean.txt',header=FALSE)$V1
    return(my_data)
    }

simulation_mean=get_job_results(session,processor)
cat(simulation_mean) #displays about .11

In the preceding code, we point to the S3 bucket where the results are stored, read the result, and display it. For our use case, our processing job only writes out the mean of the simulation results, but you can configure it to write other values as well.
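The download command in get_job_results relies on a fixed key layout: the bucket, the job name, then output/output-1/ and the file name. The sketch below spells that layout out as a helper. The function name result_s3_uri is hypothetical, and the default output name output-1 is taken from the path used in the code above; if you give your ProcessingOutput a different name or destination, the path will differ.

```r
# Hypothetical helper that reproduces the S3 path used in get_job_results:
# s3://<bucket>/<job_name>/output/<output_name>/<file_name>
result_s3_uri <- function(bucket, job_name,
                          output_name = "output-1",
                          file_name = "output_result_mean.txt") {
  paste0("s3://", bucket, "/", job_name, "/output/", output_name, "/", file_name)
}

# The bucket and job names below are placeholders for illustration only.
result_s3_uri("example-bucket", "social-distancing-simulation-example")
```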

The following table compares the total time it took us to perform the simulation on the local machine one time, as well as two other instances you can use for SageMaker Processing. For these simulations, the number of iterations changes, but x_length, y_length, and num_people equal 1000 in all cases.

	Instance type (total time in seconds)
Number of iterations	t3.xlarge (local machine)	ml.m5.4xlarge (SageMaker Processing)	ml.m5.24xlarge (SageMaker Processing)
10	5.5	254	285
100	87	284	304
1,000	847	284	253
10,000	5602	650	430
100,000	Not tested	Not tested	1411

For testing on the local machine, we restrict the number of virtual CPUs (vCPU) to 1; the t3.xlarge has 4 vCPUs. This restriction mimics a common pattern in which a large machine is shared by multiple statisticians, and one statistician might not distribute work to multiple CPUs for fear of slowing down their colleagues’ work. For timing for the ml.m5.4xlarge and the ml.m5.24xlarge instances, we use all vCPUs and include the time taken by SageMaker Processing to bring up the requested instance and write the results, in addition to the time required to perform the simulation itself. We perform each simulation one time.

As you can see from the table, local machines are more efficient for fewer iterations, but larger machines using SageMaker Processing are faster when the number of iterations gets to 1,000 or more.
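To quantify the crossover, the following snippet computes the speedup of each SageMaker Processing instance relative to the local machine, using the timings reported in the table. The 100,000-iteration row is left out because it has no local measurement. The data frame and column names are chosen for this example only.

```r
# Timings in seconds, copied from the table above.
timings <- data.frame(
  iterations = c(10, 100, 1000, 10000),
  local_t3_xlarge = c(5.5, 87, 847, 5602),
  ml_m5_4xlarge = c(254, 284, 284, 650),
  ml_m5_24xlarge = c(285, 304, 253, 430)
)

# Speedup greater than 1 means SageMaker Processing finished sooner than the local run.
timings$speedup_4xlarge <- round(timings$local_t3_xlarge / timings$ml_m5_4xlarge, 2)
timings$speedup_24xlarge <- round(timings$local_t3_xlarge / timings$ml_m5_24xlarge, 2)

print(timings[, c("iterations", "speedup_4xlarge", "speedup_24xlarge")])
```

At 10 and 100 iterations the speedup is below 1, because the time to bring up the instance dominates. At 10,000 iterations the ml.m5.4xlarge run is roughly 8.6 times faster than the local run, and the ml.m5.24xlarge run roughly 13 times faster.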

(Optional) Securing your workload in your VPC

So far, we have submitted the SageMaker Processing jobs to a SageMaker managed VPC and accessed the S3 buckets via public internet. However, in healthcare, life science, and financial service industries, it’s usually required to run production workloads in a private VPC with strict networking configuration for security purposes. It’s a security best practice to launch your SageMaker Processing jobs into a private VPC where you can have more control over the network configuration and access the S3 buckets via an Amazon S3 VPC endpoint. For more information and setup instructions, see Give SageMaker Processing Jobs Access to Resources in Your Amazon VPC.

We have provisioned an Amazon S3 VPC endpoint attached to the VPC as part of the CloudFormation template. To launch a job into a private VPC, we pass the network configuration to the ScriptProcessor constructor through an additional argument, network_config:

subnet <- 'subnet-xxxxxxx'  # can be found in CloudFormation > Resources
security_group <- 'sg-xxxxxxx'  # can be found in CloudFormation > Resources
network <- sagemaker$network$NetworkConfig(subnets = list(subnet), 
                                           security_group_ids = list(security_group),
                                           enable_network_isolation = TRUE)

processor <- sagemaker$processing$ScriptProcessor(..., network_config = network)

When you run processor$run(...), the SageMaker Processing job is forced to run inside the specified VPC rather than the SageMaker managed VPC, and accesses the S3 bucket via the Amazon S3 VPC endpoint rather than the public internet.

Cleaning up

When you complete this post, delete the stack from the AWS CloudFormation console by selecting the stack and choosing Delete. This cleans up all the resources we created for this post.

Conclusion

In this post, we presented a solution using SageMaker Processing as a compute resource extension for R users performing statistical workloads on RStudio. You can obtain the scalability you desire with a few lines of code to call the SageMaker API and a reusable Docker container image, without leaving your RStudio IDE. We also showed how you can launch SageMaker Processing jobs into your own private VPC for security purposes.

A question that you might be asking is: Why should the data scientist bother with job submission in the first place? Why not just run RStudio on a very, very large instance that can handle the simulations locally? The answer is that although this is technically possible, it could be expensive, and doesn’t scale to teams of even small sizes. For example, assume your company has 10 statisticians that need to run simulations that use up to 60 GB of RAM; they need to run in aggregate 1,200 total hours (50 straight days) of simulations. If each statistician is provisioned with their own m5.4xlarge instance for 24/7 operation, it costs about 10 * 24 * 30 * $0.768 = $5,529 a month (on-demand Amazon EC2 pricing in us-west-2 as of December 2020). By comparison, provisioning one m5.4xlarge instance to be shared by 10 statisticians to perform exploratory analysis and submit large-scale simulations in SageMaker Processing costs only $553 a month on Amazon EC2, and an additional $1,290 for the 1,200 total hours of simulation on ml.m5.4xlarge SageMaker ML instances ($1.075 per hour).
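The cost comparison above can be reproduced with a few lines of arithmetic. The snippet below uses only the hourly prices and hours quoted in the preceding paragraph (December 2020, us-west-2). The variable names are chosen for this example, and current prices may differ.

```r
# Inputs quoted in the paragraph above (December 2020, us-west-2).
hours_per_month <- 24 * 30
ec2_m5_4xlarge_hourly <- 0.768           # on-demand Amazon EC2 price per hour
sagemaker_ml_m5_4xlarge_hourly <- 1.075  # SageMaker ML instance price per hour
num_statisticians <- 10
simulation_hours <- 1200

# Option 1: one dedicated m5.4xlarge per statistician, running 24/7.
dedicated_cost <- num_statisticians * hours_per_month * ec2_m5_4xlarge_hourly

# Option 2: one shared m5.4xlarge plus SageMaker Processing for the simulations.
shared_instance_cost <- hours_per_month * ec2_m5_4xlarge_hourly
processing_cost <- simulation_hours * sagemaker_ml_m5_4xlarge_hourly
shared_total_cost <- shared_instance_cost + processing_cost

cat(sprintf("Dedicated instances: $%.2f per month\n", dedicated_cost))
cat(sprintf("Shared instance plus Processing: $%.2f per month\n", shared_total_cost))
cat(sprintf("Savings: %.0f%%\n", 100 * (1 - shared_total_cost / dedicated_cost)))
```

With these inputs, the shared setup comes to about $1,843 a month, roughly one third of the cost of ten dedicated instances.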

For more information about R and SageMaker, see the R User Guide to Amazon SageMaker. For details on SageMaker Processing pricing, see the Processing tab on Amazon SageMaker pricing.

About the Authors

Michael Hsieh is a Senior AI/ML Specialist Solutions Architect. He works with customers to advance their ML journey with a combination of AWS ML offerings and his ML domain knowledge. As a Seattle transplant, he loves exploring the great mother nature the city has to offer, such as the hiking trails, scenery kayaking in the SLU, and the sunset at the Shilshole Bay.

Joshua Broyde is an AI/ML Specialist Solutions Architect on the Global Healthcare and Life Sciences team at Amazon Web Services. He works with customers in healthcare and life sciences on a number of AI/ML fronts, including analyzing medical images and video, analyzing machine sensor data and performing natural language processing of medical and healthcare texts.

Delivering operational insights directly to your on-call team by integrating Amazon DevOps Guru with Atlassian Opsgenie

December 2, 2020

by Adam Strickland Amazon AWS

As organizations continue to adopt microservices, the number of disparate services that contribute to delivering applications increases, driving the scope of signals that on-call teams monitor to grow exponentially. It’s becoming more important than ever for these teams to have tools that can quickly and autonomously detect anomalous behaviors across the services they support. Amazon DevOps Guru uses machine learning (ML) to quickly identify when your applications are behaving outside of their normal operating patterns, and may even predict these anomalous behaviors before they become a problem. You can deliver these insights in near-real-time directly to your on-call teams by integrating DevOps Guru with Atlassian Opsgenie, allowing them to immediately react to critical anomalies.

Opsgenie is an alert management solution that ensures critical alerts are delivered to the right person in your on-call team, and includes a preconfigured integration for DevOps Guru. This makes it easy to configure the delivery of notifications from DevOps Guru to Opsgenie via Amazon Simple Notification Service (Amazon SNS) in three simple steps. This post will walk you through configuring the delivery of these notifications.

Configuring DevOps Guru Integration

To start integrating DevOps Guru with Opsgenie, complete the following steps:

On the Opsgenie console, choose Settings.

In the navigation pane, choose Integration list.
Filter the list of built-in integrations by DevOps Guru.

Hover over Amazon DevOps Guru and choose Add.

This integration has been pre-configured with a set of defaults that work for many teams. However, you can also customize the integration settings to meet your needs on the Advanced configuration page.

When you’re ready, assign the integration to a team.
Save a copy of the subscription URL (you will need this later).
Choose Save Integration.

Creating an SNS topic and subscribing Opsgenie

To configure Amazon SNS notifications, complete the following steps:

On the Amazon SNS console, choose Topics.
Choose Create topic.
For Type, select Standard.
For Name, enter a name, such as operational-insights.
Leave the default settings as they are or configure them to suit your needs.
Choose Create topic.
After the topic has been created, scroll down to the Subscriptions section and choose Create subscription.
For Protocol, choose HTTPS.
For Endpoint, enter the subscription URL you saved earlier.
Leave the remaining options as the defaults, or configure them to meet your needs.
Choose Create subscription.
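If you prefer to script these console steps, the same topic and subscription can be created with the AWS SDK for Python (boto3). The following is a minimal sketch rather than part of the original walkthrough. The helper name create_topic_and_subscribe is hypothetical, and it takes the SNS client as an argument so you can supply your own Region and credentials.

```python
def create_topic_and_subscribe(sns_client, topic_name, subscription_url):
    """Create a standard SNS topic and subscribe an HTTPS endpoint to it.

    sns_client is expected to be a boto3 SNS client, for example
    boto3.client("sns"). Returns the topic ARN and the subscription ARN
    (which SNS reports as "pending confirmation" until the endpoint
    confirms the subscription).
    """
    # Topics are standard (not FIFO) unless FIFO attributes are supplied.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # Subscribe the Opsgenie integration URL saved earlier.
    subscription = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="https",
        Endpoint=subscription_url,
    )
    return topic_arn, subscription["SubscriptionArn"]


# Example usage (needs AWS credentials; the URL below is a placeholder):
#   import boto3
#   create_topic_and_subscribe(
#       boto3.client("sns"),
#       "operational-insights",
#       "https://<your-opsgenie-subscription-url>",
#   )
```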

Upon creating the subscription, Amazon SNS sends a confirmation message to your Opsgenie integration, which Opsgenie automatically acknowledges on your behalf.

Opsgenie is now ready to receive notifications from DevOps Guru, and there’s just one thing left to do: configure DevOps Guru to monitor your resources and send notifications to your newly created SNS topic.

Setting up Amazon DevOps Guru

The first time you browse to the DevOps Guru console, you will need to enable DevOps Guru to operate on your account.

On the DevOps Guru console, choose Get Started.

If you have already enabled DevOps Guru, you can add your SNS topic by choosing Settings on the DevOps Guru console, and then skip to step 3.

Select the Resources you want to monitor (for this post, we chose Analyze all AWS resources in the current AWS account).
For Choose an SNS notification topic, select Select an existing SNS topic.
For Choose a topic in your AWS account, choose the topic you created earlier (operational-insights).
Choose Add SNS topic.
Choose Enable (or Save if you have already enabled the service).
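For accounts where DevOps Guru is already enabled, attaching the SNS topic can also be done programmatically. The sketch below assumes a boto3 DevOps Guru client (boto3.client("devops-guru")) and its add_notification_channel operation. The helper name add_sns_channel is hypothetical, and selecting which resources to analyze is not covered here.

```python
def add_sns_channel(devops_guru_client, topic_arn):
    """Register an SNS topic as a DevOps Guru notification channel.

    devops_guru_client is expected to be boto3.client("devops-guru").
    Returns the ID that DevOps Guru assigns to the new channel.
    """
    response = devops_guru_client.add_notification_channel(
        Config={"Sns": {"TopicArn": topic_arn}}
    )
    return response["Id"]


# Example usage (needs AWS credentials; the ARN below is a placeholder):
#   import boto3
#   add_sns_channel(
#       boto3.client("devops-guru"),
#       "arn:aws:sns:<region>:<account-id>:operational-insights",
#   )
```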

DevOps Guru starts monitoring your resources and learning what’s normal behavior for your applications.

Sample application

For DevOps Guru to have an application to monitor, I use AWS CodeStar to build and deploy a simple web service application using Amazon API Gateway and AWS Lambda. The service simply returns a random number.
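The post doesn’t include the service code, but a Lambda function behind an API Gateway proxy integration that returns a random number could look roughly like the following. The handler name, the 1–100 range, and the response shape are assumptions made for illustration.

```python
import json
import random


def lambda_handler(event, context):
    """Return a random integer in an API Gateway proxy-style response."""
    number = random.randint(1, 100)  # the range is an arbitrary choice
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"number": number}),
    }
```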

After deploying my app, I configure a simple load test to run endlessly, and leave it running for a few hours to allow DevOps Guru to baseline the behavior of my app.

Generating Insights

Now that my app has been running for a while, it’s time to change the behavior of my application to generate an insight. To do this, I deploy a small code change that introduces some random latency and HTTP 5xx errors.
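The exact code change isn’t shown in the post. As a sketch of the idea, a handler that injects random latency and occasional 5xx responses might look like this. The failure rate, the maximum delay, and the extra rng and sleep parameters (added here only to make the behavior testable) are all illustrative assumptions.

```python
import json
import random
import time

FAILURE_RATE = 0.2               # assumed fraction of requests that fail
MAX_EXTRA_LATENCY_SECONDS = 2.0  # assumed upper bound on injected delay


def lambda_handler(event, context, rng=random, sleep=time.sleep):
    """Random-number service with injected latency and HTTP 500 errors."""
    # Delay every request by a random amount to raise the function duration.
    sleep(rng.uniform(0, MAX_EXTRA_LATENCY_SECONDS))

    # Fail a share of requests so API Gateway reports 5xx errors.
    if rng.random() < FAILURE_RATE:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": "injected failure"}),
        }

    return {
        "statusCode": 200,
        "body": json.dumps({"number": rng.randint(1, 100)}),
    }
```

Lambda calls the handler with only the event and context arguments, so the defaults for rng and sleep apply in a deployed function.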

Soon after, Opsgenie sends an alert to my phone, triggered by an insight from DevOps Guru. The following screenshot shows the alert I received in Opsgenie.

From this alert, I can see that there is an anomaly in the latency of my random number service. Choosing the InsightUrl provided in the alert directs me to the DevOps Guru console, where I can start digging into the events and metrics that led to this insight being generated.

The Relevant events page shows an indicator of the events that occurred in the lead-up to the change in behavior. In my case, the key event was a deployment triggered by the update to the code in the Lambda function.

The DevOps Guru Insights page also provides the pertinent metrics that can be used to further highlight the behavior change—in my case, the duration of my Lambda function and the number of API Gateway 5xx errors had increased.

Resolving the error

Now that I’ve investigated the cause of the anomalous behavior, I resolve it by rolling back the code and redeploying. Shortly after, my application returns to normal behavior. DevOps Guru automatically resolves the insight and sends a notification to Opsgenie, closing the related alert.

To confirm that the application is behaving normally again, I return to the Insights page and check the pertinent metrics, where I can see that they have indeed returned to normal again.

If you plan on testing DevOps Guru in this way, keep in mind that the service learns the behavior of your app over time, and a continual break and fix cycle in your app may eventually be considered normal behavior, no longer generating new insights.

Conclusion

Amazon DevOps Guru continuously analyzes streams of disparate data and monitors thousands of metrics to establish normal application behavior. It’s available now in preview, and the Atlassian Opsgenie integration for Amazon DevOps Guru is also available to use now. Opsgenie centralizes alerts from monitoring, logging, and ITSM tools so Dev and IT Ops teams can stay aware and in control. Opsgenie’s flexible rules engine ensures critical alerts are never missed, and the right person is notified at the right time via email, phone, SMS, or mobile push notifications.

Sign up for the Amazon DevOps Guru preview today and start delivering insights about your applications directly to your on-call teams for immediate investigation and remediation using Opsgenie.

About the Author

Adam Strickland is a Principal Solutions Architect based in Sydney, Australia. He has a strong background in software development across research and commercial organizations, and has built and operated global SaaS applications. Adam is passionate about helping Australian organizations build better software, and scaling their SaaS applications to a global audience.

Introducing AWS Panorama – Improve your operations with computer vision at the edge

December 2, 2020

by Banu Nagasundaram Amazon AWS

Yesterday at AWS re:Invent 2020, we announced AWS Panorama, a new machine learning (ML) Appliance and SDK, which allows organizations to bring computer vision (CV) to their on-premises cameras to make automated predictions with high accuracy and low latency. In this post, you learn how customers across a range of industries are using AWS Panorama to improve their operations by automating monitoring and visual inspection tasks.

For many organizations, deriving actionable insights from onsite camera video feeds to improve operations remains a challenge, whether it be increasing manufacturing quality, ensuring safety or operating compliance of their facilities, or analyzing customer traffic in retail locations. To derive these insights, customers must monitor live video of facilities or equipment, or review recorded footage after an incident has occurred, which is manual, error-prone, and difficult to scale.

Customers have begun to take advantage of CV models running in the cloud to automate these visual inspection tasks, but there are circumstances when relying exclusively on the cloud isn’t optimal due to latency requirements or intermittent connectivity. For these reasons, CV processed locally at the edge is needed, which makes the data immediately actionable. Some customers have begun exploring existing capabilities for CV at the edge with enterprise cameras, but find that many cameras lack the capability to perform on-device CV, or offer only simple, hard-coded ML models that can’t be customized or improved over time.

AWS Panorama

AWS Panorama is an ML Appliance and SDK (software development kit) that enables you to add CV to your existing on-premises cameras or use new AWS Panorama-enabled cameras for edge CV, coming soon from partners like Axis Communications and Basler AG.

With AWS Panorama, you can use CV to help automate costly visual inspection tasks, with the flexibility to bring your own CV models, such as those built with Amazon SageMaker, or use pre-built models from AWS or third parties. AWS Panorama removes the heavy lifting from each step of the CV process by making it easier to use live video feeds to enhance tasks that traditionally required visual inspection and monitoring, like evaluating manufacturing quality, finding bottlenecks in industrial processes, assessing worker safety within your facilities, and analyzing customer traffic in retail stores.

The AWS Panorama Appliance

The AWS Panorama Appliance analyzes video feeds from onsite cameras, acting locally on your data in locations where network connectivity is intermittent, and generates highly accurate ML predictions within milliseconds to improve operations.

The AWS Panorama Appliance, when connected to a network, can discover and connect to existing IP cameras that support the ONVIF standard, and run multiple CV models per stream. Your cameras don’t need any built-in ML or smart capabilities, because the AWS Panorama Appliance provides the ability to add CV to your existing IP cameras.

With an IP62 rating, the AWS Panorama Appliance is dust proof and water resistant, making it appropriate for use in harsh environmental conditions, enabling you to bring CV to where it’s needed in industrial locations.

The AWS Panorama Device SDK

The AWS Panorama Device SDK comprises a device software stack for CV, sample code, APIs, and tools, and will support the NVIDIA Jetson product family and Ambarella CV 2x product line. With the AWS Panorama Device SDK, device manufacturers can build new AWS Panorama-enabled edge devices and smart cameras that run more meaningful CV models at the edge, and offer you a selection of edge devices to satisfy your use cases. For more information, refer to the AWS Panorama SDK page.

Customer stories

In this section, we share the stories of customers who are developing with AWS Panorama to improve manufacturing quality control, retail insights, workplace safety, supply chain efficiency, and transportation and logistics, and are innovating faster with CV at the edge.

Manufacturing and industrial

AWS Panorama can help improve product quality and decrease costs that arise from common manufacturing defects by enabling you to take quick corrective action. With the AWS Panorama Appliance, you can run CV applications at the edge to detect manufacturing anomalies using videos from existing IP camera streams that monitor your manufacturing lines. You can integrate the real-time results with your on-premises systems, facilitate automation, and immediately improve manufacturing processes on factory floors or production lines.

“Many unique components go into each guitar, and we rely upon a skilled workforce to craft each part. With AWS Panorama and help from the Amazon Machine Learning Solutions Lab, we can track how long it takes for an associate to complete each task in the assembly of a guitar so that we’re able to optimize efficiency and track key metrics.”

– Michael Spandau, SVP Global IT, Fender.

“For packages at Amazon Fulfillment Centers to be successfully packed in a timely manner, the items must first be inbounded into our structured robotic field via an efficient stow process. Items are stowed individually into different bins within each pod carried by our robotic drive units. Today, we use ML computer vision action detection models deployed on SageMaker (in the cloud) to accurately predict the bin in which each item was placed. AWS Panorama gives us the flexibility to run these same models in real time on edge devices, which opens the door to further optimize the stowing process.”

– Joseph Quinlivan, Tech VP, Robotics & Fulfillment, Amazon

Reimagined retail insights

In retail environments, the AWS Panorama Appliance enables you to run multiple, simultaneous CV models on the video feeds from your existing onsite cameras. Applications for retail analytics, such as for people counting, heat mapping, and queue management, can help you get started quickly. With the streamlined management capabilities that AWS Panorama offers, you can easily scale your CV applications to include multiple process locations or stores. This means you can access insights faster and with more accuracy, allowing you to make real-time decisions that create better experiences for your customers.

“We want to use computer vision to better understand consumer needs in our stores, optimize operations, and increase the convenience for our visitors. We plan to use AWS Panorama to deploy different computer vision applications at our stores and experiment over time to strengthen our customer experience and value proposition.”

– Ian White, Senior Vice President, Strategic Marketing and Innovation, Parkland

“TensorIoT was founded on the instinct that the majority of the ‘compute’ is moving to the edge and all ‘things’ are becoming smarter. AWS Panorama has made moving computer vision to the edge much easier, and we’ve engaged with Parkland Fuel to use AWS Panorama to gather important retail analytics that will help their business thrive.”

– Ravikumar Raghunathan, CEO, TensorIoT

“Pilot.AI solves high-impact problems by developing computationally efficient algorithms to enable pervasive artificial intelligence running at the edge. With AWS Panorama, customers can rapidly add intelligence to their existing IP cameras and begin generating real-time insights on their retail operations using Pilot.AI’s high-performance computer vision models.”

– Jon Su, CEO, Pilot AI

Workplace safety

AWS Panorama allows you to monitor workplace safety, get notified immediately about any potential issues or unsafe situations, and take corrective action. AWS Panorama allows you to easily route real-time CV application results to AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Kinesis Video Streams, or Amazon CloudWatch and gather analytics. This means you can make improved data-based decisions to enhance workplace safety and security for your employees.

“Bigmate is focused on critical risk management solutions that leverage computer vision to help organizations improve workplace health and safety. Whether it’s keeping your people out of the way of hazardous equipment or ensuring they have the proper Personal Protective Equipment (PPE), with AWS Panorama we can rapidly deploy a suite of apps using your existing CCTV cameras that provide real-time notifications to avoid critical events while providing you the data you need to drive a safety-first culture.”

– Brett Orr, General Manager Chairman, Bigmate

“Organizations are facing unprecedented demand to transform and secure their physical spaces. With Accenture’s Rhythm.IO, we’re focused on helping customers create maximal situational awareness and safer environments, whether for shopping, travel, or public safety, by fusing together operational data and multi-sensor inputs with computer vision insights from AWS Panorama.”

– Matthew Lancaster, Managing Director, Accenture

“Construction zones are dynamic environments. At any given time, you’ve got hundreds of deliveries and subcontractors sharing the site with heavy equipment, and it’s changing every day. INDUS.AI is focused on delivering construction intelligence for general contractors. Computer vision is an especially valuable tool for this because of its ability to handle multiple tasks at once. We are looking forward to delivering real-time insights on jobsite management and safety in a SaaS-like experience for AWS Panorama customers.”

– Matt Man, CEO, INDUS.AI

Supply chain efficiency

In manufacturing and assembly environments, AWS Panorama can help provide critical input to supply chain operations by tracking throughput and recognizing bar codes, labels of parts, or completed products. Customers in an assembly plant, for example, might want to use AWS Panorama to automatically identify labels and bar codes of goods received at certain identification points, for automatic booking of goods into a warehouse management system.

“Computer vision helps us innovate and optimize several processes, and the applications are endless. We want to use computer vision to assess the size of trucks coming to our granaries in order to determine the optimal loading dock for each truck. We also want to use computer vision to understand the movement of assets in our plants to remove bottlenecks. AWS Panorama enables all of these solutions with a managed service and edge appliance for deploying and managing a variety of local computer vision applications.”

– Victor Caldas, Computer Vision Capability Lead, Cargill

“Every month, millions of trucks enter Amazon facilities, so creating technology that automates trailer loading, unloading, and parking is incredibly important. Amazon’s Middle Mile Product and Technology (MMPT) has begun using AWS Panorama to recognize license plates on these vehicles and automatically expedite entry and exit for drivers. This enables a safe and fast visit to Amazon sites, ensuring faster package delivery for our customers.”

– Steve Armato, VP Middle Mile Product and Technology, Amazon

Transportation and logistics

AWS Panorama allows you to process data to improve infrastructure, logistics, and transportation; get notified immediately about any potential issues or unsafe situations; and implement appropriate solutions. The AWS Panorama Appliance allows you to easily connect to existing network cameras, process videos right at the edge, and collect metrics for real-time intelligence. Because it processes data locally, without storing the videos locally or transmitting them to the cloud, it can help you comply with regulatory requirements on data privacy. This means you can get the information needed to provide improved services to your personnel.

“Siemens Mobility has been a leader for seamless, sustainable, and secure transport solutions for more than 160 years. The Siemens ITS Digital Lab is the innovation team in charge of bringing the latest digital advances to the traffic industry, and is uniquely positioned to provide data analytics and AI solutions to public agencies. As cities face new challenges, municipalities have turned to us to innovate on their behalf. Cities would like to understand how to effectively manage their assets and improve congestion and direct traffic. We want to use AWS Panorama to bring computer vision to existing security cameras to monitor traffic and intelligently allocate curbside space, help cities optimize parking and traffic, and improve quality of life for their constituents.”

– Laura Sanchez, Innovation Manager, Siemens Mobility ITS Digital Lab

“The Future of Mobility practice at Deloitte is focused on helping supply chain leaders apply advanced technologies to their biggest transportation and logistics challenges. Computer vision is a powerful tool for helping organizations manage, track, and automate the safe movement of goods. AWS Panorama enables our customers to quickly add these capabilities to their existing camera infrastructure. We’re looking forward to using AWS Panorama to provide real-time intelligence on the location and status of shipping containers. We anticipate logistics providers leveraging this important technology throughout their ground operations.”

– Scott Corwin, Managing Director, Deloitte Future of Mobility

How to get started

You can improve your business operations with AWS Panorama in three steps:

Identify the process you want to improve with computer vision.
Develop CV models with SageMaker or use pre-built models from AWS or third parties. If you need CV expertise, take advantage of the wealth of experience that the AWS Panorama partners offer.
Get started now with the preview and evaluate, develop, and test your CV applications with the AWS Panorama Appliance Developer Kit.

About the Authors

Banu Nagasundaram is a Senior Product Manager – Technical for AWS Panorama. She helps enterprise customers succeed with AWS AI/ML services and solve real-world business problems. Banu had over 11 years of semiconductor technology experience prior to AWS, working on AI and HPC compute design for datacenter customers. In her spare time, she enjoys hiking and painting.

Jason Copeland is a veteran product leader at AWS with deep experience in machine learning and computer vision at companies including Apple, Deep Vision, and RingCentral. He holds an MBA from Harvard Business School.

Introducing the AWS Panorama Device SDK: Scaling computer vision at the edge with AWS Panorama-enabled devices

December 2, 2020

by Shardul Brahmbhatt Amazon AWS

Yesterday, at AWS re:Invent, we announced AWS Panorama, a new Appliance and Device SDK that allows organizations to bring computer vision to their on-premises cameras to make automated predictions with high accuracy and low latency. With AWS Panorama, companies can use compute power at the edge (without requiring video streamed to the cloud) to improve their operations by automating monitoring and visual inspection tasks like evaluating manufacturing quality, finding bottlenecks in industrial processes, and assessing worker safety within their facilities. The AWS Panorama Appliance is a hardware device that customers can install on their network to connect to existing cameras within their facility and run computer vision models on multiple concurrent video streams.

This post covers how the AWS Panorama Device SDK helps device manufacturers build a broad portfolio of AWS Panorama-enabled devices. Scaling computer vision at the edge requires purpose-built devices that cater to specific customer needs without compromising security and performance. However, creating such a wide variety of computer vision edge devices is hard for device manufacturers because they need to do the following:

Integrate various standalone cloud services to create an end-to-end computer vision service that works with their edge device and provides a scalable ecosystem of applications for customers.
Invest in choosing and enabling silicon according to customers’ performance and cost requirements.

The AWS Panorama Device SDK addresses these challenges.

AWS Panorama Device SDK

The AWS Panorama Device SDK powers the AWS Panorama Appliance and allows device manufacturers to build AWS Panorama-enabled edge appliances and smart cameras. With the AWS Panorama Device SDK, device manufacturers can build edge computer vision devices for a wide array of use cases across industrial, manufacturing, worker safety, logistics, transportation, retail analytics, smart building, smart city, and other segments. In turn, customers have the flexibility to choose the AWS Panorama-enabled devices that meet their specific performance, design, and cost requirements.

The following diagram shows how the AWS Panorama Device SDK allows device manufacturers to build AWS Panorama-enabled edge appliances and smart cameras.

The AWS Panorama Device SDK includes the following:

Core controller – Manages AWS Panorama service orchestration between the cloud and edge device. The core controller provides integration to media processing and hardware accelerator on-device along with integration to AWS Panorama applications.
Silicon abstraction layer – Provides device manufacturers the ability to enable AWS Panorama across various silicon platforms and devices.

Edge devices integrated with the AWS Panorama Device SDK can offer all AWS Panorama service features, including camera stream onboarding and management, application management, application deployment, fleet management, and integration with event management business logic for real-time predictions, via the AWS Management Console. For example, a device integrated with the AWS Panorama Device SDK can automatically discover camera streams on the network, and organizations can review the discovered video feeds and name or group them on the console. Organizations can use the console to create applications by choosing a model and pairing it with business logic. After the application is deployed on the target device through the console, the AWS Panorama-enabled device runs the machine learning (ML) inference locally to enable high-accuracy and low-latency predictions.

To get device manufacturers started, the AWS Panorama Device SDK provides them with a device software stack for computer vision, sample code, APIs, and tools to enable and test their respective device for the AWS Panorama service. When ready, device manufacturers can work with the AWS Panorama team to finalize the integration of the AWS Panorama Device SDK and run certification tests ahead of their commercial launch.

Partnering with NVIDIA and Ambarella to enable the AWS Panorama Device SDK on their leading edge AI platforms

The AWS Panorama Device SDK will initially support the NVIDIA® Jetson product family and the Ambarella CV 2x product line, with NVIDIA and Ambarella as the initial partners building the ecosystem of hardware-accelerated edge AI/ML devices with AWS Panorama.

“Ambarella is in mass production today with CVflow AI vision processors for the enterprise, embedded, and automotive markets. We’re excited to partner with AWS to enable the AWS Panorama service on next-generation smart cameras and embedded systems for our customers. The ability to effortlessly deploy computer vision applications to Ambarella SoC-powered devices in a secure, optimized fashion is a powerful tool that makes it possible for our customers to rapidly bring the next generation of AI-enabled products to market.”

– Fermi Wang, CEO of Ambarella

“The world’s first computer created for AI, robotics, and edge computing, NVIDIA® Jetson AGX Xavier delivers massive computing performance to handle demanding vision and perception workloads at the edge. Our collaboration with AWS on the AWS Panorama Appliance powered by the NVIDIA Jetson platform accelerates time to market for enterprises and developers by providing a fully managed service to deploy computer vision from cloud to edge in an easily extensible and programmable manner.”

– Deepu Talla, Vice President and General Manager of Edge Computing at NVIDIA

Enabling edge appliances and smart cameras with the AWS Panorama Device SDK

Axis Communications, ADLINK Technology, Basler AG, Lenovo, STANLEY Security, and Vivotek will be using the AWS Panorama Device SDK to build AWS Panorama-enabled devices in 2021.

“We’re excited to collaborate to accelerate computer vision innovation with AWS Panorama and explore the advantages of the Axis Camera Application Platform (ACAP), our open application platform that offers users an expanded ecosystem, an accelerated development process, and ultimately more innovative, scalable, and reliable network solutions.”

– Johan Paulsson, CTO of Axis Communications AB

“The integration of AWS Panorama on ADLINK’s industrial vision systems makes for truly plug-and-play computer vision at the edge. In 2021, we will be making AWS Panorama-powered ADLINK NEON cameras powered by NVIDIA Jetson NX Xavier available to customers to drive high-quality computer vision powered outcomes much, much faster. This allows ADLINK to deliver AI/ML digital experiments and time to value for our customers more rapidly across logistics, manufacturing, energy, and utilities use cases.”

– Elizabeth Campbell, CEO of ADLINK USA

“Basler is looking forward to continuing our technology collaborations in machine learning with AWS in 2021. We will be expanding our solution portfolio to include AWS Panorama to allow customers to develop AI-based IoT applications on an optimized vision system from the edge to the cloud. We will be integrating AWS Panorama with our AI Vision Solution Kit, reducing the complexity and need for additional expertise in embedded hardware and software components, providing developers with a new and efficient approach to rapid prototyping, and enabling them to leverage the ecosystem of AWS Panorama computer vision application providers and systems integrators.”

– Arndt Bake, Chief Marketing Officer at Basler AG

“Going beyond traditional security applications, VIVOTEK developed its AI-driven video analytics from smart edge cameras to software applications. We are excited that we will be able to offer enterprises advanced flexibility and functionality through the seamless integration with AWS Panorama. What makes this joint force more powerful is the sufficient machine learning models that our solutions can benefit from AWS Cloud. We look forward to a long-term collaboration with AWS.”

– Joe Wu, Chief Technology Officer at VIVOTEK Inc.

Next steps

Join now to become a device partner and build edge computer vision devices with the AWS Panorama Device SDK.

About the Authors

As a Product Manager on the AWS AI Devices team, Shardul Brahmbhatt currently focuses on AWS Panorama. He is deeply passionate about building products that drive adoption of AI at the Edge.

Kamal Garg leads strategic hardware partnerships for AWS AI Devices. He is deeply passionate about incubating technology ecosystems that optimize the customer and developer experience. Over the last 5+ years, Kamal has developed strategic relationships with leading silicon and connected device manufacturers for next-generation services like Alexa, A9 Visual Search, Prime Video, AWS IoT, SageMaker Neo, SageMaker Edge, and AWS Panorama.

Configuring autoscaling inference endpoints in Amazon SageMaker

December 2, 2020

by Chaitanya Hazarey Amazon AWS

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to quickly build, train, and deploy machine learning (ML) models at scale. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. You can one-click deploy your ML models for making low-latency inferences in real time on fully managed inference endpoints. Autoscaling is an out-of-the-box feature that monitors your workloads and dynamically adjusts the capacity to maintain steady and predictable performance at the lowest possible cost. When the workload increases, autoscaling brings more instances online. When the workload decreases, autoscaling removes unnecessary instances, helping you reduce your compute cost.

The following diagram is a sample architecture that showcases how a model is invoked for inference using an Amazon SageMaker endpoint.

Amazon SageMaker automatically attempts to distribute your instances across Availability Zones. So, we strongly recommend that you deploy multiple instances for each production endpoint for high availability. If you’re using a VPC, configure at least two subnets in different Availability Zones so Amazon SageMaker can distribute your instances across those Availability Zones.

Amazon SageMaker supports four different ways to implement horizontal scaling of Amazon SageMaker endpoints. You can configure some of these policies using the Amazon SageMaker console, the AWS Command Line Interface (AWS CLI), or, for the advanced options, the Application Auto Scaling API in the AWS SDK. In this post, we showcase how to configure autoscaling using the boto3 SDK for Python and outline different scaling policies and patterns.
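As a rough preview of what configuring autoscaling through the Application Auto Scaling API can look like, the following sketch registers an endpoint variant as a scalable target and attaches a target tracking policy. The helper name, capacity limits, target value, and cooldowns are illustrative assumptions, not recommendations. The client is expected to be boto3.client("application-autoscaling").

```python
def configure_target_tracking(autoscaling_client, endpoint_name, variant_name,
                              min_capacity=1, max_capacity=4,
                              target_invocations=70.0):
    """Attach a target tracking scaling policy to one endpoint variant."""
    # Application Auto Scaling identifies a SageMaker variant by this ID.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Step 1: declare the variant's instance count as scalable within bounds.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_capacity,
        MaxCapacity=max_capacity,
    )

    # Step 2: track average invocations per instance (values are examples).
    autoscaling_client.put_scaling_policy(
        PolicyName=f"{endpoint_name}-{variant_name}-invocations",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more
        },
    )
    return resource_id


# Example usage (needs AWS credentials and an existing endpoint):
#   import boto3
#   configure_target_tracking(
#       boto3.client("application-autoscaling"), "my-endpoint", "AllTraffic")
```

The endpoint and variant names in the usage comment are placeholders; substitute the names of your own deployed endpoint and production variant.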

Prerequisites

This post assumes that you have a functional Amazon SageMaker endpoint deployed. Models are hosted within an Amazon SageMaker endpoint; you can have multiple model versions being served via the same endpoint. Each model is referred to as a production variant.

If you’re new to Amazon SageMaker and have not created an endpoint yet, complete the steps in Identifying bird species on the edge using the Amazon SageMaker built-in Object Detection algorithm and AWS DeepLens until the section Testing the model to develop and host an object detection model.

If you want to get started directly with this post, you can also fetch a model from the MXNet model zoo. For example, if you plan to use ResidualNet152, you need the model definition and the model weights inside a tarball. You can also create custom models that can be hosted as an Amazon SageMaker endpoint. For instructions on building a tarball with Gluon and Apache MXNet, see Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker.

Configuring autoscaling

The following are the high-level steps for creating a model and applying a scaling policy:

Use Amazon SageMaker to create a model or bring a custom model.
Deploy the model.

If you use the MXNet estimator to train the model, you can call deploy to create an Amazon SageMaker endpoint:

from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

role = get_execution_role()  # IAM role that SageMaker assumes for training and hosting

# Train my estimator
mxnet_estimator = MXNet('train.py',
                role=role,
                framework_version='1.6.0',
                py_version='py3',
                instance_type='ml.p2.xlarge',
                instance_count=1)

mxnet_estimator.fit('s3://my_bucket/my_training_data/')

# Deploy my estimator to an Amazon SageMaker endpoint and get a Predictor.
# initial_instance_count=1 is not recommended for production use. Use this only for experimentation.
predictor = mxnet_estimator.deploy(instance_type='ml.m5.xlarge',
                initial_instance_count=1)

If you use a pretrained model like ResidualNet152, you can create an MXNetModel object and call deploy to create the Amazon SageMaker endpoint:

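from sagemaker.mxnet import MXNetModel

# role is the IAM execution role, for example the value returned by sagemaker.get_execution_role()
mxnet_model = MXNetModel(model_data='s3://my_bucket/pretrained_model/model.tar.gz',
                         role=role,
                         entry_point='inference.py',
                         framework_version='1.6.0',
                         py_version='py3')
predictor = mxnet_model.deploy(instance_type='ml.m5.xlarge',
                               initial_instance_count=1)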

Create a scaling policy and apply the scaling policy to the endpoint. The following section discusses your scaling policy options.

Scaling options

You can define minimum, desired, and maximum number of instances per endpoint and, based on the autoscaling configurations, instances are managed dynamically. The following diagram illustrates this architecture.

To scale the deployed Amazon SageMaker endpoint, first fetch its details:

import pprint
import boto3
from sagemaker import get_execution_role
import sagemaker
import json

pp = pprint.PrettyPrinter(indent=4, depth=4)
role = get_execution_role()
sagemaker_client = boto3.Session().client(service_name='sagemaker')
endpoint_name = 'name-of-the-endpoint'
response = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
pp.pprint(response)

#Let us define a client to play with autoscaling options
client = boto3.client('application-autoscaling') # Common class representing Application Auto Scaling for SageMaker amongst other services

Simple scaling or TargetTrackingScaling

Use this option when you want to scale based on a specific Amazon CloudWatch metric. You can do this by choosing a specific metric and setting threshold values. The recommended metrics for this option are average CPUUtilization or SageMakerVariantInvocationsPerInstance.

SageMakerVariantInvocationsPerInstance is the average number of times per minute that each instance for a variant is invoked. CPUUtilization is the sum of each individual CPU core’s utilization, so on an instance with multiple cores it can exceed 100%.

The following code snippets show how to scale using these metrics. You can also push custom metrics to CloudWatch or use other metrics. For more information, see Monitor Amazon SageMaker with Amazon CloudWatch.

resource_id='endpoint/' + endpoint_name + '/variant/' + 'AllTraffic' # This is the format in which application autoscaling references the endpoint

response = client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=2
)

#Example 1 - SageMakerVariantInvocationsPerInstance Metric
response = client.put_scaling_policy(
    PolicyName='Invocations-ScalingPolicy',
    ServiceNamespace='sagemaker', # The namespace of the AWS service that provides the resource. 
    ResourceId=resource_id, # Resource ID of the endpoint variant (built above from the endpoint name)
    ScalableDimension='sagemaker:variant:DesiredInstanceCount', # SageMaker supports only Instance Count
    PolicyType='TargetTrackingScaling', # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 10.0, # The target value for the metric. - here the metric is - SageMakerVariantInvocationsPerInstance
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance', # is the average number of times per minute that each instance for a variant is invoked. 
        },
        'ScaleInCooldown': 600, # The cooldown period helps you prevent your Auto Scaling group from launching or terminating 
                                # additional instances before the effects of previous activities are visible. 
                                # You can configure the length of time based on your instance startup time or other application needs.
                                # ScaleInCooldown - The amount of time, in seconds, after a scale in activity completes before another scale in activity can start. 
        'ScaleOutCooldown': 300 # ScaleOutCooldown - The amount of time, in seconds, after a scale out activity completes before another scale out activity can start.
        
        # 'DisableScaleIn': True|False - Indicates whether scale in by the target tracking policy is disabled.
                            # If the value is true, scale in is disabled and the target tracking policy won't remove capacity from the scalable resource.
    }
)

#Example 2 - CPUUtilization metric
response = client.put_scaling_policy(
    PolicyName='CPUUtil-ScalingPolicy',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 90.0,
        'CustomizedMetricSpecification':
        {
            'MetricName': 'CPUUtilization',
            'Namespace': '/aws/sagemaker/Endpoints',
            'Dimensions': [
                {'Name': 'EndpointName', 'Value': endpoint_name },
                {'Name': 'VariantName','Value': 'AllTraffic'}
            ],
            'Statistic': 'Average', # Possible - 'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'
            'Unit': 'Percent'
        },
        'ScaleInCooldown': 600,
        'ScaleOutCooldown': 300
    }
)

With the scale-in cooldown period, the intention is to scale-in conservatively to protect your application’s availability, so scale-in activities are blocked until the cooldown period has expired. With the scale-out cooldown period, the intention is to continuously (but not excessively) scale out. After Application Auto Scaling successfully scales out using a target tracking scaling policy, it starts to calculate the cooldown time.

Step scaling

This is an advanced type of scaling where you define additional policies to dynamically adjust the number of instances based on the size of the alarm breach. This helps you configure a more aggressive response when demand reaches a certain level. A step scaling policy takes effect only when a CloudWatch alarm invokes it, so you also need an alarm on the chosen metric that uses the policy as its alarm action. The following code is an example of a step scaling policy based on the OverheadLatency metric:

#Example 3 - OverheadLatency metric and StepScaling Policy
response = client.put_scaling_policy(
    PolicyName='OverheadLatency-ScalingPolicy',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='StepScaling', 
    StepScalingPolicyConfiguration={
        'AdjustmentType': 'ChangeInCapacity', # 'PercentChangeInCapacity'|'ExactCapacity' Specifies whether the ScalingAdjustment value in a StepAdjustment 
                                              # is an absolute number or a percentage of the current capacity.
        'StepAdjustments': [ # A set of adjustments that enable you to scale based on the size of the alarm breach.
            {
                'MetricIntervalLowerBound': 0.0, # The lower bound for the difference between the alarm threshold and the CloudWatch metric.
                 # 'MetricIntervalUpperBound': 100.0, # The upper bound for the difference between the alarm threshold and the CloudWatch metric.
                'ScalingAdjustment': 1 # The amount by which to scale, based on the specified adjustment type. 
                                       # A positive value adds to the current capacity while a negative number removes from the current capacity.
            },
        ],
        # 'MinAdjustmentMagnitude': 1, # The minimum number of instances to scale. - only for 'PercentChangeInCapacity'
        'Cooldown': 120,
        'MetricAggregationType': 'Average', # 'Minimum'|'Maximum'
    }
)
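
To make the interval bounds concrete, the following is a minimal plain-Python sketch of how a set of step adjustments can be matched against the size of an alarm breach. It illustrates the bound semantics only and is not the service's implementation. The function name pick_adjustment, the threshold of 100, and the three-step policy are assumptions made up for this example, and the sketch covers only the scale-out case (metric at or above the alarm threshold).

```python
from typing import Optional


def pick_adjustment(metric_value: float, threshold: float, steps: list) -> Optional[int]:
    """Return the ScalingAdjustment whose interval contains the breach size.

    Bounds are offsets from the alarm threshold, not absolute metric values.
    A missing bound is treated as unbounded on that side. Scale-out case only:
    the lower bound is inclusive and the upper bound is exclusive.
    """
    breach = metric_value - threshold
    for step in steps:
        lower = step.get('MetricIntervalLowerBound', float('-inf'))
        upper = step.get('MetricIntervalUpperBound', float('inf'))
        if lower <= breach < upper:
            return step['ScalingAdjustment']
    return None  # no step covers this breach size


# Hypothetical policy: respond more aggressively as the breach grows
steps = [
    {'MetricIntervalLowerBound': 0.0, 'MetricIntervalUpperBound': 10.0, 'ScalingAdjustment': 1},
    {'MetricIntervalLowerBound': 10.0, 'MetricIntervalUpperBound': 30.0, 'ScalingAdjustment': 2},
    {'MetricIntervalLowerBound': 30.0, 'ScalingAdjustment': 4},
]

for value in (105, 110, 150):
    print(value, pick_adjustment(value, threshold=100, steps=steps))
```

With an alarm threshold of 100, a metric value of 105 falls in the first interval, 110 in the second, and 150 in the open-ended third interval.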

Scheduled scaling

You can use this option when you know that the demand follows a particular schedule in the day, week, month, or year. This helps you specify a one-time schedule or a recurring schedule or cron expressions along with start and end times, which form the boundaries of when the autoscaling action starts and stops. See the following code:

#Example 4 - Scaling based on a certain schedule.
response = client.put_scheduled_action(
    ServiceNamespace='sagemaker',
    Schedule='at(2020-10-07T06:20:00)', # yyyy-mm-ddThh:mm:ss You can use one-time schedule, cron, or rate
    ScheduledActionName='ScheduledScalingTest',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    #StartTime=datetime(2020, 10, 7), #Start date and time for when the schedule should begin
    #EndTime=datetime(2020, 10, 8), #End date and time for when the recurring schedule should end
    ScalableTargetAction={
        'MinCapacity': 2,
        'MaxCapacity': 3
    }
)
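
The Schedule string accepts one-time, rate, and cron forms. The following sketch builds two of them in plain Python so that the format is easy to check before calling put_scheduled_action. The helper names at_expression and daily_cron are hypothetical. The cron form assumes the six-field layout used by Application Auto Scaling (minutes, hours, day of month, month, day of week, year), evaluated in UTC unless a time zone is specified.

```python
from datetime import datetime


def at_expression(when: datetime) -> str:
    """One-time schedule in the at(yyyy-mm-ddThh:mm:ss) form."""
    return 'at(' + when.strftime('%Y-%m-%dT%H:%M:%S') + ')'


def daily_cron(hour: int, minute: int = 0) -> str:
    """Recurring daily schedule: minutes, hours, day-of-month, month, day-of-week, year."""
    return f'cron({minute} {hour} * * ? *)'


# Hypothetical pair of actions: raise capacity in the morning, lower it in the evening.
# Each dict can be merged into the keyword arguments of put_scheduled_action.
scale_up = {
    'ScheduledActionName': 'MorningScaleUp',
    'Schedule': daily_cron(8),
    'ScalableTargetAction': {'MinCapacity': 2, 'MaxCapacity': 3},
}
scale_down = {
    'ScheduledActionName': 'EveningScaleDown',
    'Schedule': daily_cron(20),
    'ScalableTargetAction': {'MinCapacity': 1, 'MaxCapacity': 2},
}

print(at_expression(datetime(2020, 10, 7, 6, 20)))
print(scale_up['Schedule'], scale_down['Schedule'])
```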

On-demand scaling

Use this option only when you want to increase or decrease the number of instances manually. This updates the endpoint weights and capacities without defining a trigger. See the following code:

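# UpdateEndpointWeightsAndCapacities is a SageMaker API, so call it on the SageMaker client,
# not on the Application Auto Scaling client.
# The values below are placeholders: supply your variant name, weight, and instance count.
response = sagemaker_client.update_endpoint_weights_and_capacities(EndpointName=endpoint_name,
                            DesiredWeightsAndCapacities=[
                                {
                                    'VariantName': 'string',
                                    'DesiredWeight': ...,
                                    'DesiredInstanceCount': 123
                                }
                            ])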

Comparing scaling methods

Each of these methods, when successfully applied, results in the addition of instances to an already deployed Amazon SageMaker endpoint. When you make a request to update your endpoint with autoscaling configurations, the status of the endpoint moves to Updating. While the endpoint is in this state, other update operations on this endpoint fail. You can monitor the state by using the DescribeEndpoint API. There is no traffic interruption while instances are being added to or removed from an endpoint.
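
Because other update operations fail while the endpoint is in the Updating state, it can help to wait for the update to finish before issuing the next change. The following is a minimal polling sketch built on the DescribeEndpoint call mentioned above. The function name wait_for_endpoint and the polling defaults are assumptions; sm_client is expected to be a boto3 SageMaker client such as sagemaker_client from the earlier snippet.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll DescribeEndpoint until the endpoint leaves the Updating state.

    Returns the final EndpointStatus (for example 'InService' or 'Failed').
    Raises TimeoutError if the endpoint is still updating after max_polls checks.
    """
    for _ in range(max_polls):
        status = sm_client.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
        if status != 'Updating':
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f'{endpoint_name} is still updating after {max_polls} checks')


# Usage (requires AWS credentials and an existing endpoint):
# status = wait_for_endpoint(sagemaker_client, endpoint_name)
```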

When creating an endpoint, we specify initial_instance_count; this value is used only at endpoint creation time. It is ignored afterward, and autoscaling or on-demand scaling changes DesiredInstanceCount to set the instance count behind an endpoint.

Finally, if you do use UpdateEndpoint to deploy a new EndpointConfig to an endpoint, to retain the current number of instances, you should set RetainAllVariantProperties to true.
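
As a sketch of that last point, the following wrapper passes RetainAllVariantProperties to UpdateEndpoint. The function name update_endpoint_keep_capacity and its argument names are hypothetical; sm_client is assumed to be a boto3 SageMaker client, and the new endpoint configuration is assumed to exist already.

```python
def update_endpoint_keep_capacity(sm_client, endpoint_name, new_config_name):
    """Switch an endpoint to a new EndpointConfig while retaining variant
    properties such as the current instance count."""
    return sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
        RetainAllVariantProperties=True,
    )


# Usage (requires AWS credentials, an endpoint, and an existing endpoint config):
# update_endpoint_keep_capacity(sagemaker_client, endpoint_name, 'my-new-endpoint-config')
```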

Considerations for designing an autoscaling policy to scale your ML workload

You should consider the following when designing an efficient autoscaling policy to minimize traffic interruptions and be cost-efficient:

Traffic patterns and metrics – Especially consider traffic patterns that involve invoking the inference logic. Then determine which metrics these traffic patterns affect the most, or which per-instance metric the inference logic is most sensitive to (such as GPUUtilization, CPUUtilization, MemoryUtilization, or Invocations). Is the inference logic GPU bound, memory bound, or CPU bound?
Custom metrics – If the problem domain calls for a custom metric, you have the option of deploying a custom metrics collector, which also lets you fine-tune the granularity of metrics collection and publishing.
Threshold – After we decide on our metrics, we need to decide on the threshold. In other words, we need to detect the increase in load, based on the preceding metric, within a time window that allows for the addition of an instance and for your inference logic to be ready to serve inference. This consideration also governs the length of the scale-in and scale-out cooldown periods.
Autoscaling – Depending on the application logic’s tolerance to autoscaling, there should be a balance between over-provisioning and autoscaling. Depending on the workload, if you select a specialized instance such as Inferentia, the throughput gains might alleviate the need to autoscale to a certain degree.
Horizontal scaling – When we have these estimations, it’s time to consider one or more of the strategies listed in this post to deploy for horizontal scaling. Some work particularly well in certain situations. For example, we strongly recommend that you use a target tracking scaling policy to scale on a metric such as average CPU utilization or the SageMakerVariantInvocationsPerInstance metric. But a good guideline is to empirically derive an apt scaling policy based on your particular workload and the preceding factors. You can start with a simple target tracking scaling policy, and you still have the option to use step scaling as an additional policy for a more advanced configuration. For example, you can configure a more aggressive response when demand reaches a certain level.

Retrieving your scaling activity log

When you want to see all the scaling policies attached to your Amazon SageMaker endpoint, you can use describe_scaling_policies, which helps you understand and debug the different scaling configurations’ behavior:

response = client.describe_scaling_policies(
    ServiceNamespace='sagemaker'
)

for policy in response['ScalingPolicies']:
    print('')
    pp.pprint(policy['PolicyName'])
    print('')
    # A policy carries either a target tracking or a step scaling configuration
    if 'TargetTrackingScalingPolicyConfiguration' in policy:
        pp.pprint(policy['TargetTrackingScalingPolicyConfiguration'])
    else:
        pp.pprint(policy['StepScalingPolicyConfiguration'])
    print('')
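
The policies show what is configured; to see what actually happened, the Application Auto Scaling API also exposes the scaling activities recorded for a resource. The following is a minimal sketch using describe_scaling_activities. The function name summarize_scaling_activities and the choice of fields are assumptions; aas_client is expected to be a boto3 'application-autoscaling' client and resource_id the endpoint variant resource ID built earlier.

```python
def summarize_scaling_activities(aas_client, resource_id, max_results=10):
    """Return (start time, status code, description) for recent scaling activities."""
    response = aas_client.describe_scaling_activities(
        ServiceNamespace='sagemaker',
        ResourceId=resource_id,
        MaxResults=max_results,
    )
    return [
        (activity['StartTime'], activity['StatusCode'], activity['Description'])
        for activity in response['ScalingActivities']
    ]


# Usage (requires AWS credentials and a registered scalable target):
# for start, status, description in summarize_scaling_activities(client, resource_id):
#     print(start, status, description)
```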

Conclusion

For models facing unpredictable traffic, Amazon SageMaker autoscaling helps economically respond to the demand and removes the undifferentiated heavy lifting of managing the inference infrastructure. One of the best practices of model deployment is to perform load testing. Determine the appropriate thresholds for your scaling policies and choose metrics based on load testing. For more information about load testing, see Amazon EC2 Testing Policy and Load test and optimize an Amazon SageMaker endpoint using automatic scaling.
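
As one way to turn a load test result into a threshold, the following sketch converts the peak requests per second that a single instance sustained into a TargetValue for SageMakerVariantInvocationsPerInstance, which is a per-minute metric. The function name, the example figure of 20 requests per second, and the default safety factor of 0.5 are assumptions for illustration; choose a safety factor that matches your own tolerance for headroom.

```python
def invocations_per_instance_target(max_rps: float, safety_factor: float = 0.5) -> float:
    """Convert a load-tested peak (requests per second per instance) into a
    per-minute target, leaving headroom through the safety factor."""
    return max_rps * safety_factor * 60


# Hypothetical load test result: one instance sustained 20 requests per second
target_value = invocations_per_instance_target(20)
print(target_value)
```

The resulting number can be used as the TargetValue in a target tracking policy on the SageMakerVariantInvocationsPerInstance metric.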

About the Authors

Chaitanya Hazarey is a Machine Learning Solutions Architect with the Amazon SageMaker Product Management team. He focuses on helping customers design and deploy end-to-end ML pipelines in production on AWS. He has set up multiple such workflows around problems in the areas of NLP, Computer Vision, Recommender Systems, and AutoML Pipelines.

Pavan Kumar Sunder is a Senior R&D Engineer with Amazon Web Services. He provides technical guidance and helps customers accelerate their ability to innovate through showing the art of the possible on AWS. He has built multiple prototypes around AI/ML, IoT, and Robotics for our customers.

Rama Thamman is a Software Development Manager with the AI Platforms team, leading the ML Migrations team.

What’s around the turn in 2021? AWS DeepRacer League announces new divisions, rewards, and community leagues

December 2, 2020

by Dan McCorriston Amazon AWS

AWS DeepRacer allows you to get hands on with machine learning (ML) through a fully autonomous 1/18th scale race car driven by reinforcement learning, a 3D racing simulator on the AWS DeepRacer console, a global racing league, and hundreds of customer-initiated community races.

The action is already underway at the Championship Cup at AWS re:Invent 2020, with the Wildcard round of racing coming to a close and Round 1 Knockouts officially underway, streaming live on Twitch (updated schedule coming soon). Check out the Championship Cup blog for up-to-date news and schedule information.

We’ve got some exciting announcements about the upcoming 2021 racing season. The AWS DeepRacer League is introducing new skill-based Open and Pro racing divisions in March 2021, with five times as many opportunities for racers to win prizes, and recognition for participation and performance. Another exciting new feature coming in 2021 is the expansion of community races into community leagues, enabling organizations and racing enthusiasts to set up their own racing leagues and race with their friends over multiple races. But first, we’ve got a great deal for racers in December!

December ‘tis the season for racing

Start your engines and hit the tracks in December for less! We’re excited to announce that from December 1, 2020 through December 31, 2020, we’re reducing the cost of training and evaluation for AWS DeepRacer by over 70% (from $3.50 to $1 per hour) for the duration of AWS re:Invent 2020. It’s a great time to learn, compete, and get excited for what’s coming up next year.

Introducing new racing divisions and digital rewards in 2021

The AWS DeepRacer League’s 2021 season will introduce new skill-based Open and Pro racing divisions. The new racing divisions include five times as many opportunities for racers of all skill levels to win prizes and recognition for participation and performance. AWS DeepRacer League’s new Open and Pro divisions enable all participants to win prizes by splitting the current Virtual Circuit monthly leaderboard into two skill-based divisions to level the competition, each with their own prizes, while maintaining a high level of competitiveness in the League.

The Open division is where all racers begin their ML journey and rewards participation each month with new vehicle customizations and other rewards. Racers can earn their way into the Pro division each month by finishing in the top 10 percent of time trial results. The Pro division celebrates competition and accomplishment with DeepRacer merchandise, DeepRacer Evo cars, exclusive prizes, and qualification to the Championship Cup at re:Invent 2021.

Similar to previous seasons, winners of the Pro division’s monthly race automatically qualify for the Championship Cup with an all-expenses paid trip to re:Invent 2021 for a chance to lift the 2021 Cup, receive a $10,000 machine learning education scholarship, and an F1 Grand Prix experience for two. Starting with the 2021 Virtual Circuit Pre-Season, racers can win multiple prizes each month—including dozens of customizations to personalize your car in the Virtual Circuit (such as car bodies, paint colors, and wraps), several official AWS DeepRacer branded items (such as racing jackets, hats, and shirts), and hundreds of DeepRacer Evo cars, complete with stereo cameras and LiDAR.

Participating in Open and Pro Division races can earn you new digital rewards, like this new racing skin for your virtual racing fun!

“The DeepRacer League has been a fantastic way for thousands of people to test out their newly learnt machine learning skills,” says AWS Hero and AWS Machine Learning Community Founder, Lyndon Leggate. “Everyone’s competitive spirit quickly shows through and the DeepRacer community has seen tremendous engagement from members keen to learn from each other, refine their skills, and move up the ranks. The new 2021 league format looks incredible and the Open and Pro divisions bring an interesting new dimension to racing! It’s even more fantastic that everyone will get more chances for their efforts to be rewarded, regardless of how long they’ve been racing. This will make it much more engaging for everyone and I can’t wait to take part!”

Empowering organizations to create their own leagues with multi-race seasons and admin tools

With events flourishing around the globe virtually, we’ll soon offer racers the ability not only to create their own race, but also to create multiple races, similar to the monthly Virtual Circuit, with community leagues. Race organizers will be able to set up the whole season, decide on qualification rounds, and use bracket elimination for head-to-head race finalists.

Organizations and individuals can race with their friends and colleagues with Community races.

The new series of tools enables race organizers to run their own events, including access to racer information, model training details, and logs to engage with their audience and develop learnings from participants. Over the past year, organizations such as Capital One, JPMC, Accenture, and Moody’s have already successfully managed internal AWS DeepRacer leagues. Now, even more organizations, schools, companies, and private groups of friends can use AWS DeepRacer and the community leagues as a fun learning tool to actively develop their ML skills.

“We have observed huge participation and interest in AWS DeepRacer events,” says Chris Thomas, Managing Director of Technology & Innovation at Moody’s Accelerator. “They create opportunities for employees to challenge themselves, collaborate with colleagues, and enhance ML skills. We view this as a success from the tech side with the additional benefit of growing our innovation culture.”

AWS re:Invent is a great time to learn more about ML

As you get ready for what’s in store for 2021, don’t forget that registration is still open for re:Invent 2020. Be sure to check out our three informative virtual sessions to help you along your ML journey with AWS DeepRacer: “Get rolling with Machine Learning on AWS DeepRacer”, “Shift your Machine Learning model into overdrive with AWS DeepRacer analysis tools” and “Replicate AWS DeepRacer architecture to master the track with SageMaker Notebooks.” You can take all the courses during re:Invent or learn at your own speed. It’s up to you. Register for re:Invent today and find out more on when these sessions are available to watch live or on-demand.

Start training today and get ready for the 2021 season

Remember that the cost of training and evaluation for AWS DeepRacer is reduced by over 70% (from $3.50 to $1 per hour) for the duration of AWS re:Invent 2020. Take advantage of these low rates today and get ready for the AWS DeepRacer League 2021 season. Now is a great time to get rolling with ML and AWS DeepRacer. Watch this page for schedule and video updates all through AWS re:Invent 2020. Let’s get ready to race!

About the Author

Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.

Private package installation in Amazon SageMaker running in internet-free mode

December 1, 2020

by Saeed Aghabozorgi Amazon AWS

Amazon SageMaker Studio notebooks and Amazon SageMaker notebook instances are internet-enabled by default. However, many regulated industries, such as financial services, healthcare, and telecommunications, require that network traffic traverses their own Amazon Virtual Private Cloud (Amazon VPC) to restrict and control which traffic can go through the public internet. Although you can disable direct internet access to SageMaker Studio notebooks and notebook instances, you need to ensure that your data scientists can still gain access to popular packages. Therefore, you may choose to build your own isolated dev environments that contain your choice of packages and kernels.

In this post, we learn how to set up such an environment for Amazon SageMaker notebook instances and SageMaker Studio. We also describe how to integrate this environment with AWS CodeArtifact, which is a fully managed artifact repository that makes it easy for organizations of any size to securely store, publish, and share software packages used in your software development process.

Solution overview

In this post, we cover the following steps:

Set up Amazon SageMaker for internet-free mode.
Set up the Conda repository using Amazon Simple Storage Service (Amazon S3). You create a bucket that hosts your Conda channels.
Set up the Python Package Index (PyPI) repository using CodeArtifact. You create a repository and set up AWS PrivateLink endpoints for CodeArtifact.
Build an isolated dev environment with Amazon SageMaker notebook instances. In this step, you use the lifecycle configuration feature to build a custom Conda environment and configure your PyPI client.
Install packages in SageMaker Studio notebooks. In this last step, you can create a custom Amazon SageMaker image and install the packages through Conda or pip client.

Setting up Amazon SageMaker for internet-free mode

We assume that you have already set up a VPC that lets you provision a private, isolated section of the AWS Cloud where you can launch AWS resources in a virtual network. You use it to host Amazon SageMaker and other components of your data science environment. For more information about building secure environments or well-architected pillars, see the following whitepaper, Financial Services Industry Lens: AWS Well-Architected Framework.

Creating an Amazon SageMaker notebook instance

You can disable internet access for Amazon SageMaker notebooks, and also associate them to your secure VPC environment, which allows you to apply network-level control, such as access to resources through security groups, or to control ingress and egress traffic of data.

On the Amazon SageMaker console, choose Notebook instances in the navigation pane.
Choose Create notebook instance.
For IAM role, choose your role.
For VPC, choose your VPC.
For Subnet, choose your subnet.
For Security group(s), choose your security group.
For Direct internet access, select Disable — use VPC only.
Choose Create notebook instance.

Connect to your notebook instance from your VPC instead of connecting over the public internet.

Amazon SageMaker notebook instances support VPC interface endpoints. When you use a VPC interface endpoint, communication between your VPC and the notebook instance is conducted entirely and securely within the AWS network instead of the public internet. For instructions, see Creating an interface endpoint.

Setting up SageMaker Studio

Similar to Amazon SageMaker notebook instances, you can launch SageMaker Studio in a VPC of your choice, and also disable direct internet access to add an additional layer of security.

On the Amazon SageMaker console, choose Amazon SageMaker Studio in the navigation pane.
Choose Standard setup.
To disable direct internet access, in the Network section, select the VPC only network access type when you onboard to Studio, or specify it when you call the CreateDomain API.

Doing so prevents Amazon SageMaker from providing internet access to your SageMaker Studio notebooks.

Create interface endpoints (via AWS PrivateLink) to access the following (and other AWS services you may require):
1. Amazon SageMaker API
2. Amazon SageMaker runtime
3. Amazon S3
4. AWS Security Token Service (AWS STS)
5. Amazon CloudWatch

Setting up a custom Conda repository using Amazon S3

Amazon SageMaker notebooks come with multiple environments already installed. The different Jupyter kernels in Amazon SageMaker notebooks are separate Conda environments. If you want to use an external library in a specific kernel, you can install the library in the environment for that kernel. This is typically done using conda install. When you use a conda command to install a package, Conda searches a set of default channels, which are usually online or remote channels (URLs) that host the Conda packages. However, because we assume the notebook instances don’t have internet access, we change those Conda channel paths to point to a private repository where our packages are stored.

To build such a custom channel, create a bucket in Amazon S3.
Copy the packages into the bucket.

These packages can be either approved packages among the organization or the custom packages built using conda build. These packages need to be indexed periodically or as soon as there is an update. The methods to index packages are out of scope of this post.

Because we set up the notebook to not allow direct internet access, the notebook can’t connect to the S3 buckets that contain the channels unless you create a VPC endpoint.

Create an Amazon S3 VPC endpoint to send the traffic through the VPC instead of the public internet.

By creating a VPC endpoint, you allow your notebook instance to access the bucket where you stored the channels and its packages.

We recommend that you also create a custom resource-based bucket policy that allows only requests from your private VPC to access your S3 buckets. For instructions, see Endpoints for Amazon S3.
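
As an illustration of such a bucket policy, the following sketch generates a policy document that denies S3 requests that do not arrive through a given VPC endpoint, using the aws:SourceVpce condition key. The function name, the bucket name, and the endpoint ID are hypothetical. Treat this as a starting point under those assumptions: a broad deny like this also blocks access from the console and from any other network path, so review it carefully before applying it.

```python
import json


def vpce_only_bucket_policy(bucket_name: str, vpce_id: str) -> str:
    """Build a bucket policy that denies requests not made through vpce_id."""
    policy = {
        'Version': '2012-10-17',
        'Statement': [{
            'Sid': 'DenyAccessOutsideVpcEndpoint',
            'Effect': 'Deny',
            'Principal': '*',
            'Action': 's3:*',
            'Resource': [
                f'arn:aws:s3:::{bucket_name}',
                f'arn:aws:s3:::{bucket_name}/*',
            ],
            # Requests whose source VPC endpoint differs from vpce_id are denied
            'Condition': {'StringNotEquals': {'aws:SourceVpce': vpce_id}},
        }],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket and endpoint identifiers
print(vpce_only_bucket_policy('user-conda-repository', 'vpce-0123456789abcdef0'))
```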

Replace the default channels of the Conda environment in your Amazon SageMaker notebooks with your custom channel (we do that in the next step when we build the isolated dev environment):

# remove default channel from the .condarc
conda config --remove channels 'defaults'
# add the conda channels to the .condarc file
conda config --add channels 's3://user-conda-repository/main/'
conda config --add channels 's3://user-conda-repository/condaforge/'

Setting up a custom PyPI repository using CodeArtifact

Data scientists typically use package managers such as pip, maven, npm, and others to install packages into their environments. By default, when you use pip to install a package, it downloads the package from the public PyPI repository. To secure your environment, you can use private package management tools either on premises, such as Artifactory or Nexus, or on AWS, such as CodeArtifact. This lets you restrict access to approved packages and perform safety checks. Alternatively, you may choose to use a private PyPI mirror set up on Amazon Elastic Container Service (Amazon ECS) or AWS Fargate to mirror the public PyPI repository in your private environment. For more information on this approach, see Building Secure Environments.

If you want to use pip to install Python packages, you can use CodeArtifact to control access to and validate the safety of the Python packages. CodeArtifact is a managed artifact repository service to help developers and organizations securely store and share the software packages used in your development, build, and deployment processes. The CodeArtifact integration with AWS Identity and Access Management (IAM), support for AWS CloudTrail, and encryption with AWS Key Management Service (AWS KMS) gives you visibility and the ability to control who has access to the packages.

You can configure CodeArtifact to fetch software packages from public repositories such as PyPI. PyPI helps you find and install software developed and shared by the Python community. When you pull a package from PyPI, CodeArtifact automatically downloads and stores application dependencies from the public repositories, so recent versions are always available to you.

Creating a repository for PyPI

You can create a repository using the CodeArtifact console or the AWS Command Line Interface (AWS CLI). Each repository is associated with the AWS account that you use when you create it. The following screenshot shows the view of choosing your AWS account on the CodeArtifact console.

A repository can have one or more CodeArtifact repositories associated with it as upstream repositories. This serves two needs.

Firstly, it allows a package manager client to access the packages contained in more than one repository using a single URL endpoint.

Secondly, when you create a repository, it doesn’t contain any packages. If an upstream repository has an external connection to a public repository, the repositories that are downstream from it can pull packages from that public repository. For example, the repository my-shared-python-repository has an upstream repository named pypi-store, which acts as an intermediate repository that connects your repository to an external connection (your PyPI repository). In this case, a package manager that is connected to my-shared-python-repository can pull packages from the PyPI public repository. The following screenshot shows this package flow.
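The upstream relationship described above can be pictured with a small model. The following Python sketch is illustrative only: the Repository class, its fields, and the resolve function are hypothetical and don't correspond to any CodeArtifact API. It assumes that a package request walks the upstream chain until it finds the package or reaches a repository with an external connection, and that the public repository holds every package requested.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Hypothetical stand-in for a CodeArtifact repository."""
    name: str
    packages: set = field(default_factory=set)
    upstreams: list = field(default_factory=list)
    external_connection: Optional[str] = None  # for example "public:pypi"


def resolve(repo: Repository, package: str) -> Optional[str]:
    """Describe where a package request would be satisfied, or return None."""
    if package in repo.packages:
        return repo.name
    for upstream in repo.upstreams:
        found = resolve(upstream, package)
        if found is not None:
            return found
    if repo.external_connection is not None:
        # Assumption: the public repository has every package we ask for.
        return f"{repo.external_connection} via {repo.name}"
    return None


# Repository names taken from the example in the post.
pypi_store = Repository("pypi-store", external_connection="public:pypi")
shared = Repository("my-shared-python-repository", upstreams=[pypi_store])

print(resolve(shared, "numpy"))
```

A repository with no packages, no upstreams, and no external connection resolves nothing, which mirrors the point that a newly created repository is empty.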

For instructions on creating a CodeArtifact repository, see Software Package Management with AWS CodeArtifact.

Because we disable internet access for the Amazon SageMaker notebooks, in the next section, we set up AWS PrivateLink endpoints to make sure all the traffic for installing the package in the notebooks traverses through the VPC.

Setting up AWS PrivateLink endpoints for CodeArtifact

You can configure CodeArtifact to use an interface VPC endpoint to improve the security of your VPC. When you use an interface VPC endpoint, you don’t need an internet gateway, NAT device, or virtual private gateway. To create VPC endpoints for CodeArtifact, you can use the AWS CLI or Amazon VPC console. For this post, we use the Amazon Elastic Compute Cloud (Amazon EC2) create-vpc-endpoint AWS CLI command. The following two VPC endpoints are required so that all requests to CodeArtifact stay within the AWS network.

The following command creates an endpoint to access CodeArtifact repositories:

aws ec2 create-vpc-endpoint --vpc-id vpcid --vpc-endpoint-type Interface \
  --service-name com.amazonaws.region.codeartifact.api --subnet-ids subnetid \
  --security-group-ids groupid

The following command creates an endpoint to access package managers and build tools:

aws ec2 create-vpc-endpoint --vpc-id vpcid --vpc-endpoint-type Interface \
  --service-name com.amazonaws.region.codeartifact.repositories --subnet-ids subnetid \
  --security-group-ids groupid --private-dns-enabled

CodeArtifact uses Amazon S3 to store package assets. To pull packages from CodeArtifact, you must create a gateway endpoint for Amazon S3. See the following code:

aws ec2 create-vpc-endpoint --vpc-id vpcid --service-name com.amazonaws.region.s3 \
  --route-table-ids routetableid
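The three commands above differ only in the service name and a few flags, with region standing in for your AWS Region. As a rough sketch, the following Python helper assembles the same three commands for a given Region. The function name and the resource IDs in the example call are hypothetical placeholders; the code only formats strings and doesn't call AWS.

```python
import shlex


def endpoint_commands(region, vpc_id, subnet_id, group_id, route_table_id):
    """Build the three create-vpc-endpoint commands shown above for one Region."""
    base = ["aws", "ec2", "create-vpc-endpoint", "--vpc-id", vpc_id]
    interface = ["--vpc-endpoint-type", "Interface",
                 "--subnet-ids", subnet_id,
                 "--security-group-ids", group_id]
    return [
        # Endpoint for the CodeArtifact API
        base + interface + ["--service-name",
                            f"com.amazonaws.{region}.codeartifact.api"],
        # Endpoint for package managers and build tools
        base + interface + ["--service-name",
                            f"com.amazonaws.{region}.codeartifact.repositories",
                            "--private-dns-enabled"],
        # Gateway endpoint for Amazon S3, where package assets are stored
        base + ["--service-name", f"com.amazonaws.{region}.s3",
                "--route-table-ids", route_table_id],
    ]


# Placeholder IDs for illustration only.
commands = endpoint_commands("us-east-1", "vpc-0abc", "subnet-0abc", "sg-0abc", "rtb-0abc")
for cmd in commands:
    print(shlex.join(cmd))
```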

Building your dev environment

Amazon SageMaker periodically updates the Python and dependency versions in the environments installed on the Amazon SageMaker notebook instances (when you stop and start) or in the images launched in SageMaker Studio. This might cause some incompatibility if you have your own managed package repositories and dependencies. You can freeze your dependencies in internet-free mode so that:

You’re not affected by periodic updates from Amazon SageMaker to the base environment
You have better control over the dependencies in your environments and can get ample time to update or upgrade your dependencies

Using Amazon SageMaker notebook instances

To create your own dev environment with specific versions of Python and dependencies, you can use lifecycle configuration scripts. A lifecycle configuration provides shell scripts that run only when you create the notebook instance or whenever you start one. When you create a notebook instance, you can create a new lifecycle configuration and the scripts it uses or apply one that you already have. Amazon SageMaker has a lifecycle config script sample that you can use and modify to create isolated dependencies as described earlier. With this script, you can do the following:

Build an isolated installation of Conda
Create a Conda environment with it
Make the environment available as a kernel in Jupyter

This makes sure that dependencies in that kernel aren’t affected by the upgrades that Amazon SageMaker periodically rolls out to the underlying AMI. This script installs a custom, persistent installation of Conda on the notebook instance’s EBS volume, and ensures that these custom environments are available as kernels in Jupyter. We add Conda and CodeArtifact configuration to this script.

The on-create script downloads and installs a custom Conda installation to the EBS volume via Miniconda. Any relevant packages can be installed here.

Set up CodeArtifact.
Set up your Conda channels.
Install ipykernel to make sure that the custom environment can be used as a Jupyter kernel.
Make sure the notebook instance has internet connectivity to download the Miniconda installer.

The on-create script installs the ipykernel library so you can use custom environments as Jupyter kernels, and uses pip install and conda install to install libraries. You can adapt the script to create custom environments and install the libraries that you want. Amazon SageMaker doesn’t update these libraries when you stop and restart the notebook instance, so you can make sure that your custom environment has the specific versions of the libraries that you want. See the following code:

#!/bin/bash

 set -e
 sudo -u ec2-user -i <<'EOF'
 unset SUDO_UID

 # Configure pip to use CodeArtifact (replace the domain, 12-digit account ID, repository, and endpoint URL with your own)
 aws codeartifact login --tool pip --domain my-org --domain-owner 000000000000 --repository my-shared-python-repository --endpoint-url https://vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com

 # Install a separate conda installation via Miniconda
 WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda
 mkdir -p "$WORKING_DIR"
 wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh -O "$WORKING_DIR/miniconda.sh"
 bash "$WORKING_DIR/miniconda.sh" -b -u -p "$WORKING_DIR/miniconda" 
 rm -rf "$WORKING_DIR/miniconda.sh"

 # Create a custom conda environment
 source "$WORKING_DIR/miniconda/bin/activate"

 # remove default channel from the .condarc 
 conda config --remove channels 'defaults'
 # add the conda channels to the .condarc file
 conda config --add channels 's3://user-conda-repository/main/'
 conda config --add channels 's3://user-conda-repository/condaforge/'

 KERNEL_NAME="custom_python"
 PYTHON="3.6"

 conda create --yes --name "$KERNEL_NAME" python="$PYTHON"
 conda activate "$KERNEL_NAME"

 pip install --quiet ipykernel

 # Customize these lines as necessary to install the required packages
 conda install --yes numpy
 pip install --quiet boto3

EOF

The on-start script uses the custom Conda environments created in the on-create script, and uses the ipykernel package to add them as kernels in Jupyter, so that they appear in the drop-down list in the Jupyter New menu. It also logs in to CodeArtifact to enable installing the packages from the custom repository. See the following code:

#!/bin/bash

set -e

sudo -u ec2-user -i <<'EOF'
unset SUDO_UID

# Get pip artifact
/home/ec2-user/SageMaker/aws/aws codeartifact login --tool pip --domain <my-org> --domain-owner <xxxxxxxxx> --repository <my-shared-python-repository> --endpoint-url <https://vpce-xxxxxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com>

WORKING_DIR=/home/ec2-user/SageMaker/custom-miniconda/
source "$WORKING_DIR/miniconda/bin/activate"

for env in $WORKING_DIR/miniconda/envs/*; do
    BASENAME=$(basename "$env")
    source activate "$BASENAME"
    python -m ipykernel install --user --name "$BASENAME" --display-name "Custom ($BASENAME)"
done


EOF

echo "Restarting the Jupyter server.."
restart jupyter-server

CodeArtifact authorization tokens are valid for a default period of 12 hours. You can add a cron job to the on-start script to refresh the token automatically, or log in to CodeArtifact again in the Jupyter notebook terminal.
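Whichever way you schedule the refresh, the decision behind it is simple arithmetic on the token's age. The following Python sketch shows one way to express it. The 12-hour lifetime comes from the default stated above; the one-hour safety margin and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(hours=12)  # default lifetime stated in the post
REFRESH_MARGIN = timedelta(hours=1)   # assumption: refresh one hour before expiry


def needs_refresh(issued_at, now=None):
    """Return True when a token issued at issued_at should be renewed."""
    now = now or datetime.now(timezone.utc)
    return now >= issued_at + TOKEN_LIFETIME - REFRESH_MARGIN


issued = datetime(2020, 12, 1, 8, 0, tzinfo=timezone.utc)
four_hours_in = datetime(2020, 12, 1, 12, 0, tzinfo=timezone.utc)
near_expiry = datetime(2020, 12, 1, 19, 30, tzinfo=timezone.utc)

print(needs_refresh(issued, now=four_hours_in))  # False: 8 hours of validity left
print(needs_refresh(issued, now=near_expiry))    # True: inside the 1-hour margin
```

A scheduled job that runs more often than every 12 hours and calls the login command again has the same effect without tracking the issue time.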

Using SageMaker Studio notebooks

You can create your own custom Amazon SageMaker images in your private dev environment in SageMaker Studio. You can add custom kernels, packages, and any other files required to run a Jupyter notebook in your image. This gives you the control and flexibility to do the following:

Install your own custom packages in the image
Configure the images to be integrated with your custom repositories for package installation by users

For example, you can install a selection of R or Python packages when building the image:

# Dockerfile
RUN conda install --quiet --yes \
    'r-base=4.0.0' \
    'r-caret=6.*' \
    'r-crayon=1.3*' \
    'r-devtools=2.3*' \
    'r-forecast=8.12*' \
    'r-hexbin=1.28*'

Or you can set up Conda in the image to use only your own custom channels in Amazon S3 to install packages by changing the configuration of the Conda channels:

# Dockerfile
RUN \
    # add the conda channels to the .condarc file
    conda config --add channels 's3://my-conda-repository/_conda-forge/' && \
    conda config --add channels 's3://my-conda-repository/main/' && \
    # remove defaults from the .condarc
    conda config --remove channels 'defaults'

You should use the CodeArtifact login command in SageMaker Studio to fetch credentials for use with pip:

# PyPIconfig.sh
# Configure pip to use CodeArtifact (replace the domain, 12-digit account ID, repository, and endpoint URL with your own)
aws codeartifact login --tool pip --domain my-org --domain-owner 000000000000 --repository my-shared-python-repository --endpoint-url https://vpce-xxxxx.api.codeartifact.us-east-1.vpce.amazonaws.com

CodeArtifact requires an authorization token, which expires. You can add a cron job to the image to run the preceding command periodically. Alternatively, you can run it manually when the notebooks start. To make it simple for your users, you can add the preceding command to a shell script (such as PyPIconfig.sh) and copy the file into the image to be loaded in SageMaker Studio. In your Dockerfile, add the following command:

# Dockerfile
COPY PyPIconfig.sh /home/PyPIconfig.sh

For ease of use, PyPIconfig.sh is available in /home on SageMaker Studio. You can run it to configure your pip client in SageMaker Studio and fetch an authorization token from CodeArtifact using your AWS credentials.

Now, you can build and push your image into Amazon Elastic Container Registry (Amazon ECR). Finally, attach the image to multiple users (by attaching to a domain) or a single user (by attaching to the user’s profile) in SageMaker Studio. The following screenshot shows the configuration on the SageMaker Studio control panel.

For more information about building a custom image and attaching it to SageMaker Studio, see Bring your own custom SageMaker image tutorial.

Installing the packages

In Amazon SageMaker notebook instances, as soon as you start the Jupyter notebook, you see a new kernel in Jupyter in the drop-down list of kernels (see the following screenshot). This environment is isolated from other default Conda environments.

In your notebook, when you use pip install <package name>, the Python package manager client connects to your custom repository instead of the public repositories. Also, if you use conda install <package name>, the notebook instance uses the packages in your Amazon S3 channels to install it. See the following screenshot of this code.
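One way to confirm that pip is pointed at your custom repository is to inspect the index-url setting in pip's configuration. The following Python sketch parses a pip configuration and reports the index host. The sample configuration text is an assumption that follows the general shape of a CodeArtifact PyPI endpoint, with TOKEN and the account ID as placeholders; the function name is hypothetical.

```python
import configparser
from urllib.parse import urlparse

# Assumed example of what pip's configuration might contain after logging in.
SAMPLE_PIP_CONF = """
[global]
index-url = https://aws:TOKEN@my-org-000000000000.d.codeartifact.us-east-1.amazonaws.com/pypi/my-shared-python-repository/simple/
"""


def index_host(pip_conf_text):
    """Return the host name of pip's configured index, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pip_conf_text)
    url = parser.get("global", "index-url", fallback=None)
    return urlparse(url).hostname if url else None


host = index_host(SAMPLE_PIP_CONF)
print(host)
print(host is not None and ".codeartifact." in host)
```

If the host doesn't contain codeartifact, pip would fall back to whatever index is configured, which in an internet-free notebook typically means the install fails.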

In SageMaker Studio, the custom images appear in the image selector dialog box of the SageMaker Studio Launcher. As soon as you select your own custom image, the kernel you installed in the image appears in the kernel selector dialog box. See the following screenshot.

As mentioned before, CodeArtifact authorization tokens are valid for a default period of 12 hours. If you’re using CodeArtifact, you can open a terminal or notebook in SageMaker Studio and run the PyPIconfig.sh file to configure your client or refresh your expired token:

# Configure PyPI package managers to use CodeArtifact
/home/PyPIconfig.sh

The following screenshot shows your view in SageMaker Studio.

Conclusion

This post demonstrated how to build a private environment for Amazon SageMaker notebook instances and SageMaker Studio to have better control over the dependencies in your environments. To build the private environment, we used the lifecycle configuration feature in notebook instances. The sample lifecycle config scripts are available on the GitHub repo. To install custom packages in SageMaker Studio, we built a custom image and attached it to SageMaker Studio. For more information about this feature, see Bringing your own custom container image to Amazon SageMaker Studio notebooks. For this solution, we used CodeArtifact, which makes it easy to build a PyPI repository for approved Python packages across the organization. For more information, see Software Package Management with AWS CodeArtifact.

Give CodeArtifact a try, and share your feedback and questions in the comments.

About the Authors

Saeed Aghabozorgi, Ph.D., is a Senior ML Specialist at AWS, with a track record of developing enterprise-level solutions that substantially increase customers’ ability to turn their data into actionable knowledge. He is also a researcher in the artificial intelligence and machine learning field.

Stefan Natu is a Sr. Machine Learning Specialist at AWS. He is focused on helping financial services customers build end-to-end machine learning solutions on AWS. In his spare time, he enjoys reading machine learning blogs, playing the guitar, and exploring the food scene in New York City.

Securing data analytics with an Amazon SageMaker notebook instance and Kerberized Amazon EMR cluster

December 1, 2020

by James Sun Amazon AWS

Ever since Amazon SageMaker was introduced at AWS re:Invent 2017, customers have used the service to quickly and easily build and train machine learning (ML) models and directly deploy them into a production-ready hosted environment. SageMaker notebook instances provide a powerful, integrated Jupyter notebook interface for easy access to data sources for exploration and analysis. You can enhance the SageMaker capabilities by connecting the notebook instance to an Apache Spark cluster running on Amazon EMR. It gives data scientists and engineers a common instance with shared experience where they can collaborate on AI/ML and data analytics tasks.

If you’re using a SageMaker notebook instance, you may need a way to allow different personas (such as data scientists and engineers) to do different tasks on Amazon EMR with a secure authentication mechanism. For example, you might use the Jupyter notebook environment to build pipelines in Amazon EMR to transform datasets in the data lake, and later switch personas and use the Jupyter notebook environment to query the prepared data and perform advanced analytics on it. Each of these personas and actions may require their own distinct set of permissions to the data.

To address this requirement, you can deploy a Kerberized EMR cluster. Amazon EMR release version 5.10.0 and later supports MIT Kerberos, which is a network authentication protocol created by the Massachusetts Institute of Technology (MIT). Kerberos uses secret-key cryptography to provide strong authentication so passwords or other credentials aren’t sent over the network in an unencrypted format.

This post walks you through connecting a SageMaker notebook instance to a Kerberized EMR cluster using SparkMagic and Apache Livy. Users are authenticated by the Kerberos Key Distribution Center (KDC), where they obtain temporary tokens that let them act as different personas before interacting with the EMR cluster with appropriately assigned privileges.

This post also demonstrates how a Jupyter notebook uses PySpark to download the COVID-19 database in CSV format from the Johns Hopkins GitHub repository. The data is transformed and processed by Pandas and saved to an S3 bucket in columnar Parquet format referenced by an Apache Hive external table hosted on Amazon EMR.

Solution walkthrough

The following diagram depicts the overall architecture of the proposed solution. A VPC with two subnets is created: one public, one private. For security reasons, a Kerberized EMR cluster is created inside the private subnet. It needs access to the internet to access data from the public GitHub repo, so a NAT gateway is attached to the public subnet to allow for internet access.

The Kerberized EMR cluster is configured with a bootstrap action in which three Linux users are created and Python libraries are installed (Pandas, requests, and Matplotlib).

You can set up Kerberos authentication a few different ways (for more information, see Kerberos Architecture Options):

Cluster dedicated KDC
Cluster dedicated KDC with Active Directory cross-realm trust
External KDC
External KDC integrated with Active Directory

The KDC can have its own user database or it can use cross-realm trust with an Active Directory that holds the identity store. For this post, we use a cluster dedicated KDC that holds its own user database. First, the EMR cluster has security configuration enabled to support Kerberos and is launched with a bootstrap action to create Linux users on all nodes and install the necessary libraries. Right after the cluster is ready, a bash step creates HDFS directories for the Linux users, who have default credentials that they must change the first time they log in to the EMR cluster.

A SageMaker notebook instance is spun up, which comes with SparkMagic support. The Kerberos client library is installed and the Livy host endpoint is configured to allow for the connection between the notebook instance and the EMR cluster. This is done through configuring the SageMaker notebook instance’s lifecycle configuration feature. We provide sample scripts later in this post to illustrate this process.

Fine-grained user access control for EMR File System

The EMR File System (EMRFS) is an implementation of HDFS that all EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon Simple Storage Service (Amazon S3). The Amazon EMR security configuration enables you to specify the AWS Identity and Access Management (IAM) role to assume when a user or group uses EMRFS to access Amazon S3. Choosing the IAM role for each user or group enables fine-grained access control for EMRFS on multi-tenant EMR clusters. This allows different personas to be associated with different IAM roles to access Amazon S3.
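To make the user-to-role association concrete, the following Python sketch builds the kind of role-mapping document that an EMR security configuration can carry. Treat the JSON key names as an assumption based on the general shape of EMRFS role mappings rather than a copy of what the CloudFormation template in this post generates. The account ID is a placeholder, the role names are the ones used later in this post, and the function name is hypothetical.

```python
import json


def emrfs_role_mappings(account_id, user_roles):
    """Map each cluster user to the IAM role EMRFS should assume for them."""
    mappings = [
        {
            "Role": f"arn:aws:iam::{account_id}:role/{role}",
            "IdentifierType": "User",
            "Identifiers": [user],
        }
        for user, role in user_roles.items()
    ]
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {"RoleMappings": mappings}
        }
    }


config = emrfs_role_mappings(
    "000000000000",  # placeholder account ID
    {
        "user1": "blog-allowEMRFSAccessForUser1",
        "user2": "blog-allowEMRFSAccessForUser2",
    },
)
print(json.dumps(config, indent=2))
```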

Deploying the resources with AWS CloudFormation

You can use the provided AWS CloudFormation template to set up this architecture’s building blocks, including the EMR cluster, SageMaker notebook instance, and other required resources. The template has been successfully tested in the us-east-1 Region.

Complete the following steps to deploy the environment:

Sign in to the AWS Management Console as an IAM power user, preferably an admin user.
Choose Launch Stack to launch the CloudFormation template:

Choose Next.

For Stack name, enter a name for the stack (for example, blog).
Leave the other values as default.
Continue to choose Next and leave other parameters at their default.
On the review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
Choose Create stack.

Wait until the status of the stack changes from CREATE_IN_PROGRESS to CREATE_COMPLETE. The process usually takes about 10–15 minutes.
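If you prefer to script the wait instead of watching the console, the logic is a simple polling loop. The following Python sketch is a minimal illustration: get_status is a hypothetical callable that you would supply (for example, a wrapper around a describe-stacks call), and the polling interval and timeout are arbitrary assumptions. The example run uses a canned sequence of statuses rather than a real stack.

```python
import time


def wait_for_stack(get_status, poll_seconds=30, timeout_seconds=1800, sleep=time.sleep):
    """Poll until the stack leaves CREATE_IN_PROGRESS, then return its status."""
    waited = 0
    while True:
        status = get_status()
        if status != "CREATE_IN_PROGRESS":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"stack still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Canned statuses standing in for real API responses; sleeping is skipped.
statuses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
result = wait_for_stack(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

Any status other than CREATE_COMPLETE returned here (a rollback status, for instance) means the deployment needs investigation before you continue.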

After the environment is complete, we can investigate what the template provisioned.

Notebook instance lifecycle configuration

Lifecycle configurations perform the following tasks to ensure a successful Kerberized authentication between the notebook and EMR cluster:

Configure the Kerberos client on the notebook instance
Configure SparkMagic to use Kerberos authentication

You can view your provisioned lifecycle configuration, SageEMRConfig, on the Lifecycle configuration page on the SageMaker console.

The template provisioned two scripts to start and create your notebook: start.sh and create.sh, respectively. The scripts replace {EMRDNSName} with your own EMR cluster primary node’s DNS hostname during the CloudFormation deployment.

EMR cluster security configuration

Security configurations in Amazon EMR are templates for different security setups. You can create a security configuration to conveniently reuse a security setup whenever you create a cluster. For more information, see Use Security Configurations to Set Up Cluster Security.

To view the EMR security configuration created, complete the following steps:

On the Amazon EMR console, choose Security configurations.
Expand the security configuration created. Its name begins with blog-securityConfiguration.

Two IAM roles are created as part of the solution in the CloudFormation template, one for each EMR user (user1 and user2). The two users are created during the Amazon EMR bootstrap action.

Choose the role for user1, blog-allowEMRFSAccessForUser1.

The IAM console opens and shows the summary for the IAM role.

Expand the policy attached to the role blog-emrFS-user1.

This is the S3 bucket the CloudFormation template created to store the COVID-19 datasets.

Choose {} JSON.

You can see the policy definition and permissions to the bucket named blog-s3bucket-xxxxx.

Return to the EMR security configuration.
Choose the role for user2, blog-allowEMRFSAccessForUser2.
Expand the policy attached to the role, blog-emrFS-user2.
Choose {} JSON.

You can see the policy definition and permissions to the bucket named my-other-bucket.

Authenticating with Kerberos

To use these IAM roles, you authenticate via Kerberos from the notebook instance to the EMR cluster KDC. The authenticated user inherits the permissions associated with the policy of the IAM role defined in the Amazon EMR security configuration.

To authenticate with Kerberos in the SageMaker notebook instance, complete the following steps:

On the SageMaker console, under Notebook, choose Notebook instances.
Locate the instance named SageEMR.
Choose Open JupyterLab.

On the File menu, choose New.
Choose Terminal.

Enter kinit followed by the username user2.
Enter the user’s password.

The initial default password is pwd2. The first time logging in, you’re prompted to change the password.

Enter a new password.

Enter klist to view the Kerberos ticket for user2.

After your user is authenticated, they can access the resources associated with the IAM role defined in Amazon EMR.

Running the example notebook

To run your notebook, complete the following steps:

Choose the Covid19-Pandas-Spark example notebook.

Choose the Run () icon to progressively run the cells in the example notebook.

When you reach the cell in the notebook that saves the Spark DataFrame (sdf) to an internal Hive table, you get an Access Denied error.

This step fails because the IAM role associated with Amazon EMR user2 doesn’t have permissions to write to the S3 bucket blog-s3bucket-xxxxx.

Navigate back to the Terminal.
Enter kinit followed by the username user1.
Enter the user’s password.

The initial default password is pwd1. The first time logging in, you’re prompted to change the password.

Enter a new password.

Restart the kernel.
Run all cells in the notebook by choosing Kernel and choosing Restart Kernel and Run All Cells.

This re-establishes a new Livy session with the EMR cluster using the new Kerberos token for user1.
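The failure for user2 and the success for user1 follow directly from the user-to-role and role-to-bucket associations shown earlier. The following Python sketch is a toy model of that chain and not how IAM evaluates policies. The role and bucket names come from this post; the assumption that each role's policy permits writes to its bucket, and the function name, are for illustration only.

```python
# Role assumed for each Kerberos-authenticated user (names from this post).
ROLE_FOR_USER = {
    "user1": "blog-allowEMRFSAccessForUser1",
    "user2": "blog-allowEMRFSAccessForUser2",
}

# Buckets each role's policy covers (assumed to include write access).
WRITABLE_BUCKETS = {
    "blog-allowEMRFSAccessForUser1": {"blog-s3bucket-xxxxx"},
    "blog-allowEMRFSAccessForUser2": {"my-other-bucket"},
}


def can_write(user, bucket):
    """Return True if the role mapped to the user covers the bucket."""
    role = ROLE_FOR_USER.get(user)
    return bucket in WRITABLE_BUCKETS.get(role, set())


for user in ("user2", "user1"):
    outcome = "allowed" if can_write(user, "blog-s3bucket-xxxxx") else "Access Denied"
    print(f"{user} -> blog-s3bucket-xxxxx: {outcome}")
```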

The notebook uses Pandas and Matplotlib to process and transform the raw COVID-19 dataset into a consumable format and visualize it.

The notebook also demonstrates the creation of a native Hive table in HDFS and an external table hosted on Amazon S3, which are queried by SparkSQL. The notebook is self-explanatory; you can follow the steps to complete the demo.

Restricting principal access to Amazon S3 resources in the IAM role

By default, the CloudFormation template allows any principal to assume the role blog-allowEMRFSAccessForUser1. This is too permissive, so we need to restrict the principals that can assume the role.

On the IAM console, under Access management, choose Roles.
Search for and choose the role blog-allowEMRFSAccessForUser1.

On the Trust relationship tab, choose Edit trust relationship.

Open a second browser window to look up your EMR cluster’s instance profile role name.

You can find the instance profile name on the IAM console by searching for the keyword instanceProfileRole. Typically, the name looks like <stack name>-EMRClusterinstanceProfileRole-xxxxx.

Modify the policy document using the following JSON file, providing your own AWS account ID and the instance profile role name:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<account id>:role/<EMR Cluster instanceProfile Role name>"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Return to the first browser window.
Choose Update Trust Policy.

This makes sure that only the EMR cluster’s users are allowed to access their own S3 buckets.
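If you need to apply the same restriction to more than one role, generating the trust policy avoids hand-editing mistakes. The following Python sketch fills in the two placeholders from the JSON document above. The function name is hypothetical, and the account ID and role name in the example call are placeholders that follow the naming pattern described earlier.

```python
import json


def trust_policy(account_id, instance_profile_role):
    """Build a trust policy that lets only the given role assume the target role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account_id}:role/{instance_profile_role}"]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder values for illustration only.
policy = trust_policy("000000000000", "blog-EMRClusterinstanceProfileRole-xxxxx")
print(json.dumps(policy, indent=4))
```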

Cleaning up

You can complete the following steps to clean up resources deployed for this solution. This also deletes the S3 bucket, so you should copy the contents in the bucket to a backup location if you want to retain the data for later use.

On the CloudFormation console, choose Stacks.
Select the stack deployed for this solution.
Choose Delete.

Summary

We walked through the solution using a SageMaker notebook instance authenticated with a Kerberized EMR cluster via Apache Livy, and processed a public COVID-19 dataset with Pandas before saving it in Parquet format in an external Hive table. The table references the data hosted in an Amazon S3 bucket. We provided a CloudFormation template to automate the deployment of necessary AWS services for the demo. We strongly encourage you to use managed and serverless services such as Amazon Athena and Amazon QuickSight for your specific use cases in production.

About the Authors

James Sun is a Senior Solutions Architect with Amazon Web Services. James has over 15 years of experience in information technology. Prior to AWS, he held several senior technical positions at MapR, HP, NetApp, Yahoo, and EMC. He holds a PhD from Stanford University.

Graham Zulauf is a Senior Solutions Architect. Graham is focused on helping AWS’ strategic customers solve important problems at scale.

Customization, automation and scalability in customer service: Integrating Genesys Cloud and AWS Contact Center Intelligence

November 30, 2020

by Rebecca Owens Amazon AWS

This is a guest post authored by Rebecca Owens and Julian Hernandez, who work at Genesys Cloud.

Legacy technology limits organizations in their ability to offer excellent customer service to users. Organizations must design, establish, and implement their customer relationship strategies while balancing against operational efficiency concerns.

Another factor to consider is the constant evolution of the relationship with the customer. External drivers, such as those recently imposed by COVID-19, can radically change how we interact in a matter of days. Customers have been forced to change the way they usually interact with brands, which has resulted in an increase in the volume of interactions hitting those communication channels that remain open, such as contact centers. Organizations have seen a significant increase in the overall number of interactions they receive, in some cases as much as triple the pre-pandemic volumes. This is further compounded by issues that restrict the number of agents available to serve customers.

The customer experience (CX) is becoming increasingly relevant and is considered by most organizations as a key differentiator.

In recent years, there has been a sharp increase in the usage of artificial intelligence (AI) in many different areas and operations within organizations. AI has evolved from being a mere concept to a tangible technology that can be incorporated in our day-to-day lives. The issue is that organizations are starting down this path only to find limitations due to language availability. Technologies are often only available in English or require a redesign or specialized development to handle multiple languages, which creates a barrier to entry.

Organizations face a range of challenges when formulating a CX strategy that offers a differentiated experience and can rapidly respond to changing business needs. To minimize the risk of adoption, you should aim to deploy solutions that provide greater flexibility, scalability, services, and automation possibilities.

Solution

Genesys Cloud (an omni-channel orchestration and customer relationship platform) provides all of the above as part of a public cloud model that enables quick and simple integration of AWS Contact Center Intelligence (AWS CCI) to transform the modern contact center from a cost center into a profit center. With AWS CCI, AWS and Genesys are committed to offering a variety of ways for organizations to quickly and cost-effectively add functionality such as conversational interfaces based on Amazon Lex, Amazon Polly, and Amazon Kendra.

In less than 10 minutes, you can integrate Genesys Cloud with the AWS CCI self-service solution powered by Amazon Lex and Amazon Polly in any of English-US, Spanish-US, Spanish-SP, French-FR, French-CA, or Italian-IT (recently released). This enables you to configure automated self-service channels that your customers can use to communicate naturally with bots powered by AI, which can understand their needs and provide quick and timely responses. Amazon Kendra (Amazon’s intelligent search service) “turbocharges” Amazon Lex with the ability to query FAQs and articles contained in a variety of knowledge bases to address the long tail of questions. You don’t have to explicitly program all these questions and corresponding answers in Amazon Lex. For more information, see AWS announces AWS Contact Center Intelligence solutions.

This is complemented by allowing for graceful escalation of conversations to live agents in situations where the bot can’t fully respond to a customer’s request, or when the company’s CX strategy requires it. The conversation context is passed to the agent so they know the messages that the user has previously exchanged with the bot, optimizing handle time, reducing effort, and increasing overall customer satisfaction.

With Amazon Lex, Amazon Polly, Amazon Kendra, and Genesys Cloud, you can easily create a bot and deploy it to different channels: voice, chat, SMS, and social messaging apps.

Enabling the integration

The integration between Amazon Lex (from which the addition of Amazon Polly and Amazon Kendra easily follows) and Genesys Cloud is available out of the box. It’s designed so that you can employ it quickly and easily.

You should first configure an Amazon Lex bot in one of the supported languages (for this post, we use Spanish-US). In the following use case, the bot is designed to enable a conversational interface that allows users to validate information, availability, and purchase certain products. It also allows them to manage the order, including tracking, modification, and cancellation. All of these are implemented as intents configured in the bot.

The following screenshot shows a view of Genesys Cloud Resource Center, where you can get started.

Integration consists of three simple steps (for full instructions, see About the Amazon Lex integration):

After completing these steps, you can use any bots that you configured in Amazon Lex within Genesys Cloud flows, regardless of whether the flow is for voice (IVR type) or for digital channels like web chat, social networks, and messaging channels. The following screenshot shows a view of available bots for our use case on the Amazon Lex console.

To use our sample retail management bot, go into Architect (a Genesys Cloud flow configuration product) and choose the type of flow to configure (voice, chat, or messaging) so you can use the tools available for that channel.

From the flow toolbox, you can drag and drop the Call Lex Bot action anywhere in the flow.

This is how you can call any of your existing Amazon Lex bots from a Genesys Cloud Architect flow. In this voice flow example, we first identify the customer through a query to the CRM before passing them to the bot.

The Call Lex Bot action allows you to select one of your existing bots and configure information to pass (input variables). It outputs the intent identified in Amazon Lex and the slot information collected by the bot (output variables). Genesys Cloud can use the outputs to continue processing the interaction and provide context to the human agent if the interaction is transferred.

Going back to our example, we use the bot JH_Retail_Spa and configure two variables to pass to Amazon Lex that we collected from the CRM earlier in the flow: Task.UserName and Task.UserAccount. We then configure the “track an order” intent and its associated output variables.
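To make the input and output variables described above more concrete, the following Python sketch shows roughly what an equivalent exchange with the Amazon Lex (V1) runtime PostText operation could look like. It is not how Genesys Cloud implements the Call Lex Bot action; it assumes the input variables travel as Lex session attributes, and the alias prod, the user ID, the attribute names UserName and UserAccount, and the sample utterance are hypothetical values chosen for illustration.

```python
from typing import Any, Dict


def build_post_text_request(bot_name: str, bot_alias: str, user_id: str,
                            text: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for the Lex (V1) runtime PostText operation.

    Assumption: input variables are sent as session attributes, which
    Lex requires to be a string-to-string map.
    """
    return {
        "botName": bot_name,
        "botAlias": bot_alias,
        "userId": user_id,
        "inputText": text,
        "sessionAttributes": {key: str(value) for key, value in inputs.items()},
    }


def extract_outputs(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the identified intent and the slots that Lex has filled so far."""
    slots = response.get("slots") or {}
    return {
        "intent": response.get("intentName"),
        "slots": {name: value for name, value in slots.items() if value is not None},
        "dialog_state": response.get("dialogState"),
    }


def call_lex_bot(client: Any, **request: Any) -> Dict[str, Any]:
    """Send one utterance through any client exposing post_text()."""
    return extract_outputs(client.post_text(**request))


request = build_post_text_request(
    bot_name="JH_Retail_Spa",
    bot_alias="prod",            # hypothetical alias
    user_id="caller-0001",       # hypothetical user ID
    text="Quiero rastrear mi pedido",
    inputs={"UserName": "Ana", "UserAccount": 12345},  # hypothetical values
)

# With AWS credentials configured, a real call would look like:
#   import boto3
#   outputs = call_lex_bot(boto3.client("lex-runtime"), **request)
```

Keeping the request builder and the response parser separate from the client makes the mapping easy to check without an AWS account: any object with a post_text method can stand in for the real client.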

The output information is played back to the customer, who can choose to finish the interaction or, if necessary, seek the support of a human agent. The agent is presented with a script that provides them with the context so they can seamlessly pick up the conversation at the point where the bot left off. This means the customer avoids having to repeat themselves, removing friction and improving customer experience.

You can enable the same functionality on digital channels, such as web chat, social networks, or messaging applications like WhatsApp or Line. In this case, all you need to do is use the same Genesys Cloud Architect action (Call Lex Bot) in digital flows.

The following screenshot shows an example of interacting with a bot on an online shopping website.

As with voice calls, if the customer needs additional support in a digital interaction, the interaction is transferred to an agent according to the defined routing strategy. Again, context is provided, and the transcript of the conversation between the customer and the bot is displayed to the agent.
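As a rough illustration of the context that accompanies a transfer, the following Python sketch bundles the identified intent, the collected slots, and the bot transcript into a summary an agent could read. The HandoffContext class, its fields, and the sample values are hypothetical; Genesys Cloud builds its agent scripts through its own tooling, not through this structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HandoffContext:
    """Hypothetical bundle of what the bot learned before escalating."""
    customer_name: str
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)
    transcript: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)

    def add_turn(self, speaker: str, text: str) -> None:
        """Record one message exchanged between the customer and the bot."""
        self.transcript.append((speaker, text))

    def agent_summary(self) -> str:
        """Render the context as plain text for the agent's screen."""
        lines = [
            f"Customer: {self.customer_name}",
            f"Intent: {self.intent or 'not identified'}",
        ]
        lines += [f"{name}: {value}" for name, value in sorted(self.slots.items())]
        lines.append("Transcript:")
        lines += [f"  {speaker}: {text}" for speaker, text in self.transcript]
        return "\n".join(lines)


context = HandoffContext(customer_name="Ana", intent="TrackOrder",
                         slots={"OrderNumber": "A1"})
context.add_turn("customer", "Quiero rastrear mi pedido")
context.add_turn("bot", "¿Cuál es el número de su pedido?")
print(context.agent_summary())
```

Carrying the transcript alongside the structured intent and slots is what lets the agent continue from where the bot stopped instead of asking the customer to start over.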

In addition to these use cases, you can use the Genesys Cloud REST API to generate additional interaction types, providing differentiated customer service. For example, with the release of Amazon Lex in Spanish, some of our customers and partners are building Alexa Skills, delivering an additional personalized communication channel to their users.

Conclusion

Customer experience operations are constantly coming up against new challenges, especially in the days of COVID-19. Genesys Cloud provides a solution that can manage all the changes we’re facing daily. It natively provides a flexible, agile, and resilient omni-channel solution that enables scalability on demand.

With the release of Amazon Lex in Spanish, you can quickly incorporate bots within your voice or digital channels, improving efficiency and customer service. These interactions can be transferred when needed to human agents with the proper context so they can continue the conversation seamlessly and focus on more complex cases where they can add more value.

If you have Genesys Cloud, check out the integration with Amazon Lex in Spanish and US Spanish to see how simple and beneficial it can be. If you’re not a customer, this is an additional reason to migrate and take full advantage of the benefits Genesys and AWS CCI can offer you. Differentiate your organization by personalizing every customer service interaction, improving agent satisfaction, and enhancing visibility into important business metrics with a more intelligent contact center.

About the Authors

Rebecca Owens is a Senior Product Manager at Genesys and is based out of Raleigh, North Carolina.

Julian Hernandez is Senior Cloud Business Development – LATAM for Genesys and is based out of Bogota D.C. Area, Colombia.