Technique that lets devices convey information in natural language improves on state of the art.
AWS scientist wins ICLR outstanding paper award
Ability to balance parameter size and effectiveness could be “extremely useful” in reducing parameter size of deep-learning models.
Use computer vision to detect crop disease through image analysis with Amazon Rekognition Custom Labels
Many diseases affect crops and cause significant economic losses through reduced yield and lower-quality produce. The health of a crop or plant is often assessed by the condition of its leaves, so it is crucial for farmers to identify symptoms early, before a disease spreads too far. However, manually determining whether a leaf is infected, what the infection is, and which disease control measures to apply is a hard problem: current methods can be error prone and very costly. This is where an automated machine learning (ML) solution for computer vision (CV) can help. Building complex ML models typically requires hundreds of thousands of labeled images, along with expertise in data science. In this post, we showcase how you can build an end-to-end disease detection, identification, and resolution recommendation solution using Amazon Rekognition Custom Labels.
Amazon Rekognition is a fully managed service that provides CV capabilities for analyzing images and video at scale, using deep learning technology without requiring ML expertise. Amazon Rekognition Custom Labels, an automated ML feature of Amazon Rekognition, lets you quickly train custom CV models specific to your business needs, simply by bringing labeled images.
Solution overview
We create a custom model to detect the plant leaf disease. To create our custom model, we follow these steps:
- Create a project in Amazon Rekognition Custom Labels.
- Create a dataset with images containing multiple types of plant leaf diseases.
- Train the model and evaluate the performance.
- Test the new custom model using the automatically generated API endpoint.
Amazon Rekognition Custom Labels lets you manage the ML model training process on the Amazon Rekognition console, which simplifies the end-to-end model development and inference process.
Creating your project
To create your plant leaf disease detection project, complete the following steps:
- On the Amazon Rekognition console, choose Custom Labels.
- Choose Get Started.
- For Project name, enter plant-leaf-disease-detection.
- Choose Create project.
You can also create a project on the Projects page, which you can access via the navigation pane.
Creating your dataset
To create your leaf disease detection model, you first need to create a dataset to train the model with. For this post, our dataset is composed of three categories of plant leaf disease images: bacterial leaf blight, brown spots, and leaf smut.
The following images show examples of bacterial leaf blight.
The following images show examples of brown spots.
The following images show examples of leaf smut.
We sourced our images from the UCI Machine Learning Repository (Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science). Citation: Prajapati, H.B., Shah, J.P., and Dabhi, V.K. Detection and classification of rice plant diseases. Intelligent Decision Technologies. 2017;11(3):357–373. doi: 10.3233/IDT-170301.
To create your dataset, complete the following steps:
- Create an Amazon Simple Storage Service (Amazon S3) bucket.
For this post, I create an S3 bucket called plant-leaf-disease-data.
- Create three folders inside this bucket called Bacterial-Leaf-Blight, Brown-Spot, and Leaf-Smut to store images of each disease category.
- Upload each category of image files to its respective folder.
- On the Amazon Rekognition console, under Datasets, choose Create dataset.
- Select Import images from Amazon S3 bucket.
- For S3 folder location, enter the S3 bucket path.
- For automatic labeling, select Automatically attach a label to my images based on the folder they’re stored in.
This labels each image with the name of the folder it’s stored in.
You can now see the generated S3 bucket permissions policy.
- Copy the JSON policy.
- Navigate to the S3 bucket.
- On the Permission tab, under Bucket policy, choose Edit.
- Enter the JSON policy you copied.
- Choose Save changes.
- Choose Submit.
You can see that image labeling is organized based on the folder name.
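If you’d rather script the upload step, a minimal boto3 sketch might look like the following (it assumes the three category folders exist in your working directory and uses the bucket name from this post):

import os

import boto3

s3 = boto3.client("s3")

BUCKET = "plant-leaf-disease-data"  # bucket created earlier in this post
CATEGORIES = ["Bacterial-Leaf-Blight", "Brown-Spot", "Leaf-Smut"]

for category in CATEGORIES:
    # Upload every image in the local category folder to the matching S3
    # prefix; automatic labeling then uses the folder name as the label.
    for filename in os.listdir(category):
        s3.upload_file(
            Filename=os.path.join(category, filename),
            Bucket=BUCKET,
            Key=f"{category}/{filename}",
        )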
Training your model
After you label your images, you’re ready to train your model.
- Choose Train Model.
- For Choose project, choose your project plant-leaf-disease-detection.
- For Choose training dataset, choose your dataset plant-leaf-disease-dataset.
As part of model training, Amazon Rekognition Custom Labels requires a labeled test dataset, which it uses to verify how well your trained model predicts the correct labels and to generate evaluation metrics. Images in the test dataset are not used to train your model and should be representative of the images you want to analyze with it.
- For Create test set, select how you want to create your test dataset.
Amazon Rekognition Custom Labels provides three options:
- Choose an existing test dataset
- Create a new test dataset
- Split training dataset
For this post, we select Split training dataset and let Amazon Rekognition hold back 20% of the images for testing and use the remaining 80% of the images to train the model.
Our model took approximately 1 hour to train. The training time required for your model depends on many factors, including the number of images provided in the dataset and the complexity of the model.
When training is complete, Amazon Rekognition Custom Labels outputs key quality metrics, including F1 score, precision, recall, and the assumed threshold for each label. For more information about metrics, see Metrics for Evaluating Your Model.
Our evaluation results show that our model has a precision of 1.0 for Bacterial-Leaf-Blight and Brown-Spot, which means that no objects were mistakenly identified (false positives) in our test set. Our model also didn’t miss any objects in our test set (false negatives), which is reflected in our recall score of 1.0. You can often use the F1 score as an overall quality score because it takes both precision and recall into account. Finally, we see that the assumed threshold used to generate the F1 score, precision, and recall metrics for each category is 0.62, 0.69, and 0.54 for Bacterial-Leaf-Blight, Brown-Spot, and Leaf-Smut, respectively. By default, our model returns predictions above this assumed threshold.
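Because the F1 score is the harmonic mean of precision and recall, perfect precision and recall yield a perfect F1 score. A quick Python check illustrates the relationship:

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With precision and recall both 1.0 (as for Bacterial-Leaf-Blight and
# Brown-Spot in our test set), F1 is also 1.0.
print(f1_score(1.0, 1.0))  # 1.0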
We can also choose View test results to see how our model performed on each test image. The following screenshot shows an example of a correctly identified image of bacterial leaf blight during the model testing (true positive).
Testing your model
Your plant disease detection model is now ready for use. Amazon Rekognition Custom Labels provides the API calls for starting, using, and stopping your model; you don’t need to manage any infrastructure. For more information, see Starting or Stopping an Amazon Rekognition Custom Labels Model (Console).
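For reference, a minimal boto3 sketch of that start/use/stop lifecycle might look like the following. The model ARN, bucket, and image key are placeholders; copy the real ARN from your project on the Amazon Rekognition Custom Labels console:

import boto3

rekognition = boto3.client("rekognition")

# Placeholder ARN -- copy the real one from your trained model's details page.
MODEL_ARN = (
    "arn:aws:rekognition:us-east-1:123456789012:project/"
    "plant-leaf-disease-detection/version/plant-leaf-disease-detection.2021-04-12/1"
)

# Start the model with one inference unit; you're billed while it runs.
rekognition.start_project_version(ProjectVersionArn=MODEL_ARN, MinInferenceUnits=1)
# Starting is asynchronous: wait until the model status is RUNNING (for
# example, with the project_version_running waiter) before detecting.

# Detect the disease in a leaf image stored in S3 (placeholder key).
response = rekognition.detect_custom_labels(
    ProjectVersionArn=MODEL_ARN,
    Image={"S3Object": {"Bucket": "plant-leaf-disease-data", "Name": "test/leaf.jpg"}},
    MinConfidence=50,
)
for label in response["CustomLabels"]:
    print(label["Name"], label["Confidence"])

# Stop the model when you're done to stop accruing inference charges.
rekognition.stop_project_version(ProjectVersionArn=MODEL_ARN)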
In addition to using the API, you can also use the Custom Labels Demonstration. This CloudFormation template enables you to set up a custom, password-protected UI where you can start and stop your models and run demonstration inferences.
Once deployed, you can access the application in a web browser at the address specified in the url output of the CloudFormation stack created during deployment of the solution.
- Choose Start the model.
- Provide the number of inference units required. For this example, we use a value of 1.
You’re charged for the amount of time, in minutes, that the model is running. For more information, see Inference hours.
It might take a while to start.
- Choose the model name.
- Choose Upload.
A window opens for you to choose the plant leaf image from your local drive.
The model detects the disease in the uploaded leaf image along with a confidence score. It also gives a pest control recommendation based on the type of disease.
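The recommendation itself is ordinary application logic layered on top of the model output. A minimal sketch follows, mapping each detected label to a treatment suggestion; the suggestions shown are illustrative placeholders, not agronomic advice:

# Illustrative mapping from detected disease label to a treatment
# recommendation; real recommendations would come from domain experts.
RECOMMENDATIONS = {
    "Bacterial-Leaf-Blight": "Use certified disease-free seed and copper-based bactericides.",
    "Brown-Spot": "Correct soil nutrient deficiencies and apply an appropriate fungicide.",
    "Leaf-Smut": "Remove infected crop debris and rotate crops.",
}

def recommend(custom_labels: list) -> str:
    """Return a recommendation for the highest-confidence detected label."""
    if not custom_labels:
        return "No disease detected."
    top = max(custom_labels, key=lambda label: label["Confidence"])
    return RECOMMENDATIONS.get(top["Name"], "Unknown disease; consult an expert.")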
Cleaning up
To avoid incurring unnecessary charges, delete the resources used in this walkthrough when they’re not in use, including the Amazon Rekognition Custom Labels project and the S3 bucket. For instructions, see the Amazon Rekognition Custom Labels and Amazon S3 documentation.
Conclusion
In this post, we showed you how to create an object detection model with Amazon Rekognition Custom Labels. This feature makes it easy to train a custom model that can detect an object class without needing to specify other objects or losing accuracy in its results.
For more information about using custom labels, see What Is Amazon Rekognition Custom Labels?
About the Authors
Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.
Sameer Goel is a Solutions Architect in Seattle, who drives customer success by building prototypes on cutting-edge initiatives. Prior to joining AWS, Sameer graduated with a master’s degree from NEU Boston, with a concentration in data science. He enjoys building and experimenting with AI/ML projects on Raspberry Pi.
Amazon’s robot arms break ground in safety and technology
While these systems look like other robot arms, they embed advanced technologies that will shape Amazon’s robot fleet for years to come.
Join AWS at NVIDIA GTC 21, April 12–16
Starting Monday, April 12, 2021, the NVIDIA GPU Technology Conference (GTC) is offering online sessions where you can learn AWS best practices for accomplishing your machine learning (ML), virtual workstation, high performance computing (HPC), and Internet of Things (IoT) goals faster and more easily.
Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training, cost-effective ML inference, flexible remote virtual workstations, and powerful HPC computations. At the edge, you can use AWS IoT Greengrass and Amazon SageMaker Neo to extend a wide range of AWS Cloud services and ML inference to NVIDIA-based edge devices so the devices can act locally on the data they generate.
AWS is a Global Diamond Sponsor of the conference.
Available sessions
ML infrastructure:
- A Developer’s Guide to Choosing the Right GPUs for Deep Learning (Presented by Amazon Web Services, Inc.) [SS33025]
- A Developer’s Guide to Improving GPU Utilization and Reducing Deep Learning Costs (Presented by Amazon Web Services, Inc.) [SS33093]
- Analyzing Traffic Video Streams at Scale Using NVIDIA AI Software and NVIDIA A100-Powered AWS Instances [S32002]
- Unlocking the Power of AI in Latin America through Developer Communities [S32508]
ML with Amazon SageMaker:
- Model and Data Parallelism at Scale to Train Models with Billions of Parameters on Amazon SageMaker with NVIDIA GPUs [S31655]
- Achieve Best Inference Performance on NVIDIA GPUs by Combining TensorRT with TVM Compilation Using SageMaker Neo [S32214]
- 12x Reduction in Deep Learning Training Cost at Deepset by Using Accelerated Tensor Core-Powered GPU Instances on Amazon SageMaker [S31541]
- RAPIDS on AWS SageMaker: Scaling End-to-End Explainable Machine Learning Workflows [S31486]
ML deep dive:
- Advancing the State of the Art in AutoML, Now 10x Faster with NVIDIA RAPIDS [S31521]
- Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks [S31413]
- Automatically Build Machine Learning Models for Vision and Text with AutoGluon [S31667]
- Dive into Deep Learning: Code Side-by-Side with MXNet, PyTorch, and TensorFlow [S31692]
- Accelerate the Bridging of ML and DL with NVIDIA-Accelerated Apache MXNet 2.0 [S31746]
- Standardizing on an Array API for Python Across Deep Learning Frameworks [S31798]
- DGL-KE: Training Knowledge Graph Embeddings at Scale [S31490]
- Accelerate Drug Discovery with Multitask Graph Neural Networks [S31477]
- Deep Learning in Scala on Spark 3.0 with GPU on AWS [S32285]
High performance computing:
Internet of Things:
- Building Image and Video Inference Edge Applications with AWS Greengrass V2 on Jetson Devices [S31855]
- Video Analytics Pipeline Development from Edge to Cloud [S32143]
Edge computing with AWS Wavelength:
- Accelerating VR Adoption Using 5G Edge Computing [S31606]
- XR Streaming from 5G Mobile Edge Using AWS Wavelength and NVIDIA CloudXR SDK [S32031]
- Securing the Integrity of the CV2X Messages using Mobile Edge Compute (Presented by Amazon Web Services, Inc) [SS33228]
Automotive:
- Accelerating AV Development – Cloud based Innovation, Economics, and Efficiencies [S31722]
- How Renault Challenges Physical Mockups by Distributing Rendering on 4,000 GPUs [E31274]
Computer vision with AWS Panorama:
- Computer vision at the edge, with AWS Panorama (Presented by Amazon Web Services, Inc.) [SS33117]
- Lenovo’s ThinkEdge Portfolio Expansion Powered by NVIDIA Jetson (Presented by Lenovo) [SS33267]
Game tech:
- Next-Gen Game Development and Collaboration in the Cloud [S31650]
- 4K 60fps Cloud Gaming and Digital Content Creation Interactive Streaming with NICE DCV and Amazon EC2 G4dn Instances (Presented by Amazon Web Services, Inc.) [SS33013]
Visit AWS at NVIDIA GTC 21 for more details and register for free to access this content during the week of April 12, 2021. See you there!
About the Author
Geoff Murase is a Senior Product Marketing Manager for AWS EC2 accelerated computing instances, helping customers meet their compute needs by providing access to hardware-based compute accelerators such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). In his spare time, he enjoys playing basketball and biking with his family.
iNaturalist opens up a wealth of nature data — and computer vision challenges
Amazon Machine Learning Research Award recipient utilizes a combination of people and machine learning models to illuminate the planet’s incredible biodiversity.
Build a CI/CD pipeline for deploying custom machine learning models using AWS services
Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality ML artifacts. AWS Serverless Application Model (AWS SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, event source mappings, steps in AWS Step Functions, and more.
Generally, ML workflows orchestrate and automate sequences of ML tasks. A workflow includes data collection, training, testing, human evaluation of the ML model, and deployment of the models for inference.
For continuous integration and continuous delivery (CI/CD) pipelines, AWS recently released Amazon SageMaker Pipelines, the first purpose-built, easy-to-use CI/CD service for ML. Pipelines is a native workflow orchestration tool for building ML pipelines that takes advantage of direct SageMaker integration. For more information, see Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.
In this post, I show you an extensible way to automate and deploy custom ML models using a CI/CD pipeline built on service integrations between Amazon SageMaker, Step Functions, and AWS SAM.
To build this pipeline, you also need to be familiar with the following AWS services:
- AWS CodeBuild – A fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy
- AWS CodePipeline – A fully managed continuous delivery service that helps you automate your release pipelines
- Amazon Elastic Container Registry (Amazon ECR) – A container registry
- AWS Lambda – A service that lets you run code without provisioning or managing servers. You pay only for the compute time you consume
- Amazon Simple Storage Service (Amazon S3) – An object storage service that offers industry-leading scalability, data availability, security, and performance
- AWS Step Functions – A serverless function orchestrator that makes it easy to sequence AWS Lambda functions and multiple AWS services
Solution overview
The solution has two main sections:
- Use AWS SAM to create a Step Functions workflow with SageMaker – Step Functions recently announced native service integrations with SageMaker. You can use this feature to train ML models, deploy ML models, test results, and expose an inference endpoint. This feature also provides a way to wait for human approval before the state transitions can progress towards the final ML model inference endpoint’s configuration and deployment.
- Deploy the model with a CI/CD pipeline – One of the requirements of SageMaker is that the source code of custom models needs to be stored as a Docker image in an image registry such as Amazon ECR. SageMaker then references this Docker image for training and inference. For this post, we create a CI/CD pipeline using CodePipeline and CodeBuild to build, tag, and upload the Docker image to Amazon ECR and then start the Step Functions workflow to train and deploy the custom ML model on SageMaker, which references this tagged Docker image.
The following diagram describes the general overview of the MLOps CI/CD pipeline.
The workflow includes the following steps:
- The data scientist works on developing custom ML model code using their local notebook or a SageMaker notebook. They commit and push changes to a source code repository.
- A webhook on the code repository triggers a CodePipeline build in the AWS Cloud.
- CodePipeline downloads the source code and starts the build process.
- CodeBuild downloads the necessary source files and starts running commands to build and tag a local Docker container image.
- CodeBuild pushes the container image to Amazon ECR. The container image is tagged with a unique label derived from the repository commit hash.
- CodePipeline invokes Step Functions and passes the container image URI and the unique container image tag as parameters to Step Functions (see the sketch after this list).
- Step Functions starts a workflow by initially calling the SageMaker training job and passing the necessary parameters.
- SageMaker downloads the necessary container image and starts the training job. When the job is complete, Step Functions directs SageMaker to create a model and store the model in the S3 bucket.
- Step Functions starts a SageMaker batch transform job on the test data provided in the S3 bucket.
- When the batch transform job is complete, Step Functions sends an email to the user using Amazon Simple Notification Service (Amazon SNS). This email includes the details of the batch transform job and links to the test data prediction outcome stored in the S3 bucket. After sending the email, Step Functions enters a manual wait phase.
- The email sent by Amazon SNS has links to either accept or reject the test results. The recipient can manually look at the test data prediction outcomes in the S3 bucket. If they’re not satisfied with the results, they can reject the changes to cancel the Step Functions workflow.
- If the recipient accepts the changes, an Amazon API Gateway endpoint invokes a Lambda function with an embedded token that references the waiting Step Functions step.
- The Lambda function calls Step Functions to continue the workflow.
- Step Functions resumes the workflow.
- Step Functions creates a SageMaker endpoint config and a SageMaker inference endpoint.
- When the workflow is successful, Step Functions sends an email with a link to the final SageMaker inference endpoint.
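For reference, step 6 of this workflow boils down to a start_execution call whose input carries the two values the state machine later reads as $.commitID and $.imageUri. A minimal boto3 sketch (the state machine ARN and parameter values are placeholders) might look like this:

import json

import boto3

sf = boto3.client("stepfunctions")

# Placeholder values: the state machine ARN comes from the CloudFormation
# output MLOpsStateMachineArn, and the tag/URI come from the CodeBuild stage.
sf.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:MLOpsStateMachine",
    input=json.dumps({
        "commitID": "3f2a9c1",  # unique container image tag
        "imageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/custom-model:3f2a9c1",
    }),
)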
Use AWS SAM to create a Step Functions workflow with SageMaker
In this first section, you visualize the Step Functions ML workflow easily in Visual Studio Code and deploy it to the AWS environment using AWS SAM. You use some of the new features and service integrations such as support in AWS SAM for AWS Step Functions, native support in Step Functions for SageMaker integrations, and support in Step Functions to visualize workflows directly in VS Code.
Prerequisites
Before getting started, make sure you complete the following prerequisites:
- Install and configure the AWS Command Line Interface (AWS CLI)
- Install the AWS SAM CLI
- Install Visual Studio Code
- Install the AWS Toolkit extension for VS Code
Deploy the application template
To get started, follow the instructions on GitHub to complete the application setup. Alternatively, you can switch to the terminal and enter the following command:
git clone https://github.com/aws-samples/sam-sf-sagemaker-workflow.git
The directory structure should be as follows:
sam-sf-sagemaker-workflow
├── cfn
│   └── sam-template.yaml
├── functions
│   ├── api_sagemaker_endpoint
│   ├── create_and_email_accept_reject_links
│   ├── respond_to_links
│   └── update_sagemakerEndpoint_API
├── scripts
└── statemachine
    └── mlops.asl.json
The code is organized into subfolders: the main AWS SAM template resides at cfn/sam-template.yaml, the Step Functions workflow definition is stored in statemachine/mlops.asl.json, and the Lambda functions are in the functions folder.
To start with the AWS SAM template, run the following bash scripts from the root folder:
#Create S3 buckets if required before executing the commands.
S3_BUCKET=bucket-mlops #bucket to store AWS SAM template
S3_BUCKET_MODEL=ml-models #bucket to store ML models
STACK_NAME=sam-sf-sagemaker-workflow #Name of the AWS SAM stack
sam build -t cfn/sam-template.yaml #AWS SAM build
sam deploy --template-file .aws-sam/build/template.yaml \
  --stack-name ${STACK_NAME} --force-upload \
  --s3-bucket ${S3_BUCKET} --s3-prefix sam \
  --parameter-overrides S3ModelBucket=${S3_BUCKET_MODEL} \
  --capabilities CAPABILITY_IAM
The sam build command builds all the functions and creates the final AWS CloudFormation template. The sam deploy command uploads the necessary files to the S3 bucket and starts creating or updating the CloudFormation stack that provisions the necessary AWS infrastructure.
When the stack has been created successfully, go to the AWS CloudFormation console. On the Outputs tab, copy the MLOpsStateMachineArn value to use later.
The following diagram shows the workflow carried out in Step Functions, using VS Code integrations with Step Functions.
The following Amazon States Language (JSON) snippet describes the workflow visualized in the preceding diagram.
{
"Comment": "This Step Function starts machine learning pipeline, once the custom model has been uploaded to ECR. Two parameters are expected by Step Functions are git commitID and the sagemaker ECR custom container URI",
"StartAt": "SageMaker Create Training Job",
"States": {
"SageMaker Create Training Job": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
"Parameters": {
"TrainingJobName.$": "$.commitID",
"ResourceConfig": {
"InstanceCount": 1,
"InstanceType": "ml.c4.2xlarge",
"VolumeSizeInGB": 20
},
"HyperParameters": {
"mode": "batch_skipgram",
"epochs": "5",
"min_count": "5",
"sampling_threshold": "0.0001",
"learning_rate": "0.025",
"window_size": "5",
"vector_dim": "300",
"negative_samples": "5",
"batch_size": "11"
},
"AlgorithmSpecification": {
"TrainingImage.$": "$.imageUri",
"TrainingInputMode": "File"
},
"OutputDataConfig": {
"S3OutputPath": "s3://${S3ModelBucket}/output"
},
"StoppingCondition": {
"MaxRuntimeInSeconds": 100000
},
"RoleArn": "${SagemakerRoleArn}",
"InputDataConfig": [
{
"ChannelName": "training",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://${S3ModelBucket}/iris.csv",
"S3DataDistributionType": "FullyReplicated"
}
}
}
]
},
"Retry": [
{
"ErrorEquals": [
"SageMaker.AmazonSageMakerException"
],
"IntervalSeconds": 1,
"MaxAttempts": 1,
"BackoffRate": 1.1
},
{
"ErrorEquals": [
"SageMaker.ResourceLimitExceededException"
],
"IntervalSeconds": 60,
"MaxAttempts": 1,
"BackoffRate": 1
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.cause",
"Next": "FailState"
}
],
"Next": "SageMaker Create Model"
},
"SageMaker Create Model": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createModel",
"Parameters": {
"ExecutionRoleArn": "${SagemakerRoleArn}",
"ModelName.$": "$.TrainingJobName",
"PrimaryContainer": {
"ModelDataUrl.$": "$.ModelArtifacts.S3ModelArtifacts",
"Image.$": "$.AlgorithmSpecification.TrainingImage"
}
},
"ResultPath": "$.taskresult",
"Next": "SageMaker Create Transform Job",
"Catch": [
{
"ErrorEquals": ["States.ALL" ],
"Next": "FailState"
}
]
},
"SageMaker Create Transform Job": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
"Parameters": {
"ModelName.$": "$.TrainingJobName",
"TransformInput": {
"SplitType": "Line",
"CompressionType": "None",
"ContentType": "text/csv",
"DataSource": {
"S3DataSource": {
"S3DataType": "S3Prefix",
"S3Uri": "s3://${S3ModelBucket}/iris.csv"
}
}
},
"TransformOutput": {
"S3OutputPath.$": "States.Format('s3://${S3ModelBucket}/transform_output/{}/iris.csv', $.TrainingJobName)" ,
"AssembleWith": "Line",
"Accept": "text/csv"
},
"DataProcessing": {
"InputFilter": "$[1:]"
},
"TransformResources": {
"InstanceCount": 1,
"InstanceType": "ml.m4.xlarge"
},
"TransformJobName.$": "$.TrainingJobName"
},
"ResultPath": "$.result",
"Next": "Send Approve/Reject Email Request",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "FailState"
}
]
},
"Send Approve/Reject Email Request": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
"Parameters": {
"FunctionName": "${CreateAndEmailLinkFnName}",
"Payload": {
"token.$":"$$.Task.Token",
"s3_batch_output.$":"$.result.TransformOutput.S3OutputPath"
}
},
"ResultPath": "$.output",
"Next": "Sagemaker Create Endpoint Config",
"Catch": [
{
"ErrorEquals": [ "rejected" ],
"ResultPath": "$.output",
"Next": "FailState"
}
]
},
"Sagemaker Create Endpoint Config": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createEndpointConfig",
"Parameters": {
"EndpointConfigName.$": "$.TrainingJobName",
"ProductionVariants": [
{
"InitialInstanceCount": 1,
"InitialVariantWeight": 1,
"InstanceType": "ml.t2.medium",
"ModelName.$": "$.TrainingJobName",
"VariantName": "AllTraffic"
}
]
},
"ResultPath": "$.result",
"Next": "Sagemaker Create Endpoint",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "FailState"
}
]
},
"Sagemaker Create Endpoint": {
"Type": "Task",
"Resource": "arn:aws:states:::sagemaker:createEndpoint",
"Parameters": {
"EndpointName.$": "$.TrainingJobName",
"EndpointConfigName.$": "$.TrainingJobName"
},
"Next": "Send Email With API Endpoint",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "FailState"
}
]
},
"Send Email With API Endpoint": {
"Type": "Task",
"Resource": "${UpdateSagemakerEndpointAPI}",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "FailState"
}
],
"Next": "SuccessState"
},
"SuccessState": {
"Type": "Succeed"
},
"FailState": {
"Type": "Fail"
}
}
}
Step Functions process to create the SageMaker workflow
In this section, we discuss the detailed steps involved in creating the SageMaker workflow using Step Functions.
Step Functions uses the commit ID passed by CodePipeline as a unique identifier to create a SageMaker training job. The training job can sometimes take a long time to complete; to wait for the job, you use .sync when specifying the resource of the SageMaker training job.
When the training job is complete, Step Functions creates a model and saves the model in an S3 bucket.
Step Functions then uses a batch transform step to evaluate and test the model, based on batch data initially provided by the data scientist in an S3 bucket. When the evaluation step is complete, the output is stored in an S3 bucket.
Step Functions then enters a manual approval stage. To create this state, you use callback URLs: use .waitForTaskToken while calling a Lambda resource and pass a token to the Lambda function.
The Lambda function uses Amazon SNS or Amazon Simple Email Service (Amazon SES) to send an email to the subscribed party. You need to add your email address to the SNS topic to receive the accept/reject email while testing.
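A minimal sketch of such an emailing function might look like the following; the environment variable names are illustrative assumptions rather than the repo’s exact code. The type=success value in the accept link matches what the responding Lambda function (shown later) checks before calling send_task_success:

import os
import urllib.parse

import boto3

sns = boto3.client("sns")

def lambda_handler(event, context):
    # Step Functions passes the task token and the S3 output location in the
    # payload (see the state machine definition above).
    token = urllib.parse.quote(event["token"])
    base_url = os.environ["API_BASE_URL"]      # illustrative env var
    accept = f"{base_url}?type=success&token={token}"
    reject = f"{base_url}?type=fail&token={token}"
    sns.publish(
        TopicArn=os.environ["SNS_TOPIC_ARN"],  # illustrative env var
        Subject="Approve or reject the batch transform results",
        Message=(
            f"Batch transform output: {event['s3_batch_output']}\n"
            f"Accept: {accept}\nReject: {reject}"
        ),
    )
    # No task response here; the workflow waits until a link is clicked.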
You receive an email, as in the following screenshot, with links to the data stored in the S3 bucket. This data has been batch transformed using the custom ML model created in the earlier step by SageMaker. You can choose Accept or Reject based on your findings.
If you choose Reject, Step Functions stops running the workflow. If you’re satisfied with the results, choose Accept, which triggers the API link. This link passes the embedded token and type to the API Gateway or Lambda endpoint as request parameters to progress to the next Step Functions step.
See the following Python code:
import json

import boto3

sf = boto3.client('stepfunctions')

def lambda_handler(event, context):
    # The accept/reject link passes 'type' and the Step Functions task token
    # as query string parameters through API Gateway.
    type = event.get('queryStringParameters').get('type')
    token = event.get('queryStringParameters').get('token')
    if type == 'success':
        # Resume the waiting state machine.
        sf.send_task_success(
            taskToken=token,
            output="{}"
        )
    else:
        # Fail the waiting task, which stops the workflow.
        sf.send_task_failure(
            taskToken=token
        )
    return {
        'statusCode': 200,
        'body': json.dumps('Responded to Step Function')
    }
Step Functions then creates the final unique SageMaker endpoint configuration and inference endpoint. You can achieve this in Step Functions using special resource values, as shown in the preceding state machine definition.
When the SageMaker endpoint is ready, an email is sent to the subscriber with a link to the API of the SageMaker inference endpoint.
Deploy the model with a CI/CD pipeline
In this section, you use the CI/CD pipeline to deploy a custom ML model.
The pipeline starts its run as soon as it detects updates to the source code of the custom model. The pipeline downloads the source code from the repository, builds and tags the Docker image, and uploads the Docker image to Amazon ECR. After uploading the Docker image, the pipeline triggers the Step Functions workflow to train and deploy the custom model to SageMaker. Finally, the pipeline sends an email to the specified users with details about the SageMaker inference endpoint.
We use Scikit Bring Your Own Container to build a custom container image and use the iris dataset to train and test the model.
When your Step Functions workflow is ready, build your full pipeline using the code provided in the GitHub repo.
After you download the code from the repo, the directory structure should look like the following:
codepipeline-ecr-build-sf-execution
├── cfn
│   ├── params.json
│   └── pipeline-cfn.yaml
├── container
│   ├── decision_trees
│   ├── local_test
│   ├── .dockerignore
│   └── Dockerfile
└── scripts
In the params.json file in the /cfn folder, provide your GitHub token, the repo name, and the ARN of the Step Functions state machine you created earlier.
You now create the necessary services and resources for the CI/CD pipeline. To create the CloudFormation stack, run the following code:
aws cloudformation create-stack --stack-name codepipeline-ecr-build-sf-execution --template-body file://cfn/pipeline-cfn.yaml --parameters file://cfn/params.json --capabilities CAPABILITY_NAMED_IAM
Alternatively, to update the stack, run the following code:
aws cloudformation update-stack --stack-name codepipeline-ecr-build-sf-execution --template-body file://cfn/pipeline-cfn.yaml --parameters file://cfn/params.json --capabilities CAPABILITY_NAMED_IAM
The CloudFormation template deploys a CodePipeline pipeline into your AWS account. The pipeline starts running as soon as code changes are committed to the repo. After the source code is downloaded by the pipeline stage, CodeBuild creates a Docker image and tags it with the commit ID and current timestamp before pushing the image to Amazon ECR. CodePipeline moves to the next stage to trigger a Step Functions step (which you created earlier).
When the Step Functions workflow is complete, a final email is generated with a link to the API Gateway URL that references the newly created SageMaker inference endpoint.
Test the workflow
To test your workflow, complete the following steps:
- Start the CodePipeline build by committing a code change to the codepipeline-ecr-build-sf-execution/container folder.
- On the CodePipeline console, check that the pipeline is transitioning through the different stages as expected.
When the pipeline reaches its final state, it starts the Step Functions workflow, which sends an email for approval.
- Approve the email to continue the Step Functions workflow.
When the SageMaker endpoint is ready, you should receive another email with a link to the API inference endpoint.
To test the iris dataset, you can try sending a single data point to the inference endpoint.
- Copy the inference endpoint link from the email and assign it to the bash variable INFERENCE_ENDPOINT as shown in the following code, then use curl to send single data points to the inference endpoint:
INFERENCE_ENDPOINT=https://XXXX.execute-api.us-east-1.amazonaws.com/v1/invokeSagemakerAPI?sagemaker_endpoint=d236eba5-09-03-2020-18-29-15

curl --location --request POST ${INFERENCE_ENDPOINT} \
  --header 'Content-Type: application/json' \
  --data-raw '{"data": "4.5,1.3,0.3,0.3"}'
{"result": "setosa"}

curl --location --request POST ${INFERENCE_ENDPOINT} \
  --header 'Content-Type: application/json' \
  --data-raw '{"data": "5.9,3,5.1,1.8"}'
{"result": "virginica"}
By sending different data, we get different sets of inference results back.
Clean up
To avoid ongoing charges, delete the resources created in the previous steps by deleting the CloudFormation templates. Additionally, on the SageMaker console, delete any unused models, endpoint configurations, and inference endpoints.
Conclusion
This post demonstrated how to create an ML pipeline for custom SageMaker ML models using some of the latest AWS service integrations.
You can extend this ML pipeline further by adding a layer of authentication and encryption while sending approval links. You can also add more steps to CodePipeline or Step Functions as deemed necessary for your project’s workflow.
The sample files are available in the GitHub repo. For related SageMaker features and further reading, see the Amazon SageMaker documentation.
About the Author
Sachin Doshi is a Senior Application Architect working in the AWS Professional Services team. He is based in the New York metropolitan area. Sachin helps customers optimize their applications using cloud native AWS services.
The science behind SageMaker’s cost-saving Debugger
New tool can spot problems — such as overfitting and vanishing gradients — that prevent machine learning models from learning.
How a paper by three Oxford academics influenced AWS bias and explainability software
Why conditional demographic disparity matters for developers using SageMaker Clarify.
Alexa: The science must go on
Throughout the pandemic, the Alexa team has continued to invent on behalf of our customers.