Amazon Rekognition is a computer vision service that makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning (ML) expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of use cases.
Amazon Rekognition Custom Labels allows you to identify the objects and scenes in images that are specific to your business needs. For example, you can find your logo in social media posts, identify your products on store shelves, classify machine parts in an assembly line, distinguish healthy and infected plants, and more. The blog post Building your own brand detection shows how to use Amazon Rekognition Custom Labels to build an end-to-end solution to detect brand logos in images and videos.
Amazon Rekognition Custom Labels provides a simple end-to-end experience where you start by labeling a dataset, and Amazon Rekognition Custom Labels builds a custom ML model for you by inspecting the data and selecting the right ML algorithm. After your model is trained, you can start using it immediately for image analysis. If you want to process images in batches (such as once a day or week, or at scheduled times during the day), you can provision your custom model at scheduled times.
In this post, we show how you can build a cost-optimal batch solution with Amazon Rekognition Custom Labels that provisions your custom model at scheduled times, processes all your images, and deprovisions your resources to avoid incurring extra cost.
Overview of solution
The following architecture diagram shows how you can design a cost-effective and highly scalable workflow to process images in batches with Amazon Rekognition Custom Labels. It takes advantage of AWS services such as Amazon EventBridge, AWS Step Functions, Amazon Simple Queue Service (Amazon SQS), AWS Lambda, and Amazon Simple Storage Service (Amazon S3).
This solution uses a serverless architecture and managed services, so it can scale on demand and doesn’t require provisioning and managing any servers. The Amazon SQS queue increases the overall fault tolerance of the solution by decoupling image ingestion from the image processing and enabling reliable delivery of messages for each ingested image. Step Functions makes it easy to build visual workflows to orchestrate a series of individual tasks, such as checking if an image is available for processing and managing the state lifecycle of the Amazon Rekognition Custom Labels project. Although the following architecture shows how you can build a batch processing solution for Amazon Rekognition Custom Labels using AWS Lambda, you can build a similar architecture using services such as AWS Fargate.
The following steps describe the overall workflow:
- When an image is stored in the Amazon S3 bucket, it triggers a message that is stored in an Amazon SQS queue.
- Amazon EventBridge is configured to trigger an AWS Step Functions workflow at a certain frequency (1 hour by default).
- As the workflow runs, it performs the following actions:
  - It checks the number of items in the Amazon SQS queue. If there are no items to process in the queue, the workflow ends.
  - If there are items to process in the queue, the workflow starts the Amazon Rekognition Custom Labels model.
  - The workflow enables Amazon SQS integration with an AWS Lambda function to process those images.
- As the integration between the Amazon SQS queue and AWS Lambda is enabled, the following events occur:
  - AWS Lambda starts processing messages with the image details from Amazon SQS.
  - The AWS Lambda function uses the Amazon Rekognition Custom Labels project to process the images.
  - The AWS Lambda function then places the JSON file containing the inferred labels in the final bucket. The image is also moved from the source bucket to the final bucket.
- When all the images are processed, the AWS Step Functions workflow does the following:
  - It stops the Amazon Rekognition Custom Labels model.
  - It disables integration between the Amazon SQS queue and the AWS Lambda function by disabling the trigger.
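To make the image-processing steps more concrete, the following is a minimal sketch of what such a Lambda function could look like. It is not the code from the solution's GitHub repo. It assumes that each SQS message body is a standard S3 event notification, and the helper names (`extract_s3_objects`, `process_image`) and the environment variables `PROJECT_VERSION_ARN` and `FINAL_BUCKET` are hypothetical. The clients are passed in as parameters so the logic can be exercised without AWS access.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event):
    """Yield (bucket, key) pairs from an SQS-triggered Lambda event whose
    message bodies are S3 event notifications."""
    for record in sqs_event.get("Records", []):
        body = json.loads(record["body"])
        # S3 test events carry no "Records" key, so default to an empty list.
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            yield bucket, key


def process_image(rekognition, s3, bucket, key, project_version_arn, final_bucket):
    """Run inference on one image, store the labels as JSON, then move the image."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=project_version_arn,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
    )
    labels = response["CustomLabels"]
    s3.put_object(
        Bucket=final_bucket,
        Key=f"{key}.json",
        Body=json.dumps(labels).encode("utf-8"),
    )
    # S3 has no move operation: copy to the final bucket, then delete the source.
    s3.copy_object(
        Bucket=final_bucket, Key=key, CopySource={"Bucket": bucket, "Key": key}
    )
    s3.delete_object(Bucket=bucket, Key=key)
    return labels


def lambda_handler(event, context):
    # Imported here so the helpers above can be tested without boto3 installed.
    import os
    import boto3

    rekognition = boto3.client("rekognition")
    s3 = boto3.client("s3")
    for bucket, key in extract_s3_objects(event):
        process_image(
            rekognition, s3, bucket, key,
            os.environ["PROJECT_VERSION_ARN"],  # hypothetical variable name
            os.environ["FINAL_BUCKET"],         # hypothetical variable name
        )
```

Writing the JSON output before moving the image means that a failure partway through leaves the image in the source bucket, where it can be retried.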
The following diagram illustrates the AWS Step Functions state machine for this solution.
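The control flow that the state machine implements can also be approximated in plain Python. The sketch below is an illustration only: the actual solution expresses these steps as Step Functions states, and the function names, the `mapping_uuid` parameter (the identifier of the SQS-to-Lambda event source mapping), and the polling interval are assumptions. The clients are expected to be boto3 clients for Amazon SQS, Amazon Rekognition, and AWS Lambda.

```python
import time


def pending_messages(sqs, queue_url):
    """Approximate count of messages waiting in, or in flight from, the queue."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    # SQS returns attribute values as strings.
    return sum(int(value) for value in attrs.values())


def run_batch(sqs, rekognition, lambda_client, *, queue_url, project_arn,
              version_name, project_version_arn, mapping_uuid, poll_seconds=60):
    """One scheduled run. Returns False if the queue was empty, True otherwise."""
    if pending_messages(sqs, queue_url) == 0:
        return False  # nothing to process, so the model is never started

    rekognition.start_project_version(
        ProjectVersionArn=project_version_arn, MinInferenceUnits=1
    )
    try:
        # Block until the model version reports that it is running.
        rekognition.get_waiter("project_version_running").wait(
            ProjectArn=project_arn, VersionNames=[version_name]
        )
        # Enable the SQS trigger so the Lambda function starts consuming messages.
        lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=True)
        while pending_messages(sqs, queue_url) > 0:
            time.sleep(poll_seconds)
    finally:
        # Always release billable resources, even if a step above failed.
        rekognition.stop_project_version(ProjectVersionArn=project_version_arn)
        lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=False)
    return True
```

The early return is what keeps the solution cost-effective: when no images have arrived since the last run, the model is never started.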
Prerequisites
To deploy this solution, you need the following prerequisites:
- An AWS account with permission to deploy the solution using AWS CloudFormation, which creates AWS Identity and Access Management (IAM) roles and other resources.
- The Amazon Resource Name (ARN) of the Amazon Rekognition Custom Labels project (referred to as ProjectArn) and the ARN of the model version that was created after training the model (referred to as ProjectVersionArn). These values are required to check the status of the model and to analyze images using the model.
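As a rough illustration of how these two ARNs are used, the following sketch checks the status of a model version and analyzes a single image. The function names are hypothetical, and `rekognition` is assumed to be a boto3 Amazon Rekognition client (for example, `boto3.client("rekognition")`).

```python
def model_status(rekognition, project_arn, project_version_arn):
    """Return the status (for example "RUNNING" or "STOPPED") of one model version."""
    kwargs = {"ProjectArn": project_arn}
    while True:
        page = rekognition.describe_project_versions(**kwargs)
        for description in page["ProjectVersionDescriptions"]:
            if description["ProjectVersionArn"] == project_version_arn:
                return description["Status"]
        # Results are paginated; keep going while a continuation token is returned.
        if "NextToken" not in page:
            raise LookupError(f"{project_version_arn} not found in {project_arn}")
        kwargs["NextToken"] = page["NextToken"]


def analyze_image_bytes(rekognition, project_version_arn, image_bytes,
                        min_confidence=50.0):
    """Return (label name, confidence) pairs for an image supplied as raw bytes."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=project_version_arn,
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return [(label["Name"], label["Confidence"]) for label in response["CustomLabels"]]
```

Inference only succeeds while the model version is running, so a status check like this one is a useful guard before calling `detect_custom_labels`.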
To learn how to train a model, see Getting Started with Amazon Rekognition Custom Labels.
Deployment
To deploy the solution using AWS CloudFormation in your AWS account, follow the steps in the GitHub repo. The CloudFormation stack creates the following resources:
- Amazon S3 bucket
- Amazon SQS queue
- AWS Step Functions workflow
- Amazon EventBridge rules to trigger the workflow
- IAM roles
- AWS Lambda functions
You can see the names of the resources created by the solution in the Outputs section of the CloudFormation stack.
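If you prefer to read the stack outputs programmatically rather than on the console, a small helper along these lines may be useful. It is a sketch: `stack_outputs` is a hypothetical name, `cloudformation` is assumed to be a boto3 CloudFormation client, and the output keys depend on the template in the GitHub repo.

```python
def stack_outputs(cloudformation, stack_name):
    """Return the outputs of a CloudFormation stack as a {key: value} dict."""
    stack = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]
    # "Outputs" is absent when a stack defines no outputs.
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
    }
```

For example, `stack_outputs(boto3.client("cloudformation"), "my-stack")` would return the bucket and state machine names for a stack called `my-stack`, which is a placeholder for whatever name you chose at deployment.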
Testing the workflow
To test your workflow, complete the following steps:
- Upload sample images to the input S3 bucket that was created by the solution (for example, xxxx-sources3bucket-xxxx).
- On the Step Functions console, choose the state machine created by the solution (for example, CustomCVStateMachine-xxxx).
You should see that the state machine is triggered by the Amazon EventBridge rule every hour.
- You can manually start the workflow by choosing Start execution.
- As images are processed, you can go to the output S3 bucket (for example, xxxx-finals3bucket-xxxx) to see the JSON output for each image.
The following screenshot shows the contents of the final S3 bucket with the images, along with their corresponding JSON output from Amazon Rekognition Custom Labels.
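The same test can be driven from a script instead of the console. The following sketch uploads local images to the input bucket and starts the state machine manually. The function names are hypothetical, `s3` and `sfn` are assumed to be boto3 clients for Amazon S3 and AWS Step Functions, and the bucket name and state machine ARN come from your own stack.

```python
from pathlib import Path

# Amazon Rekognition Custom Labels analyzes JPEG and PNG images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def upload_images(s3, paths, bucket):
    """Upload the image files in `paths` to `bucket`; return the object keys used."""
    uploaded = []
    for path in map(Path, paths):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            # The file name is used as the object key.
            s3.upload_file(str(path), bucket, path.name)
            uploaded.append(path.name)
    return uploaded


def start_workflow(sfn, state_machine_arn):
    """Start one execution of the state machine and return its execution ARN."""
    return sfn.start_execution(stateMachineArn=state_machine_arn)["executionArn"]
```

Files with other extensions are skipped rather than uploaded, so stray files in a sample folder do not end up in the queue.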
Conclusion
In this post, we showed how you can build a cost-optimal batch solution with Amazon Rekognition Custom Labels that can provision your custom model at scheduled times, process all your images, and deprovision your resources to avoid incurring extra cost. Depending on your use case, you can adjust the scheduled time window at which the solution processes the batch. For more information about how to create, train, evaluate, and use a model that detects objects, scenes, and concepts in images, see Getting Started with Amazon Rekognition Custom Labels.
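When adjusting the schedule, one detail to watch is the syntax of EventBridge rate expressions, which use a singular unit for a value of 1 and a plural unit otherwise. The helper below is a small sketch (the function name is hypothetical) that builds a valid expression. How the resulting string is applied depends on how the rule is defined in the solution's CloudFormation template.

```python
def rate_expression(value, unit):
    """Build an EventBridge rate expression such as "rate(1 hour)" or "rate(12 hours)".

    `unit` is given in the singular: "minute", "hour", or "day".
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    # EventBridge expects the singular unit for 1 and the plural for anything else.
    return f"rate({value} {unit if value == 1 else unit + 's'})"
```

For instance, `rate_expression(1, "hour")` matches the default hourly schedule described earlier, and `rate_expression(1, "day")` would run the batch once a day.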
Although the solution described in this post shows how you can process batch images with Amazon Rekognition Custom Labels, you can adapt it to process batch images with Amazon Lookout for Vision for defect and anomaly detection. With Amazon Lookout for Vision, manufacturing companies can increase quality and reduce operational costs by quickly identifying differences in images of objects at scale. For example, Amazon Lookout for Vision can be used to identify missing components in products, damage to vehicles or structures, irregularities in production lines, minuscule defects in silicon wafers, and other similar problems. To learn more about Amazon Lookout for Vision, see the developer guide.
About the Authors
Rahul Srivastava is a Senior Solutions Architect at Amazon Web Services and is based in the United Kingdom. He has extensive architecture experience working with large enterprise customers. He helps customers with architecture, cloud adoption, and developing products with a purpose, and helps them take advantage of AI/ML to solve real-world business problems.
Kashif Imran is a Principal Solutions Architect at Amazon Web Services. He works with some of the largest AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement computer vision applications at scale. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.