This is a guest post authored by Andrew Masek, Software Engineer at The Barcode Registry and Erik Quisling, CEO of The Barcode Registry.
Product counterfeiting is the single largest criminal enterprise in the world. Having grown more than 10,000% over the last two decades, sales of counterfeit goods now total $1.7 trillion per year worldwide, more than the markets for drugs and human trafficking. Although traditional counterfeit-prevention methods like unique barcodes and product verification can be very effective, newer machine learning (ML) technologies such as object detection show great promise. With object detection, you can now snap a picture of a product and know almost instantly whether that product is likely to be legitimate or fraudulent.
The Barcode Registry (in conjunction with its partner Buyabarcode.com) is a full-service solution that helps customers prevent product fraud and counterfeiting. It does this by selling unique GS1-registered barcodes, verifying product ownership, and registering users’ products and barcodes in a comprehensive database. Their latest offering, which we discuss in this post, uses Amazon SageMaker to create object detection models to help instantly recognize counterfeit products.
Overview of solution
To use these object detection models, you first need to collect data to train them. Companies upload annotated pictures of their products to The Barcode Registry website. After this data is uploaded to Amazon Simple Storage Service (Amazon S3) and processed by AWS Lambda functions, you can use it to train a SageMaker object detection model. The trained model is hosted on a SageMaker endpoint, which the website uses to serve predictions to end users.
There are three key steps The Barcode Registry uses to create a custom object detection model with SageMaker:
- Create a training script for SageMaker to run.
- Build a Docker container from the training script and upload it to Amazon ECR.
- Use the SageMaker console to train a model with the custom algorithm.
Product data
As a prerequisite, to train an object detection model you need an AWS account and training images: at least 100 high-quality pictures of your object, in high resolution and under multiple lighting conditions. As with any ML model, high-quality data is paramount. To train an object detection model, we need images containing the relevant products as well as bounding boxes describing where the products are within the images, as shown in the following example.
To train an effective model, pictures of each of a brand’s products with different backgrounds and lighting conditions are needed—approximately 30–100 unique annotated images for each product.
After the images are uploaded to the web server, they’re uploaded to Amazon S3 using the AWS SDK for PHP. A Lambda event is triggered each time an image is uploaded. The function removes the Exif metadata from the images, which can sometimes cause them to appear rotated when they’re opened by the ML libraries later used to train the model. The associated bounding box data is stored in JSON files and uploaded to Amazon S3 to accompany the images.
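A simplified sketch of what such a Lambda function might look like follows. It uses Pillow to re-save each image without Exif metadata; the `processed/` output prefix and bucket layout are assumptions for illustration, not The Barcode Registry's production implementation.

```python
import io
import os

import boto3
from PIL import Image, ImageOps

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by S3 object-created events for newly uploaded product images.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        # Apply the Exif orientation to the pixels, then re-save the image without
        # Exif metadata so training libraries don't see unexpectedly rotated images.
        image = ImageOps.exif_transpose(Image.open(io.BytesIO(original))).convert("RGB")
        buffer = io.BytesIO()
        image.save(buffer, format="JPEG")

        # Write to a separate prefix so this function isn't re-triggered by its own output.
        s3.put_object(
            Bucket=bucket,
            Key=f"processed/{os.path.basename(key)}",
            Body=buffer.getvalue(),
            ContentType="image/jpeg",
        )
```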
SageMaker for object detection models
SageMaker is a managed ML service that includes a variety of tools for building, training, and hosting models in the cloud. In particular, The Barcode Registry uses SageMaker for its object detection service because of SageMaker's reliable and scalable model training and hosting. This means that many brands can have their own object detection models trained and hosted, and even if usage spikes unpredictably, there won't be any downtime.
The Barcode Registry uses custom Docker containers uploaded to Amazon Elastic Container Registry (Amazon ECR) to gain fine-grained control over the object detection algorithm used for training and inference, as well as support for Multi Model Server (MMS). MMS is very important for the counterfeit detection use case because it allows multiple brands' models to be cost-effectively hosted on the same server. Alternatively, you can use the built-in object detection algorithm to quickly deploy standard models developed by AWS.
Train a custom object detection model with SageMaker
First, you need to add your object detection algorithm. In this case, upload a Docker container featuring scripts to train a Yolov5 object detection model to Amazon ECR:
- On the SageMaker console, under Notebook in the navigation pane, choose Notebook instances.
- Choose Create notebook instance.
- Enter a name for the notebook instance and under Permissions and encryption choose an AWS Identity and Access Management (IAM) role with the necessary permissions.
- Open the Git repositories menu.
- Select Clone a public Git repository to this notebook instance only and paste the following Git repository URL: https://github.com/portoaj/SageMakerObjectDetection
- Click Create notebook instance and wait about five minutes for the instance’s status to update from Pending to InService in the Notebook instance menu.
- Once the notebook is InService, select it and click Actions and Open Jupyter to launch the notebook instance in a new tab.
- Select the SageMakerObjectDetection directory and then click `sagemakerobjectdetection.ipynb` to launch the Jupyter notebook.
- Select the `conda_python3` kernel and click Set Kernel.
- Select the code cell and set the `aws_account_id` variable to your AWS account ID.
- Click Run to begin the process of building a Docker container and uploading it to Amazon ECR. This process may take about 20 minutes to complete.
- Once the Docker container has been uploaded, return to the Notebook instances menu, select your instance, and click Actions and Stop to shut your notebook instance down.
After the algorithm is built and pushed to Amazon ECR, you can use it to train a model via the SageMaker console.
- On the SageMaker console, under Training in the navigation pane, choose Training jobs.
- Choose Create training job.
- Enter a name for the job and choose an IAM role with the necessary permissions.
- For Algorithm source, select Your own algorithm container in ECR.
- For Container, enter the registry path of the Docker image you pushed to Amazon ECR.
- For Resource configuration, a single ml.p2.xlarge instance should be sufficient for training a Yolov5 model.
- Specify Amazon S3 locations for your input data and output path, and configure any other settings you need, such as a VPC via Amazon Virtual Private Cloud (Amazon VPC) or Managed Spot Training.
- Choose Create training job.
You can track the model’s training progress on the SageMaker console.
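If you prefer to track progress programmatically rather than in the console, a short boto3 sketch follows; the training job name is a placeholder for whatever name you gave the job above.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder job name -- use the name of the training job you created.
job = sagemaker.describe_training_job(TrainingJobName="brand-products-yolov5-job")

print(job["TrainingJobStatus"])   # e.g. InProgress, Completed, or Failed
print(job["SecondaryStatus"])     # e.g. Downloading, Training, Uploading
if job["TrainingJobStatus"] == "Completed":
    # Where SageMaker wrote the trained model artifact
    print(job["ModelArtifacts"]["S3ModelArtifacts"])
```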
Automated model training
The following diagram illustrates the automated model training workflow:
To make SageMaker start training the object detection model as soon as a user finishes uploading their data, the web server uses Amazon API Gateway to notify a Lambda function that the brand has finished uploading and that a training job should begin.
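The following is a sketch of what the Lambda function behind that API Gateway route might look like. The event shape, environment variables, and S3 layout are assumptions for illustration; the create_training_job parameters mirror the console settings from the previous section.

```python
import json
import os
import time

import boto3

sagemaker = boto3.client("sagemaker")

def handler(event, context):
    # With an API Gateway proxy integration, the POST body arrives as a JSON string (assumed shape).
    body = json.loads(event.get("body") or "{}")
    brand_id = body["brand_id"]

    bucket = os.environ["DATA_BUCKET"]
    job_name = f"{brand_id}-yolov5-{int(time.time())}"

    # Same settings as the console walkthrough: custom ECR image, one ml.p2.xlarge instance,
    # and brand-specific S3 prefixes for the annotated images and the output model artifact.
    sagemaker.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            "TrainingImage": os.environ["TRAINING_IMAGE_URI"],
            "TrainingInputMode": "File",
        },
        RoleArn=os.environ["SAGEMAKER_ROLE_ARN"],
        InputDataConfig=[{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{brand_id}/training/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": f"s3://{bucket}/{brand_id}/output/"},
        ResourceConfig={"InstanceType": "ml.p2.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
        StoppingCondition={"MaxRuntimeInSeconds": 86400},
    )

    return {"statusCode": 200, "body": json.dumps({"training_job": job_name})}
```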
When a brand's model is successfully trained, Amazon EventBridge invokes a Lambda function that moves the trained model into the live endpoint's S3 bucket, where it's finally ready for inference. A newer alternative to using Amazon EventBridge to move models through the MLOps lifecycle is SageMaker Pipelines, which is also worth considering.
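A sketch of that EventBridge-triggered function follows. The event fields match the documented SageMaker training job state-change event, but the destination bucket and key naming are assumptions for illustration.

```python
import os
from urllib.parse import urlparse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # EventBridge "SageMaker Training Job State Change" events carry the job status
    # and the S3 location of the trained model artifact in the event detail.
    detail = event["detail"]
    if detail["TrainingJobStatus"] != "Completed":
        return

    artifact = urlparse(detail["ModelArtifacts"]["S3ModelArtifacts"])  # s3://bucket/prefix/model.tar.gz
    source_bucket = artifact.netloc
    source_key = artifact.path.lstrip("/")

    # Copy the artifact into the prefix the live multi-model endpoint reads from
    # (bucket name and naming scheme are assumptions for illustration).
    s3.copy_object(
        Bucket=os.environ["ENDPOINT_MODEL_BUCKET"],
        Key=f"models/{detail['TrainingJobName']}.tar.gz",
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
```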
Host the model for inference
The following diagram illustrates the inference workflow:
To use the trained models, SageMaker requires an inference model to be hosted on an endpoint: the server, or fleet of servers, that actually serves the model. Similar to the training container we created, a Docker container for inference is hosted in Amazon ECR. The inference model uses that container to take the image the user captured with their phone, run it through the trained object detection model, and return the result.
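As a rough illustration of the inference side of such a container, the following is a minimal MMS-style handler module. Only the handler structure follows the documented MMS custom-service pattern; the model loading and prediction lines are placeholders, not The Barcode Registry's actual Yolov5 code.

```python
import io
import json

from PIL import Image

class ModelHandler:
    """Minimal MMS-style custom service: load a model once, then answer requests."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # MMS passes the directory the model artifact was extracted into.
        model_dir = context.system_properties.get("model_dir")
        # Placeholder: load the trained object detection model from model_dir here.
        self.model = lambda image: []  # stand-in that returns no detections
        self.initialized = True

    def handle(self, data, context):
        responses = []
        for request in data:
            # Each request body is the raw image the user photographed.
            image = Image.open(io.BytesIO(request.get("body"))).convert("RGB")
            detections = self.model(image)  # placeholder inference call
            responses.append(json.dumps({"detections": detections}))
        return responses

_service = ModelHandler()

def handle(data, context):
    # Entry point that MMS invokes for each batch of requests.
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service.handle(data, context)
```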
Again, The Barcode Registry uses custom Docker containers for the inference model to enable Multi Model Server, but if only one model is needed, it can easily be hosted with the built-in object detection algorithm.
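With a multi-model endpoint, the caller chooses which brand's model to use at invocation time. The following minimal sketch uses the SageMaker runtime API; the endpoint name, model artifact name, and content type are placeholders.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder names -- one endpoint can serve many brands' models.
with open("product-photo.jpg", "rb") as f:
    response = runtime.invoke_endpoint(
        EndpointName="counterfeit-detection-endpoint",
        TargetModel="brand-id-yolov5.tar.gz",  # which model MMS should load and use
        ContentType="application/x-image",
        Body=f.read(),
    )

print(response["Body"].read().decode("utf-8"))
```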
Conclusion
The Barcode Registry (in conjunction with its partner Buyabarcode.com) uses AWS for its entire object detection pipeline. The web server reliably stores data in Amazon S3 and uses API Gateway and Lambda functions to connect the web server to the cloud. SageMaker readily trains and hosts ML models, which means a user can take a picture of a product on their phone and see if the product is a counterfeit. This post shows how to create and host an object detection model using SageMaker, as well as how to automate the process.
In testing, the model achieved over 90% accuracy with a training set of 62 images and a test set of 32 images, which is impressive for a model trained without any human intervention. To get started training object detection models yourself, check out the official documentation, or learn how to deploy an object detection model to the edge using AWS IoT Greengrass.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
About the Authors
Andrew Masek, Software Engineer at The Barcode Registry.