Workplace safety hazards can exist in many different forms: sharp edges, falling objects, flying sparks, chemicals, noise, and a myriad of other potentially dangerous situations. Safety regulators such as the Occupational Safety and Health Administration (OSHA) and the European Commission often require that businesses protect their employees and customers from hazards that can cause injury by providing personal protective equipment (PPE) and ensuring its use. Across many industries, such as manufacturing, construction, food processing, chemical, healthcare, and logistics, workplace safety is usually a top priority. In addition, due to the COVID-19 pandemic, wearing PPE in public places has become important to reduce the spread of the virus. In this post, we show you how you can use Amazon Rekognition PPE detection to improve safety processes by automatically detecting whether persons in images are wearing PPE. We start with an overview of the PPE detection feature, explain how it works, and then discuss the different ways to deploy a PPE detection solution based on your camera and networking requirements.
Amazon Rekognition PPE detection overview
Even when people do their best to follow PPE guidelines, sometimes they inadvertently forget to wear PPE or don’t realize it’s required in the area they’re in. This puts their safety at potential risk and opens the business to possible regulatory compliance issues. Businesses usually rely on site supervisors or superintendents to individually check and remind all people present in the designated areas to wear PPE, which isn’t reliable, effective, or cost-efficient at scale. With Amazon Rekognition PPE detection, businesses can augment manual checks with automated PPE detection.
With Amazon Rekognition PPE detection, you can analyze images from your on-premises cameras at scale to automatically detect if people are wearing the required protective equipment, such as face covers (surgical masks, N95 masks, cloth masks), head covers (hard hats or helmets), and hand covers (surgical gloves, safety gloves, cloth gloves). Using these results, you can trigger timely alarms or notifications to remind people to wear PPE before or during their presence in a hazardous area to help improve or maintain everyone’s safety.
You can also aggregate the PPE detection results and analyze them by time and place to identify how safety warnings or training practices can be improved, or generate reports for use during regulatory audits. For example, a construction company can check if construction workers are wearing head covers and hand covers when they’re on the construction site and remind them if one or more pieces of required PPE aren’t detected, to help protect them in case of an accident. A food processing company can check for PPE such as face covers and hand covers on employees working in non-contamination zones to comply with food safety regulations. Or a manufacturing company can analyze PPE detection results across different sites and plants to determine where it should add more hazard warning signage and conduct additional safety training.
With Amazon Rekognition PPE detection, you receive a detailed analysis of an image, which includes bounding boxes and confidence scores for persons (up to 15 per image) and PPE detected, confidence scores for the body parts detected, and Boolean values and confidence scores for whether the PPE covers the corresponding body part. The following image shows an example of PPE bounding boxes for head cover, hand covers, and face cover annotated using the analysis provided by the Amazon Rekognition PPE detection feature.
Often, just detecting the presence of PPE in an image isn’t very useful; it’s important to detect whether the PPE is actually worn by the customer or employee. Amazon Rekognition PPE detection therefore also predicts a confidence score for whether the protective equipment is covering the corresponding body part of the person: for example, whether a person’s nose is covered by a face cover, their head is covered by a head cover, and their hands are covered by hand covers. This prediction helps filter out cases where the PPE is in the image but not actually on the person.
You can also supply a list of required PPE (such as face cover, or face cover and head cover) and a minimum confidence threshold (such as 80%) to receive a consolidated list of persons in the image who are wearing the required PPE, who aren’t wearing the required PPE, and for whom this can’t be determined (such as when a body part isn’t visible). This reduces the amount of code developers need to write to get high-level counts or to reference a specific person’s information in the image for further drill-down.
Now, let’s take a closer look at how Amazon Rekognition PPE detection works.
How it works
To detect PPE in an image, you call the DetectProtectiveEquipment API and pass an input image. You can provide the input image (in JPG or PNG format) either as raw bytes or as an object stored in an Amazon Simple Storage Service (Amazon S3) bucket. You can optionally use the SummarizationAttributes (ProtectiveEquipmentSummarizationAttributes) input parameter to request summary information about persons that are wearing the required PPE, not wearing the required PPE, or are indeterminate.
The following image shows an example input image and its corresponding output from the DetectProtectiveEquipment API as seen on the Amazon Rekognition PPE detection console. In this example, we supply face cover as the required PPE and 80% as the required minimum confidence threshold as part of SummarizationAttributes. We receive a summary indicating that four persons in the image are wearing face covers with a confidence score of over 80% (person identifiers 0, 1, 2, and 3). It also provides the full-fidelity API response in the per-person results. Note that this feature doesn’t perform facial recognition or facial comparison and can’t identify the detected persons.
Following is the DetectProtectiveEquipment API request JSON for this sample image in the console:
{
    "Image": {
        "S3Object": {
            "Bucket": "console-sample-images",
            "Name": "ppe_group_updated.jpg"
        }
    },
    "SummarizationAttributes": {
        "MinConfidence": 80,
        "RequiredEquipmentTypes": [
            "FACE_COVER"
        ]
    }
}
The response of the DetectProtectiveEquipment API is a JSON structure that includes up to 15 persons detected per image and, for each person, the body parts detected (face, head, left hand, and right hand), the types of PPE detected, and whether the PPE covers the corresponding body part. The full JSON response from the DetectProtectiveEquipment API for this image is as follows:
"ProtectiveEquipmentModelVersion": "1.0",
"Persons": [
{
"BodyParts": [
{
"Name": "FACE",
"Confidence": 99.07738494873047,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.06805413216352463,
"Height": 0.09381836652755737,
"Left": 0.7537466287612915,
"Top": 0.26088595390319824
},
"Confidence": 99.98419189453125,
"Type": "FACE_COVER",
"CoversBodyPart": {
"Confidence": 99.76295471191406,
"Value": true
}
}
]
},
{
"Name": "LEFT_HAND",
"Confidence": 99.25702667236328,
"EquipmentDetections": []
},
{
"Name": "RIGHT_HAND",
"Confidence": 80.11490631103516,
"EquipmentDetections": []
},
{
"Name": "HEAD",
"Confidence": 99.9693374633789,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.09358207136392593,
"Height": 0.10753925144672394,
"Left": 0.7455776929855347,
"Top": 0.16204142570495605
},
"Confidence": 98.4826889038086,
"Type": "HEAD_COVER",
"CoversBodyPart": {
"Confidence": 99.99744415283203,
"Value": true
}
}
]
}
],
"BoundingBox": {
"Width": 0.22291666269302368,
"Height": 0.82421875,
"Left": 0.7026041746139526,
"Top": 0.15703125298023224
},
"Confidence": 99.97362518310547,
"Id": 0
},
{
"BodyParts": [
{
"Name": "FACE",
"Confidence": 99.71298217773438,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.05732834339141846,
"Height": 0.07323434203863144,
"Left": 0.5775181651115417,
"Top": 0.33671364188194275
},
"Confidence": 99.96135711669922,
"Type": "FACE_COVER",
"CoversBodyPart": {
"Confidence": 96.60395050048828,
"Value": true
}
}
]
},
{
"Name": "LEFT_HAND",
"Confidence": 98.09618377685547,
"EquipmentDetections": []
},
{
"Name": "RIGHT_HAND",
"Confidence": 95.69132995605469,
"EquipmentDetections": []
},
{
"Name": "HEAD",
"Confidence": 99.997314453125,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.07994530349969864,
"Height": 0.08479492366313934,
"Left": 0.5641391277313232,
"Top": 0.2394576370716095
},
"Confidence": 97.718017578125,
"Type": "HEAD_COVER",
"CoversBodyPart": {
"Confidence": 99.9454345703125,
"Value": true
}
}
]
}
],
"BoundingBox": {
"Width": 0.21979166567325592,
"Height": 0.742968738079071,
"Left": 0.49427083134651184,
"Top": 0.24296875298023224
},
"Confidence": 99.99588012695312,
"Id": 1
},
{
"BodyParts": [
{
"Name": "FACE",
"Confidence": 98.42090606689453,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.05756797641515732,
"Height": 0.07883334159851074,
"Left": 0.22534936666488647,
"Top": 0.35751715302467346
},
"Confidence": 99.97816467285156,
"Type": "FACE_COVER",
"CoversBodyPart": {
"Confidence": 95.9388656616211,
"Value": true
}
}
]
},
{
"Name": "LEFT_HAND",
"Confidence": 92.42487335205078,
"EquipmentDetections": []
},
{
"Name": "RIGHT_HAND",
"Confidence": 96.88029479980469,
"EquipmentDetections": []
},
{
"Name": "HEAD",
"Confidence": 99.98686218261719,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.0872764065861702,
"Height": 0.09496871381998062,
"Left": 0.20529428124427795,
"Top": 0.2652358412742615
},
"Confidence": 90.25578308105469,
"Type": "HEAD_COVER",
"CoversBodyPart": {
"Confidence": 99.99089813232422,
"Value": true
}
}
]
}
],
"BoundingBox": {
"Width": 0.19479165971279144,
"Height": 0.72265625,
"Left": 0.12187500298023224,
"Top": 0.2679687440395355
},
"Confidence": 99.98648071289062,
"Id": 2
},
{
"BodyParts": [
{
"Name": "FACE",
"Confidence": 99.32310485839844,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.055801939219236374,
"Height": 0.06405147165060043,
"Left": 0.38087061047554016,
"Top": 0.393160879611969
},
"Confidence": 99.98370361328125,
"Type": "FACE_COVER",
"CoversBodyPart": {
"Confidence": 98.56526184082031,
"Value": true
}
}
]
},
{
"Name": "LEFT_HAND",
"Confidence": 96.11709594726562,
"EquipmentDetections": []
},
{
"Name": "RIGHT_HAND",
"Confidence": 80.49284362792969,
"EquipmentDetections": []
},
{
"Name": "HEAD",
"Confidence": 99.91870880126953,
"EquipmentDetections": [
{
"BoundingBox": {
"Width": 0.08105235546827316,
"Height": 0.07952981442213058,
"Left": 0.36679577827453613,
"Top": 0.2875025272369385
},
"Confidence": 98.80988311767578,
"Type": "HEAD_COVER",
"CoversBodyPart": {
"Confidence": 99.6932144165039,
"Value": true
}
}
]
}
],
"BoundingBox": {
"Width": 0.18541666865348816,
"Height": 0.6875,
"Left": 0.3187499940395355,
"Top": 0.29218751192092896
},
"Confidence": 99.98927307128906,
"Id": 3
}
],
"Summary": {
"PersonsWithRequiredEquipment": [
0,
1,
2,
3
],
"PersonsWithoutRequiredEquipment": [],
"PersonsIndeterminate": []
}
}
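If you’re calling the API programmatically, the following minimal Python (boto3) sketch issues the same request shown above and walks the response, printing each detected PPE item and the consolidated summary. It assumes your AWS credentials and Region are configured; the bucket and image name reuse the console sample, so point them at your own image in Amazon S3.

import boto3

rekognition = boto3.client('rekognition')

response = rekognition.detect_protective_equipment(
    Image={
        'S3Object': {
            'Bucket': 'console-sample-images',  # Replace with your bucket
            'Name': 'ppe_group_updated.jpg'     # Replace with your image
        }
    },
    SummarizationAttributes={
        'MinConfidence': 80,
        'RequiredEquipmentTypes': ['FACE_COVER']
    }
)

# Per-person results: each body part lists the PPE detected on it, and
# CoversBodyPart indicates whether that PPE actually covers the body part
for person in response['Persons']:
    for bodyPart in person['BodyParts']:
        for equipment in bodyPart['EquipmentDetections']:
            covers = equipment['CoversBodyPart']
            print("Person {}: {} on {} (covers body part: {}, confidence: {:.1f}%)".format(
                person['Id'], equipment['Type'], bodyPart['Name'],
                covers['Value'], covers['Confidence']))

# Consolidated summary based on the supplied SummarizationAttributes
summary = response['Summary']
print("Wearing required PPE: {}".format(summary['PersonsWithRequiredEquipment']))
print("Missing required PPE: {}".format(summary['PersonsWithoutRequiredEquipment']))
print("Indeterminate: {}".format(summary['PersonsIndeterminate']))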
Deploying Amazon Rekognition PPE detection
Depending on your use case, cameras, and environment setup, you can use different approaches to analyze your on-premises camera feeds for PPE detection. Because the DetectProtectiveEquipment API accepts only images as input, you can extract frames from streaming or stored videos at the desired frequency (such as every 1, 2, or 5 seconds, or every time motion is detected) and analyze those frames using the API. You can also set different frame ingestion frequencies for cameras covering different areas. For example, you can set a higher frequency for busy or important locations and a lower frequency for areas with light activity. This also lets you control network bandwidth requirements, because you only send images to the AWS Cloud for processing.
The following architecture shows how you can design a serverless workflow to process frames from camera feeds for PPE detection.
We have included a demo web application that implements this reference architecture in the Amazon Rekognition PPE detection GitHub repo. This web app extracts frames from a webcam video feed and sends them to the solution deployed in the AWS Cloud. As the images are analyzed with the DetectProtectiveEquipment API, a summary output is displayed in the web app in near-real time. The following example GIFs show the detection of a face cover, head cover, and hand covers as they are worn by a person in front of the webcam, which samples a frame every two seconds. Depending on your use case, you can adjust the sampling rate to a higher or lower frequency. A screenshot of the full demo application output, including the detected PPE and the predictions of whether it’s worn, is also shown below.
Face cover detection
Hand cover detection
Head cover detection
Full demo web application output
Using this application and solution, you can generate notifications with Amazon Simple Notification Service (Amazon SNS). Although not implemented in the demo solution, the reference architecture also shows how you can store the PPE detection results to create anonymized reports of PPE detection events using AWS services such as AWS Glue, Amazon Athena, and Amazon QuickSight. You can also optionally store ingested images in Amazon S3 for a limited time for regulatory auditing purposes. For instructions on deploying the demo web application and solution, see the Amazon Rekognition PPE detection GitHub repo.
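Although the notification wiring depends on your deployment, a minimal sketch of the Amazon SNS step could look like the following; the topic ARN and message format are illustrative assumptions, not part of the demo solution.

import json

import boto3

sns = boto3.client('sns')

# Hypothetical topic ARN; replace with the SNS topic created in your account
PPE_ALERTS_TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:ppe-alerts'

def notifyMissingPpe(summary, cameraId):
    # Publish an alert only if someone is missing the required PPE
    personsWithoutPpe = summary.get('PersonsWithoutRequiredEquipment', [])
    if personsWithoutPpe:
        sns.publish(
            TopicArn=PPE_ALERTS_TOPIC_ARN,
            Subject='PPE alert',
            Message=json.dumps({
                'cameraId': cameraId,
                'personsWithoutRequiredEquipment': personsWithoutPpe
            })
        )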
Instead of sending images via Amazon API Gateway, you can also send images directly to an S3 bucket. This allows you to store additional metadata, such as camera location, time, and other camera information, as Amazon S3 object metadata. As images get processed, you can delete them immediately or set them to expire within a time window using an S3 Lifecycle rule, in line with your organization’s data retention policy. You can use the following reference architecture diagram to design this alternate workflow.
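As a sketch of this alternate ingestion path, the following code uploads a frame to Amazon S3 with camera details attached as object metadata; the bucket name, key layout, and metadata fields are illustrative assumptions.

from datetime import datetime, timezone

import boto3

s3 = boto3.client('s3')

def uploadFrame(imageBytes, bucket, cameraId, cameraLocation):
    # Store the frame under a per-camera prefix with a UTC timestamp in the key
    timestamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    key = 'frames/{}/{}.jpg'.format(cameraId, timestamp)
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=imageBytes,
        ContentType='image/jpeg',
        # Surfaced to downstream consumers as x-amz-meta-* headers
        Metadata={
            'camera-id': cameraId,
            'camera-location': cameraLocation
        }
    )
    return key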
Extracting frames from your video systems
Depending on your camera setup and video management system, you can use the SDK provided by the manufacturer to extract frames. For cameras that support HTTP(S) or RTSP streams, the following code sample shows how you can extract frames from the camera feed at a desired frequency and process them using the DetectProtectiveEquipment API.
import time

import boto3
import cv2

def processFrame(videoStreamUrl):
    # Grab a single frame from the video stream
    cap = cv2.VideoCapture(videoStreamUrl)
    ret, frame = cap.read()
    if ret:
        # JPEG-encode the frame before sending it to Amazon Rekognition
        hasFrame, imageBytes = cv2.imencode(".jpg", frame)
        if hasFrame:
            session = boto3.session.Session()
            rekognition = session.client('rekognition')
            response = rekognition.detect_protective_equipment(
                Image={
                    'Bytes': imageBytes.tobytes(),
                }
            )
            print(response)
    cap.release()

# Video stream
videoStreamUrl = "rtsp://@192.168.10.100"
# Seconds to wait between frame captures
frameCaptureThreshold = 300

while True:
    try:
        processFrame(videoStreamUrl)
    except Exception as e:
        print("Error: {}.".format(e))
    time.sleep(frameCaptureThreshold)
To extract frames from stored videos, you can use AWS Elemental MediaConvert or other tools such as FFmpeg or OpenCV. The following code shows you how to extract frames from a stored video and process them using the DetectProtectiveEquipment API:
import json
import math

import boto3
import cv2

videoFile = "video file"
rekognition = boto3.client('rekognition')
ppeLabels = []

cap = cv2.VideoCapture(videoFile)
frameRate = cap.get(cv2.CAP_PROP_FPS)  # Frame rate of the source video

while cap.isOpened():
    frameId = cap.get(cv2.CAP_PROP_POS_FRAMES)  # Current frame number
    print("Processing frame id: {}".format(frameId))
    ret, frame = cap.read()
    if not ret:
        break
    # Analyze roughly one frame per second of video
    if frameId % math.floor(frameRate) == 0:
        hasFrame, imageBytes = cv2.imencode(".jpg", frame)
        if hasFrame:
            response = rekognition.detect_protective_equipment(
                Image={
                    'Bytes': imageBytes.tobytes(),
                }
            )
            for person in response["Persons"]:
                # Record the frame's position in the video, in milliseconds
                person["Timestamp"] = (frameId / frameRate) * 1000
                ppeLabels.append(person)

print(ppeLabels)

# Persist the per-person detections alongside the video file
with open(videoFile + ".json", "w") as f:
    f.write(json.dumps(ppeLabels))
cap.release()
Detecting other and custom PPE
Although the DetectProtectiveEquipment API covers the most common PPE, if your use case requires identifying additional equipment specific to your business needs, you can use Amazon Rekognition Custom Labels. For example, you can use Amazon Rekognition Custom Labels to quickly train a custom model to detect safety goggles, high-visibility vests, or other custom PPE by simply supplying some labeled images of what to detect. No machine learning expertise is required to use Amazon Rekognition Custom Labels. When you have a custom model trained and ready for inference, you can make parallel calls to DetectProtectiveEquipment and to the Amazon Rekognition Custom Labels model to detect all the required PPE, and then combine the results for further processing, as shown in the sketch below. For more information about using Amazon Rekognition Custom Labels to detect high-visibility vests, including a sample solution with instructions, see the Custom PPE detection GitHub repository. You can use the following reference architecture diagram to design a combined DetectProtectiveEquipment and Amazon Rekognition Custom Labels PPE detection solution.
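The following sketch illustrates the combined pattern, calling DetectProtectiveEquipment and an Amazon Rekognition Custom Labels model in parallel using a thread pool; the project version ARN is a hypothetical placeholder for your own trained and started model.

from concurrent.futures import ThreadPoolExecutor

import boto3

rekognition = boto3.client('rekognition')

# Hypothetical ARN; replace with your trained Custom Labels model version
CUSTOM_MODEL_ARN = 'arn:aws:rekognition:us-east-1:123456789012:project/custom-ppe/version/custom-ppe.2020-10-01/1'

def detectStandardPpe(imageBytes):
    return rekognition.detect_protective_equipment(
        Image={'Bytes': imageBytes}
    )

def detectCustomPpe(imageBytes):
    return rekognition.detect_custom_labels(
        ProjectVersionArn=CUSTOM_MODEL_ARN,
        Image={'Bytes': imageBytes},
        MinConfidence=80
    )

def detectAllPpe(imageBytes):
    # Issue both calls in parallel and combine the results for further processing
    with ThreadPoolExecutor(max_workers=2) as executor:
        standard = executor.submit(detectStandardPpe, imageBytes)
        custom = executor.submit(detectCustomPpe, imageBytes)
        return {
            'standard': standard.result(),
            'custom': custom.result()
        }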
Conclusion
In this post, we showed how to use Amazon Rekognition PPE detection (the DetectProtectiveEquipment API) to automatically analyze images and video frames to check if employees and customers are wearing PPE such as face covers, hand covers, and head covers. We covered different implementation approaches, including frame extraction from cameras, stored video, and streaming videos. Finally, we covered how you can use Amazon Rekognition Custom Labels to identify additional equipment that is specific to your business needs.
To test PPE detection with your own images, sign in to the Amazon Rekognition console and upload your images in the Amazon Rekognition PPE detection console demo. For more information about the API inputs, outputs, limits, and recommendations, see the Amazon Rekognition PPE detection documentation. To find out what our customers think about the feature, or if you need a partner to help build an end-to-end PPE detection solution for your organization, see the Amazon Rekognition workplace safety webpage.
About the Authors
Tushar Agrawal leads Outbound Product Management for Amazon Rekognition. In this role, he focuses on making customers successful by solving their business challenges with the right solution and go-to-market capabilities. In his spare time, he loves listening to music and re-living his childhood with his kindergartener.
Kashif Imran is a Principal Solutions Architect at Amazon Web Services. He works with some of the largest AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement computer vision applications at scale. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.
Matteo Figus is an AWS Solution Engineer based in the UK. Matteo works with the AWS Solution Architects to create standardized tools, code samples, demonstrations and quickstarts. He is passionate about open-source software and in his spare time he likes to cook and play the piano.
Connor Kirkpatrick is an AWS Solution Engineer based in the UK. Connor works with the AWS Solution Architects to create standardised tools, code samples, demonstrations and quickstarts. He is an enthusiastic squash player, wobbly cyclist, and occasional baker.