Annotate dense point cloud data using SageMaker Ground Truth

Autonomous vehicle companies typically use LiDAR sensors to generate a 3D understanding of the environment around their vehicles. For example, they mount a LiDAR sensor on their vehicles to continuously capture point-in-time snapshots of the surrounding 3D environment. The LiDAR sensor output is a sequence of 3D point cloud frames (the typical capture rate is 10 frames per second). Amazon SageMaker Ground Truth makes it easy to label objects in a single 3D frame or across a sequence of 3D point cloud frames for building machine learning (ML) training datasets. Ground Truth also supports sensor fusion of camera and LiDAR data with up to eight video camera inputs.

As LiDAR sensors become more accessible and cost-effective, customers are increasingly using point cloud data in new spaces like robotics, signal mapping, and augmented reality. Some new mobile devices even include LiDAR sensors, one of which supplied the data for this post! The growing availability of LiDAR sensors has increased interest in point cloud data for ML tasks, like 3D object detection and tracking, 3D segmentation, 3D object synthesis and reconstruction, and even using 3D data to validate 2D depth estimation.

Although dense point cloud data is rich in information (a single point cloud can contain over 1 million points), it’s challenging to label: labeling workstations often have limited memory and graphics capabilities, and annotators tend to be geographically distributed, which can increase latency. Even when a labeler’s workstation can render a large number of points, rendering time for multi-million-point clouds reduces labeler throughput, which increases labeling costs and reduces efficiency.

One way to reduce these costs and time is to convert point cloud labeling jobs into smaller, more easily rendered tasks that preserve most of the point cloud’s original information for annotation. We refer to these approaches broadly as downsampling, similar to downsampling in the signal processing domain. As in signal processing, point cloud downsampling approaches attempt to remove points while preserving the fidelity of the original point cloud. For object tracking and object detection tasks, you can use the 3D cuboids annotated on the downsampled point cloud directly for training or validation on the full-size point cloud, with little to no impact on model performance, while saving labeling time. For other modalities, like semantic segmentation, in which each point has its own label, you can use your downsampled labels to predict the label of each point in the original point cloud. This lets you trade off labeler cost (and therefore the amount of labeled data) against a small number of misclassified points in the full-size point cloud.

In this post, we walk through how to perform downsampling techniques to prepare your point cloud data for labeling, then demonstrate how to upsample your output labels to apply to your original full-size dataset using some in-sample inference with a simple ML model. To accomplish this, we use Ground Truth and Amazon SageMaker notebook instances to perform labeling and all preprocessing and postprocessing steps.

The data

The data we use in this post is a scan of an apartment building rooftop generated using the 3D Scanner App on an iPhone 12 Pro. The app allows you to use the built-in LiDAR scanner on a mobile device to scan a given area and export a point cloud file. In this case, the point cloud data is in xyzrgb format, an accepted format for a Ground Truth point cloud. For more information about the data types allowed in a Ground Truth point cloud, see Accepted Raw 3D Data Formats.

The following image shows our 3D scan.

Methods

We first walk through a few approaches to reduce dataset size for labeling point clouds: tiling, fixed step sample, and voxel mean. We demonstrate why downsampling techniques can increase your labeling throughput without significantly sacrificing annotation quality, and then we demonstrate how to use labels created on the downsampled point cloud and apply them to your original point cloud with an upsampling approach.

Downsampling approaches

Downsampling is taking your full-size dataset and either choosing a subset of points from it to label, or creating a representative set of new points that aren’t necessarily in the original dataset, but are close enough to allow labeling.

Tiling

One naive approach is to break your point cloud space into 3D cubes, otherwise known as voxels, of (for example) 500,000 points each that are labeled independently in parallel. This approach, called tiling, effectively reduces the scene size for labeling.

However, it can greatly increase labeling time and costs, because a typical 8-million-point scene may need to be broken up into 16 or more sub-scenes. The large number of independent tasks that result from this method means more annotator time is spent on context switching between tasks, and workers may lose context when the scene is too small, resulting in mislabeled data.
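
To make the tiling idea concrete, the following is a minimal sketch that assigns each point to an axis-aligned cubic tile. It is not taken from the notebook: the function name tile_point_cloud, the tile_size parameter, and the synthetic point cloud are assumptions for illustration. It tiles by a fixed edge length, so the number of points per tile varies instead of being capped at a fixed count.

```python
import numpy as np

def tile_point_cloud(points, tile_size):
    """Group points into axis-aligned cubic tiles with edge length tile_size."""
    xyz = points[:, :3]
    # Integer tile index of each point along x, y, and z.
    idx = np.floor((xyz - xyz.min(axis=0)) / tile_size).astype(np.int64)
    # Map each distinct (ix, iy, iz) triple to one tile ID per point.
    _, tile_ids = np.unique(idx, axis=0, return_inverse=True)
    tile_ids = tile_ids.ravel()
    return [points[tile_ids == t] for t in range(tile_ids.max() + 1)]

# Synthetic stand-in for a real scan (hypothetical data).
rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 10.0, size=(100_000, 3))
tiles = tile_point_cloud(cloud, tile_size=5.0)
print(f"{len(tiles)} tiles, largest has {max(len(t) for t in tiles)} points")
```

Each returned tile would become its own labeling task, which is where the context-switching overhead described above comes from.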

Fixed step sample

An alternative approach is to select or create a reduced number of points by a linear subsample, called a fixed step sample. Let’s say you want to hit a target of 500,000 points (we have observed this is generally renderable on a consumer laptop; see Accepted Raw 3D Data Formats), but you have a point cloud with 10 million points. You can calculate your step size as step = 10,000,000 / 500,000 = 20. After you have a step size, you can select every 20th point in your dataset, creating a new point cloud. If your point cloud data is of high enough density, labelers should still be able to make out any relevant features for labeling even though you may only have 1 point for every 20 in the original scene.

The downside of this approach is that not all points contribute to the final downsampled result, meaning that if a point is one of few important ones, but not part of the sample, your annotators may miss the feature entirely.

Voxel mean

An alternate form of downsampling that uses all points to generate a downsampled point cloud is grid filtering. Grid filtering means you break the input space into regular 3D boxes (or voxels) across the point cloud and replace all points within a voxel with a single representative point (the average point, for example). The following diagram shows an example voxel as a red box.

If no points exist from the input dataset within a given voxel, no point is added to the downsampled point cloud for that voxel. Grid filtering differs from a fixed step sample because you can use it to reduce noise and further tune it by adjusting the kernel size and averaging function to result in slightly different final point clouds. The following point clouds show the results of simple (fixed step sample) and advanced (voxel mean) downsampling. The point cloud downsampled using the advanced method is smoother. This is particularly noticeable when comparing the red brick wall in the back of both scenes.
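
As a rough illustration of grid filtering, the following sketch computes a voxel mean with plain NumPy. The function name voxel_mean_downsample and the box_size parameter are assumptions for illustration, and this is not the implementation used in the notebook. Empty voxels never appear in the output because only occupied voxels are enumerated.

```python
import numpy as np

def voxel_mean_downsample(points, box_size):
    """Replace all points that fall in the same voxel with their mean."""
    xyz = points[:, :3]
    # Integer voxel index of each point along x, y, and z.
    idx = np.floor((xyz - xyz.min(axis=0)) / box_size).astype(np.int64)
    # One entry per occupied voxel, plus the voxel ID and count for each point.
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.ravel()
    # Sum every column (coordinates and any extra channels) per voxel.
    sums = np.zeros((len(counts), points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

example = np.array([
    [0.0, 0.0, 0.0],
    [0.4, 0.4, 0.4],
    [2.0, 2.0, 2.0],
    [2.2, 2.2, 2.2],
])
downsampled = voxel_mean_downsample(example, box_size=1.0)
print(downsampled)
```

Swapping the mean for another statistic (a median, for example) would require grouping the points per voxel instead of summing them.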

Upsampling approach

After downsampling and labeling your data, you may want to see the labels produced on the smaller, downsampled point cloud projected onto the full-size point cloud, which we call upsampling. Object detection or tracking jobs don’t require post-processing to do this. Labels in the downsampled point cloud (like cuboids) are directly applicable to the larger point cloud because they’re defined in a world coordinate space shared by the full-size point cloud (x, y, z, height, width, length). These labels are minimally susceptible to very small errors along the boundaries of objects when a boundary point wasn’t in the downsampled dataset, but such occasional and minor errors are outweighed by the number of extra, correctly labeled points within the cuboid that can also be trained on.
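
The following sketch illustrates why a cuboid drawn on the downsampled point cloud carries over unchanged. It assumes an axis-aligned cuboid described only by a center and a size; real cuboid labels may also carry orientation, which this sketch ignores. The helper points_in_cuboid and the synthetic data are hypothetical.

```python
import numpy as np

def points_in_cuboid(points, center, size):
    """Boolean mask of the points inside an axis-aligned cuboid."""
    half = np.asarray(size) / 2.0
    lo = np.asarray(center) - half
    hi = np.asarray(center) + half
    xyz = points[:, :3]
    return np.all((xyz >= lo) & (xyz <= hi), axis=1)

rng = np.random.default_rng(0)
full = rng.uniform(-5.0, 5.0, size=(200_000, 3))  # synthetic full-size cloud
down = full[::20]                                 # fixed step sample

# The same cuboid (world coordinates) is applied to both point clouds.
center, size = (0.0, 0.0, 0.0), (2.0, 2.0, 2.0)
n_down = int(points_in_cuboid(down, center, size).sum())
n_full = int(points_in_cuboid(full, center, size).sum())
print(f"Cuboid contains {n_down} downsampled points and {n_full} full-size points")
```

Because both point clouds share one coordinate space, no label transformation is needed; the full-size point cloud simply contributes more points to the same box.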

For 3D point cloud semantic segmentation jobs, however, the labels aren’t directly applicable to the full-size dataset. We only have a subset of the labels, but we want to predict the rest of the full dataset labels based on this subset. To do this, we can use a simple K-Nearest Neighbors (K-NN) classifier with the already labeled points serving as the training set. K-NN is a simple supervised ML algorithm that predicts the label of a point using the “K” closest labeled points and a weighted vote. With K-NN, we can predict the point class of the rest of the unlabeled points in the full-size dataset based on the majority class of the three closest (by Euclidean distance) points. We can further refine this approach by varying the hyperparameters of a K-NN classifier, like the number of closest points to consider as well as the distance metric and weighting scheme of points.

After you map the sample labels to the full dataset, you can visualize tiles within the full-size dataset to see how well the upsampling strategy worked.

Now that we’ve reviewed the methods used in this post, we demonstrate these techniques in a SageMaker notebook on an example semantic segmentation point cloud scene.

Prerequisites

To walk through this solution, you need the following:

  • An AWS account.
  • A notebook AWS Identity and Access Management (IAM) role with the permissions required to complete this walkthrough. Your IAM role must have the following AWS managed policies attached:
    • AmazonS3FullAccess
    • AmazonSageMakerFullAccess
  • An Amazon Simple Storage Service (Amazon S3) bucket where the notebook artifacts (input data and labels) are stored.
  • A SageMaker work team. For this post, we use a private work team. You can create a work team on the SageMaker console.

Notebook setup

We use the notebook ground_truth_annotation_dense_point_cloud_tutorial.ipynb in the SageMaker Examples section of a notebook instance to demonstrate these downsampling and upsampling approaches. This notebook contains all code required to perform preprocessing, labeling, and postprocessing.

To access the notebook, complete the following steps:

  1. Create a notebook instance. You can use the ml.t2.xlarge instance type; whichever type you choose, make sure it has at least 16 GB of RAM.
    1. Use the notebook IAM role you created earlier. This role allows your notebook to upload your dataset to Amazon S3 and call the solution APIs.
  2. Open Jupyter Lab or Jupyter to access your notebook instance.
  3. In Jupyter, choose the SageMaker Examples tab. In Jupyter Lab, choose the SageMaker icon.
  4. Choose Ground Truth Labeling Jobs and then choose the ground_truth_annotation_dense_point_cloud_tutorial.ipynb notebook.
  5. If you’re using Jupyter, choose Use to copy the notebook to your instance and run it. If you’re in Jupyter Lab, choose Create a Copy.

Provide notebook inputs

First, we modify the notebook to add our private work team ARN and the bucket location we use to store our dataset as well as our labels.

Section 1: Retrieve the dataset and visualize the point cloud

We download our data by running Section 1 of our notebook, which downloads our dataset from Amazon S3 and loads the point cloud into our notebook instance. We download custom-prepared data from an AWS-owned bucket. An object called rooftop_12_49_41.xyz should be in the root of the S3 bucket. This data is a scan of an apartment building rooftop generated on a mobile device. In this case, the point cloud data is in xyzrgb format.

We can visualize our point cloud using Matplotlib’s 3D scatter plot (the scatter method on a 3D axes). The point cloud file contains all the correct points but isn’t rotated correctly. We can rotate the object around its axes by multiplying the point cloud by a rotation matrix. We can obtain a rotation matrix using SciPy and specify the rotation we want around each axis, in degrees, using the from_euler method:

!aws s3 cp s3://smgt-downsampling-us-east-1-322552456788/rooftop_12_49_41.xyz pointcloud.xyz

import numpy as np
import matplotlib.pyplot as plt

# Let's read our dataset into a NumPy array
pc = np.loadtxt("pointcloud.xyz", delimiter=",")

print(f"Loaded points of shape {pc.shape}")

# playing with view of 3D scene

from scipy.spatial.transform import Rotation

def plot_pointcloud(pc, rot = [[30,90,60]], color=True, title="Simple Downsampling 1", figsize=(50,25), verbose=False):
    if rot:
        # build the rotation from the rot argument ('zyx' Euler angles in degrees)
        rot1 = Rotation.from_euler('zyx', rot, degrees=True)
        R1 = rot1.as_matrix()
        if verbose:
            print('Rotation matrix:','\n',R1)
            
        # matrix multiplication between our rotation matrix and pointcloud 
        pc_show = np.matmul(R1, pc.copy()[:,:3].transpose() ).transpose()
        if color:
            try:
                # three color columns (r, g, b)
                rot_color1 = np.matmul(R1, pc.copy()[:,3:].transpose() ).transpose().squeeze()
            except ValueError:
                # a single color column: repeat it three times so the shapes match
                rot_color1 = np.matmul(R1, np.tile(pc.copy()[:,3],(3,1))).transpose().squeeze()
    else:
        pc_show = pc
        if color:
            # no rotation requested: use the color columns as-is
            rot_color1 = pc[:,3:]
            
    fig = plt.figure( figsize=figsize)
    ax = fig.add_subplot(111, projection="3d")
    ax.set_title(title, fontdict={'fontsize':20})
    if color:
        ax.scatter(pc_show[:,0], pc_show[:,1], pc_show[:,2], c=rot_color1[:,0], s=0.05)
    else:
        ax.scatter(pc_show[:,0], pc_show[:,1], pc_show[:,2], c='blue', s=0.05)
        
# rotate in z direction 30 degrees, y direction 90 degrees, and x direction 60 degrees
rot1 = Rotation.from_euler('zyx', [[30,90,60]], degrees=True) 
print('Rotation matrix:','\n', rot1.as_matrix())

plot_pointcloud(pc, rot = [[30,90,60]], color=True, title="Full pointcloud", figsize=(50,30))
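
As a side note on the rotation step, SciPy’s Rotation object can also rotate points directly with its apply method, which gives the same result as multiplying by the rotation matrix. The following sketch checks this on synthetic points; the variable names are illustrative and are not taken from the notebook.

```python
import numpy as np
from scipy.spatial.transform import Rotation

# Same 'zyx' Euler angles (in degrees) as in the walkthrough above.
rot = Rotation.from_euler("zyx", [30, 90, 60], degrees=True)

# Synthetic stand-in for the x, y, z columns of a point cloud.
points = np.random.default_rng(0).normal(size=(1000, 3))

# Option 1: explicit matrix multiplication, as in the walkthrough.
rotated_matmul = (rot.as_matrix() @ points.T).T

# Option 2: let SciPy apply the rotation to each row.
rotated_apply = rot.apply(points)

print(np.allclose(rotated_matmul, rotated_apply))
```

A rotation only changes the viewpoint: the distance of every point from the origin stays the same.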

Section 2: Downsample the dataset

Next, we downsample the dataset to fewer than 500,000 points, which is an ideal number of points for visualizing and labeling. For more information, see the Point Cloud Resolution Limits in Accepted Raw 3D Data Formats. Then we plot the results of our downsampling by running Section 2.

As we discussed earlier, the simplest form of downsampling is to choose values using a fixed step size based on how large we want our resulting point cloud to be.

A more advanced approach is to break the input space into cubes, otherwise known as voxels, and choose a single point per box using an averaging function. A simple implementation of this is shown in the following code.

You can tune the target number of points and box size used to see the reduction in point cloud clarity as more aggressive downsampling is performed.

import scipy.stats

#Basic Approach
target_num_pts = 500_000
subsample = int(np.ceil(len(pc) / target_num_pts))
pc_downsample_simple = pc[::subsample]
print(f"We've subsampled to {len(pc_downsample_simple)} points")

#Advanced Approach
boxsize = 0.013 # 1.3 cm box size.
mins = pc[:,:3].min(axis=0)
maxes = pc[:,:3].max(axis=0)
volume = maxes - mins
num_boxes_per_axis = np.ceil(volume / boxsize).astype('int32').tolist()
num_boxes_per_axis.extend([1])

print(num_boxes_per_axis)

# For each voxel or "box", use the mean of the points in the box as its single representative point.
means, _, _ = scipy.stats.binned_statistic_dd(
    pc[:,:4],
    [pc[:,0], pc[:,1], pc[:,2], pc[:,3]], 
    statistic="mean",
    bins=num_boxes_per_axis,
)
x_means = means[0,~np.isnan(means[0])].flatten()
y_means = means[1,~np.isnan(means[1])].flatten()
z_means = means[2,~np.isnan(means[2])].flatten()
c_means = means[3,~np.isnan(means[3])].flatten()
pc_downsample_adv = np.column_stack([x_means, y_means, z_means, c_means])
print(pc_downsample_adv.shape)
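
If you want to pick a box size that lands under a target point count instead of setting it by hand, one option is to grow the box size until the number of occupied voxels fits. The following is a minimal sketch under that assumption; the helpers count_occupied_voxels and find_box_size, the starting size, and the growth factor are all hypothetical and not part of the notebook.

```python
import numpy as np

def count_occupied_voxels(xyz, box_size):
    """Number of voxels of edge length box_size that contain at least one point."""
    idx = np.floor((xyz - xyz.min(axis=0)) / box_size).astype(np.int64)
    return len(np.unique(idx, axis=0))

def find_box_size(xyz, target_num_pts, start=0.005, growth=1.25):
    """Grow the box size geometrically until the voxel count fits the target."""
    box_size = start
    while count_occupied_voxels(xyz, box_size) > target_num_pts:
        box_size *= growth
    return box_size

# Synthetic stand-in for the x, y, z columns of a scan.
rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 1.0, size=(50_000, 3))

box_size = find_box_size(cloud, target_num_pts=5_000)
n_voxels = count_occupied_voxels(cloud, box_size)
print(f"box size {box_size:.4f} gives {n_voxels} occupied voxels")
```

A voxel mean produces one output point per occupied voxel, so the occupied-voxel count is the size of the downsampled point cloud.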

Section 3: Visualize the 3D rendering

We can visualize point clouds using a 3D scatter plot of the points. Although our point clouds have color, our transforms have different effects on color, so comparing them in a single color provides a better comparison. We can see that the advanced voxel mean method creates a smoother point cloud because averaging has a noise reduction effect. In the following code, we can look at our point clouds from two separate perspectives by multiplying our point clouds by different rotation matrices.

When you run Section 3 in the notebook, you also see a comparison of a linear step approach versus a box grid approach, specifically in how the box grid filter has a slight smoothing effect on the overall point cloud. This smoothing could be important depending on the noise level of your dataset. Modifying the grid filtering function from mean to median or some other averaging function can also improve the final point cloud clarity. Look carefully at the back wall of the simple (fixed step size) and advanced (voxel mean) downsampled examples, and notice the smoothing effect the voxel mean method has compared to the fixed step size method.

rot1 = Rotation.from_euler('zyx', [[30,90,60]], degrees=True)
R1 = rot1.as_matrix()

simple_rot1 = pc_downsample_simple.copy()
simple_rot1 = np.matmul(R1, simple_rot1[:,:3].transpose() ).transpose()
advanced_rot1 = pc_downsample_adv.copy()
advanced_rot1 = np.matmul(R1, advanced_rot1[:,:3].transpose() ).transpose()

fig = plt.figure( figsize=(50, 30))
ax = fig.add_subplot(121, projection="3d")
ax.set_title("Simple Downsampling 1", fontdict={'fontsize':20})
ax.scatter(simple_rot1[:,0], simple_rot1[:,1], simple_rot1[:,2], c='blue', s=0.05)
ax = fig.add_subplot(122, projection="3d")
ax.set_title("Voxel Mean Downsampling 1", fontdict={'fontsize':20})
ax.scatter(advanced_rot1[:,0], advanced_rot1[:,1], advanced_rot1[:,2], c='blue', s=0.05)

# to look at any of the individual pointclouds or rotate the pointcloud, use the following function

plot_pointcloud(pc_downsample_adv, rot = [[30,90,60]], color=True, title="Advanced Downsampling", figsize=(50,30))

Section 4: Launch a Semantic Segmentation Job

Run Section 4 in the notebook to take this point cloud and launch a Ground Truth point cloud semantic segmentation labeling job using it. These cells generate the required input manifest file and format the point cloud in a Ground Truth compatible representation.

To learn more about the input format of Ground Truth as it relates to point cloud data, see Input Data and Accepted Raw 3D Data Formats.

In this section, we also perform the labeling in the worker portal. We label a subset of the point cloud to have some annotations to perform upsampling with. When the job is complete, we load the annotations from Amazon S3 into a NumPy array for our postprocessing. The following is a screenshot from the Ground Truth point cloud semantic segmentation tool.

Section 5: Perform label upsampling

Now that we have the downsampled labels, we train a K-NN classifier from scikit-learn to predict the full dataset labels by treating our annotated points as training data and performing inference on the remainder of the unlabeled points in our full-size point cloud.

You can tune the number of points used as well as the distance metric and weighting scheme to influence how label inference is performed. If you label a few tiles in the full-size dataset, you can use those labeled tiles as ground truth to evaluate the accuracy of the K-NN predictions. You can then use this accuracy metric for hyperparameter tuning of K-NN or to try different inference algorithms to reduce your number of misclassified points between object boundaries, resulting in the lowest possible in-sample error rate. See the following code:

from sklearn.neighbors import KNeighborsClassifier

# There's a lot of possibility to tune KNN further
# 1) Prevent classification of points far away from all other points (random unfiltered ground point)
# 2) Perform a non-uniform weighted vote
# 3) Tweak number of neighbors
knn = KNeighborsClassifier(n_neighbors=3)
print(f"Training on {len(pc_downsample_adv)} labeled points")
knn.fit(pc_downsample_adv[:,:3], annotations)

# Predict a label for every point in the full-size point cloud
annotations_full = knn.predict(pc[:,:3])
print(f"Upsampled to {len(annotations_full)} labeled points")
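
The tuning loop described above can be sketched as follows. This is an illustration on synthetic data, not part of the notebook: the two-class scene, the choice of a held-out tile, and the small grid of n_neighbors and weights values are all assumptions. With real data, the held-out labels would come from a tile you labeled at full resolution.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: two classes separated by the plane x = 0.
full_xyz = rng.uniform(-1.0, 1.0, size=(20_000, 3))
full_labels = (full_xyz[:, 0] > 0).astype(int)

# "Downsampled and labeled" points used to fit the classifier.
train_xyz, train_labels = full_xyz[::20], full_labels[::20]

# Hypothetical hand-labeled tile of the full-size cloud used for evaluation.
tile_mask = np.all(np.abs(full_xyz) < 0.5, axis=1)
tile_mask[::20] = False  # keep training points out of the evaluation set

results = {}
for k in (1, 3, 7):
    for weights in ("uniform", "distance"):
        knn = KNeighborsClassifier(n_neighbors=k, weights=weights)
        knn.fit(train_xyz, train_labels)
        # score() returns the mean accuracy on the held-out tile.
        results[(k, weights)] = knn.score(full_xyz[tile_mask], full_labels[tile_mask])

best = max(results, key=results.get)
print(f"Best setting {best} with accuracy {results[best]:.3f}")
```

The misclassified points in a run like this cluster near the class boundary, which matches the boundary errors discussed in this post.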

Section 6: Visualize the upsampled labels

Now that we have performed upsampling of our labeled data, we can visualize a tile of the original full-size point cloud. We aren’t rendering all of the full-size point cloud because that may prevent our visualization tool from rendering. See the following code:

pc_downsample_annotated = np.column_stack((pc_downsample_adv[:,:3], annotations))
pc_annotated = np.column_stack((pc[:,:3], annotations_full))

labeled_area = pc_downsample_annotated[pc_downsample_annotated[:,3] != 255]
min_bounds = np.min(labeled_area, axis=0)
max_bounds = np.max(labeled_area, axis=0)

# Override the bounds computed above with a hand-picked tile to visualize
min_bounds = [-2, -2, -4.5, -1]
max_bounds = [2, 2, -1, 256]

def extract_tile(point_cloud, min_bounds, max_bounds):
    return point_cloud[
        (point_cloud[:,0] > min_bounds[0])
        & (point_cloud[:,1] > min_bounds[1])
        & (point_cloud[:,2] > min_bounds[2])
        & (point_cloud[:,0] < max_bounds[0])
        & (point_cloud[:,1] < max_bounds[1])
        & (point_cloud[:,2] < max_bounds[2])
    ]


tile_downsample_annotated = extract_tile(pc_downsample_annotated, min_bounds, max_bounds)
tile_annotated = extract_tile(pc_annotated, min_bounds, max_bounds)

rot1 = Rotation.from_euler('zyx', [[30,90,60]], degrees=True)
R1 = rot1.as_matrix()

down_rot = tile_downsample_annotated.copy()
down_rot = np.matmul(R1, down_rot[:,:3].transpose() ).transpose()
down_rot_color = np.matmul(R1, np.tile(tile_downsample_annotated.copy()[:,3],(3,1))).transpose().squeeze()

full_rot = tile_annotated.copy()
full_rot = np.matmul(R1, full_rot[:,:3].transpose() ).transpose()
full_rot_color = np.matmul(R1, np.tile(tile_annotated.copy()[:,3],(3,1))).transpose().squeeze()


fig = plt.figure(figsize=(50, 20))
ax = fig.add_subplot(121, projection="3d")
ax.set_title("Downsampled Annotations", fontdict={'fontsize':20})
ax.scatter(down_rot[:,0], down_rot[:,1], down_rot[:,2], c=down_rot_color[:,0], s=0.05)
ax = fig.add_subplot(122, projection="3d")
ax.set_title("Upsampled Annotations", fontdict={'fontsize':20})
ax.scatter(full_rot[:,0], full_rot[:,1], full_rot[:,2], c=full_rot_color[:,0], s=0.05)

Because our dataset is dense, we visualize a single tile to compare the downsampled labels with the labels upsampled to the full-size point cloud. Although a small number of misclassifications may exist along boundary regions between objects, you also have many more correctly labeled points in the full-size point cloud than in the downsampled point cloud, meaning your overall ML accuracy may improve.

Cleanup

Notebook instance: you have two options if you do not want to keep the created notebook instance running. If you would like to save it for later, you can stop it rather than delete it.

  • To stop a notebook instance: choose the Notebook instances link in the left pane of the SageMaker console home page. Next, choose the Stop link in the Actions column for your notebook instance. After the notebook instance is stopped, you can start it again by choosing the Start link. Keep in mind that if you stop the instance rather than delete it, you are still charged for the storage associated with it.
  • To delete a notebook instance: first stop it per the preceding instructions. Next, select the radio button next to your notebook instance, then choose Delete from the Actions drop-down menu.

Conclusion

Downsampling point clouds can be a viable way to preprocess data for labeling. It can reduce labeling costs while still generating high-quality output labels, especially for 3D object detection and tracking tasks. In this post, we demonstrated how the downsampling method can affect the clarity of the point cloud for workers, and showed a few approaches whose tradeoffs depend on the noise level of the dataset.

Finally, we showed that you can perform 3D point cloud semantic segmentation jobs on downsampled datasets and map the labels to the full-size point cloud through in-sample prediction. We accomplished this by training a classifier to do inference on the remaining full dataset size points, using the already labeled points as training data. This approach enables cost-effective labeling of highly dense point cloud scenes while still maintaining good overall label quality.

Test out this notebook with your own dense point cloud scenes in Ground Truth, try out new downsampling techniques, and even try new models beyond K-NN for final in-sample prediction to see if downsampling and upsampling techniques can reduce your labeling costs.


About the Authors

 Vidya Sagar Ravipati is a Deep Learning Architect at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption. Previously, he was a Machine Learning Engineer in Connectivity Services at Amazon who helped to build personalization and predictive maintenance platforms.

Isaac Privitera is a Machine Learning Specialist Solutions Architect and helps customers design and build enterprise-grade computer vision solutions on AWS. Isaac has a background in using machine learning and accelerated computing for computer vision and signals analysis. Isaac also enjoys cooking, hiking, and keeping up with the latest advancements in machine learning in his spare time.

Jeremy Feltracco is a Software Development Engineer with the Amazon ML Solutions Lab at Amazon Web Services. He uses his background in computer vision, robotics, and machine learning to help AWS customers accelerate their AI adoption.

Intelligent governance of document processing pipelines for regulated industries

Processing large documents like PDFs and static images is a cornerstone of today’s highly regulated industries. From healthcare information like doctor-patient visits and bills of health, to financial documents like loan applications, tax filings, research reports, and regulatory filings, these documents are integral to how these industries conduct business. The mechanisms by which these documents are processed and analyzed, however, are often manual, time-consuming, error-prone, expensive, and not easily scalable.

Fortunately, recent innovations in this space are helping companies improve these methods. Machine learning (ML) techniques such as optical character recognition (OCR) and natural language processing (NLP) enable firms to digitize and extract text from millions of documents and understand the content, including contextual nuances of the language within them. Furthermore, you can then transform the extracted text by merging it with supplemental data to produce additional business insights.

This step-by-step method is called a document processing pipeline. The pipeline includes various components to extract, transform, enrich, and conform the data. New data domains are often introduced and used for numerous downstream business purposes. For example, in financial services, you could be identifying connected financial events, calculating environmental risk scores, and developing risk models. Because these documents help inform or even dictate such important data-driven decisions, it’s imperative for regulated industry companies to establish and maintain a robust data governance framework as part of these document processing pipelines. Without governance, pipelines become a dumping ground where documents are inconsistently stored, duplicated, and processed, and the business is unable to explain to potential auditors where the data that fed their decisions came from, or what that data was used for.

A data governance framework is made up of people, processes, and technology. It enables business users to work collaboratively with technologists to drive clean, certified, and trusted data. It consists of several components including data quality, data catalog, data ownership, data lineage, operation, and compliance. In this post, we discuss data catalog, data ownership, and data lineage, and how they tie together with building document processing pipelines for regulated industries.

For more information about design patterns on data quality, see How to Architect Data Quality on the AWS Cloud.

Data lineage

Data lineage is the part of data governance that refers to the practice of providing GPS services for data. At any point in time, it can explain where the data originated, what happened to it, what its latest status is, and where it’s headed from this point on.

It provides visibility while simplifying the ability to trace financial numbers back to their origin, and provides transparency on potential errors and their root cause analyses.

Furthermore, you can use data lineage captured over time as analytical inputs to drive accuracy scores.

It’s imperative for a document processing pipeline to have a well-defined data lineage framework. The framework should include an end-to-end lifecycle, responsibility model, and the technology to enable data transformation transparency. Without lineage, the data can’t be trusted.

To illustrate this end-to-end data lineage concept, we walk you through creating an NLP-powered document search engine with built-in lineage at each step. Every object and piece of data processed by this ML pipeline can be traced back to the original document.

Each processing component can be replaced by your choice of tooling or bespoke ML model. Furthermore, you can customize the solution to include other use cases, such as central document data lakes or supplemental tabular data feed to an online transaction processing (OLTP) application.

The solution follows an event-driven architecture in which the completion of each stage within the pipeline triggers the next step, while providing self-service lineage for traceability and monitoring. In addition, hooks have been included to provide capabilities to extend the pipeline to additional workloads.

This design uses the following AWS services (you can also follow along in the GitHub repo):

  • Amazon Comprehend – An NLP service that uses ML to find insights and relationships in text, and can do so in multiple languages.
  • Amazon DynamoDB – A key-value and document database that delivers single-digit millisecond performance at any scale.
  • Amazon DynamoDB Streams – A change data capture (CDC) service. It captures an ordered flow of information about changes to items in a DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.
  • Amazon Elasticsearch Service (Amazon ES) – A fully managed service that makes it possible for you to deploy, secure, and run Elasticsearch cost-effectively and at scale. You can build, monitor, and troubleshoot your applications using the tools you love, at the scale you need.
  • AWS Lambda – A serverless compute service that runs code in response to triggers such as changes in data, shifts in system state, or user actions. Because Amazon S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.
  • Amazon Simple Notification Service (Amazon SNS) – An AWS managed service for application-to-application communications, with a pub/sub model enabling high-throughput, low-latency message relaying.
  • Amazon Simple Queue Service (Amazon SQS) – A fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications.
  • Amazon Simple Storage Service (Amazon S3) – An object storage service that stores your documents and allows for central management with fine-tuned access controls.
  • Amazon Textract – A fully managed ML service that automatically extracts printed text, handwriting, and other data from scanned documents. It goes beyond OCR to identify, understand, and extract data from forms and tables.

Architecture

The overall design is grouped into five segments:

  • Metadata services module
  • Ingestion module
  • OCR module
  • NLP module
  • Analytics module

All components interact via asynchronous events to allow for scalability. The following diagram illustrates the conceptual design.

The following diagram illustrates the physical design.

Metadata services

This is an encapsulated module to register, track, and trace incoming documents. It’s designed to be used across many different document processing pipelines. In your organization, one team might decide to use the OCR and NLP modules designed in this post. Another team might decide to use a different pipeline. However, governance practices of each pipeline should be consistent, and documents should be registered one time with full transparency on movement and downstream usage. Each document can be processed several times. You can extend the catalog and lineage services designed in this post to keep track of many pipelines, from multiple sources of data.

At the core, the metadata services module contains four reference tables, an SNS topic, three SQS queues, and three self-contained Lambda functions. Tables are created in DynamoDB, and schemas can be easily extended to include additional data attributes deemed important for your pipeline.

In addition, you can extend this design to include additional data governance components such as data quality.

The tables are defined as follows.

Document Registry
  • Purpose: Keeps track of all incoming documents. Each document is assigned a unique document ID and registered one time in this table.
  • DynamoDB stream enabled: Yes
  • Data governance component: Catalog
  • Sample use: Provides the ability to quickly look up and understand the document source and context metadata.

Document Ownership
  • Purpose: Covers responsibility model of the data governance in which each document acquired to the pipeline has a defined owner.
  • DynamoDB stream enabled: No
  • Data governance component: Ownership
  • Sample use: Provides notification services and can be extended to manage data quality controls.

Document Lineage
  • Purpose: Keeps track of all data movements. It provides detailed lineage info that includes the source S3 bucket name, destination S3 bucket name, source file name, target file name, ARN ID of the AWS service that processed the document, and timestamp.
  • DynamoDB stream enabled: No
  • Sample use: A simple PartiQL query against this table based on the document ID provides a list of all steps the original document has taken. Query output can include the following columns: document ID, original document name, timestamp, source S3 bucket, source file name, destination S3 bucket, and destination file name.

Pipeline Operations
  • Purpose: Keeps a record of all pipeline actions taken on a document ID, including the current pipeline stage and its status, and keeps a timeline of the stages in chronological order.
  • DynamoDB stream enabled: Yes
  • Sample use: An operational query on a document ID to determine where in the pipeline the current document processing is.
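
As a minimal sketch of the lineage lookup described for the Document Lineage table, the following Python builds a parameterized PartiQL statement and passes it to a DynamoDB client's execute_statement call. The table name DocumentLineage and the attribute names document_id and timestamp are assumptions made for illustration; the post doesn't spell out the schema, so adjust them to your deployment.

```python
def build_lineage_query(document_id, table_name="DocumentLineage"):
    """Return (statement, parameters) for a PartiQL lookup by document ID.

    The table and attribute names are hypothetical placeholders.
    """
    statement = f'SELECT * FROM "{table_name}" WHERE "document_id" = ?'
    parameters = [{"S": document_id}]
    return statement, parameters


def fetch_lineage(client, document_id, table_name="DocumentLineage"):
    """Run the query with a boto3 DynamoDB client and return items, oldest first.

    Assumes each item has a string "timestamp" attribute that sorts
    chronologically (for example, ISO 8601). Pagination is not handled here.
    """
    statement, parameters = build_lineage_query(document_id, table_name)
    response = client.execute_statement(Statement=statement, Parameters=parameters)
    items = response.get("Items", [])
    return sorted(items, key=lambda item: item.get("timestamp", {}).get("S", ""))
```

You would call fetch_lineage with a client created by boto3.client("dynamodb"). Passing the client in as an argument keeps the query logic easy to test without touching a real table.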

DynamoDB Streams allows downstream application code to react to updates to objects in DynamoDB. It provides a mechanism to keep an event-based microservices architecture in place by triggering subsequent steps of a workflow whenever new documents are written to our Document Registry table, and subsequently when new document references are created in the Pipeline Operations table.

In addition, DynamoDB Streams provides developer teams with an efficient way of connecting their application logic to various updates in the tables (for example, to keep track of a particular document ID based on owner tags, or to alert when unexpected problems arise while processing documents).
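
To make the stream-driven workflow more concrete, here is a hedged sketch of a Lambda handler that reacts only to newly inserted items in a DynamoDB Streams event. It assumes the stream is configured to include new images and that items carry a string attribute named document_id; that attribute name is hypothetical and not taken from the post.

```python
def new_document_ids(event):
    """Collect document IDs from INSERT records in a DynamoDB Streams event.

    Assumes the stream view type includes new images and that items carry a
    string attribute named "document_id" (a hypothetical name).
    """
    ids = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY and REMOVE events
        new_image = record.get("dynamodb", {}).get("NewImage", {})
        doc_id = new_image.get("document_id", {}).get("S")
        if doc_id is not None:
            ids.append(doc_id)
    return ids


def lambda_handler(event, context):
    """Entry point: report which newly registered documents need processing."""
    return {"documents_to_process": new_document_ids(event)}
```

In a real pipeline, the handler would start the next stage for each ID instead of returning the list.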

The Lambda functions provide microservices API call capabilities for the document pipeline to self-register its movements and actions undertaken by the pipeline code:

  • Document Arrival Register API – Registers the incoming document’s metadata and location within the Document Registry table
  • Document Lineage API – Registers the lineage information within the Document Lineage table
  • Pipeline Operations API – Provides up-to-date information on the state of the pipeline
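
The registration step can be sketched as a small function that turns one S3 event record into a registry item. This is an illustration under assumptions, not the code from the GitHub repo: the attribute names (document_id, source_bucket, source_key, registered_at) are hypothetical, and a random UUID stands in for whatever ID scheme your pipeline uses.

```python
import uuid
from datetime import datetime, timezone
from urllib.parse import unquote_plus


def build_registry_item(s3_record, now=None):
    """Turn one S3 event record into a Document Registry item.

    Attribute names are hypothetical; the document ID is a random UUID.
    """
    bucket = s3_record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_record["s3"]["object"]["key"])
    registered_at = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "document_id": str(uuid.uuid4()),
        "source_bucket": bucket,
        "source_key": key,
        "registered_at": registered_at,
    }
```

The optional now argument exists only so the timestamp can be fixed in tests.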

The SNS topic is used as a sink for incoming messages from all pipeline movements and document registrations. It disseminates the messages to each downstream subscribed SQS queue according to the type of message received. In this model, you can greatly expand the number of consumers of the messages coming through the SNS topic as needed. All messages are guaranteed to stay in order because both the SNS topic and the SQS queues are created in a first-in-first-out (FIFO) configuration, which prevents duplicates and maintains single-threaded processing in the pipeline.

Using Amazon SNS in the design provides scalability by creating a pub/sub architecture. A pub/sub architecture design is a pattern that provides a framework to decouple the services that produce an event from services that process the event. Many subscribers can subscribe to the same event and trigger different pipelines. For example, this design can easily be extended to process incoming XML file formats by subscribing an additional XML process pipeline for the same event.
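
The fan-out behavior can be illustrated with a small in-memory model. This is not the Amazon SNS or Amazon SQS API; the Topic class, the message types, and the deduplication IDs are all hypothetical stand-ins that show how one publisher can feed several filtered, ordered queues. As a simplification, duplicates are remembered forever rather than for a limited time window.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a FIFO topic with filtered subscriptions."""

    def __init__(self):
        self._subscriptions = []  # (queue, accepted message types)
        self._seen_ids = set()    # deduplication IDs already published

    def subscribe(self, queue, message_types):
        self._subscriptions.append((queue, set(message_types)))

    def publish(self, message_type, body, dedup_id):
        """Deliver to every matching queue; drop repeated dedup IDs."""
        if dedup_id in self._seen_ids:
            return False
        self._seen_ids.add(dedup_id)
        for queue, accepted in self._subscriptions:
            if message_type in accepted:
                queue.append((message_type, body))
        return True


registry_queue, lineage_queue = deque(), deque()
topic = Topic()
topic.subscribe(registry_queue, {"registration"})
topic.subscribe(lineage_queue, {"lineage"})
topic.publish("registration", {"document_id": "doc-1"}, dedup_id="m1")
topic.publish("lineage", {"document_id": "doc-1", "step": "ocr"}, dedup_id="m2")
# Same dedup ID as the previous message, so this one is dropped.
topic.publish("lineage", {"document_id": "doc-1", "step": "ocr"}, dedup_id="m2")
```

Adding a new consumer, such as an XML processing pipeline, only requires another subscribe call; the publisher doesn't change.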

The following table provides schema information. The document ID is unique to each document, stays the same across all tables, and is part of the composite primary key used to identify movement of each document within the pipeline.

The following diagram shows the architecture of our metadata services.

Ingestion module

The ingestion workload is triggered when a new document is uploaded to the NLP/Raw S3 bucket (or the bucket where raw documents are placed from users or front-end applications).

The ingestion module follows a four-step process (as shown in the following diagram):

  1. A document is uploaded to the NLP/Raw S3 bucket.
  2. The Document Registrar Lambda function is invoked, which calls the metadata services API to register the document and receive a unique ID. This ID is added to the document as a tag, and the metadata is registered within the DynamoDB table Document Registry.
  3. After the document metadata is registered with Metadata Services, the DynamoDB Document Registration stream is invoked to start the Document Classification Lambda function. This function examines the metadata registered on the document and determines if the downstream OCR segment should be invoked on this document. The result of this examination is written back to the metadata services.
  4. The metadata registration of the previous step invokes a DynamoDB Pipeline Operations Stream, which invokes the Document Extension Detector Lambda function. This function examines the incoming file formats and separates the image files from PDF documents.
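
The file-format check in step 4 could look something like the following sketch. The set of image extensions is an assumption chosen for illustration; the post doesn't list which formats the Document Extension Detector accepts.

```python
from pathlib import PurePosixPath

# Hypothetical list of extensions treated as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def route_document(key):
    """Return "image", "pdf", or "unsupported" based on the file extension."""
    suffix = PurePosixPath(key).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    return "unsupported"
```

The return value can then decide whether the object is copied to the image bucket or the PDF bucket.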

All steps are registered in metadata services. The red dotted lines in the following diagram represent the metadata asynchronous API calls.

OCR module

This module detects the incoming file format and uses Amazon Textract in this implementation to convert the incoming documents into text. Amazon Textract can process image files synchronously, and PDF and other documents asynchronously, to allow time for the service to complete its analysis.

The OCR module consists of the following process, as illustrated in the architecture diagram:

  1. Image files are uploaded to the NLP/image S3 bucket and the Sync Processor Lambda function is invoked. The function synchronously points Amazon Textract to the S3 location of the image file, and waits for a response.
  2. Amazon Textract transforms the format to text and deposits the text output in the NLP/Textract S3 bucket. This step concludes OCR processing of the image file types.
  3. PDF files are placed within the NLP/PDF S3 bucket. This bucket invokes the Async Processor Lambda function. This function feeds the document to Amazon Textract and completes its state, registering as such with the metadata services.
  4. When the Amazon Textract document analysis is complete, an SNS message is sent to a specified SNS topic, notifying downstream consumers of the job completion. In this implementation, an SQS queue captures that message.
  5. The SQS queue message is the event that triggers the Result Processor Lambda function.
  6. The function extracts the results of document analysis from Amazon Textract and formats them according to the type of text analyzed (forms, tables, and raw text).
  7. The results are pushed to the NLP/Textract S3 bucket, page by page for every type of text, and as a complete JSON response.

All the progress is registered in metadata services. The red dotted lines in the diagram represent the metadata asynchronous API calls.
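
The split between synchronous and asynchronous processing can be sketched as follows. The function takes a boto3 Textract client as an argument and uses the detect_document_text and start_document_analysis operations; which Textract operations the actual Lambda functions call is an assumption here, as are the helper names and the list of image extensions.

```python
def is_image(key):
    """Hypothetical check: treat PNG and JPEG files as images."""
    return key.lower().endswith((".png", ".jpg", ".jpeg"))


def extract_lines(response):
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def submit_to_textract(client, bucket, key, sns_topic_arn=None, role_arn=None):
    """Send an S3 object to Amazon Textract.

    Images are processed synchronously and return their text lines directly.
    Other files start an asynchronous analysis job and return its job ID;
    completion is announced on the SNS topic supplied by the caller.
    """
    s3_object = {"S3Object": {"Bucket": bucket, "Name": key}}
    if is_image(key):
        response = client.detect_document_text(Document=s3_object)
        return {"mode": "sync", "lines": extract_lines(response)}
    response = client.start_document_analysis(
        DocumentLocation=s3_object,
        FeatureTypes=["TABLES", "FORMS"],
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return {"mode": "async", "job_id": response["JobId"]}
```

For the asynchronous path, a separate function (the Result Processor in this design) would later fetch the results by job ID once the completion message arrives.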

NLP module

This module detects key phrases and entities within the document by using the text output from the OCR module. A key phrase is a string containing a noun phrase that describes a particular thing. It generally consists of a noun and the modifiers that distinguish it. For example, “day” is a noun; “a beautiful day” is a noun phrase that includes an article (“a”) and an adjective (“beautiful”).

Once key phrases are extracted, indexing them in an analytical tool lets you find the relevant document quickly and accurately. For example, if you want to analyze corporate social responsibility (CSR) reports, you can find attributes such as “reducing carbon footprints,” “improving labor policies,” “participating in fair-trade,” and “charitable giving” by indexing results of this module.

We use Amazon Comprehend to perform this function in this pipeline. However, as we explained earlier, you can easily swap the tooling used for this design with your preferred tool. For example, you can replace Amazon Comprehend with an Amazon SageMaker custom model as an alternative to extract key phrases and entities in a more domain-focused way. SageMaker is an ML service that you can use to build, train, and deploy ML models for virtually any use case.

Amazon Comprehend is called on a synchronous basis to extract key phrases in the following steps (as illustrated in the following diagram):

  1. The incoming text file uploaded to the NLP/Textract S3 bucket invokes the Sync Comprehend Processor Lambda function.
  2. The function feeds the incoming file to Amazon Comprehend for processing.
  3. The results from Amazon Comprehend, in JSON format, are deposited in the NLP/JSON S3 bucket.
  4. The results from Amazon Comprehend are sent to Amazon ES, the service we incorporate as our document search engine.

All steps are registered in metadata services. The red dotted lines in the diagram represent the metadata asynchronous API calls.
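
A hedged sketch of the key phrase step follows. It calls the detect_key_phrases operation on a boto3 Comprehend client passed in by the caller and keeps only phrases above a confidence score. The 0.9 default threshold and the function name are assumptions for illustration, not values from the post.

```python
def top_key_phrases(client, text, min_score=0.9, language_code="en"):
    """Return distinct key phrases whose confidence meets min_score.

    The client is expected to be a boto3 Comprehend client. The threshold is
    a hypothetical default; tune it for your documents.
    """
    response = client.detect_key_phrases(Text=text, LanguageCode=language_code)
    phrases = []
    for phrase in response.get("KeyPhrases", []):
        if phrase["Score"] >= min_score and phrase["Text"] not in phrases:
            phrases.append(phrase["Text"])
    return phrases
```

Because the synchronous API limits the size of each request, long OCR outputs may need to be split into smaller chunks before calling this function.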

Analytics module

This module is responsible for the consumption and analytics segment of the pipeline. The steps are illustrated in the following diagram:

  1. The output from Amazon Comprehend, in JSON format, is fed to Amazon Neptune. Neptune allows end users to discover relationships across documents. This is an example of a downstream analytics application that is not implemented in this post.
  2. The end users have access to the original document in four formats (CSV, JSON, original, text), and can search key phrases using Amazon ES. They can identify relationships using Neptune. A JSON version of the document is available in the NLP/JSON S3 bucket. The original document is available in the NLP/Raw S3 bucket.
  3. Full lineage can be obtained from the Document Lineage table in DynamoDB.

The analytics module has many potential implementations. For example, you can use a relational datastore like Amazon Relational Database Service (Amazon RDS) or Amazon Aurora to analyze extracted tabular data using SQL.

Conclusion

In this post, we architected an end-to-end document processing pipeline using AWS managed ML services. In addition, we introduced metadata services to help organizations create a centralized document repository that stores documents one time but processes them multiple times. A data governance framework as illustrated in this design provides the guardrails needed to ensure documents are governed in a standard fashion across the organization, while giving lines of business the autonomy to choose their own NLP and OCR models and tooling.

The architecture discussed in this post has been coded and is available for deployment in the GitHub repo. You can download the code and create your pipeline within a few days.


About the Authors

  David Kheyman is a Solutions Architect at Amazon Web Services based out of New York City, where he designs and implements repeatable AWS architecture patterns and solutions for large organizations.

Mojgan Ahmadi is a Principal Solutions Architect with Amazon Web Services based in New York, where she guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. She brings over 20 years of technology experience on Software Development and Architecture, Data Governance and Engineering, and IT Management.

Anirudh Menon is a Solutions Architect with Amazon Web Services based in New York, where he helps financial services customers drive innovation with AWS solutions and industry-specific patterns.

Announcing the AWS DeepComposer Chartbusters challenges 2021 season launch

We’re back with two new challenges for the AWS DeepComposer Chartbusters 2021 season! Chartbusters is a global challenge in which developers use AWS DeepComposer to create original compositions and compete in monthly challenges to showcase their machine learning (ML) and generative artificial intelligence (AI) skills. Regardless of your background in music or ML, one of the two new challenges will be right for you.

You can choose between two different challenges this season. In the basic challenge, Melody-Go-Round, you can use any of the generative AI models available in the AWS DeepComposer Music studio to create new compositions. In the advanced challenge, Melody Harvest, you train a custom generative AI model with your own dataset using Amazon SageMaker. In this challenge, you can dive deeper into the mechanics of data preparation, model training, and evaluation to teach a model to play your favorite style of music.

The 2021 season runs through October 31, 2021. Winners of each challenge are selected on the last day of each month, and we’ll feature the winners in an AWS Machine Learning Blog post. Monthly winners of the Melody Harvest challenge will also win a ticket to AWS re:Invent 2021. To participate, go to the AWS DeepComposer console and choose the Chartbusters challenge that’s right for you in the navigation pane.

Compete in the Melody-Go-Round challenge

You can compete in the AWS DeepComposer Chartbusters Melody-Go-Round challenge in just a few simple steps:

  1. In the AWS DeepComposer Music studio, record a track, import a track, or pick any of the available input tracks.
  2. Get creative and explore different combinations of available models. You can also explore advanced parameters under each model.
  3. Use the Edit melody feature to add or remove notes, or change the note duration and pitch. When finished, choose Apply changes. You can iterate by adjusting the advanced parameters and choosing Enhance again. Repeat these steps until you’re satisfied with the generated music. You can also download the melody and import it into a digital audio workstation like GarageBand and further indulge your creativity.
  4. When your melody is complete, go to the submission form and choose an existing composition or import a post-processed audio track. Choose Melody-Go-Round for the competition type, register or sign in to SoundCloud, and choose Submit.

For more information on judging criteria, visit the AWS DeepComposer Melody-Go-Round page.

Compete in the Melody Harvest challenge

  1. Explore our GitHub pages for Generative Adversarial Networks (GANs), Autoregressive Convolutional Neural Networks (AR-CNNs), and Transformers. Then train your own model and start composing your music.
  2. You can upload the generated MIDI file to a digital audio workstation like GarageBand and further improve it.
  3. When your melody is complete, go to the submission form, choose Melody Harvest for the competition type, import a post-processed audio track, and add the link to your GitHub repository. Make sure your GitHub repository has your notebook and your model’s checkpoint files.

For more information on datasets and judging criteria, visit the AWS DeepComposer Melody Harvest page.

Conclusion

Congratulations! You have successfully submitted your composition to the AWS DeepComposer Chartbusters challenge. Now you can invite your friends and family to listen to your creation on SoundCloud, vote for their favorite, and join the fun by participating in the competition.

Although you don’t need a physical keyboard to compete, we’re offering the AWS DeepComposer keyboard at a special price of $69.00 (30% off) for a limited time on Amazon.com to improve your music generation experience. The pricing includes the keyboard and 3 months of the AWS DeepComposer free trial. To learn more about the different generative AI techniques supported by AWS DeepComposer, check out the learning capsules available on the AWS DeepComposer console.


About the Authors

Maryam Rezapoor is a Senior Product Manager with AWS AI Devices team. As a former biomedical researcher and entrepreneur, she finds her passion in working backward from customers’ needs to create new impactful solutions. Outside of work, she enjoys hiking, photography, and gardening.

 Chris Whittam is a Senior Product Manager on the AWS AI Devices team helping developers get hands on (literally) with machine learning.

AWS DeepRacer device software now open source

AWS DeepRacer is the fastest way to get started with machine learning (ML). You can train reinforcement learning (RL) models by using a 1/18th scale autonomous vehicle in a cloud-based virtual simulator and compete for prizes and glory in the global AWS DeepRacer League. Today, we’re expanding AWS DeepRacer’s ability to provide fun, hands-on learning by open-sourcing the AWS DeepRacer device software.

Why open source

The AWS DeepRacer virtual and in-person leagues have been a hit, but now developers want to go beyond league racing with their car. Because the AWS DeepRacer is an Ubuntu-based computer on wheels powered by the Robot Operating System (ROS), we are able to open source the code, making it straightforward for a developer with basic Linux coding skills to prototype new and interesting uses for their car. Now that the AWS DeepRacer device software is openly available, anyone with the car and an idea can make new uses for their device a reality.

We’ve compiled six sample projects from the AWS DeepRacer team and members of the global AWS DeepRacer community to help you get started exploring the possibilities that open source provides. As developers share new projects using #deepracerproject, we will highlight our favorites on the AWS DeepRacer robotics projects page. Whether you’re mounting a Nerf cannon on the car with the DeepBlaster project, creating visualizations of your home or office with the Mapping project, or coming up with new ways of racing your friends and colleagues with the DeepDriver project, you can do all that and more with the open source code and sample projects. Documentation is available in GitHub and open for collaboration with thousands of community members in the AWS DeepRacer Slack channel. The only limit to what you can do with AWS DeepRacer is your imagination (and, well, the laws of physics).

Let the experiments begin

With the open-sourcing of the AWS DeepRacer device code, you can quickly and easily change the default behavior of your currently track-obsessed race car. Want to block other cars from overtaking it by deploying countermeasures? Want to deploy your own custom algorithm to make the car go faster from point A to B? You just need to dream it and code it. We can’t wait to see the ideas that you come up with, from new racing formats to new uses for AWS DeepRacer.

Starting today, you can choose from six projects (Follow the Leader, Mapping, and Off Road created by AWS, and RoboCat, DeepBlaster, and DeepDriver created by the open source community) or create your own. You can get started with the Follow the Leader sample project, which trains the car to detect and follow an object. It’s the quickest project to build and run, and in the next section we’ll demonstrate how easy it is to modify the default behavior of your AWS DeepRacer car. To complete this setup, upgrade to the latest software version and access the car via SSH.

Download the Follow the Leader project

Connect to the car using SSH, switch to the root user, and create a working directory. Then clone the Follow the Leader GitHub repository:

sudo su
mkdir -p ~/deepracer_ws
cd ~/deepracer_ws
git clone https://github.com/aws-deepracer/aws-deepracer-follow-the-leader-sample-project.git

The process to fully clone the project repository to your car can take a few minutes (depending on the speed of your internet connection). The Follow the Leader project contains several installation scripts that help shortcut the process to get you up and running faster. You can also complete the next few steps manually if you’re more comfortable with running shell-based commands or want to learn more about the process using the links to the relevant documentation for each stage.

Download and convert the object detection model

First, we need to download and convert the object detection model. To do this, we run the script that came in the Follow the Leader repository:

sudo su
cd ~/deepracer_ws/aws-deepracer-follow-the-leader-sample-project/installers
/usr/bin/bash install_object_detection_model.sh

The installer script downloads and optimizes the model before copying the optimized artifacts to the model location. This process takes approximately 3–4 minutes to complete.

You can complete this stage manually using the detailed instructions to download and convert the object detection model.

Initialize rosdep if it hasn’t been initialized

The rosdep tool helps install the dependency packages. If rosdep hasn’t been initialized on the device before, initialize it now:

sudo rosdep init
sudo rosdep update

Build the Follow the Leader packages

Next, we fetch the package dependencies needed for the project and build them:

sudo su
cd ~/deepracer_ws/aws-deepracer-follow-the-leader-sample-project/installers
/usr/bin/bash build_and_install_ftl_application.sh

When successful, you should see a screen similar to the following:

The script downloads and installs the required package dependencies and builds the packages. This process can take approximately 8–10 minutes to complete.

You can also complete this stage manually by following steps 1–10 in “Download and Building” in the Follow the Leader README.md. The install script performs the same steps and just saves you some typing.

Launch the Follow the Leader application

Now we run the Follow the Leader application:

sudo su
cd ~/deepracer_ws/aws-deepracer-follow-the-leader-sample-project/installers
/usr/bin/bash run_ftl_application.sh

Enable Follow the Leader mode

Finally, we need to open another SSH session to the car to enable Follow the Leader mode using the command line interface (CLI):

sudo su
cd ~/deepracer_ws/aws-deepracer-follow-the-leader-sample-project/installers
/usr/bin/bash enable_ftl_mode.sh

Now you, or a willing volunteer (or object), can move around and watch the car begin to follow! How cool is that?

Share your results

Congratulations! You completed your first sample project. Share your experience with friends and family on social media with the tag #deepracerproject so we can see what you’re up to. As the community invents new projects for AWS DeepRacer, we’ll be adding them to the AWS DeepRacer GitHub Organization as well as featuring them in future blog posts so that everyone can get inspired. Purchase an AWS DeepRacer car to start experimenting with your first AWS DeepRacer robotics project today! We are offering a 25% discount on the AWS DeepRacer ($100 off) and AWS DeepRacer Evo ($150 off) until May 27, 2021.


About the Author

David Smith is a Sr. Solutions Architect for AWS DeepRacer. He is passionate about AWS DeepRacer, technology as an enabler, and learning. Outside of work he’s into Formula 1, flying (and crashing) drones, 3D printing, running (Parkrun), tinkering with code, and spending time with the family.

Monitor and Manage Anomaly Detection Models on a fleet of Wind Turbines with Amazon SageMaker Edge Manager

In industrial IoT, running machine learning (ML) models on edge devices is necessary for many use cases, such as predictive maintenance, quality improvement, real-time monitoring, process optimization, and security. The energy industry, for instance, invests heavily in ML to automate power delivery, monitor consumption, optimize efficiency, and extend the lifetime of their equipment.

Wind energy is one of the most popular renewable energy sources. According to the Global Wind Energy Council, 22,893 wind turbines were installed globally in 2019, produced by 33 suppliers and accounting for over 63 GW of wind power capacity. With such scale, energy companies need an efficient platform to manage and maintain their wind turbine fleets, and the ML models running on the devices. A commercial wind turbine costs around $3–4 million. If a turbine is out of service, it costs $800–1,600 per day and results in a total loss of 7.5 megawatts, which is enough energy to power approximately 2,500 homes.

A wind turbine is a complex piece of engineering and consists of many sensors that can be used by a monitoring mechanism to capture data such as vibration, temperature, wind speed, and air humidity. You could train an ML model with this data, deploy it to an edge device connected to the turbine’s sensors, and predict anomalies in real time at the edge. It would reduce the operational cost of your fleet of turbines. But imagine the effort to maintain this solution on a fleet of thousands or millions of devices. How do you operate, secure, deploy, run, and monitor ML models on a fleet of devices at the edge?

Amazon SageMaker Edge Manager can help you to answer this question. The service allows you to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, industrial equipment, mobile devices, and more. With Edge Manager, you can manage the lifecycle of each ML model on each device in your device fleets for up to thousands or millions of devices. The service provides a software agent that runs on edge devices and a management interface on the AWS Management Console.

In this post, we show how to use Edge Manager to create a robust end-to-end solution that manages the lifecycle of ML models deployed to a wind turbine fleet. But instead of using real wind turbines, you learn how to build your own fleet of mini 3D printed wind turbines. This is a DIY open-source, open-hardware project created to demonstrate how to build an ML at the edge solution with Amazon SageMaker. You can use it as a platform to learn, experiment, and get inspired.

The next sections cover the following topics:

  • The specifications of the wind turbine farm
  • How to configure each Jetson Nano
  • How to build an anomaly detection model using SageMaker
  • How to run your own mini wind turbine farm

The wind turbine farm

The wind turbine farm created for this project has five mini 3D printed wind turbines connected to five distinct Jetson Nanos via USB. The Jetson Nanos are connected to the internet through Ethernet cables plugged into a cable modem. A fan, positioned in front of the farm, produces the wind to simulate an outdoor condition. The following image shows how the wind farm is organized.

The mini wind turbine

The mini wind turbine of this project is a mechanical device integrated with a microcontroller (Arduino) and some sensors. It was modeled using FreeCAD, an open-source tool for designing industrial parts. These parts were then 3D printed using PETG (plastic filament type) and assembled with the electronics components. Its base is static, which means that the turbine doesn’t align with the wind direction by itself. This restriction was important to simplify the project.

Each turbine has one voltage generator (small motor) and seven different sensors:

  • Vibration (MPU6050: 6 axis accelerometer/gyroscope)
  • Infrared rotation encoder (rotations per second)
  • Gearbox temperature (MPU6050)
  • Ambient temperature (BME680)
  • Atmospheric pressure (BME680)
  • Air humidity (BME680)
  • Air quality (BME680)

An Arduino Mini Pro is responsible for interfacing with these sensors and collecting data from them. This data is streamed through the serial pins (TX, RX). An FTDI device that converts this serial signal to USB is the bridge between the Arduino and the Jetson Nano. A Python application that runs on Jetson Nano receives the raw data from the sensors through this bridge.

A micro servo was modified and transformed into a voltage generator. Its internal gearbox increases the generator (motor) speed by five times to produce a (low) voltage between 0 and 3.3 V. This generator is also connected to the Arduino through an analog input pin. The voltage reading is sent along with the sensors’ readings.

The frequency at which the data is collected depends on the sensor. All the signals from the BME680 are collected every 150 milliseconds, the rotation encoder every 1 second, and the voltage generator and the vibration sensor every 50 milliseconds.
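
To see how these sampling intervals interact, the following sketch works out which sensors are due for a reading at a given moment and how many readings each one produces per second. The sensor names in the dictionary are hypothetical labels; only the intervals come from the description above. The real firmware may schedule its readings differently.

```python
# Sampling intervals in milliseconds; the keys are hypothetical labels.
SAMPLING_INTERVALS_MS = {
    "bme680": 150,            # temperature, pressure, humidity, air quality
    "rotation_encoder": 1000,
    "voltage": 50,
    "vibration": 50,
}


def sensors_due(elapsed_ms, intervals=SAMPLING_INTERVALS_MS):
    """Return the sensors whose sampling interval divides the elapsed time."""
    return sorted(
        name for name, period in intervals.items() if elapsed_ms % period == 0
    )


def samples_per_second(intervals=SAMPLING_INTERVALS_MS):
    """Approximate number of readings each sensor produces per second."""
    return {name: 1000 / period for name, period in intervals.items()}
```

With these intervals, the vibration sensor and voltage generator each produce 20 readings per second, while the rotation encoder produces one.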

If you want to know more about these technical details and learn how to build your own mini wind turbine, see the GitHub repository.

The edge device

Each Jetson Nano has a built-in GPU with 128-core NVIDIA Maxwell™ and a Quad-core ARM® A57 CPU running at 1.43 GHz. This hardware is enough to run a Python application that collects and formats the data from the sensors of the turbine and then calls the Edge Manager agent API to get the predictions. This application compares the prediction with a threshold to check for anomalies in the data. The model is invoked in real time.
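
The threshold comparison can be sketched in a few lines. This example assumes the model is an autoencoder that returns a reconstruction of its input and that the error is measured as mean absolute error; the function names and the threshold value are hypothetical and chosen only for illustration.

```python
def mean_absolute_error(actual, reconstructed):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(reconstructed) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - r) for a, r in zip(actual, reconstructed)) / len(actual)


def is_anomaly(actual, reconstructed, threshold):
    """Flag a reading when its reconstruction error exceeds the threshold."""
    return mean_absolute_error(actual, reconstructed) > threshold


# Hypothetical sensor window and model output.
readings = [1.0, 2.0, 3.0]
prediction = [1.0, 2.5, 2.5]
flagged = is_anomaly(readings, prediction, threshold=0.5)
```

In practice, the threshold is usually derived from the error distribution observed on normal training data rather than set by hand.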

When SageMaker Neo compiles the ML model for Jetson Nano, a runtime (DLR) optimized for this target device is included in the deployment package. This runtime detects automatically that it’s running on a Jetson Nano and loads the model directly into the device’s GPU for maximum performance.

The Edge Manager agent is also distributed as a Linux (arm64) application that can be run as a background process (daemon) on your Jetson Nano. It uses the runtime that SageMaker Neo includes in the compilation package to interface with the optimized model and expose it as a well-defined API. This API is integrated with the local application through a low-latency protocol (gRPC over a Unix socket).

The cloud services

Now that you know some details about the physical hardware used to develop the wind turbine farm, it’s time to see which AWS services support the solution on the cloud side. A minimal, standalone setup to get a model deployed and running on the Edge Manager agent requires only SageMaker. However, this project uses other services to add two important features: a mechanism for over-the-air (OTA) deployment and a dashboard for monitoring the anomalies in near-real time.

In summary, the components required for this project are:

  • A device fleet (Edge Manager), which organizes and controls one or more registered devices through the agent (running on each device)
  • One IoT thing per device and IoT thing group, which is used by the OTA mechanism to communicate with the devices via MQTT
  • AWS IoT rules, and an AWS Lambda function to get and filter application logs and ingest them into Amazon Elasticsearch Service (Amazon ES)
  • A Lambda function to parse the model metrics captured by the agent and ingest them into Amazon ES
  • An Elasticsearch server with Kibana, which has dashboards for monitoring the anomalies (optional)
  • SageMaker to build, compile, and package the ML model

The following diagram illustrates this architecture.

Putting everything together

Now that we have all the components of our wind turbine farm, it’s time to understand the steps we need to take to integrate all these moving parts, deploy a model to our edge devices, and keep an application running and predicting anomalies in real time.

The following diagram shows all the steps involved in the process.

The solution consists of the following steps:

  1. The data scientist explores the dataset and designs an anomaly detection model (autoencoder) with PyTorch, using SageMaker Studio.
  2. The model is trained with a SageMaker training job.
  3. With Neo, the model is optimized (compiled) to Jetson Nano.
  4. Edge Manager creates a deployment package with the compiled model.
  5. The data scientist creates an IoT job that sends a notification of the new model available to the edge devices.
  6. The application running on Jetson Nano performs the following:
    1. Receives this notification and downloads the model package from the Amazon Simple Storage Service (Amazon S3) bucket.
    2. Unpacks the model and loads it using the Edge Manager agent API (LoadModel).
    3. Reads the sensors from the wind turbine, prepares the data, invokes the ML model, and captures some model metrics using the Edge Manager agent API.
    4. Compares the prediction with a baseline to detect potential anomalies.
    5. Sends the raw sensor data to an AWS IoT topic.
  7. Through a rule, AWS IoT reads the app logs topic and exports the data to Amazon ES.
  8. A Lambda function captures the model metrics (mean absolute error) exported by the agent and ingests the data into Amazon ES.
  9. The operator uses a Kibana dashboard to check for any anomalies.

Configure your edge device

The Edge Manager agent uses certificates provided by AWS IoT Core to authenticate and call other AWS services. Therefore, you need to create an IoT thing first and then an edge device fleet. Before doing either, you need to prepare some basic resources to support your solution.

Create prerequisite resources

Before getting started, configure the AWS Command Line Interface (AWS CLI) on your workstation (if you haven’t already), and then create the following resources:

  • An S3 bucket to store the captured data
  • An AWS Identity and Access Management (IAM) role for your devices
  • An IoT thing to map to your Edge Manager device
  • An IoT policy to control the permissions of the temporary credentials of the edge device
  1. Create a new bucket for the solution.

Each time you call CaptureData in the agent API, it uploads the tensors (input and predictions) into this bucket.

Next, you create your IAM role.

  1. On the IAM console, create a role named WindTurbineFarm so the devices can access resources in your account.
  2. Add permissions to this role to upload files to the S3 bucket you created.
  3. Add the following trusted entities to the role:
    1. amazonaws.com
    2. iot.amazonaws.com
    3. amazonaws.com

Use the following permissions policy (provide the name of your S3 bucket, your AWS account ID, and Region):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<<S3_BUCKET_NAME>>",
                "arn:aws:s3:::<<S3_BUCKET_NAME>>/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "iot:CreateRoleAlias",
                "iot:DescribeRoleAlias",
                "iot:UpdateRoleAlias",
                "iot:TagResource",
                "iot:ListTagsForResource"
            ],
            "Resource": [
                "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:rolealias/SageMakerEdge*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/*SageMaker*",
                "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/*Sagemaker*",
                "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/*sagemaker*",
                "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/WindTurbineFarm"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "sagemaker:GetDeviceRegistration",
                "sagemaker:SendHeartbeat",
                "iot:DescribeEndpoint",
                "s3:ListAllMyBuckets"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }, 
        {
            "Action": [
                "sagemaker:DescribeDevice"
            ],
            "Resource": [
                "arn:aws:sagemaker:<<REGION>>:<<AWS_ACCOUNT_ID>>:device-fleet/windturbinefarm*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "iot:Publish",
                "iot:Receive"
            ],
            "Resource": [
                "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topic/wind-turbine/*"
            ],
            "Effect": "Allow"
        }
    ]
}
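Before you paste the policy into the console, it can help to fill in the placeholders programmatically and confirm that the result is valid JSON. The following is a minimal sketch, not part of the original solution: the helper name render_policy, the abbreviated template, and the bucket name are assumptions for illustration only.

```python
import json
import re

# Matches placeholders written as <<NAME>>, the convention used in this post.
PLACEHOLDER = re.compile(r"<<([A-Z0-9_]+)>>")


def render_policy(template, values):
    """Replace every <<NAME>> placeholder and parse the result as JSON.

    Hypothetical helper: raises KeyError if a placeholder has no value,
    and json.JSONDecodeError if the rendered text is not valid JSON.
    """
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError("no value supplied for: " + ", ".join(missing))
    rendered = PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
    return json.loads(rendered)


# Abbreviated template: only the first statement of the policy above.
TEMPLATE = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": [
                "arn:aws:s3:::<<S3_BUCKET_NAME>>",
                "arn:aws:s3:::<<S3_BUCKET_NAME>>/*"
            ],
            "Effect": "Allow"
        }
    ]
}
"""

# "my-wind-turbine-bucket" is an example value, not a real bucket.
policy = render_policy(TEMPLATE, {"S3_BUCKET_NAME": "my-wind-turbine-bucket"})
print(json.dumps(policy, indent=2))
```

Parsing the rendered text catches problems such as a curly quote or a forgotten placeholder before the console rejects the policy.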

You’re now ready to create your IoT thing, which you later map to your Edge Manager device.

  1. On the AWS IoT Core console, under Manage, choose Things.
  2. Choose Create.
  3. Name your device (for this post, edge-device-0).
  4. Create a new group or choose an existing group (for this post, WindTurbineFarm).
  5. Create a certificate.
  6. Download the certificates, including the root CA.
  7. Activate the certificate.

You now create your policy, which controls the permissions of the temporary credentials of the edge device.

  1. On the AWS IoT Core console, under Secure, choose Policies.
  2. Choose Create.
  3. Name the policy (for this post, WindTurbine).
  4. Choose Advanced Mode.
  5. Enter the following policy, providing your AWS account and Region:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iot:Connect"
      ],
      "Resource": "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:client/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:Publish",
        "iot:Receive"
      ],
      "Resource": [
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topic/wind-turbine/*",
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topic/$aws/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:Subscribe"
      ],
      "Resource": [
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topicfilter/wind-turbine/*",
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topicfilter/$aws/*",
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topic/$aws/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "iot:UpdateThingShadow"
      ],
      "Resource": [
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:topicfilter/wind-turbine/*",
        "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:thing/edge-device-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "iot:AssumeRoleWithCertificate",
      "Resource": "arn:aws:iot:<<REGION>>:<<AWS_ACCOUNT_ID>>:rolealias/SageMakerEdge-WindTurbineFarm"
    }
  ]
}
  1. Choose Create.

Lastly, you attach the policy to the certificate.

  1. On the AWS IoT Core console, under Secure, choose Certificates.
  2. Select the certificate you created.
  3. On the Actions menu, choose Attach policy.
  4. Select the policy WindTurbine.
  5. Choose Attach.

Now your IoT thing is ready to be linked to an edge device. Repeat these steps (except for creating the policy) for each additional device in your device fleet. For a production environment with hundreds or thousands of devices, you should apply a different approach, using automated scripts and parameter files to provision all the IoT things.
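As a rough illustration of what such an automated script could look like, the following sketch repeats the console steps with the AWS SDK for Python (Boto3). It is not the provisioning code used in this project. The function names are hypothetical, and the sketch assumes that the thing group WindTurbineFarm and the policy WindTurbine already exist and that things are named edge-device-0, edge-device-1, and so on.

```python
def provision_thing(iot, thing_name, group_name, policy_name):
    """Create one IoT thing with an active certificate and the shared policy.

    `iot` is expected to behave like boto3.client("iot"); passing it in keeps
    the function easy to dry-run with a stub.
    """
    iot.create_thing(thingName=thing_name)
    iot.add_thing_to_thing_group(thingGroupName=group_name, thingName=thing_name)
    # Create the key pair and certificate, activated immediately.
    cert = iot.create_keys_and_certificate(setAsActive=True)
    cert_arn = cert["certificateArn"]
    iot.attach_policy(policyName=policy_name, target=cert_arn)
    iot.attach_thing_principal(thingName=thing_name, principal=cert_arn)
    # The caller is responsible for storing these secrets safely.
    return {
        "thing_name": thing_name,
        "certificate_pem": cert["certificatePem"],
        "private_key": cert["keyPair"]["PrivateKey"],
    }


def provision_fleet(iot, count, group_name="WindTurbineFarm", policy_name="WindTurbine"):
    """Provision `count` things named edge-device-<index> (assumed naming scheme)."""
    return [
        provision_thing(iot, "edge-device-%d" % index, group_name, policy_name)
        for index in range(count)
    ]


# Example usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   devices = provision_fleet(boto3.client("iot"), count=5)
```

You still need to copy each certificate and private key to the matching device, as described later in this post.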

Create the edge fleet

To create your edge fleet, complete the following steps:

  1. On the SageMaker console, under Edge Inference, choose Edge device fleets.
  2. Choose Create device fleet.

  1. Enter a name for the device fleet (for this post, WindTurbineFarm).
  2. Enter the ARN of the IAM role you used in the previous steps (arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/WindTurbineFarm).
  3. Enter the output S3 bucket URI (s3://<<NAME_OF_YOUR_BUCKET>>/wind_turbine_data/).
  4. Choose Submit.

Now you need to add a new device to the fleet.

  1. On the SageMaker console, under Edge Inference, choose Edge devices.
  2. Choose Register devices.

  1. For Device Properties, enter the name of the device fleet you created (WindTurbineFarm).
  2. Choose Next.
  3. For Device name, enter any unique name for your device (for this post, we use the same name as our IoT thing, edge-device-wind-turbine-00000000000).
  4. For IoT name, enter the name of the thing you created earlier (edge-device-0).
  5. Choose Submit.

Repeat the registering process for all your other devices. Now you can SSH to your Jetson Nano and complete the configuration of your device.
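If you have many devices, you can register them in one call instead of repeating the console steps. The following sketch builds the request for the SageMaker RegisterDevices API as exposed by Boto3. The helper names are hypothetical, and the sketch assumes each Edge Manager device reuses the name of its IoT thing.

```python
def build_register_request(fleet_name, thing_names):
    """Build the keyword arguments for SageMaker's RegisterDevices call."""
    return {
        "DeviceFleetName": fleet_name,
        "Devices": [
            # Assumption: the device name is the same as the IoT thing name.
            {"DeviceName": name, "IotThingName": name}
            for name in thing_names
        ],
    }


def register_fleet(sagemaker, fleet_name, thing_names):
    """`sagemaker` is expected to behave like boto3.client("sagemaker")."""
    request = build_register_request(fleet_name, thing_names)
    sagemaker.register_devices(**request)
    return request


request = build_register_request("WindTurbineFarm", ["edge-device-0", "edge-device-1"])
print(request)

# Example usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   register_fleet(boto3.client("sagemaker"), "WindTurbineFarm", ["edge-device-0"])
```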

Prepare the edge device

Before you start configuring your Jetson Nano, you need to install JetPack 4.4.1 in your Nano. This is the version you use to build, run, and test this demo.

The model preparation process for your target device is very sensitive to the versions of the libraries installed on your device. For instance, because the target device is a Jetson Nano, Neo optimizes the model and runtime for a given version of TensorRT and CUDA. The runtime (libdlr.so) is linked against the versions you specify in the compilation job. This means that if you compile your model using Neo for JetPack 4.4.1, it doesn’t work with JetPack 3.x, and vice versa.

  1. With JetPack 4.4.1 running on your Jetson Nano, you can start configuring your device with the following commands:
echo "export TVM_TENSORRT_MAX_WORKSPACE_SIZE=2147483647" >> ~/.bashrc
echo "export SM_EDGE_AGENT_HOME=/home/${USER}/agent" >> ~/.bashrc

# Also export the variables for the current session
export TVM_TENSORRT_MAX_WORKSPACE_SIZE=2147483647
export SM_EDGE_AGENT_HOME=/home/${USER}/agent


sudo apt install -y protobuf-compiler python3-serial 
sudo apt install -y python3-pip python3-joblib python3-boto3 libssl-dev
sudo apt install -y curl
sudo pip3 install grpcio-tools grpcio PyWavelets paho-mqtt
  1. Download the Linux ARMv8 version of the Edge Manager agent.
  2. Copy the package to your Jetson Nano (scp). Create a folder for the agent and unpack the package in your home directory:
mkdir -p ~/agent/certificates/iot
mkdir -p ~/agent/certificates/root
tar -xzvf <<agent_package>>.tgz -C ~/agent
  1. Copy the AWS IoT Core certificates you provisioned for your thing in the previous section to the directory ~/agent/certificates/iot in your Jetson Nano.

You should see the following files in this directory:

  • AmazonRootCA1.pem – CA root
  • <<CERT_PREFIX>>-public.pem.key – Public key
  • <<CERT_PREFIX>>-private.pem.key – Private key
  • <<CERT_PREFIX>>-certificate.pem.crt – Certificate
  1. Get the root certificate used to sign the deployment package created by Edge Manager. The agent uses this to validate the model.
aws s3 cp s3://sagemaker-edge-release-store-us-west-2-linux-armv8/Certificates/<<AWS_REGION>>/<<AWS_REGION>>.pem .
  1. Copy this certificate to the directory ~/agent/certificates/root in your Jetson Nano.

Next, you create the Edge Manager agent configuration file.

  1. Open an empty file named ~/agent/sagemaker_edge_config.json and enter the following code:
{
    "sagemaker_edge_core_device_uuid": "<<SAGEMAKER_EDGE_DEVICE_NAME>>",
    "sagemaker_edge_core_device_fleet_name": "WindTurbineFarm",
    "sagemaker_edge_core_capture_data_buffer_size": 30,
    "sagemaker_edge_core_capture_data_batch_size": 10,
    "sagemaker_edge_core_capture_data_push_period_seconds": 4,
    "sagemaker_edge_core_folder_prefix": "wind_turbine_data",
    "sagemaker_edge_core_region": "<<AWS_REGION>>",
    "sagemaker_edge_core_root_certs_path": "/home/<<LINUX_USER>>/agent/certificates/root",
    "sagemaker_edge_provider_aws_ca_cert_file": "/home/<<LINUX_USER>>/agent/certificates/iot/AmazonRootCA1.pem",
    "sagemaker_edge_provider_aws_cert_file": "/home/<<LINUX_USER>>/agent/certificates/iot/<<CERT_PREFIX>>-certificate.pem.crt",
    "sagemaker_edge_provider_aws_cert_pk_file": "/home/<<LINUX_USER>>/agent/certificates/iot/<<CERT_PREFIX>>-private.pem.key",
    "sagemaker_edge_provider_aws_iot_cred_endpoint": "https://<<CREDENTIALS_ENDPOINT_HOST>>/role-aliases/SageMakerEdge-WindTurbineFarm/credentials",
    "sagemaker_edge_provider_provider": "Aws",
    "sagemaker_edge_provider_s3_bucket_name": "<<S3_BUCKET>>",
    "sagemaker_edge_core_capture_data_destination": "Cloud"
}

Provide the information for the following resources:

  • SAGEMAKER_EDGE_DEVICE_NAME – The unique name of your device you defined previously.
  • AWS_REGION – The Region where you created your edge device.
  • LINUX_USER – The Linux user name you’re using in Jetson Nano.
  • CERT_PREFIX – The prefix of the certificate files you created when you provisioned your IoT thing in the previous section.
  • CREDENTIALS_ENDPOINT_HOST – Your AWS IoT credentials provider endpoint host (this isn’t the IoT thing shadow URL). You can get it through the AWS Command Line Interface (AWS CLI); install the AWS CLI if you don’t have it already. Using credentials for the same account and Region you used in the previous sections, run the following command to retrieve the endpoint host:
aws iot describe-endpoint --endpoint-type iot:CredentialProvider
  • S3_BUCKET – The name of the S3 bucket you used to configure your edge device fleet in the previous section.
  1. Save the file with all these modifications.
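A forgotten placeholder in this file is an easy mistake to make, so you may want to check the saved configuration before you start the agent. The following sketch is an optional helper, not part of the agent or the original solution. The function names and the sample host name are assumptions for illustration.

```python
import json
import re

# Matches any leftover placeholder written as <<NAME>>.
PLACEHOLDER = re.compile(r"<<[^<>]+>>")


def unresolved_keys(config):
    """Return the keys whose string values still contain a <<PLACEHOLDER>>."""
    return sorted(
        key
        for key, value in config.items()
        if isinstance(value, str) and PLACEHOLDER.search(value)
    )


def credentials_endpoint_url(host, role_alias="SageMakerEdge-WindTurbineFarm"):
    """Build the credentials URL in the format used by the configuration above."""
    return "https://%s/role-aliases/%s/credentials" % (host, role_alias)


def load_and_check(path):
    """Load the agent configuration and fail if any placeholder is left."""
    with open(path) as handle:
        config = json.load(handle)
    leftovers = unresolved_keys(config)
    if leftovers:
        raise ValueError("placeholders left in: " + ", ".join(leftovers))
    return config


# Small in-memory example; the host name is made up for illustration.
sample = {
    "sagemaker_edge_core_device_uuid": "edge-device-0",
    "sagemaker_edge_core_region": "<<AWS_REGION>>",
    "sagemaker_edge_core_capture_data_batch_size": 10,
    "sagemaker_edge_provider_aws_iot_cred_endpoint": credentials_endpoint_url(
        "example.credentials.iot.us-east-1.amazonaws.com"
    ),
}
print(unresolved_keys(sample))
```

On the device, you could call load_and_check with the path of the file you just saved.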

Now you’re ready to run the Edge Manager agent in your Jetson Nano.

  1. To test the agent, run the following commands:
cd ~/agent
rm -f /tmp/edge_agent
./bin/sagemaker_edge_agent_binary -c sagemaker_edge_config.json -a /tmp/edge_agent &

The following screenshot shows your output.

The agent is now running. After a few minutes, you can see the heartbeat of the device, reported on the console. To see it on the SageMaker console, under Edge Inference, choose Edge Devices and choose your device.
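You can also check the heartbeat from a script. The following sketch reads the LatestHeartbeat field returned by the SageMaker DescribeDevice API through Boto3. The function name and the 10-minute freshness window are assumptions; pick a window that matches how often you expect your devices to report.

```python
from datetime import datetime, timedelta, timezone


def is_device_alive(sagemaker, device_name, fleet_name,
                    max_age=timedelta(minutes=10), now=None):
    """Return True if the device reported a heartbeat within `max_age`.

    `sagemaker` is expected to behave like boto3.client("sagemaker").
    The 10-minute default is an arbitrary assumption for this sketch.
    """
    response = sagemaker.describe_device(
        DeviceName=device_name, DeviceFleetName=fleet_name
    )
    heartbeat = response.get("LatestHeartbeat")
    if heartbeat is None:
        # The device is registered but the agent has not reported yet.
        return False
    now = now or datetime.now(timezone.utc)
    return now - heartbeat <= max_age


# Example usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   print(is_device_alive(boto3.client("sagemaker"),
#                         "edge-device-0", "WindTurbineFarm"))
```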

Configure the application

Now it’s time to set up the application that runs on the edge device. This application is responsible for the following:

  • Get the temporary credentials using the certificate
  • Listen to the OTA update topics to see whether a new model package is ready to deploy
  • Deploy the available model package to the edge device
  • Load the model to the agent if necessary
  • Perform an infinite loop:
    • Read the sensor data
    • Format the input data
    • Invoke the ML model and capture some metrics of the prediction
    • Compare the prediction’s MAE (mean absolute error) to the baseline
    • Publish raw data to an IoT topic (MQTT)

To install the application, first get the custom AWS IoT endpoint. On the AWS IoT Core console, choose Settings. Copy the endpoint and use it in the following code:

cd ~/
git clone https://github.com/aws-samples/amazon-sagemaker-edge-manager-demo wind_turbine
cd wind_turbine/04_EdgeApplication
## Before running, replace the endpoint placeholder in the application with the AWS IoT endpoint host you just copied, and save the file
chmod +x run.py
./run.py &

The application outputs something like the following screenshot.

Optional: Run this application with the parameter --test-mode if you just want to run a test with no wind turbine connected to the edge device.

If everything went fine, the application keeps waiting for a new model. It’s time to train a new model and deploy it to the Jetson Nano.

Train and deploy the ML model

This post demonstrates how to detect anomalies in the components of a wind turbine. There are many ways of doing this with the data collected by its sensors. To keep this example as simple as possible, you prepare a model that analyzes vibration, wind speed, rotation (per second), and the produced voltage to determine whether an anomaly exists or not. For that purpose, we train an autoencoder using PyTorch on SageMaker and prepare it for deployment on your Jetson Nano.

This model architecture has two advantages: it’s unsupervised, so you don’t need to label your data, and you can train it with data collected from wind turbines that are working perfectly. Therefore, your model is trained to recognize what you consider normal behavior of your wind turbines. When a defect appears in any part of the turbine, the sensor data drifts, which the model interprets as abnormal behavior (an anomaly).

The following screenshot is a sample of the raw data captured by the turbine sensors.

The data has the following features:

  • nanoId – ID of the edge device that collected the data
  • turbineId – ID of the turbine that produced this data
  • arduino_timestamp – Timestamp of the Arduino that was operating this turbine
  • nanoFreemem – Amount of free memory in bytes
  • eventTime – Timestamp of the row
  • rps – Rotation of the rotor in rotations per second
  • voltage – Voltage produced by the generator in millivolts
  • qw, qx, qy, qz – Quaternion angular acceleration
  • gx, gy, gz – Gravity acceleration
  • ax, ay, az – Linear acceleration
  • gearboxtemp – Internal temperature
  • ambtemp – External temperature
  • humidity – Air humidity
  • pressure – Air pressure
  • gas – Air quality
  • wind_speed_rps – Wind speed in rotations per second

The features selected based on our goals are qw, qx, qy, qz (angular acceleration), wind_speed_rps, rps, and voltage. The following image is a sample of the feature qx. The data produced by the accelerometer is too noisy, so we need to clean it first.

The quaternion readings (qw, qx, qy, qz) are first converted to Euler angles (roll, pitch, yaw). Then we denoise all the features with wavelets (using PyWavelets) and normalize them. The following screenshot shows the signals after these transformations.
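To make the first transformation concrete, the following sketch converts one quaternion reading to roll, pitch, and yaw using the common ZYX (yaw-pitch-roll) convention. This is an assumption: the notebooks in the repository may use a different convention or a library routine, and the function name is hypothetical.

```python
import math


def quaternion_to_euler(qw, qx, qy, qz):
    """Convert a unit quaternion to (roll, pitch, yaw) in radians.

    Uses the ZYX convention, which is an assumption for this sketch.
    """
    roll = math.atan2(2.0 * (qw * qx + qy * qz), 1.0 - 2.0 * (qx * qx + qy * qy))
    # Clamp to [-1, 1] so rounding error cannot push asin out of its domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (qw * qy - qz * qx)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (qw * qz + qx * qy), 1.0 - 2.0 * (qy * qy + qz * qz))
    return roll, pitch, yaw


# A rotation of 90 degrees around the vertical axis only changes the yaw.
half_angle = math.radians(90) / 2
roll, pitch, yaw = quaternion_to_euler(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
print(round(math.degrees(yaw), 3))
```

Replacing four quaternion components with three angles is also why the final dataset has six features instead of seven.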

Finally, we apply a sliding window to this resulting dataset (six features) to capture the temporal relationship between neighbor readings and create the input tensor of our ML model. The average interval between two sequential samples is approximately 50 milliseconds. Each time window (of our sliding window) is then converted into a tensor, using the following structure:

  • Tensor – 6 features x 10 steps (100 samples) = 6×100
    • Step – Group of time steps
    • Time step – Group of intervals (time_step=20 = ~5 seconds)
    • Interval – Group of samples (interval=5 = ~250 milliseconds)
  • Reshaped tensor – 6x10x10
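With samples arriving roughly every 50 milliseconds, an interval of 5 samples covers about 5 × 50 = 250 milliseconds, and a time step of 20 intervals covers about 20 × 250 milliseconds = 5 seconds, which matches the values in the list above. The following sketch shows only the windowing and reshaping idea in plain Python: it slides a 100-sample window over the six features and reshapes each window to 6x10x10. The function names and the stride of 1 are assumptions, and the notebooks may build the tensor differently.

```python
def sliding_windows(rows, window=100, stride=1):
    """Yield feature-major windows from rows of shape (samples x features)."""
    for start in range(0, len(rows) - window + 1, stride):
        chunk = rows[start:start + window]
        # Transpose (samples x features) to (features x samples).
        yield [list(feature) for feature in zip(*chunk)]


def reshape_feature(values, steps=10):
    """Split a flat list of samples into `steps` equal rows (100 -> 10x10)."""
    size, remainder = divmod(len(values), steps)
    if remainder:
        raise ValueError("window length must be a multiple of steps")
    return [values[index * size:(index + 1) * size] for index in range(steps)]


def to_tensor(window, steps=10):
    """Reshape a 6x100 window into a nested 6x10x10 list."""
    return [reshape_feature(feature, steps) for feature in window]


# Synthetic data: 105 samples with 6 identical features per sample.
rows = [[float(t)] * 6 for t in range(105)]
windows = list(sliding_windows(rows))
tensor = to_tensor(windows[0])
print(len(windows), len(tensor), len(tensor[0]), len(tensor[0][0]))
```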

Interval, time step and step are hyperparameters that you can adjust during training. The final result is a stream of data, encoded as a multidimensional tensor (representing a few seconds in the past). The trained autoencoder tries to recreate the input tensor as the output (prediction). By measuring the MAE between the input and output and comparing it with a pre-defined threshold, you can identify potential anomalies.
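The comparison itself is simple. The following sketch computes the MAE between an input tensor and its reconstruction and flags an anomaly when the error exceeds a threshold. The function names and the example values are illustrative assumptions. In the solution, the threshold comes from the batch prediction step in the second notebook.

```python
def flatten(tensor):
    """Yield the scalar values of an arbitrarily nested list."""
    for item in tensor:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def mean_absolute_error(expected, predicted):
    """Average of |expected - predicted| over all elements."""
    expected_values = list(flatten(expected))
    predicted_values = list(flatten(predicted))
    if not expected_values or len(expected_values) != len(predicted_values):
        raise ValueError("tensors must be non-empty and have the same size")
    total = sum(abs(e - p) for e, p in zip(expected_values, predicted_values))
    return total / len(expected_values)


def is_anomaly(expected, predicted, threshold):
    """Flag the window when the reconstruction error exceeds the threshold."""
    return mean_absolute_error(expected, predicted) > threshold


# Tiny made-up example: a 2x2 input and its reconstruction.
input_tensor = [[1.0, 2.0], [3.0, 4.0]]
reconstruction = [[1.5, 2.0], [2.0, 4.5]]
mae = mean_absolute_error(input_tensor, reconstruction)
print(mae, is_anomaly(input_tensor, reconstruction, threshold=0.4))
```

A single global MAE is the simplest variant. You could also compute one MAE per feature to report which signal drifted.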

One important aspect of this approach is that it extracts the linear and non-linear correlations between the features, to better understand the impact of one feature on another, such as wind speed on the rotation or produced voltage.

Now it’s time to run this experiment.

  1. First, you need to set up your Studio environment if you don’t have one yet.
  2. Clone the GitHub repo https://github.com/aws-samples/amazon-sagemaker-edge-manager-demo inside a Studio terminal.

The repository contains a folder named 03_Notebooks with two Jupyter notebooks.

  1. Follow the instructions in the first notebook to prepare the dataset – Because the accelerometer data is a signal, it contains noise, so you run a denoising step to clean the data.

The final dataset has only six features: roll, pitch, yaw (converted from a Quaternion to Euler angles), wind_speed_rps, rps (rotations per second), voltage (produced by the generator).

  1. Follow the instructions in the second notebook to train, package, and deploy the model:
    1. Use SageMaker to train your PyTorch autoencoder (CNN based).
    2. Run a batch prediction to compute MAE and threshold used by the app to determine whether the prediction is an anomaly or not.
    3. Compile the model to Jetson Nano using Neo.
    4. Create a deployment package with Edge Manager.
    5. Create an IoT job that publishes a JSON document to a topic listened to by the application that is running on your Jetson Nano.

The application gets the package, unpacks it, loads the model in the Edge Manager agent, and unblocks the application run.

Both notebooks are very detailed, so follow the steps carefully, after which you’ll have an anomaly detection model to deploy in your Jetson Nano.

Compilation job and model optimization

One of the most important steps of the whole process is the model optimization step in the second notebook. When you compile a model with SageMaker Neo, it not only optimizes the model to improve prediction performance on the target device, it also converts the original model into an intermediate representation. After this conversion, you no longer need the original framework (PyTorch, TensorFlow, MXNet). This representation is then interpreted by a lightweight runtime (DLR), which Neo packages with the model. Both the runtime and the optimized model are libraries, compiled as native programs for a specific operating system and architecture. In the case of Jetson Nano, the OS is a Linux distribution and the architecture is 64-bit ARMv8. The runtime in this case uses TensorRT for maximum performance on the Jetson’s GPU.

When you launch a compilation job on Neo, you need to specify some parameters related to the setup of your target device, for instance:

  • trt-ver – 7.1.3
  • cuda-ver – 10.2
  • gpu-code – sm_53

The Jetson Nano’s GPU is an NVIDIA Maxwell with compute capability 5.3 (sm_53), so the parameter gpu-code is the same for all compilation jobs. However, trt-ver and cuda-ver depend on the versions of TensorRT and CUDA installed on your Nano. When you were preparing your edge device, you set up your Jetson Nano with JetPack 4.4.1, which ships with the versions listed above. This makes sure that the model you optimize using Neo is compatible with your Jetson Nano.
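To show where these parameters go, the following sketch builds a request for the SageMaker CreateCompilationJob API as exposed by Boto3. It is an illustration under stated assumptions, not the code from the second notebook: the function name, the input name input0, the input shape [1, 6, 10, 10] (derived from the 6x10x10 tensor described earlier), the timeout, and the example ARN and S3 URIs are all hypothetical.

```python
import json


def build_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                              input_shape, trt_ver="7.1.3", cuda_ver="10.2",
                              gpu_code="sm_53"):
    """Build keyword arguments for boto3 sagemaker.create_compilation_job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Assumption: a single model input named input0.
            "DataInputConfig": json.dumps({"input0": input_shape}),
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64", "Accelerator": "NVIDIA"},
            # Neo expects the compiler options as a JSON string.
            "CompilerOptions": json.dumps(
                {"trt-ver": trt_ver, "cuda-ver": cuda_ver, "gpu-code": gpu_code}
            ),
        },
        # Arbitrary timeout chosen for this sketch.
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


# All names, ARNs, and URIs below are placeholders for illustration.
request = build_compilation_request(
    job_name="wind-turbine-anomaly-example",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    input_shape=[1, 6, 10, 10],
)
print(request["OutputConfig"]["CompilerOptions"])

# Example usage against a real account (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("sagemaker").create_compilation_job(**request)
```

If you upgrade JetPack on your devices, you only need to change trt_ver and cuda_ver and recompile.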

Visualize the results

The dashboard setup is out of scope for this post. For more information, see Analyze device-generated data with AWS IoT and Amazon Elasticsearch Service.

Now that you have your model deployed and running on your Jetson Nano, it’s time to look at the behavior of your wind turbines through a dashboard. The application you deployed to the Jetson Nano collects some logs and sends them to two different places:

  • The IoT MQTT topic wind-turbine/logs/<<iot_thing_name>> contains the app logs and raw data collected from the wind turbine sensors
  • The S3 bucket s3://<<S3_BUCKET>>/wind_turbine_data contains the metrics of the ML model

You can get this data and ingest it into Amazon ES or another database. Then you can use your preferred reporting tool to prepare dashboards.

The following visualization shows three different but correlated things for each one of the five turbines: the rotation speed (in RPS), the produced voltage, and the detected anomalies for voltage, rotation, and vibration.

Some noise was injected in the raw data from the turbines to simulate failures.

The following visualization shows an aggregation of the turbines’ speed and produced voltage anomalies over time.

Conclusion

Securely and reliably maintaining the lifecycle of an ML model deployed across a fleet of devices isn’t an easy task. However, with Edge Manager, you can reduce the implementation effort and operational cost of such a solution. Also, with a demo like the mini wind turbine farm, you can experiment, optimize, and automate your ML pipeline with the services and expertise provided by AWS.

To build a solution for your own needs, get the code and artifacts used in this project from the GitHub repo. If you want more practice using Edge Manager, check out the end-to-end workshop for Edge Manager on Studio.


About the Author

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers create AI/ML solutions that solve their business challenges using AWS. He has worked on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.

Build a medical sentence matching application using BERT and Amazon SageMaker

Determining the relevance of a sentence when compared to a specific document is essential for many different types of applications across various industries. In this post, we focus on a use case within the healthcare field to help determine the accuracy of information regarding patient health.

Frequently, during each patient visit, a new document is created with the information from the visit. This information often consists of a medical transcription that has been dictated by either the nurse or the physician. Such a document may contain a brief description statement (also known as a restatement) that explains the main details from that specific patient visit. In future visits, doctors may rely on previous visits’ restatements to quickly get an overview of the patient’s overall status. Such restatements may also be used during patient handoffs. However, this introduces the potential for errors to be made during patient handoffs to new medical teams if the restatements are difficult to understand or if they contain inadequate information (Staggers et al. 2011). Therefore, having an accurate description of the patient’s status is important, because the cost of errors in such restatements can be high and may negatively affect the patient’s overall care (Garcia et al. 2017).

This post walks you through how to deploy a machine learning (ML) model that aims to determine the top sentences from the document that best match the corresponding document restatement; this can be a first step to ensure the accuracy of the patient’s health records overall by determining the relevance of the restatement. We emphasize that this model determines the top ranking sentences that match the restatement; it does not generate the restatement itself.

When creating this solution, we were faced with a dual-sided challenge. Beyond the technical challenge of actually creating an AI/ML model, several surrounding components complicate actually using such models in the real world. Indeed, the actual ML code may be a very small part of the system as a whole (Sculley et al. 2015). This is especially so in complex architectures frequently deployed in the context of the healthcare and life science space.

We focused on one particular challenge: creating the ability to serve the model so that others (applications, services, or people) can use it. By serving a model, we mean to grant others the ability to pass new data to the model so they can get the predictions they need. This post provides a broad overview of the problem, the solution, and a few points to keep in mind if you plan to use a similar approach in your own use cases. A full technical write up, including a readme and a step-by-step deployment of the architecture, is available in the GitHub code repository. For more information about approaches to serving models, see Build, Train, and Deploy a Machine Learning Model With Amazon SageMaker and AWS Deep Learning Containers on Amazon ECS.

Background and use case

In the medical field (as well as other industries), documents are frequently associated with a shorter restatement text of the original document. We use the term restatement, but in fact this shorter text can be a summary, highlight, description, or other metadata about the document. For example, an after-visit clinical summary given to a patient summarizes the content of the patient visit to a physician.

For illustration purposes, the following is an example that’s unrelated to the medical industry.

Document:

On Monday morning, Joshua ate a large breakfast of bacon and eggs. He then went for a brisk walk. Finally, he returned home and sat at his desk.

Restatement:

Joshua went for a walk.

In this example, the restatement is just a rewording of the highlighted sentence in the full document. This example shows that, although the use case that we focus on in this post is specific to the medical field, you can use and modify this approach for many other text analysis applications.

Let’s now take a closer look at the use case for this post. We used data taken from MTSamples (which we downloaded from Kaggle). This data contains many different samples of transcribed medical texts. It includes documents with raw transcriptions of sample notes, as well as shorter descriptions of those notes (which we treat as restatements).

The following is an example from the MTSamples dataset.

Document:

HISTORY OF PRESENT ILLNESS: , I have seen ABC today. He is a very pleasant gentleman who is 42 years old, 344 pounds. He is 5’9″. He has a BMI of 51. He has been overweight for ten years since the age of 33, at his highest he was 358 pounds, at his lowest 260. He is pursuing surgical attempts of weight loss to feel good, get healthy, and begin to exercise again. He wants to be able to exercise and play volleyball. Physically, he is sluggish. He gets tired quickly. He does not go out often. When he loses weight he always regains it and he gains back more than he lost. His biggest weight loss is 25 pounds and it was three months before he gained it back. He did six months of not drinking alcohol and not taking in many calories. He has been on multiple commercial weight loss programs including Slim Fast for one month one year ago and Atkin’s Diet for one month two years ago.,PAST MEDICAL HISTORY: , He has difficulty climbing stairs, difficulty with airline seats, tying shoes, used to public seating, difficulty walking, high cholesterol, and high blood pressure. He has asthma and difficulty walking two blocks or going eight to ten steps. He has sleep apnea and snoring. He is a diabetic, on medication. He has joint pain, knee pain, back pain, foot and ankle pain, leg and foot swelling. He has hemorrhoids.,PAST SURGICAL HISTORY: , Includes orthopedic or knee surgery.,SOCIAL HISTORY: , He is currently single. He drinks alcohol ten to twelve drinks a week, but does not drink five days a week and then will binge drink. He smokes one and a half pack a day for 15 years, but he has recently stopped smoking for the past two weeks.,FAMILY HISTORY: , Obesity, heart disease, and diabetes. Family history is negative for hypertension and stroke.,CURRENT MEDICATIONS:, Include Diovan, Crestor, and Tricor.,MISCELLANEOUS/EATING HISTORY: ,He says a couple of friends of his have had heart attacks and have had died. He used to drink everyday, but stopped two years ago. 
He now only drinks on weekends. He is on his second week of Chantix, which is a medication to come off smoking completely. Eating, he eats bad food. He is single. He eats things like bacon, eggs, and cheese, cheeseburgers, fast food, eats four times a day, seven in the morning, at noon, 9 p.m., and 2 a.m. He currently weighs 344 pounds and 5’9″. His ideal body weight is 160 pounds. He is 184 pounds overweight. If he lost 70% of his excess body weight that would be 129 pounds and that would get him down to 215.,REVIEW OF SYSTEMS: , Negative for head, neck, heart, lungs, GI, GU, orthopedic, or skin. He also is positive for gout. He denies chest pain, heart attack, coronary artery disease, congestive heart failure, arrhythmia, atrial fibrillation, pacemaker, pulmonary embolism, or CVA. He denies venous insufficiency or thrombophlebitis. Denies shortness of breath, COPD, or emphysema. Denies thyroid problems, hip pain, osteoarthritis, rheumatoid arthritis, GERD, hiatal hernia, peptic ulcer disease, gallstones, infected gallbladder, pancreatitis, fatty liver, hepatitis, rectal bleeding, polyps, incontinence of stool, urinary stress incontinence, or cancer. He denies cellulitis, pseudotumor cerebri, meningitis, or encephalitis.,PHYSICAL EXAMINATION: ,He is alert and oriented x 3. Cranial nerves II-XII are intact. Neck is soft and supple. Lungs: He has positive wheezing bilaterally. Heart is regular rhythm and rate. His abdomen is soft. Extremities: He has 1+ pitting edema.,IMPRESSION/PLAN:, I have explained to him the risks and potential complications of laparoscopic gastric bypass in detail and these include bleeding, infection, deep venous thrombosis, pulmonary embolism, leakage from the gastrojejuno-anastomosis, jejunojejuno-anastomosis, and possible bowel obstruction among other potential complications. He understands. He wants to proceed with workup and evaluation for laparoscopic Roux-en-Y gastric bypass. He will need to get a letter of approval from Dr. XYZ. 
He will need to see a nutritionist and mental health worker. He will need an upper endoscopy by either Dr. XYZ. He will need to go to Dr. XYZ as he previously had a sleep study. We will need another sleep study. He will need H. pylori testing, thyroid function tests, LFTs, glycosylated hemoglobin, and fasting blood sugar. After this is performed, we will submit him for insurance approval.

Restatement:

Consult for laparoscopic gastric bypass.

Although the raw transcript document is quite long, only a few of its sentences appear to be related to the restatement “Consult for laparoscopic gastric bypass.” We highlighted two sentences within the document that you might intuitively think best match the restatement. The approach we deployed quantifies the similarities and reports the sentences in the document that best match the restatement. We did this by using a pretrained BERT language model trained specifically on clinical texts (published by Alsentzer et al. 2019). The model itself is hosted by Hugging Face, a platform for sharing open-source natural language processing (NLP) projects. We used this model to calculate sentence-by-sentence similarities using the sentence-transformers Python library.
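To make the sentence-by-sentence comparison concrete, the following is a minimal Python sketch rather than the code from our solution. It assumes a naive regular-expression sentence splitter and uses a toy word-count embedding (toy_embed) as a stand-in for the clinical BERT model, so the distances it produces are not comparable to the ones reported in this post. All function names are hypothetical.

```python
import math
import re
from typing import Callable, List, Sequence, Tuple

Embedder = Callable[[Sequence[str]], List[List[float]]]


def toy_embed(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in embedding (an assumption, NOT a BERT model):
    word-count vectors over the vocabulary of the given batch."""
    tokenized = [re.findall(r"[a-z0-9'-]+", text.lower()) for text in texts]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    index = {token: i for i, token in enumerate(vocab)}
    vectors = []
    for tokens in tokenized:
        vector = [0.0] * len(vocab)
        for token in tokens:
            vector[index[token]] += 1.0
        vectors.append(vector)
    return vectors


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity; lower means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat empty text as sharing nothing
    return 1.0 - dot / (norm_u * norm_v)


def split_sentences(document: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.
    It will wrongly split abbreviations such as 'Dr. XYZ'."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [part for part in parts if part]


def sentence_distances(
    restatement: str, document: str, embed: Embedder = toy_embed
) -> List[Tuple[str, float]]:
    """Pair every sentence in the document with its distance to the restatement."""
    sentences = split_sentences(document)
    vectors = embed([restatement] + sentences)
    target, sentence_vectors = vectors[0], vectors[1:]
    return [
        (sentence, cosine_distance(vector, target))
        for sentence, vector in zip(sentences, sentence_vectors)
    ]


if __name__ == "__main__":
    excerpt = (
        "He has asthma. He wants to proceed with workup and evaluation "
        "for laparoscopic Roux-en-Y gastric bypass. He is currently single."
    )
    for sentence, distance in sentence_distances(
        "Consult for laparoscopic gastric bypass.", excerpt
    ):
        print(f"{distance:.4f}  {sentence}")
```

The embed parameter takes a list of strings and returns one vector per string, so an embedding function backed by a pretrained language model could be passed in place of toy_embed without changing the rest of the sketch.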

Note that this solution ranks sentences without explicitly extracting or detecting medical entities. However, many applications rely on explicitly extracting and analyzing diagnoses, medications, and other health information. To detect entities such as medical conditions and medications in medical text, consider using Amazon Comprehend Medical, a HIPAA-eligible service built to extract medical information from unstructured medical text.

More information about this approach is available in our technical write-up.

Architecture diagram

In this section, we go over the architecture diagram for this solution at a very high level. For more details and to see the step-by-step framework, see our technical write-up.

In the model development and testing phase, we use Amazon SageMaker Studio. Studio is a powerful integrated development environment (IDE) for building, training, testing, and deploying ML models. Because we use a prebuilt model for this solution, we don’t need to use Studio’s full ability to train algorithms at scale. Instead, we use it for development and deployment purposes.

We created a Jupyter notebook that you can import into Studio. This notebook walks you through the entire development and deployment process. We start by writing the code for our model to a file. The model is then built using an NGINX/Flask framework, so that new data can be passed to it at inference time. Prior to deploying the model, we package it as a Docker container, build it using AWS CodeBuild, and push it to Amazon Elastic Container Registry (Amazon ECR). Then we deploy the model using Amazon Elastic Container Service (Amazon ECS).

The final result is a model that you can query using a simple API call. This is an important point: the ability to query models through an API is an essential component of designing scalable, easy-to-use interfaces. For more information, see Implementing Microservices on AWS.
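As a rough illustration of what such an API call could look like from a client, here is a minimal Python sketch using only the standard library. The endpoint URL, the JSON field names (restatement, document), and the shape of the response are all assumptions made for this example; the deployed service may expect something different.

```python
import json
import urllib.request
from typing import Any


def build_query(endpoint_url: str, restatement: str, document: str) -> urllib.request.Request:
    """Build a JSON POST request; the field names are hypothetical."""
    payload = json.dumps({"restatement": restatement, "document": document})
    return urllib.request.Request(
        endpoint_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_model(endpoint_url: str, restatement: str, document: str, timeout: float = 30.0) -> Any:
    """Send the request and decode the JSON response body."""
    request = build_query(endpoint_url, restatement, document)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating build_query from query_model keeps the request construction easy to inspect and test without a running endpoint.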

After we deploy our model, we create a graphical user interface (using Streamlit) so that our model can be easily accessed through a webpage. Streamlit is an open-source library used to create front ends for ML applications. After we create our webpage, we deploy it in a similar way to how we deployed our model: we package it as a separate Docker container, build it using CodeBuild, push it to Amazon ECR, and deploy it using Amazon ECS.

By creating and deploying this webpage, we provide users with no programming experience the ability to use our model to test their own documents and restatements. The following screenshot shows what the webpage looks like.

After the user inputs their restatement and corresponding document, the top five results (the five sentences that best match the statement) are returned. If you deploy the entire solution using our original MTSamples example, the final result looks like the following screenshot.

The solution reports the following results:

  • The top five sentences within the document that best match the restatement.
  • The similarity distance between each sentence and the restatement. A lower distance indicates greater similarity between that sentence and the restatement.

In this example, the best matching sentence is “He wants to proceed with workup and evaluation for laparoscopic Roux-en-Y gastric bypass” with a distance of 0.0672. Therefore, this approach has correctly identified a sentence within the document that matches the restatement.
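The selection step behind this report can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the solution's code: the function names are hypothetical, and apart from the 0.0672 distance quoted above, the sentences and distances in the sample input are placeholders invented for the example.

```python
import heapq
from typing import List, Sequence, Tuple

Scored = Tuple[str, float]  # (sentence, distance to the restatement)


def top_matches(scored: Sequence[Scored], k: int = 5) -> List[Scored]:
    """Return the k sentences with the smallest distance, best match first."""
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


def format_report(matches: Sequence[Scored]) -> str:
    """Render one numbered line per match, showing the distance first."""
    return "\n".join(
        f"{rank}. distance={distance:.4f}  {sentence}"
        for rank, (sentence, distance) in enumerate(matches, start=1)
    )


if __name__ == "__main__":
    # Only the first distance comes from the post; the rest are placeholders.
    sample = [
        ("He wants to proceed with workup and evaluation for "
         "laparoscopic Roux-en-Y gastric bypass.", 0.0672),
        ("Placeholder sentence A.", 0.31),
        ("Placeholder sentence B.", 0.22),
        ("Placeholder sentence C.", 0.47),
        ("Placeholder sentence D.", 0.18),
        ("Placeholder sentence E.", 0.52),
    ]
    print(format_report(top_matches(sample)))
```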

Limitations

Like any algorithm, this approach has some limitations. For instance, this approach is not designed to handle cases where the restatement of the document is actually high-level metadata about the document not directly related to the text of the document itself. You can solve such use cases by using Amazon Comprehend custom models. For more information, see Comprehend Custom and Building a custom classifier using Amazon Comprehend.

Another limitation in our approach is that it doesn’t explicitly handle negation (words such as “not,” “no,” and “denies”), which may change the meaning of the text. AWS services such as Amazon Comprehend and Amazon Comprehend Medical use deep learning models to handle negation.
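To illustrate what a negation cue looks like in practice, the following Python sketch flags sentences that contain any of the cue words listed above, so that a reviewer could treat those matches with extra care. This is only a hypothetical keyword check written for this example; it is not part of our solution, and it does not determine what the negation applies to.

```python
import re
from typing import List, Tuple

# Cue words taken from the examples in the text; a real list would be longer.
NEGATION_CUES: Tuple[str, ...] = ("not", "no", "denies")


def negation_cues(sentence: str) -> List[str]:
    """Return the cue words that appear as whole words in the sentence."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return [cue for cue in NEGATION_CUES if cue in tokens]


if __name__ == "__main__":
    for text in ("He has asthma.", "He denies chest pain."):
        print(text, "->", negation_cues(text) or "no cue found")
```

Matching on whole words avoids false hits on words such as “nothing” or “noted”.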

Conclusion

In this post, we walked through the high-level steps to deploy a prebuilt NLP model to analyze medical texts. If you’re interested in deploying this yourself, see our step-by-step technical write-up.

About the Authors

Joshua Broyde is an AI/ML Specialist Solutions Architect on the Global Healthcare and Life Sciences team at Amazon Web Services. He works with customers in the healthcare and life sciences industry at all levels of the machine learning lifecycle on a number of AI/ML fronts, including analyzing medical images and video, analyzing machine sensor data, and performing natural language processing of medical and healthcare texts.

Claire Palmer is a Solutions Architect at Amazon Web Services. She is on the Global Account Development team, supporting healthcare and life sciences customers. Claire has a passion for driving innovation initiatives and developing solutions that are both secure and scalable. She is based out of Seattle, Washington and enjoys exploring the PNW in her free time.
