Bring your own container to project model accuracy drift with Amazon SageMaker Model Monitor

The world we live in is constantly changing, and so is the data that is collected to build models. One of the problems that is often seen in production environments is that the deployed model doesn’t behave the same way as it did during the training phase. This concept is generally called data drift or dataset shift, and can be caused by many factors, such as bias in sampling data that affects features or label data, the non-stationary nature of time series data, or changes in the data pipeline. Because machine learning (ML) models aren’t deterministic, it’s important to minimize the variance in the production environment by periodically monitoring the deployment environment for model drift and sending alerts and, if necessary, triggering retraining of the models with new data.

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy ML models at any scale. After you train an ML model, you can deploy it on SageMaker endpoints that are fully managed and can serve inferences in real time with low latency. After you deploy your model, you can use Amazon SageMaker Model Monitor to continuously monitor the quality of your ML model in real time. You can also configure alerts to notify and trigger actions if any drift in model performance is observed. Early and proactive detection of these deviations enables you to take corrective actions, such as collecting new ground truth training data, retraining models, and auditing upstream systems, without having to manually monitor models or build additional tooling.

In this post, we present some techniques to detect covariate drift (one of the types of data drift) and demonstrate how to incorporate your own drift detection algorithms and visualizations with Model Monitor.

Types of data drift

Data drift can be classified into three categories depending on whether the distribution shift is happening on the input or on the output side, or whether the relationship between the input and the output has changed.

Covariate shift

In a covariate shift, the distribution of inputs changes over time, but the conditional distribution P(y|x) doesn’t change. This type of drift is called covariate shift because the problem arises due to a shift in the distribution of the covariates (features). For example, the training dataset for a face recognition algorithm may contain predominantly younger faces, while the real world may have a much larger proportion of older people.
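
As a minimal sketch of this definition, the following Python snippet draws training and production inputs from different distributions while keeping the rule that maps an input to its label fixed. The age ranges, the labeling rule, and the function names are made-up values for illustration; they are not taken from the face recognition example.

```python
import random
from statistics import mean


def label(age):
    # Fixed P(y|x): the same rule applies in training and in production.
    return 1 if age >= 40 else 0


def sample_ages(n, low, high, seed):
    # Draw n ages uniformly from [low, high] with a reproducible seed.
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


# Hypothetical populations: training skews young, production skews old.
train_ages = sample_ages(1000, 18, 45, seed=0)
prod_ages = sample_ages(1000, 35, 80, seed=1)

train_positive_rate = mean(label(a) for a in train_ages)
prod_positive_rate = mean(label(a) for a in prod_ages)

print(f"mean age: train={mean(train_ages):.1f}, production={mean(prod_ages):.1f}")
print(f"positive rate: train={train_positive_rate:.2f}, production={prod_positive_rate:.2f}")
```

Only the input distribution P(x) differs between the two samples; the function `label` that stands in for P(y|x) is identical for both.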

Label shift

While covariate shift focuses on changes in the feature distribution, label shift focuses on changes in the distribution of the class variable. This type of shift is essentially the reverse of covariate shift. An intuitive way to think about it is to consider an unbalanced dataset; for example, if 50% of the emails in our training set are non-spam, but in reality only 10% of our emails are non-spam, then the target label distribution has shifted.

Concept shift

Concept shift is different from covariate and label shift in that it’s not related to the data distribution or the class distribution, but instead is related to the relationship between the two variables. For example, in stock analysis, the relationship between prior stock market data and the stock price is non-stationary. The relationship between the input data and the target variable changes over time, and the model needs to be refreshed often.

Now that we know the types of distribution shifts, let’s see how Model Monitor helps us in detecting data drifts. In the next section, we walk through setting up Model Monitor and incorporating our custom drift detection algorithms.

Model Monitor

Model Monitor offers four different types of monitoring capabilities to detect and mitigate model drift in real time:

  • Data quality – Helps detect change in data schemas and statistical properties of independent variables and alerts when a drift is detected.
  • Model quality – For monitoring model performance characteristics such as accuracy or precision in real time, Model Monitor allows you to ingest the ground truth labels collected from your applications. Model Monitor automatically merges the ground truth information with prediction data to compute the model performance metrics.
  • Model bias – Model Monitor is integrated with Amazon SageMaker Clarify to improve visibility into potential bias. Although your initial data or model may not be biased, changes in the world may cause bias to develop over time in a model that has already been trained.
  • Model explainability – Drift detection alerts you when a change occurs in the relative importance of feature attributions.

Deequ, which measures data quality, powers some of the Model Monitor pre-built monitors. You don’t require coding to utilize these pre-built monitoring capabilities. You also have the flexibility to monitor models by coding to provide custom analysis. All metrics emitted by Model Monitor can be collected and viewed in Amazon SageMaker Studio, so you can visually analyze your model performance without writing additional code.

In certain scenarios, the pre-built monitors may not be sufficient to generate sophisticated metrics to detect drifts, and may necessitate bringing your own metrics. In the next sections, we describe the setup to bring in your metrics by building a custom container.

Environment setup

For this post, we use a SageMaker notebook to set up Model Monitor and also visualize the drifts. We begin with setting up required roles and Amazon Simple Storage Service (Amazon S3) buckets to store our data:

import boto3
from sagemaker import get_execution_role, session

region = boto3.Session().region_name

sm_client = boto3.client('sagemaker')

role = get_execution_role()
print(f"RoleArn: {role}")

# You can use a different bucket, but make sure the role you chose for this notebook
# has the s3:PutObject permissions. This is the bucket into which the data is captured
bucket = session.Session(boto3.Session()).default_bucket()
print(f"Demo Bucket: {bucket}")
prefix = 'sagemaker/DEMO-ModelMonitor'

s3_capture_upload_path = f's3://{bucket}/{prefix}/datacapture'
s3_report_path = f's3://{bucket}/{prefix}/reports'

Upload train dataset, test dataset, and model file to Amazon S3

Next, we upload our training and test datasets, and also the trained model we use for inference. For this post, we use the Census Income Dataset from the UCI Machine Learning Repository. The dataset consists of people’s income and several attributes that describe population demographics. The task is to predict if a person makes above or below $50,000. This dataset contains both categorical and integer attributes, and has several missing values. This makes it a good example to demonstrate model drift and detection.

We use the XGBoost algorithm to train the model offline using SageMaker. We provide the model file for deployment. The training dataset is compared with the inference data to generate drift scores, while the test dataset is used to compute how much the accuracy of the model has degraded due to drift. We provide more intuition on these algorithms in later steps.

The following code uploads our datasets and model to Amazon S3:

import os

s3_model_key = os.path.join(prefix, 'model.tar.gz')
s3_train_key = os.path.join(prefix, 'train.csv')
s3_test_key = os.path.join(prefix, 'test.csv')

s3_bucket = boto3.Session().resource('s3').Bucket(bucket)

# Upload each local file to its S3 key; the with block closes the file afterward
for local_path, s3_key in [
    ('model/model.tar.gz', s3_model_key),
    ('data/train.csv', s3_train_key),
    ('data/test.csv', s3_test_key),
]:
    with open(local_path, 'rb') as f:
        s3_bucket.Object(s3_key).upload_fileobj(f)

Set up a Docker container

Model Monitor supports bringing your own custom model monitor containers. When you create a MonitoringSchedule, Model Monitor starts processing jobs for evaluating incoming inference data. While invoking the containers, Model Monitor sets up additional environment variables for you so that your container has enough context to process the data for that particular run of the scheduled monitoring. For the container code variables, see Container Contract Inputs. Of the available input environment variables, we’re interested in dataset_source, output_path, and end_time:

"Environment": {
    "dataset_source": "/opt/ml/processing/endpointdata",
    "output_path": "/opt/ml/processing/resultdata",
    "end_time": "2019-12-01T16:20:00Z"
}

The dataset_source variable specifies the data capture location on the container, and end_time refers to the time of the last event capture. The custom drift algorithm is included in the src directory. For details on the algorithms, see the GitHub repo.
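
The following is a minimal sketch of how a container entry point might read these three variables. The helper name `read_monitor_env` and the timestamp format string are assumptions made for illustration; the drift detector in the GitHub repo may read them differently.

```python
import os
from datetime import datetime, timezone


def read_monitor_env(environ=os.environ):
    """Collect the three variables this post relies on from the job environment."""
    # Assumed format: ISO 8601 in UTC with a trailing Z, as in the example above.
    end_time = datetime.strptime(environ["end_time"], "%Y-%m-%dT%H:%M:%SZ")
    return {
        "dataset_source": environ["dataset_source"],
        "output_path": environ["output_path"],
        "end_time": end_time.replace(tzinfo=timezone.utc),
    }


# Example values modeled on the environment shown above.
example_env = {
    "dataset_source": "/opt/ml/processing/endpointdata",
    "output_path": "/opt/ml/processing/resultdata",
    "end_time": "2019-12-01T16:20:00Z",
}
config = read_monitor_env(example_env)
print(config["dataset_source"], config["end_time"].isoformat())
```

Passing the environment as a parameter keeps the function easy to try in a notebook; inside the container you would call it with no arguments so that it reads `os.environ`.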

Now we build the Docker container and push it to Amazon ECR. See the following Dockerfile:

FROM python:3.8-slim-buster

RUN pip3 install pandas==1.1.4 numpy==1.19.4 scikit-learn==0.23.2 pyarrow==2.0.0 scipy==1.5.4 boto3==1.17.12

WORKDIR /home
COPY src/* /home/

ENTRYPOINT ["python3", "drift_detector.py"]

We build and push it to Amazon ECR with the following code:

from docker_utils import build_and_push_docker_image

repository_short_name = 'custom-model-monitor'

image_name = build_and_push_docker_image(repository_short_name)

Set up an endpoint and enable data capture

As we mentioned before, our model was trained using XGBoost, so we use XGBoostModel from the SageMaker SDK to deploy the model. Because we have a custom input parser, which includes imputation and one-hot encoding, we provide the inference entry point along with the source directory, which includes the scikit-learn ColumnTransformer model. The following code is the custom inference function:

import pathlib
import pickle
from io import StringIO

import pandas as pd
from sagemaker_xgboost_container import encoders as xgb_encoders

# Load the fitted preprocessing pipeline that ships alongside this script
script_path = pathlib.Path(__file__).parent.absolute()
with open(f'{script_path}/preprocess.pkl', 'rb') as f:
    preprocess = pickle.load(f)


def input_fn(request_body, content_type):
    """
    The SageMaker XGBoost model server receives the request data body and the content type,
    and invokes the `input_fn`.

    Return a DMatrix (an object that can be passed to predict_fn).
    """

    if content_type == 'text/csv':
        df = pd.read_csv(StringIO(request_body), header=None)
        X = preprocess.transform(df)

        X_csv = StringIO()
        pd.DataFrame(X).to_csv(X_csv, header=False, index=False)
        req_transformed = X_csv.getvalue().replace('\n', '')

        return xgb_encoders.csv_to_dmatrix(req_transformed)
    else:
        raise ValueError(
            f'Content type {content_type} is not supported.'
        )

We also include the configuration for data capture, specifying the S3 destination, and sampling percentage with the endpoint deployment (see the following code). Ideally, we want to capture 100% of the incoming data for drift detection, but for a high traffic endpoint, we suggest reducing the sampling percentage so that the endpoint availability isn’t affected. Besides, Model Monitor might automatically reduce the sampling percentage if it senses the endpoint availability is affected.

from sagemaker.xgboost.model import XGBoostModel
from sagemaker.serializers import CSVSerializer
from sagemaker.model_monitor import DataCaptureConfig

model_url = f's3://{bucket}/{s3_model_key}'

xgb_inference_model = XGBoostModel(
    model_data=model_url,
    role=role,
    entry_point='inference.py',
    source_dir='script',
    framework_version='1.2-1',
)

data_capture_config = DataCaptureConfig(
                        enable_capture=True,
                        sampling_percentage=100,
                        destination_s3_uri=s3_capture_upload_path)

predictor = xgb_inference_model.deploy(
    initial_instance_count=1,
    instance_type='ml.c5.xlarge',
    serializer=CSVSerializer(),
    data_capture_config=data_capture_config)

After we deploy the model, we can set up a monitor schedule.
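
Data capture writes the sampled requests to the S3 destination as JSON Lines files. The following sketch shows how a monitoring container might pull the CSV payloads out of one such file. The field names (captureData, endpointInput, data, encoding) follow the documented capture record layout, but the sample record and the helper name `extract_input_rows` are illustrative assumptions.

```python
import csv
import io
import json


def extract_input_rows(jsonl_text):
    """Return the CSV rows sent to the endpoint, one list of fields per row."""
    rows = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        endpoint_input = record["captureData"]["endpointInput"]
        # This sketch handles only CSV-encoded payloads.
        if endpoint_input.get("encoding") != "CSV":
            continue
        rows.extend(csv.reader(io.StringIO(endpoint_input["data"])))
    return rows


# A made-up capture record with a shortened payload.
sample = json.dumps({
    "captureData": {
        "endpointInput": {
            "observedContentType": "text/csv",
            "mode": "INPUT",
            "data": "39,State-gov,77516,Bachelors",
            "encoding": "CSV",
        }
    },
    "eventMetadata": {"eventId": "example-id", "inferenceTime": "2021-01-01T00:00:00Z"},
    "eventVersion": "0",
})
print(extract_input_rows(sample))
```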

Create a monitor schedule

Typically, we create a processing job to generate baseline metrics of our training set. Then we create a monitor schedule to start periodic jobs to analyze incoming inference requests and generate metrics similar to baseline metrics. The inference metrics are compared with baseline metrics and a detailed report on constraints violation and drift in data quality is generated. Using SageMaker SDK simplifies the creation of baseline metrics and scheduling model monitor.

For the algorithms we use in this post (such as Wasserstein distance and the Kolmogorov–Smirnov test), the container that we build needs access to both the training dataset and the inference data for computing metrics. This non-typical setup requires some low-level setup of the monitor schedule. The following code is the Boto3 request to set up the monitor schedule. The key fields are the container image URL and the arguments to the container. We covered building the container in an earlier section. For now, note that we pass the S3 URLs of the training and test datasets as container arguments:

s3_train_path = f's3://{bucket}/{s3_train_key}'
s3_test_path = f's3://{bucket}/{s3_test_key}'
s3_result_path = f's3://{bucket}/{prefix}/result/{predictor.endpoint_name}'

sm_client.create_monitoring_schedule(
    MonitoringScheduleName=predictor.endpoint_name,
    MonitoringScheduleConfig={
        'ScheduleConfig': {
            'ScheduleExpression': 'cron(0 * ? * * *)'
        },
        'MonitoringJobDefinition': {
            'MonitoringInputs': [
                {
                    'EndpointInput': {
                        'EndpointName': predictor.endpoint_name,
                        'LocalPath': '/opt/ml/processing/endpointdata'
                    }
                },
            ],
            'MonitoringOutputConfig': {
                'MonitoringOutputs': [
                    {
                        'S3Output': {
                            'S3Uri': s3_result_path,
                            'LocalPath': '/opt/ml/processing/resultdata',
                            'S3UploadMode': 'EndOfJob'
                        }
                    },
                ]
            },
            'MonitoringResources': {
                'ClusterConfig': {
                    'InstanceCount': 1,
                    'InstanceType': 'ml.c5.xlarge',
                    'VolumeSizeInGB': 10
                }
            },
            'MonitoringAppSpecification': {
                'ImageUri': image_name,
                'ContainerArguments': [
                    '--train_s3_uri',
                    s3_train_path,
                    '--test_s3_uri',
                    s3_test_path,
                    '--target_label',
                    'income'
                ]
            },
            'StoppingCondition': {
                'MaxRuntimeInSeconds': 600
            },
            'Environment': {
                'string': 'string'
            },
            'RoleArn': role
        }
    }
)
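
On the container side, the values in ContainerArguments arrive as command line arguments to the entry point. The following is a minimal sketch of how they could be parsed with argparse; it is an illustration under that assumption, not the parsing code from the GitHub repo, and the S3 URIs are placeholders.

```python
import argparse


def parse_args(argv=None):
    # Argument names mirror the ContainerArguments list in the schedule request.
    parser = argparse.ArgumentParser(description="Custom drift detector (sketch)")
    parser.add_argument("--train_s3_uri", required=True, help="S3 URI of the training CSV")
    parser.add_argument("--test_s3_uri", required=True, help="S3 URI of the test CSV")
    parser.add_argument("--target_label", required=True, help="Name of the label column")
    return parser.parse_args(argv)


# Placeholder URIs for illustration only.
args = parse_args([
    "--train_s3_uri", "s3://example-bucket/prefix/train.csv",
    "--test_s3_uri", "s3://example-bucket/prefix/test.csv",
    "--target_label", "income",
])
print(args.train_s3_uri, args.target_label)
```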

Generate requests for the endpoint

After we deploy the model, we send traffic at a constant rate to the endpoints. The following code launches a thread to generate the requests for about 10 hours. Make sure to stop the kernel if you want to stop the traffic. We let the traffic generator run for a few hours to collect enough samples to visualize drift. The plots automatically refresh whenever there is new data.

from threading import Thread
from time import sleep, time


def invoke_endpoint(ep_name, file_name, runtime_client):
    with open(file_name) as f:
        count = len(f.read().split('\n')) - 2  # Remove EOF and header

    # Calculate time needed to sleep between inference calls if we need to have a constant rate of calls for 10 hours
    ten_hours_in_sec = 10*60*60
    sleep_time = ten_hours_in_sec/count
    # Number of records sent in 15 minutes (900 seconds); at least 1 to avoid a modulo by zero
    print_every = max(int(count/ten_hours_in_sec*900), 1)

    with open(file_name, 'r') as f:
        next(f)  # Skip header

        for ind, row in enumerate(f):
            start_time = time()
            payload = row.rstrip('\n')
            response = runtime_client(data=payload)

            # Print progress roughly every 15 minutes
            if (ind+1) % print_every == 0:
                print(f'Finished sending {ind+1} records.')

            # Sleep to ensure constant rate. Time spent for inference is subtracted
            sleep(max(sleep_time - (time() - start_time), 0))

    print("Done!")

print(f"Sending test traffic to the endpoint {predictor.endpoint_name}.\nPlease wait...")

thread = Thread(target=invoke_endpoint, args=(predictor.endpoint_name, 'data/infer.csv', predictor.predict))
thread.start()

Visualize the data drift

We use the entire dataset for training and synthetically generate a new dataset for inference by modifying statistical properties of some of the attributes from the original dataset as described in our GitHub repo.

Normalized drift score per feature

A normalized drift score (in %) for each feature measures how far the distribution of the incoming inference data departs from that of the original training data, relative to the original data. For categorical data, the score is the sum, across all labels, of the absolute difference between the training and inference probabilities. For numerical data, the training data is split into 10 bins (deciles), and the absolute difference between the training and inference probabilities is summed over the bins to calculate the drift score.
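
The following is a minimal sketch of this calculation using only the Python standard library. The function names are hypothetical, and dividing the summed difference by 2 so that the score falls between 0% and 100% is an assumption of this sketch; the implementation in the GitHub repo may normalize differently.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles


def _proportions(values, keys):
    # Probability of each key in the sample (0 for keys that never occur).
    counts = Counter(values)
    total = len(values)
    return [counts.get(k, 0) / total for k in keys]


def categorical_drift(train, infer):
    """Sum of absolute differences in category probabilities, scaled to 0-100."""
    keys = sorted(set(train) | set(infer))
    p, q = _proportions(train, keys), _proportions(infer, keys)
    # Dividing by 2 (an assumption of this sketch) maps the sum onto [0, 1].
    return 100 * sum(abs(a - b) for a, b in zip(p, q)) / 2


def numerical_drift(train, infer, n_bins=10):
    """Bin both samples by the training deciles, then compare bin probabilities."""
    edges = quantiles(train, n=n_bins)  # n_bins - 1 cut points
    train_bins = [bisect_right(edges, v) for v in train]
    infer_bins = [bisect_right(edges, v) for v in infer]
    return categorical_drift(train_bins, infer_bins)


# Made-up data: an unchanged feature, a shifted feature, and a categorical feature.
train_hours = list(range(20, 60))
shifted_hours = [h + 30 for h in train_hours]
print(numerical_drift(train_hours, train_hours))
print(numerical_drift(train_hours, shifted_hours))
print(categorical_drift(["a", "a", "b", "b"], ["a", "a", "a", "b"]))
```

Identical samples give a score of 0, and the score grows as more of the inference data falls into bins or categories that were rare in training.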

The following plot shows the drift scores over the time intervals. A few features have low to no drifts, while some features have drifts that are increasing over time, and finally hours-per-week has a large drift right from the start.

Projected drift in model accuracy

This metric provides a proxy for the change in model accuracy caused by drift of the inference data away from the test data. The idea behind this approach is to determine what percentage of the inference data is similar to the portion of the test data on which the model predicts well. For this metric, we use Isolation Forests to train a one-class classifier, and generate scores for the portion of the test data that the model predicted well, and for the inference data. The relative difference in the mean of these scores is the projected drift in accuracy.
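
The following sketch illustrates the idea with scikit-learn's IsolationForest on synthetic data. Several details are assumptions made for illustration: the forest is fit on the well-predicted test rows, the score comes from score_samples, and the relative difference is taken with respect to the test score. The algorithm in the GitHub repo may differ on each of these points.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def projected_accuracy_drift(test_correct, inference, random_state=0):
    """Relative difference in mean Isolation Forest score, inference vs. test."""
    forest = IsolationForest(random_state=random_state).fit(test_correct)
    # score_samples returns higher values for points that look more "normal".
    test_score = forest.score_samples(test_correct).mean()
    infer_score = forest.score_samples(inference).mean()
    return (infer_score - test_score) / abs(test_score)


rng = np.random.default_rng(0)
# Synthetic stand-in for the test rows that the model predicted correctly.
test_correct = rng.normal(0.0, 1.0, size=(500, 3))
same_dist = rng.normal(0.0, 1.0, size=(500, 3))
shifted = rng.normal(3.0, 1.0, size=(500, 3))

drift_same = projected_accuracy_drift(test_correct, same_dist)
drift_shifted = projected_accuracy_drift(test_correct, shifted)
print(drift_same, drift_shifted)
```

With this sign convention, a more negative value means the inference data looks less like the data on which the model performed well.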

The following plot shows the corresponding degradation in model accuracy due to drift in the incoming data. This demonstrates that covariate shift can reduce accuracy of a model. A few spikes in the metric can be attributed to the statistical nature of how the data is generated. This is only a proxy for model accuracy, and not the actual accuracy, which can only be known from the ground truth of the labels.

P-values of the null hypothesis test

A null hypothesis test checks whether two samples (for this post, the training and inference datasets) are drawn from the same population. The p-value gives the probability of obtaining test results at least as extreme as the observations under the assumption that the null hypothesis is correct. A p-value threshold of 5% is used to decide whether the observed sample has drifted from the training data: a p-value below the threshold indicates drift.

The following plot shows p-values of the attributes that have crossed the threshold at least once. The y-axis shows the inverse log of the p-values to distinguish very small p-values. P-values for numerical attributes are obtained from the Kolmogorov–Smirnov test, and for categorical attributes from the chi-square test.
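
As a minimal sketch of both tests, the following snippet uses SciPy's ks_2samp and chi2_contingency on synthetic data. The helper names and the data are made up for illustration, and plotting the negative base-10 logarithm of the p-value is one assumed reading of the inverse log mentioned above.

```python
import numpy as np
from scipy.stats import chi2_contingency, ks_2samp

P_VALUE_THRESHOLD = 0.05


def numerical_p_value(train, infer):
    """Two-sample Kolmogorov-Smirnov test for a numerical attribute."""
    return ks_2samp(train, infer).pvalue


def categorical_p_value(train, infer):
    """Chi-square test on the category counts of the two samples."""
    categories = sorted(set(train) | set(infer))
    table = np.array([
        [list(train).count(c) for c in categories],
        [list(infer).count(c) for c in categories],
    ])
    # The second element of the result is the p-value.
    return chi2_contingency(table)[1]


rng = np.random.default_rng(0)
train_age = rng.normal(40, 10, 1000)
infer_age = rng.normal(50, 10, 1000)

p_num = numerical_p_value(train_age, infer_age)
p_cat = categorical_p_value(["a"] * 500 + ["b"] * 500, ["a"] * 800 + ["b"] * 200)

for name, p in [("age", p_num), ("category", p_cat)]:
    status = "drifted" if p < P_VALUE_THRESHOLD else "ok"
    print(name, -np.log10(p), status)
```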

Conclusion

Amazon SageMaker Model Monitor is a powerful tool to detect data drift. In this post, we showed how to easily integrate your custom data drift detection algorithms into Model Monitor, while benefiting from the heavy lifting that Model Monitor provides in terms of data capture and scheduling the monitor runs. The notebook provides a detailed step-by-step instruction on how to build a custom container and attach it to Model Monitor schedules. Give Model Monitor a try and leave your feedback in the comments.


About the Author

Vinay Hanumaiah is a Senior Deep Learning Architect at Amazon ML Solutions Lab, where he helps customers build AI and ML solutions to accelerate their business challenges. Prior to this, he contributed to the launch of AWS DeepLens and Amazon Personalize. In his spare time, he enjoys time with his family and is an avid rock climber.


Detect defects and augment predictions using Amazon Lookout for Vision and Amazon A2I

With machine learning (ML), more powerful technologies have become available that can automate the task of detecting visual anomalies in a product. However, implementing such ML solutions is time-consuming and expensive because it involves managing and setting up complex infrastructure and having the right ML skills. Furthermore, ML applications need human oversight to ensure accuracy with anomaly detection, help provide continuous improvements, and retrain models with updated predictions. However, you’re often forced to choose between an ML-only or human-only system. Companies are looking for the best of both worlds, integrating ML systems into your workflow while keeping a human eye on the results to achieve higher precision.

In this post, we show how you can easily set up Amazon Lookout For Vision to train a visual anomaly detection model using a printed circuit board dataset, use a human-in-the-loop workflow to review the predictions using Amazon Augmented AI (Amazon A2I), augment the dataset to incorporate human input, and retrain the model.

Solution overview

Lookout for Vision is an ML service that helps spot product defects using computer vision to automate the quality inspection process in your manufacturing lines, with no ML expertise required. You can get started with as few as 30 product images (20 normal, 10 anomalous) to train your unique ML model. Lookout for Vision uses your unique ML model to analyze your product images in near-real time and detect product defects, allowing your plant personnel to diagnose and take corrective actions.

Amazon A2I is an ML service that makes it easy to build the workflows required for human review. Amazon A2I brings human review to all developers, removing the undifferentiated heavy lifting associated with building human review systems or managing large numbers of human reviewers, whether running on AWS or not.

To get started with Lookout for Vision, we create a project, create a dataset, train a model, and run inference on test images. After going through these steps, we show you how you can quickly set up a human review process using Amazon A2I and retrain your model with augmented or human reviewed datasets. We also provide an accompanying Jupyter notebook.

Architecture overview

The following diagram illustrates the solution architecture.

 

The solution has the following workflow:

  1. Upload data from the source to Amazon Simple Storage Service (Amazon S3).
  2. Run Lookout for Vision to process data from the Amazon S3 path.
  3. Store inference results in Amazon S3 for downstream review.
  4. Use Lookout for Vision to determine if an input image is damaged and validate that the confidence level is above 70%. If below 70%, we start a human loop for a worker to manually determine whether an image is damaged.
  5. A private workforce investigates and validates the detected damages and provides feedback.
  6. Update the training data with corresponding feedback for subsequent model retraining.
  7. Repeat the retraining cycle for continuous model retraining.
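
The routing decision in step 4 can be sketched as a small function. The function name and the confidence values below are made up for illustration; the notebook applies the same comparison inline.

```python
SCORE_THRESHOLD = 0.70  # confidence cutoff from step 4


def route_prediction(confidence, threshold=SCORE_THRESHOLD):
    """Return 'auto' to accept the model result or 'human' to start a review."""
    return "human" if confidence < threshold else "auto"


# Made-up confidence values for illustration.
for image, confidence in [("board-1.jpg", 0.95), ("board-2.jpg", 0.62), ("board-3.jpg", 0.70)]:
    print(image, route_prediction(confidence))
```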

Prerequisites

Before you get started, complete the following steps to set up the Jupyter notebook:

  1. Create a notebook instance in Amazon SageMaker.
  2. When the notebook is active, choose Open Jupyter.
  3. On the Jupyter dashboard, choose New, and choose Terminal.
  4. In the terminal, enter the following code:

cd SageMaker
git clone https://github.com/aws-samples/amazon-lookout-for-vision.git

  5. Open the notebook for this post: Amazon-Lookout-for-Vision-and-Amazon-A2I-Integration.ipynb.

You’re now ready to run the notebook cells.

  6. Run the setup environment step to set up the necessary Python SDKs and variables:

!pip install lookoutvision
!pip install simplejson

In the first step, you need to define the following:

  • region – The Region where your project is located
  • project_name – The name of your Lookout for Vision project
  • bucket – The name of the Amazon S3 bucket where we output the model results
  • model_version – Your model version (the default setting is 1)

# Set the AWS region
region = '<AWS REGION>'

# Set your project name here
project_name = '<CHANGE TO AMAZON LOOKOUT FOR VISION PROJECT NAME>'

# Provide the name of the S3 bucket where we will output results and store images
bucket = '<S3 BUCKET NAME>'

# This will default to a value of 1; Since we're training a new model, leave this set to a value of 1
model_version = '1'

  7. Create the S3 buckets to store images:

!aws s3 mb s3://{bucket}

  8. Create a manifest file from the dataset by running the cell in the section Create a manifest file from the dataset in the notebook.

Lookout for Vision uses this manifest file to determine the location of the files, as well as the labels associated with the files.

Upload circuit board images to Amazon S3

To train a Lookout for Vision model, we need to copy the sample dataset from our local Jupyter notebook over to Amazon S3:

# Upload images to S3 bucket:
!aws s3 cp circuitboard/train/normal s3://{bucket}/{project_name}/training/normal --recursive
!aws s3 cp circuitboard/train/anomaly s3://{bucket}/{project_name}/training/anomaly --recursive

!aws s3 cp circuitboard/test/normal s3://{bucket}/{project_name}/validation/normal --recursive
!aws s3 cp circuitboard/test/anomaly s3://{bucket}/{project_name}/validation/anomaly --recursive

Create a Lookout for Vision project

You have several options for creating your Lookout for Vision project: the Lookout for Vision console, the AWS Command Line Interface (AWS CLI), or the Boto3 SDK. We chose the Boto3 SDK in this example, but highly recommend you check out the console method as well.

The steps we take with the SDK are:

  1. Create a project (the name was defined at the beginning) and tell your project where to find your training dataset. This is done via the manifest file for training.
  2. Tell your project where to find your test dataset. This is done via the manifest file for test.

This second step is optional. In general, all test-related code is optional; Lookout for Vision also works with just a training dataset. We use both because training and testing is a common (best) practice when training AI and ML models.

Create a manifest file from the dataset

Lookout for Vision uses this manifest file to determine the location of the files, as well as the labels associated with the files. See the following code:

#Create the manifest file

from lookoutvision.manifest import Manifest
mft = Manifest(
    bucket=bucket,
    s3_path="{}/".format(project_name),
    datasets=["training", "validation"])
mft_resp = mft.push_manifests()
print(mft_resp)

Create a Lookout for Vision project

The following command creates a Lookout for Vision project:

# Create an Amazon Lookout for Vision Project

from lookoutvision.lookoutvision import LookoutForVision
l4v = LookoutForVision(project_name=project_name)
# If project does not exist: create it
p = l4v.create_project()
print(p)
print('Done!')

Create and train a model

In this section, we walk through the steps of creating the training and test datasets, training the model, and hosting the model.

Create the training and test datasets from images in Amazon S3

After we create the Lookout for Vision project, we create the project dataset by using the sample images we uploaded to Amazon S3 along with the manifest files. See the following code:

dsets = l4v.create_datasets(mft_resp, wait=True)
print(dsets)
print('Done!')

Train the model

After we create the Lookout for Vision project and the datasets, we can train our first model:

l4v.fit(
    output_bucket=bucket,
    model_prefix="mymodel_",
    wait=True)

When training is complete, we can view available model metrics:

from lookoutvision.metrics import Metrics

met = Metrics(project_name=project_name)

met.describe_model(model_version=model_version)

You should see an output similar to the following.

Metrics Model Arn Status Message Performance Model Performance
F1 Score 1 TRAINED Training completed successfully. 0.93023
Precision 1 TRAINED Training completed successfully. 0.86957
Recall 1 TRAINED Training completed successfully. 1

Host the model

Before we can use our newly trained Lookout for Vision model, we need to host it:

l4v.deploy(
    model_version=model_version,
    wait=True)

Set up Amazon A2I to review predictions from Lookout for Vision

In this section, you set up a human review loop in Amazon A2I to review inferences that are below the confidence threshold. You must first create a private workforce and create a human task UI.

Create a workforce

You need to create a workforce via the SageMaker console. Note the ARN of the workforce and enter its value in the notebook cell:

WORKTEAM_ARN = 'your workforce team ARN'

The following screenshot shows the details of a private team named lfv-a2i and its corresponding ARN.

Create a human task UI

You now create a human task UI resource: a UI template in Liquid HTML. This HTML page is rendered to the human workers whenever a human loop is required. For over 70 pre-built UIs, see the amazon-a2i-sample-task-uis GitHub repo.

Follow the steps provided in the notebook section Create a human task UI to create the web form, initialize Amazon A2I APIs, and inspect output:

...
def create_task_ui():
    '''
    Creates a Human Task UI resource.

    Returns:
    struct: HumanTaskUiArn
    '''
    response = sagemaker_client.create_human_task_ui(
        HumanTaskUiName=taskUIName,
        UiTemplate={'Content': template})
    return response
...

Create a human task workflow

Workflow definitions allow you to specify the following:

  • The worker template or human task UI you created in the previous step.
  • The workforce that your tasks are sent to. For this post, it’s the private workforce you created in the prerequisite steps.
  • The instructions that your workforce receives.

This post uses the Create Flow Definition API to create a workflow definition. The results of human review are stored in an Amazon S3 bucket, which can be accessed by the client application. Run the cell Create a Human task Workflow in the notebook and inspect the output:

create_workflow_definition_response = sagemaker_client.create_flow_definition(
        FlowDefinitionName = flowDefinitionName,
        RoleArn = role,
        HumanLoopConfig = {
            "WorkteamArn": workteam_arn,
            "HumanTaskUiArn": humanTaskUiArn,
            "TaskCount": 1,
            "TaskDescription": "Select if the component is damaged or not.",
            "TaskTitle": "Verify if the component is damaged or not"
        },
        OutputConfig={
            "S3OutputPath" : a2i_results
        }
    )
flowDefinitionArn = create_workflow_definition_response['FlowDefinitionArn'] 
# let's save this ARN for future use

Make predictions and start a human loop based on the confidence level threshold

In this section, we loop through an array of new images and use the Lookout for Vision SDK to determine if our input images are damaged or not, and if they’re above or below a defined threshold. For this post, we set the threshold confidence level at .70. If our result is below .70, we start a human loop for a worker to manually determine if our image is normal or an anomaly. See the following code:

...

SCORE_THRESHOLD = .70

for fname in Incoming_Images_Array:
    #Lookout for Vision inference using detect_anomalies
    fname_full_path = (Incoming_Images_Dir + "/" + fname)
    with open(fname_full_path, "rb") as image:
        modelresponse = L4Vclient.detect_anomalies(
            ProjectName=project_name,
            ContentType="image/jpeg",  # or image/png for png format input image.
            Body=image.read(),
            ModelVersion=model_version,
            )
        modelresponseconfidence = (modelresponse["DetectAnomalyResult"]["Confidence"])

    if (modelresponseconfidence < SCORE_THRESHOLD):
...
        # start an a2i human review loop with an input
        start_loop_response = a2i.start_human_loop(
            HumanLoopName=humanLoopName,
            FlowDefinitionArn=flowDefinitionArn,
            HumanLoopInput={
                "InputContent": json.dumps(inputContent)
            }
        )
... 

You should get the output shown in the following screenshot.
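To make the threshold rule easier to reason about in isolation, here is a small sketch of the same decision as a pure function. It assumes each response has the DetectAnomalyResult shape used in the preceding code; the helper names needs_human_review and split_by_confidence are hypothetical.

```python
SCORE_THRESHOLD = 0.70


def needs_human_review(model_response, threshold=SCORE_THRESHOLD):
    """Return True when the prediction confidence is below the threshold."""
    confidence = model_response["DetectAnomalyResult"]["Confidence"]
    return confidence < threshold


def split_by_confidence(responses, threshold=SCORE_THRESHOLD):
    """Partition {file name: model response} into (accepted, review) lists."""
    accepted, review = [], []
    for fname, response in responses.items():
        if needs_human_review(response, threshold):
            review.append(fname)
        else:
            accepted.append(fname)
    return accepted, review
```

A confidence exactly equal to the threshold is accepted here, which matches the strict less-than comparison in the notebook code.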

Complete your review and check the human loop status

If inference results are below the defined threshold, a human loop is created. We can review the status of those jobs and wait for results:

...
completed_human_loops = []
for human_loop_name in human_loops_started:
    resp = a2i.describe_human_loop(HumanLoopName=human_loop_name)
    print(f'HumanLoop Name: {human_loop_name}')
    print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
    print(f'HumanLoop Output Destination: {resp["HumanLoopOutput"]}')
    print('\n')
    
    if resp["HumanLoopStatus"] == "Completed":
        completed_human_loops.append(resp)
workteamName = workteam_arn[workteam_arn.rfind('/') + 1:]
print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
print('https://' + sagemaker_client.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])

The work team sees the task UI shown in the following screenshot, where they choose the correct label for the image.

View results of the Amazon A2I workflow and move objects to the correct folder for retraining

After the work team members have completed the human loop tasks, let’s use the results of the tasks to sort our images into the correct folders for training a new model. See the following code:

...

    # move the image to the appropriate training folder
    if (labelanswer == "Normal"):
        # move object to the Normal training folder s3://a2i-lfv-output/image_folder/normal/
        !aws s3 cp {taskObjectResponse} s3://{bucket}/{project_name}/train/normal/
    else:
        # move object to the Anomaly training folder
        !aws s3 cp {taskObjectResponse} s3://{bucket}/{project_name}/train/anomaly/
...
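The notebook uses the AWS CLI through the ! shell escape, which only works inside Jupyter. Outside a notebook, a similar copy could be done with the S3 CopyObject API, as in the following sketch. The function names and the assumption that you already have the source bucket and key as separate values are ours, not the notebook's.

```python
import posixpath


def training_key(project_name, label_answer, source_key):
    """Build the destination key for a reviewed image.

    Follows the folder layout above: <project>/train/normal/ or
    <project>/train/anomaly/. Any answer other than "Normal" is treated
    as an anomaly, as in the notebook snippet.
    """
    folder = "normal" if label_answer == "Normal" else "anomaly"
    file_name = posixpath.basename(source_key)
    return f"{project_name}/train/{folder}/{file_name}"


def copy_to_training_folder(s3_client, bucket, project_name, label_answer,
                            source_bucket, source_key):
    """Copy one reviewed image into its training folder.

    s3_client is a boto3 S3 client, for example boto3.client("s3").
    """
    dest_key = training_key(project_name, label_answer, source_key)
    s3_client.copy_object(
        Bucket=bucket,
        Key=dest_key,
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
    return dest_key
```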

Retrain your model based on augmented datasets from Amazon A2I

Training a new model version can be triggered as a batch job on a schedule, manually as needed, based on how many new images have been added to the training folders, and so on. For this example, we use the Lookout for Vision SDK to retrain our model using the images that we’ve now included in our modified dataset. Follow the accompanying Jupyter notebook downloadable from [GitHub-LINK] for the complete notebook.

# Train the model!
l4v.fit(
    output_bucket=bucket,
    model_prefix="mymodelprefix_",
    wait=True)

You should see an output similar to the following.

Now that we’ve trained a new model using newly added images, let’s check the model metrics! We show the results from the first model and the second model at the same time:

# All models of the same project
met.describe_models()

You should see an output similar to the following. The table shows two models: a hosted model (ModelVersion:1) and the retrained model (ModelVersion:2). The performance of the retrained model is better with the human reviewed and labeled images.

Metrics | ModelVersion | Status | StatusMessage | Model Performance
F1 Score | 2 | TRAINED | Training completed successfully. | 0.98
Precision | 2 | TRAINED | Training completed successfully. | 0.96
Recall | 2 | TRAINED | Training completed successfully. | 1
F1 Score | 1 | HOSTED | The model is running. | 0.93023
Precision | 1 | HOSTED | The model is running. | 0.86957
Recall | 1 | HOSTED | The model is running. | 1
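As a quick sanity check on these numbers, the F1 score is the harmonic mean of precision and recall. The following sketch recomputes it from the precision and recall values in the table; the function name f1_score is our own, and the results agree with the reported F1 scores to within rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Retrained model (ModelVersion 2)
print(round(f1_score(0.96, 1.0), 2))     # 0.98
# Hosted model (ModelVersion 1)
print(round(f1_score(0.86957, 1.0), 2))  # 0.93
```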

Clean up

Run the Stop the model and cleanup resources cell to clean up the resources that were created. Delete any Lookout for Vision projects you’re no longer using, and remove objects from Amazon S3 to save costs. See the following code:

#If you are not using the model, stop to save costs! This can take up to 5 minutes.

#change the model version to whichever model you're using within your current project
model_version='1'
l4v.stop_model(model_version=model_version)

Conclusion

This post demonstrated how you can use Lookout for Vision and Amazon A2I to train models to detect defects in objects unique to your business and define conditions to send the predictions to a human workflow with labelers to review and update the results. You can use the human labeled output to augment the training dataset for retraining to improve the model accuracy.

Start your journey towards industrial anomaly detection and identification by visiting the Lookout for Vision Developer Guide and the Amazon A2I Developer Guide.


About the Author

Dennis Thurmon is a Solutions Architect at AWS, with a passion for Artificial Intelligence and Machine Learning. Based in Seattle, Washington, Dennis worked as a Systems Development Engineer on the Amazon Go and Amazon Books team before focusing on helping AWS customers bring their workloads to life in the AWS Cloud.

 

 

Amit Gupta is an AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.

 

 

 

Neel Sendas is a Senior Technical Account Manager at Amazon Web Services. Neel works with enterprise customers to design, deploy, and scale cloud applications to achieve their business goals. He has worked on various ML use cases, ranging from anomaly detection to predictive product quality for manufacturing and logistics optimization. When he is not helping customers, he dabbles in golf and salsa dancing.

 

 


Automate annotation of image training data with Amazon Rekognition

Every machine learning (ML) model demands data to train it. If your model isn’t predicting Titanic survival or iris species, then acquiring a dataset might be one of the most time-consuming parts of your model-building process—second only to data cleaning.

What data cleaning looks like varies from dataset to dataset. For example, the following is a set of images tagged robin that you might want to use to train an image recognition model on bird species.

That nest might count as dirty data, and some model applications may make it inappropriate to include American and European robins in the same category, but this seems pretty good so far. Let’s keep looking at additional images.

Well, that’s clearly not right.

One thing that can be frustrating about bad data is its obvious wrongness—that roaring campfire and woman with a bow and arrow (perhaps doing a Robin Hood-themed photoshoot?) aren’t even birds, much less robins. If your image collections or datasets weren’t carefully assembled by human intelligence for the specific model training application, they’re likely dirty. Cleaning that kind of dirty data is where Amazon Rekognition comes in.

Solution overview

Amazon Rekognition Image is an image recognition service capable of detecting thousands of different objects using deep neural network models. By taking advantage of the training that’s already gone into the service, you can easily sort through a mass of data and pick out only images that contain a known object, whether that’s as general as animal or as specific as robin. This can lead to the development of a customized dataset narrowly suited for your needs that’s cleaned quickly and cheaply compared to manual solutions. You can apply this principle to any image repository that is expected to include a mix of correct and incorrect images, as long as the correct images fall under an existing Amazon Rekognition label that excludes some incorrect images.

Consider one alternative, Amazon Mechanical Turk, a crowdsourcing marketplace where people can post jobs for virtual gig workers. The minimum price of a task (in this case, one worker labeling an image as “a robin” or “not a robin”) is $0.012. To ensure quality, typical jobs on Mechanical Turk have three to five people view and label each image, bringing the cost floor up to $0.036–$0.06 per image. On Amazon Rekognition, the first million images (beyond the Free Tier, which covers 5,000 images a month for 12 months) each cost $0.001, or at most one twelfth the cost of using Mechanical Turk. For distinctions that don’t require human discernment, that can add up to considerable cost savings.
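The arithmetic behind this comparison can be written out as a short sketch. The prices are the ones quoted in the preceding paragraph; the function names and the optional Free Tier parameter are illustrative assumptions, and current pricing may differ.

```python
MTURK_MIN_PRICE = 0.012    # USD per task: one worker labeling one image
REKOGNITION_PRICE = 0.001  # USD per image for the first million images


def mturk_cost(images, workers_per_image):
    """Minimum Mechanical Turk cost when several workers label each image."""
    return images * workers_per_image * MTURK_MIN_PRICE


def rekognition_cost(images, free_tier_images=0):
    """Amazon Rekognition cost after subtracting any Free Tier allowance."""
    billable = max(images - free_tier_images, 0)
    return billable * REKOGNITION_PRICE
```

For 10,000 images, three workers per image comes to about $360 on Mechanical Turk, compared with about $10 on Amazon Rekognition before any Free Tier allowance.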

On top of that, you may want to control costs by limiting the number of images scanned by Amazon Rekognition. We have a couple of options for large repositories that might incur substantial costs if searched exhaustively:

  • Place a cap on the number of images to scan. If you want to end up with 50 filtered images for a particular label, like bird, you might set your algorithm to scan only up to several hundred at most. You might end up with fewer than 50 birds, but if the hit rate was so low that you reached the cap, your repository might not be a great source of bird pictures, and you’ve saved the money you would have spent searching to the end for the fiftieth bird. Ideally, you’ll find 50 birds before reaching the cap and stop then, but it’s the nature of dirty data that we often don’t know exactly how dirty it is.
  • Implement an early stopping algorithm. If some number of images in a row, perhaps 20, fail to turn up any birds, then stop looking. Early stopping might mean the dataset is unsuited to its intended purpose, or that there was some error in the invocation of the function, like a typo in the label (for example, a search for birb instead of bird).
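Both options can be combined in one loop. The following is a minimal sketch under stated assumptions: has_label stands in for whatever check you run per image (for example, a call to Amazon Rekognition), and the function name and default values (50 hits, 300 scans, 20 misses in a row) are illustrative only.

```python
def scan_with_limits(keys, has_label, target=50, max_scans=300, patience=20):
    """Scan images until enough hits are found or a limit is reached.

    keys:      iterable of image identifiers (for example, S3 object keys)
    has_label: callable returning True if the image contains the label
    Returns (hits, number scanned, reason for stopping).
    """
    hits = []
    scanned = 0
    misses_in_a_row = 0
    for key in keys:
        scanned += 1
        if has_label(key):
            hits.append(key)
            misses_in_a_row = 0
        else:
            misses_in_a_row += 1
        if len(hits) >= target:
            return hits, scanned, "target_reached"
        if misses_in_a_row >= patience:
            return hits, scanned, "early_stop"
        if scanned >= max_scans:
            return hits, scanned, "cap_reached"
    return hits, scanned, "exhausted"
```

Returning the reason for stopping makes it easier to tell a poor repository (cap reached) from a likely invocation error (early stop on the very first images).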

Try the demo filter function

The following diagram shows what this solution could look like in practice. A filter function running locally on the client’s computer can use an SDK to make API calls to Amazon Rekognition and Amazon Simple Storage Service (Amazon S3) to check each image in turn. When Amazon Rekognition detects the desired label in an image from the source bucket repository, the function copies that image into the destination bucket.

All the function needs are appropriate permissions within your AWS account and the following parameters:

  • An Amazon Rekognition label to filter on. To find out which one might be best for your needs, check the current list of available labels (available in the documentation) or try testing a good image from your repository on the Amazon Rekognition console and seeing what labels come up.
  • The name of a source bucket in Amazon S3 that contains an unsorted image repository.
  • The name of a destination bucket that images are copied into if Amazon Rekognition detects the specified label.

Optionally, you can also specify a confidence threshold to even more stringently filter images, and a name to call the folder that images in the destination bucket are organized into.

A basic filter function might look something like this:

import boto3


def check_for_tag(client, file_name, bucket, tag, threshold):
    """Checks an individual S3 object for a single tag"""

    response = client.detect_labels(
        Image={
            'S3Object': {
                'Bucket': bucket,
                'Name': file_name
            }
        })

    # collect the names of labels detected above the confidence threshold
    detected = {label['Name'].lower() for label in response['Labels']
                if label['Confidence'] > threshold}
    return tag.lower() in detected


def filter(source, destination, tag, threshold, name):
    """Copies objects from source to destination if there's a tag match"""

    # set up resources
    s3_resource = boto3.resource('s3')
    client = boto3.client('rekognition')

    # iterate through source bucket, copying hits
    source_bucket = s3_resource.Bucket(source)
    for obj in source_bucket.objects.all():
        if check_for_tag(client, obj.key, source, tag, threshold):
            copy_source = {
                'Bucket': source,
                'Key': obj.key
            }
            # prefix the key with the folder name used in the destination bucket
            new_name = f"{name}/{obj.key}"
            s3_resource.meta.client.copy(copy_source, destination, new_name)

With the images already stored in Amazon S3, you don’t even need to upload them to Amazon Rekognition to get a prompt response.

The following are some ideas for customizing and exploring this procedure:

  • Add a human-in-the-loop element to the filter function, so that for Amazon Rekognition confidence scores between certain values, the image is sent elsewhere for manual checking.
  • Include the bounding box data from Amazon Rekognition as metadata to train an object detection model.
  • Train an Amazon Rekognition Custom Labels model with the collected data—the filter function above stores images in the format expected by Amazon Rekognition Custom Labels, with each folder’s name corresponding to a label the model predicts.
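As one way to approach the first idea, the following sketch sorts a detect_labels result into three outcomes based on two confidence cutoffs. The function name and the cutoff values (90 and 60 on the 0-100 confidence scale that Amazon Rekognition returns) are assumptions for illustration, not recommendations.

```python
def route_by_confidence(labels, tag, accept_at=90.0, review_at=60.0):
    """Return 'accept', 'review', or 'reject' for one image.

    labels is the 'Labels' list from a detect_labels response, where each
    entry has a 'Name' and a 'Confidence' between 0 and 100.
    """
    confidence = max(
        (label["Confidence"] for label in labels
         if label["Name"].lower() == tag.lower()),
        default=0.0,  # the tag was not detected at all
    )
    if confidence >= accept_at:
        return "accept"
    if confidence >= review_at:
        return "review"
    return "reject"
```

Images routed to review could then be copied to a separate prefix or sent to a human review workflow.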

Conclusion

In this post, we explored the possibility of using Amazon Rekognition to filter image sets intended for ML applications. This solution can remove egregiously off-the-mark images from a dataset, which results in cleaner training data and better-performing models at a fraction of the cost of hiring human data labelers.

Interested in learning about ML through blogs, tutorials, and more? Check out the AWS Machine Learning community.


About the Authors

Samantha Finley is an Associate Solutions Architect at AWS.

 

 

 

 

Quentin Morris is an Associate Solutions Architect at AWS.

 

 

 

 

Jerry Mullis is an Associate Solutions Architect at AWS.

 

 

 

 

Woodrow Bogucki is an Associate Technical Trainer at AWS. He has a Master’s Degree in Computer Engineering from Texas A&M. His favorite class was Deep Learning and his personal interests include Mexican food, BBQ, and fried chicken.


Simplify patient care with a custom voice assistant using Amazon Lex V2

For the past few decades, physician burnout has been a challenge in the healthcare industry. Although patient interaction and diagnosis are critical aspects of a physician’s job, administrative tasks are equally taxing and time-consuming. Physicians and clinicians must keep a detailed medical record for each patient. That record is stored in the hospital electronic health record (EHR) system, a database that contains the records of every patient in the hospital. To maintain these records, physicians often spend multiple hours each day to manually enter data into the EHR system, resulting in lower productivity and increased burnout.

Physician burnout is a leading contributor to depression, fatigue, and stress for doctors during their careers. In addition, it can lead to higher turnover, reduced productivity, and costly medical errors, affecting people’s lives and health.

In this post, you learn the importance of voice assistants and how they can automate administrative tasks for doctors. We also walk through creating a custom voice assistant using PocketSphinx and Amazon Lex.

Voice assistants as a solution to physician burnout

Voice assistants are now starting to automate the vital yet manual parts of patient care. They can be a powerful tool to help doctors save time, reduce stress, and spend more time focusing on the patient versus the administrative requirements of clinical documentation.

Today, voice assistants are becoming more available as natural language processing models advance, errors decrease, and development becomes more accessible for the average developer. However, most devices are limited, so developers must often build their own customized versions.

As Solutions Architects working in the healthcare industry, we see a growing trend towards the adoption of voice assistants in hospitals and patient rooms.

With easy-to-set-up managed services, developers and innovators can hit the ground running and start developing the devices of the future.

Custom voice assistant solution architecture

The following architecture diagram presents the high-level overview of our solution.

In our solution, we first interface with a voice assistant script that runs on your computer. After the wake word is recognized, the voice assistant starts recording what you say and sends the audio to Amazon Lex, which uses an AWS Lambda function to retrieve dummy patient sensor data stored in Amazon DynamoDB. That sensor data is generated by another Python script, generate_data.py, which you also run on your computer.

Sensor types include blood pressure, blood glucose, body temperature, respiratory rate, and heart rate. Amazon Lex sends back a voice message, and we use Amazon Polly, a service that turns text into lifelike speech, to create a consistent experience.
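To give a feel for the fulfillment step, the following sketch shows how a Lambda function could read the two slots from a Lex V2 event and return a closing message. It is not the function deployed by the CloudFormation stack: the helper names are ours, lookup_reading is a hypothetical callable standing in for the DynamoDB query, and the sketch covers only the case where both slots are already filled.

```python
def get_slot(event, slot_name):
    """Return the interpreted value of a slot, or None if it isn't filled."""
    slot = event["sessionState"]["intent"]["slots"].get(slot_name)
    if not slot:
        return None
    return slot["value"]["interpretedValue"]


def close_response(event, message):
    """Build a Lex V2 response that marks the intent fulfilled and ends the turn."""
    intent = dict(event["sessionState"]["intent"])
    intent["state"] = "Fulfilled"
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": intent,
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }


def fulfill(event, lookup_reading):
    """Answer a PatientData request using a caller-supplied lookup function."""
    patient_id = get_slot(event, "PatientId")
    sensor_type = get_slot(event, "SensorType")
    reading = lookup_reading(patient_id, sensor_type)
    message = f"The {sensor_type.lower()} for patient {patient_id} is {reading}."
    return close_response(event, message)
```

In a real handler, lookup_reading would query the table for the latest reading of that sensor type for the patient.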

Now you’re ready to create the components needed for this solution.

Deploy your solution resources

You can find all the files of our custom voice assistant solution on our GitHub repo. Download all the files, including the PocketSphinx model files downloaded from their repo.

You must deploy the DynamoDB table and Lambda function directly by choosing Launch Stack.

The AWS CloudFormation stack takes a few minutes to complete. When it’s complete, you can go to the Resources tab to check out the Lambda function and DynamoDB table created. Note the name of the Lambda function because we reference it later when creating the Amazon Lex bot.

Create the Amazon Lex bot

When the CloudFormation stack is complete, we’re ready to create the Amazon Lex bot. For this post, we use the newer V2 console.

  1. On the Amazon Lex console, choose Switch to the new Lex V2 console.
  2. In the navigation pane, choose Bots.
  3. Choose Create bot.
  4. For Bot name, enter Healthbot.
  5. For Description, enter an optional description.
  6. For Runtime role, select Create a role with basic Amazon Lex permissions.
  7. In the Children’s Online Privacy Protection Act (COPPA) section, select No.
  8. Keep the settings for Idle session timeout at their default (5 minutes).
  9. Choose Next.
  10. For Voice interaction, choose the voice you want to use.
  11. Choose Done.

Create custom slot types, intents, and utterances

Now we create a custom slot type for the sensors, our intents, and sample utterances.

  1. On the Slot types page, choose Add slot type.
  2. Choose Add blank slot type.
  3. For Slot type name, enter SensorType.
  4. Choose Add.
  5. In the editor, under Slot value resolution, select Restrict to slot values.
  6. Add the following values:
    1. Blood pressure
    2. Blood glucose
    3. Body temperature
    4. Heart rate
    5. Respiratory rate
  7. Choose Save slot type.

On the Intents page, we have two intents automatically created for us. We keep the FallbackIntent as the default.

  1. Choose NewIntent.
  2. For Intent name, change to PatientData.

  3. In the Sample utterances section, add some phrases to invoke this intent.

We provide a few examples in the following screenshot, but you can also add your own.

  4. In the Add slot section, for Name, enter PatientId.
  5. For Slot type, choose AMAZON.AlphaNumeric.
  6. For Prompts, enter What is the patient ID?

This prompt isn’t actually important because we’re using Lambda for fulfillment.

  7. Add another required slot named SensorType.
  8. For Slot type, choose SensorType (we created this earlier).
  9. For Prompts, enter What would you like to know?
  10. Under Code hooks, select Use a Lambda function for initialization and validation and Use a Lambda function for fulfillment.
  11. Choose Save intent.
  12. Choose Build.

The build may take a few minutes to complete.

Create a new version

We now create a new version with our new intents. We can’t use the draft version in production.

  1. When the build is complete, on the Bot versions page, choose Create version.
  2. Keep all the settings at their default.
  3. Choose Create.

You should now see Version 1 listed on the Bot Versions page.

Create an alias

Now we create an Alias to deploy.

  1. Under Deployment in the navigation pane, choose Aliases.
  2. Choose Create alias.
  3. For Alias name, enter prod.
  4. Associate this alias with the most recent version (Version 1).
  5. Choose Create.
  6. On the Aliases page, choose the alias you just created.
  7. Under Languages, choose English (US).
  8. For Source, choose the Lambda function you saved earlier.
  9. For Lambda function version or alias, choose $LATEST.
  10. Choose Save.

You now have a working Amazon Lex bot that you can start testing. Before we move on, make sure to save the bot ID and alias ID.

The bot ID is located on the bot details page.

The alias ID is located on the Aliases page.

You need to replace these values in the voice assistant script voice_assistant.py later.
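Before wiring up audio, you can check that the bot ID and alias ID work by sending plain text through the Lex V2 runtime API. The following is a minimal sketch and not the code in voice_assistant.py, which sends recorded audio; the helper name ask_bot is hypothetical, and the en_US locale matches the English (US) language chosen earlier.

```python
import uuid


def ask_bot(lex_client, bot_id, bot_alias_id, text,
            session_id=None, locale_id="en_US"):
    """Send one text utterance to a Lex V2 bot and return its replies.

    lex_client is a boto3 Lex V2 runtime client, for example
    boto3.client("lexv2-runtime"). Reuse the same session_id across
    calls to continue one conversation.
    """
    response = lex_client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id or uuid.uuid4().hex,
        text=text,
    )
    return [message["content"] for message in response.get("messages", [])]
```

If the IDs are wrong, the call raises an error, so this is a quick way to catch copy-and-paste mistakes early.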

In the following sections, we explain how to use PocketSphinx to detect a custom wake word as well as how to start using the solution.

Use PocketSphinx for wake word recognition

The first step of our solution involves invoking a custom wake word before we start listening to your commands to send to Amazon Lex. Voice assistants need an always-on, highly accurate, small-footprint program that constantly listens for a wake word, usually because they’re hosted on a small, low-power device such as an Amazon Echo.

For wake word recognition, we use PocketSphinx, an open-source continuous speech recognition engine made by Carnegie Mellon University, to process each audio chunk. We decided to use PocketSphinx because it provides a free, flexible, and accurate wake system with good performance.

Create your custom wake word

Building the language model using PocketSphinx is simple. The first step is to create a corpus. The included model is pre-trained with “Amazon”, so if you don’t want to train your own wake word, you can skip to the next step. However, we highly encourage you to try creating your own custom wake word to use with the voice assistant script.

The corpus is a list of sentences that you use to train the language model. You can find our pre-built corpus file in the file corpus.txt that you downloaded earlier.

  1. Modify the corpus file based on the key phrase or wake word you want to use and then go to the LMTool page.
  2. Choose Browse and select the corpus.txt file you created.
  3. Choose COMPILE KNOWLEDGE BASE.
  4. Download the files the tool created and replace the example corpus files that you downloaded previously.
  5. Replace the KEY_PHRASE and DICT variables in the Python script to reflect the new files and wake word.
  6. Update the bot ID and bot alias ID with the values you saved earlier in the voice assistant script.

Set up the voice assistant script on your computer

In the GitHub repository, you can download the two Python scripts you use for this post: generate_data.py and voice_assistant.py.

You must complete a few steps before you can run the script, namely installing the correct Python version and libraries.

  1. Download and install Python 3.6.

PocketSphinx supports up to Python 3.6. If you have another version of Python installed, you can use pyenv to switch between Python versions.

  2. Install PocketSphinx.
  3. Install PyAudio.
  4. Install Boto3.

Make sure you use a recent version of Boto3. You can upgrade with pip install --upgrade boto3, or pin a specific release with pip install boto3==<version>.

  5. Install the AWS Command Line Interface (AWS CLI) and configure your profile.

If you don’t have an AWS Identity and Access Management (IAM) user yet, you can create one. Make sure you set the Region to the same Region where you created your resources earlier.

Start your voice assistant

Now that we have everything set up, open up a terminal on your computer and run generate_data.py.

Let it run for at least a minute so that the table is decently populated. You can then stop it, because our voice assistant only queries the latest data inserted into the table. The generated patient IDs range from 0 to 99, and the voice assistant asks you for one later.

Check the table to make sure that data is generating.

Now you can run voice_assistant.py.

Your computer is listening for the wake word you set earlier (or the default “Amazon”) and doesn’t start recording until it detects the wake word. The wake word detection is processed using PocketSphinx’s decoder. The decoder continuously checks for the KEYPHRASE or WakeWord in the audio channel.

To initiate the conversation, say the utterance you set in your intent earlier. The following is a sample conversation:

You: Hey Amazon

You: I want to get patient data.

Lex: What is the ID of the patient you wish to get information on?

You: 45

Lex: What would you like to know about John Smith?

You: blood pressure

Lex: The blood pressure for John Smith is 120/80.

Conclusion

Congratulations! You have set up a healthcare voice assistant that can serve as a patient information retrieval bot. Now you have completed the first step towards creating a personalized voice assistant.

Physician burnout is an important issue that needs to be addressed. Voice assistants, with their increasing sophistication, can help make a difference in the medical community by serving as virtual scribes, assistants, and much more. Instead of being burdened with menial tasks such as ordering medication or retrieving patient information, physicians can use innovative technologies to relieve themselves of undifferentiated administrative tasks.

We used PocketSphinx and Amazon Lex to create a voice assistant with the simple task of retrieving some patient information. Instead of running the program on your computer, you can try hosting this on any small device that supports Python, such as the Raspberry Pi.

Furthermore, Amazon Lex is HIPAA-eligible, which means that you can integrate it with existing healthcare systems by following the HL7/FHIR standards.

Personalized healthcare assistants can be vital in helping physicians and nurses care for their patients, and retrieving sensor data is just one of the many use cases that can be viable. Other use cases such as ordering medication and scribing conversations can benefit doctors and nurses across hospitals.

We want to challenge you to try out Amazon Lex and see what you can make!


About the Author

David Qiu is a Solutions Architect working in the HCLS sector, helping healthcare companies build secure and scalable solutions in AWS. He is passionate about educating others on cloud technologies and big data processing. Outside of work, he also enjoys playing the guitar, video games, cigars, and whiskey. David holds a Bachelors in Economics and Computer Science from Washington University in St. Louis.

 

 

Manish Agarwal is a technology enthusiast with 20+ years of engineering experience, ranging from leading a cutting-edge healthcare startup to delivering massive-scale innovations at companies like Apple and Amazon. With deep expertise in AI/ML and healthcare, he truly believes that AI/ML will completely revolutionize the healthcare industry in the next 4-5 years. His interests include precision medicine, virtual assistants, autonomous cars and drones, AR/VR, and blockchain. Manish holds a Bachelor of Technology from the Indian Institute of Technology (IIT).

 

Navneet Srivastava, a Principal Solutions Architect, is responsible for helping provider organizations and healthcare companies to deploy data lake, data mesh, electronic medical records, devices, and AI/ML-based applications while educating customers about how to build secure, scalable, and cost-effective AWS solutions. He develops strategic plans to engage customers and partners, and works with a community of technically focused HCLS specialists within AWS. Navneet has a M.B.A from NYIT and a bachelors in Software Engineering and holds several associate and professional certifications for architecting on AWS.


TC Energy builds an intelligent document processing workflow to process over 20 million images with Amazon AI

This is a guest post authored by Paul Ngo, US Gas Technical and Operational Services Data Team Lead at TC Energy.

TC Energy operates a network of pipelines, including 57,900 miles of natural gas and 3,000 miles of oil and liquid pipelines, throughout North America. TC Energy enables a stable network of natural gas and crude oil pipelines with safety, integrity, collaboration, and responsibility top of mind. TC Energy’s natural gas pipeline supplies more than 25% of the clean-burning natural gas consumed daily across North America to heat homes, fuel industries, and generate power.

To ensure the maintenance and safety requirements for the US natural gas system, a significant focus is spent on data collection, analysis, and management. With an aging pipeline system coupled with a growing repository of electronic records, any opportunity to leverage technology can reduce cost in performing re-work associated with not being able to locate these high-value records.

In this post, we share how TC Energy built an intelligent document processing workflow using Amazon AI services.

Pressure test records

One example high-value record, identified through the customized intelligent document processing workflow, is a pressure test record. Pressure test records are important for pipeline safety, maintenance, and regulatory compliance. These documents, now totaling over 2.2 million physical pages (and counting) of text and diagrams, present a challenge to both label and discover when needed. Although the key pressure test data remains the same, over the years the documentation formats, ownership, and pressure charts have changed many times, including both typed and handwritten documentation and imagery from as early as the 1900s.

Within the US Integrity & Data team, Paul Ngo learned that manually searching and reviewing these electronic records for pressure test or design pressure records is time-consuming and risks missing high-value records. Machine learning (ML) has proven to be a faster way to search for these types of records, so we wanted to use ML to meet our directive to “leave no stone unturned.”

The solution

To address these challenges, Dallas Kinzel, Delivery Lead within the IS Canadian Innovation team, turned to fully managed ML. The solution is built around Amazon Rekognition Custom Labels, a feature of Amazon Rekognition with AutoML capabilities that classifies images with custom labels, and Amazon Textract, an AI service that extracts text (whether typed or handwritten) and data from virtually any document.

Collectively, our US Business Unit and IS Innovation teams worked together to develop an intelligent document processing workflow in stages. First, the team built a document classifier with Amazon Rekognition Custom Labels (training on 111 distinct document types). Second, they processed classified documents with Amazon Textract, and used DetectText to make sense of scribbled handwriting and numbers.

The following diagram illustrates the solution architecture (click on the image for an expanded view).

Using Amazon Rekognition Custom Labels to create a document classifier was straightforward. First, the team gathered fewer than 100 samples to train a custom model, yielding an initial F-Score of 96%. Think of F-Score as a measure of how accurately the system classifies documents. With further improvements to the model, the F-Score improved to 98%. The team achieved this high level of accuracy in a fraction of the time it would have taken to build their own computer vision model from scratch.
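For readers who want the definition behind the number, the following sketch computes precision, recall, and F-Score for one document class from counts of correct and incorrect classifications. The function name and the example counts in the usage note are hypothetical and are not TC Energy results.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-Score) for one class from raw counts."""
    predicted = true_pos + false_pos   # everything the model labeled as this class
    actual = true_pos + false_neg      # everything that truly belongs to this class
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 48 correctly classified pages with 2 false positives and 2 missed pages give a precision, recall, and F-Score of 0.96 each.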

Conclusion

What’s next? This system is still in its early stages, and we have more exciting things in store! As a continuation of the work led by Duane Patton, the IS Product team continues to enhance the existing solution by adding new custom labels to the document classifier and increasing the accuracy of classification by utilizing the combined results of Amazon Rekognition and Amazon Textract. We have plans to add other features to the solution, including serverless compute for automatic document processing with AWS Lambda. Amazon DynamoDB, for processing status recording, is also on the roadmap to make the new TC Energy computer vision solution even more efficient and accurate.

Contact sales or visit the product pages to learn more about how Amazon Rekognition and Amazon Textract can help your business.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


Paul Ngo has a BSc in Computer Science and is the US Gas Technical and Operational Services Data Team Lead at TC Energy. He has over 15 years of experience at TC Energy, including data analytics and repeatable, sustainable reporting. He has a passion for innovation and leveraging technology to improve productivity.

Simplify data annotation and model training tasks with Amazon Rekognition Custom Labels

For a supervised machine learning (ML) problem, labels are values expected to be learned and predicted by a model. To obtain accurate labels, ML practitioners can either record them in real time or conduct offline data annotation, which are activities that assign labels to the dataset based on human intelligence. However, manual dataset annotation can be tedious and tiring for a human, especially on a large dataset. Even with labels that are obvious to a human to annotate, the process can still be error-prone due to fatigue. As a result, building training datasets takes up to 80% of a data scientist’s time.

To tackle this issue, we demonstrate in this post how to use an assisting ML model, which is trained using a small annotated dataset, to speed up the annotation on a larger dataset while having a human in the loop. As an example, we focus on a computer vision object detection use case. We detect AWS and Amazon smile logos from images collected on the AWS and Amazon websites. Depending on the use case, you can start with training a model with only a few images that capture the obvious pattern in the dataset, and have a human focus on the lightweight tasks of reviewing these automatically proposed annotations and adjusting mistaken labels only when necessary. This solution avoids repeating manual work, reduces human fatigue, and improves data annotation quality and efficiency.

In this post, we use AWS CloudFormation to set up a serverless stack with AWS Lambda functions as well as their corresponding permissions, Amazon Simple Storage Service (Amazon S3) for image data lake and model prediction storage, Amazon SageMaker Ground Truth for data labeling, and Amazon Rekognition Custom Labels for dataset management and model training and hosting. Code used in this post is available on the GitHub repository.

Solution overview

Our solution includes the following steps:

  1. Prepare an S3 bucket with images.
  2. Create a Ground Truth labeling workforce.
  3. Deploy the CloudFormation stack.
  4. Train the first version of your model.
  5. Start the feedback client.
  6. Perform label verification with Amazon Rekognition Custom Labels.
  7. Generate a manifest file.
  8. Train the second version of your model.

Prerequisites

Make sure to install Python 3, Pillow, and the AWS Command Line Interface (AWS CLI) in your environment and set up your AWS profile configuration and credentials.

Prepare an S3 bucket with images

First, create a new S3 bucket in the designated Region (N. Virginia or Ireland) with two partitions: one with a smaller number of images, and another with a larger number. For example, in this post, s3://rekognition-custom-labels-feedback-amazon-logo/v1/train/ includes eight AWS or Amazon smile logo images, and s3://rekognition-custom-labels-feedback-amazon-logo/v2/train/ has 20 logo images.
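
To confirm that each partition holds the number of images you expect, a short check along the following lines may help. This is a sketch, not part of the solution's repository: it assumes boto3 is installed and credentials are configured, the helper names are hypothetical, and the suffix list is an assumption about which files count as images.

```python
from typing import Tuple

# Assumed image file extensions; adjust to match your dataset.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, prefix = uri[len("s3://"):].partition("/")
    return bucket, prefix


def count_images(uri: str, s3_client=None) -> int:
    """Count objects under the prefix whose keys end with an image suffix."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers load without boto3

        s3_client = boto3.client("s3")
    bucket, prefix = split_s3_uri(uri)
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(IMAGE_SUFFIXES):
                total += 1
    return total


print(split_s3_uri("s3://rekognition-custom-labels-feedback-amazon-logo/v1/train/"))
```

Calling count_images on the v1 and v2 prefixes before training gives a quick sanity check on the size of each partition.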

Add the following cross-origin resource sharing (CORS) policy in the bucket permission settings:

[
    {
        "AllowedHeaders": [],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": []
    }
]
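
If you prefer to apply the policy from a script rather than the console, the following sketch passes the same rules to the Amazon S3 PutBucketCors operation through boto3. It assumes boto3 is installed and credentials are configured; the function name and the bucket name you pass in are placeholders.

```python
# The same rules as the JSON policy above, expressed as a Python structure.
CORS_RULES = [
    {
        "AllowedHeaders": [],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": [],
    }
]


def apply_cors(bucket_name: str, s3_client=None) -> None:
    """Apply CORS_RULES to the bucket; needs AWS credentials when run for real."""
    if s3_client is None:
        import boto3  # imported lazily so the module loads without boto3

        s3_client = boto3.client("s3")
    s3_client.put_bucket_cors(
        Bucket=bucket_name,
        CORSConfiguration={"CORSRules": CORS_RULES},
    )
```

Accepting the client as an optional argument makes it possible to exercise the function with a stub before pointing it at a real bucket.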

Create a Ground Truth labeling workforce

In this post, we use a Ground Truth private labeling workforce. We add new workers, create a new private team, and add a worker to the team. Finally, we record the labeling workforce Amazon Resource Name (ARN).

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling workforces.
  2. On the Private tab, choose Invite new workers.
  3. Enter the email address of each worker you want to invite.
  4. Choose Invite new workers.

After you add the worker, they receive an email, similar to the one shown in the following screenshot.

Meanwhile, you can create a new labeling team.

  1. Choose Create private team.
  2. For Team name, enter a name.
  3. Choose Create private team to confirm.
  4. On the Labeling workforces page, choose the name of the team.
  5. On the Workers tab, choose Add workers to team.
  6. Select the intended worker’s email address and choose Add workers to team.

Finally, we can get the labeling workforce ARN. For more details, see Create and Manage Workforces.
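
The ARN can also be read programmatically. The sketch below uses the SageMaker DescribeWorkteam operation through boto3 and assumes that the value needed later is the private team's ARN; the function name and the team name you pass in are placeholders.

```python
def get_workteam_arn(team_name: str, sagemaker_client=None) -> str:
    """Return the ARN of a private work team, looked up by its name."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the module loads without boto3

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.describe_workteam(WorkteamName=team_name)
    return response["Workteam"]["WorkteamArn"]
```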

Deploy the CloudFormation stack

Deploy the CloudFormation stack in one of the AWS Regions where you are going to use Amazon Rekognition Custom Labels. This solution is currently available in us-east-1 (N. Virginia) and eu-west-1 (Ireland).

After deployment, choose the Outputs tab and make note of the three outputs: jobRoleArn, preLambdaArn, and postLambdaArn.
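
As an alternative to copying the values from the console, the outputs can be fetched with the CloudFormation DescribeStacks operation. This is a minimal sketch that assumes boto3 and configured credentials; the function name is hypothetical, and the stack name is whatever you chose at deployment.

```python
def get_stack_outputs(stack_name: str, cfn_client=None) -> dict:
    """Return the stack's outputs as a {OutputKey: OutputValue} dictionary."""
    if cfn_client is None:
        import boto3  # imported lazily so the module loads without boto3

        cfn_client = boto3.client("cloudformation")
    stack = cfn_client.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
    }
```

With the result in hand, outputs["jobRoleArn"], outputs["preLambdaArn"], and outputs["postLambdaArn"] give the three values named above.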

Train the first version of your model

For instructions on creating a project and training a model with Custom Labels, see Announcing Amazon Rekognition Custom Labels. In this post, we create a project called custom-labels-feedback. The first version model was trained and validated using the v1 dataset that includes eight AWS or Amazon smile logo images. The following screenshot shows some labeled sample data used for training.

When the first version model’s training process is finished, take note of your model ARN. In our example, the model achieved an F1 score of 0.667. We use this model to help human workers annotate a larger dataset (v2) for the next iteration.
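
The model ARN, status, and F1 score are also available from the Amazon Rekognition DescribeProjectVersions operation. The following sketch assumes boto3 and configured credentials, reads only the first page of results, and uses a hypothetical function name; the project ARN is the one for your own project.

```python
def summarize_model_versions(project_arn: str, rekognition_client=None) -> list:
    """List ARN, status, and F1 score for each model version in a project.

    Only the first page of results is read; a project with many versions
    would need pagination.
    """
    if rekognition_client is None:
        import boto3  # imported lazily so the module loads without boto3

        rekognition_client = boto3.client("rekognition")
    response = rekognition_client.describe_project_versions(ProjectArn=project_arn)
    return [
        {
            "arn": description["ProjectVersionArn"],
            "status": description["Status"],
            # The F1 score is absent until training and evaluation finish.
            "f1": description.get("EvaluationResult", {}).get("F1Score"),
        }
        for description in response["ProjectVersionDescriptions"]
    ]
```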

Start the feedback client

To start the feedback client, complete the following steps:

  1. In your terminal, clone the repository:
git clone https://github.com/aws-samples/amazon-rekognition-custom-labels-feedback-solution.git
  2. Change the working directory:
cd amazon-rekognition-custom-labels-feedback-solution/src
  3. Update the following items in feedback-config.json in the src/ folder:
    1. images – The S3 bucket folder that has the larger dataset. In this post, it is the v2 dataset that contains 30 images.
    2. outputBucket – The output S3 bucket. As a best practice, we recommend using the same image bucket here.
    3. jobRoleArn – The output from the CloudFormation stack.
    4. workforceTeamArn – The private team ARN as set earlier in the Ground Truth labeling workforces.
    5. preLambdaArn – The output from the CloudFormation stack.
    6. postLambdaArn – The output from the CloudFormation stack.
    7. projectVersionArn – The first model ARN.

You need to start the first version model before you call the feedback client.

  4. Expand the API Code section on your Amazon Rekognition Custom Labels model page, and run the Start model AWS CLI command in your terminal.

The model status changes to STARTING.

  5. When the model status changes to RUNNING, run the following code in your terminal:
python3 start-feedback.py

It analyzes the larger dataset of images using the first version model and starts Ground Truth label verification jobs. It also outputs a command for later usage, which generates a manifest file for the larger dataset (v2).
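
Because the feedback client depends on all seven settings, checking feedback-config.json before launching it can save a failed run. The sketch below assumes the file is a flat JSON object with the seven keys from the configuration step at the top level; the repository's actual layout may differ, and the helper names are hypothetical.

```python
import json

# Keys named in the configuration step of this post.
REQUIRED_KEYS = (
    "images",
    "outputBucket",
    "jobRoleArn",
    "workforceTeamArn",
    "preLambdaArn",
    "postLambdaArn",
    "projectVersionArn",
)


def missing_config_keys(config: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path: str = "feedback-config.json") -> dict:
    """Load the config file and fail early if a required value is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_config_keys(config)
    if missing:
        raise ValueError("missing config values: " + ", ".join(missing))
    return config
```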

Perform label verification

Now human workers can log in to the labeling project to verify the labels proposed by the first version model. Usually, label verification jobs are sent to the workers in several batches.

For most of the images that are labeled correctly by the first version model, human workers only need to confirm these labels without any adjustment, which accelerates the whole data annotation process.

Generate a manifest file for the second dataset

After the label verification jobs are complete (the status of the labeling jobs in Ground Truth changes from In progress to Complete), run the following command, which you got from the feedback client’s output, in your terminal:

python3 get-feedback.py --jobs-manifest s3://......

This command generates a manifest file for the larger dataset that you can use to train the next version of your model in Amazon Rekognition Custom Labels. The output S3 Path indicates the manifest file location for the larger dataset.
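
Before importing the manifest, you may want to look inside it. Ground Truth manifests are JSON Lines files in which each line is one JSON object and the image location is stored under the source-ref key. The sketch below parses such text and lists the referenced images; the sample lines and the label attribute name are invented for illustration, and the helper names are hypothetical.

```python
import json


def read_manifest(text: str) -> list:
    """Parse JSON Lines manifest text into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def image_uris(entries: list) -> list:
    """Return the S3 URI of every entry that has a source-ref field."""
    return [entry["source-ref"] for entry in entries if "source-ref" in entry]


# Invented sample content; a real manifest comes from the command above.
sample = (
    '{"source-ref": "s3://example-bucket/v2/train/img-001.png", "example-label": {}}\n'
    '{"source-ref": "s3://example-bucket/v2/train/img-002.png", "example-label": {}}\n'
)

entries = read_manifest(sample)
print(len(entries), image_uris(entries)[0])
```

Comparing the number of entries with the number of images in the larger dataset is a quick way to spot images that were dropped along the way.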

Train the second version of your model

To train the next version of your model, we first create a new dataset.

  1. On the Amazon Rekognition Custom Labels console, choose Create dataset.
  2. For Dataset name, enter a name.
  3. For Image location, select Import images labeled by Amazon SageMaker Ground Truth.
  4. For .manifest file location, enter the S3 path you noted earlier.

Double-check whether all images are labeled correctly. The following screenshot shows some sample data that we imported from Ground Truth.

With this newly added dataset in Amazon Rekognition Custom Labels, you can train the next version of your model under the same project as the first version. For example, in this post, we train the next version model using the dataset amazon-logo-v2 under the project custom-labels-feedback, and use the dataset amazon-logo-v1 as a test set.

In our example, compared to the first version, the next version model achieves much better performance with a 0.900 F1 score.

It’s worth noting that you can apply this solution multiple times in an Amazon Rekognition Custom Labels project. You can use the next version model to easily annotate even larger datasets and train models until you’re satisfied with final model performance.

Clean up

After you finish using the custom labels feedback solution, remember to delete the CloudFormation stack via the AWS CloudFormation console, and stop running models by calling the AWS CLI command in your terminal. This helps you avoid any unnecessary charges.
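
The same cleanup can be scripted. The following sketch calls the Amazon Rekognition StopProjectVersion and CloudFormation DeleteStack operations through boto3 as an alternative to the console and the AWS CLI. It assumes boto3 and configured credentials; the function name is hypothetical, and the ARN and stack name are the ones from your own deployment.

```python
def clean_up(project_version_arn: str, stack_name: str,
             rekognition_client=None, cfn_client=None) -> None:
    """Stop a running model version and request deletion of the stack."""
    if rekognition_client is None or cfn_client is None:
        import boto3  # imported lazily so the module loads without boto3

        rekognition_client = rekognition_client or boto3.client("rekognition")
        cfn_client = cfn_client or boto3.client("cloudformation")
    # Stop the running model version named by the ARN.
    rekognition_client.stop_project_version(ProjectVersionArn=project_version_arn)
    # Stack deletion is asynchronous; this call only starts it.
    cfn_client.delete_stack(StackName=stack_name)
```

Run it once for each model version you started, because every running version is stopped separately.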

Conclusion

This post presented an end-to-end demonstration of using Amazon Rekognition Custom Labels to efficiently annotate a larger dataset with assistance from a model trained on a smaller dataset. This solution enables you to gain feedback on a model’s performance and make improvements by using human verification and adjustment when necessary. As a result, data annotation, model training, and error analysis are conducted simultaneously and interactively, which improves dataset annotation efficiency.

For more information about building dataset labels with Ground Truth, see Amazon SageMaker Ground Truth and Amazon Rekognition Custom Labels.


About the Authors

Sherry Ding is a Senior AI/ML Specialist Solutions Architect. She has extensive experience in machine learning and holds a PhD in Computer Science. She mainly works with Public Sector customers on various AI/ML related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.

Kashif Imran is a Principal Solutions Architect at Amazon Web Services. He works with some of the largest AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement computer vision applications at scale. His expertise spans application architecture, serverless, containers, NoSQL, and machine learning.

Dr. Baichuan Sun is a Senior Data Scientist at AWS AI/ML. He is passionate about solving strategic business problems with customers using data-driven methodology on the cloud, and he has been leading projects in challenging areas including robotics computer vision, time series forecasting, price optimization, predictive maintenance, pharmaceutical development, and product recommendation systems. In his spare time, he enjoys traveling and hanging out with family.
