Use Amazon SageMaker ACK Operators to train and deploy machine learning models

AWS recently released the new Amazon SageMaker Operators for Kubernetes using the AWS Controllers for Kubernetes (ACK). ACK is a framework for building Kubernetes custom controllers, where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like databases or message queues simply by using the Kubernetes API. The new SageMaker ACK Operators make it easier for machine learning (ML) developers and data scientists who use Kubernetes as their control plane to train, tune, and deploy ML models in Amazon SageMaker without signing in to the SageMaker console.

Kubernetes and SageMaker

Building scalable ML workflows involves many iterative steps, including sourcing and preparing data, building ML models, training and evaluating these models, deploying them to production, and monitoring workloads after deployment.

SageMaker is a fully managed service designed and optimized specifically for managing these ML workflows. It removes the undifferentiated heavy lifting of infrastructure management and eliminates the need to invest in IT and DevOps to manage clusters for ML model building, training, and inference. Compute resources are only provisioned when requested, scaled as needed, and shut down automatically when jobs complete, thereby providing near 100% utilization. SageMaker provides many performance and cost optimizations for distributed training, spot training, automatic model tuning, inference latency, and multi-model endpoints.

Many AWS customers who have portability requirements implement a hybrid cloud approach or run workloads on premises, and use Kubernetes, an open-source, general-purpose container orchestration system, to set up repeatable ML pipelines running training and inference workloads. However, to support ML workloads, these developers still need to write custom code to optimize the underlying ML infrastructure, provide high availability and reliability, provide data science productivity tools, and comply with appropriate security and regulatory requirements. Kubernetes customers therefore want to use fully managed ML services such as SageMaker for cost-optimized and managed infrastructure, while their platform and infrastructure teams continue using Kubernetes for orchestration and pipeline management to retain standardization and portability.

To address this need, AWS allows you to train, tune, and deploy models in SageMaker by using the new SageMaker ACK Operators, which include a set of custom resource definitions for SageMaker resources that extend the Kubernetes API. With the SageMaker ACK Operators, you can take advantage of fully managed SageMaker infrastructure, tools, and optimizations natively from Kubernetes.

How did we get here?

In late 2019, AWS introduced the SageMaker Operators for Kubernetes to enable developers and data scientists to manage the end-to-end SageMaker training and production lifecycle using Kubernetes as the control plane. SageMaker operators were installed from the GitHub repo by downloading a YAML configuration file that configured your Kubernetes cluster with the custom resource definitions and operator controller service.

In 2020, AWS introduced ACK to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a common controller runtime, a code generator, and a set of AWS service-specific controllers, one of which is the SageMaker controller.

Going forward, new functionality will be added to the SageMaker Operators for Kubernetes through the ACK project.

How does ACK work?

The following diagram illustrates how ACK works.

In this example, Alice is a Kubernetes user. She wants to run model training on SageMaker from within the Kubernetes cluster using the Kubernetes API. Alice issues a call to kubectl apply, passing in a file, called a manifest, that describes a Kubernetes custom resource for her SageMaker training job. kubectl apply passes this manifest to the Kubernetes API server running on the Kubernetes control plane node (Step 1 in the workflow diagram).

The Kubernetes API server receives the manifest with the SageMaker training job specification and determines whether Alice has permissions to create a custom resource of kind sagemaker.services.k8s.aws/TrainingJob, and whether the custom resource is properly formatted (Step 2).

If Alice is authorized and the custom resource is valid, the Kubernetes API server writes (Step 3) the custom resource to its etcd data store and then responds back (Step 4) to Alice that the custom resource has been created.

The SageMaker controller, which is running on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified (Step 5) that a new custom resource of kind sagemaker.services.k8s.aws/TrainingJob has been created.

The SageMaker controller then communicates (Step 6) with the SageMaker API, calling the SageMaker CreateTrainingJob API to create the training job in AWS. After communicating with the SageMaker API, the SageMaker controller calls the Kubernetes API server to update (Step 7) the custom resource’s status with information it received from SageMaker. The SageMaker controller therefore surfaces the same information to developers that they would have received using the AWS SDK, which provides a consistent developer experience.
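For reference, the request that the controller sends to SageMaker in Step 6 is equivalent to calling CreateTrainingJob through the AWS SDK. The following minimal Python (boto3) sketch illustrates that call; the job name, image URI, role, and S3 paths are placeholders rather than values used later in this post:

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Roughly the call the ACK SageMaker controller makes when it reconciles a TrainingJob custom resource.
sm.create_training_job(
    TrainingJobName="example-training-job",                     # placeholder
    AlgorithmSpecification={
        "TrainingImage": "<training-image-uri>",                # placeholder
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::<account-id>:role/<execution-role>",  # placeholder
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://<bucket>/train/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://<bucket>/output/"},
    ResourceConfig={"InstanceType": "ml.m4.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 5},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)

The TrainingJob custom resource that we define later in this post maps its spec fields onto these request fields.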

Machine learning use case

For this post, we follow the SageMaker example provided in the following notebook. However, you can reuse the components in this example with your choice of SageMaker built-in or custom algorithms and your own datasets.

We use the Abalone dataset originally from the UCI data repository [1]. In the libsvm converted version, the nominal feature (male/female/infant) has been converted into a real valued feature. The age of abalone is to be predicted from eight physical measurements. This dataset is already processed and stored in Amazon Simple Storage Service (Amazon S3). We train an XGBoost model on the UCI Abalone dataset to replicate the flow in the example Jupyter notebook.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.
  • An existing Amazon Elastic Kubernetes Service (Amazon EKS) cluster running Kubernetes version 1.16 or later. For automated cluster creation using eksctl, see Getting started with Amazon EKS – eksctl and create your cluster with Amazon EC2 Linux managed nodes.
  • The following tools installed on the client machine used to access your Kubernetes cluster (you can use AWS Cloud9, a cloud-based integrated development environment (IDE), for the Kubernetes cluster setup):

  • kubectl – A command line tool for working with Kubernetes clusters.
  • Helm version 3.7+ – A tool for installing and managing Kubernetes applications.
  • AWS Command Line Interface (AWS CLI) – A command line tool for interacting with AWS services.
  • eksctl – A command line tool for working with Amazon EKS clusters that automates many individual tasks.
  • yq – A command line YAML processor. (For Linux environments, use the wget plain binary installation).

Set up IAM role-based authentication for the controller Pod

IAM roles for service accounts (IRSA) allows fine-grained roles at the Kubernetes Pod level by combining an OpenID Connect (OIDC) identity provider with Kubernetes service account annotations. In this section, we associate the Amazon EKS cluster with an OIDC provider and create an AWS Identity and Access Management (IAM) role that is assumed by the ACK controller Pod via its service account to access AWS services.

Connect to your cluster and create an OIDC ID provider

Make sure you’re connected to the right cluster. Substitute the values for CLUSTER_NAME and CLUSTER_REGION below:

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

# Set the cluster name, region where the cluster exists
export CLUSTER_NAME=<CLUSTER_NAME>
export CLUSTER_REGION=<CLUSTER_REGION>
export RANDOM_VAR=$RANDOM

aws eks update-kubeconfig --name $CLUSTER_NAME --region $CLUSTER_REGION
kubectl config get-contexts 

# Ensure cluster has compute
kubectl get nodes

Set up the OIDC ID provider (IdP) in AWS and associate it with your Amazon EKS cluster:

eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
  --region ${CLUSTER_REGION} --approve

Get the identity issuer URL by running the following code:

export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
OIDC_PROVIDER_URL=$(aws eks describe-cluster --name $CLUSTER_NAME --region $CLUSTER_REGION --query "cluster.identity.oidc.issuer" --output text | cut -c9-)

Set up an IAM role

Next, let’s set up the IAM role that defines the access to the SageMaker and Application Auto Scaling services. For this, we also need to have an IAM trust policy in place, allowing the specified Kubernetes service account (for example, ack-sagemaker-controller) to assume the IAM role.

Run the following code to create a file named trust.json with the trust relationship required for the IAM role:

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::'$AWS_ACCOUNT_ID':oidc-provider/'$OIDC_PROVIDER_URL'"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "'$OIDC_PROVIDER_URL':aud": "sts.amazonaws.com",
          "'$OIDC_PROVIDER_URL':sub": [
            "system:serviceaccount:ack-system:ack-sagemaker-controller",
            "system:serviceaccount:ack-system:ack-applicationautoscaling-controller"
          ]
        }
      }
    }
  ]
}
' > ./trust.json

Updating an Application Auto Scaling Scalable Target requires additional permissions. First, create a service-linked role for Application Auto Scaling.

aws iam create-service-linked-role --aws-service-name sagemaker.application-autoscaling.amazonaws.com

Run the following code to create a file named pass_role_policy.json with the iam:PassRole policy required for the IAM role:

printf '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::'$AWS_ACCOUNT_ID':role/aws-service-role/sagemaker.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint"
    }
  ]
}
' > ./pass_role_policy.json

Run the following command to create a role with the trust relationship defined in trust.json. This trust relationship is required so that Amazon EKS (via a webhook) can inject the necessary environment variables and mount volumes into the Pod that are required by the AWS SDK to assume this role.

OIDC_ROLE_NAME=ack-controller-role-$CLUSTER_NAME

aws iam create-role --role-name $OIDC_ROLE_NAME --assume-role-policy-document file://trust.json

# Attach the AmazonSageMakerFullAccess Policy to the Role. This policy provides full access to 
# Amazon SageMaker. Also provides select access to related services (e.g., Application Autoscaling,
# S3, ECR, CloudWatch Logs).
aws iam attach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

# Attach the iam:PassRole policy required for updating ApplicationAutoscaling ScalableTarget
aws iam put-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy" --policy-document file://pass_role_policy.json

export IAM_ROLE_ARN_FOR_IRSA=$(aws iam get-role --role-name $OIDC_ROLE_NAME --output text --query 'Role.Arn')
echo $IAM_ROLE_ARN_FOR_IRSA

Install SageMaker and Application Auto Scaling controllers

Choose an AWS Region for the SageMaker and automatic scaling resources we create in this post. For convenience, we recommend using us-east-1:

export SERVICE_REGION="us-east-1"
# Namespace for controller
export ACK_K8S_NAMESPACE="ack-system"

Now, let’s install the SageMaker and Application Auto Scaling controllers using the following helper script. This script pulls the Helm charts from ACK’s public Amazon Elastic Container Registry (Amazon ECR) repository and configures the default Region for the resources to be created and the IAM role (created in the previous step) in the service account used by the controller Pods to assume the role. Create a file named install-controllers.sh and insert the following code block:

#!/usr/bin/env bash

# Deploy ACK Helm Charts
export HELM_EXPERIMENTAL_OCI=1
export ACK_K8S_NAMESPACE=${ACK_K8S_NAMESPACE:-"ack-system"}

function install_ack_controller() {
    local service="$1"
    local release_version="$2"
    local chart_export_path=/tmp/chart
    local chart_ref=$service-chart
    local chart_repo=public.ecr.aws/aws-controllers-k8s/$chart_ref
    local chart_package=$chart_ref-$release_version.tgz
    
    # Download helm chart
    mkdir -p $chart_export_path
    helm pull oci://"$chart_repo" --version "$release_version" -d $chart_export_path
    tar xvf "$chart_export_path"/"$chart_package" -C "$chart_export_path"

    # Update the values in helm chart
    pushd $chart_export_path/$service-chart
        yq e '.aws.region = env(SERVICE_REGION)' -i values.yaml 
        yq e '.serviceAccount.annotations."eks.amazonaws.com/role-arn" = env(IAM_ROLE_ARN_FOR_IRSA)' -i values.yaml
    popd

    # Create a namespace and install the helm chart
    helm install -n $ACK_K8S_NAMESPACE --create-namespace ack-$service-controller $chart_export_path/$service-chart
}

install_ack_controller "sagemaker" "v0.3.0"
install_ack_controller "applicationautoscaling" "v0.2.0"

Run the script:

chmod +x install-controllers.sh
./install-controllers.sh

The output contains the following:

Pulled: public.ecr.aws/aws-controllers-k8s/sagemaker-chart:v0.3.0
...

NAME: ack-sagemaker-controller
LAST DEPLOYED: Tue Nov 16 01:53:34 2021
NAMESPACE: ack-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Pulled: public.ecr.aws/aws-controllers-k8s/applicationautoscaling-chart:v0.2.0
...

NAME: ack-applicationautoscaling-controller
LAST DEPLOYED: Tue Nov 16 01:53:35 2021
NAMESPACE: ack-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

Next, we run the following commands to verify custom resource definitions were applied and controller Pods are running:

kubectl get crds | grep "services.k8s.aws"

The output should contain a number of custom resource definitions related to SageMaker (such as trainingjobs and endpoints) and Application Auto Scaling (such as scalingpolicies and scalabletargets). Next, check the controller Pods:

# Get pods in controller namespace
kubectl get pods -n $ACK_K8S_NAMESPACE

We see one controller Pod per service running in the ack-system namespace:

NAME                                                     READY   STATUS    RESTARTS   AGE
ack-applicationautoscaling-controller-7479dc78dd-ts9ng   1/1     Running   0          4m52s
ack-sagemaker-controller-788858fc98-6fgr6                1/1     Running   0          4m56s

Prepare SageMaker resources

Next, we create an S3 bucket and IAM role for SageMaker.

To train a model with SageMaker, we need an S3 bucket to store the dataset and artifacts from the training process. We use the preprocessed dataset available at s3://sagemaker-sample-files/datasets/tabular/uci_abalone [1].

Let’s create a variable for the S3 bucket:

export SAGEMAKER_BUCKET=ack-sagemaker-bucket-$RANDOM_VAR

Create a file named create-bucket.sh and insert the following code block:

printf '
#!/usr/bin/env bash
# create bucket
if [[ $SERVICE_REGION != "us-east-1" ]]; then
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION" --create-bucket-configuration LocationConstraint="$SERVICE_REGION"
else
  aws s3api create-bucket --bucket "$SAGEMAKER_BUCKET" --region "$SERVICE_REGION"
fi
# sync dataset
aws s3 sync s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train s3://"$SAGEMAKER_BUCKET"/datasets/tabular/uci_abalone/train
aws s3 sync s3://sagemaker-sample-files/datasets/tabular/uci_abalone/validation s3://"$SAGEMAKER_BUCKET"/datasets/tabular/uci_abalone/validation
' > ./create-bucket.sh

Run the script to create the S3 bucket and copy the dataset:

chmod +x create-bucket.sh
./create-bucket.sh

The SageMaker training job that we run later in the post needs an IAM role to access Amazon S3 and SageMaker. Run the following commands to create a SageMaker execution IAM role that is used by SageMaker to access AWS resources:

export SAGEMAKER_EXECUTION_ROLE_NAME=ack-sagemaker-execution-role-$RANDOM_VAR

TRUST="{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "sagemaker.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

SAGEMAKER_EXECUTION_ROLE_ARN=$(aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn')

echo $SAGEMAKER_EXECUTION_ROLE_ARN

Note down the execution role ARN to use in later steps.

Train an XGBoost model

Now, we create a training.yaml file to specify the parameters for a SageMaker training job. SageMaker training jobs enable remote training of ML models. You can customize each training job to run your own ML scripts with custom architectures, data loaders, hyperparameters, and more. To submit a SageMaker training job, we require a job name. Let’s create that variable first:

export JOB_NAME=ack-xgboost-training-job-$RANDOM_VAR

In the following code, we create a training.yaml file that contains the hyperparameters for the training job as well as the location of the training and validation data. It’s also where we specify the Amazon ECR image used for training.

Note: If your $SERVICE_REGION isn’t us-east-1, change the following image URI. For the XGBoost algorithm version 1.2-1 Region-specific image URI, see Docker Registry Paths and Example Code.

export XGBOOST_IMAGE=683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.2-1

printf '
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: '$JOB_NAME'
spec:
  # Name that will appear in SageMaker console
  trainingJobName: '$JOB_NAME'
  hyperParameters: 
    max_depth: "5"
    gamma: "4"
    eta: "0.2"
    min_child_weight: "6"
    subsample: "0.7"
    objective: "reg:linear"
    num_round: "50"
    verbosity: "2"
  algorithmSpecification:
    trainingImage: '$XGBOOST_IMAGE'
    trainingInputMode: File
  roleARN: '$SAGEMAKER_EXECUTION_ROLE_ARN'
  outputDataConfig:
    # The output path of our model
    s3OutputPath: s3://'$SAGEMAKER_BUCKET'
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m4.xlarge
    volumeSizeInGB: 5
  stoppingCondition:
    maxRuntimeInSeconds: 3600
  inputDataConfig:
    - channelName: train
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          # The input path of our train data 
          s3URI: s3://'$SAGEMAKER_BUCKET'/datasets/tabular/uci_abalone/train/abalone.train
          s3DataDistributionType: FullyReplicated
      contentType: text/libsvm
      compressionType: None
    - channelName: validation
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          # The input path of our validation data 
          s3URI: s3://'$SAGEMAKER_BUCKET'/datasets/tabular/uci_abalone/validation/abalone.validation
          s3DataDistributionType: FullyReplicated
      contentType: text/libsvm
      compressionType: None 
' > ./training.yaml

Now, we can create the training job:

kubectl apply -f training.yaml

You should see the following output:

trainingjob.sagemaker.services.k8s.aws/ack-xgboost-training-job-7420 created

You can watch the status of the training job. It takes a few minutes for STATUS to show as Completed.

kubectl get trainingjob.sagemaker --watch
NAME                            SECONDARYSTATUS   STATUS
ack-xgboost-training-job-7420   Starting          InProgress
ack-xgboost-training-job-7420   Downloading       InProgress
ack-xgboost-training-job-7420   Training          InProgress
ack-xgboost-training-job-7420   Completed         Completed
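Because the controller creates a real SageMaker training job behind the scenes, you can also cross-check its status directly from the SageMaker API. The following optional Python (boto3) sketch assumes the JOB_NAME and SERVICE_REGION values exported earlier are available as environment variables:

import os
import boto3

region = os.environ.get("SERVICE_REGION", "us-east-1")
job_name = os.environ["JOB_NAME"]

sm = boto3.client("sagemaker", region_name=region)
job = sm.describe_training_job(TrainingJobName=job_name)

print("Status:           " + job["TrainingJobStatus"])
print("Secondary status: " + job["SecondaryStatus"])
# The model artifact location appears once the job completes.
print("Model artifacts:  " + job.get("ModelArtifacts", {}).get("S3ModelArtifacts", "not available yet"))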

Deploy the results of the SageMaker training job

To deploy the model, we need to specify a model name, an endpoint config name, and an endpoint name:

export MODEL_NAME=ack-xgboost-model-$RANDOM_VAR
export ENDPOINT_CONFIG_NAME=ack-xgboost-endpoint-config-$RANDOM_VAR
export ENDPOINT_NAME=ack-xgboost-endpoint-$RANDOM_VAR

We deploy this model on an ml.c5.large instance. In the following .yaml file, we define the model, the endpoint config, and the endpoint:

printf '
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Model
metadata:
  name: '$MODEL_NAME'
spec:
  modelName: '$MODEL_NAME'
  primaryContainer:
    containerHostname: xgboost
    # The source of the model data
    modelDataURL: s3://'$SAGEMAKER_BUCKET'/'$JOB_NAME'/output/model.tar.gz
    image: '$XGBOOST_IMAGE'
  executionRoleARN: '$SAGEMAKER_EXECUTION_ROLE_ARN'
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: EndpointConfig
metadata:
  name: '$ENDPOINT_CONFIG_NAME'
spec:
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
  productionVariants:
  - modelName: '$MODEL_NAME'
    variantName: AllTraffic
    instanceType: ml.c5.large
    initialInstanceCount: 1
---
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Endpoint
metadata:
  name: '$ENDPOINT_NAME'
spec:
  endpointName: '$ENDPOINT_NAME'
  endpointConfigName: '$ENDPOINT_CONFIG_NAME'
' > ./deploy.yaml

Now we can create the model, endpoint config, and endpoint:

kubectl apply -f deploy.yaml

You should see the following output:

model.sagemaker.services.k8s.aws/ack-xgboost-model-7420 created
endpointconfig.sagemaker.services.k8s.aws/ack-xgboost-endpoint-config-7420 created
endpoint.sagemaker.services.k8s.aws/ack-xgboost-endpoint-7420 created

We can observe that the model and endpoint config were created. Deploying the endpoint may take some time:

kubectl describe models.sagemaker
kubectl describe endpointconfigs.sagemaker
kubectl describe endpoints.sagemaker

We can watch this process using the following command:

kubectl get endpoints.sagemaker --watch

After some time, the STATUS changes to InService:

NAME                        STATUS
ack-xgboost-endpoint-7420   Creating         
ack-xgboost-endpoint-7420   InService        

This indicates the deployed endpoint is ready for use.
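If you prefer to wait for the endpoint programmatically rather than watching the kubectl output, a boto3 waiter is a convenient option. This is a minimal sketch that assumes the ENDPOINT_NAME and SERVICE_REGION values exported earlier:

import os
import boto3

region = os.environ.get("SERVICE_REGION", "us-east-1")
endpoint_name = os.environ["ENDPOINT_NAME"]

sm = boto3.client("sagemaker", region_name=region)

# Blocks until the endpoint reaches InService (or raises if creation fails).
sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
print(endpoint_name + " is " + sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"])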

Verify the inference capabilities of the trained model

We invoke the model endpoint using Python to emulate a typical use case. We reuse the code from the SageMaker example notebook.

We first download the test set from Amazon S3. Then we load a single sample from the test set and use it to invoke the endpoint we deployed in the previous section. Download the test file with the following code:

pip install boto3 numpy
aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/test/abalone.test abalone.test
head -1 abalone.test > abalone.single.test

Use the Python interpreter to test inference. Make sure python is on your shell’s search path; it’s usually installed as /usr/local/bin/python<version> on machines where it’s available.

Create a file named predict.py and insert the following code block:

printf '
import sys
import math
import json
import boto3
import numpy as np
import os

region = os.environ.get("SERVICE_REGION")
endpoint_name = os.environ.get("ENDPOINT_NAME")

runtime_client = boto3.client("runtime.sagemaker", region_name=region)

file_name = "abalone.single.test"
with open(file_name, "r") as f:
    payload = f.read().strip()

response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="text/x-libsvm", Body=payload
)

result = response["Body"].read().decode("utf-8").split(",")
result = [math.ceil(float(i)) for i in result]
label = payload.strip(" ").split()[0]
print("Label: " + label)
print("Prediction:" + str(result[0]))
' > ./predict.py
python predict.py

Running this sample should give us the following result:

Label: 12
Prediction: 13

The ML model estimates the age of the abalone in the test example to be 13; the actual age is 12. This suggests that our ML model has been trained and provides reasonable predictions. However, an experienced ML user may notice that we haven’t performed hyperparameter tuning or applied other methods of increasing accuracy, which is outside the scope of this post.

Dynamically scale the endpoint according to the load

SageMaker ACK Operators support custom resource definitions for automatic scaling (using ScalableTarget and ScalingPolicy) for your hosted models. The following resources adjust the number of instances (minimum 1 to maximum 20) provisioned for a model in response to changes in the SageMakerVariantInvocationsPerInstance metric, which is the average number of times per minute that each instance for a variant is invoked:

printf '
apiVersion: applicationautoscaling.services.k8s.aws/v1alpha1
kind: ScalableTarget
metadata:
  name: ack-scalable-target-predefined
spec:
  maxCapacity: 20
  minCapacity: 1
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
---
apiVersion: applicationautoscaling.services.k8s.aws/v1alpha1
kind: ScalingPolicy
metadata:
  name: ack-scaling-policy-predefined
spec:
  policyName: ack-scaling-policy-predefined
  policyType: TargetTrackingScaling
  resourceID: endpoint/'$ENDPOINT_NAME'/variant/AllTraffic
  scalableDimension: "sagemaker:variant:DesiredInstanceCount"
  serviceNamespace: sagemaker
  targetTrackingScalingPolicyConfiguration:
    targetValue: 60
    scaleInCooldown: 700
    scaleOutCooldown: 300
    predefinedMetricSpecification:
        predefinedMetricType: SageMakerVariantInvocationsPerInstance
 ' > ./scale-endpoint.yaml

Apply with the following code:

kubectl apply -f scale-endpoint.yaml

You should see the following output:

scalabletarget.applicationautoscaling.services.k8s.aws/ack-scalable-target-predefined created
scalingpolicy.applicationautoscaling.services.k8s.aws/ack-scaling-policy-predefined created

We can observe that scalingpolicy was created:

kubectl describe scalingpolicy.applicationautoscaling

The output of scalingpolicy looks like the following:

Status:
  Ack Resource Metadata:
    Arn:               arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:b33d12b8-aa81-4cb8-855e-c7b6dcb9d6e7:resource/SageMaker/endpoint/ack-xgboost-endpoint/variant/AllTraffic:policyName/ack-scaling-policy-predefined
    Owner Account ID:  123456789012
  Alarms:
    Alarm ARN:   arn:aws:cloudwatch:us-east-1:123456789012:alarm:TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmHigh-966b8232-a9b9-467d-99f3-95436f5c0383
    Alarm Name:  TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmHigh-966b8232-a9b9-467d-99f3-95436f5c0383
    Alarm ARN:   arn:aws:cloudwatch:us-east-1:123456789012:alarm:TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmLow-71e39f85-1afb-401d-9703-b788cdc10a93
    Alarm Name:  TargetTracking-endpoint/ack-xgboost-endpoint/variant/AllTraffic-AlarmLow-71e39f85-1afb-401d-9703-b788cdc10a93
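To see the target tracking policy react, the endpoint needs sustained traffic above the target of 60 invocations per minute per instance. The following rough Python sketch drives such load; it reuses the abalone.single.test payload from the previous section and the ENDPOINT_NAME and SERVICE_REGION environment variables, and the duration and sleep interval are arbitrary choices:

import os
import time
import boto3

region = os.environ.get("SERVICE_REGION", "us-east-1")
endpoint_name = os.environ["ENDPOINT_NAME"]

runtime_client = boto3.client("runtime.sagemaker", region_name=region)

with open("abalone.single.test", "r") as f:
    payload = f.read().strip()

# Send steady traffic for about 15 minutes so the CloudWatch alarm has enough data points to fire.
end_time = time.time() + 15 * 60
while time.time() < end_time:
    runtime_client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="text/x-libsvm", Body=payload
    )
    time.sleep(0.5)  # roughly 120 invocations per minute, above the 60-per-instance target

After the high alarm fires, Application Auto Scaling increases the endpoint’s instance count; once traffic stops and the scale-in cooldown elapses, the instance count returns toward the minimum. You can follow this on the endpoint’s page in the SageMaker console.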

Clean up

Run the following commands to delete the resources created in this post:

kubectl delete -f scale-endpoint.yaml
kubectl delete -f deploy.yaml
kubectl delete -f training.yaml

Create a file named uninstall-controller.sh and insert the following code block required for deleting the controller and custom resource definitions:

printf '
#!/usr/bin/env bash

# Uninstall Controller

export HELM_EXPERIMENTAL_OCI=1
export ACK_K8S_NAMESPACE=${ACK_K8S_NAMESPACE:-"ack-system"}

function uninstall_ack_controller() {
   local service="$1"
   local chart_export_path=/tmp/chart
   
   helm uninstall -n $ACK_K8S_NAMESPACE ack-$service-controller
   kubectl delete -f $chart_export_path/$service-chart/crds
}

uninstall_ack_controller "sagemaker"
uninstall_ack_controller "applicationautoscaling"
' > ./uninstall-controller.sh

Run the following commands to uninstall the controller and custom resource definitions, and delete the namespace, IAM roles, and S3 bucket you created:

# uninstall controller and remove CRDs
chmod +x uninstall-controller.sh
./uninstall-controller.sh

# Delete controller namespace
kubectl delete namespace $ACK_K8S_NAMESPACE

# Delete S3 bucket
aws s3 rb s3://$SAGEMAKER_BUCKET --force

# Delete SageMaker execution role
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam detach-role-policy --role-name $SAGEMAKER_EXECUTION_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam delete-role --role-name $SAGEMAKER_EXECUTION_ROLE_NAME

# Delete application autoscaling service linked role
aws iam delete-service-linked-role --role-name AWSServiceRoleForApplicationAutoScaling_SageMakerEndpoint

# Delete IAM role created for IRSA
aws iam detach-role-policy --role-name $OIDC_ROLE_NAME --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam delete-role-policy --role-name $OIDC_ROLE_NAME --policy-name "iam-pass-role-policy"
aws iam delete-role --role-name $OIDC_ROLE_NAME

Conclusion

SageMaker ACK Operators provide engineering teams with a native Kubernetes experience for creating and interacting with ML jobs on SageMaker, either through the Kubernetes API or through Kubernetes command line utilities such as kubectl. You can build automation, tooling, and custom interfaces for data scientists in Kubernetes by using these controllers, all without building, maintaining, or optimizing ML infrastructure. Data scientists and developers familiar with Kubernetes can compose and interact with fully managed SageMaker training, tuning, and inference jobs as they would with Kubernetes jobs running locally. Logs from SageMaker jobs stream back to Kubernetes, allowing you to natively view logs for your model training, tuning, and prediction jobs in the command line.

ACK is a community-driven project and will soon include service controllers for other AWS service APIs.

Links

[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.


About the Authors

Kanwaljit Khurmi is a Senior Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Suraj Kota is a Software Engineer specialized in machine learning infrastructure. He builds tools to easily get started and scale machine learning workloads on AWS. He worked on the AWS Deep Learning Containers, Deep Learning AMI, SageMaker Operators for Kubernetes, and other open source integrations like Kubeflow.

Archis Joglekar is an AI/ML Partner Solutions Architect in the Emerging Technologies team. He is interested in performant, scalable deep learning and scientific computing using the building blocks at AWS. His past experiences range from computational physics research to machine learning platform development in academia, national labs, and startups. His time away from the computer is spent playing soccer and with friends and family.

Read More

Postprocessing with Amazon Textract: Multi-page table handling

Amazon Textract is a machine learning (ML) service that automatically extracts printed text, handwriting, and other data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify and extract data from forms and tables.

Currently, thousands of customers use Amazon Textract to process different types of documents. Many of these documents, such as bank statements and financial reports, include tables that span one or more pages.

Many developers expressed interest in merging Amazon Textract responses where tables exist across multiple pages. This post demonstrates how you can use the amazon-textract-response-parser utility to accomplish this and highlights a few tricks to optimize the process.

Solution overview

When tables span multiple pages, a series of steps and validations are required to determine the linkage across pages correctly.

These include analyzing the similarity of table structure across pages (columns, headers, margins) and determining whether any additional content, such as headers or footers, exists that may logically break the tables. These logical steps are separated into two major groups (page context and table structure), and you can adjust and optimize each logical step according to your use case.

This solution runs these tasks in series and only merges the results when all checks are completed and passed. The following diagram shows the solution workflow.

Implement the solution

To get started, you must install the amazon-textract-response-parser and amazon-textract-helper libraries. The Amazon Textract response parser library enables us to easily parse the Amazon Textract JSON response and provides constructs to work with different parts of the document effectively. This post focuses on the merge/link tables feature. amazon-textract-helper is another useful library that provides a collection of ready-to-use functions and sample implementations to speed up the evaluation and development of any project using Amazon Textract.

  1. Install the libraries with the following code:
!pip install amazon-textract-response-parser
!pip install amazon-textract-helper
  2. The postprocessing step to identify related tables and merge them is part of the trp.trp2 library, which you must import into your notebook:
import trp.trp2 as t2
from trp.t_pipeline import pipeline_merge_tables
from textractcaller.t_call import call_textract, Textract_Features
from trp.trp2 import TDocument, TDocumentSchema
from trp.t_tables import MergeOptions, HeaderFooterType
  3. Next, call Amazon Textract to process the document:
textract_json = call_textract(input_document=s3_uri_of_documents, features=[Textract_Features.TABLES], boto3_textract_client = textract_client)
  4. Finally, load the response JSON into a document and run the pipeline. The footer and header heights are configurable by the user. There are three default values that can be used for HeaderFooterType: NONE, NARROW, and NORMAL.
t_document: t2.TDocument = t2.TDocumentSchema().load(textract_json)
t_document = pipeline_merge_tables(t_document, MergeOptions.MERGE, None, HeaderFooterType.NONE)

pipeline_merge_tables takes a merge option parameter that can be either MergeOptions.MERGE or MergeOptions.LINK.

MergeOptions.MERGE combines the tables and makes them appear as one for postprocessing, with the drawback that the geometry information is no longer in the correct location because you now have cells and tables from subsequent pages moved to the page with the first part of the table.

MergeOptions.LINK maintains the geometric structure and enriches the table information with links between the table elements. custom['previous_table'] and custom['next_table'] attributes are added to the TABLE blocks in the Amazon Textract JSON schema.
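With MergeOptions.LINK, you can then follow these links yourself during postprocessing. The following rough sketch assumes the t_document object produced by pipeline_merge_tables above; the attribute names (block_type, custom) follow the trp2 TBlock object and may differ slightly depending on your library version:

# Rough sketch: walk the TABLE blocks and print any links added by MergeOptions.LINK.
for block in t_document.blocks:
    if block.block_type == "TABLE" and block.custom:
        print(
            "Table " + str(block.id)
            + " previous: " + str(block.custom.get("previous_table"))
            + " next: " + str(block.custom.get("next_table"))
        )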

The following image represents a sample PDF file with a table that spans over two pages.

The following shows the Amazon Textract response without table merge postprocessing (left) and the response with table merge postprocessing (right).

Define a custom table merge validation function

The provided postprocessing API works for the majority of use cases; however, based on your specific use case, you can define a custom merge function to improve its accuracy.

This custom function is passed to the CustomTableDetectionFunction parameter of the pipeline_merge_tables function to overwrite the existing logic of identifying the tables to merge. The following steps represent the existing logic.

  1. Validate context between tables. Check if there are any line items between the first and second table except in the footer and header area. If there are any line items, tables are considered separate tables.
  2. Compare the column numbers. If the two tables don’t have the same number of columns, this is an indicator of separate logical tables.
  3. Compare the headers. If the two tables have the exact same columns (same cell number and cell labels), this is a very strong indication of the same logical table.
  4. Compare table dimensions. Verify that the two tables have the same left and right margins. An accuracy percentage parameter can be passed to allow for some degree of error (for example, if the pages are scanned from paper, consecutive tables on different pages may have slightly different widths).

If you have a different requirement, you can pass your own custom table detection function to the pipeline_merge_tables API as follows:

from typing import List

from trp import Document
from trp.trp2 import TDocumentSchema
from trp.t_pipeline import order_blocks_by_geo

def CustomTableDetectionFunction(t_document) -> List[List[str]]:
    table_ids_merge_list = []
    ordered_doc = order_blocks_by_geo(t_document)
    trp_doc = Document(TDocumentSchema().dump(ordered_doc))
    for current_page in trp_doc.pages:
        for table in current_page.tables:
            # Provide your custom logic here to determine which table IDs should merge into one table
            # if <custom logic>:
            #     table_ids_merge_list.append([tableid1, tableid2, tableid3, ...])
            pass
    return table_ids_merge_list

t_document = pipeline_merge_tables(t_document, MergeOptions.MERGE, CustomTableDetectionFunction, HeaderFooterType.NORMAL)

Our current implementation of the table detection function and the pipeline_merge_tables function in our Amazon Textract response parser library is available on GitHub. The custom table detection function returns a list of lists (of strings), which is required by the merge_table or link_table functions (based on the MergeOptions parameter) called internally by the pipeline_merge_tables API.

Run sample code

The Amazon Textract multi-page tables processing repository provides sample code on how to use the merge tables feature and covers common scenarios that you may encounter in your documents. To try the sample code, you first launch an Amazon SageMaker notebook instance with the code repository, then you can access the notebook to review the code samples.

Launch a SageMaker notebook instance with the code repository

To launch a SageMaker notebook instance, complete the following steps:

  1. Choose the following link to launch an AWS CloudFormation template that deploys a SageMaker notebook instance along with the sample code repository:

  2. Sign in to the AWS Management Console with your AWS Identity and Access Management (IAM) user name and password.

You arrive at the Create Stack page on the Specify Template step.

  3. Choose Next.
  4. For Specify Stack Name, enter a stack name.
  5. Choose Next.
  6. Choose Next.
  7. On the review page, acknowledge the IAM resource creation and choose Create stack.

Access the SageMaker notebook and review the code samples

When the stack creation is complete, you can access the notebook and review the code samples.

  1. On the Outputs tab of the stack, choose the link corresponding to the value of the NotebookInstanceName key.
  2. Choose Open Jupyter.
  3. Go to the home page of your Jupyter notebook and browse to the amazon-textract-multipage-tables-processing directory.
  4. Open the Jupyter notebook inside this directory and review the sample code provided.

Conclusion

This post demonstrated how to use the Amazon Textract response parser component to identify and merge tables that span multiple pages. You walked through generic checks that you can use to identify a multi-page table, learned how to build your own custom function, and reviewed the two options to merge tables in the Amazon Textract response JSON.

If this post helps you or inspires you to solve a problem, we would love to hear about it! The code for this solution is available on the GitHub repo for you to use and extend. Contributions are always welcome!


About the Authors

 Mehran Najafi, PhD, is a Senior Solutions Architect for AWS focused on AI/ML solutions and architectures at scale.

Keith Mascarenhas is a Solutions Architect and works with our small and medium-sized customers in central Canada to help them grow and achieve outcomes faster with AWS. He is also passionate about machine learning and is a member of the Amazon Computer Vision Hero program.

Yuan Jiang is a Sr Solutions Architect with a focus on machine learning. He’s a member of the Amazon Computer Vision Hero program and the Amazon Machine Learning Technical Field Community.

Martin Schade is a Senior ML Product SA with the Amazon Textract team. He has over 20 years of experience with internet-related technologies, engineering, and architecting solutions, and joined AWS in 2014. He has guided some of the largest AWS customers on the most efficient and scalable use of AWS services, and later focused on AI/ML with a focus on computer vision. He is currently obsessed with extracting information from documents.

Read More

Machine learning inference at scale using AWS serverless

With the growing adoption of machine learning (ML) across industries, there is an increasing demand for faster and easier ways to run ML inference at scale. ML use cases, such as manufacturing defect detection, demand forecasting, fraud surveillance, and many others, involve tens or thousands of datasets, including images, videos, files, documents, and other artifacts. These inference use cases typically require the workloads to scale to tens of thousands of parallel processing units. The simplicity and automated scaling offered by AWS serverless solutions make it a great choice for running ML inference at scale. With serverless, you can run inference without provisioning or managing servers, and you pay only for the time it takes to run. ML practitioners can easily bring their own ML models and inference code to AWS by using containers.

This post shows you how to run and scale ML inference using AWS serverless solutions: AWS Lambda and AWS Fargate.

Solution overview

The following diagram illustrates the solutions architecture for both batch and real-time inference options. The solution is demonstrated using a sample image classification use case. Source code for this sample is available on GitHub.

The diagram illustrates the solutions architecture for batch and real-time inferences. Batch inference uses AWS Fargate and AWS Batch, along with Amazon S3 and Amazon ECR. Real-time inference uses AWS Lambda and Amazon API Gateway.

AWS Fargate: Lets you run batch inference at scale using serverless containers. The Fargate task loads the container image with the inference code for image classification.

AWS Batch: Provides job orchestration for batch inference by dynamically provisioning Fargate containers as per job requirements.

AWS Lambda: Lets you run real-time ML inference at scale. The Lambda function loads the inference code for image classification. A Lambda function is also used to submit batch inference jobs.

Amazon API Gateway: Provides a REST API endpoint for the inference Lambda function.

Amazon Simple Storage Service (S3): Stores input images and inference results for batch inference.

Amazon Elastic Container Registry (ECR): Stores the container image with inference code for Fargate containers.

Deploying the solution

We have created an AWS Cloud Development Kit (AWS CDK) template to define and configure the resources for the sample solution. The CDK lets you provision the infrastructure and build deployment packages for both the Lambda function and the Fargate container. The packages include commonly used ML libraries, such as Apache MXNet and Python, along with their dependencies. The solution runs the inference code using a ResNet-50 model trained on the ImageNet dataset to recognize objects in an image. The model can classify images into 1,000 object categories, such as keyboard, pointer, pencil, and many animals. The inference code downloads the input image and returns the top five predicted classes along with their probabilities.
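To make the prediction step more concrete, the following simplified Python sketch shows the kind of top-five ImageNet classification that the packaged inference code performs. It is not the repository’s actual code; it uses MXNet’s Gluon model zoo and a local image file named input.jpg as stand-ins:

import mxnet as mx
from mxnet.gluon.model_zoo import vision
from mxnet.gluon.data.vision import transforms

# Pretrained ResNet-50 from the Gluon model zoo (stand-in for the model packaged by the CDK app).
net = vision.resnet50_v2(pretrained=True)

# Standard ImageNet preprocessing.
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = transform(mx.image.imread("input.jpg"))  # hypothetical local image
prob = mx.nd.softmax(net(img.expand_dims(axis=0)))[0]

# Top five class indices and probabilities; mapping indices to label names requires an ImageNet synset file.
for idx in prob.topk(k=5):
    i = int(idx.asscalar())
    print("class %d: %.4f" % (i, prob[i].asscalar()))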

To follow along and run the solution, you need access to an AWS account.

To deploy the solution, open your terminal window and complete the following steps.

  1. Clone the GitHub repo
    $ git clone https://github.com/aws-samples/aws-serverless-for-machine-learning-inference

  2. Navigate to the project directory and deploy the CDK application.
$ ./install.sh
or
$ ./cloud9_install.sh #If you are using AWS Cloud9

Enter Y to proceed with the deployment.

This performs the following steps to deploy and configure the required resources in your AWS account. It may take around 30 minutes for the initial deployment, as it builds the Docker image and other artifacts. Subsequent deployments typically complete within a few minutes.

  • Creates a CloudFormation stack (“MLServerlessStack”).
  • Creates a container image from the Dockerfile and the inference code for batch inference.
  • Creates an ECR repository and publishes the container image to this repo.
  • Creates a Lambda function with the inference code for real-time inference.
  • Creates a batch job configuration with Fargate compute environment in AWS Batch.
  • Creates an S3 bucket to store inference images and results.
  • Creates a Lambda function to submit batch jobs in response to image uploads to S3 bucket.

Running inference

The sample solution lets you get predictions either for a set of images using batch inference or for a single image at a time using the real-time API endpoint. Complete the following steps to run inferences for each scenario.

Batch inference

Get batch predictions by uploading image files to Amazon S3.

  1. Using the Amazon S3 console or the AWS CLI, upload one or more image files to the S3 bucket path ml-serverless-bucket-<acct-id>-<aws-region>/input.
    $ aws s3 cp <path to jpeg files> s3://ml-serverless-bucket-<acct-id>-<aws-region>/input/ --recursive

  2. This triggers the batch job, which spins off Fargate tasks to run the inference. You can monitor the job status in the AWS Batch console.
  3. When the job is complete (this may take a few minutes), inference results can be accessed from the ml-serverless-bucket-<acct-id>-<aws-region>/output path; one way to fetch them programmatically is sketched after this list.
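As an alternative to browsing the output path in the Amazon S3 console, you can fetch the result objects with a short Python script. This is a minimal sketch; replace the account ID and Region placeholders in the bucket name with your own values:

import boto3

# Replace the placeholders with your own account ID and Region.
bucket = "ml-serverless-bucket-<acct-id>-<aws-region>"
prefix = "output/"

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)

for obj in response.get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
    print(obj["Key"])
    print(body)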

Real-time inference

Get real-time predictions by invoking the REST API endpoint with an image payload.

  1. Navigate to the CloudFormation console and find the API endpoint URL (httpAPIUrl) from the stack output.
  2. Use an API client, like Postman or the curl command, to send a POST request to the /predict API endpoint with the image file as the payload.
    $ curl --request POST -H "Content-Type: application/jpeg" --data-binary @<your jpg file name> <your-api-endpoint-url>/predict

  3. Inference results are returned in the API response. (A Python alternative to the curl command is sketched after this list.)
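If you prefer Python over curl, the same request can be sent with the requests library. This is a minimal sketch; the endpoint URL placeholder is the httpAPIUrl value from the stack output, and the image file name is yours to substitute:

import requests

# Replace with the httpAPIUrl value from the CloudFormation stack output and your image file.
api_url = "<your-api-endpoint-url>"
image_file = "<your jpg file name>"

with open(image_file, "rb") as f:
    image_bytes = f.read()

response = requests.post(
    api_url + "/predict",
    headers={"Content-Type": "application/jpeg"},
    data=image_bytes,
)
print(response.status_code)
print(response.text)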

Additional recommendations and tips

Here are some additional recommendations and options to consider for fine-tuning the sample to meet your specific requirements:

  • Scaling – Update AWS Service Quotas in your account and Region as per your scaling and concurrency needs to run the solution at scale. For example, if your use case requires scaling beyond the default Lambda concurrent executions limit, then you must increase this limit to reach the desired concurrency. You also need to size your VPC and subnets with a wide enough IP address range to allow the required concurrency for Fargate tasks.
  • Performance – Perform load tests and fine tune performance across each layer to meet your needs.
  • Use container images with Lambda – This lets you use containers with both AWS Lambda and AWS Fargate, and you can simplify source code management and packaging.
  • Use AWS Lambda for batch inferences – You can use Lambda functions for batch inferences as well if the inference storage and processing times are within Lambda limits.
  • Use Fargate Spot – This lets you run interruption tolerant tasks at a discounted rate compared to the Fargate price, and reduce the cost for compute resources.
  • Use Amazon ECS container instances with Amazon EC2 – For use cases that need a specific type of compute, you can make use of EC2 instances instead of Fargate.

Cleaning up

Navigate to the project directory from the terminal window and run the following command to destroy all resources and avoid incurring future charges.

$ cdk destroy

Conclusion

This post demonstrated how to bring your own ML models and inference code and run them at scale using serverless solutions in AWS. The solution deploys your inference code to AWS Fargate and AWS Lambda, exposes an API endpoint using Amazon API Gateway for real-time inference, and uses AWS Batch for batch job orchestration. Effectively, this solution lets you focus on building ML models by providing an efficient and cost-effective way to serve predictions at scale.

Try it out today, and we look forward to seeing the exciting machine learning applications that you bring to AWS Serverless!


About the Authors

Poornima Chand is a Senior Solutions Architect in the Strategic Accounts Solutions Architecture team at AWS. She works with customers to help solve their unique challenges using AWS technology solutions. She focuses on Serverless technologies and enjoys architecting and building scalable solutions.

Greg Medard is a Solutions Architect with AWS Business Development and Strategic Industries. He helps customers with the architecture, design, and development of cloud-optimized infrastructure solutions. His passion is to influence cultural perceptions by adopting DevOps concepts that withstand organizational challenges along the way. Outside of work, you may find him spending time with his family, playing with a new gadget, or traveling to explore new places and flavors.

Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges with AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, and has therefore created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.

Vasu Sankhavaram is a Senior Manager of Solutions Architecture in Amazon Web Services (AWS). He leads Solutions Architects dedicated to Hitech accounts. Vasu holds an MBA from U.C. Berkeley, and a Bachelor’s degree in Engineering from University of Mysore, India. Vasu and his wife have their hands full with a son who’s a sophomore at Purdue, twin daughters in third grade, and a golden doodle with boundless energy.

Chitresh Saxena is a Senior Technical Account Manager at Amazon Web Services. He has a strong background in ML, Data Analytics and Web technologies. His passion is solving customer problems, building efficient and effective solutions on the cloud with AI, Data Science and Machine Learning.

Read More

Announcing conversational AI partner solutions

We are excited to announce the availability of AWS conversational AI partner solutions, which enable enterprises to implement high-quality, highly effective chatbot, virtual assistant, and Interactive Voice Response (IVR) solutions through the domain expertise of AWS Partners and AWS AI and machine learning (ML) services.

The partners highlighted in this announcement include Cation Consulting, Deloitte, NLX, Quantiphi, ServisBOT, TensorIoT, and XAPP AI. These partners use Amazon Kendra, Amazon Lex, and Amazon Polly, as well as additional AWS AI/ML services, in their solutions.

Overview of conversational AI

The demand for conversational AI (CAI) interfaces continues to grow as users prefer to interact with businesses on digital channels. Organizations of all sizes are developing chatbots, voice assistants, and IVR solutions to increase user satisfaction, reduce operational costs, and streamline business processes. COVID-19 has further accelerated the adoption, due to social distancing rules and shelter-in-place orders.

Conversational AI interfaces, like chatbots, voice assistants, and IVR, are used broadly across a wide variety of industry segments and use cases. CAI solutions go beyond customer service to additional use cases across industries — for example, triaging medical symptoms, booking an appointment, transferring money, or signing up for a new account.

Building a high-quality, highly effective CAI interface can be challenging for some, given the free-form nature of communications—users can say or write whatever they like in many different ways. Implementing a CAI solution involves selecting use cases, defining Intents and sample utterances, designing conversational flows, integrating backend services, and testing, monitoring, and measuring in an iterative approach.

We are moving past the days of simple question and answer services. Chatbots have advanced beyond simple decision trees to incorporate sophisticated natural language understanding (NLU) that not only understands a user’s Intent, but enables the chatbot to respond appropriately in a way that satisfies the user. We are seeing further advancements in CAI to cover true multimodal experiences, wherein users can interact via voice and text simultaneously.

Fortunately, there is help on this journey. The seven AWS Partners for CAI listed in this post have expertise in launching highly effective chatbot and voice experiences, built on top of AWS AI/ML services.

AWS conversational AI partner solutions

With AWS conversational AI partner solutions, enterprises can use the expertise of AWS Partners, using AWS AI/ML services, to build high-quality, highly effective chatbot and voice experiences. This enables you to increase user satisfaction, reduce operational costs, and achieve business goals—all while speeding up the time to market.

The following highlights our initial AWS Partners for conversational AI.

Cation Consulting, the maker of the Parly.ai platform, uses natural language conversational AI to build multilingual, multi-channel solutions. Cation enables high-value customer interactions, at a lower cost, through enterprise chatbots and live chat with AI-powered agent-assist capabilities. Cation Consulting helped Ryanair build a chatbot that improves its customer support experience, helping customers find answers to their questions quickly and easily. They recently helped 123.ie implement a highly advanced chatbot for onboarding new accounts that incorporates image recognition and document processing to speed up the process by analyzing a customer’s driver’s license and previous policy documentation.

Deloitte, an AWS Premier Consulting Partner, is one of the leading service providers in the design, delivery, implementation, and scalability of high ROI conversational experiences across industries. Deloitte’s global CAI experts deliver solutions that are highly personalized, context-aware, and designed to serve users any time, any place, and on any device. Deloitte has implemented CAI solutions for Fortune 500 enterprises across the financial services, hospitality, telecommunications, pharmaceutical, healthcare, and utility spaces.

Conversations by NLX™ enables companies to transform customer contact into personalized customer self-service. The platform enables non-technical users to build and manage chat, voice, and multimodal conversational experiences, helping brands track and elevate customer self-service into a strategic asset. NLX customers include a global drink manufacturer, a leading international airline, and more.

Quantiphi uses in-house accelerators built on top of AWS AI/ML services to implement conversational AI solutions. Quantiphi utilizes its expertise in CAI to design and implement complex conversation flows, derive insights, and develop dashboards showcasing impactful key performance indicators. Quantiphi has implemented CAI solutions for a major healthcare provider, a national utilities firm, and more.

ServisBOT’s conversational AI platform enables businesses to build and manage self-service chatbots and voice assistants, faster and easier. ServisBOT provides tools for building and optimizing advanced solutions, including covering multi-bot environments, security, backend integrations, and analytics. The platform also offers low-code tooling, blueprints, and reusable components for business users. ServisBOT’s customers include a global financial services corporation, a global insurance firm, a government agency, and more.

TensorIoT delivers industry-leading conversational AI solutions, including Alexa apps, Amazon Lex chatbots, and voice applications on the edge and in the cloud. Whether users interact via phone, web, or SMS, TensorIoT has expertise developing end-to-end solutions. TensorIoT helped Citibot develop its CAI platform for government use cases. Additional customers include a global credit reporting company, a national music retailer, and more.

XAPP AI’s Optimal Conversation™ Studio empowers enterprises to create intelligent virtual assistants for voice and chat channels. OC Studio provides model, dialog and content management, regression testing, and human-in-the-loop workflows to enable continuous CAI transformation, and thereby a more efficient and effective conversational experience. XAPP AI’s Optimal Conversation™ Studio platform powers over 1,200 conversational AI solutions for over 100 customers across consumer packaged goods, automotive, insurance, media, retail, education, nonprofit, and government spaces.

Getting Started

Our AWS Partners for conversational AI help make AWS one of the best places for all your CAI workloads. Learn more about conversational AI use cases at AWS or explore our AWS conversational AI partners for more information. Contact your AWS account manager for additional information or questions, including how to become an AWS Partner.


About the Authors

Arte Merritt leads partnerships for Contact Center Intelligence and Conversational AI. He is a frequent author and speaker in the conversational AI space. He was the co-founder and CEO of the leading analytics platform for conversational interfaces, leading the company to 20,000 customers, 90B messages, and multiple acquisition offers. Previously he founded Motally, a mobile analytics platform he sold to Nokia. Arte has more than 20 years experience in big data analytics. Arte is an MIT alum.


Design a compelling record filtering method with Amazon SageMaker Model Monitor

As artificial intelligence (AI) and machine learning (ML) technologies continue to proliferate, using ML models plays a crucial role in converting the insights from data into actual business impact. Operationalizing ML means streamlining every step of the ML lifecycle and deploying the best models within the existing production system. And within that production system, the models may interact with various processes, such as testing, performance tuning of IT resources, and monitoring strategy and operations.

One common pitfall is a lack of model performance monitoring and proper model retraining and updating, which could adversely affect business. Nearly continuous model monitoring can provide information on how the model is performing in production. The monitoring outputs are used to identify the problems proactively and take corrective actions, such as model retraining and updating, to help stabilize the model in production. However, in a real-world production setting, multiple personas may interact with the model, including real users, engineers who are troubleshooting production issues, or bots conducting performance tests. When inference requests are made for testing purposes at the production endpoint, it may cause false positive detection of violations for the model monitor. To avoid this, we must filter out the test records from the calculation of model monitoring metrics.

Amazon SageMaker is a fully managed service that enables developers and data scientists to build, train, and deploy ML models quickly and easily at any scale. After you train an ML model, you can deploy it on SageMaker endpoints that are fully managed and serve inferences in real time with low latency. After you deploy your model, you can use Amazon SageMaker Model Monitor to monitor your ML model’s quality continuously in real time. You can also configure alerts to notify and initiate actions if any drift in model performance is observed. Early detection of these deviations enables you to take corrective actions, such as collecting new training data, retraining models, and auditing upstream systems without manually monitoring models or building additional tooling.

In this post, we present how to build a record filtering method based on sets of business criteria as part of the preprocessing step in Model Monitor. The goal is to ensure that only the actual production records are sent to Model Monitor for analysis, reflecting the actual usage of the production endpoint.

Solution overview

The following diagram illustrates the high-level workflow of record filtering using a preprocessor script with Model Monitor.

The workflow includes the following steps:

  1. The Model Artifact Amazon Simple Storage Service (Amazon S3) bucket contains model.tar.gz, the XGBoost churn prediction model pretrained on the publicly available dataset mentioned in Discovering Knowledge in Data by Daniel T. Larose. For more information about how this model artifact was trained offline, see the Customer Churn Prediction with XGBoost notebook example on GitHub.
  2. The model is deployed to an inference endpoint with data capture enabled.
  3. Different personas send model prediction request traffic to the endpoint.
  4. The Data Capture bucket stores capture data from requests and responses.
  5. The Validation Dataset bucket contains the validation dataset used to create a baseline in Model Monitor.
  6. The Baselining bucket stores the output files for dataset statistics and constraints from Model Monitor’s baselining job.
  7. The Code bucket contains a custom preprocessor script for Model Monitor.
  8. The Model Monitor data quality monitoring schedule initiates monitoring jobs.
  9. The Results bucket contains outputs of the monitoring job, including statistics, constraints, and a violations report.

Prerequisites

To implement this solution, you must have the following prerequisites:

Set up the environment

To set up your environment, complete the following steps:

  1. Launch Studio from the AWS Management Console.

If you haven’t created Studio in your account yet, you can manually create one by following Onboard to Amazon SageMaker Studio Using Quick Start. Alternatively, you can use an AWS CloudFormation template (see Creating Amazon SageMaker Studio domains and user profiles using AWS CloudFormation), which automates the creation of Studio in your account.

  2. On the File menu, choose Terminal to launch a new terminal within Studio.
  3. Clone the GitHub repo in the terminal:
cd ~ && git clone https://github.com/aws-samples/amazon-sagemaker-data-quality-monitor-custom-preprocessing.git
  4. Navigate to the directory amazon-sagemaker-data-quality-monitor-custom-preprocessing in Studio.
  5. Open Data_Quality_Custom_Preprocess_Churn.ipynb.
  6. Select the Data Science kernel and an ml.t3.medium instance type to host the notebook.

The rest of this post dives into a notebook with the various steps involved in designing and testing filtering records using a preprocessor with Model Monitor. We use a pretrained and deployed XGBoost churn prediction model. For detailed notebooks on other Model Monitor capabilities, see the model quality explainability notebook examples on GitHub. Beyond the steps discussed in this post, there are other steps necessary to import libraries and set up AWS Identity and Access Management (IAM) permissions. You can start with the README, which has a more detailed explanation of each step. Alternatively, you can go directly to the code and walk through with the notebook that you cloned in Studio.

Deploy the pretrained XGBoost model with script mode

First, we upload the pretrained model artifacts to Amazon S3 for deployment:

model_path = 'model'
model_filename = 'model.tar.gz'
model_upload_uri = f's3://{bucket}/{prefix}/{model_path}'
local_model_path = f"./model/{model_filename}"
print(f"model s3 location: {model_upload_uri} n")
 
if is_upload_model:
    S3Uploader.upload(
        local_path=local_model_path,
        desired_s3_uri=model_upload_uri
    )
else: print("skip")

Because the model was trained offline using XGBoost, we use XGBoostModel from the SageMaker SDK to deploy the model. We provide the inference entry point in the source directory because we have a custom input parser for JSON requests. We also need to ensure that a Flask Response is returned so that the input and output content types match exactly. This is necessary for Model Monitor to work with the serving image, which runs Gunicorn/Flask: the output data captured by Model Monitor, which only works with CSV or JSON, is Base64-encoded by default unless Response() explicitly converts it to a specific type. The following are the custom input_fn and output_fn. Currently, the implementation handles a single JSON record, but you can easily extend it to multiple records for batch processing.

def input_fn(request_body, request_content_type):
    if request_content_type == "text/libsvm":
        return xgb_encoders.libsvm_to_dmatrix(request_body)
    elif request_content_type == "text/csv":
        return xgb_encoders.csv_to_dmatrix(request_body.rstrip("\n"))
    elif request_content_type == "application/json":
        request = json.loads(request_body)
        feature = ",".join(request.values())
        return xgb_encoders.csv_to_dmatrix(feature.rstrip("\n"))
    else:
        raise ValueError("Content type {} is not supported.".format(request_content_type))
 
def output_fn(predictions, content_type):
    if content_type == "text/csv":
        result = ",".join(str(x) for x in predictions)
        return Response(result, mimetype=content_type)
    elif content_type == "application/json":
        result = json.dumps(predictions.tolist())
        return Response(result, mimetype=content_type)
    else:
        raise ValueError("Content type {} is not supported.".format(content_type))

To enable data capture for monitoring the model data quality, you can specify options such as enable_capture, sampling_percentage, and destination_s3_uri in the DataCaptureConfig object when deploying to an endpoint. For example, unless you expect your endpoint to receive high traffic or need to downsample, you can capture all incoming records by setting the sampling percentage to 100. More information on DataCaptureConfig is available in the Model Monitor documentation. In the following code, we specify the SageMaker XGBoost model framework version and provide a path to the entry inference script that we reviewed previously:

if is_create_new_ep:
    ## Configure the Data Capture
    data_capture_config = DataCaptureConfig(
        enable_capture=True, 
        sampling_percentage=100, 
        destination_s3_uri=s3_capture_upload_path
    )
    current_endpoint_name = f'{ep_prefix}-{datetime.now():%Y-%m-%d-%H-%M}'
    print(f"Create a Endpoint: {current_endpoint_name}")
 
    xgb_inference_model = XGBoostModel(
        model_data=f'{model_upload_uri}/{model_filename}',
        role=role,
        entry_point="./src/inference.py",
        framework_version="1.2-1")
    
    predictor = xgb_inference_model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.2xlarge",
        endpoint_name=current_endpoint_name,
        data_capture_config=data_capture_config,
        tags = tags,
        wait=True)
elif not(current_endpoint_name):
    current_endpoint_name = all_demo_eps[0]
    print(f"Use existing endpoint: {current_endpoint_name}")  
else: print(f"Use selected endpoint: {current_endpoint_name}")

After we confirm that the model has been deployed, we can move on to the next step to review the implementation of the filtering mechanism in the preprocessing script for Model Monitor.

Implement a filtering mechanism in the preprocessor script

As previously discussed, we want to exclude test inference records from downstream monitoring reports. You can implement a rule-based filtering mechanism by parsing metadata provided in CustomAttributes in a request header. The following code illustrates how to send custom attributes as key-value pairs using the Boto3 SageMaker Runtime client:

response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name, 
    ContentType='application/json', 
    Body=json.dumps(payload),
    CustomAttributes=json.dumps({
    "testIndicator": testIndicator,
    "applicationName":"DEMO",
    "transactionId": transactionId}))

We recommend using CustomAttributes to send the required metadata for simplicity. You can optionally include the metadata as part of the inference records instead, as long as your inference entry point reflects the change and the extraction of input features from input records doesn’t break. Next, we review the provided preprocessor script that contains the filtering mechanism.

As illustrated in the following code, we extend the built-in mechanisms of Model Monitor by providing a custom preprocessor function. First, we extract testIndicator information from custom attributes and use this information to set the is_test variable to either True, when it’s a test record, or False otherwise. If we want to skip test records without breaking a monitor job, we can return [] to indicate that the object is an empty set of rows. Note that returning {} results in an error because it’s considered to be an object having an empty row, which SageMaker doesn’t expect.

Moreover, we convert the model output probability into an integer type for non-test records. This step ensures that the data type is consistent with that of the ground truth label in the validation dataset. We demonstrate in the following sections how this step helps you avoid false positive violations in monitoring. Model quality monitoring has its own native way of handling the conversion, but this workaround is necessary for data quality monitoring.

Next, we insert the output as the first item of the input features, ensuring that the number and order of columns match the validation dataset exactly. Although monitoring the model output may seem unnecessary for data quality monitoring, we recommend not skipping this step because other types of monitoring may depend on that information. Finally, the function returns key-value pairs with zero-padded index numbers as keys and the corresponding output and input features as values. This avoids any misalignment of input features caused by Spark processing sorting column names lexically. The padding width of 20 isn’t special: zero-padding the index to 20 digits keeps the keys in order for up to 10**20 columns, which is far more feature columns than you need in practice.

Finally, SageMaker applies the preprocessing to each row and aggregates the results on your behalf. If a single inference request contains multiple records (a mini-batch), you need to handle that in your code beyond the sample we provide. At the time of writing, the preprocessing step in Model Monitor doesn’t publish any logs to Amazon CloudWatch, although this may change in the future. If you need to debug your custom preprocessing script, you can write and save your logs inside the container under the directory /opt/ml/processing/output/ so that you can access them later in your S3 bucket.

def preprocess_handler(inference_record):
    input_enc_type = inference_record.endpoint_input.encoding
    input_data = inference_record.endpoint_input.data.rstrip("\n")
    output_data = get_class_val(inference_record.endpoint_output.data.rstrip("\n"))
    event_metadata = inference_record.event_metadata
    custom_attribute = json.loads(event_metadata.custom_attribute[0]) if event_metadata.custom_attribute is not None else None
    is_test = eval_test_indicator(custom_attribute) if custom_attribute is not None else True

    if is_test:
        # Skip test records without breaking the monitoring job
        return []
    elif input_enc_type == "CSV":
        outputs = output_data + ',' + input_data
        return {str(i).zfill(20): d for i, d in enumerate(outputs.split(","))}
    elif input_enc_type == "JSON":
        outputs = {**{LABEL: output_data}, **json.loads(input_data)}
        write_to_file(str(outputs), "log")
        return {str(i).zfill(20): outputs[d] for i, d in enumerate(outputs)}
    else:
        raise ValueError(f"encoding type {input_enc_type} is not supported")

Now that we have reviewed how the preprocessing mechanism is implemented, we upload the script to the Amazon S3 location using the following code:

preprocessor_filename = 'preprocessor.py'
local_path_preprocessor = f"src/{preprocessor_filename}"
s3_record_preprocessor_uri = f's3://{bucket}/{prefix}/code'
 
if is_upload_preprocess_script:
    S3Uploader.upload(
        local_path=local_path_preprocessor,
        desired_s3_uri=s3_record_preprocessor_uri)
else: print("skip")

We can now move on to the next step: creating a monitor schedule.

Create a Model Monitor schedule (data quality only)

Continuous model monitoring involves scheduled analysis of incoming inference records and the creation of metrics relative to baseline metrics. The SageMaker SDK simplifies generating a set of constraints and summary statistics that describes the constraints as a reference. We upload the validation dataset with a column header and ground truth label to Amazon S3, which was used for offline training as a suitable baseline dataset. Decisions around whether to include a ground truth label in the baseline dataset depend on your use case and preference, because a data quality monitor certainly works without ground truth label data. Note that if you exclude ground truth here, you need to exclude inferences from monitoring similarly.

validation_filename = 'validation-dataset-with-header.csv'
local_validation_data_path = f"data/{validation_filename}"
s3_validation_data_uri = f's3://{bucket}/{prefix}/baselining'
 
if is_upload_validation_data:
    S3Uploader.upload(
        local_path=local_validation_data_path,
        desired_s3_uri=s3_validation_data_uri
    )
else: print("skip")

After confirming that the baseline dataset is uploaded to Amazon S3, we create baseline constraints, statistics, and a Model Monitor schedule for the deployed endpoint in one step using a custom wrapper class, DemoDataQualityModelMonitor. Under the hood, the DefaultModelMonitor.suggest_baseline method initiates a processing job with a managed Model Monitor container with Apache Spark and the AWS Deequ library to generate the constraints and statistics as a baseline. After the baselining job is complete, the DefaultModelMonitor.create_monitoring_schedule method creates a monitor schedule.
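
For reference, the following sketch shows roughly what the wrapper does with the plain SageMaker SDK. The instance sizes, output locations, and cron expression here are illustrative assumptions; the repo may use different values.

from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600)

# Baselining job: computes statistics.json and constraints.json from the validation dataset.
monitor.suggest_baseline(
    baseline_dataset=f"{s3_validation_data_uri}/{validation_filename}",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=s3_baseline_results_uri,  # assumed output location
    wait=True)

# Hourly data quality schedule that applies the custom preprocessor to captured records.
monitor.create_monitoring_schedule(
    monitor_schedule_name="demo-data-quality-schedule",
    endpoint_input=current_endpoint_name,
    record_preprocessor_script=f"{s3_record_preprocessor_uri}/{preprocessor_filename}",
    output_s3_uri=s3_monitoring_reports_uri,  # assumed output location
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True)

The notebook wraps these calls, along with tags, KMS, and network options, in the DemoDataQualityModelMonitor helper, which is used as follows: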

demo_mon = DemoDataQualityModelMonitor(
    endpoint_name=current_endpoint_name, 
    bucket=bucket,
    projectfolder_prefix=prefix,
    training_dataset_path=f'{s3_validation_data_uri}/{validation_filename}',
    record_preprocessor_script=f'{s3_record_preprocessor_uri}/{preprocessor_filename}',
    post_analytics_processor_script=None,
    kms_key=None,
    subnets=None,
    security_group_ids=None,
    role=role,
    tags=tags)
 
my_monitor = demo_mon.create_data_quality_monitor()

After monitor schedule creation is complete, we can move on to the final step, which is functional testing of the implemented filter with artificial payloads.

Test scenarios

We can test the following two scenarios to confirm that the filtering is working as expected. The first scheduled monitor run isn’t initiated until at least an hour after you create the schedule, so you can either wait or manually start a monitoring job that uses the preprocessing script. We use the latter approach for convenience. Fortunately, a utility tool already exists for this purpose and is available in this GitHub repo. We also provide a wrapper method, ArtificialTraffic.generate_artificial_traffic. You can pass column names and predefined static methods to populate bogus inputs and monotonically increase transactionId each time the endpoint is invoked.
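
The wrapper class itself ships with the repo. Conceptually, each call builds a payload (optionally perturbing selected features using the functions named in config), increments a transaction counter, and invokes the endpoint with the test indicator in CustomAttributes. The following is a simplified sketch, not the actual implementation:

import json
import random
import boto3

class ArtificialTraffic:
    """Simplified sketch of the repo's traffic generator."""

    def __init__(self, endpointName):
        self.endpoint_name = endpointName
        self.transaction_id = 0
        self.runtime_client = boto3.client("sagemaker-runtime")

    def random_gaussian(self, mean, std):
        return random.gauss(mean, std)

    def random_int(self, low, high):
        return random.randint(low, high)

    def random_bit(self):
        return random.getrandbits(1)

    def generate_artificial_traffic(self, applicationName, testIndicator, payload, size, config):
        for _ in range(size):
            record = dict(payload)
            # Overwrite selected features with bogus values, as described by config.
            for item in config:
                func = getattr(self, item["function_name"])
                record[item["source"]] = str(func(*item["params"]))
            self.transaction_id += 1
            self.runtime_client.invoke_endpoint(
                EndpointName=self.endpoint_name,
                ContentType="application/json",
                Body=json.dumps(record),
                CustomAttributes=json.dumps({
                    "testIndicator": testIndicator,
                    "applicationName": applicationName,
                    "transactionId": str(self.transaction_id)}))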

First scenario

Our first test scenario includes the following steps:

  1. Send a record that we know won’t create any violations. To do this, you can use the generate_artificial_traffic method and set the config variable to an empty list. Also, set the testIndicator in the custom attributes to 'false' to indicate that it’s not a test record. This is illustrated in the following code:
artificial_traffic = ArtificialTraffic(
    endpointName=current_endpoint_name)

# Normal payload; it should not cause any violations
artificial_traffic.generate_artificial_traffic(
    applicationName="DEMO",
    testIndicator="false",
    payload=payload,
    size=1,
    config=[])
  2. Send another record that creates a violation. This time, we pass a set of dictionaries in the config variable to create bogus input features. We also set testIndicator to 'true' so this record is skipped in the analysis. The following code is provided:
sample_config = {'config': [
    {'source': 'Day Calls',
     'function_name': 'random_gaussian',
     'params': [100, 100]},
    {'source': 'Day Mins',
     'function_name': 'random_gaussian',
     'params': [100, 100]},
    {'source': 'Account Length',
     'function_name': 'random_int',
     'params': [0, 1000]},
    {'source': 'VMail Message',
     'function_name': 'random_int',
     'params': [0, 10000]},
    {'source': 'State_AK',
     'function_name': 'random_bit',
     'params': []}]}

## This would cause violations, but testIndicator is set to true, so the analysis is skipped and no violations are reported
artificial_traffic.generate_artificial_traffic(
    applicationName="DEMO",
    testIndicator="true",
    payload=payload,
    size=1,
    config=sample_config['config'])
  3. Manually start a monitor job using the run_model_monitor_job_processor method from the imported utility class, providing parameters such as the Amazon S3 locations for the baseline files, data capture, and the preprocessor script:
run_model_monitor_job_processor(
    region,
    'ml.m5.xlarge',
    role,
    data_capture_path_scenario_1,
    s3_statistics_uri,
    s3_constraints_uri,
    s3_reports_path+'/scenario_1',
    preprocessor_path=s3_record_preprocessor_uri)
  4. In the Model Monitor outputs, confirm that constraint_violations.json contains an empty violations list and that dataset.item_count in statistics.json is 1, instead of 2.

This confirms that Model Monitor has analyzed only the non-test record.
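
You can verify this on the Amazon S3 console or programmatically. The following sketch reads the two report files, assuming the monitoring job wrote them directly under the report prefix used above:

import json
from sagemaker.s3 import S3Downloader

report_prefix = s3_reports_path + "/scenario_1"
violations = json.loads(S3Downloader.read_file(f"{report_prefix}/constraint_violations.json"))
statistics = json.loads(S3Downloader.read_file(f"{report_prefix}/statistics.json"))

print("violations:", violations["violations"])             # expect an empty list
print("item_count:", statistics["dataset"]["item_count"])  # expect 1, not 2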

Second scenario

For our second test, complete the following steps:

  1. Send N records that we know will create violations, such as data_type_check and baseline_drift_check, and set the testIndicator in the custom attributes to "false". The following code illustrates this:
    artificial_traffic.generate_artificial_traffic(
        applicationName="DEMO", 
        testIndicator="false",
        payload=payload, 
        size=1000,
        config=sample_config['config'])

  2. In the Model Monitor outputs, confirm that constraint_violations.json shows more than one violation item and that dataset.item_count in statistics.json is greater than 1,000 (the extra record is a carry-over from the first test scenario).

This confirms that sending test records as inference records creates false positive violations if testIndicator isn’t set correctly.

Clean up

We can now delete the Model Monitor schedule and endpoint we created earlier. Before doing so, you can wait until the first scheduled monitoring run starts; the result should be similar to what we confirmed from testing. You can also experiment with other test scenarios. When you’re done, run the following code to delete the monitoring schedule and endpoint:

my_monitor.delete_monitoring_schedule()
sm.delete_endpoint(EndpointName=current_endpoint_name)

Don’t forget to shut down resources by stopping running instances and apps to avoid incurring charges from SageMaker.

Conclusion

Model Monitor is a powerful tool that lets organizations quickly adopt a continuous model monitoring strategy for ML. This post discussed how you can use a preprocessing mechanism to design a filter for inference records based on sets of business criteria, ensuring that your testing infrastructure doesn’t pollute production data. The notebook included in this post provides an example of a custom preprocessor script that you can quickly extend for different use cases.

To get started with Amazon SageMaker Model Monitor, check out the following resources:


About the Authors

Kenny Sato is a Data and Machine Learning Engineer at AWS Professional Services, guiding customers on architecting and implementing machine learning solutions. He received his master’s in Computer Engineering from Virginia Tech. In his spare time, you can find him in his backyard, or out somewhere playing with his lovely daughters.

Hemanth Boinpally is a Machine Learning Engineer at AWS Professional Services, guiding customers on building and architecting AI/ML solutions. He received his bachelor’s and master’s in Computer Science. In his spare time, you can find him listening to podcasts or playing sports.

David Nigenda is a Senior Software Development Engineer on the Amazon SageMaker team, currently working on improving production machine learning workflows, as well as launching new inference features. In his spare time, he tries to keep up with his kids.


Automatically detect sports highlights in video with Amazon SageMaker

Extracting highlights from a video is a time-consuming and complex process. In this post, we provide a new take on instant replay for sporting events using a machine learning (ML) solution for automatically creating video highlights from original video content. Video highlights are then available for download so that users can continue to view them via a web app.

We use Amazon SageMaker to analyze a full-length sports video (in our case, a soccer match) and tag segments of the original video that are highlights (penalty kicks). We also show how to apply our end-to-end architecture to not only other sports, but other types of videos, given the availability of appropriate training data.

Architecture overview

The following diagram depicts our solution architecture.

Orchestration overview

We use an AWS Step Functions workflow to orchestrate a series of AWS Lambda functions, one for each step of the process. The workflow includes the following steps:

  1. The workflow starts a MediaConvert job that breaks the video into individual frames.
  2. When the MediaConvert job is complete, a Lambda function converts each frame into a feature vector by passing the image through a pretrained model (Inception V3).
  3. The feature vectors are sent as records through Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose, and are stored in Amazon S3.
  4. An AWS Fargate task loops through the sequences of feature vectors, prepares them for inference, and invokes a SageMaker endpoint to determine which actions defined in the UCF101 labels appear in each segment, and therefore whether the segment is interesting enough to pick up. The results are collated in an Amazon DynamoDB table.
  5. When the Fargate task is complete, it places a message in an Amazon SQS queue.
  6. A Lambda function periodically polls the SQS queue. When a completion message is detected, the function triggers a MediaConvert job to prepare highlight segments based on the inference results.
  7. Finally, an email containing links to the highlight clips is sent to the address specified by the user.
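
For reference, the completion check in step 6 can be sketched as a short Lambda handler like the following. This is a simplified illustration; the queue URL is a placeholder, and in the actual solution the detected completion leads to the MediaConvert clipping function shown at the end of this post.

import boto3

sqs = boto3.client("sqs")

# Placeholder for the queue that the Fargate task writes its completion message to.
QUEUE_URL = "https://sqs.<REGION>.amazonaws.com/<ACCOUNT-ID>/<COMPLETION-QUEUE>"

def lambda_handler(event, context):
    # Look for a completion message published by the Fargate inference driver.
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=5)
    messages = response.get("Messages", [])
    if not messages:
        # Nothing yet; the workflow can wait and poll again.
        return {"status": "IN_PROGRESS"}

    # The inference job finished: remove the message and let the workflow
    # move on to the MediaConvert clipping step.
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=messages[0]["ReceiptHandle"])
    return {"status": "COMPLETE"}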

Methodology

We use deep learning techniques to identify an activity in a given video. We use a deep Convolutional Neural Network (CNN) based on a pretrained Inception V3 model to generate features from the images extracted from the video, and a Long Short-Term Memory (LSTM) network to predict actions from sequences of those features. Both CNNs and LSTMs are types of neural networks used in ML-based computer vision solutions. Let’s briefly discuss neural networks and related terms before we jump into 2D CNNs and LSTMs.

Neural networks

Neural networks are computer systems vaguely inspired by the biological neural networks that constitute animal brains. Just as the basic unit of the brain is the neuron, the building block of an artificial neural network is the perceptron. Perceptrons perform very simple processing and are connected into a large mesh to form a neural network. Neural networks are organized into layers, and the connections between the layers are weighted. A neural network isn’t an algorithm; it’s a framework that many different ML algorithms can use. We describe the different layers in a CNN later in this post, when we build a model for extracting features from the images extracted from videos.

A neural network is trained with supervised learning, which means the model gets better as it sees more labeled examples; more training samples generally result in better accuracy.

Let’s break down the terms behind deep CNNs and understand why this technique is effective for image recognition. Together with LSTMs, we use this technique later for activity identification in a given video.

Deep Convolutional Neural Networks

Deep CNNs such as Inception V3 and YOLOv5 have proven to be very effective for image recognition and other downstream fine-tuning tasks. More recently, vision transformer-based models are also being used for state-of-the-art image classification, object detection, and segmentation. Image recognition has many applications: something that started as a technique to improve the accuracy of recognizing handwritten digits has evolved to solve more complex problems, such as identifying and labeling specific objects and backgrounds in an image.

Although deep CNNs have made image classification and identification much simpler and more accurate, implementing an end-to-end solution from scratch is not a simple task. We recommend using services such as Amazon Rekognition, which provides a state-of-the-art API-based solution for image classification and image recognition. If a custom model is required to solve a computer vision problem for either images or video, SageMaker provides a framework for training and inference. SageMaker supports multiple ML frameworks through BYOC (bring your own container). We use BYOC in SageMaker with Keras to develop a model for activity recognition and deploy the model for inference.

The convolution technique makes the output of neural networks more robust and accurate because, instead of processing the entire image as a single tile of pixels, it breaks the image into multiple tiles using a sliding window of fixed size. Each tile activates the next layer separately, and all tiles of an image are aggregated in successive layers to generate an output. This allows, for example, the digit 8 in the left corner of an image to be recognized as the same digit 8 in the right corner of the image. This is called translation invariance.
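
To make the sliding-window idea concrete, the following small NumPy sketch convolves a single-channel image with a 3×3 filter. A deep CNN stacks many such filters and learns their weights during training.

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and compute one activation per tile."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            tile = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            output[i, j] = np.sum(tile * kernel)
    return output

image = np.random.rand(28, 28)           # toy single-channel image
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])     # simple vertical-edge detector
feature_map = convolve2d(image, edge_filter)
print(feature_map.shape)                 # (26, 26)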

LSTM

LSTM networks are a type of Recurrent Neural Network (RNN) that contains special cells or neurons that allow information to persist, due to the existence of loops or special memory units. LSTMs in particular are useful in learning tasks involving time sequences of data (like our use case of video classification, which is a time sequence of static frames), especially when there is a need to remember information for long periods of time.

Challenges with video processing

It’s important to keep in mind that videos are like a flip book. The static image on each page when flipped generates the perception of motion. The faster you flip, the better the quality of motion perception you get.

Images are stored as a stream of pixels in a 2D spatial arrangement. This is how a computer program reads images. As an extension of images, videos have an extra dimension of time. Videos are a time series of static images. This makes videos a 3D spatial and temporal arrangement of pixels.

The extra dimension requires more compute and memory to develop an ML model. A lot of preprocessing is required before we can feed video input into CNNs and LSTM.

Apart from increased complexity in preprocessing and processing, there is also the lack of open datasets available for research on video data.

In this post, we use samples provided in the UCF101 dataset for building a model and deploying an endpoint for inference.

Reading a video and extracting frames

Assume that the video source that we’re analyzing in order to extract highlights is in Amazon Simple Storage Service (Amazon S3). We use AWS Elemental MediaConvert to split the video into individual frames, and this MediaConvert job is triggered from the following Lambda function:

import json
import boto3

s3_location = 's3://<ARTIFACT-BUCKET>/BYUfootballmatch.mp4'


def lambda_handler(event, context):
    # Load the MediaConvert job template packaged with the function
    with open('mediaconvert.json') as f:
        data = json.load(f)

    # Resolve the account-specific MediaConvert endpoint
    client = boto3.client('mediaconvert')
    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']
    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)

    # Point the job at the source video and submit it
    data['Settings']['Inputs'][0]['FileInput'] = s3_location
    response = myclient.create_job(
        Queue=data['Queue'],
        Role=data['Role'],
        Settings=data['Settings'])

The function uses the AWS SDK for Python (Boto3) to create the MediaConvert client and submits a job based on the following JSON template. You can specify codec settings, width, height, and other parameters specific to your video format.

1.	{  
2.	  "Queue": "arn:aws:mediaconvert:<REGION>:<AWS ACCOUNT NUMBER>:queues/Default",  
3.	  "UserMetadata": {},  
4.	  "Role": "arn:aws:iam::<AWS ACCOUNT ID>:role/MediaConvertRole",  
5.	  "Settings": {  
6.	    "OutputGroups": [  
7.	      {  
8.	        "CustomName": "MP4",  
9.	        "Name": "File Group",  
10.	        "Outputs": [  
11.	          {  
12.	            "ContainerSettings": {  
13.	              "Container": "MP4",  
14.	              "Mp4Settings": {  
15.	                "CslgAtom": "INCLUDE",  
16.	                "FreeSpaceBox": "EXCLUDE",  
17.	                "MoovPlacement": "PROGRESSIVE_DOWNLOAD"  
18.	              }  
19.	            },  
20.	            "VideoDescription": {  
21.	              "Width": 1280,  
22.	              "ScalingBehavior": "DEFAULT",  
23.	              "Height": 720,  
24.	              "TimecodeInsertion": "DISABLED",  
25.	              "AntiAlias": "ENABLED",  
26.	              "Sharpness": 50,  
27.	              "CodecSettings": {  
28.	                "Codec": "H_264",  
29.	                "H264Settings": {  
30.	                  "InterlaceMode": "PROGRESSIVE",  
31.	                  "NumberReferenceFrames": 3,  
32.	                  "Syntax": "DEFAULT",  
33.	                  "Softness": 0,  
34.	                  "GopClosedCadence": 1,  
35.	                  "GopSize": 90,  
36.	                  "Slices": 1,  
37.	                  "GopBReference": "DISABLED",  
38.	                  "SlowPal": "DISABLED",  
39.	                  "SpatialAdaptiveQuantization": "ENABLED",  
40.	                  "TemporalAdaptiveQuantization": "ENABLED",  
41.	                  "FlickerAdaptiveQuantization": "DISABLED",  
42.	                  "EntropyEncoding": "CABAC",  
43.	                  "Bitrate": 3000000,  
44.	                  "FramerateControl": "INITIALIZE_FROM_SOURCE",  
45.	                  "RateControlMode": "CBR",  
46.	                  "CodecProfile": "MAIN",  
47.	                  "Telecine": "NONE",  
48.	                  "MinIInterval": 0,  
49.	                  "AdaptiveQuantization": "HIGH",  
50.	                  "CodecLevel": "AUTO",  
51.	                  "FieldEncoding": "PAFF",  
52.	                  "SceneChangeDetect": "ENABLED",  
53.	                  "QualityTuningLevel": "SINGLE_PASS",  
54.	                  "FramerateConversionAlgorithm": "DUPLICATE_DROP",  
55.	                  "UnregisteredSeiTimecode": "DISABLED",  
56.	                  "GopSizeUnits": "FRAMES",  
57.	                  "ParControl": "INITIALIZE_FROM_SOURCE",  
58.	                  "NumberBFramesBetweenReferenceFrames": 2,  
59.	                  "RepeatPps": "DISABLED"  
60.	                }  
61.	              },  
62.	              "AfdSignaling": "NONE",  
63.	              "DropFrameTimecode": "ENABLED",  
64.	              "RespondToAfd": "NONE",  
65.	              "ColorMetadata": "INSERT"  
66.	            },  
67.	            "AudioDescriptions": [  
68.	              {  
69.	                "AudioTypeControl": "FOLLOW_INPUT",  
70.	                "CodecSettings": {  
71.	                  "Codec": "AAC",  
72.	                  "AacSettings": {  
73.	                    "AudioDescriptionBroadcasterMix": "NORMAL",  
74.	                    "Bitrate": 96000,  
75.	                    "RateControlMode": "CBR",  
76.	                    "CodecProfile": "LC",  
77.	                    "CodingMode": "CODING_MODE_2_0",  
78.	                    "RawFormat": "NONE",  
79.	                    "SampleRate": 48000,  
80.	                    "Specification": "MPEG4"  
81.	                  }  
82.	                },  
83.	                "LanguageCodeControl": "FOLLOW_INPUT"  
84.	              }  
85.	            ]  
86.	          }  
87.	        ],  
88.	        "OutputGroupSettings": {  
89.	          "Type": "FILE_GROUP_SETTINGS",  
90.	          "FileGroupSettings": {  
91.	            "Destination": "s3://<ARTIFACT-BUCKET>/MP4/"  
92.	          }  
93.	        }  
94.	      },  
95.	      {  
96.	        "CustomName": "Thumbnails",  
97.	        "Name": "File Group",  
98.	        "Outputs": [  
99.	          {  
100.	            "ContainerSettings": {  
101.	              "Container": "RAW"  
102.	            },  
103.	            "VideoDescription": {  
104.	              "Width": 768,  
105.	              "ScalingBehavior": "DEFAULT",  
106.	              "Height": 576,  
107.	              "TimecodeInsertion": "DISABLED",  
108.	              "AntiAlias": "ENABLED",  
109.	              "Sharpness": 50,  
110.	              "CodecSettings": {  
111.	                "Codec": "FRAME_CAPTURE",  
112.	                "FrameCaptureSettings": {  
113.	                  "FramerateNumerator": 20,  
114.	                  "FramerateDenominator": 1,  
115.	                  "MaxCaptures": 10000000,  
116.	                  "Quality": 100  
117.	                }  
118.	              },  
119.	              "AfdSignaling": "NONE",  
120.	              "DropFrameTimecode": "ENABLED",  
121.	              "RespondToAfd": "NONE",  
122.	              "ColorMetadata": "INSERT"  
123.	            }  
124.	          }  
125.	        ],  
126.	        "OutputGroupSettings": {  
127.	          "Type": "FILE_GROUP_SETTINGS",  
128.	          "FileGroupSettings": {  
129.	            "Destination": "s3://<ARTIFACT-BUCKET>/Thumbnails/"  
130.	          }  
131.	        }  
132.	      }  
133.	    ],  
134.	    "AdAvailOffset": 0,  
135.	    "Inputs": [  
136.	      {  
137.	        "AudioSelectors": {  
138.	          "Audio Selector 1": {  
139.	            "Offset": 0,  
140.	            "DefaultSelection": "DEFAULT",  
141.	            "ProgramSelection": 1  
142.	          }  
143.	        },  
144.	        "VideoSelector": {  
145.	          "ColorSpace": "FOLLOW"  
146.	        },  
147.	        "FilterEnable": "AUTO",  
148.	        "PsiControl": "USE_PSI",  
149.	        "FilterStrength": 0,  
150.	        "DeblockFilter": "DISABLED",  
151.	        "DenoiseFilter": "DISABLED",  
152.	        "TimecodeSource": "EMBEDDED",  
153.	        "FileInput": "s3:// <ARTIFACT-BUCKET>/BYUfootballmatch.mp4"  
154.	      }  
155.	    ]  
156.	  }  
157.	}  

While the MediaConvert job is running, another Lambda function checks for job completion. This function is written as follows:

import json
import boto3
import pprint

def lambda_handler(event, context):
    # Resolve the account-specific MediaConvert endpoint
    client = boto3.client('mediaconvert')
    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']
    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)

    # Check the status of the most recently submitted job
    response = myclient.list_jobs(
        MaxResults=1,
        Order='DESCENDING')

    # Status is one of 'SUBMITTED'|'PROGRESSING'|'COMPLETE'|'CANCELED'|'ERROR'
    print(len(response['Jobs']))
    Status = response['Jobs'][0]['Status']
    Id = response['Jobs'][0]['Id']
    print(Id, Status)
    # Return the status so the workflow can branch on it
    return {'JobId': Id, 'Status': Status}

Collect feature vectors for training

Each frame is passed through a pretrained InceptionV3 model to extract features. The model is small enough to be packaged within the Lambda function along with the ML framework used to train it (MXNet). We don’t describe the image classification model training here, but the overall procedure is as follows:

  1. Train the InceptionV3 network on MXNet using ILSVRC 2012 data. For details about the network architecture and the dataset, see Rethinking the Inception Architecture for Computer Vision.
  2. Load the trained model into a Lambda function (model.json and model.params files) and pop the final layer. We’re left with an output layer that doesn’t perform classification, but instead provides a feature vector of size 2048×1.
  3. Each time a frame is passed through the Lambda function, it outputs this feature vector into a data stream, via Amazon Kinesis Data Streams.
  4. Topics from this stream are collected using Amazon Kinesis Data Firehose and output into another S3 bucket.
  5. An AWS Fargate job reorders the files to match the original order in which the frames appear, because we trigger 1,000 instances of the Lambda function in parallel (one frame per invocation) and the outputs can arrive slightly out of order. (You can also use SageMaker Processing instead of Fargate; a sketch of this reordering appears after the feature extraction code below.) This gives us our final training data, which we use in our custom SageMaker video classification model to identify groups of frames as highlights. In our example, the highlights in the soccer video are penalty kicks.

The code for this Lambda function is as follows:

import logging
import boto3
import json
import numpy as np
import tempfile

logger = logging.getLogger()
logger.setLevel(logging.INFO)
relevant_timestamps = []

import mxnet as mx


def load_model(s_fname, p_fname):
    """
    Load model checkpoint from file.
    :return: (arg_params, aux_params)
    arg_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's weights.
    aux_params : dict of str to NDArray
        Model parameter, dict of name to NDArray of net's auxiliary states.
    """
    symbol = mx.symbol.load(s_fname)
    save_dict = mx.nd.load(p_fname)
    arg_params = {}
    aux_params = {}
    for k, v in save_dict.items():
        tp, name = k.split(':', 1)
        if tp == 'arg':
            arg_params[name] = v
        if tp == 'aux':
            aux_params[name] = v
    return symbol, arg_params, aux_params

sym, arg_params, aux_params = load_model('model2.json', 'model2.params')

# We bind the module with the input shape and specify that it is only for predicting.
# The 1 added before the image shape (3x299x299) means that we predict one image at a time.

from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

def lambda_handler(event, context):
    # PARTIAL MODEL: truncate the network at the global pooling layer to get feature vectors
    all_layers = sym.get_internals()
    print(all_layers.list_outputs()[-10:])
    sym2 = all_layers['global_pool_output']
    mod2 = mx.mod.Module(symbol=sym2, label_names=None)
    mod2.bind(for_training=False, data_shapes=[('data', (1, 3, 299, 299))])
    mod2.set_params(arg_params, aux_params)

    # Get the frame image from S3
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(event['bucketname'])
    object = bucket.Object(event['filename'])

    tmp = tempfile.NamedTemporaryFile()
    with open(tmp.name, 'wb') as f:
        object.download_fileobj(f)
        img = mx.image.imread(tmp.name)
        # convert into format (batch, RGB, width, height)
        img = mx.image.imresize(img, 299, 299)  # resize
        img = img.transpose((2, 0, 1))  # channel first
        img = img.expand_dims(axis=0)  # batchify

        mod2.forward(Batch([img]))
    out = np.squeeze(mod2.get_outputs()[0].asnumpy())

    # Publish the feature vector to the Kinesis data stream
    kinesis_client = boto3.client('kinesis')
    put_response = kinesis_client.put_record(
        StreamName='bottleneck_stream',
        Data=json.dumps({'filename': event['filename'], 'features': out.tolist()}),
        PartitionKey="partitionkey")
    return 'Wrote features to kinesis stream'
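
As noted in step 5 earlier, the Fargate job restores frame order before training. A minimal sketch of that reordering is shown below. It assumes the Firehose delivery stream writes one JSON record per line and that each record carries the original frame filename (for example, Thumbnails/frame.0000123.jpg); the bucket and prefix are placeholders.

import json
import re
import boto3
import numpy as np

s3 = boto3.client("s3")
bucket = "<ARTIFACT-BUCKET>"
prefix = "features/"  # assumed Kinesis Data Firehose output prefix

def frame_index(filename):
    # Extract the numeric frame index from names like 'Thumbnails/frame.0000123.jpg'.
    return int(re.findall(r"(\d+)", filename)[-1])

records = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
        for line in body.strip().splitlines():
            record = json.loads(line)
            records.append((frame_index(record["filename"]), record["features"]))

records.sort(key=lambda r: r[0])
features = np.array([f for _, f in records])  # shape: (num_frames, 2048)
print(features.shape)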

Label the images for training the model

As mentioned earlier, we use the UCF101 action recognition dataset, which you can obtain from within a Jupyter notebook instance using the following command:

!wget http://crcv.ucf.edu/data/UCF101/UCF101.rar

We extract the same feature vectors from InceptionV3 for all of the action recognition samples contained in the downloaded .rar file (it contains several examples of 101 different actions, including ones relevant to our soccer example, such as the soccer penalty and soccer juggling labels).

We construct a custom LSTM model in TensorFlow and use features extracted in the previous step to train the model. The LSTM model is structured as follows:

  • Layer 1 – 2048 LSTM cells
  • Layer 2 – 512 Dense cells
  • Layer 3 – Drop out layer (p=0.5)
  • Layer 4 – Softmax layer for 101 classes

Model.summary() provides the following summary:

Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 2048)              33562624  
_________________________________________________________________
dense_1 (Dense)              (None, 512)               1049088   
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 101)               51813     
=================================================================
Total params: 34,663,525
Trainable params: 34,663,525
Non-trainable params: 0
___________________________
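
To make the layer list concrete, the following Keras sketch reproduces the parameter counts above. The 2048-dimensional feature size follows from the LSTM parameter count; the 40-frame sequence length is an assumption.

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

SEQ_LENGTH = 40      # assumed number of frames per training sequence
FEATURE_SIZE = 2048  # size of the InceptionV3 global-pool feature vector
NUM_CLASSES = 101    # UCF101 action classes

model = Sequential()
model.add(LSTM(2048, return_sequences=False, input_shape=(SEQ_LENGTH, FEATURE_SIZE), dropout=0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(NUM_CLASSES, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()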

With a more relevant dataset, you should only include the classes you require in the classification task. Due to the nature of the problem selected, we only had access to this open dataset. However, for extracting highlights from custom videos, you can use your own labeled video datasets.

We save the model using the following code in the notebook:

trainedmodel.save('lstm_model.h5')

We create a container, containing the following code and a Dockerfile, to host the model on SageMaker. The following is the Python train entry point; because the LSTM was already trained in the notebook, it simply loads the saved model:

#!/usr/bin/env python
from __future__ import print_function

import os
import sys
import traceback

import numpy as np
import pandas as pd
import tensorflow as tf
from keras.layers import Dropout, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.models import load_model

output_path = '/opt/ml/output'  # standard SageMaker output directory inside the training container

def train():
    print('Starting the training.')
    try:
        model = load_model('lstm_model.h5')
        print('Model is loaded ... Training is complete.')
    except Exception as e:
        # Write out an error file. This will be returned as the failure Reason
        # in the DescribeTrainingJob result.
        trc = traceback.format_exc()
        with open(os.path.join(output_path, 'failure'), 'w') as s:
            s.write('Exception during training: ' + str(e) + '\n' + trc)
        # Printing this causes the exception to be in the training job logs.
        print('Exception during training: ' + str(e) + '\n' + trc, file=sys.stderr)
        # A non-zero exit code causes the training job to be marked as Failed.
        sys.exit(255)

if __name__ == '__main__':
    train()
    # A zero exit code causes the job to be marked as Succeeded.
    sys.exit(0)
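
The container also needs a serve program that answers the /ping and /invocations routes SageMaker expects; the post doesn't list it, but a minimal Flask sketch, assuming the saved lstm_model.h5 is baked into the image and requests arrive as CSV feature rows, could look like the following:

#!/usr/bin/env python
import flask
import numpy as np
from keras.models import load_model

app = flask.Flask(__name__)
model = load_model('lstm_model.h5')
FEATURE_SIZE = 2048  # assumed size of each frame's feature vector

@app.route('/ping', methods=['GET'])
def ping():
    # Health check: the model loaded at startup, so report healthy.
    return flask.Response(response='\n', status=200, mimetype='application/json')

@app.route('/invocations', methods=['POST'])
def invocations():
    # Expect a CSV payload with one frame's feature vector per row.
    payload = flask.request.data.decode('utf-8')
    frames = np.array([[float(x) for x in row.split(',')] for row in payload.strip().splitlines()])
    sequence = frames.reshape(1, frames.shape[0], FEATURE_SIZE)
    probabilities = model.predict(sequence)[0]
    # Return the class probabilities as a single CSV row.
    return flask.Response(response=','.join(str(p) for p in probabilities), status=200, mimetype='text/csv')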

We containerize and push the Docker image to Amazon Elastic Container Registry (Amazon ECR):

%%sh

# The name of our algorithm
algorithm_name=kerassample14

cd container

chmod +x keras-model/train
chmod +x keras-model/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"
echo $fullname

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Lastly, we host the model on SageMaker:

account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = f'{account}.dkr.ecr.{region}.amazonaws.com/<containername>:latest'

classifier = sage.estimator.Estimator(
    image,
    role,
    1,
    'ml.c5.2xlarge',
    output_path="s3://{}/output".format(sess.default_bucket()),
    sagemaker_session=sess)


from sagemaker.predictor import csv_serializer
predictor = classifier.deploy(1, 'ml.m5.xlarge', serializer=csv_serializer)

SageMaker provides us with an endpoint to call for predictions, where the input is a set of feature vectors (for example, 10 frames corresponds to a 10×2048 feature matrix) and the output is a probability distribution across the 101 UCF101 classes. We’re only interested in the soccer penalty class. For the purposes of this post, we use the UCF101 dataset, but for your own use cases, take the time to research relevant action recognition datasets or pretrained models.
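
For example, a single prediction request can serialize a window of frames as CSV, one frame per row. The following sketch assumes a features array of shape (num_frames, 2048), a label list ucf101_labels, and a serving container that returns a CSV row of class probabilities (as in the sketch earlier):

# 'features' is assumed to be a (num_frames, 2048) NumPy array of frame feature vectors.
window = features[:10]  # ten consecutive frames

# csv_serializer turns the 2D array into CSV rows before calling the endpoint.
response = predictor.predict(window)
probabilities = [float(x) for x in response.decode('utf-8').split(',')]

soccer_penalty_prob = probabilities[ucf101_labels.index('SoccerPenalty')]  # assumed label list
print(f"Soccer penalty probability: {soccer_penalty_prob:.3f}")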

Extract highlights

In our architecture, the Fargate job calls the SageMaker endpoint sequentially with sets of feature vectors and stores the decision of whether to pick up each set of frames in an Amazon DynamoDB table. When the Fargate job is complete, another Lambda function uses the DynamoDB table to edit a clipping job definition and submits it as a MediaConvert job. This MediaConvert job splits the original video into smaller sections where the desired class of action was identified; in our case, penalty kicks. These extracted videos are then made accessible from outside the account using a Boto3 command within the same Lambda function.
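
The following sketch shows the Fargate driver loop conceptually. The endpoint name, window size, step, timestamps, and decision threshold are illustrative assumptions; the actual job in the repo also handles batching and error handling. The Lambda function that builds the clipping jobs from the DynamoDB results follows the sketch.

import boto3

runtime = boto3.client('sagemaker-runtime')
table = boto3.resource('dynamodb').Table('sports-lstm-final-output')

SOCCER_PENALTY_INDEX = ucf101_labels.index('SoccerPenalty')  # assumed label list
WINDOW = 40       # frames per inference window (assumed)
STEP = 20         # frames between windows (assumed)

def classify_window(window_features, start_frame):
    # Ask the endpoint whether this window looks like a penalty kick.
    response = runtime.invoke_endpoint(
        EndpointName='<LSTM-ENDPOINT>',  # placeholder
        ContentType='text/csv',
        Body='\n'.join(','.join(str(v) for v in frame) for frame in window_features))
    probabilities = [float(x) for x in response['Body'].read().decode('utf-8').split(',')]
    pickup = 'yes' if probabilities[SOCCER_PENALTY_INDEX] > 0.5 else 'no'
    table.put_item(Item={
        'timein': str(start_frame),
        'timeout': str(start_frame + WINDOW),
        'pickup': pickup})

for start in range(0, len(features) - WINDOW, STEP):
    classify_window(features[start:start + WINDOW], start)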

import json
import boto3
import time
from boto3.dynamodb.conditions import Key, Attr
import math

s3_location = 's3://<ARTIFACT-BUCKET>/BYUfootballmatch.mp4'

def start_mediaconvert_job(data, sec_in, sec_out):
    client = boto3.client('mediaconvert')
    endpoint = client.describe_endpoints()['Endpoints'][0]['Url']
    myclient = boto3.client('mediaconvert', endpoint_url=endpoint)

    data['Settings']['Inputs'][0]['FileInput'] = s3_location

    starttime = time.strftime('%H:%M:%S:00', time.gmtime(sec_in))
    endtime = time.strftime('%H:%M:%S:00', time.gmtime(sec_out))

    data['Settings']['Inputs'][0]['InputClippings'][0] = {'EndTimecode': endtime, 'StartTimecode': starttime}
    data['Settings']['OutputGroups'][0]['Outputs'][0]['NameModifier'] = '-from-' + str(sec_in) + '-to-' + str(sec_out)

    response = myclient.create_job(
        Queue=data['Queue'],
        Role=data['Role'],
        Settings=data['Settings'])

def lambda_handler(event, context):
    # Collect the segments that the model flagged for pickup
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('sports-lstm-final-output')
    response = table.scan()
    timeins = []
    timeouts = []
    for i in response['Items']:
        if i['pickup'] == 'yes':
            timeins.append(i['timein'])
            timeouts.append(i['timeout'])

    timeins = sorted([int(x) for x in timeins])
    timeouts = sorted([int(x) for x in timeouts])
    mintime = min(timeins)
    maxtime = max(timeouts)

    print('mintime=' + str(mintime))
    print('maxtime=' + str(maxtime))
    print(timeins)
    print(timeouts)
    mystarttime = mintime

    # Find continuous ranges of flagged frames
    ranges = {}
    maxisofar = 0
    rangecount = 0
    lastnum = timeouts[0]
    for i in range(len(timeins) - 1):
        c = 0
        if timeouts[i] >= lastnum:
            for j in range(i, len(timeouts) - 1):
                if timeins[j + 1] - timeins[j] == 20 and timeouts[j] - timeins[i] == 40 + c * 20:
                    c = c + 1
                    continue
                if timeins[i + 1] - timeins[i] > 20 and timeouts[j] - timeins[i] == 40:
                    print('single frame', i, j)
                    ranges[rangecount] = {'start': timeins[i], 'end': timeouts[i], 'count': 1}
                    rangecount = rangecount + 1
                    lastnum = timeouts[i + 1]
                    continue

            if c > 0:
                ranges[rangecount] = {'start': timeins[i], 'end': timeouts[i + c], 'count': c}
                rangecount = rangecount + 1
                lastnum = timeouts[i + c + 1]

    print(lastnum)
    if lastnum == timeouts[-1]:
        # The last segment is a single frame
        ranges[rangecount] = {'start': timeins[-1], 'end': timeouts[-1], 'count': 1}

    print(ranges)

    # Find the longest continuous range
    maxc = 0
    maxi = 0
    for i in range(len(ranges)):
        if maxc < ranges[i]['count']:
            maxi = i
            maxc = ranges[i]['count']
    buffer = 1  # seconds

    with open('mediaconvert.json') as f:
        data = json.load(f)

    # Submit a clipping job for every range
    for i in range(len(ranges)):
        if ranges[i]['count']:
            sec_in = math.floor(ranges[i]['start'] / 20.0) - buffer
            sec_out = math.ceil(ranges[i]['end'] / 20.0) + buffer  # 20:1 was the frame rate in the original video
            sec_in = 0 if sec_in < 0 else sec_in
            start_mediaconvert_job(data, sec_in, sec_out)
            time.sleep(1)

    print(ranges)
    return json.dumps({'bucket': 'elemental-media-input', 'prefix': 'High', 'postfix': 'mp4'})

Deployment Prerequisites

To deploy this solution, you need to create an Amazon S3 bucket and designate it as the ARTIFACT-BUCKET. This bucket stores the video file as well as the model artifacts. You can run the following AWS CLI command to create the bucket:

aws s3api create-bucket --bucket <ARTIFACT-BUCKET>

Next, run the following command to copy the required artifacts to the artifact bucket:

aws s3 cp s3://aws-ml-blog/artifacts/sportshighlights/ s3://<ARTIFACT-BUCKET> --recursive --copy-props none

Deploy the solution using AWS CloudFormation

We provide an AWS CloudFormation template for creating resources and setting up the workflow for this post. AWS CloudFormation enables you to model, provision, and manage AWS resources by treating infrastructure as code.

The CloudFormation template requires you to provide an email address, which is used to send links to the highlight clips at the end of the workflow. The stack sets up the resources required for the workflow.

Follow these steps to deploy this in your own account:

  1. Choose Launch Stack:

  2. Enter a name for the stack.
  3. Enter an email address where you want to receive notifications from the Step Functions workflow.
  4. For S3Bucket, enter the name of the ARTIFACT-BUCKET that you created earlier.
  5. Choose Next.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
  7. Choose Create stack.

After the stack is successfully created, you will receive an email with a request to subscribe to the Amazon SNS topic. Choose Confirm subscription.

  8. On the AWS CloudFormation console, navigate to the Resources tab for the stack you created, and choose the hyperlink (Physical ID) for the MLStateMachine resource.

This takes you to the Step Functions console. Choose Start execution.

  9. Enter a name, keep the default input, and choose Start execution.

You can monitor the progress of the Step Functions execution in the Graph inspector.

Wait for the Step Functions workflow to complete.

After the Step Functions execution completes, you should receive an email with the Amazon S3 locations of the highlight clips generated from the original video. Following best practices, we don't expose these highlight clips publicly. You can navigate to the Amazon S3 bucket you created to find the clips in a folder named HighLightclips.
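
If you prefer to retrieve the clips programmatically, the following is a minimal boto3 sketch (not part of the solution) that lists and downloads the clips; it assumes the clips are stored under the HighLightclips/ prefix in the bucket described above:

import boto3

s3 = boto3.client('s3')
bucket = '<ARTIFACT-BUCKET>'   # replace with your bucket name
prefix = 'HighLightclips/'     # folder used for the generated clips

# List the generated highlight clips and download each one locally
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get('Contents', []):
    key = obj['Key']
    if key.endswith('.mp4'):
        local_name = key.split('/')[-1]
        print(f'Downloading {key} to {local_name}')
        s3.download_file(bucket, key, local_name)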

At the end of the process, the full-length input video produces an output highlight clip of a penalty kick.

Clean up

To avoid incurring ongoing charges, clean up your infrastructure by deleting the stack from the AWS CloudFormation console.

Empty and delete the artifact bucket that you created for this post. You can run the following AWS CLI commands:

aws s3 rm s3://<ARTIFACT-BUCKET> --recursive 
aws s3api delete-bucket --bucket <ARTIFACT-BUCKET>

Then navigate to AWS CloudFormation in the AWS Management Console, select the stack you created, and choose Delete.

Conclusion

In this post, we showed you how to use a custom SageMaker model to generate sports highlights from full-length sports videos. You can extend this solution to generate highlights containing slam dunks, touchdowns, home runs, or sixes from your favorite sports videos, or from other shows, movies, meetings, and any other content in video format, as long as you have a pretrained model or you train a model specific to your use case.

For more information about how to make preprocessing easier, check out Amazon SageMaker Processing.

For more information about how to fine-tune state-of-the-art action recognition models like PAN Resnet 101, TSM, and R2+1D BERT, or host them on SageMaker as endpoints, see Deploy a Model in Amazon SageMaker.


About the Authors

Shreyas Subramanian is an AI/ML specialist Solutions Architect who helps customers use machine learning to solve their business challenges on the AWS Cloud.

Mohit Mehta is a leader in the AWS Professional Services organization with expertise in AI/ML and big data technologies. Mohit holds an M.S. in Computer Science, all 12 AWS certifications, an MBA from the College of William & Mary, and a GMP from the Michigan Ross School of Business.

Vikrant Kahlir is a Principal Architect in the Solutions Architecture team. He works with AWS strategic customers' product and engineering teams to help them with technology solutions using AWS services for managed databases, AI/ML, HPC, autonomous computing, and IoT.


Monitor operational metrics for your Amazon Lex chatbot

Chatbots are increasingly becoming an important channel for companies to interact with their customers, employees, and partners. Amazon Lex allows you to build conversational interfaces into any application using voice and text. The Amazon Lex V2 console and APIs make it easier to build, deploy, and manage bots so that you can expedite building virtual agents, conversational IVR systems, self-service chatbots, or informational bots. Designing a bot and deploying it in production is only the beginning of the journey. You want to analyze the bot’s performance over time to gather insights that can help you adapt the bot to your customers’ needs. A deeper understanding of key metrics such as trending topics, top utterances, missed utterances, conversation flow patterns, and customer sentiment helps you enhance your bot to better engage with customers and improve their overall satisfaction. It then becomes crucial to have a conversational analytics dashboard to gain these insights from a single place.

In this post, we look at deploying an analytics dashboard solution for your Amazon Lex bot. The solution uses your Amazon Lex bot conversation logs to automatically generate metrics and visualizations. It creates an Amazon CloudWatch dashboard where you can track your chatbot performance, trends, and engagement insights.

Solution overview

The Amazon Lex V2 Analytics Dashboard Solution helps you monitor and visualize the performance and operational metrics of your Amazon Lex chatbot. You can use it to continuously analyze and improve the experience of end users interacting with your chatbot.

The solution includes the following features:

  • A common view of valuable chatbot insights, such as:
    • User and session activity (sentiment analysis, top-N sessions, text/speech modality)
    • Conversation statistics and aggregations (average session duration, messages per session, session heatmaps)
    • Conversation flow, trends, and history (intent path chart, intent per hour heatmaps)
    • Utterance history and performance (missed utterances, top-N utterances)
    • Most frequently used slot and session attribute values
  • Rich visualizations and widgets such as metrics charts, top-N lists, heatmaps, and utterance management
  • Serverless architecture using pay-per-use managed services that scale transparently
  • CloudWatch metrics that you can use to configure CloudWatch alarms

Architecture

The solution uses serverless AWS services and Amazon CloudWatch features.

The following diagram illustrates the solution architecture.

The source code of this solution is available in the GitHub repository.

Additional resources

Several earlier blog posts for Amazon Lex also explore monitoring and analytics dashboards.

This post was inspired by the concepts in those previous posts, but the current solution has been updated to work with Amazon Lex bots created from the V2 APIs. It also adds new capabilities such as CloudWatch custom widgets.

Enable conversation logs

Before you deploy the solution for your existing Amazon Lex bot (created using the V2 APIs), you should enable conversation logs. If your bot already has conversation logs enabled, you can skip this step.

We also provide the option to deploy the solution with an accompanying bot that has conversation logs enabled and a scheduled Lambda function to generate conversation logs. This is an alternative if you just want to test drive this solution without using an existing bot or configuring conversation logs yourself.
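
For illustration only, a scheduled traffic-generator Lambda function could look something like the following minimal boto3 sketch; this is not the solution's actual function, and the bot ID, alias ID, and sample utterances are placeholders:

import random
import boto3

lex_runtime = boto3.client('lexv2-runtime')

# Placeholder values -- replace with your own bot details
BOT_ID = '<BOT-ID>'
BOT_ALIAS_ID = '<BOT-ALIAS-ID>'
LOCALE_ID = 'en_US'
SAMPLE_UTTERANCES = ['hello', 'I want to book a hotel', 'what is my balance']

def lambda_handler(event, context):
    # Send a few sample utterances so the conversation logs have data to aggregate
    session_id = 'sample-session-' + str(random.randint(1, 5))
    for text in SAMPLE_UTTERANCES:
        lex_runtime.recognize_text(
            botId=BOT_ID,
            botAliasId=BOT_ALIAS_ID,
            localeId=LOCALE_ID,
            sessionId=session_id,
            text=text
        )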

We first create a log group.

  1. On the CloudWatch console, in the navigation pane, choose Log groups.
  2. Choose Actions, then choose Create log group.
  3. Enter a name for the log group, then choose Create log group.

Now we can enable the conversation logs.

  1. On the Amazon Lex V2 console, choose your bot from the list.
  2. On the left menu, choose Aliases.
  3. In the list of aliases, choose the alias for which you want to configure conversation logs.
  4. In the Conversation logs section, choose Manage conversation logs.
  5. For text logs, choose Enable.
  6. Enter the CloudWatch log group name that you created.
  7. Choose Save to start logging conversations.

If necessary, Amazon Lex updates your service role with permissions to access the log group.

The following screenshot shows the resulting conversation log configuration on the Amazon Lex console.
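
If you prefer to enable text conversation logs programmatically instead of using the console, a minimal boto3 sketch follows. It assumes the standard Lex V2 UpdateBotAlias parameters; because this call replaces the alias configuration, in practice you would first call describe_bot_alias and merge its current settings before updating. All identifiers are placeholders:

import boto3

lex = boto3.client('lexv2-models')

# Placeholder values -- replace with your own bot, alias, and log group details
BOT_ID = '<BOT-ID>'
BOT_ALIAS_ID = '<BOT-ALIAS-ID>'
BOT_ALIAS_NAME = '<BOT-ALIAS-NAME>'
LOG_GROUP_ARN = 'arn:aws:logs:<REGION>:<ACCOUNT-ID>:log-group:<LOG-GROUP-NAME>'

# Enable text conversation logs for the alias
lex.update_bot_alias(
    botId=BOT_ID,
    botAliasId=BOT_ALIAS_ID,
    botAliasName=BOT_ALIAS_NAME,
    conversationLogSettings={
        'textLogSettings': [{
            'enabled': True,
            'destination': {
                'cloudWatch': {
                    'cloudWatchLogGroupArn': LOG_GROUP_ARN,
                    'logPrefix': 'lex/'
                }
            }
        }]
    }
)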

Deploy the solution

You can easily install this solution in your AWS account by launching it from the AWS Serverless Application Repository. At a minimum, you provide your bot ID, bot locale ID, and the conversation log group name when you deploy the dashboard. To deploy the solution, complete the following steps:

  1. Choose Launch Stack:

You’re redirected to the create application page on the Lambda console (this is a Serverless solution!).

  2. Scroll down to the Application settings section and enter the parameters to point the dashboard to your existing bot:
    1. BotId – The ID of an existing Amazon Lex V2 bot that is going to be used with this dashboard. To get the ID of your bot, find your bot on the Amazon Lex console and look for the ID in the Bot details section.
    2. BotLocaleId – The bot locale ID associated with the bot ID for this dashboard, which defaults to en_US. To get the locales configured for your bot, choose View languages on the same page where you found the bot ID. Each dashboard creates metrics for a specific locale ID of an Amazon Lex bot. For more details on supported languages, see Supported languages and locales.
    3. LexConversationLogGroupName – The name of an existing CloudWatch log group containing the Amazon Lex conversation logs. The bot ID and locale must be configured to use this log group for its conversation logs.

Alternatively, if you just want to test drive the dashboard, this solution can deploy a fully functional sample bot. The sample bot comes with a Lambda function that is invoked every 2 minutes to generate conversation traffic. If you want to deploy the dashboard with the sample bot instead of using an existing bot, set the ShouldDeploySampleBots parameter to true. This is a quick and easy way to test the solution.

  3. After you set the desired values in the Application settings section, scroll down to the bottom of the page and select I acknowledge that this app creates custom IAM roles, resource policies and deploys nested applications.
  4. Choose Deploy to create the dashboard.

You’re redirected to the application overview page (it may take a moment).

  5. Choose the Deployments tab to watch the deployment status.
  6. Choose View stack events to go to the AWS CloudFormation console to see the deployment details.

The stack may take around 5 minutes to create. Wait until the stack status is CREATE_COMPLETE.

  7. When the stack creation is complete, you can look for a direct link to your dashboard on the Outputs tab of the stack (the DashboardConsoleLink output variable).

You may need to wait a few minutes for data to be reflected in the dashboard.

Use the solution

The dashboard provides a single pane of glass that allows you to monitor the performance of your Amazon Lex bot. The solution is built using CloudWatch features that are intended for monitoring and operational management purposes.

The dashboard displays widgets showing bot activity over a selectable time range. You can use the widgets to visualize trends, confirm that your bot is performing as expected, and optimize your bot configuration. For general information about using CloudWatch dashboards, see Using Amazon CloudWatch dashboards.

The dashboard contains widgets with metrics about end user interactions with your bot covering statistics of sessions, messages, sentiment, and intents. These statistics are useful to monitor activity and identify engagement trends. Your bot must have sentiment analysis enabled if you want to see sentiment metrics.

Additionally, the dashboard contains metrics for missed utterances (phrases that didn’t match the configured intents or slot values). You can expand entries in the Missed Utterance History widget to look for details of the state of the bot at the point of the missed utterance so that you can fine-tune your bot configuration. For example, you can look at the session attributes, context, and session ID of a missed utterance to better understand the related application state.

You can use the dashboard to monitor session duration, messages per session, and top session contributors.

You can track conversations with a widget listing the top messages (utterances) sent to your bot and a table containing a history of messages. You can expand each message in the Message History section to look at conversation state details when the message was sent.

You can visualize utilization with heatmap widgets that aggregate sessions and intents by day or time. You can hover your pointer over blocks to see the aggregation values.

You can look at a chart containing conversation paths aggregated by sessions. The thickness of the connecting path lines is proportional to the usage. Grey path lines show forward flows and red path lines show flows returning to a previously hit intent in the same session. You can hover your pointer over the end blocks to see the aggregated counts. The conversation path chart is useful to visualize the most common paths taken by your end users and to uncover unexpected flows.

The dashboard shows tables that aggregate the top slot and session attribute values. The session attributes and slots are dynamically extracted from the conversation logs. These widgets can be configured to exclude specific session attributes and slots by modifying the widget parameters. These tables are useful to identify the top values provided to the bot in slots (data inputs) and to track the top custom application information kept in session attributes.

You can add missed utterances to intents of the Draft version of your bot with the Add Missed Utterances widget. For more information about bot versions, see Creating versions. This widget is optionally added to the dashboard if you set the ShouldAddWriteWidgets parameter to true when you deploy the solution.

CloudWatch features

This section describes the CloudWatch features used to create the dashboard widgets.

Custom metrics

The dashboard includes custom CloudWatch metrics that are created using metric filters that extract data from the bot conversation logs. These custom metrics track your bot activity, including number of messages and missed utterances.

The metrics are collected under a custom namespace based on the bot ID and locale ID. The namespace is named Lex/Activity/<BotId>/<LocaleId>, where <BotId> and <LocaleId> are the bot and locale IDs that you passed when creating the stack. To see these metrics on the CloudWatch console, navigate to the Metrics section and look for the namespace under Custom Namespaces.

Additionally, the metrics are categorized using dimensions based on bot characteristics, such as bot alias, bot version, and intent names. These dimensions are dynamically extracted from conversation logs so they automatically create metrics subcategories as your bot configuration changes over time.

You can use various CloudWatch capabilities with these custom metrics, including alarms and anomaly detection.
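
As a rough illustration of working with these metrics from code, the following boto3 sketch lists the metrics in the dashboard's namespace and creates an alarm on one of them. The metric name MissedUtteranceCount is a hypothetical example; use the Metrics console (or the list output) to find the actual names published by the metric filters:

import boto3

cloudwatch = boto3.client('cloudwatch')

BOT_ID = '<BOT-ID>'     # placeholder values
LOCALE_ID = 'en_US'
namespace = f'Lex/Activity/{BOT_ID}/{LOCALE_ID}'

# List the custom metrics published under the dashboard's namespace
for metric in cloudwatch.list_metrics(Namespace=namespace)['Metrics']:
    print(metric['MetricName'], metric['Dimensions'])

# Create an alarm on one of the metrics (metric name is a hypothetical example)
cloudwatch.put_metric_alarm(
    AlarmName='lex-missed-utterances',
    Namespace=namespace,
    MetricName='MissedUtteranceCount',  # replace with an actual metric name from the list above
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=10,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='notBreaching'
)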

Contributor Insights

Similar to the custom metrics, the solution creates CloudWatch Contributor Insights rules to track the unique contributors of highly variable data such as utterances and session IDs. The widgets in the dashboard using Contributor Insights rules include Top 10 Messages and Top 10 Sessions.

The Contributor Insights rules are used to create top-N metrics and dynamically create aggregation metrics from this highly variable data. You can use these metrics to identify outliers in the number of messages sent in a session and see which utterances are the most commonly used. You can download the top-N items from these widgets as a CSV file by choosing the widget menu and choosing Export contributors.
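
For reference, a Contributor Insights rule over JSON conversation logs can be created programmatically along the following lines. This is a simplified sketch keyed on the sessionId field, not the exact rules that the solution deploys:

import json
import boto3

cloudwatch = boto3.client('cloudwatch')

LOG_GROUP = '<LEX-CONVERSATION-LOG-GROUP>'  # placeholder

# Simplified rule that counts log events per sessionId (top-N sessions)
rule_definition = {
    "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
    "LogGroupNames": [LOG_GROUP],
    "LogFormat": "JSON",
    "Contribution": {"Keys": ["$.sessionId"], "Filters": []},
    "AggregateOn": "Count"
}

cloudwatch.put_insight_rule(
    RuleName='lex-top-sessions',
    RuleState='ENABLED',
    RuleDefinition=json.dumps(rule_definition)
)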

Logs Insights

The dashboard uses the CloudWatch Logs Insights feature to query conversation logs. Various widgets in the dashboard, including the Missed Utterance History and the Message History, use CloudWatch Logs Insights queries to generate their tables.

The CloudWatch Logs Insights widgets allow you to inspect the details of the items returned by the queries by choosing the arrow next to the item. Additionally, the Logs Insights widgets have a link that can take you to the CloudWatch Logs Insights console to edit and run the query used to generate the results. You can access this link by choosing the widget menu and choosing View in CloudWatch Logs Insights. The CloudWatch Logs Insights console also allows you to export the result of the query as a CSV file by choosing Export results.
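
To run a similar query yourself, the following is a minimal boto3 sketch against the conversation log group. The sessionId field name is an assumption about your conversation log schema; adjust the query to match the fields you see in your logs:

import time
import boto3

logs = boto3.client('logs')

LOG_GROUP = '<LEX-CONVERSATION-LOG-GROUP>'  # placeholder

# Count messages per session over the last 24 hours (field name assumed)
query = 'stats count(*) as messages by sessionId | sort messages desc | limit 10'
end = int(time.time())
start = end - 24 * 3600

query_id = logs.start_query(
    logGroupName=LOG_GROUP,
    startTime=start,
    endTime=end,
    queryString=query
)['queryId']

# Poll until the query finishes, then print the results
while True:
    result = logs.get_query_results(queryId=query_id)
    if result['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)

for row in result.get('results', []):
    print({field['field']: field['value'] for field in row})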

Custom widgets

The dashboard includes widgets that are rendered using the custom widgets feature. These widgets are powered by Lambda functions using Python or JavaScript code. The functions use the D3.js (JavaScript) or pandas (Python) libraries to render rich visualizations and perform complex data aggregations.

The Lambda functions query your bot conversation logs using the CloudWatch Logs Insights API. The functions then aggregate the data and output the HTML that is displayed in the dashboard. They obtain bot configuration details (such as intents and utterances) using the Amazon Lex V2 APIs, or dynamically extract them from the query results (such as slots and session attributes). The dashboard uses custom widgets for the heatmaps, the conversation path chart, the add-utterances management form, and the top-N slot/session attribute tables.
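
As a simplified illustration (not one of the solution's actual functions), a custom widget Lambda function only needs to return the HTML that CloudWatch renders in the dashboard; the widgetContext object in the event carries the dashboard time range and widget parameters:

# Minimal CloudWatch custom widget: return HTML for the dashboard to render
def lambda_handler(event, context):
    # Custom widgets receive dashboard details (time range, parameters) in widgetContext
    widget_context = event.get('widgetContext', {})
    time_range = widget_context.get('timeRange', {})

    # In the real solution, the function would query CloudWatch Logs Insights here
    # and aggregate the results before building the HTML output.
    return f"""
        <h3>Sample custom widget</h3>
        <p>Dashboard time range: {time_range}</p>
    """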

Cost

The Amazon Lex V2 Analytics Dashboard Solution is based on CloudWatch features. See Amazon CloudWatch pricing for cost details.

Clean up

To clean up your resources, you can delete the CloudFormation stack. This permanently removes the dashboard and metrics from your account. For more information, see Deleting a stack on the AWS CloudFormation console.

Summary

In this post, we showed you a solution that provides insights on how your users interact with your Amazon Lex chatbot. The solution uses CloudWatch features to create metrics and visualizations that you can use to improve the user experience of your bot users.

The Amazon Lex V2 Analytics Dashboard Solution is provided as open source—use it as a starting point for your own solution, and help us make it better by contributing back fixes and features. For expert assistance, AWS Professional Services and other AWS Partners are here to help.

We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the GitHub repository.


About the Author

Oliver Atoa is a Principal Solutions Architect in the AWS Language AI Services team.
