Enhancing your chatbot experience with web browsing

Chatbots are popping up everywhere. They are qualifying leads, assisting with sales, and automating customer service. However, conversational chatbot experiences have been limited to the space available within the chatbot window.

What if these web-based chatbots could provide an interactive experience that expanded beyond the chat window to include relevant web content based on user inputs? In a previous post we showed you how to deploy a web UI for your chatbot. In this post we will show you how to enhance that experience.

Here is an example of how we add an interactive web UI to the Order Flowers chatbot with the lex-web-ui customization.

Installing the chatbot UI

To install your chatbot, complete the following steps:

  1. Deploy the chatbot UI in your AWS account by launching the following AWS CloudFormation stack:
  2. Set EnableCognitoLogin to true in the parameters.
  3. To check if it’s working, on the AWS CloudFormation console, choose Stacks.
  4. Choose the stack you created.
  5. In the Outputs section, choose ParentPageURL.

You have now deployed the bot in the CloudFront distribution you created.

Installing the chatbot UI enhancer

After you install the chatbot UI, launch the following AWS CloudFormation stack:

There are two parameters for this stack:

  • BotName – The chatbot UI bot you deployed. The parameter value of WebUiOrderFlowers is populated by default.
  • lexwebuiStackName – The name of the stack you deployed in the previous step. The parameter value of lex-web-ui is populated by default.

When the stack is complete, find the URL for the new demo site on the Outputs tab on the AWS CloudFormation console.

Enhancing the existing bot with AWS Lambda

To enhance the existing bot with AWS Lambda, complete the following steps:

  1. On the Amazon Lex console, choose Bots.
  2. Choose the bot you created.
  3. In the Lambda initialization and validation section, for Lambda function, choose the function you created as part of the CloudFormation stack (enhanced-orderflowers-<stackname>).

For production workloads, you should publish a new version of the bot. Amazon Lex takes a snapshot copy of the $LATEST version to publish a new version. For more information, see Exercise 3: Publish a Version and Create an Alias.

Enhancing authentication

You have now set up the enhanced chatbot UI. For a production environment, it's recommended that you enable authentication. This post uses Amazon Cognito to add a social identity provider (Google) to your user pool. For instructions, see Adding Social Identity Providers to a User Pool.

This step allows your bot to display your Google calendar while you order your flowers. If you skip this step, the bot still functions normally.

Dynamically viewing content on your webpage

Having content appear and disappear on your website based on your interactions with the bot is a powerful feature.

For example, if you ask the bot to order flowers, the bot messaging interface and the webpage change. This example actively builds HTML on the fly with values that the bot sends back to the end-user.

Enhancing pages with external content to help with flower selection

When you ask the bot to buy roses, the result depends on whether you're in unauthenticated or authenticated mode.

In unauthenticated mode, the iframe changes from the default homepage to a Wikipedia page about roses. The Area chart also changes to a Roses Sold graph that shows the number of roses sold per year.

In authenticated (with Google) mode, the iframe changes to your Google calendar to help you schedule a delivery day. The Area chart still changes to the Roses Sold graph.

This powerful tool allows content from various parts of the website or the internet to appear as you interact with the bot. It also allows the bot to recognize whether you're authenticated and tailor your browsing experience accordingly.

Parent page, iframes, session attributes, and dynamic HTML tags

Four main components make up the Order Flowers bot page and determine how the various pieces interact with each other:

  • Parent page – This page houses all the various components, including the chatbot iframe, dynamically created HTML, and the navigation portal (an iframe that displays various HTML pages external and internal to the website).
  • Chatbot iframe – This is the chatbot UI that the end-user interacts with. The chatbot is loaded using a JavaScript snippet that mounts an iframe to the bottom right of the parent page and preloads it with an API to interact with the parent page.
  • Session attributes – These are arbitrary values that get sent back and forth from the chatbot UI backend to the parent page. You can manipulate these values in Lambda. On the parent page, the session attributes event data is made available in a variable called sessionAttributes.
  • Dynamic HTML <Div> tags – These appear on the top right of the page and display various charts based on the question asked. You can populate them with any data, not just charts. You manipulate the data by returning values through the session attributes fields. In the parent page, sessionAttributes.appContext houses this data (see the sketch after this list).
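
To make the round trip concrete, the following is a minimal sketch of an Amazon Lex (V1) fulfillment Lambda function that returns session attributes the parent page can read. The appContext payload shape here is illustrative, not the exact data the deployed solution sends.

import json

def lambda_handler(event, context):
    # Start from any attributes the chatbot UI already sent
    session_attributes = event.get('sessionAttributes') or {}

    # Data for the dynamic <div> on the parent page; the parent page reads
    # whatever JSON you place in sessionAttributes.appContext
    session_attributes['appContext'] = json.dumps({
        'altMessage': 'Roses Sold',
        'chartData': [120, 180, 240]
    })

    # Standard Amazon Lex (V1) fulfillment response
    return {
        'sessionAttributes': session_attributes,
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {
                'contentType': 'PlainText',
                'content': 'Your flowers are on the way!'
            }
        }
    }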

The following diagram illustrates the solution architecture.

Chatbot UI user login with Amazon Cognito

When you’re authenticated through the integrated Amazon Cognito feature, the chatbot UI attaches a signed token as a session attribute. The enhanced Order Flowers webpage uses the token to make additional user attributes available, including fields such as given name, family name, and email address. These fields help return personalized information (for example, addressing you by your name).
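
As an illustration only (the deployed Lambda does more, including verifying the token signature), the following sketch shows how a function could read those claims from the token session attribute. The attribute name idtokenjwt is an assumption and may differ in your deployment.

import base64
import json

def id_token_claims(session_attributes, attr_name='idtokenjwt'):
    # The attribute name is an assumption; check your lex-web-ui configuration.
    token = session_attributes.get(attr_name)
    if not token:
        return {}
    # Decode the JWT payload only; a production Lambda must also verify the
    # signature against the Amazon Cognito JWKS before trusting any claim.
    payload = token.split('.')[1]
    payload += '=' * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return {key: claims.get(key) for key in ('given_name', 'family_name', 'email')}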

Limitations

There are certain limitations to displaying outside webpages and content through the chatbot UI parent page.

If cross-origin resource sharing (CORS) is enabled on the external content that is being pulled into the parent page iframe navigation portal, the browser blocks the content. Browsers don’t block different webpages from the same domain or external webpages that don’t have CORS enabled (for example, Wikipedia). For more information, see Cross-Origin Resource Sharing (CORS) on the MDN web docs website.

In most use cases, you should use the navigation portal to pull in content from your own domain, due to the inherent limitations of iframes and CORS.

Additional Resources

The concepts discussed in this blog post can also be used with the QnABot. The following README provides detailed instructions for setting up the solution.

Conclusion

This post demonstrates how to enhance the Order Flowers bot with a Lambda function that parses your JWT token and extracts the relevant information. If you are authenticated through Google, the bot extracts information like your name and email address, and displays your Google calendar to help you schedule your delivery date. The function also verifies that the JWT token signature is valid.

The chatbot UI in this post is based on the aws-lex-web-ui open-source project. For more information, see the GitHub repo.


About the Authors

Mohamed Khalil is a Consultant for AWS Professional Services. Bob Strahan is a Principal Consultant for AWS Professional Services. Bob Potterveld is a Senior Consultant for AWS Professional Services. They help our customers and partners on a variety of projects.

Read More

Processing PDF documents with a human loop using Amazon Textract and Amazon Augmented AI

Businesses across many industries, including financial, medical, legal, and real estate, process a large number of documents for different business operations. Healthcare and life science organizations, for example, need to access data within medical records and forms to fulfill medical claims and streamline administrative processes. Amazon Textract is a machine learning (ML) service that makes it easy to process documents at a large scale by automatically extracting text and data from virtually any type of document. For example, it can extract patient information from an insurance claim or values from a table in a scanned medical chart.

Depending on the business use case, you may want to have a human review of ML predictions. For example, extracting information from a scanned mortgage application or medical claim form might require human review of certain fields due to regulatory requirements or potentially low-quality scans. Amazon Augmented AI (Amazon A2I) allows you to build and manage such human review workflows. This allows human review of ML predictions when needed based on a confidence score threshold, and you can audit the predictions on an ongoing basis. For more information, see Using Amazon Textract with Amazon Augmented AI for processing critical documents.

In this post, we show how you can use Amazon Textract and Amazon A2I to build a workflow that enables multi-page PDF document processing with a human review loop.

Solution overview

The following diagram shows a serverless architecture for processing multi-page PDF documents with a human review loop. Although Amazon Textract can process images (PNG and JPG) and PDF documents, Amazon A2I human reviewers need individual pages as images, so the solution extracts each page and processes it individually using the AnalyzeDocument API of Amazon Textract.

To implement this architecture, we use AWS Step Functions to build the overall workflow. As the workflow starts, it extracts individual pages from the multi-page PDF document. It then uses the Map state to process multiple pages concurrently using the AnalyzeDocument API. When we call Amazon Textract, we also specify the Amazon A2I human review workflow as part of the request. This workflow is configured to trigger when form fields are detected below a certain confidence threshold. If triggered, Amazon Textract returns the extracted text and data along with the human loop activation details. When the human review is complete, the callback task token is used to resume the state machine, combine the pages' results, and store them in an output Amazon Simple Storage Service (Amazon S3) bucket.
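
For reference, the per-page Amazon Textract request made inside the Map state looks roughly like the following sketch; the bucket, key, and flow definition ARN are placeholders.

import uuid
import boto3

textract = boto3.client('textract')

def analyze_page(bucket, page_image_key, flow_definition_arn):
    # A human loop starts only when the activation conditions in the flow
    # definition are met (for example, low-confidence form keys).
    return textract.analyze_document(
        Document={'S3Object': {'Bucket': bucket, 'Name': page_image_key}},
        FeatureTypes=['FORMS'],
        HumanLoopConfig={
            'HumanLoopName': 'pdf-page-' + str(uuid.uuid4()),
            'FlowDefinitionArn': flow_definition_arn
        })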

For more information about the demo solution, see the GitHub repo.

Prerequisites

Before you get started, you must install the following prerequisites:

  1. Node.js
  2. Python
  3. AWS Command Line Interface (AWS CLI) – for instructions, see Installing the AWS CLI

Deploying the solution

The following steps deploy the reference implementation in your AWS account. The solution deploys different components, including an S3 bucket, a Step Function, an Amazon Simple Queue Service (Amazon SQS) queue, and AWS Lambda functions using the AWS Cloud Development Kit (AWS CDK), which is an open-source software development framework to model and provision your cloud application resources using familiar programming languages.

  1. Install AWS CDK:
    npm install -g aws-cdk

  2. Download the GitHub repo to your local machine:
    git clone https://github.com/aws-samples/amazon-textract-a2i-pdf

  3. Go to the folder multipagepdfa2i and enter the following:
    pip install -r requirements.txt

  4. Bootstrap AWS CDK:
    cdk bootstrap

  5. Deploy:
    cdk deploy

Creating a private work team

A work team is a group of people that you select to review your documents. You can create a work team from a workforce, which is made up of Amazon Mechanical Turk workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this post, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.

To create and manage your private workforce, you can use the Labeling workforces page on the Amazon SageMaker console. On the console, you can create a private workforce by entering worker emails or importing a pre-existing workforce from an Amazon Cognito user pool.

If you already have a work team for Amazon SageMaker Ground Truth, you can use the same work team with Amazon A2I and skip to the following section.

To create your private work team, complete the following steps:

  1. On the Amazon SageMaker console, choose Labeling workforces.
  2. On the Private tab, choose Create private team.
  3. Choose Invite new workers by email.
  4. In the Email addresses box, enter the email addresses for your work team (for this post, enter your email address).

You can enter a list of up to 50 email addresses, separated by commas.

  5. Enter an organization name and contact email.
  6. Choose Create private team.

After you create the private team, you get an email invitation. The following screenshot shows an example email.

After you click the link and change your password, you are registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.

Your one-person team is now ready, and you can create a human review workflow.

Creating a human review workflow

You use a human review workflow to do the following:

  • Define the business conditions under which the Amazon Textract predictions of the document content go to a human for review. For example, you can set confidence thresholds for important words in the form that the model must meet. If inference confidence for that word (or form key) falls below your confidence threshold, the form and prediction go for human review.
  • Create instructions to help workers complete your document review task.
  1. On the Amazon SageMaker console, navigate to the Human review workflows page.
  2. Choose Create human review workflow.
  3. In the Workflow settings section, for Name, enter a unique workflow name.
  4. For S3 bucket, enter the S3 bucket that was created in the CDK deployment step. It should have a name like multipagepdfa2i-multipagepdf-xxxxxxxxx. This S3 bucket is where Amazon A2I stores the human review results.
  5. For IAM role, choose Create a new role from the drop-down menu. Amazon A2I can create a role automatically for you.
  6. For S3 buckets you specify, select Specific S3 buckets.
  7. Enter the S3 bucket you specified earlier in Step 3; for example, multipagepdfa2i-multipagepdf-xxxxxxxxx.
  8. Choose Create.

You see a confirmation when role creation is complete, and your role is now pre-populated in the IAM role drop-down menu.

  9. For Task type, select Amazon Textract – Key-value pair extraction.

Defining the trigger conditions

For this post, you want to trigger a human review if the key Mail Address is identified with a confidence score of less than 99% or not identified by Amazon Textract in the document. For all other keys, a human review starts if a key is identified with a confidence score less than 90%.

  1. Select Trigger a human review for specific form keys based on the form key confidence score or when specific form keys are missing.
  2. For Key name, enter Mail Address.
  3. Set the identification confidence threshold between 0 and 99.
  4. Set the qualification confidence threshold between 0 and 99.
  5. Select Trigger a human review for all form keys identified by Amazon Textract with confidence scores in a specific range.
  6. Set Identification confidence threshold between 0 and 90.
  7. Set Qualification confidence threshold between 0 and 90.

For model-monitoring purposes, you can also randomly send a specific percent of pages for human review. This is the third option on the Conditions for invoking human review page: Randomly send a sample of forms to humans for review. This post doesn’t include this condition.

Creating a UI template

In the next steps, you create a UI template that the worker sees for document review. Amazon A2I provides pre-built templates that workers use to identify key-value pairs in documents.

  1. In the Worker task template creation section, select Create from a default template.
  2. For Template name, enter a name.

When you use the default template, you can provide task-specific instructions to help the worker complete your task. For this post, you can enter instructions similar to the default instructions you see in the console.

  3. Under Task Description, enter something similar to Please review the Key Value Pairs in this document.
  4. Under Instructions, review the default instructions provided and make modifications as needed.
  5. In the Workers section, select Private.
  6. For Private teams, choose the work team you created earlier.
  7. Choose Create.

You’re redirected to the Human review workflows page and see a confirmation message similar to the following screenshot.

Record your new human review workflow ARN, which you use to configure your human loop in the next section.

Updating the solution with the Human Review workflow

You’re now ready to add your human review workflow ARN.

  1. Within the code you downloaded from the GitHub repo, open the file multipagepdfa2i/multipagepdfa2i_stack.py.

On line 23, you should see the following code:

SAGEMAKER_WORKFLOW_AUGMENTED_AI_ARN_EV = ""
  2. Within the quotes, enter the human review workflow ARN you copied at the end of the last section.

Line 23 should now look like the following code:

SAGEMAKER_WORKFLOW_AUGMENTED_AI_ARN_EV = "arn:aws:sagemaker: ...."
  3. Save the changes you made.
  4. Deploy by entering the following code:
    cdk deploy

Testing the workflow

To test your workflow, complete the following steps:

  1. Create a folder named uploads in the S3 bucket that was created by the CDK deployment (for example, multipagepdfa2i-multipagepdf-xxxxxxxxx).
  2. Upload the sample PDF document to the uploads folder; for example, uploads/Sampledoc.pdf.
  3. On the Amazon SageMaker console, choose Labeling workforces.
  4. On the Private tab, choose the link under Labeling portal sign-in URL.
  5. Sign in with the account you configured with Amazon Cognito.

If the document requires a human review, a job appears in the Jobs section.

  6. Select the job you want to complete and choose Start working.

In the reviewer UI, you see instructions and the first document to work on. You can use the toolbox to zoom in and out, fit the image, and reposition the document. See the following screenshot.

This UI is specifically designed for document-processing tasks. On the right side of the preceding screenshot, the key-value pairs are automatically pre-filled with the Amazon Textract response. As a worker, you can quickly refer to this sidebar to make sure the key-values are identified correctly (which is the case for this post).

When you select any field on the right, a corresponding bounding box appears, which highlights its location on the document. See the following screenshot.

In the following screenshot, Amazon Textract didn’t identify Mail Address. The human review workflow identified this as an important field. Even though Amazon Textract didn’t identify it, the worker task UI asks you to enter the details on the right side.

There may be a series of pages you need to submit, based on the Amazon Textract confidence score ranges you configured. When you finish reviewing them, continue with the following steps.

  7. When you complete the human review, go to the S3 bucket you used earlier (for example, multipagepdfa2i-multipagepdf-xxxxxxxxx).
  8. In the complete folder, choose the folder that has the name of the input document (for example, uploads-Sampledoc.pdf-b5d54fdb75b143ee99f7524de56626a3).

That folder contains output.csv, which contains all your key-value pairs.

The following screenshot shows the content of an example output.csv file.
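
If you prefer to inspect the results programmatically instead of downloading the file, a minimal sketch like the following reads the same output.csv with boto3; the bucket and key values are placeholders, and the column layout is whatever the demo code writes.

import csv
import io
import boto3

s3 = boto3.client('s3')

def print_key_value_pairs(bucket, key):
    # key example: 'complete/uploads-Sampledoc.pdf-<id>/output.csv'
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
    for row in csv.reader(io.StringIO(body)):
        print(row)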

Conclusion

In this post, we showed you how to use Amazon Textract and Amazon A2I to automatically extract data from scanned multi-page PDF documents, and the human review of the pages for given business criteria. For more information about Amazon Textract and Amazon A2I, see Using Amazon Augmented AI with Amazon Textract.

For video presentations, sample Jupyter notebooks, or more information about use cases like document processing, content moderation, sentiment analysis, text translation, and more, see Amazon Augmented AI Resources.


About the Authors

Nicholas Nelson is an AWS Solutions Architect for Strategic Accounts based out of Seattle, Washington. His interests and experience include Computer Vision, Serverless Technology, and Construction Technology. Outside of work, you can find Nicholas out cycling, paddle boarding, or grilling!

Kashif Imran is a Principal Solutions Architect at Amazon Web Services. He works with some of the largest AWS customers who are taking advantage of AI/ML to solve complex business problems. He provides technical guidance and design advice to implement computer vision applications at scale. His expertise spans application architecture, serverless, containers, NoSQL and machine learning.

Anuj Gupta is Senior Product Manager for Amazon Augmented AI. He focuses on delivering products that make it easier for customers to adopt machine learning. In his spare time, he enjoys road trips and watching Formula 1.

Read More

Setting up human review of your NLP-based entity recognition models with Amazon SageMaker Ground Truth, Amazon Comprehend, and Amazon A2I

Organizations across industries have a lot of unstructured data that you can evaluate to get entity-based insights. You may also want to add your own entity types unique to your business, like proprietary part codes or industry-specific terms. To create a natural language processing (NLP)-based model, you need to label this data based on your specific entities.

Amazon SageMaker Ground Truth makes it easy to build highly accurate training datasets for machine learning (ML), and Amazon Comprehend lets you train a model without worrying about selecting the right algorithms and parameters for model training. Amazon Augmented AI (Amazon A2I) lets you audit, review, and augment these predicted results.

In this post, we cover how to build a labeled dataset of custom entities using the Ground Truth named entity recognition (NER) labeling feature, train a custom entity recognizer using Amazon Comprehend, and review the predictions below a certain confidence threshold from Amazon Comprehend using human reviewers with Amazon A2I.

We walk you through the following steps using this Amazon SageMaker Jupyter notebook:

  1. Preprocess your input documents.
  2. Create a Ground Truth NER labeling Job.
  3. Train an Amazon Comprehend custom entity recognizer model.
  4. Set up a human review loop for low-confidence detection using Amazon A2I.

Prerequisites

Before you get started, complete the following steps to set up the Jupyter notebook:

  1. Create a notebook instance in Amazon SageMaker.

Make sure your Amazon SageMaker notebook has the necessary AWS Identity and Access Management (IAM) roles and permissions mentioned in the prerequisite section of the notebook.

  2. When the notebook is active, choose Open Jupyter.
  3. On the Jupyter dashboard, choose New, and choose Terminal.
  4. In the terminal, enter the following code:
    cd SageMaker
    git clone https://github.com/aws-samples/augmentedai-comprehendner-groundtruth

  5. Open the notebook by choosing SageMakerGT-ComprehendNER-A2I-Notebook.ipynb in the root folder.

You’re now ready to run the following steps through the notebook cells.

Preprocessing your input documents

For this use case, you’re reviewing chat messages or service tickets, and you want to know if they’re related to an AWS offering. We use the NER labeling feature in Ground Truth to label SERVICE and VERSION entities in the input messages. We then train an Amazon Comprehend custom entity recognizer to recognize the entities from text like tweets or ticket comments.

The sample dataset is provided at data/rawinput/aws-service-offerings.txt in the GitHub repo. The following screenshot shows an example of the content.

You preprocess this file to generate the following:

  • inputs.csv – You use this file to generate the input manifest file for Ground Truth NER labeling.
  • train.csv and test.csv – You use these files as input for training custom entities. You can find these files in the Amazon Simple Storage Service (Amazon S3) bucket.

Refer to Steps 1a and 1b in the notebook for dataset generation.

Creating a Ground Truth NER labeling job

The purpose is to annotate and label sentences within the input document as belonging to a custom entity that we define. In this section, you complete the following steps:

  1. Create the manifest file that Ground Truth needs.
  2. Set up a labeling workforce.
  3. Create your labeling job.
  4. Start your labeling job and verify its output.

Creating a manifest file

We use the inputs.csv file generated during prepossessing to create a manifest file that the NER labeling feature needs. We generate a manifest file named prefix+-text-input.manifest, which you use for data labeling while creating a Ground Truth job. See the following code:

# Create and upload the input manifest by appending a source tag to each of the lines in the input text file. 
# Ground Truth uses the manifest file to determine labeling tasks

manifest_name = prefix + '-text-input.manifest'
# remove existing file with the same name to avoid duplicate entries
!rm *.manifest
s3bucket = s3res.Bucket(BUCKET)

with open(manifest_name, 'w') as f:
    for fn in s3bucket.objects.filter(Prefix=prefix +'/input/'):
        fn_obj = s3res.Object(BUCKET, fn.key)
        for line in fn_obj.get()['Body'].read().splitlines():                
            f.write('{"source":"' + line.decode('utf-8') +'"}\n')
f.close()
s3.upload_file(manifest_name, BUCKET, prefix + "/manifest/" + manifest_name)

The NER labeling job requires its input manifest in the {"source": "embedded text"} format. The following screenshot shows the input.manifest file generated from inputs.csv.

Creating a private labeling workforce

With Ground Truth, we use a private workforce to create a labeled dataset.

You create your private workforce on the Amazon SageMaker console. For instructions, see the section Creating a private work team in Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.

Alternatively, follow the steps in the notebook.

For this walkthrough, we use the same private workforce to label and augment low-confidence data using Amazon A2I after custom entity training.

Creating a labeling job

The next step is to create the NER labeling job. This post highlights the key steps. For more information, see Adding a data labeling workflow for named entity recognition with Amazon SageMaker Ground Truth.

  1. On the Amazon SageMaker console, under Ground Truth, choose Labeling jobs.
  2. Choose Create labeling job.
  3. For Job name, enter a job name.
  4. For Input dataset location, enter the Amazon S3 location of the input manifest file you created (s3://bucket/path-to-your-manifest.json).
  5. For Output Dataset Location, enter an S3 bucket with an output prefix (for example, s3://bucket-name/output).
  6. For IAM role, choose Create a new Role.
  7. Select Any S3 Bucket.
  8. Choose Create.
  9. For Task category, choose Text.
  10. Select Named entity recognition.
  11. Choose Next.
  12. For Worker type, select Private.
  13. In Private Teams, select the team you created.
  14. In the Named Entity Recognition Labeling Tool section, for Enter a brief description of the task, enter Highlight the word or group of words and select the corresponding most appropriate label from the right.
  15. In the Instructions box, enter Your labeling will be used to train an ML model for predictions. Please think carefully on the most appropriate label for the word selection. Remember to label at least 200 annotations per label type.
  16. Choose Bold Italics.
  17. In the Labels section, enter the label names you want to display to your workforce.
  18. Choose Create.

Starting your labeling job

Your workforce (or you, if you chose yourself as your workforce) received an email with login instructions.

  1. Choose the URL provided and enter your user name and password.

You are directed to the labeling task UI.

  1. Complete the labeling task by choosing labels for groups of words.
  2. Choose Submit.
  3. After you label all the entries, the UI automatically exits.
  4. To check your job’s status, on the Amazon SageMaker console, under Ground Truth, choose Labeling jobs.
  5. Wait until the job status shows as Complete.

Verifying annotation outputs

To verify your annotation outputs, open your S3 bucket and locate <S3 Bucket Name>/output/<labeling-job-name>/manifests/output/output.manifest. You can review the manifest file that Ground Truth created. The following screenshot shows an example of the entries you see.

Training a custom entity model

We now use the annotated dataset (the output.manifest file that Ground Truth created) to train a custom entity recognizer. This section walks you through the steps in the notebook.

Processing the annotated dataset

You can provide labels for Amazon Comprehend custom entities through an entity list or annotations. In this post, we use annotations generated using Ground Truth labeling jobs. You need to convert the annotated output.manifest file to the following CSV format:

File, Line, Begin Offset, End Offset, Type
documents.txt, 0, 0, 11, VERSION

Run the following code in the notebook to generate the annotations.csv file:

# Read the output manifest json and convert into a csv format as expected by Amazon Comprehend Custom Entity Recognizer
import json
import csv

# this will be the file that will be written by the format conversion code block below
csvout = 'annotations.csv'

with open(csvout, 'w', encoding="utf-8") as nf:
    csv_writer = csv.writer(nf)
    csv_writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    with open("data/groundtruth/output.manifest", "r") as fr:
        for num, line in enumerate(fr.readlines()):
            lj = json.loads(line)
            #print(str(lj))
            if lj and labeling_job_name in lj:
                for ent in lj[labeling_job_name]['annotations']['entities']:
                    csv_writer.writerow([fntrain,num,ent['startOffset'],ent['endOffset'],ent['label'].upper()])
    fr.close()
nf.close()        

s3_annot_key = "output/" + labeling_job_name + "/comprehend/" + csvout

upload_to_s3(s3_annot_key, csvout)

The following screenshot shows the contents of the file.

Setting up a custom entity recognizer

This post uses the API, but you can optionally create the recognizer and batch analysis job on the Amazon Comprehend console. For instructions, see Build a custom entity recognizer using Amazon Comprehend.

  1. Enter the following code. For s3_train_channel, use the train.csv file you generated in the preprocessing step to train the recognizer. For s3_annot_channel, use annotations.csv as labels to train your custom entity recognizer.
    custom_entity_request = {
    
          "Documents": { 
             "S3Uri": s3_train_channel
          },
          "Annotations": { 
             "S3Uri": s3_annot_channel
          },
          "EntityTypes": [
                    {
                        "Type": "SERVICE"
                    },
                    {
                        "Type": "VERSION"
                    }
          ]
    }

  2. Create the entity recognizer using CreateEntityRecognizer. The entity recognizer is trained with the minimum required number of training samples to generate some low-confidence predictions required for our Amazon A2I workflow. See the following code:
    import datetime
    
    id = str(datetime.datetime.now().strftime("%s"))
    create_custom_entity_response = comprehend.create_entity_recognizer(
            RecognizerName = prefix + "-CER", 
            DataAccessRoleArn = role,
            InputDataConfig = custom_entity_request,
            LanguageCode = "en"
    )
    

    When the entity recognizer job is complete, it creates a recognizer with a performance score. As mentioned earlier, we trained the entity recognizer with a minimum number of training samples to generate the low-confidence predictions we need to trigger the Amazon A2I human loop. You can find these metrics on the Amazon Comprehend console. See the following screenshot.

  3. Create a batch entity detection analysis job to detect entities over a large number of documents.

Use the Amazon Comprehend StartEntitiesDetectionJob operation to detect custom entities in your documents. For instructions on creating an endpoint for real-time analysis using your custom entity recognizer, see Announcing the launch of Amazon Comprehend custom entity recognition real-time endpoints.

To use the EntityRecognizerArn for custom entity recognition, you must provide access to the recognizer to detect the custom entity. This ARN is supplied by the response to the CreateEntityRecognizer operation.

  4. Run the custom entity detection job to get predictions on the test dataset you created during the preprocessing step by running the following cell in the notebook:
    s3_test_channel = 's3://{}/{}'.format(BUCKET, s3_test_key)
    s3_output_test_data = 's3://{}/{}'.format(BUCKET, "output/testresults/")

    test_response = comprehend.start_entities_detection_job(
        InputDataConfig={
            'S3Uri': s3_test_channel,
            'InputFormat': 'ONE_DOC_PER_LINE'
        },
        OutputDataConfig={
            'S3Uri': s3_output_test_data
        },
        DataAccessRoleArn=role,
        JobName='a2i-comprehend-gt-blog',
        EntityRecognizerArn=jobArn,
        LanguageCode='en')
    

    The following screenshot shows the test results.

Setting up a human review loop

In this section, you set up a human review loop for low-confidence detections in Amazon A2I. It includes the following steps:

  1. Choose your workforce.
  2. Create a human task UI.
  3. Create a worker task template creator function.
  4. Create the flow definition.
  5. Check the human loop status and wait for reviewers to complete the task.

Choosing your workforce

For this post, we use the private workforce we created for the Ground Truth labeling jobs. Use the workforce ARN to set up the workforce for Amazon A2I.

Creating a human task UI

Create a human task UI resource with a UI template in liquid HTML. This template is used whenever a human loop is required.

The following example code is compatible with Amazon Comprehend entity detection:

template = """
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<style>
    .highlight {
        background-color: yellow;
    }
</style>

<crowd-entity-annotation
        name="crowd-entity-annotation"
        header="Highlight parts of the text below"
        labels="[{'label': 'service', 'fullDisplayName': 'Service'}, {'label': 'version', 'fullDisplayName': 'Version'}]"
        text="{{ task.input.originalText }}"
>
    <full-instructions header="Named entity recognition instructions">
        <ol>
            <li><strong>Read</strong> the text carefully.</li>
            <li><strong>Highlight</strong> words, phrases, or sections of the text.</li>
            <li><strong>Choose</strong> the label that best matches what you have highlighted.</li>
            <li>To <strong>change</strong> a label, choose highlighted text and select a new label.</li>
            <li>To <strong>remove</strong> a label from highlighted text, choose the X next to the abbreviated label name on the highlighted text.</li>
            <li>You can select all of a previously highlighted text, but not a portion of it.</li>
        </ol>
    </full-instructions>

    <short-instructions>
        Select the word or words in the displayed text corresponding to the entity, label it and click submit
    </short-instructions>

    <div id="recognizedEntities" style="margin-top: 20px">
                <h3>Label the Entity below in the text above</h3>
                <p>{{ task.input.entities }}</p>
    </div>
</crowd-entity-annotation>

<script>

    function highlight(text) {
        var inputText = document.getElementById("inputText");
        var innerHTML = inputText.innerHTML;
        var index = innerHTML.indexOf(text);
        if (index >= 0) {
            innerHTML = innerHTML.substring(0,index) + "<span class='highlight'>" + innerHTML.substring(index,index+text.length) + "</span>" + innerHTML.substring(index + text.length);
            inputText.innerHTML = innerHTML;
        }
    }

    document.addEventListener('all-crowd-elements-ready', () => {
        document
            .querySelector('crowd-entity-annotation')
            .shadowRoot
            .querySelector('crowd-form')
            .form
            .appendChild(recognizedEntities);
    });
</script>
"""

Creating a worker task template creator function

This function is a higher-level abstraction on the Amazon SageMaker package’s method to create the worker task template, which we use to create a human review workflow. See the following code:

def create_task_ui():
    '''
    Creates a Human Task UI resource.

    Returns:
    struct: HumanTaskUiArn
    '''
    response = sagemaker.create_human_task_ui(
        HumanTaskUiName=taskUIName,
        UiTemplate={'Content': template})
    return response
# Task UI name - this value is unique per account and region. You can also provide your own value here.
taskUIName = prefix + '-ui' 

# Create task UI
humanTaskUiResponse = create_task_ui()
humanTaskUiArn = humanTaskUiResponse['HumanTaskUiArn']
print(humanTaskUiArn)

Creating the flow definition

Flow definitions allow you to specify the following:

  • The workforce that your tasks are sent to
  • The instructions that your workforce receives

This post uses the API, but you can optionally create this workflow definition on the Amazon A2I console.

For more information, see Create a Flow Definition.
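
The notebook creates the flow definition through the SageMaker API. A minimal sketch of that call follows; the flow definition name and output path are placeholders, and WORKTEAM_ARN, humanTaskUiArn, role, and BUCKET are assumed to be defined earlier in the notebook.

import boto3

sagemaker_client = boto3.client('sagemaker')

create_workflow_definition_response = sagemaker_client.create_flow_definition(
    FlowDefinitionName='fd-comprehend-ner-demo',   # placeholder name
    RoleArn=role,
    HumanLoopConfig={
        'WorkteamArn': WORKTEAM_ARN,
        'HumanTaskUiArn': humanTaskUiArn,
        'TaskCount': 1,
        'TaskDescription': 'Review and correct the entities detected in the text',
        'TaskTitle': 'Custom entity review'
    },
    OutputConfig={'S3OutputPath': 's3://' + BUCKET + '/a2i-results'})

# Used below when starting human loops
flowDefinitionArn = create_workflow_definition_response['FlowDefinitionArn']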

To set up the condition to trigger the human loop review, enter the following code (you can change the value of the CONFIDENCE_SCORE_THRESHOLD based on what confidence level you want to trigger the human review):

human_loops_started = []

import json
import uuid

CONFIDENCE_SCORE_THRESHOLD = 90
for line in data:
    print("Line is: " + str(line))
    begin_offset=line['BEGIN_OFFSET']
    end_offset=line['END_OFFSET']
    if(line['CONFIDENCE_SCORE'] < CONFIDENCE_SCORE_THRESHOLD):
        humanLoopName = str(uuid.uuid4())
        human_loop_input = {}
        human_loop_input['labels'] = line['ENTITY']
        human_loop_input['entities']= line['ENTITY']
        human_loop_input['originalText'] = line['ORIGINAL_TEXT']
        start_loop_response = a2i_runtime_client.start_human_loop(
        HumanLoopName=humanLoopName,
        FlowDefinitionArn=flowDefinitionArn,
        HumanLoopInput={
                "InputContent": json.dumps(human_loop_input)
            }
        )
        print(human_loop_input)
        human_loops_started.append(humanLoopName)
        print(f'Score is less than the threshold of {CONFIDENCE_SCORE_THRESHOLD}')
        print(f'Starting human loop with name: {humanLoopName} \n')
    else:
        print('No human loop created. \n')

Checking the human loop status and waiting for reviewers to complete the task

To check the status of the human loops and wait for reviewers to complete the tasks, enter the following code:

completed_human_loops = []
for human_loop_name in human_loops_started:
    resp = a2i_runtime_client.describe_human_loop(HumanLoopName=human_loop_name)
    print(f'HumanLoop Name: {human_loop_name}')
    print(f'HumanLoop Status: {resp["HumanLoopStatus"]}')
    print(f'HumanLoop Output Destination: {resp["HumanLoopOutput"]}')
    print('\n')
    
    if resp["HumanLoopStatus"] == "Completed":
        completed_human_loops.append(resp)

Navigate to the private workforce portal that’s provided as the output of cell 2 from the previous step in the notebook. See the following code:

workteamName = WORKTEAM_ARN[WORKTEAM_ARN.rfind('/') + 1:]
print("Navigate to the private worker portal and do the tasks. Make sure you've invited yourself to your workteam!")
print('https://' + sagemaker.describe_workteam(WorkteamName=workteamName)['Workteam']['SubDomain'])

The UI template is similar to the Ground Truth NER labeling feature. Amazon A2I displays the entity identified from the input text (this is a low-confidence prediction). The human worker can then update or validate the entity labeling as required and choose Submit.

This action generates an updated annotation with offsets and entities as highlighted by the human reviewer.
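
To pull those reviewer answers back into the notebook, a sketch like the following reads the output JSON that each completed human loop writes to Amazon S3 (completed_human_loops is the list built in the status check above).

import json
import boto3

s3_client = boto3.client('s3')

for loop in completed_human_loops:
    output_uri = loop['HumanLoopOutput']['OutputS3Uri']
    bucket, key = output_uri.replace('s3://', '').split('/', 1)
    output = json.loads(
        s3_client.get_object(Bucket=bucket, Key=key)['Body'].read())
    # humanAnswers holds the entities as highlighted by the reviewer
    print(json.dumps(output.get('humanAnswers', []), indent=2))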

Cleaning up

To avoid incurring future charges, stop and delete resources such as the Amazon SageMaker notebook instance, Amazon Comprehend custom entity recognizer, and the model artifacts in Amazon S3 when not in use.

Conclusion

This post demonstrated how to create annotations for an Amazon Comprehend custom entity recognizer using the Ground Truth NER labeling feature. We used Amazon A2I to augment the low-confidence predictions from Amazon Comprehend.

You can use the annotations that Amazon A2I generated to update the annotations file you created and incrementally train the custom recognizer to improve the model’s accuracy.

For video presentations, sample Jupyter notebooks, or more information about use cases like document processing, content moderation, sentiment analysis, text translation, and more, see Amazon Augmented AI Resources. We’re interested in how you want to extend this solution for your use case and welcome your feedback.


About the Authors

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML Explainability areas in AI/ML.

Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an Autonomous Vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Read More

Extracting custom entities from documents with Amazon Textract and Amazon Comprehend

Amazon Textract is a machine learning (ML) service that makes it easy to extract text and data from scanned documents. Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms and information stored in tables. This allows you to use Amazon Textract to instantly “read” virtually any type of document and accurately extract text and data without needing any manual effort or custom code.

Amazon Textract has multiple applications in a variety of fields. For example, talent management companies can use Amazon Textract to automate the process of extracting a candidate’s skill set. Healthcare organizations can extract patient information from documents to fulfill medical claims.

When your organization processes a variety of documents, you sometimes need to extract entities from unstructured text in the documents. A contract document, for example, can have paragraphs of text where names and other contract terms are listed in the paragraph of text instead of as a key/value or form structure. Amazon Comprehend is a natural language processing (NLP) service that can extract key phrases, places, names, organizations, events, sentiment from unstructured text, and more. With custom entity recognition, you can identify new entity types not supported as one of the preset generic entity types. This allows you to extract business-specific entities to address your needs.

In this post, we show how to extract custom entities from scanned documents using Amazon Textract and Amazon Comprehend.

Use case overview

For this post, we process resume documents from the Resume Entities for NER dataset to get insights such as candidates’ skills by automating this workflow. We use Amazon Textract to extract text from these resumes and Amazon Comprehend custom entity recognition to detect skills such as AWS, C, and C++ as custom entities. The following screenshot shows a sample input document.

The following screenshot shows the corresponding output generated using Amazon Textract and Amazon Comprehend.

Solution overview

The following diagram shows a serverless architecture that processes incoming documents for custom entity extraction using Amazon Textract and custom model trained using Amazon Comprehend. As documents are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, it triggers an AWS Lambda function. The function calls the Amazon Textract DetectDocumentText API to extract the text and calls Amazon Comprehend with the extracted text to detect custom entities.
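
As a rough sketch of that inference path (not the exact Lambda code from the repo), the function could look like the following. This version calls a real-time custom endpoint, which is one way to invoke the custom model; the walkthrough later in this post uses a batch analysis job instead, and the endpoint ARN is a placeholder.

import boto3

textract = boto3.client('textract')
comprehend = boto3.client('comprehend')

# Placeholder ARN of a real-time endpoint created from the trained custom model
CUSTOM_ENTITY_ENDPOINT_ARN = 'arn:aws:comprehend:region:account:entity-recognizer-endpoint/name'

def lambda_handler(event, context):
    # Triggered by an S3 upload event
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Synchronous OCR (assumes a single-page image document)
    ocr = textract.detect_document_text(
        Document={'S3Object': {'Bucket': bucket, 'Name': key}})
    text = ' '.join(b['Text'] for b in ocr['Blocks'] if b['BlockType'] == 'LINE')

    # Custom entity detection against the real-time endpoint
    result = comprehend.detect_entities(
        Text=text[:5000],  # keep well under the real-time text size limit
        EndpointArn=CUSTOM_ENTITY_ENDPOINT_ARN)
    return result['Entities']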

The solution consists of two parts:

  1. Training:
    1. Extract text from PDF documents using Amazon Textract
    2. Label the resulting data using Amazon SageMaker Ground Truth
    3. Train custom entity recognition using Amazon Comprehend with the labeled data
  2. Inference:
    1. Send the document to Amazon Textract for data extraction
    2. Send the extracted data to the Amazon Comprehend custom model for entity extraction

Launching your AWS CloudFormation stack

For this post, we use an AWS CloudFormation stack to deploy the solution and create the resources it needs. These resources include an S3 bucket, an Amazon SageMaker instance, and the necessary AWS Identity and Access Management (IAM) roles. For more information about stacks, see Walkthrough: Updating a stack.

  1. Download the following CloudFormation template and save to your local disk.
  2. Sign in to the AWS Management Console with your IAM user name and password.
  3. On the AWS CloudFormation console, choose Create Stack.

Alternatively, you can choose Launch Stack directly.

  4. On the Create Stack page, choose Upload a template file and upload the CloudFormation template you downloaded.
  5. Choose Next.
  6. On the next page, enter a name for the stack.
  7. Leave everything else at its default setting.
  8. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  9. Choose Create stack.
  10. Wait for the stack to finish running.

You can examine various events from the stack creation process on the Events tab. After the stack creation is complete, look at the Resources tab to see all the resources the template created.

  11. On the Outputs tab of the CloudFormation stack, record the Amazon SageMaker instance URL.

Running the workflow on a Jupyter notebook

To run your workflow, complete the following steps:

  1. Open the Amazon SageMaker instance URL that you saved from the previous step.
  2. Under the New drop-down menu, choose Terminal.
  3. In the terminal, clone the GitHub repo by entering cd SageMaker followed by git clone and the repo URL.

You can check the folder structure (see the following screenshot).

  4. Open Textract_Comprehend_Custom_Entity_Recognition.ipynb.
  5. Run the cells.

Code walkthrough

Upload the documents to your S3 bucket.

The PDFs are now ready for Amazon Textract to perform OCR. Start the process with a StartDocumentTextDetection asynchronous API call.
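
A sketch of that asynchronous call looks like the following, with simple polling in place of the SNS notification a production pipeline would use; the bucket and key are placeholders, and pagination of the results via NextToken is omitted for brevity.

import time
import boto3

textract = boto3.client('textract')

def ocr_pdf(bucket, key):
    # Start the asynchronous text detection job for one PDF
    job_id = textract.start_document_text_detection(
        DocumentLocation={'S3Object': {'Bucket': bucket, 'Name': key}})['JobId']

    # Poll until the job finishes
    while True:
        result = textract.get_document_text_detection(JobId=job_id)
        if result['JobStatus'] in ('SUCCEEDED', 'FAILED'):
            break
        time.sleep(5)

    lines = [block['Text'] for block in result.get('Blocks', [])
             if block['BlockType'] == 'LINE']
    return '\n'.join(lines)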

For this post, we process two resumes in PDF format for demonstration, but you can process all 220 if needed. The results have all been processed and are ready for you to use.

Because we need to train a custom entity recognition model with Amazon Comprehend (as with any ML model), we need training data. In this post, we use Ground Truth to label our entities. By default, Amazon Comprehend can recognize entities like person, title, and organization. For more information, see Detect Entities. To demonstrate custom entity recognition capability, we focus on candidate skills as entities inside these resumes. We have the labeled data from Ground Truth; the data is available in the GitHub repo (see entity_list.csv). For instructions on labeling your data, see Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.

Now we have our raw and labeled data and are ready to train our model. To start the process, use the create_entity_recognizer API call. When the training job is submitted, you can see the recognizer being trained on the Amazon Comprehend console.

During training, Amazon Comprehend sets aside some data for testing. When the recognizer is trained, you can see the performance of each entity and of the recognizer overall.

We have prepared a small sample of text to test out the newly trained custom entity recognizer. We run the same step to perform OCR, then upload the Amazon Textract output to Amazon S3 and start a custom recognizer job.

When the job is submitted, you can see the progress on the Amazon Comprehend console under Analysis Jobs.

When the analysis job is complete, you can download the output and see the results. For this post, we converted the JSON result into table format for readability.

Conclusion

ML and artificial intelligence allow organizations to be agile. They can automate manual tasks to improve efficiency. In this post, we demonstrated an end-to-end architecture for extracting entities, such as a candidate’s skills from their resume, by using Amazon Textract and Amazon Comprehend. This post showed you how to use Amazon Textract to do data extraction and use Amazon Comprehend to train a custom entity recognizer from your own dataset and recognize custom entities. You can apply this process to a variety of industries, such as healthcare and financial services.

To learn more about different text and data extraction features of Amazon Textract, see How Amazon Textract Works.


About the Authors

Yuan Jiang is a Solution Architect with a focus on machine learning. He is a member of the Amazon Computer Vision Hero program.

Sonali Sahu is a Solution Architect and a member of Amazon Machine Learning Technical Field Community. She is also a member of the Amazon Computer Vision Hero program.

Kashif Imran is a Principal Solution Architect and the leader of Amazon Computer Vision Hero program.

Read More

Increasing engagement with personalized online sports content

This is a guest post by Mark Wood at Pulselive. In their own words, “Pulselive, based out of the UK, is the proud digital partner to some of the biggest names in sports.”


At Pulselive, we create experiences sports fans can’t live without; whether that’s the official Cricket World Cup website or the English Premier League’s iOS and Android apps.

One of the key things our customers measure us on is fan engagement with digital content such as videos. But until recently, the videos each fan saw were based on a most recently published list, which wasn’t personalized.

Sports organizations are trying to understand who their fans are and what they want. The wealth of digital behavioral data that can be collected for each fan tells a story of how unique they are and how they engage with our content. Based on the increase of available data and the increasing presence of machine learning (ML), Pulselive was asked by customers to provide tailored content recommendations.

In this post, we share our experience of adding Amazon Personalize to our platform as our new recommendation engine and how we increased video consumption by 20%.

Implementing Amazon Personalize

Before we could start, Pulselive had two main challenges: we didn’t have any data scientists on staff and we needed to find a solution that our engineers with minimal ML experience would understand and would still produce measurable results. We considered using external companies to assist (expensive), using tools such as Amazon SageMaker (still quite the learning curve), or Amazon Personalize.

We ultimately chose to use Amazon Personalize for several reasons:

  1. The barrier to entry was low, both technically and financially.
  2. We could quickly conduct an A/B test to demonstrate the value of a recommendation engine.
  3. We could create a simple proof of concept (PoC) with minimal disruption to the existing site.
  4. We were more concerned about the impact and improving the results than having a clear understanding of what was going on under the hood of Amazon Personalize.

Like any other business, we couldn’t afford to have an adverse impact on our daily operations, but still needed the confidence that the solution would work for our environment. Therefore, we started out with A/B testing in a PoC that we could spin up and execute in a matter of days.

Working with the Amazon Prototyping team, we narrowed down a range of options for our first integration to one that would require minimal changes to the website and be easily A/B tested. After examining all locations where a user is presented with a list of videos, we decided that re-ranking the list of videos to watch next would be the quickest way to implement personalized content. For this prototype, we used an AWS Lambda function and Amazon API Gateway to provide a new API that would intercept the request for more videos and re-rank them using the Amazon Personalize GetPersonalizedRanking API.
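
For illustration (this is a sketch, not our production code), the Lambda function behind the new API can call GetPersonalizedRanking along these lines; the event shape and campaign ARN are placeholders.

import boto3

personalize_runtime = boto3.client('personalize-runtime')

# Placeholder ARN of a campaign built with the Personalized-Ranking recipe
CAMPAIGN_ARN = 'arn:aws:personalize:region:account:campaign/video-reranking'

def lambda_handler(event, context):
    user_id = event['userId']
    candidate_video_ids = event['videoIds']  # the default "watch next" list

    response = personalize_runtime.get_personalized_ranking(
        campaignArn=CAMPAIGN_ARN,
        userId=user_id,
        inputList=candidate_video_ids)

    # The same videos, reordered for this user
    return [item['itemId'] for item in response['personalizedRanking']]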

To be considered successful, the experiment needed to demonstrate that statistically significant improvements had been made to either total video views or completion percentage. To make this possible, we needed to test across a sufficiently long enough period of time to make sure that we covered days with multiple sporting events and quieter days with no matches. We hoped to eliminate any behavior that would be dependent on the time of day or whether a match had recently been played by testing across different usage patterns. We set a time frame of 2 weeks to gather initial data. All users were part of the experiment and randomly assigned to either the control group or the test group. To keep the experiment as simple as possible, all videos were part of the experiment. The following diagram illustrates the architecture of our solution.

To get started, we needed to build an Amazon Personalize solution that provided us with the starting point for the experiment. Amazon Personalize requires a user-item interactions dataset to be able to define a solution and create a campaign to recommend videos to a user. We satisfied these requirements by creating a CSV file that contains a timestamp, user ID, and video ID for each video view across several weeks of usage. Uploading the interaction history to Amazon Personalize was a simple process, and we could immediately test the recommendations on the AWS Management Console. To train the model, we used a dataset of 30,000 recent interactions.

To compare metrics for total videos viewed and video completion percentage, we built a second API to record all video interactions in Amazon DynamoDB. This second API solved the problem of telling Amazon Personalize about new interactions via the PutEvents API, which helped keep the ML model up to date.
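
As an illustration of that second API (again a sketch, not the exact production code), a handler along these lines could forward each view to Amazon Personalize through PutEvents; the tracking ID comes from an event tracker in the dataset group, and all names are placeholders.

import time
import boto3

personalize_events = boto3.client('personalize-events')

def record_video_view(tracking_id, user_id, session_id, video_id):
    # tracking_id comes from the event tracker attached to the dataset group
    personalize_events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[{
            'eventType': 'video_view',
            'itemId': video_id,
            'sentAt': int(time.time())
        }])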

We tracked all video views and what prompted video views for all users in the experiment. Video prompts included direct linking (for example, from social media), linking from another part of the website, and linking from a list of videos. Each time a user viewed a video page, they were presented with the current list of videos or the new re-ranked list, depending on whether they were in the control or test group. We started our experiment with 5% of total users in the test group. When our approach showed no problems (no obvious drop in video consumption or increase in API errors), we increased this to 50%, with the remaining users acting as the control group, and started to collect data.

Learning from our experiment

After two weeks of A/B testing, we pulled the KPIs we collected from DynamoDB and compared the two variants we tested across several KPIs. We opted to use a few simple KPIs for this initial experiment, but other organizations’ KPIs may vary.

Our first KPI was the number of video views per user per session. Our initial hypothesis was that we wouldn’t see a meaningful change, given that we were re-ranking a list of videos; however, we measured a 20% increase in views per user. The following graph summarizes our video views for each group.

In addition to measuring total view count, we wanted to make sure that users were watching videos in full. We tracked this by sending an event for each 25% of the video a user viewed. For each video, we found that the average completion percentage didn’t change very much based on whether the video was recommended by Amazon Personalize or by the original list view. In combination with the number of videos viewed, we concluded that overall viewing time had increased for each user when presented with a personalized list of recommended videos.

We also tracked the position of each video in users’ “recommended video” bar and which item they selected. This allowed us to compare the ranking of a personalized list vs. a publication ordered list. We found that this didn’t make much difference between the two variants, which suggested that our users would most likely select a video that was visible on their screen rather than scrolling to see the entire list.

After we analyzed the results of the experiment, we presented them to the customer with the recommendation that we enable Amazon Personalize as the default method of ranking videos in the future.

Lessons learned

We learned the following lessons on our journey, which may help you when implementing your own solution:

  1. Gather your historical data of user-item interactions; we used about 30,000 interactions.
  2. Focus on recent historical data. Although your first instinct may be to gather as much historical data as possible, recent interactions are more valuable than older ones. If you have a very large dataset of historical interactions, you can filter out older interactions to reduce the dataset size and training time.
  3. Make sure you can give all users a consistent and unique ID, either by using your SSO solution or by generating session IDs.
  4. Find a spot in your site or app where you can run an A/B test either re-ranking an existing list or displaying a list of recommended items.
  5. Update your API to call Amazon Personalize and fetch the new list of items (see the sketch after this list).
  6. Deploy the A/B test and gradually increase the percentage of users in the experiment.
  7. Instrument and measure so that you can understand the outcome of your experiment.
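
As a minimal sketch of what that API change might look like (the campaign ARN, user ID, and item IDs are placeholders, not our production values), re-ranking an existing list uses the Amazon Personalize GetPersonalizedRanking API:

import boto3

personalize_runtime = boto3.client('personalize-runtime')

# Re-rank an existing, publication-ordered list of video IDs for one user
response = personalize_runtime.get_personalized_ranking(
    campaignArn='arn:aws:personalize:<region>:<account>:campaign/<campaign-name>',  # placeholder
    userId='user-123',                            # placeholder user ID
    inputList=['video-1', 'video-2', 'video-3'],  # the list to re-rank
)

ranked_video_ids = [item['itemId'] for item in response['personalizedRanking']]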

Conclusion and future steps

We were thrilled by our first foray into the world of ML with Amazon Personalize. We found the entire process of integrating a trained model into our workflow incredibly simple, and we spent far more time making sure that we had the right KPIs and data capture to prove the usefulness of the experiment than we did implementing Amazon Personalize.

In the future, we will be developing the following enhancements:

  1. Integrating Amazon Personalize throughout our workflow much more frequently by providing our development teams the opportunity to use Amazon Personalize everywhere a list of content is provided.
  2. Expanding the use cases beyond re-ranking to include recommended items. This should allow us to surface older items that are likely to be more popular with each user.
  3. Experimenting with how often the model should be retrained. Inserting new interactions into the model in real time is a great way to keep things fresh, but the model still needs daily retraining to be most effective.
  4. Exploring options for how we can use Amazon Personalize with all of our customers to help improve fan engagement by recommending the most relevant content in all forms.
  5. Using recommendation filters to expand the range of parameters available for each request. We will soon be targeting additional options such as filtering to include videos of your favorite players.

About the Author

Mark Wood is the Product Solutions Director at Pulselive. Mark has been at Pulselive for over 6 years and has held both Technical Director and Software Engineer roles during his tenure with the company. Prior to Pulselive, Mark was a Senior Engineer at Roke and a Developer at Querix. Mark is a graduate of the University of Southampton with a degree in Mathematics with Computer Science.


Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker

When you build models with the Apache MXNet deep learning framework, you can take advantage of the expansive model zoo provided by GluonCV to quickly train state-of-the-art computer vision algorithms for image and video processing. A typical development environment for training consists of a Jupyter notebook hosted on a compute instance configured by the operating data scientist. To make sure this environment is replicated during use in production, the environment is wrapped inside a Docker container, which is launched and scaled according to the expected load. Hosting the deep learning model is a challenge that generally involves knowledge of server hosting, cluster management, web API protocols, and network security.

In this post, we demonstrate how Amazon SageMaker supports these libraries and how their integration simplifies the deployment of complex algorithms without having to build expertise in web app infrastructure. Whether inference constraints require real-time predictions with low latency, or irregularly-timed batch jobs with a large number of samples, optimal hosting solutions are available and easy to build.

With Amazon SageMaker, most of the undifferentiated heavy lifting is already done. There is no need to build a container image from scratch or set up a REST API. Instead, you only need to specify various model functions to process inference data in a manner consistent with the training pipeline. You can follow this post with an end-to-end example, in which we train an object detection model using open-source Apache tools.

Creating a notebook instance

You can run the example code we provide in this post. We recommend running it on an Amazon SageMaker notebook instance of type ml.p3.2xlarge or larger to accelerate training time. To create a notebook instance, complete the following steps:

  1. On the Amazon SageMaker console, choose Notebook instances.
  2. Choose Create notebook instance.
  3. Enter the name of your notebook instance, such as mxnet-gluon-deployment.
  4. Set the instance type to ml.p3.2xlarge.
  5. Choose Additional configuration.
  6. Set the volume size to 20 GB.
  7. Choose Create notebook instance.
  8. When the instance is ready, choose Open in JupyterLab.
  9. From the launcher, you can open a terminal and run the provided code.

Generating the model

For this use case, you build an object detection model using a pretrained Faster R-CNN architecture from the GluonCV model zoo on the Pascal VOC dataset. The first step is to obtain the data, which you can do by running the data preparation script pascal_voc.py for use with GluonCV. The script downloads 8.4 GB of annotated images to ~/.mxnet/datasets/voc/. With the dataset in place, run the training script train_faster_rcnn.py from this GluonCV example.

Model parameters are saved after each epoch, with the best performing model indicated by the suffix _best.params.
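
As a quick, illustrative aside, the same Faster R-CNN architecture can be loaded directly from the GluonCV model zoo; this is also how the inference code later in this post constructs the network before loading the trained _best.params weights:

import gluoncv as gcv

# Same architecture the entry point script builds before loading our trained weights;
# passing pretrained=True instead would download weights already trained on Pascal VOC.
net = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc', pretrained_base=False)
print(net.classes)  # the 20 Pascal VOC object classes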

Preparing the inference container image

To make sure that the compute environment for the inference instance is set according to our needs, run the model within a Docker container that specifies the required configuration. Containers provide a portable, efficient, standalone package of software for flexible deployment. In most cases, using the default MXNet inference container image in Amazon SageMaker is sufficient for hosting Apache MXNet models. However, we built a computer vision model using GluonCV, which isn’t included in the default image. You can now modify the MXNet inference container image to include GluonCV, which you use for deployment.

The following steps require Docker, which is already included on Amazon SageMaker notebook instances. First, clone the Amazon SageMaker MXNet serving container GitHub repository:

git clone https://github.com/aws/sagemaker-mxnet-serving-container.git
cd sagemaker-mxnet-serving-container

Included in the repo is a Dockerfile that serves our configuration with MXNet 1.6.0, GluonCV 0.6.0, and Python 3.6.8. You can verify the software versions in ./docker/1.6.0/py3/Dockerfile.gpu:

...
ARG MX_URL=https://aws-mxnet-pypi.s3-us-west-2.amazonaws.com/1.6.0/aws_mxnet_cu101mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl
...
RUN ${PIP} install --no-cache-dir \
    ${MX_URL} \
    git+git://github.com/dmlc/gluon-nlp.git@v0.9.0 \
    gluoncv==0.6.0 \
    mxnet-model-server==$MMS_VERSION \
    keras-mxnet==2.2.4.1 \
    numpy==1.17.4 \
    onnx==1.4.1 \
    "sagemaker-mxnet-inference<2"
...

There is no need to edit this file for this post, but you can add additional packages to the preceding code as needed.

Now you build the container image. Before executing the docker build command, copy the necessary artifacts to the ./docker/1.6.0/py3 directory. In the following example code, we use gluoncv-mxnet-serving:1.6.0-gpu-py3 as the name and the tag. Note the . at the end of the last command:

cp -r docker/artifacts/* docker/1.6.0/py3
cd docker/1.6.0/py3
docker build -t gluoncv-mxnet-serving:1.6.0-gpu-py3 -f Dockerfile.gpu .

To test that the container was built successfully, you can run it locally. In the following code, replace <docker image id> and <container id> with the output from the commands docker images and docker ps:

# find docker image id
$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     0012f8ebdcab        24 hours ago        6.56GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

# start the docker container
$ docker run <docker image id>

In a separate terminal, access the shell of the running container:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
af357bce0c53        0012f8ebdcab        "python /usr/local/b…"   7 hours ago         Up 7 hours          8080-8081/tcp       musing_napier

# access shell of the running docker
$ docker exec -it <container id> /bin/bash

To escape the terminals and tear down the resources, enter exit in the shell accessing the container and press CTRL+C in the terminal running the container.

Now you’re ready to upload the new MXNet inference container image to Amazon Elastic Container Registry (Amazon ECR) so you can point to this container image when you deploy the model on Amazon SageMaker. For more information, see Pushing an image.

You first authenticate Docker to the Amazon ECR registry with get-login. Assuming the AWS Command Line Interface (AWS CLI) version is prior to 1.17.0, enter the following code to get the authenticated docker login command:

$ aws ecr get-login --region <AWS Region> --no-include-email

For instructions on using AWS CLI version 1.17.0 or higher, see Using an Authorization Token.

Copy the output of the command, then paste and run it to authenticate your Docker installation with Amazon ECR. Replace <AWS Region> with the appropriate Region; for example, to use the US East (N. Virginia) Region, use us-east-1.

Create a repository in Amazon ECR using the AWS CLI by running aws ecr create-repository. For this use case, use gluoncv for <repository name>:

$ aws ecr create-repository --repository-name <repository name> --region <AWS Region>

Before pushing the local image to Amazon ECR, tag it with the name of the target repository. Retrieve the image ID with the docker images command, then apply the tag with the docker tag command and the repository URI, which you can also find on the Amazon ECR console. See the following code:

$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     cb0a03065295        7 minutes ago       4.09GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

$ docker tag <image id> <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ docker images
REPOSITORY                                             TAG                               IMAGE ID            CREATED             SIZE
<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/gluoncv   latest                            cb0a03065295        9 minutes ago       4.09GB
gluoncv-mxnet-serving                                  1.6.0-gpu-py3                     cb0a03065295        9 minutes ago       4.09GB
nvidia/cuda                                            10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

To push the image to the Amazon ECR repository so that it’s available for hosting on Amazon SageMaker endpoints, use the docker push command. You can confirm that the image is successfully pushed using the aws ecr list-images AWS CLI command:

$ docker push <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ aws ecr list-images --repository-name gluoncv
{
    "imageIds": [
        {
            "imageDigest": "sha256:66bc1759a4d2e94daff4dd02446024a11c5af29d9259175f11701a0b9ee2d2d1",
            "imageTag": "latest"
        }
    ]
}

Alternatively, you can verify the image exists in the repository by checking on the Amazon ECR console.

When deploying the model, use this image URI as the image argument. Alternatively, you can build and push the image programmatically from a Jupyter notebook with the following code:

import boto3

account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
ecr_repository = 'mxnet-gluoncv'
tag = ':latest'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

# Create ECR repository and push docker image
!docker build -t $ecr_repository -f ./docker/Dockerfile.gpu ./docker -q
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $image_uri
!docker push $image_uri

Deploying the model

You can optimize compute resources according to the inference requirements of your use case. If you collect batches of data intermittently and don’t need predictions immediately, you can spin up a compute instance only when necessary, run a batch job over the accumulated data, store the predictions, and tear down the instance.

Alternatively, you may require that calls for inference be answered immediately. In this case, spin up a compute instance for real-time inference at an endpoint that consumes data over an API call and returns the model output. You only pay for the time the compute instance is running. We provide details for both use cases in this section.

Prepare the model artifacts by compressing them into a tarball and uploading it to Amazon S3, from which the deployed model is read. Because you’re using an architecture that already exists in the GluonCV model zoo, you only need to upload the weights. The .params file from the previous step should ultimately live in s3://<bucket_name>/<prefix>/model.tar.gz. You execute deployment via the Amazon SageMaker SDK. See the following code:

import sagemaker
from sagemaker.mxnet import MXNetModel
model = MXNetModel(
    entry_point='./source_directory/entrypoint.py',
    model_data='s3://{}/{}/{}'.format(bucket_name, s3_prefix, tar_file_name),
    framework_version='1.6.0',
    py_version='py3',
    source_dir='./source_directory/',
    image='<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>:latest',
    role=sagemaker.get_execution_role()
)
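
In the preceding code, bucket_name, s3_prefix, and tar_file_name are assumed to be defined already; a minimal sketch of packaging and uploading the trained weights (the names here are illustrative) looks like this:

import tarfile

import sagemaker

# Illustrative names; use your own prefix and tarball name
tar_file_name = 'model.tar.gz'
s3_prefix = 'gluoncv-faster-rcnn'

# Package only the trained weights; the architecture is rebuilt in entrypoint.py
with tarfile.open(tar_file_name, 'w:gz') as tar:
    tar.add('faster_rcnn_resnet50_v1b_voc_best.params')

sess = sagemaker.Session()
bucket_name = sess.default_bucket()
sess.upload_data(tar_file_name, bucket=bucket_name, key_prefix=s3_prefix)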

The image argument is the URI of the container image you uploaded to the Amazon ECR repository in the preceding section. Make sure that the Amazon ECR repository and the Amazon SageMaker model are in the same Region. Most of the processing, inference, and configuration resides in the following entrypoint.py script, which defines the model and the steps necessary to decode the payload so that the MXNet backend properly interprets the data:

entrypoint.py

## import packages ##
import base64
import json
import mxnet as mx
from mxnet import gpu
import numpy as np
import sys
import gluoncv as gcv
from gluoncv import data as gdata


## SageMaker loading function ##
def model_fn(model_dir):
    """
    Load the pretrained model 
    
    Args:
        model_dir (str): directory where model artifacts are saved/loaded
    """
    model = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc',  pretrained_base=False)
    ctx = mx.gpu(0)
    model.load_parameters(f'{model_dir}/faster_rcnn_resnet50_v1b_voc_best.params', ctx, ignore_extra=True)
    print('Loaded gluoncv model')
    return model, ctx


## SageMaker inference function ##
def transform_fn(net, data, input_content_type, output_content_type):

    ## retrieve the model and context from the first parameter, net
    model, ctx = net

    ## decode image ##
    # for endpoint API calls
    if type(data) == str:
        parsed = json.loads(data)
        img = mx.nd.array(parsed)
    # for batch transform jobs
    else:
        img = mx.img.imdecode(data)
        
        
    ## preprocess ##
    
    # normalization values taken from gluoncv
    # https://gluon-cv.mxnet.io/_modules/gluoncv/data/transforms/presets/rcnn.html
    mean = (0.485, 0.456, 0.406)
    std = (0.229, 0.224, 0.225)
    img = gdata.transforms.image.imresize(img, 800, 600)
    img = mx.nd.image.to_tensor(img)
    img = mx.nd.image.normalize(img, mean=mean, std=std)
    nda = img.expand_dims(0)  
    nda = nda.copyto(ctx)
    
    
    ## inference ##
    cid, score, bbox = model(nda)
    
    # predictions to lists
    cid = cid.asnumpy().tolist()
    score = score.asnumpy().tolist()
    bbox = bbox.asnumpy().tolist()
    
    # format predictions 
    response = []
    for x,y,z in zip(cid[0], score[0], bbox[0]):
        if x[0] == -1.0:
            continue
        response.append([x[0], y[0], z[0]/800, z[1]/600, z[2]/800, z[3]/600])
        
    predictions = {'prediction':response}
    predictionslist = [predictions]
    
    return predictionslist

After you import the supporting libraries for model inference and data processing, define the model in model_fn() by loading the Faster R-CNN architecture and the trained weights you uploaded to Amazon S3. The file name passed to load_parameters() must match the name of the parameters file that you trained and uploaded to Amazon S3 earlier in the tarball. For this use case, the parameters are stored in faster_rcnn_resnet50_v1b_voc_best.params. To use the GPU, you must explicitly set the context when loading the parameters.

Instructions to run predictions over the model are written in transform_fn(). You can call inference from a living endpoint API or launch it on schedule for batch jobs. The corresponding data type sent to the model varies between these two options. When sent for a real-time prediction over the endpoint API, the transform function receives a string that you can load and interpret according to its underlying data type. Batch transform jobs, on the other hand, send the data directly as a serialized image, which you need to decode with MXNet utilities. You can handle both cases by checking the type of the data object.

The loaded data is normalized according to the default preprocessing steps that GluonCV implements, as enforced in the normalize() function in the entry point script. Lastly, the data is passed through the neural network for inference with the output formatted such that the return payload includes the predicted class ID, confidence of the bounding box, and bounding box attributes.

With all the setup in place, you’re now ready to deploy. See the following code:

predictor = model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')

Testing

With the deployed endpoint up and running, you can make a real-time inference with the returned object from the preceding step. After loading an image into a NumPy array, fire it off for inference:

## inference via endpoint API
import os

import imageio
import numpy as np

home_path = os.path.expanduser('~')
test_image = home_path + '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg'

# load the image as a numpy array
test_image_data = np.asarray(imageio.imread(test_image))

# Serializes data and makes a prediction request to the SageMaker endpoint
endpoint_response = predictor.predict(test_image_data)

To visualize the output, draw from the metadata included in the response. See the following code:

## visualize on a test image
import matplotlib.image as mpimg
import matplotlib.patches as patches
import matplotlib.pyplot as plt

img = mpimg.imread(test_image)
fig, ax = plt.subplots(1, dpi=120)
ax.imshow(img)
for box in endpoint_response[0]['prediction']:
    class_id, confidence, xmin, ymin, xmax, ymax = box
    xmin = xmin*img.shape[1]
    xmax = xmax*img.shape[1]
    ymin = ymin*img.shape[0]
    ymax = ymax*img.shape[0]
    if confidence > 0.9:
        height = ymax-ymin
        width = xmax-xmin
        rect = patches.Rectangle(
            (xmin,ymin), width, height, linewidth=1, edgecolor='yellow', facecolor='none')
        ax.add_patch(rect)
ax.axis('off')
plt.show()

After 20 epochs of training, you can see bounding boxes that accurately identify various objects in the model response. See the following screenshot.

The purpose of maintaining an endpoint is to keep a model available for real-time predictions at any time. If your inference jobs are scheduled in advance, it’s unnecessary to pay for an endpoint instance that runs continuously. For this use case, you send a list of images for prediction to a batch transform job, which spins up a compute instance to run the model and tears it down upon completion. You only pay for the runtime of the instance, which saves costs on downtime. Set up and launch a batch transform job by uploading images to Amazon S3 and defining the data and model paths, along with a few other settings, in a dictionary. See the following code:

## inference via batch transform
import time

import boto3

s3_client = boto3.client('s3')

# upload a sample of images to Amazon S3
test_images = ['/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_003939.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2008_004205.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2009_001139.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_000148.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_005806.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2012_004299.jpg']

s3_test_prefix = 'test_images'
for test_image in test_images:
    test_image = home_path + test_image
    s3_client.upload_file(test_image, bucket_name, s3_test_prefix+'/'+test_image.split('/')[-1])

model_name = predictor.endpoint
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "test-batch-job" + timestamp
request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxConcurrentTransforms": 1,
    "MaxPayloadInMB": 6,
    "BatchStrategy": "SingleRecord",
    "TransformOutput": {
        "S3OutputPath": 's3://{}/test/{}/'.format(bucket_name, batch_job_name)
    },
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri":'s3://{}/test_images/'.format(bucket_name)
            }
        },
        "ContentType": "application/x-image",
        "SplitType": "None",
        "CompressionType": "None"
    },
    "TransformResources": {
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1
    }
}

## launch batch transform job
sm_client = boto3.client('sagemaker')

sm_client.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while(True):
    batch_response = sm_client.describe_transform_job(TransformJobName=batch_job_name)
    status = batch_response['TransformJobStatus']
    if status == 'Completed':
        print("Transform job ended with status: " + status)
        break
    if status == 'Failed':
        message = batch_response['FailureReason']
        print('Transform failed with the following error: {}'.format(message))
        raise Exception('Transform job failed') 
    time.sleep(30)

You can verify the output of the batch transform job by comparing the output of the real-time inference, endpoint_response, to the output from the batch transform job, which was saved to s3://<bucket_name>/test/<batch_job_name>/2010_001453.jpg.out as specified in the S3OutputPath parameter.
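
A hedged sketch of that comparison (reusing the s3_client, bucket_name, batch_job_name, and endpoint_response variables from the preceding code, and assuming the response is serialized as JSON by default) could look like the following:

import json

# Download one batch output file and compare it to the real-time response for the
# same image; the key follows the S3OutputPath set in the transform request above.
s3_client.download_file(
    bucket_name,
    'test/{}/2010_001453.jpg.out'.format(batch_job_name),
    'batch_output.json')

with open('batch_output.json') as f:
    batch_predictions = json.load(f)

print(batch_predictions[0]['prediction'][:3])   # first few boxes from the batch job
print(endpoint_response[0]['prediction'][:3])   # first few boxes from the endpoint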

Cleaning up

To finish up this walkthrough, tear down the endpoint instance and remove the Amazon SageMaker model. For more information about additional helper methods, see Using Estimators. Delete the Amazon ECR repository and its images through the Amazon ECR client. See the following code:

# tear down the SageMaker endpoint and endpoint configuration
predictor.delete_endpoint()

# delete the SageMaker model
predictor.delete_model()
    
# delete the Amazon ECR repository and its images
ecr_client = boto3.client('ecr')
ecr_client.delete_repository(repositoryName='gluoncv', force=True)

Conclusion

Although training models is a data scientist’s primary objective, the deployment process is equally crucial. Amazon SageMaker offers efficient methods to put these algorithms into production. Built-in algorithms can accelerate the training process, but you may need custom modeling for your use case. When building a model with MXNet, you must specify the configuration and processing steps necessary to run it in production. In this post, we outlined the steps to load our model into Amazon SageMaker and run inference for real-time predictions and in batch jobs.


About the Authors

Hussain Karimi is a data scientist at the Machine Learning Solutions Lab, where he works with customers across various verticals to initiate and build automated, algorithmic models that generate business value.

Will Gleave is a Machine Learning Consultant with the NatSec team at AWS Professional Services. In his spare time, he enjoys reading, watching sports, and traveling.

Muhyun Kim is a data scientist at the Amazon Machine Learning Solutions Lab. He solves customers’ various business problems by applying machine learning and deep learning, and also helps them get skilled.
