Fine-tuning a PyTorch BERT model and deploying it with Amazon Elastic Inference on Amazon SageMaker

Text classification is a technique for putting text into different categories, and has a wide range of applications: email providers use text classification to detect spam emails, marketing agencies use it for sentiment analysis of customer reviews, and discussion forum moderators use it to detect inappropriate comments.

In the past, data scientists used methods such as tf-idf, word2vec, or bag-of-words (BOW) to generate features for training classification models. Although these techniques have been very successful in many natural language processing (NLP) tasks, they don’t always capture the meanings of words accurately when they appear in different contexts. Recently, we have seen increasing interest in using Bidirectional Encoder Representations from Transformers (BERT) to achieve better results in text classification tasks, due to its ability to encode the meaning of words in different contexts more accurately.

Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. The Amazon SageMaker Python SDK provides open-source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several different ML and deep learning frameworks.

Our customers often ask for quick fine-tuning and easy deployment of their NLP models. Furthermore, customers prefer low inference latency and low model inference cost. Amazon Elastic Inference enables attaching GPU-powered inference acceleration to endpoints, which reduces the cost of deep learning inference without sacrificing performance.

This post demonstrates how to use Amazon SageMaker to fine-tune a PyTorch BERT model and deploy it with Elastic Inference. The code from this post is available in the GitHub repo. For more information about BERT fine-tuning, see BERT Fine-Tuning Tutorial with PyTorch.

What is BERT?

First published in November 2018, BERT is a revolutionary model. During pretraining, one or more words in each sentence are intentionally masked; BERT takes these masked sentences as input and trains itself to predict the masked words. In addition, BERT uses a next sentence prediction task that pretrains text-pair representations.

BERT is a substantial breakthrough and has helped researchers and data engineers across the industry achieve state-of-the-art results in many NLP tasks. BERT offers a representation of each word conditioned on its context (the rest of the sentence). For more information about BERT, see BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

BERT fine-tuning

One of the biggest challenges data scientists face for NLP projects is lack of training data; you often have only a few thousand pieces of human-labeled text data for your model training. However, modern deep learning NLP tasks require a large amount of labeled data. One way to solve this problem is to use transfer learning.

Transfer learning is an ML method where a pretrained model, such as a pretrained ResNet model for image classification, is reused as the starting point for a different but related problem. By reusing parameters from pretrained models, you can save significant amounts of training time and cost.

BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million and 2,500 million words, respectively [1]. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, you can quickly fine-tune BERT for another use case with a relatively small amount of training data to achieve state-of-the-art results for common NLP tasks, such as text classification and question answering.

Solution overview

In this post, we walk through our dataset, the training process, and finally model deployment.

We use an Amazon SageMaker notebook instance for running the code. For more information about using Jupyter notebooks on Amazon SageMaker, see Using Amazon SageMaker Notebook Instances or Getting Started with Amazon SageMaker Studio.

The notebook and code from this post are available on GitHub. To run it yourself, clone the GitHub repository and open the Jupyter notebook file.

Problem and dataset

For this post, we use the Corpus of Linguistic Acceptability (CoLA), a dataset of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. In our notebook, we download and unzip the data using the following code:

if not os.path.exists("./cola_public_1.1.zip"):
    !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip
if not os.path.exists("./cola_public/"):
    !unzip cola_public_1.1.zip

In the training data, the only two columns we need are the sentence itself and its label:

df = pd.read_csv(
    "./cola_public/raw/in_domain_train.tsv",
    sep="t",
    header=None,
    usecols=[1, 3],
    names=["label", "sentence"],
)
sentences = df.sentence.values
labels = df.label.values

If we print out a few sentences, we can see how sentences are labeled based on their grammatical completeness. See the following code:

print(sentences[20:25])
print(labels[20:25])

["The professor talked us." "We yelled ourselves hoarse."
 "We yelled ourselves." "We yelled Harry hoarse."
 "Harry coughed himself into a fit."]
[0 1 0 0 1]

We then split the dataset for training and testing before uploading both to Amazon S3 for use later. The SageMaker Python SDK provides a helpful function for uploading to Amazon S3:

from sagemaker.session import Session
from sklearn.model_selection import train_test_split

train, test = train_test_split(df)
train.to_csv("./cola_public/train.csv", index=False)
test.to_csv("./cola_public/test.csv", index=False)

session = Session()
inputs_train = session.upload_data("./cola_public/train.csv", key_prefix="sagemaker-bert/training/data")
inputs_test = session.upload_data("./cola_public/test.csv", key_prefix="sagemaker-bert/testing/data")

Training script

For this post, we use the PyTorch-Transformers library, which contains PyTorch implementations and pretrained model weights for many NLP models, including BERT. See the following code:

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # Use the 12-layer BERT model, with an uncased vocab.
    num_labels=2,  # The number of output labels--2 for binary classification.
    output_attentions=False,  # Whether the model returns attentions weights.
    output_hidden_states=False,  # Whether the model returns all hidden-states.
)

Our training script should save model artifacts learned during training to a file path called model_dir, as stipulated by the Amazon SageMaker PyTorch image. Upon completion of training, Amazon SageMaker uploads model artifacts saved in model_dir to Amazon S3 so they are available for deployment. The following code is used in the script to save trained model artifacts:

# unwrap the model from its DataParallel/DistributedDataParallel wrapper, if any, before saving
model_2_save = model.module if hasattr(model, "module") else model
model_2_save.save_pretrained(save_directory=args.model_dir)

We save this script in a file named train_deploy.py, and put the file in a directory named code/, where the full training script is viewable.

Because PyTorch-Transformers isn’t included natively in Amazon SageMaker PyTorch images, we have to provide a requirements.txt file so that Amazon SageMaker installs this library for training and inference. A requirements.txt file is a text file that contains a list of items that are installed by using pip install. You can also specify the version of an item to install. To install PyTorch-Transformers, we add the following line to the requirements.txt file:

transformers==2.3.0

You can view the entire file in the GitHub repo, and it also goes into the code/ directory. For more information about the format of a requirements.txt file, see Requirements Files.

Training on Amazon SageMaker

We use Amazon SageMaker to train and deploy a model using our custom PyTorch code. The Amazon SageMaker Python SDK makes it easier to run a PyTorch script in Amazon SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. For more information about using this SDK with PyTorch, see Using PyTorch with the SageMaker Python SDK.

To start, we use the PyTorch estimator class to train our model. When creating the estimator, we make sure to specify the following:

  • entry_point – The name of the PyTorch script
  • source_dir – The location of the training script and requirements.txt file
  • framework_version – The PyTorch version we want to use

The PyTorch estimator supports multi-machine, distributed PyTorch training. To use this, we just set train_instance_count to be greater than 1. Our training script supports distributed training for only GPU instances.

After creating the estimator, we call fit(), which launches a training job. We use the Amazon S3 URIs we uploaded the training data to earlier. See the following code:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_deploy.py",
    source_dir="code",
    role=role,
    framework_version="1.3.1",
    py_version="py3",
    train_instance_count=2,
    train_instance_type="ml.p3.2xlarge",
    hyperparameters={
        "epochs": 1,
        "num_labels": 2,
        "backend": "gloo",
    }
)
estimator.fit({"training": inputs_train, "testing": inputs_test})

After training starts, Amazon SageMaker displays training progress (as shown in the following code). Epochs, training loss, and accuracy on test data are reported:

2020-06-10 01:00:41 Starting - Starting the training job...
2020-06-10 01:00:44 Starting - Launching requested ML instances......
2020-06-10 01:02:04 Starting - Preparing the instances for training............
2020-06-10 01:03:48 Downloading - Downloading input data...
2020-06-10 01:04:15 Training - Downloading the training image..
2020-06-10 01:05:03 Training - Training image download completed. Training in progress.
...
Train Epoch: 1 [0/3207 (0%)] Loss: 0.626472
Train Epoch: 1 [350/3207 (98%)] Loss: 0.241283
Average training loss: 0.5248292144022736
Test set: Accuracy: 0.782608695652174
...

We can monitor the training progress and make sure it succeeds before proceeding with the rest of the notebook.

Deployment script

After training our model, we host it on an Amazon SageMaker endpoint by calling deploy on the PyTorch estimator. The endpoint runs an Amazon SageMaker PyTorch model server. We need to configure two components of the server: model loading and model serving. We implement these two components in our inference script train_deploy.py. The complete file is available in the GitHub repo.

model_fn() is the function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn:

def model_fn(model_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BertForSequenceClassification.from_pretrained(model_dir)
    return model.to(device)

input_fn() deserializes and prepares the prediction input. In this use case, our request body is first serialized to JSON and then sent to the model serving endpoint. Therefore, in input_fn(), we first deserialize the JSON-formatted request body and return the input as a torch.tensor, as required for BERT:

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        sentence = json.loads(request_body)

        input_ids = []
        encoded_sent = tokenizer.encode(sentence, add_special_tokens=True)
        input_ids.append(encoded_sent)

        # pad shorter sentences up to MAX_LEN
        input_ids_padded = []
        for i in input_ids:
            while len(i) < MAX_LEN:
                i.append(0)
            input_ids_padded.append(i)
        input_ids = input_ids_padded

        # attention mask; 0 for padding tokens, 1 for real tokens
        attention_masks = [[int(token_id > 0) for token_id in sent] for sent in input_ids]

        # convert to PyTorch data types
        train_inputs = torch.tensor(input_ids)
        train_masks = torch.tensor(attention_masks)

        return train_inputs, train_masks

predict_fn() performs the prediction and returns the result. See the following code:

def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()
    input_id, input_mask = input_data
    input_id = input_id.to(device)
    input_mask = input_mask.to(device)
    with torch.no_grad():
        return model(input_id, token_type_ids=None, attention_mask=input_mask)[0]

We take advantage of the prebuilt Amazon SageMaker PyTorch image’s default support for serializing the prediction result.
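The model server also lets you override output_fn() if you need a custom response format; our script relies on the default, but a minimal sketch of such an override (assuming a JSON response built from the logits tensor returned by predict_fn()) could look like the following:

def output_fn(prediction, accept="application/json"):
    # prediction is the logits tensor returned by predict_fn(); serialize it as a JSON list
    if accept == "application/json":
        return json.dumps(prediction.cpu().numpy().tolist())
    raise ValueError("Unsupported accept type: {}".format(accept))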

Deploying the endpoint

To deploy our endpoint, we call deploy() on our PyTorch estimator object, passing in our desired number of instances and instance type:

predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

We then configure the predictor to use "application/json" for the content type when sending requests to our endpoint:

from sagemaker.predictor import json_deserializer, json_serializer

predictor.content_type = "application/json"
predictor.accept = "application/json"
predictor.serializer = json_serializer
predictor.deserializer = json_deserializer

Finally, we use the returned predictor object to call the endpoint:

result = predictor.predict("Somebody just left - guess who.")
print(np.argmax(result, axis=1))

[1]

The predicted class is 1, which is expected because the test sentence is a grammatically correct sentence.

Deploying the endpoint with Elastic Inference

Selecting the right instance type for inference requires deciding between different amounts of GPU, CPU, and memory resources. Optimizing for one of these resources on a standalone GPU instance usually leads to underutilization of other resources. Elastic Inference solves this problem by enabling you to attach the right amount of GPU-powered inference acceleration to your endpoint. In March 2020, Elastic Inference support for PyTorch became available for both Amazon SageMaker and Amazon EC2.

To use Elastic Inference, we must first convert our trained model to TorchScript. For more information, see Reduce ML inference costs on Amazon SageMaker for PyTorch models using Amazon Elastic Inference.

We first download the trained model artifacts from Amazon S3; the location of the model artifacts is estimator.model_data.
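The following is a minimal sketch of that download step, assuming the artifact is the model.tar.gz produced by the training job and extracting it into a local model/ directory (the directory name matches the from_pretrained() call that follows):

import os
import tarfile

import boto3

# estimator.model_data is an S3 URI such as s3://<bucket>/<prefix>/model.tar.gz
bucket, _, key = estimator.model_data.replace("s3://", "").partition("/")

os.makedirs("model", exist_ok=True)
boto3.client("s3").download_file(bucket, key, "model.tar.gz")
with tarfile.open("model.tar.gz") as tar:
    tar.extractall(path="model")

We then convert the model to TorchScript using the following code: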

import subprocess

import torch
from transformers import BertForSequenceClassification

model_torchScript = BertForSequenceClassification.from_pretrained("model/", torchscript=True)
device = "cpu"
# dummy input IDs and attention mask (MAX_LEN = 64) used only to trace the model
for_jit_trace_input_ids = [0] * 64
for_jit_trace_attention_masks = [0] * 64
for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])
for_jit_trace_masks = torch.tensor([for_jit_trace_attention_masks])

traced_model = torch.jit.trace(
    model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)]
)
torch.jit.save(traced_model, "traced_bert.pt")

subprocess.call(["tar", "-czvf", "traced_bert.tar.gz", "traced_bert.pt"])

Loading the TorchScript model and using it for prediction requires small changes in our model loading and prediction functions. We create a new script, deploy_ei.py, that is slightly different from the train_deploy.py script.

For model loading, we use torch.jit.load instead of the BertForSequenceClassification.from_pretrained call from before:

loaded_model = torch.jit.load(os.path.join(model_dir, "traced_bert.pt"))

For prediction, we take advantage of torch.jit.optimized_execution for the final return statement:

with torch.no_grad():
    with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
        return model(input_id, attention_mask=input_mask)[0]

The entire deploy_ei.py script is available in the GitHub repo. With this script, we can now deploy our model using Elastic Inference.
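The deploy call below assumes a PyTorchModel object named pytorch that wraps the traced model artifact. A minimal sketch of constructing it (the S3 key prefix here is an arbitrary choice, and the artifact is the traced_bert.tar.gz created earlier):

from sagemaker.pytorch import PyTorchModel

# upload the traced model archive created earlier to Amazon S3
traced_model_url = session.upload_data("traced_bert.tar.gz", key_prefix="sagemaker-bert/traced-model")

pytorch = PyTorchModel(
    model_data=traced_model_url,
    role=role,
    entry_point="deploy_ei.py",
    source_dir="code",
    framework_version="1.3.1",
    py_version="py3",
)

With that model object in hand, we call deploy(), specifying both the instance type and the Elastic Inference accelerator type: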

predictor = pytorch.deploy(
    initial_instance_count=1, 
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.xlarge"
)

We attach the Elastic Inference accelerator to our endpoint by using the accelerator_type="ml.eia2.xlarge" parameter.

Cleaning up resources

Remember to delete the Amazon SageMaker endpoint and the Amazon SageMaker notebook instance that you created to avoid incurring charges. See the following code:

predictor.delete_endpoint()

Conclusion

In this post, we used Amazon SageMaker to take BERT as a starting point and train a model for labeling sentences on their grammatical completeness. We then deployed the model to an Amazon SageMaker endpoint, both with and without Elastic Inference acceleration. You can use this solution to tune BERT in other ways, or use other pretrained models provided by PyTorch-Transformers. For more about using PyTorch with Amazon SageMaker, see Using PyTorch with the SageMaker Python SDK.

Reference

[1] Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision, pages 19–27.


About the Authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial services and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

David Ping is a Principal Solutions Architect with the AWS Solutions Architecture organization. He works with our customers to build cloud and machine learning solutions using AWS. He lives in the NY metro area and enjoys learning the latest machine learning technologies.

Lauren Yu is a Software Development Engineer at Amazon SageMaker. She works primarily on the SageMaker Python SDK, as well as toolkits for integrating PyTorch, TensorFlow, and MXNet with Amazon SageMaker. In her spare time, she enjoys playing viola in the Amazon Symphony Orchestra and Doppler Quartet.

Announcing the winners of the 2020 request for proposals in applied statistics

In February 2020, Facebook launched the Statistics for Improving Insights, Models, and Decisions request for proposals (RFP). This RFP was designed to support research that addresses challenges in applied statistics that have direct applications for producing more effective insights and decisions for data scientists and researchers. Today, we’re announcing the recipients of these research awards.

This program is a follow-up of the 2019 Statistics for Improving Insights and Decisions RFP, led by Facebook research teams working in Infra Data Science and Core Data Science. This year, we were particularly interested in the following topics:

  • Learning and evaluation under uncertainty
  • Statistical models of complex social processes
  • Causal inference with observational data
  • Efficient sampling and prevalence measurement
  • Design and analysis of experiments
  • Anomaly detection
  • Interpretability techniques for AI models

For descriptions of each topic, see the RFP application page.

“We are committed to enabling people to build safe and meaningful communities,” says Aude Hofleitner, Core Data Science Research Scientist Manager at Facebook. “This requires us to constantly innovate and push the state of the art of robust scientific methodologies. This commitment becomes all the more important in challenging economic and social times. We are looking forward to continuing to strengthen our engagements with the academic community and support research on these critical problems.”

“Facebook operates one of the largest and most sophisticated infrastructure among tech companies in the world,” says Xin Fu, Director of Research Data Science at Facebook. “We are excited about this opportunity to foster further innovation in research on statistical methodologies that can help improve the efficiency, reliability, and performance of large-scale infrastructure, from the detection of anomalies in services to advanced AI model interpretation techniques.”

We received 154 proposals from more than 107 universities. Thank you to all the researchers who took the time to submit a proposal, and congratulations to the award recipients.

Research award recipients

Adversarially robust temporal embedding models for social media integrity
Srijan Kumar and Duen Horng “Polo” Chau (Georgia Tech Research Corporation)

Learning from comparisons
Stratis Ioannidis, Deniz Erdogmus, and Jennifer Dy (Northeastern University)

Persistent activity mining in continually evolving networks
Danai Koutra (University of Michigan)

Personalized explanation of recommendations via natural language generation
Julian McAuley (University of California, San Diego)

Running experiments with unobservable outcomes: An invariant perspective
Andrea Montanari (Stanford University)

Towards transfer causal learning for average treatment effects
Bin Yu (University of California, Berkeley)

Exploring Faster Screening with Fewer Tests via Bayesian Group Testing

Posted by Marco Cuturi and Jean-Philippe Vert, Research Scientists, Google Research, Brain Team

How does one find a needle in a haystack? At the turn of World War II, that question took on a very concrete form when doctors wondered how to efficiently detect diseases among those who had been drafted into the war effort. Inspired by this challenge, Robert Dorfman, a young statistician at that time (later to become a Harvard professor of economics), proposed in a seminal paper a 2-stage approach to detect infected individuals, whereby individual blood samples are first pooled in groups of four before being tested for the presence or absence of a pathogen. If a group is negative, then it is safe to assume that everyone in the group is free of the pathogen. In that case, the reduction in the number of required tests is substantial: an entire group of four people has been cleared with a single test. On the other hand, if a group tests positive, which is expected to happen rarely if the pathogen’s prevalence is small, at least one person within that group must be positive; therefore, a few more tests to determine the infected individuals are needed.

Left: Sixteen individual tests are required to screen 16 people — only one person’s test is positive, while 15 return negative. Right: Following Dorfman’s procedure, samples are pooled into four groups of four individuals, and tests are executed on the pooled samples. Because only the second group tests positive, 12 individuals are cleared and only those four belonging to the positive group need to be retested. This approach requires only eight tests, instead of the 16 needed for an exhaustive testing campaign.
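To make the savings concrete, the following is a small simulation sketch of Dorfman’s two-stage procedure under the idealized assumption of perfectly reliable tests (the prevalence and group size below are illustrative choices):

import random

def dorfman_tests(infected, group_size=4):
    """Count the tests used by Dorfman's two-stage procedure with noiseless tests."""
    tests = 0
    for start in range(0, len(infected), group_size):
        group = infected[start:start + group_size]
        tests += 1          # one pooled test for the whole group
        if any(group):      # positive pool: retest each member individually
            tests += len(group)
    return tests

random.seed(0)
population, prevalence = 10000, 0.02
infected = [random.random() < prevalence for _ in range(population)]
print(dorfman_tests(infected), "tests instead of", population)  # roughly 3,300 instead of 10,000

With a 2% prevalence and groups of four, the expected number of tests per person is 1/4 + 1 − 0.98⁴ ≈ 0.33, roughly a threefold reduction over individual testing.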

Dorfman’s proposal triggered many follow-up works with connections to several areas in computer science, such as information theory, combinatorics or compressive sensing, and several variants of his approach have been proposed, notably those leveraging binary splitting or side knowledge on individual infection probability rates. The field has grown to the extent that several sub-problems are recognized and deserving of an entire literature on their own. Some algorithms are tailored for the noiseless case in which tests are perfectly reliable, whereas some consider instead the more realistic case where tests are noisy and may produce false negatives or positives. Finally, some strategies are adaptive, proposing groups based on test results already observed (including Dorfman’s, since it proposes to re-test individuals that appeared in positive groups), whereas others stick to a non-adaptive setting in which groups are known beforehand or drawn at random.

In “Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design”, we present an approach to group testing that can operate in a noisy setting (i.e., where tests can be mistaken) to decide adaptively by looking at past results which groups to test next, with the goal of converging on a reliable detection as quickly, and with as few tests, as possible. Large scale simulations suggest that this approach may result in significant improvements over both adaptive and non-adaptive baselines, and is far more efficient than individual tests when disease prevalence is low. As such, this approach is particularly well suited for situations that require large numbers of tests to be conducted with limited resources, as may be the case for pandemics, such as that corresponding to the spread of COVID-19. We have open-sourced the code to the community through our GitHub repo.

Noisy and Adaptive Group Testing in a Non-Asymptotic Regime
A group testing strategy is an algorithm that is tasked with guessing who, among a list of n people, carries a particular pathogen. To do so, the strategy provides instructions for pooling individuals into groups. Assuming a laboratory can execute k tests at a time, the strategy will form a k × n pooling matrix that defines these groups. Once the tests are carried out, the results are used to decide whether sufficient information has been gathered to determine who is or is not infected, and if not, how to form new groups for another round of testing.

We designed a group testing approach for the realistic setting where the testing strategy can be adaptive and where tests are noisy — the probability that the test of an infected sample is positive (sensitivity) is less than 100%, as is the specificity, the probability that a non-infected sample returns negative.

Screening More People with Fewer Tests Using Bayesian Optimal Experimental Design
The strategy we propose proceeds the way a detective would investigate a case. They first form several hypotheses about who may or may not be infected, using evidence from all tests (if any) that have been carried out so far and prior information on the infection rate (a). Using these hypotheses, our detectives produce an actionable item to continue the investigation, namely a next wave of groups that may help in validating or invalidating as many hypotheses as possible (b), and then loop back to (a) until the set of plausible hypotheses is small enough to unambiguously identify the target of the search. More precisely,

  1. Given a population of n people, an infection state is a binary vector of length n that describes who is infected (marked with a 1), and who is not (marked with a 0). At a certain time, a population is in a given state (most likely a few 1’s and mostly 0’s). The goal of group testing is to identify that state using as few tests as possible. Given a prior belief on the infection rate (the disease is rare) and test results observed so far (if any), we expect that only a small share of those infection states will be plausible. Rather than evaluating the plausibility of all 2ⁿ possible states (an extremely large number even for small n), we resort to a more efficient method to sample plausible hypotheses using a sequential Monte Carlo (SMC) sampler. Although quite costly by common standards (a few minutes using a GPU in our experimental setup), we show in this work that SMC samplers remain tractable even for large n, opening new possibilities for group testing. In short, in return for a few minutes of computations, our detectives get an extensive list of thousands of relevant hypotheses that may explain tests observed so far.
  2. Equipped with a relevant list of hypotheses, our strategy proceeds, as detectives would, by selectively gathering additional evidence. If k tests can be carried out at the next iteration, our strategy will propose to test k new groups, which are computed using the framework of Bayesian optimal experimental design. Intuitively, if k=1 and one can only propose a single new group to test, there would be clear advantage in building that group such that its test outcome is as uncertain as possible, i.e., with a probability that it returns positive as close to 50% as possible, given the current set of hypotheses. Indeed, to progress in an investigation, it is best to maximize the surprise factor (or information gain) provided by new test results, as opposed to using them to confirm further what we already hold to be very likely. To generalize that idea to a set of k>1 new groups, we score this surprise factor by computing the mutual information of these “virtual” group tests vs. the distribution of hypotheses. We also consider a more involved approach that computes the expected area under the ROC curve (AUC) one would obtain from testing these new groups using the distribution of hypotheses. The maximization of these two criteria is carried out using a greedy approach, resulting in two group selectors, GMIMAX and GAUCMAX (greedy maximization of mutual information or AUC, respectively).

The interaction between a laboratory (wet_lab) carrying out testing, and our strategy, composed of a sampler and a group selector, is summarized in the following drawing, which uses names of classes implemented in our open source package.

Our group testing framework describes an interaction between a testing environment, the wet_lab, whose pooled test results are used by the sampler to draw thousands of plausible hypotheses on the infection status of all individuals. These hypotheses are then used by an optimization procedure, group_selector, that figures out what groups may be the most relevant to test in order to narrow down on the true infection status. Once formed, these new groups are then tested again, closing the loop. At any point in the procedure, the hypotheses formed by the sampler can be averaged to obtain the average probability of infection for each patient. From these probabilities, a decision on whether a patient is infected or not can be done by thresholding these probabilities at a certain confidence level.
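As a simplified sketch (not the open-source implementation itself), scoring a candidate group by the expected information gain of its test outcome, given weighted hypothesis samples and assumed sensitivity/specificity values, could look like this:

import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def group_information_gain(hypotheses, weights, group, sensitivity=0.85, specificity=0.97):
    """Mutual information between a pooled test of `group` and the sampled hypotheses.

    hypotheses: (num_samples, num_people) binary array of sampled infection states
    weights:    (num_samples,) normalized weights of those samples
    group:      indices of the individuals pooled into one test
    """
    # probability that the pooled test is positive under each sampled hypothesis
    group_infected = hypotheses[:, group].any(axis=1)
    p_pos_given_state = np.where(group_infected, sensitivity, 1 - specificity)
    # I(test; state) = H(test) - E[H(test | state)]
    p_pos = np.dot(weights, p_pos_given_state)
    return binary_entropy(p_pos) - np.dot(weights, binary_entropy(p_pos_given_state))

# toy usage: 1,000 equally weighted hypotheses over 70 people, scoring one candidate group
rng = np.random.default_rng(0)
hypotheses = rng.random((1000, 70)) < 0.02
weights = np.full(1000, 1 / 1000)
print(group_information_gain(hypotheses, weights, group=[0, 1, 2, 3, 4]))

A greedy selector would evaluate many candidate groups this way and keep the k highest-scoring ones for the next round of testing.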

Benchmarking
We benchmarked our two strategies GMIMAX and GAUCMAX against various baselines in a wide variety of settings (infection rates, test noise levels), reporting performance as the number of tests increases. In addition to simple Dorfman strategies, the baselines we considered included a mix of non-adaptive strategies (origami assays, random designs) complemented at later stages with the so-called informative Dorfman approach. Our approaches significantly outperform the others in all settings.

We executed 5000 simulations on a sample population of 70 individuals with an infection rate of 2%. We have assumed sensitivity/specificity values of 85% / 97% for tests with groups of maximal size 10, which are representative of current PCR machines. This figure demonstrates that our approach outperforms the other baselines with as few as 24 tests (up to 8 tests used in 3 cycles), including both adaptive and non-adaptive varieties, and performs significantly better than individual tests (plotted in the sensitivity/specificity plane as a hexagon, requiring 70 tests), highlighting the savings potential offered by group testing. See preprint for other setups.

Conclusion
Screening a population for a pathogen is a fundamental problem, one that we currently face during the current COVID-19 epidemic. Seventy years ago, Dorfman proposed a simple approach currently adopted by various institutions. Here, we have proposed a method to extend the basic group testing approach in several ways. Our first contribution is to adopt a probabilistic perspective, and form thousands of plausible hypotheses of infection distributions given test outcomes, rather than trust test results to be 100% reliable as Dorfman did. This perspective allows us to seamlessly incorporate additional prior knowledge on infection, such as when we suspect some individuals to be more likely than others to carry the pathogen, based for instance on contact tracing data or answers to a questionnaire. This provides our algorithms, which can be compared to detectives investigating a case, the advantage of knowing what are the most likely infection hypotheses that agree with prior beliefs and tests carried out so far. Our second contribution is to propose algorithms that can take advantage of these hypotheses to form new groups, and therefore direct the gathering of new evidence, to narrow down as quickly as possible to the “true” infection hypothesis, and close the case with as little testing effort as possible.

Acknowledgements
We would like to thank our collaborators on this work, Olivier Teboul, in particular, for his help preparing figures, as well as Arnaud Doucet and Quentin Berthet. We also thank Kevin Murphy and Olivier Bousquet (Google) for their suggestions at the earliest stages of this project, as well as Dan Popovici for his unwavering support pushing this forward; Ignacio Anegon, Jeremie Poschmann and Laurent Tesson (INSERM) for providing us background information on RT-PCR tests and Nicolas Chopin (CREST) for giving guidance on his work to define SMCs for binary spaces.

Facebook uses Amazon EC2 to evaluate the Deepfake Detection Challenge

In October 2019, AWS announced that it was working with Facebook, Microsoft, and the Partnership on AI on the first Deepfake Detection Challenge. Deepfake algorithms are the same as the underlying technology that has given us realistic animation effects in movies and video games. Unfortunately, those same algorithms have been used by bad actors to blur the distinction between reality and fiction. Deepfake videos result from using artificial intelligence to manipulate audio and video to make it appear as though someone did or said something they didn’t. For more information about deepfake content, see The Partnership on AI Steering Committee on AI and Media Integrity.

In machine learning (ML) terms, the Generative Adversarial Network (GAN) algorithm has been the most popular algorithm for creating deepfakes. GANs use a pair of neural networks: a generative network that produces candidates by adding noise to the original data, and a discriminative network that evaluates the data until it determines they aren’t synthesized. GANs match one network against the other in an adversarial manner to generate new, synthetic instances of data that can pass for real data. This means the deepfake is indistinguishable from a normal dataset.

The goal of this challenge was to incentivize researchers around the world to build innovative methods that can help detect deepfakes and manipulated media. The competition, which ended on March 31, 2020, was popular amongst the Kaggle data science community. The deepfake project emphasized the benefits of scaling and optimizing the cost of deep learning batch inference. Once the competition was complete, the team at Facebook hosted the deepfake competition data on AWS and made it available to the world, encouraging researchers to keep fighting this problem.

There were over 4,200 total submissions from over 2,300 teams worldwide. The participating submissions were scored with a log loss function, where a smaller score is better (for more information about scoring, see the contest rules).
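In its standard binary form, this log loss over n videos, where y_i is 1 if video i is a deepfake (0 otherwise) and ŷ_i is the predicted probability that it is, can be written as:

\mathrm{LogLoss} = -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\right]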

Four groups of datasets were associated with the competition:

  • Training – The participating teams used this set for training their model. It consisted of 470 GB of video files, with real and fake labels for each video.
  • Public validation – Consisted of a sample of 400 videos from the test dataset.
  • Public test – Used by the Kaggle platform to compute the public leaderboard.
  • Private test – Held by the Facebook team, the competition host, outside of the Kaggle competition platform and used for scoring the competition. The results from using the private test set were displayed on the competition’s private leaderboard. This set contains videos with a similar format and nature as the training and public validation and test sets, but contains real, organic videos as well as deepfakes.

After the competition deadline, Kaggle transferred the code for the two final submissions from each team to the competition host. The hosting team re-ran the submission code against this private dataset and returned prediction submissions to Kaggle to compute the final private leaderboard scores. The submissions were based on two types of compute virtual machines (VMs): GPU-based and CPU-based. Most of the submissions were GPU-based.

The competition hosting team at Facebook recognized several challenges in conducting an evaluation given the unexpectedly large number of participants. With over 4,200 total submissions and 9 GPU hours of runtime required for each using a p3.2xl Amazon Elastic Compute Cloud (Amazon EC2) P3 instance, they would need an estimated 42,000 GPU compute hours (or almost 5 years’ worth of compute hours) to complete the competition. To make the project even more challenging, they needed to do 5 years of GPU compute in 3 weeks.

Given the tight deadline, the host team had to address several constraints to complete the evaluation within the time and budget allotted.

Operational efficiency

To meet the tight timeframe for the competition and keep the workload manageable for a small team, the solution had to be low-code. To address the low-code requirement, they chose AWS Batch for scheduling and scaling out the compute workload. The following diagram illustrates the solution architecture.

AWS Batch was originally designed for developers, scientists, and engineers to easily and efficiently manage large numbers of batch computing jobs on AWS with little coding or cloud infrastructure deployment experience. There’s no need to install and manage batch computing software or server clusters, which allows you to focus on analyzing and solving problems. AWS Batch provides scheduling and scales out batch computing workloads across the full range of AWS compute services, such as Amazon EC2 and Spot Instances. Furthermore, AWS Batch has no additional charges for managing cluster resources. In this use case, the host simply submitted 4,200 compute jobs, one for each registered Kaggle submission container, each of which ran for about 9 hours. Using a cluster of instances, all jobs were complete in less than three weeks.
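As an illustration only (the queue and job definition names here are hypothetical), submitting one evaluation job per submission container with the AWS SDK for Python might look like the following sketch:

import boto3

batch = boto3.client("batch")

# hypothetical queue/job definition names; one job per Kaggle submission container
response = batch.submit_job(
    jobName="dfdc-submission-0001",
    jobQueue="dfdc-gpu-queue",
    jobDefinition="dfdc-evaluation-gpu",
    containerOverrides={
        "command": ["python", "evaluate.py", "--submission", "submission-0001"],
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
    },
)
print(response["jobId"])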

Elasticity

The tight timeframes for the competition, as well as requiring those instances for only a short period, speaks to the need for elasticity in compute. For example, the team estimated they would need a minimum of 85 Amazon EC2 P3 GPUs running in parallel around the clock to complete the evaluation. To account for restarts and other issues causing lost time, there was the potential for an additional 50% in capacity. Facebook was able to quickly scale up the number of GPUs and CPUs needed for the evaluation and scale them down when finished, only paying for what they used. This was much more efficient in terms of budget and operations effort than acquiring, installing, and configuring the compute on-premises.

Security

Security was another significant concern. Submissions from such a wide array of participants could contain viruses, malware, bots, or rootkits. Running these containers in a sandboxed, cloud environment avoided that risk. If the evaluation environment was exposed to various infectious agents, the environment could be terminated and easily rebuilt without exposing any production systems to downtime or data loss.

Privacy and confidentiality

Privacy and confidentiality are closely related to the security concerns. To address those concerns, all the submissions and data were held in a single, closely held AWS account with private virtual private clouds (VPCs) and restrictive permissions using AWS Identity and Access Management (IAM). To ensure privacy and confidentiality of the submitted models, and fairness in grading, a single, dedicated engineer was responsible for conducting the evaluation without looking into any of the Docker images submitted by the various teams.

Cost

Cost was another important constraint the team had to consider. A rough estimate of 42,000 hours of Amazon EC2 P3 instance runtime would cost about $125,000.

To lower the cost of GPU compute, the host team determined that the Amazon EC2 G4 (NVIDIA Tesla T4 GPUs) instance type was more cost-effective for this workload than the P3 instance (NVIDIA V100 GPUs). Amongst the GPU instances in the cloud, Amazon EC2 G4 instances are cost-effective and versatile GPU instances for deploying ML models.

These instances are optimized for ML application deployments (inference), such as image classification, object detection, recommendation engines, automated speech recognition, and language translation, which push the boundary on AI innovation and latency.

The host team completed a few test runs with the G4 instance type. The test runtime for each submission resulted in a little over twice the comparative runtime of the P3 instances, resulting in the need for approximately 90,000 compute hours. The G4 instances cost up to 83% less per hour than the P3 instances. Even with longer runtimes per job with the G4 instances, the total compute cost decreased from $125,000 to just under $50,000. The following table illustrates the cost-effectiveness of the G4 instance type per inference.

                     p3.2xl      g4dn.8xl
Runtime (hours)      90,000      25,000
Cost (USD)           $125,000    $50,000
Cost per inference   $30         $12

The host team shared that many of the submission runs completed with less compute time than originally projected. The initial projection was based upon early model submissions, which were larger than the average size for all models submitted. About 80% of the runs took advantage of the G4 instance type, while some had to be run on the P3 instances due to slight differences in available GPU memory between the two instance types. The final numbers were 25,000 G4 (GPU) compute hours, 5,000 C4 (CPU) compute hours, and 800 P3 (GPU) compute hours, totaling $20,000 in compute cost. After approximately two weeks of around-the-clock evaluation, the host team completed the challenging task of evaluating all the submissions early and consumed less than half of the $50,000 estimate.

Conclusion

The host team was able to complete a full evaluation of the over 4,200 submissions in less time than was available, while meeting the grading fairness criteria and coming in under budget. The team also successfully replicated the evaluation environment with a success rate of 94%, which is high for a two-stage competition.

Software projects are often risk-prone due to technological uncertainties, and perhaps even more so due to inherent complexity and constraints. The breadth and depth of AWS services running on Amazon EC2 allow you to solve your unique challenges by reducing technology uncertainty. In this case, the Facebook team completed the deepfake evaluation challenge on time and under budget with only one software engineer. The engineer started by selecting a low-code solution, AWS Batch, which is a proven service for even larger-scale HPC workloads, and reduced the evaluation cost by two-thirds through the choice of the AI inference-optimized G4 EC2 instance type.

AWS believes there’s no one solution to a problem. Solutions often consist of multiple and flexible building blocks from which you can craft solutions that meet your needs and priorities.


About the Authors

Wenming Ye is an AI and ML specialist architect at Amazon Web Services, helping researchers and enterprise customers use cloud-based machine learning services to rapidly scale their innovations. Previously, Wenming had a diverse R&D experience at Microsoft Research, SQL engineering team, and successful startups.

Tim O’Brien is a Senior Solutions Architect at AWS focused on Machine Learning and Artificial Intelligence. He has over 30 years of experience in information technology, security, and accounting. In his spare time, he likes hiking, climbing, and skiing with his wife and two dogs.

30 years of family videos in an AI archive

My dad got his first video camera the day I was born nearly three decades ago. “Say hello to the camera!” are the first words he caught on tape, as he pointed it at a red, puffy baby (me) in a hospital bassinet. The clips got more embarrassing from there, as he continued to film through many diaper changes, temper tantrums and—worst of all—puberty.

Most of those potential blackmail tokens sat trapped on miniDV tapes or scattered across SD cards until two years ago when my dad uploaded them all to Google Drive. Theoretically, since they were now stored in the cloud, my family and I could watch them whenever we wanted. But with more than 456 hours of footage, watching it all would have been a herculean effort. You can only watch old family friends open Christmas gifts so many times. So, as an Applied AI Engineer, I got down to business and built an AI-powered searchable archive of our family videos.

If you’ve ever used Google Photos, you’ve seen the power of using AI to search and organize images and videos. The app uses machine learning to identify people and pets, as well as objects and text in images. So, if I search “pool” in the Google Photos app, it’ll show me all the pictures and videos I ever took of pools.

But for this project, I needed a couple of features Photos doesn’t (yet!) support. First, because my dad’s first camera recorded footage to miniDV tapes, those videos were uploaded as meaty, two-hour-long movies with no useful metadata. Instead, my dad would start a clip by saying, “let me put a date on the screen here…” and a little white text snippet would appear in the bottom right corner of the frame. In between shots on a single reel, he’d say: “Say goodbye, I’m going to fade out now.” I would scream, “NO, DON’T FADE OUT,” while the screen faded to black. So, my first step was to use machine learning to automatically parse the date shown on the screen, and split the single long video into shorter clips after each fade out.

In this picture, you can see the timestamp shown on screen. Using the Vision API, I could extract it to sort my videos by date.

For this, I turned to the Video Intelligence API, a Google Cloud tool that lets developers analyze videos with machine learning. It allows you to replicate many of the features found in the Google Photos app—like tagging objects in images and recognizing on-screen text—and a whole lot more. For example, the API’s shot change detection feature automatically finds the timestamps in videos where a scene changes, which allowed me to split those long videos into smaller chunks.
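As a rough sketch of what those calls look like with the Cloud client library (the exact client surface varies by library version, and the bucket path here is just a placeholder):

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-family-videos/tape-01.mp4",  # placeholder path
        "features": [
            videointelligence.Feature.SHOT_CHANGE_DETECTION,
            videointelligence.Feature.TEXT_DETECTION,
        ],
    }
)
result = operation.result(timeout=600)

annotations = result.annotation_results[0]
for shot in annotations.shot_annotations:
    print("shot:", shot.start_time_offset, "->", shot.end_time_offset)
for text in annotations.text_annotations:
    print("on-screen text:", text.text)  # e.g., the date my dad typed onto the screen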

Using the label detection feature, I could search for all sorts of different events, like “bridal shower,” “wedding,” “bat and ball games” and “baby.” By searching “performance,” I was able to finally find one of my life’s proudest accomplishments on tape—a starring role singing “It’s Not Easy Being Green” in my kindergarten’s production of the Sesame Street musical.

My starring role as Kermit the Frog in my school’s Sesame Street musical. The Video Intelligence API tagged it as “performance”.  

The Video Intelligence API’s real “killer feature” for me was its ability to do audio transcription. By transcribing my videos, I was able to query clips by what people said in them. I could search for specific names (“Scott,” “Dale,” “grandma”), proper nouns (“Chuck E Cheese”, “Pokemon”), and unique phrases. By searching “first steps,” I found a clip of my dad saying, “Here she comes… plunk. That’s the first time she’s taken major steps” alongside a video of me managing, just barely, to waddle along.

My first steps that I was able to find with the Video Intelligence API’s Transcription feature. Here, my dad says, “…this is the first time she’s taken major steps.”

In the end, machine learning helped me build exactly the kind of archive I wanted—one that let me search my family videos by memories, not timestamps.

P.S. Want to see how I built it? Check out my technical blog post or catch the video on the Cloud YouTube channel.

Using machine learning in the browser to lip sync to your favorite songs

Posted by Pohung Chen, Creative Technologist, Google Partner Innovation

Today we are releasing LipSync, a web experience that lets you lip sync to music live in the web browser. LipSync was created as a playful way to demonstrate the facemesh model for TensorFlow.js. We partnered with Australian singer Tones and I to let you lip sync to Dance Monkey in this demonstration.

Using TensorFlow.js FaceMesh

The TensorFlow.js FaceMesh model provides a real-time high density estimate of key points of your facial expression using only a webcam and on device machine learning – meaning no data ever leaves your machine for inference. We essentially use the key points around the mouth and lips to estimate how well you synchronize to the lyrics of the Dance Monkey song.

Determining Correctness

When first testing the demo, many people assumed we used a complex lip reading algorithm to match the mouth shapes with lyrics. Lip reading is quite difficult to achieve, so we came up with a simpler solution. We capture a frame by frame recording of the “correct” mouth shapes lined up with the music, and then when the user is playing the game, we compare the mouth shapes to the pre-recorded baseline.

Measuring the shape of your mouth

What is a mouth shape? There are many different ways to measure the shape of your mouth. We needed a technique that allows the user to move their head around while singing and is relatively forgiving in different mouth shapes, sizes, and distance to the camera.

Mouth Ratio

One way of comparing mouth shapes is to use the width to height ratio of your mouth. For example, if your mouth is closed and forming the “mmm” sound, you have a high width to height ratio. If your mouth is open in an “ooo” sound, your mouth will be closer to a 1:1 width to height ratio.
While this method mostly works, there were still edge cases that made the detection algorithm less robust, so we explored another method, Hu Moments, explained below.
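Before moving on, here is a minimal Python sketch of that ratio heuristic (the facemesh landmark indices used for the mouth corners and inner lips are assumptions for illustration):

import numpy as np

# assumed facemesh landmark indices: mouth corners and inner-lip midpoints
LEFT_CORNER, RIGHT_CORNER, UPPER_LIP, LOWER_LIP = 61, 291, 13, 14

def mouth_ratio(keypoints):
    """Width-to-height ratio of the mouth from an array of facemesh (x, y, z) keypoints."""
    width = np.linalg.norm(np.asarray(keypoints[LEFT_CORNER]) - np.asarray(keypoints[RIGHT_CORNER]))
    height = np.linalg.norm(np.asarray(keypoints[UPPER_LIP]) - np.asarray(keypoints[LOWER_LIP]))
    return width / max(height, 1e-6)  # large for a closed "mmm", closer to 1 for an open "ooo"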

OpenCV matchShapes Hu Moments

In the OpenCV library, there is a matchShapes function which compares contours and returns a similarity score. Underneath the hood, the matchShapes function uses a technique called Hu Moments which provides a set of numbers calculated using central moments that are invariant to image transformations. This allowed us to compare shapes regardless of translation, scale, and rotation. So the user can freely rotate their head without impacting the detection of the mouth shape itself.

We use this in addition to the mouth shape above to determine how closely the shape of the mouth contours match.
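A Python sketch of that comparison with OpenCV, assuming the player’s and reference mouth contours have already been extracted as arrays of (x, y) points:

import cv2
import numpy as np

def mouth_shape_distance(player_contour, reference_contour):
    """Lower values mean more similar mouth shapes, regardless of translation, scale, or rotation."""
    player = np.asarray(player_contour, dtype=np.float32)
    reference = np.asarray(reference_contour, dtype=np.float32)
    return cv2.matchShapes(player, reference, cv2.CONTOURS_MATCH_I1, 0.0)

# the Hu Moments that matchShapes uses internally can also be inspected directly:
# hu_moments = cv2.HuMoments(cv2.moments(np.asarray(reference_contour, dtype=np.float32))).flatten()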

Visual and Audio Feedback

In our original prototype, we wanted to create immediate audible feedback on how well the user is doing. We separated out the vocal track from the rest of the song and changed its volume based on real-time user performance score of their mouth shapes.

Vocal Track
Instrumental Track

This allowed us to create the effect such that if you stop lip syncing to the song, the lyrical portion of the song stops playing (but the background music continues to play).

While this was a fun way to demonstrate the mouth shape matching algorithm, it still missed that satisfying rush of joy you get when you hit the right notes during karaoke or nail a long sequence of moves just right in arcade rhythm games.

We started by adding a real-time score that accumulated over time and was shown to the player as they played the game. In our initial testing, this didn’t work as well as we had hoped. It wasn’t clear what the score represented, and the exact numbers weren’t particularly meaningful. We also wanted the user to focus their attention on the lyrics and the center of the screen as opposed to a score off to the side.

So we went with a different approach, preferring to lean on visual effects overlaid on top of the player’s face as they lip synced to the music, with colors indicating how well the player was doing.

Try Lip Sync yourself!

The TensorFlow.js FaceMesh model enables web-based, playful, interactive experiences that go beyond basic face filters, and with a little bit of creative thinking, we could get a lip sync experience without needing the full complexity of a lip reading ML model.

So go ahead and try our live demo yourself right now. You can also check out an example of how the mouth shape matching works in this open source repo.

We would also like to give a special shout out to Kiattiyot Panichprecha, Bryan Tanaka, KC Chung, Dave Bowman, Matty Burton, Roger Chang, Ann Yuan, Sandeep Gupta, Miguel de Andrés-Clavera, Alessandra Donati, and Ethan Converse for their help in bringing this experience to life, and to thank the MediaPipe team who designed FaceMesh.

Build a work-from-home posture tracker with AWS DeepLens and GluonCV

Working from home can be a big change to your ergonomic setup, which can make it hard for you to keep a healthy posture and take frequent breaks throughout the day. To help you maintain good posture and have fun with machine learning (ML) in the process, this post shows you how to build a posture tracker project with AWS DeepLens, the AWS programmable video camera for developers to learn ML. You will learn how to use the latest pose estimation ML models from GluonCV to map out body points from profile images of yourself working from home and send yourself text message alerts whenever your code detects bad posture. GluonCV is a computer vision library built on top of the Apache MXNet ML framework that provides off-the-shelf ML models from state-of-the-art deep learning research. With the ability to run GluonCV models on AWS DeepLens, engineers, researchers, and students can quickly prototype products, validate new ideas, and learn computer vision. In addition to detecting bad posture, you will learn to analyze your posture data over time with Amazon QuickSight, an AWS service that lets you easily create and publish interactive dashboards from your data.

This tutorial includes the following steps:

  1. Experiment with AWS DeepLens and GluonCV
  2. Classify postures with the GluonCV pose key points
  3. Deploy pre-trained GluonCV models to AWS DeepLens
  4. Send text message reminders to stretch when the tracker detects bad posture
  5. Visualize your posture data over time with Amazon QuickSight

The following diagram shows the architecture of our posture tracker solution.

Prerequisites

Before you begin this tutorial, make sure you have the following prerequisites:

Experimenting with AWS DeepLens and GluonCV

Normally, AWS developers use Jupyter notebooks hosted in Amazon SageMaker to experiment with GluonCV models. Jupyter notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. In this tutorial you are going to create and run Jupyter notebooks directly on an AWS DeepLens device, just like any other Linux computer, in order to enable rapid experimentation.

Starting with AWS DeepLens software version 1.4.5, you can run GluonCV pretrained models directly on AWS DeepLens. To check the version number and update your software, go to the AWS DeepLens console, choose your device under Devices, and look at the Device status section. You should see a version number similar to the following screenshot.

To start experimenting with GluonCV models on DeepLens, complete the following steps:

  1. SSH into your AWS DeepLens device.

To do so, you need the IP address of AWS DeepLens on the local network. To find the IP address, select your device on the AWS DeepLens console. Your IP address is listed in the Device Details section.

You also need to make sure that SSH is enabled for your device. For more information about enabling SSH on your device, see View or Update Your AWS DeepLens 2019 Edition Device Settings.

Open a terminal application on your computer. SSH into your DeepLens by entering the following code into your terminal application:

ssh aws_cam@<YOUR_DEEPLENS_IP>

When you see a password prompt, enter the SSH password you chose when you set up SSH on your device.

  1. Install Jupyter notebook and GluonCV on your DeepLens. Enter each of the following commands one at a time in the SSH terminal. Press Enter after each line entry.
    sudo python3 -m pip install --upgrade pip
    
    sudo python3 -m pip install notebook
    
    sudo python3.7 -m pip install ipykernel
    
    python3.7 -m ipykernel install  --name 'Python3.7' --user
    
    sudo python3.7 -m pip install gluoncv
    

  2. Generate a default configuration file for Jupyter notebook:
    jupyter notebook --generate-config

  3. Edit the Jupyter configuration file in your SSH session to allow access to the Jupyter notebook running on AWS DeepLens from your laptop.
    nano ~/.jupyter/jupyter_notebook_config.py

  4. Add the following lines to the top of the config file:
    c.NotebookApp.ip = '0.0.0.0'
    c.NotebookApp.open_browser = False
    

  5. Save the file (if you are using the nano editor, press Ctrl+X and then Y).
  6. Open up a port in the AWS DeepLens firewall to allow traffic to Jupyter notebook. See the following code:
    sudo ufw allow 8888

  7. Run the Jupyter notebook server with the following code:
    jupyter notebook

    You should see output like the following screenshot:

  8. Copy the link and replace the host portion (DeepLens or 127.0.0.1) with your AWS DeepLens IP address. See the following code:
    http://(DeepLens or 127.0.0.1):8888/?token=sometoken

    For example, the URL based on the preceding screenshot is http://10.0.0.250:8888/?token=7adf9c523ba91f95cfc0ba3cacfc01cd7e7b68a271e870a8.

  9. Enter this link into your laptop web browser.

You should see something like the following screenshot.

  1. Choose New to create a new notebook.
  2. Choose Python3.7.

Capturing a frame from your camera

To capture a frame from the camera, first make sure you aren’t running any projects on AWS DeepLens.

  1. On the AWS DeepLens console, go to your device page.
  2. If a project is deployed, you should see a project name in the Current Project pane. Choose Remove Project if there is a project deployed to your AWS DeepLens.
  3. Now go back to the Jupyter notebook running on your AWS DeepLens and enter the following code into your first code cell:
    import awscam
    import cv2
    
    ret,frame = awscam.getLastFrame()
    print(frame.shape)
    

  4. Press Shift+Enter to execute the code inside the cell.

Alternatively, you can press the Run button in the Jupyter toolbar as shown in the screenshot below:

You should see the size of the image captured by AWS DeepLens similar to the following text:

(1520, 2688, 3)

The three numbers show the height, width, and number of color channels (red, green, blue) of the image.

  1. To view the image, enter the following code in the next code cell:
    %matplotlib inline
    from matplotlib import pyplot as plt
    plt.imshow(frame)
    plt.show()
    

    You should see an image similar to the following screenshot:

Detecting people and poses

Now that you have an image, you can use GluonCV pre-trained models to detect people and poses. For more information, see Predict with pre-trained Simple Pose Estimation models from the GluonCV model zoo.

  1. In a new code cell, enter the following code to import the necessary dependencies:
    import mxnet as mx
    from gluoncv import model_zoo, data, utils
    from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord
    

  2. You load two pre-trained models, one to detect people (yolo3_mobilenet1.0_coco) in the frame and one to detect the pose (simple_pose_resnet18_v1b) for each person detected. To load the pre-trained models, enter the following code in a new code cell:
    people_detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
    pose_detector = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)
    

  3. Because the yolo3_mobilenet1.0_coco pre-trained model is trained to detect many types of objects in addition to people, the code below narrows down the detection criteria to just people so that the model runs faster. For more information about the other types of objects that the model can predict, see the GluonCV MSCoco Detection source code.
    people_detector.reset_class(["person"], reuse_weights=['person'])

  4. The following code shows how to use the people detector to detect people in the frame. The outputs of the people detector are the class_IDs (just “person” in this use case because we’ve limited the model’s search scope), the confidence scores, and a bounding box around each person detected in the frame.
    img = mx.nd.array(frame)
    x, img = data.transforms.presets.ssd.transform_test(img, short=256)
    class_IDs, scores, bounding_boxs = people_detector(x)
    

  5. Enter the following code to feed the results from the people detector into the pose detector for each person found. Normally you need to use the bounding boxes to crop out each person found in the frame by the people detector, then resize each cropped person image into appropriately sized inputs for the pose detector. Fortunately GluonCV comes with a detector_to_simple_pose function that takes care of cropping and resizing for you.
    pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
    
    predicted_heatmap = pose_detector(pose_input)
    pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
    

  6. The following code overlays the results of the pose detector onto the original image so you can visualize the result:
    ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                                  class_IDs, bounding_boxs, scores, box_thresh=0.5, keypoint_thresh=0.2)
    plt.show()

After completing steps 1-6, you should see an image similar to the following screenshot.

If you get an error similar to the ValueError output below, make sure you have at least one person in the camera’s view.

ValueError: In HybridBlock, there must be one NDArray or one Symbol in the input. Please check the type of the args
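
One way to avoid this error in your own code is to check the people detector's output before calling detector_to_simple_pose. The following is a minimal sketch (the 0.5 confidence threshold here is an arbitrary choice, not a value from the tutorial):

if (scores[0] > 0.5).sum().asscalar() > 0:
    # At least one confident "person" box exists, so it's safe to run the pose detector
    pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
    predicted_heatmap = pose_detector(pose_input)
    pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
else:
    print('No person detected in this frame; skipping pose estimation')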

So far, you have experimented with a pose detector on AWS DeepLens using Jupyter notebooks. You can now collect some data to figure out how to detect when someone is hunching, sitting, or standing. To collect data, you can save the image frame from the camera to disk using the built-in OpenCV module. See the following code:

cv2.imwrite('output.jpg', frame)

Classifying postures with the GluonCV pose key points

After you have collected a few samples of different postures, you can start to detect bad posture by applying some rudimentary rules.

Understanding the GluonCV pose estimation key points

The GluonCV pose estimation model outputs 17 key points for each person detected. In this section, you see how those points are mapped to human body joints and how to apply simple rules to determine if a person is sitting, standing, or hunching.

This solution makes the following assumptions:

  • The camera sees your entire body from head to toe, regardless of whether you are sitting or standing
  • The camera sees a profile view of your body
  • No obstacles exist between the camera and the subject

The following is an example input image. We’ve asked the actor in this image to face the camera instead of showing the profile view to illustrate the key body joints produced by the pose estimation model.

The following image is the output of the model drawn as lines and key points onto the input image. The cyan rectangle shows where the people detector thinks a person is in the image.

The following code shows the raw results of the pose detector. The code comments show how each entry maps to a point on the human body:

array([[142.96875,  84.96875],# Nose
       [152.34375,  75.59375],# Right Eye
       [128.90625,  75.59375],# Left Eye
       [175.78125,  89.65625],# Right Ear
       [114.84375,  99.03125],# Left Ear
       [217.96875, 164.65625],# Right Shoulder
       [ 91.40625, 178.71875],# Left Shoulder
       [316.40625, 197.46875],# Right Elbow
       [  9.375  , 232.625  ],# Left Elbow
       [414.84375, 192.78125],# Right Wrist
       [ 44.53125, 244.34375],# Left Wrist
       [199.21875, 366.21875],# Right Hip
       [128.90625, 366.21875],# Left Hip
       [208.59375, 506.84375],# Right Knee
       [124.21875, 506.84375],# Left Knee
       [215.625  , 570.125  ],# Right Ankle
       [121.875  , 570.125  ]])# Left Ankle

Deploying pre-trained GluonCV models to AWS DeepLens

In the following steps, you convert your code written in the Jupyter notebook to an AWS Lambda inference function to run on AWS DeepLens. The inference function optimizes the model to run on AWS DeepLens and feeds each camera frame into the model to get predictions.

This tutorial provides an example inference Lambda function for you to use. You can also copy and paste code sections directly from the Jupyter notebook you created earlier into the Lambda code editor.

Before creating the Lambda function, you need an Amazon Simple Storage Service (Amazon S3) bucket to save the results of your posture tracker for analysis in Amazon QuickSight. If you don’t have an Amazon S3 Bucket, see How to create an S3 bucket.

To create a Lambda function to deploy to AWS DeepLens, complete the following steps:

  1. Download aws-deeplens-posture-lambda.zip onto your computer.
  2. On the Lambda console, choose Create Function.
  3. Choose Author from scratch and choose the following options:
    1. For Runtime, choose Python 3.7.
    2. For Choose or create an execution role, choose Use an existing role.
    3. For Existing role, enter service-role/AWSDeepLensLambdaRole.
  4. After you create the function, go to function’s detail page.
  5. For Code entry type, choose Upload zip.
  6. Upload the aws-deeplens-posture-lambda.zip you downloaded earlier.
  7. Choose Save.
  8. In the AWS Lambda code editor, select the lambda_function.py file and enter the name of the Amazon S3 bucket where you want to store the results.
    S3_BUCKET = '<YOUR_S3_BUCKET_NAME>'

  9. Choose Save.
  10. From the Actions drop-down menu, choose Publish new version.
  11. Enter a version number and choose Publish. Publishing the function makes it available on the AWS DeepLens console so you can add it to your custom project.
  12. Give your AWS DeepLens Lambda function permissions to put files in the Amazon S3 bucket. Inside your Lambda function editor, click on Permissions, then click on the AWSDeepLensLambda role name.
  13. You will be directed to the IAM editor for the AWSDeepLensLambda role. Inside the IAM role editor, click Attach Policies.
  14. Type in S3 to search for the AmazonS3 policy and check the AmazonS3FullAccess policy. Click Attach Policy.

Understanding the Lambda function

This section walks you through some important parts of the Lambda function.

You load the GluonCV model with the following code:

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', 
                pretrained=True, root='/opt/awscam/artifacts/')
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', 
                pretrained=True, root='/opt/awscam/artifacts/')

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.

detector.reset_class(["person"], reuse_weights=['person'])

You run the model frame by frame over the images from the camera with the following code:

ret, frame = awscam.getLastFrame()
img = mx.nd.array(frame)
x, img = data.transforms.presets.ssd.transform_test(img, short=200)

class_IDs, scores, bounding_boxs = detector(x)
pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)

predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)

The following code shows you how to send the text prediction results back to the cloud. Viewing the text results in the cloud is a convenient way to make sure the model is working correctly. Each AWS DeepLens device has a dedicated iot_topic automatically created to receive the inference results.

# Send the top k results to the IoT console via MQTT
cloud_output = {
        'boxes': bounding_boxs,
        'box_scores': scores,
        'coords': pred_coords,
        'coord_scors': confidence
    }
client.publish(topic=iot_topic, payload=json.dumps(cloud_output))

Using the preceding key points, you can apply the geometric rules shown in the following sections to calculate angles between the body joints to determine if the person is sitting, standing, or hunching. You can change the geometric rules to suit your setup. As a follow-up activity to this tutorial, you can collect the pose data and train a simple ML model to more accurately predict when someone is standing or sitting.

Sitting vs. Standing

To determine if a person is standing or sitting, use the angle between the horizontal (ground) and the line connecting the hip and knee.
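
As a quick worked example, the following snippet computes that angle for the right hip and right knee coordinates from the sample keypoint array earlier (the exact classification thresholds live in the Lambda function shown later in this section):

from math import atan2, degrees

# Right hip and right knee (x, y) from the sample keypoint array above.
# Image coordinates: y grows downward.
hip = (199.21875, 366.21875)
knee = (208.59375, 506.84375)

# Angle between the hip-to-knee line and the horizontal axis
hip_angle = degrees(atan2(knee[1] - hip[1], knee[0] - hip[0]))
print(hip_angle)  # ~86 degrees: a near-vertical thigh, which the rules below treat as standing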

Hunching

When a person hunches, their head is typically looking down and their back is crooked. You can use the angles between the ear and shoulder and the shoulder and hip to determine if someone is hunching. Again, you can modify these geometric rules as you see fit. The following code inside the provided AWS DeepLens Lambda function determines if a person is hunching:

from math import atan2, degrees  # import needed by this excerpt


def hip_and_hunch_angle(left_array):
    '''
    :param left_array: the left-side keypoint coordinates of a person; from a profile view,
                       the left and right keypoints overlap, so one side is sufficient
    :return: the hip-to-knee angle and the hunch angle, both in degrees
    '''
    # hip to knee angle
    hipX = left_array[-2][0] - left_array[-3][0]
    hipY = left_array[-2][1] - left_array[-3][1]

    # hunch angle = (hip to shoulder ) - (shoulder to ear )
    # (hip to shoulder )
    hunchX1 = left_array[-3][0] - left_array[-6][0]
    hunchY1 = left_array[-3][1] - left_array[-6][1]

    ang1 = degrees(atan2(hunchY1, hunchX1))

    # (shoulder to ear)
    hunchX2 = left_array[-6][0] - left_array[-7][0]
    hunchY2 = left_array[-6][1] - left_array[-7][1]
    ang2 = degrees(atan2(hunchY2, hunchX2))

    return degrees(atan2(hipY, hipX)), abs(ang1 - ang2)


def sitting_and_hunching(left_array):
    hip_ang, hunch_ang = hip_and_hunch_angle(left_array)
    if hip_ang < 25 or hip_ang > 155:
        print("sitting")
        hip = 0
    else:
        print("standing")
        hip = 1
    if hunch_ang < 3:
        print("no hunch")
        hunch = 0
    else:
        hunch = 1
    return hip, hunch

Deploying the Lambda inference function to your AWS DeepLens device

To deploy your Lambda inference function to your AWS DeepLens device, complete the following steps:

  1. On the AWS DeepLens console, under Projects, choose Create new project.
  2. Choose Create a new blank project.
  3. For Project name, enter posture-tracker.
  4. Choose Add model.

To deploy a project, AWS DeepLens requires you to select a model and a Lambda function. In this tutorial, you download the GluonCV models directly onto AWS DeepLens from inside your Lambda function, so you can choose any existing model on the AWS DeepLens console to deploy. The model selected on the AWS DeepLens console only serves as a stub and isn't used in the Lambda function. If you don't have an existing model, deploy a sample project and select the sample model.

  1. Choose Add function.
  2. Choose the Lambda function you created earlier.
  3. Choose Create.
  4. Select your newly created project and choose Deploy to device.
  5. On the Target device page, select your device from the list.
  6. Choose Review.
  7. On the Review and deploy page, choose Deploy.

To verify that the project has deployed successfully, you can check the text prediction results sent back to the cloud via AWS IoT Greengrass. For instructions on how to view the text results, see Viewing text output of custom model in AWS IoT Greengrass.

In addition to the text results, you can view the pose detection results overlaid on top of your AWS DeepLens live video stream. For instructions on viewing the live video stream, see Viewing AWS DeepLens Output Streams.

The following screenshot shows what you will see in the project stream:

Sending text message reminders to stand and stretch

In this section, you use Amazon Simple Notification Service (Amazon SNS) to send reminder text messages when your posture tracker determines that you have been sitting or hunching for an extended period of time.

  1. Register a new SNS topic to publish messages to.
  2. After you create the topic, copy and save the topic ARN, which you need to refer to in the AWS DeepLens Lambda inference code.
  3. Subscribe your phone number to receive messages posted to this topic.

Amazon SNS sends a confirmation text message before your phone number can receive messages.
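
If you prefer to script these steps rather than use the Amazon SNS console, the following sketch does the same thing with boto3 (the phone number is a placeholder; the topic name matches the one used in the ARN later in this post):

import boto3

sns = boto3.client('sns', region_name='us-east-1')

# Create the topic (returns the existing ARN if the topic already exists)
topic_arn = sns.create_topic(Name='deeplenspose')['TopicArn']
print(topic_arn)  # save this ARN for the Lambda function

# Subscribe your phone number to receive SMS alerts (placeholder number)
sns.subscribe(TopicArn=topic_arn, Protocol='sms', Endpoint='+12065550100')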

You can now change the access policy for the SNS topic to allow AWS DeepLens to publish to the topic.

  1. On the Amazon SNS console, choose Topics.
  2. Choose your topic.
  3. Choose Edit.
  4. On the Access policy tab, enter the following code:

    {
      "Version": "2008-10-17",
      "Id": "lambda_only",
      "Statement": [
        {
          "Sid": "allow-lambda-publish",
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sns:Publish",
          "Resource": "arn:aws:sns:us-east-1:your-account-no:your-topic-name",
          "Condition": {
            "StringEquals": {
              "AWS:SourceOwner": "your-AWS-account-no"
            }
          }
        }
      ]
    }
    

  5. Update the AWS DeepLens Lambda function with the ARN for the SNS topic. See the following code:
    def publishtoSNSTopic(SittingTime=None, hunchTime=None):
        sns = boto3.client('sns')
        
        # Publish a simple message to the specified SNS topic
        response = sns.publish(
        TopicArn='arn:aws:sns:us-east-1:xxxxxxxxxx:deeplenspose', # update topic arn
        Message='Alert: You have been sitting for {}, Stand up and stretch, and you have hunched for {}'.format(
        SittingTime, hunchTime),
        )
        
        print(SittingTime, hunchTime)
    

Visualizing your posture data over time with Amazon QuickSight

This next section shows you how to visualize your posture data with Amazon QuickSight. You first need to store the posture data in Amazon S3.

Storing the posture data in Amazon S3

The following code example records posture data once per second; you can adjust this interval to suit your needs. The code writes the records to a CSV file every 60 seconds and uploads the results to the Amazon S3 bucket you created earlier.

if len(physicalList) > 60:
    try:
        with open('/tmp/temp2.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(physicalList)
        physicalList = []
        write_to_s3('/tmp/temp2.csv', S3_BUCKET,
                    "Deeplens-posent/gluoncvpose/physicalstate-" + datetime.datetime.now().strftime(
                        "%Y-%b-%d-%H-%M-%S") + ".csv")
    except Exception as e:
        print(e)
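
The write_to_s3 helper called above isn't shown in this excerpt. A minimal implementation using boto3 could look like the following (the function name comes from the snippet; the body is an assumption):

import boto3

def write_to_s3(local_path, bucket, key):
    # Upload the local CSV file to the given S3 bucket and key
    s3 = boto3.client('s3')
    s3.upload_file(local_path, bucket, key)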

Your Amazon S3 bucket now starts to fill up with CSV files containing posture data. See the following screenshot.

Using Amazon QuickSight

You can now use Amazon QuickSight to create an interactive dashboard to visualize your posture data. First, make sure that Amazon QuickSight has access to the S3 bucket with your pose data.

  1. On the Amazon QuickSight console, from the menu bar, choose Manage QuickSight.
  2. Choose Security & permissions.
  3. Choose Add or remove.
  4. Select Amazon S3.
  5. Choose Select S3 buckets.
  6. Select the bucket containing your pose data.
  7. Choose Update.
  8. On the Amazon QuickSight landing page, choose New analysis.
  9. Choose New data set.

You see a variety of options for data sources.

  1. Choose S3.

A pop-up window appears that asks for your data source name and manifest file. A manifest file tells Amazon QuickSight where to look for your data and how your dataset is structured.

  1. To build a manifest file for your posture data files in Amazon S3, open your preferred text editor and enter the following code:
    { "fileLocations": [ { "URIPrefixes": ["s3://YOUR_BUCKET_NAME/FOLDER_OF_POSE_DATA" ] } ], "globalUploadSettings": { "format": "CSV", "delimiter": ",", "textqualifier": "'", "containsHeader": "true" } }

  2. Save the text file with the name manifest.json.
  3. In the New S3 data source window, select Upload.
  4. Upload your manifest file.
  5. Choose Connect.

If you set up the data source successfully, you see a confirmation window like the following screenshot.

To troubleshoot any access or permissions errors, see How do I allow Amazon QuickSight access to my S3 bucket when I have a deny policy?

  1. Choose Visualize.

You can now experiment with the data to build visualizations. See the following screenshot.

The following bar graphs show visualizations you can quickly make with the posture data.

For instructions on creating more complex visualizations, see Tutorial: Create an Analysis.

Conclusion

In this post, you learned how to use Jupyter notebooks to prototype with AWS DeepLens, deploy a pre-trained GluonCV pose detection model to AWS DeepLens, send text messages using Amazon SNS based on triggers from the pose model, and visualize the posture data with Amazon QuickSight. You can deploy other GluonCV pre-trained models to AWS DeepLens or replace the hard-coded rules for classifying standing and sitting positions with a robust machine learning model. You can also dive deeper with Amazon QuickSight to reveal posture patterns over time.

For a detailed walkthrough of this tutorial and other tutorials, sample code, and project ideas with AWS DeepLens, see AWS DeepLens Recipes.


About the Authors

Phu Nguyen is a Product Manager for AWS DeepLens. He builds products that give developers of any skill level an easy, hands-on introduction to machine learning.

Raj Kadiyala is an AI/ML Tech Business Development Manager in AWS WWPS Partner Organization. Raj has over 12 years of experience in Machine Learning and likes to spend his free time exploring machine learning for practical everyday solutions and staying active in the great outdoors of Colorado.

Google at ICML 2020

Google at ICML 2020

Posted by Jaqui Herman and Cat Armato, Program Managers

Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in machine learning research, Google is proud to be a Platinum Sponsor of the thirty-seventh International Conference on Machine Learning (ICML 2020), a premier annual event taking place virtually this week. With over 100 accepted publications and Googlers participating in workshops, we look forward to our continued collaboration with the larger machine learning research community.

If you’re registered for ICML 2020, we hope you’ll visit the Google virtual booth to learn more about the exciting work, creativity and fun that goes into solving some of the field’s most interesting challenges. You can also learn more about the Google research being presented at ICML 2020 in the list below (Google affiliations bolded).

ICML Expo
Google Dataset Search: Building an Open Ecosystem for Dataset Discovery
Natasha Noy

End-to-end Bayesian inference workflows in TensorFlow Probability
Colin Carroll

Publications
Population-Based Black-Box Optimization for Biological Sequence Design
Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley

Predictive Coding for Locally-Linear Control
Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung Bui

FedBoost: A Communication-Efficient Algorithm for Federated Learning
Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

Faster Graph Embeddings via Coarsening
Matthew Fahrbach, Gramoz Goranci, Richard Peng, Sushant Sachdeva, Chi Wang

Revisiting Fundamentals of Experience Replay
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Boosting for Control of Dynamical Systems
Naman Agarwal, Nataly Brukhim, Elad Hazan, Zhou Lu

Neural Clustering Processes
Ari Pakman, Yueqi Wang, Catalin Mitelut, JinHyung Lee, Liam Paninski

The Tree Ensemble Layer: Differentiability Meets Conditional Computation
Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder

Representations for Stable Off-Policy Reinforcement Learning
Dibya Ghosh, Marc Bellemare

REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Context Aware Local Differential Privacy
Jayadev Acharya, Keith Bonawitz, Peter Kairouz, Daniel Ramage, Ziteng Sun

Scalable Deep Generative Modeling for Sparse Graphs
Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Deep k-NN for Noisy Labels
Dara Bahri, Heinrich Jiang, Maya Gupta

Revisiting Spatial Invariance with Low-Rank Local Connectivity
Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Incremental Sampling Without Replacement for Sequence Models
Kensen Shi, David Bieber, Charles Sutton

SoftSort: A Continuous Relaxation for the argsort Operator
Sebastian Prillo, Julian Martin Eisenschlos

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation (see blog post)
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Learning to Stop While Learning to Predict
Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song

Bandits with Adversarial Scaling
Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme

SimGANs: Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification
Tomer Golany, Daniel Freedman, Kira Radinsky

Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
Geoffrey Negiar, Gideon Dresdner, Alicia Yi-Ting Tsai, Laurent El Ghaoui, Francesco Locatello, Robert M. Freund, Fabian Pedregosa

Implicit differentiation of Lasso-type models for hyperparameter optimization
Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Infinite attention: NNGP and NTK for deep attention networks
Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
Asaf Cassel, Alon Cohen, Tomer Koren

Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks
Pranjal Awasthi, Natalie Frank, Mehryar Mohri

Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization
Daniel Golovin, Qiuyi (Richard) Zhang

Generating Programmatic Referring Expressions via Program Synthesis
Jiani Huang, Calvin Smith, Osbert Bastani, Rishabh Singh, Aws Albarghouthi, Mayur Naik

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (see blog post)
Esteban Real, Chen Liang, David R. So, Quoc V. Le

How Good is the Bayes Posterior in Deep Neural Networks Really?
Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Which Tasks Should Be Learned Together in Multi-task Learning?
Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz

The Many Shapley Values for Model Explanation
Mukund Sundararajan, Amir Najmi

Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou, Lihong Li, Quanquan Gu

Automatic Shortcut Removal for Self-Supervised Representation Learning
Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

Federated Learning with Only Positive Labels
Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

How Recurrent Networks Implement Contextual Processing in Sentiment Analysis
Niru Maheswaranathan, David Sussillo

Supervised Learning: No Loss No Cry
Richard Nock, Aditya Krishna Menon

Ready Policy One: World Building Through Active Learning
Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Weakly-Supervised Disentanglement Without Compromises
Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen

Fast Differentiable Sorting and Ranking
Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga

Debiased Sinkhorn barycenters
Hicham Janati, Marco Cuturi, Alexandre Gramfort

Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure
John Sipple

Accelerating Large-Scale Inference with Anisotropic Vector Quantization
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar

An Optimistic Perspective on Offline Reinforcement Learning (see blog post)
Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Ben Adlam, Jeffrey Pennington

Private Query Release Assisted by Public Data
Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Zhiwei Steven Wu

Learning and Evaluating Contextual Embedding of Source Code
Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

Evaluating Machine Accuracy on ImageNet
Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt

Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

Domain Aggregation Networks for Multi-Source Domain Adaptation
Junfeng Wen, Russell Greiner, Dale Schuurmans

Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Context-Aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, Jinwoo Shin

Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
Binghong Chen, Chengtao Li, Hanjun Dai, Le Song

On the Consistency of Top-k Surrogate Losses
Forest Yang, Sanmi Koyejo

Dual Mirror Descent for Online Allocation Problems
Haihao Lu, Santiago Balseiro, Vahab Mirrokni

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Batch Stationary Distribution Estimation
Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

Small-GAN: Speeding Up GAN Training Using Core-Sets
Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena

Data Valuation Using Reinforcement Learning
Jinsung Yoon, Sercan Ö. Arik, Tomas Pfister

A Game Theoretic Perspective on Model-Based Reinforcement Learning
Aravind Rajeswaran, Igor Mordatch, Vikash Kumar

Encoding Musical Style with Transformer Autoencoders
Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

The Shapley Taylor Interaction Index
Kedar Dhamdhere, Mukund Sundararajan, Ashish Agarwal

Multidimensional Shape Constraints
Maya Gupta, Erez Louidor, Olexander Mangylov, Nobu Morioka, Taman Narayan, Sen Zhao

Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh

Learning to Score Behaviors for Guided Policy Optimization
Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen

Optimizing Black-Box Metrics with Adaptive Surrogates
Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan, Mahdi Milani Fard, Maya Gupta

Circuit-Based Intrinsic Methods to Detect Overfitting
Sat Chatterjee, Alan Mishchenko

Automatic Reparameterisation of Probabilistic Programs
Maria I. Gorinova, Dave Moore, Matthew D. Hoffman

Stochastic Flows and Geometric Optimization on the Orthogonal Group
Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics
Matthew Hoffman, Yi-An Ma

Concise Explanations of Neural Networks Using Adversarial Training
Prasad Chalasani, Jiefeng Chen, Amrita Roy Chowdhury, Somesh Jha, Xi Wu

p-Norm Flow Diffusion for Local Graph Clustering
Shenghao Yang, Di Wang, Kimon Fountoulakis

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models
Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

Robust Pricing in Dynamic Mechanism Design
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni

Differentiable Product Quantization for Learning Compact Embedding Layers
Ting Chen, Lala Li, Yizhou Sun

Adaptive Region-Based Active Learning
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

Countering Language Drift with Seeded Iterated Learning
Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Does Label Smoothing Mitigate Label Noise?
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Acceleration Through Spectral Density Estimation
Fabian Pedregosa, Damien Scieur

Momentum Improves Normalized SGD
Ashok Cutkosky, Harsh Mehta

ConQUR: Mitigating Delusional Bias in Deep Q-Learning
Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Online Learning with Imperfect Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

On Implicit Regularization in β-VAEs
Abhishek Kumar, Ben Poole

Is Local SGD Better than Minibatch SGD?
Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

Universal Average-Case Optimality of Polyak Momentum
Damien Scieur, Fabian Pedregosa

An Imitation Learning Approach for Cache Replacement
Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems
Zhe Dong, Bryan A. Seybold, Kevin P. Murphy, Hung H. Bui

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels
Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Optimizing Data Usage via Differentiable Rewards
Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control
Wenlong Huang, Igor Mordatch, Deepak Pathak

On Thompson Sampling with Langevin Algorithms
Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Concept Bottleneck Models
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

Supervised Quantile Normalization for Low-Rank Matrix Approximation
Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert

Missing Data Imputation Using Optimal Transport
Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention Over Modules
Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

Stochastic Optimization for Regularized Wasserstein Estimators
Marin Ballu, Quentin Berthet, Francis Bach

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Jakkam Reddi, Sanjiv Kumar

Rigging the Lottery: Making All Tickets Winners
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Online Learning with Dependent Stochastic Feedback Graphs
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

Calibration, Entropy Rates, and Memory in Language Models
Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang

Composable Sketches for Functions of Frequencies: Beyond the Worst Case
Edith Cohen, Ofir Geri, Rasmus Pagh

Energy-Based Processes for Exchangeable Data
Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

Near-Optimal Regret Bounds for Stochastic Shortest Path
Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (see blog post)
Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Yoel Drori, Ohad Shamir

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks
Jakub Swiatkowski, Kevin Roth, Bas Veeling, Linh Tran, Josh Dillon, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Regularized Optimal Transport is Ground Cost Adversarial
François-Pierre Paty, Marco Cuturi

Workshops
New In ML
Invited Speaker: Nicolas Le Roux
Organizers: Zhen Xu, Sparkle Russell-Puleri, Zhengying Liu, Sinead A Williamson, Matthias W Seeger, Wei-Wei Tu, Samy Bengio, Isabelle Guyon

LatinX in AI
Workshop Advisor: Pablo Samuel Castro

Women in Machine Learning Un-Workshop
Invited Speaker: Doina Precup
Sponsor Expo Speaker: Jennifer Wei

Queer in AI
Invited Speaker: Shakir Mohamed

Workshop on Continual Learning
Organizers: Haytham Fayek, Arslan Chaudhry, David Lopez-Paz, Eugene Belilovsky, Jonathan Schwarz, Marc Pickett, Rahaf Aljundi, Sayna Ebrahimi, Razvan Pascanu, Puneet Dokania

5th ICML Workshop on Human Interpretability in Machine Learning (WHI)
Organizers: Kush Varshney, Adrian Weller, Alice Xiang, Amit Dhurandhar, Been Kim, Dennis Wei, Umang Bhatt

Self-supervision in Audio and Speech
Organizers: Mirco Ravanelli, Dmitriy Serdyuk, R Devon Hjelm, Bhuvana Ramabhadran, Titouan Parcollet

Workshop on eXtreme Classification: Theory and Applications
Invited Speaker: Sanjiv Kumar

Healthcare Systems, Population Health, and the Role of Health-tech
Organizers: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

Theoretical Foundations of Reinforcement Learning
Program Committee: Alon Cohen, Chris Dann

Uncertainty and Robustness in Deep Learning Workshop (UDL)
Invited Speaker: Justin Gilmer

Organizers: Sharon Li, Balaji Lakshminarayanan, Dan Hendrycks, Thomas Dietterich, Jasper Snoek
Program Committee: Jeremiah Liu, Jie Ren, Rodolphe Jenatton, Zack Nado, Alexander Alemi, Florian Wenzel, Mike Dusenberry, Raphael Lopes

Beyond First Order Methods in Machine Learning Systems
Industry Panel: Jonathan Hseu

Object-Oriented Learning: Perception, Representation, and Reasoning
Invited Speakers: Thomas Kipf, Igor Mordatch

Graph Representation Learning and Beyond (GRL+)
Organizers: Michael Bronstein, Andreea Deac, William L. Hamilton, Jessica B. Hamrick, Milad Hashemi, Stefanie Jegelka, Jure Leskovec, Renjie Liao, Federico Monti, Yizhou Sun, Kevin Swersky, Petar Veličković, Rex Ying, Marinka Žitnik
Speakers: Thomas Kipf
Program Committee: Bryan Perozzi, Kevin Swersky, Milad Hashemi, Thomas Kipf, Ting Cheng

ML Interpretability for Scientific Discovery
Organizers: Subhashini Venugopalan, Michael Brenner, Scott Linderman, Been Kim
Program Committee: Akinori Mitani, Arunachalam Narayanaswamy, Avinash Varadarajan, Awa Dieng, Benjamin Sanchez-Lengeling, Bo Dai, Stephan Hoyer, Subham Sekhar Sahoo, Suhani Vora
Steering Committee: John Platt, Mukund Sundararajan, Jon Kleinberg

Negative Dependence and Submodularity for Machine Learning
Organizers: Zelda Mariet, Mike Gartrell, Michal Derezinski

7th ICML Workshop on Automated Machine Learning (AutoML)
Organizers: Charles Weill, Katharina Eggensperger, Matthias Feurer, Frank Hutter, Marius Lindauer, Joaquin Vanschoren

Federated Learning for User Privacy and Data Confidentiality
Keynote: Brendan McMahan
Program Committee: Peter Kairouz, Jakub Konecný

MLRetrospectives: A Venue for Self-Reflection in ML Research
Speaker: Margaret Mitchell

Machine Learning for Media Discovery
Speaker: Ed Chi

INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models
Organizers: Chin-Wei Huang, David Krueger, Rianne van den Berg, George Papamakarios, Chris Cremer, Ricky Chen, Danilo Rezende

4th Lifelong Learning Workshop
Program Committee: George Tucker, Marlos C. Machado

2nd ICML Workshop on Human in the Loop Learning (HILL)
Organizers: Shanghang Zhang, Xin Wang, Fisher Yu, Jiajun Wu, Trevor Darrell

Machine Learning for Global Health
Organizers: Danielle Belgrave, Stephanie Hyland, Charles Onu, Nicholas Furnham, Ernest Mwebaze, Neil Lawrence

Committee
Social Chair: Adam White

Work performed while at Google


Sharing Pixelopolis, a self-driving car demo from Google I/O built with TF-Lite

Sharing Pixelopolis, a self-driving car demo from Google I/O built with TF-Lite

Posted by Miguel de Andrés-Clavera, Product Manager, Google PI

In this post, I’d like to share with you a demo we built for (and had planned to show at) Google I/O this year with TensorFlow Lite. I wish we had the opportunity to meet in person, but I hope you find this article interesting nonetheless!

Pixelopolis

Pixelopolis is an interactive installation that showcases self-driving miniature cars powered by TensorFlow Lite. Each car is outfitted with its own Pixel phone, which uses its camera to detect and understand signals from the world around it. In order to sense lanes, avoid collisions, and read traffic signs, the phone uses machine learning running on the Pixel Neural Core, which contains a version of an Edge TPU.

An edge computing implementation is a good option for projects like this. Processing video and detecting objects with cloud-based methods adds latency; if you can, running inference on-device is much faster.

Users can interact with Pixelopolis via a “station” (an app running on a phone), where they can select the destination the car will drive to. The car navigates to the destination, and during the journey the app shows real-time streaming video from the car, so the user can see what the car sees and detects. As you may notice from the gifs below, Pixelopolis has multilingual support built in as well.

Station App
Car App

How it works

Using the front camera on a mobile device, we perform lane-keeping, localization, and object detection right on the device in real time. Not only that, in our case the Pixel 4 also controls the motors and other electronic components via USB-C, so the car can stop when it detects other cars or turn at the right intersection when it needs to.

If you’re interested in technical details, the remainder of this article describes the major components of the car, and our journey building it.

Lane-keeping

We explored a variety of models for lane-keeping. As a baseline, we used a CNN to detect the traffic lines and adjust the steering angle in each frame, which worked fine. We improved this by adding an LSTM and using multiple previous frames. After experimenting a bit more, we followed a model architecture similar to the one in this paper.

CNN model input and output

Model Architecture

from tensorflow.keras.layers import Input, Lambda, Conv2D, Dropout, Flatten, Dense
from tensorflow.keras.models import Model

net_in = Input(shape = (80, 120, 3))
x = Lambda(lambda x: x/127.5 - 1.0)(net_in)
x = Conv2D(24, (5, 5), strides=(2, 2),padding="same", activation='elu')(x)
x = Conv2D(36, (5, 5), strides=(2, 2),padding="same", activation='elu')(x)
x = Conv2D(48, (5, 5), strides=(2, 2),padding="same", activation='elu')(x)
x = Conv2D(64, (3, 3), padding="same",activation='elu')(x)
x = Conv2D(64, (3, 3), padding="same",activation='elu')(x)
x = Dropout(0.3)(x)
x = Flatten()(x)
x = Dense(100, activation='elu')(x)
x = Dense(50, activation='elu')(x)
x = Dense(10, activation='elu')(x)
net_out = Dense(1, name='net_out')(x)
model = Model(inputs=net_in, outputs=net_out)
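
The listing above is the single-frame CNN baseline. As a rough illustration of what "adding an LSTM and using multiple previous frames" can look like in Keras (the production model follows the paper linked above, so treat the layer sizes here as placeholders):

from tensorflow.keras.layers import (Input, Lambda, Conv2D, Flatten, Dense,
                                     TimeDistributed, LSTM)
from tensorflow.keras.models import Model

T = 5  # number of recent frames fed to the model (placeholder value)
seq_in = Input(shape=(T, 80, 120, 3))
x = TimeDistributed(Lambda(lambda img: img / 127.5 - 1.0))(seq_in)
x = TimeDistributed(Conv2D(24, (5, 5), strides=(2, 2), padding="same", activation="elu"))(x)
x = TimeDistributed(Conv2D(36, (5, 5), strides=(2, 2), padding="same", activation="elu"))(x)
x = TimeDistributed(Flatten())(x)
x = LSTM(64)(x)  # aggregate information across the T frames
seq_out = Dense(1, name="steering")(x)
seq_model = Model(inputs=seq_in, outputs=seq_out)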

Data Collection

Before we are able to use this model, we need a way to collect image data from the car for training. The problem is that we didn't have a car or a track to use at the time, so we decided to use a simulator. We chose Unity and this simulator project from Udacity for lane-keeping data collection.

Multiple waypoints on the track in the simulator

By setting multiple waypoints on the track, the car bot is able to drive to different locations and collect data for us. In this simulator, we collect image data and the steering angle every 50 ms.

Image Augmentation

Data Augmentation with various environments

Since we do all data collection within the simulator, we need to create various environments in the scene, because we want our model to handle different lighting, background environments, and other noise. We added these variables to the scene: a random HDRI sphere (with different rotation and exposure values), random brightness and color, and random cars.
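
The randomization itself happens inside Unity, but to make the photometric part concrete, here is an equivalent sketch in Python using tf.image (the jitter ranges are illustrative, not the values we used):

import tensorflow as tf

def photometric_jitter(image):
    # Expects a float image with values in [0, 1]
    image = tf.image.random_brightness(image, max_delta=0.3)
    image = tf.image.random_saturation(image, lower=0.7, upper=1.3)
    image = tf.image.random_hue(image, max_delta=0.05)
    return tf.clip_by_value(image, 0.0, 1.0)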

Training

Output from the first Neural Network layer

Training the ML model using only the simulator doesn't mean it will actually work in the real world, at least not on the first try. The car ran on the tracks for a few seconds and then just went off the track for various reasons.

Early versions of the toy car running off the track

Later, we found out that we had trained the model using mostly straight tracks. To fix this imbalanced data issue, we added various shapes of curves.

(Left) square shape track, (Right) Curvy track

After fixing the imbalanced dataset, the car began to correctly navigate corners.

Car successfully turn at the corners

Training with the final track design

Final track design

We started creating more complex situations for the car, such as adding multiple intersections to the tracks. We also added more routing paths to make the car handle these new conditions. However, we ran into a new problem right away: the car turned and hit the side of the track when it tried to turn at an intersection, because it saw random objects outside the track.

Training the model with additional routing

We tested many solutions and went with the one that was the most simple and effective. We cropped only the bottom ¼ of the image and fed it to the lane-keeping model, then adjusted the model input size to 120×40, and it worked like a charm.

Cropping bottom part of the image for lane-keeping
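
A minimal sketch of that cropping and resizing step with OpenCV (the helper name is ours):

import cv2

def preprocess_for_lane_keeping(frame):
    # Keep only the bottom quarter of the frame, where the lane markings are
    h = frame.shape[0]
    cropped = frame[int(h * 0.75):, :, :]
    # cv2.resize takes (width, height), so this produces a 120x40 model input
    return cv2.resize(cropped, (120, 40))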

Object Detection

We use object detection for two purposes. One is for localization. Each car needs to know where it is in the city by detecting objects in its environment (in this case, we detect the traffic signs in the city). The other purpose is to detect other cars, so they won’t bump into each other.

For the object detector, there are many models already available in the TensorFlow object detection model zoo. For the Pixel 4 Edge TPU, we use the ssd_mobilenet_edgetpu model.

The ssd_mobilenet_edgetpu model on the Pixel 4’s “Neural Core” Edge TPU is currently the fastest MobileNet object detection model available. It takes only 6.6 ms per frame (roughly 150 frames per second), which is more than enough for real-time applications.

Pixel 4 Edge TPU model performance

Data labelling and Simulation

We use image data from both simulation and real scenes to train the model. We developed our own simulator for this using Unreal Engine 4. The simulator generates random objects on random backgrounds, along with annotation files in the Pascal VOC format used by the TensorFlow object detection API.

Object detection simulator using UE4

For images taken from the real scene, we do manual labeling using the labelImg tool.

Data labeling with labelImg

Training

Loss report

We used TensorBoard to monitor training progress and to evaluate mAP (mean Average Precision), which you would normally have to compute manually.

TensorBoard
Detection result and the groundtruth

TensorFlow Lite

Since we want to run our ML models on the Pixel 4, which runs Android, we need to convert all the models to .tflite. Of course, you can use TensorFlow Lite to target iOS and other devices as well (including microcontrollers). Here are the steps we took:

Lane keeping

First, we convert the lane-keeping model from .h5 to .tflite with the following code:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file("lane_keeping.h5")
tflite_model = converter.convert()
with open("lane_keeping.tflite", "wb") as f:
    f.write(tflite_model)

Now we have the model ready for the Android project. Next, we build a lane-keeping class in our app. We started with an example Android project from here.

Object detection

We have to convert the model checkpoint (.ckpt) to the TensorFlow Lite format (.tflite):

  1. Use the export_tflite_ssd_graph.py script to convert the .ckpt to a .pb file (the script is provided in the TensorFlow Object Detection API).
  2. Use toco, the TensorFlow Lite Converter, to convert the .pb file to the .tflite format (see the sketch below).
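
As a sketch of step 2 using the Python converter API instead of the toco command line, assuming the frozen graph came from export_tflite_ssd_graph.py (the tensor names are the ones documented for SSD models in the TensorFlow Object Detection API; the file names and input size are placeholders):

import tensorflow as tf  # TensorFlow 1.x converter API, matching the lane-keeping conversion above

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 320, 320, 3]},
)
converter.allow_custom_ops = True  # the detection postprocessing op is a custom TFLite op
tflite_model = converter.convert()
with open("detect.tflite", "wb") as f:
    f.write(tflite_model)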

Using Neural Core

We use an Android sample project from here. Then we modified the delegate to use the Pixel 4 Edge TPU with the following code.

Interpreter.Options tfliteOptions = new Interpreter.Options();
nnApiDelegate = new NnApiDelegate();
tfliteOptions.addDelegate(nnApiDelegate);
tfLite = new Interpreter(loadModelFile(assetManager, modelFilename),tfliteOptions);

Real-time Video Streaming

After a user selects a destination, the car starts driving itself. While it's driving, the car streams what it sees to the station phone as a video feed. When we started implementing this part, we knew right away that streaming a raw video feed wouldn't be possible due to the amount of data we would need to transfer between several car phones and station phones. The solution we use is to first compress each raw image frame to JPEG to reduce the amount of data, then stream the JPEG buffer over HTTP using multipart/x-mixed-replace as the HTTP Content-Type. This way we can serve several video streams at the same time with unnoticeable lag between the devices.
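
The streaming code in Pixelopolis lives in the Android car app, but the protocol is easy to demonstrate outside of it. The following standalone Python sketch serves an MJPEG stream with the same multipart/x-mixed-replace content type (an illustration, not the production implementation):

import cv2
from http.server import BaseHTTPRequestHandler, HTTPServer

class MJPEGHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'multipart/x-mixed-replace; boundary=frame')
        self.end_headers()
        cap = cv2.VideoCapture(0)  # any frame source works here
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                # Compress the raw frame to JPEG to keep the payload small
                ok, jpeg = cv2.imencode('.jpg', frame, [cv2.IMWRITE_JPEG_QUALITY, 70])
                if not ok:
                    continue
                self.wfile.write(b'--frame\r\n')
                self.wfile.write(b'Content-Type: image/jpeg\r\n\r\n')
                self.wfile.write(jpeg.tobytes())
                self.wfile.write(b'\r\n')
        finally:
            cap.release()

if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 8080), MJPEGHandler).serve_forever()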

Server App

Server Stack

We use NodeJS for the server app and MongoDB for the database.

Hail a Car

Since we have multiple stations and cars, we need a way to connect the two. We built a booking system similar to popular ride-hailing apps. Our booking system has three steps: first, the car connects to the server and tells the server that it's ready to be booked; second, the station connects to the server and asks for a car; third, the server looks for a car that's ready, connects the two together, and stores the device_id from both the station and car apps.

Navigation

Node/Edge

Since we will have a fleet of cars running around the city, we need a way to navigate them. We use a Node/Edge concept: a node is a place on the map, and an edge is the path between two nodes. We then map each node to an actual sign in the city.

Top view of the tracks and sign locations

When the destination is selected on the station app, the station sends the node_id to the server, and the server returns an object with a list of nodes and their properties, so the car knows where to drive and the signs it expects to see along the way.
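
To make the Node/Edge idea concrete, here is a small Python sketch of how such a response could be built on the server side (the field names and the breadth-first search are illustrative; this is not the actual Pixelopolis server schema):

from collections import deque

# Each node is a place in the city, identified by the sign the car expects to see there
nodes = {
    'n1': {'sign': 'school'},
    'n2': {'sign': 'gas_station'},
    'n3': {'sign': 'beach'},
}
# Edges are the drivable paths between nodes
edges = {'n1': ['n2'], 'n2': ['n1', 'n3'], 'n3': ['n2']}

def find_route(start, destination):
    # Breadth-first search returning the nodes (and expected signs) along the route
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return [{'node_id': n, **nodes[n]} for n in path]
        for nxt in edges[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []

print(find_route('n1', 'n3'))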

Electronics

Parts

We started off with NUCLEO-F411RE as our development board. We chose Dynamixel for the motors.

NUCLEO-F411RE

We designed and developed a shield for additional components, such as the motors, to reduce the number of wires inside the car chassis. There are three parts to the shield: 1) battery voltage measurement, 2) an on/off switch with a MOSFET, and 3) buttons.

(Left) Shield and Motors, (Right) Power socket, power switch, Enable motor button, Reset Motor button, Board status LED, Motor status LED

In a later phase, we wanted to make the car a lot smaller, so we moved from the NUCLEO-F411RE to the NUCLEO-L432KC because it has a much smaller footprint.

NUCLEO-L432KC

Car Chassis & Exterior

Mark I

Mark I Design

We designed and 3D printed the car chassis with PLA material. The front wheels are castor wheels.

Mark II

Mark II Design

We added a battery measurement circuit to the board and cut off the power when the phone is detached from the board.

Mark III

Mark III Design

We added status LEDs so we can easily debug the state of the board. From the previous version, we encountered a motor overheating issue, so in this version we improved the ventilation by adding a fan to the motor. We also added a USB Type-C power delivery to the board so the phone can use the car battery.

Mark IV

Mark IV Design

We moved all the control buttons and status LEDs to the back of the car for easy access.

Mark V

Mark V Design

This is the final version, and we needed to reduce the car's footprint as much as possible. First, we changed the board from the NUCLEO-F411RE to the NUCLEO-L432KC to achieve a smaller footprint. Second, the front wheels were changed to ball caster wheels. Third, we rearranged the board location to the top of the car and stacked the battery underneath the board. Lastly, we removed the USB Type-C power delivery, because we want to prolong the driving time by giving all the battery power to the board and motors instead of the phone.

Performance metrics

Roadmap

There are many areas that we plan to improve this experience.

Battery

Currently, the motors and the controller board are powered by three packs of 3000 mAh lithium-ion batteries, and we have a charging circuit to handle the charging process. When we want to charge the batteries, we need to move the car to the charging station and plug the power adapter into the back of the car. This has a lot of downsides: the car can't run on the track while it's charging, and the charging time is a few hours, which is quite long.

3000mAh Li-ion Battery (left), 18650 Li-ion Battery (right)

We would like to simplify this process by switching to 18650 battery cells instead. This type of battery is used in electronics such as laptops, tools, and e-bikes due to its high capacity in a small form factor. This way, we can swap the batteries easily by popping in fresh ones and letting the empty ones charge in a battery charger, without leaving the car at the charging station.

Localization

Localization with SLAM

Localization is a very important process for this installation and we would like to make it more robust by adding SLAM to our app. We believe that this would improve the turning mechanism significantly.

Learning more

Thanks so much for reading! It's incredible what you can do with a phone camera, TensorFlow, and a bit of imagination. Hopefully this post gave you ideas for your own projects; we learned a lot working on this one, and hope you will in yours as well. The article provides links to resources for you to delve deeper into the different areas, and you can find plenty of ML models and tutorials from the developer community on TensorFlow Hub.
If you're really passionate about building self-driving cars and want to learn more about how machine learning and deep learning power the autonomous vehicle industry, check out Udacity's Self-Driving Car Nanodegree program. It's perfect for engineers and students looking for complete training in all aspects of self-driving cars, including computer vision, sensor fusion, and localization.

Acknowledgements

This project would not have been possible without the following awesome and talented group of people: Sina Hassani, Ashok Halambi, Pohung Chen, Eddie Azadi, Shigeki Hanawa, Clara Tan Su Yi, Daniel Bactol, Kiattiyot Panichprecha, Praiya Chinagarn, Pittayathorn Nomrak, Nonthakorn Seelapun, Jirat Nakarit, Phatchara Pongsakorntorn, Tarit Nakavajara, Witsarut Buadit, Nithi Aiempongpaiboon, Witaya Junma, Taksapon Jaionnom and Watthanasuk Shuaytong.