AI for AgriTech: Classifying Kiwifruits using Amazon Rekognition Custom Labels

Computer vision is a field of artificial intelligence (AI) that is gaining in popularity and interest, largely due to increased access to affordable cloud-based training compute, more performant algorithms, and optimizations for scalable model deployment and inference. However, despite these advances in individual AI and machine learning (ML) domains, simplifying ML pipelines into coherent and observable workflows so they’re more accessible to smaller business units has remained an elusive goal. This is especially true in the agricultural technology space, where computer vision has strong potential not only for improving production yields through automation, but also for health and safety, where dangerous jobs may be performed by AI rather than human agtech workers. Agricultural applications by AWS customers, such as sorting produce based on its grade and defects (IntelloLabs, Clarifruit, and Hectre) and proactively targeting pest control measures as early and as efficiently as possible (Bayer Crop Science), are some areas where computer vision shows strong promise.

Although compelling, these applications of machine vision are generally only accessible to larger agricultural enterprises due to the complexity of the train-compile-deploy-infer sequence for specific edge hardware architectures, which introduces a degree of separation between technology and the practitioners that could most benefit from it. In many cases, this disconnect is grounded in a perceived complexity of AI/ML, and the lack of a clear path for its end-to-end application in primary sectors like agriculture, forestry, and horticulture. In most cases, the prospect of hiring a qualified and experienced data scientist to explore opportunities, without the ability for managers and operators to experiment and innovate directly, is both financially and organizationally impractical. At a recent agtech presentation in New Zealand, an executive participant highlighted the lack of an end-to-end AWS computer vision solution as a limiting factor for experimentation, which would be required in order to justify organizational buy-in for more robust technology evaluation.

This post seeks to demystify how AWS AI/ML services work together, and specifically show how you can generate labeled imagery, train machine vision models against that imagery, and deploy custom image recognition models using Amazon Rekognition Custom Labels. You should be able to get up and running with a custom computer vision model within about an hour by following this tutorial, and make more informed judgments for further investment in AI/ML innovation based on data that is relevant to your specific needs.

Training image storage

As shown in the following pipeline, the first step to generating a custom computer vision model is to generate labeled images that we use to train our model. To do so, we first load our unlabeled training images into an Amazon Simple Storage Service (Amazon S3) bucket within our account, with each class being stored in its own folder under our bucket. For this example, our prediction classes are two types of kiwifruit (Golden and Monty), with images of known types. After you collect your images of each training class, simply upload them to the respective folder within your Amazon S3 bucket either through the Amazon S3 API or the AWS Management Console.
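
The upload step can also be scripted. The following is a minimal sketch, assuming your images sit in a local directory with one subfolder per class (for example, training_images/Golden and training_images/Monty). The function names, the images prefix, and the list of file suffixes are illustrative assumptions, not part of the original walkthrough. The only AWS call used is the S3 client's upload_file method.

```python
from pathlib import Path

# Assumption: only JPEG and PNG files are treated as training images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_upload_plan(local_root, prefix="images"):
    """Map local files laid out as <local_root>/<class>/<file> to S3 keys.

    The class folder name is kept in the key so that it can later serve
    as the label for every image stored under it.
    """
    root = Path(local_root)
    plan = []
    for path in sorted(root.glob("*/*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            class_name = path.parent.name
            plan.append((str(path), f"{prefix}/{class_name}/{path.name}"))
    return plan


def upload_training_images(s3_client, bucket, plan):
    """Upload each (local_path, key) pair using the client's upload_file call."""
    for local_path, key in plan:
        s3_client.upload_file(local_path, bucket, key)
    return len(plan)
```

To run it against your account, you could pass boto3.client("s3") as s3_client together with your own bucket name. Building the plan separately from uploading makes it easy to review the keys before anything is sent.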

Setting up Amazon Rekognition

To start using Amazon Rekognition, complete the following steps:

  1. On the Amazon Rekognition console, choose Use Custom Labels.
  2. Choose Get started to create a new project.

Projects are used to store your models and training configurations.

  1. Enter a name for your project (for example, Kiwifruit-classifier-project).
  2. Choose Create.
  3. On the Datasets page, choose Create new dataset.
  4. Enter a name for the dataset (for example, kiwifruit classifier).
  5. For Image location, select Import images from Amazon S3 bucket.

  1. For S3 folder location, enter the location where your images are stored.
  2. For Automatic labeling, select Automatically attach a label to my images based on the folder they’re stored in.

This means that the labels of the folders are applied to each image as the class of that image.

  1. For Policy, enter the provided JSON into the Amazon S3 bucket, to ensure that Amazon Rekognition can access that data to train the model.

  1. Choose Submit.
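
To make the folder-based labeling convention concrete, the following sketch derives a label from an S3 object key and counts images per label. It only illustrates the convention described above and is not how Amazon Rekognition implements it internally. The function names and the example keys are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def label_from_key(key):
    """Return the name of the folder that directly contains the object."""
    return PurePosixPath(key).parent.name


def count_labels(keys):
    """Count images per label, which helps spot class imbalance before training."""
    return Counter(label_from_key(key) for key in keys)
```

For example, a key such as images/Golden/a.jpg would be labeled Golden. Checking the per-label counts before you submit the dataset is a quick way to confirm that every class folder was picked up.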

Training the model

Now that we have successfully generated our labeled images using the folder names in which those images are stored, we can train our model.

  1. Choose Train model to create a project in which our models are stored after training.
  2. For Choose project, enter the ARN for the project that you created.
  3. For Choose a training dataset, choose the dataset you created.
  4. For Create test set, select Split training dataset.

This automatically reserves part of your labeled data for use in evaluating performance of our trained model.

  1. Choose Train to start your training job.

Training may take some time (depending on the number of labeled images you provided), and you can monitor progress on the Projects page.

  1. When training is finished, choose the model under your project to see its performance for each class.
  2. Under Use your model, choose API Code.

This allows you to get code samples to start and stop your model and conduct inference using the AWS Command Line Interface (AWS CLI).

It can take a few minutes to deploy the inference endpoint after starting the model.
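
If you prefer Python to the AWS CLI samples, starting and stopping the model could look like the following sketch. It assumes a boto3 Rekognition client is passed in as client; the function names, polling interval, and timeout are illustrative choices. The calls used are start_project_version, describe_project_versions, and stop_project_version.

```python
import time


def start_model(client, project_arn, model_arn, version_name,
                min_inference_units=1, poll_seconds=30, max_polls=40):
    """Start a trained model version and poll until it reports RUNNING."""
    client.start_project_version(
        ProjectVersionArn=model_arn,
        MinInferenceUnits=min_inference_units,
    )
    for _ in range(max_polls):
        response = client.describe_project_versions(
            ProjectArn=project_arn,
            VersionNames=[version_name],
        )
        status = response["ProjectVersionDescriptions"][0]["Status"]
        if status == "RUNNING":
            return status
        if status == "FAILED":
            raise RuntimeError("Model failed to start")
        time.sleep(poll_seconds)
    raise TimeoutError("Model did not reach RUNNING in time")


def stop_model(client, model_arn):
    """Stop the model when you have finished running inference."""
    return client.stop_project_version(ProjectVersionArn=model_arn)
```

A running model is billed for the time it is available, so it is worth calling stop_model as soon as you are done experimenting.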

Using your newly trained model

Now that you have a trained model that you’re happy with, using it is as simple as referencing an image from an Amazon S3 bucket using the sample API code provided in order to generate an inference. The following code is an example of Python code using the boto3 library to analyze an image:

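import boto3

# Function and parameter names are illustrative; the original listing is truncated.
def get_inference(bucket, filepath, model_arn):
    # model_arn is the ARN of the running model version (listed under API Code)
    client = boto3.client('rekognition')
    api_output = client.detect_custom_labels(
        ProjectVersionArn=model_arn,
        Image={
            'S3Object': {
                'Bucket': bucket,
                'Name': 'images/' + filepath}})
    return api_output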

Simply parse the JSON response in order to access the Name and Confidence fields of the payload for the image inference.
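
As a rough illustration of that parsing step, the following sketch picks the highest-confidence label from a detect_custom_labels response. The response is a dictionary with a CustomLabels list whose entries carry Name and Confidence (a percentage between 0 and 100). The function name and the min_confidence parameter are assumptions added for this example.

```python
def top_label(api_output, min_confidence=0.0):
    """Return (name, confidence) for the highest-confidence label, or None."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in api_output.get("CustomLabels", [])
        if label["Confidence"] >= min_confidence
    ]
    if not labels:
        return None
    return max(labels, key=lambda pair: pair[1])
```

Returning None when nothing clears the threshold lets the calling code decide how to handle images that the model cannot classify with enough confidence.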


In this post, we learned how to use Amazon Rekognition Custom Labels with an Amazon S3 folder labeling functionality to train an image classification model, deploy that model, and use it to conduct inference. Next steps might be to follow similar steps for a multi-class classifier, or use Amazon SageMaker Ground Truth to generate data with bounding box annotations in addition to class labels. For more information and ideas for other ways to use computer vision in agriculture, check out the AWS Machine Learning Blog and the AWS for Industries: Agriculture Blog.

About the Author

Steffen Merten is a startup-aligned Principal Solutions Architect based in New Zealand. Prior to AWS, Steffen was Chief Data Officer for Marsello, following five years as an embedded analyst at Palantir. Steffen’s roots are in complex systems analysis with over ten years spent studying both ecological and social systems in the U.S. national security industry throughout the Middle East, South, and Central Asia.

Perform interactive data processing using Spark in Amazon SageMaker Studio Notebooks

Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). With a single click, data scientists and developers can quickly spin up Studio notebooks to explore datasets and build models. You can now use Studio notebooks to securely connect to Amazon EMR clusters and prepare vast amounts of data for analysis and reporting, model training, or inference.

You can apply this new capability in several ways. For example, data analysts may want to answer a business question by exploring and querying their data in Amazon EMR, viewing the results, and then either alter the initial query or drill deeper into the results. You can complete this interactive query process directly in a Studio notebook and run the Spark code remotely. The results are then presented in the notebook interface.

Data engineers and data scientists can also use Apache Spark for preprocessing data and use Amazon SageMaker for model training and hosting. SageMaker provides an Apache Spark library that you can use to easily train models in SageMaker using org.apache.spark.sql.DataFrame DataFrames in your EMR Spark clusters. After model training, you can also host the model using SageMaker hosting services.

This post walks you through securely connecting Studio to an EMR cluster configured with Kerberos authentication. After we authenticate and connect to the EMR cluster, we query a Hive table and use the data to train and build an ML model.

Solution walkthrough

We use an AWS CloudFormation template to set up a VPC with a private subnet to securely host the EMR cluster. Then we create a Kerberized EMR cluster and configure it to allow secure connectivity from Studio. We then create a Studio domain and a new Studio user. Finally, we use the new PySpark (SparkMagic) kernel to authenticate and connect a Studio notebook to the EMR cluster.

The PySpark (SparkMagic) kernel allows you to define specific Spark configurations and environment variables, and connect to an EMR cluster to query, analyze, and process large amounts of data. Studio comes with a SageMaker SparkMagic image that contains a PySpark kernel. The SparkMagic image also contains an AWS Command Line Interface (AWS CLI) utility, sm-sparkmagic, that you can use to create the configuration files required for the PySpark kernel to connect to the EMR cluster. For added security, you can specify that the connection to the EMR cluster uses Kerberos authentication.

Studio runs on an environment managed by AWS. In this solution, the network access for the new Studio domain is configured as VPC Only. For more details on different connectivity methods, see Securing Amazon SageMaker Studio connectivity using a private VPC. The Elastic Network Interface (ENI) created in the private subnet connects to required AWS services through VPC endpoints.

The following diagram represents the different components used in this solution.

The CloudFormation template creates a Kerberized EMR cluster and configures it with a bootstrap action to create a Linux user and install Python libraries (Pandas, requests, and Matplotlib).

You can set up Kerberos authentication in a few different ways (for more information, see Kerberos Architecture Options):

  • Cluster-dedicated Key Distribution Center (KDC)
  • Cluster-dedicated KDC with Active Directory cross-realm trust
  • External KDC
  • External KDC integrated with Active Directory

The KDC can have its own user database or it can use cross-realm trust with an Active Directory that holds the identity store. For this post, we use a cluster-dedicated KDC that holds its own user database.

First, the EMR cluster has security configuration enabled to support Kerberos and is launched with a bootstrap action to create Linux users on all nodes and install the necessary libraries. The CloudFormation template launches the bash step after the cluster is ready. This step creates HDFS directories for the Linux users with default credentials. The user must change the password the first time they log in to the EMR cluster. The template also creates and populates a Hive table with a movie reviews dataset. We use this dataset in the Explore and query data section of this post.

The CloudFormation template also creates a Studio domain and a user named defaultuser. You can access the SparkMagic image from the Studio environment.

Deploy the resources with CloudFormation

You can use the provided CloudFormation template to set up the solution’s building blocks, including the VPC, subnet, EMR cluster, Studio domain, and other required resources.

This template deploys a new Studio domain. Ensure the Region used to deploy the CloudFormation stack has no existing Studio domain.

Complete the following steps to deploy the environment:

  1. Sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user, preferably an admin user.
  2. Choose Launch Stack to launch the CloudFormation template:

  1. Choose Next.
  2. For Stack name, enter a name for the stack (for example, blog).
  3. Leave the other values as default.
  4. Continue to choose Next and leave other parameters at their default.
  5. On the review page, select the check box to confirm that AWS CloudFormation might create resources.
  6. Choose Create stack.

Wait until the status of the stack changes from CREATE_IN_PROGRESS to CREATE_COMPLETE. The process usually takes 10–15 minutes.
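
If you would rather track the stack from a script than from the console, a polling loop along the following lines is one option. It is a sketch that assumes a boto3 CloudFormation client is passed in as cfn_client; the function name, polling interval, and timeout are illustrative. It relies only on the describe_stacks call and the StackStatus field it returns.

```python
import time


def wait_for_stack(cfn_client, stack_name, poll_seconds=30, max_polls=60):
    """Poll a stack until it leaves CREATE_IN_PROGRESS and return the final status."""
    for _ in range(max_polls):
        stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
        status = stacks[0]["StackStatus"]
        if status == "CREATE_COMPLETE":
            return status
        if status != "CREATE_IN_PROGRESS":
            # Any other status (for example, a rollback) means creation did not succeed.
            raise RuntimeError(f"Stack creation ended with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Stack did not finish creating in time")
```

boto3 also ships a built-in stack_create_complete waiter, which may be simpler if you do not need custom handling of failure states.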

Connect a Studio Notebook to an EMR cluster

After we deploy our stack, we create a connection between our Studio notebook and the EMR cluster. Establishing this connection allows us to connect code to our data hosted on Amazon EMR.

Complete the following steps to set up and connect your notebook to the EMR cluster:

  1. On the SageMaker console, choose Amazon SageMaker Studio.

Launching a Studio session for the first time may take a few minutes.

  1. Choose the Open Studio link for defaultuser.

The Studio IDE opens. Next, we download the code for this walkthrough from Amazon Simple Storage Service (Amazon S3).

  1. Choose File, then choose New and Terminal.
  2. In the terminal, run the following commands:
    aws s3 cp s3://aws-ml-blog/artifacts/ml-1954/smstudio-pyspark-hive-sentiment-analysis.ipynb .
    aws s3 cp s3://aws-ml-blog/artifacts/ml-1954/ 

  3. Open the smstudio-pyspark-hive-sentiment-analysis.ipynb notebook.
  4. For Select Kernel, choose PySpark (SparkMagic).

  1. Run each cell in the notebook and explore the capabilities of Sparkmagic using the PySpark kernel.

Before you can run the code in the notebook, you need to provide the cluster ID of the EMR cluster that was created as part of the solution deployment. You can find this information on the EMR console, on the Clusters page.

  1. Substitute the placeholder value with the ID of the EMR cluster.

  1. Connect to the EMR cluster from the notebook using the open-source Studio Sparkmagic library.

The SparkMagic library is available as open source on GitHub.

  1. In the notebook toolbar, choose the Launch terminal icon to open a terminal in the same SparkMagic image as the notebook.
  2. Run kinit user1 to get the Kerberos ticket.
  3. Enter your password when prompted.

This ticket is valid for 24 hours by default. If you’re connecting to the EMR cluster for the first time, you must change the password.

  1. Choose the notebook tab and restart the kernel using the Restart kernel icon from the toolbar.

This is required so that SparkMagic can pick up the generated configuration.

  1. To verify that the connection was set up correctly, run the %%info command.

This command displays the current session information.

Now that we have set up the connectivity, let’s explore and query the data.

Explore and query data

After you configure the notebook, run the code of the cells shown in the following screenshots. This connects to the EMR cluster in order to query data.

Sparkmagic allows you to run Spark code against the remote EMR cluster through Livy. Livy is an open-source REST server for Spark. For more information, see EMR Livy documentation.

Sparkmagic also creates an automatic SparkContext and HiveContext. You can use the HiveContext to query data in the Hive table and make it available in a Spark DataFrame.

You can use the DataFrame to look at the shape of the dataset and size of each class (positive and negative) and visualize it using Matplotlib. The following screenshots show that we have a balanced dataset.

You can use the pyspark.sql.functions module as shown in the following screenshot to inspect the length of the reviews.

You can run SparkSQL queries using %%sql from the notebook and save the results to a local DataFrame. This allows for quick data exploration. By default, a maximum of 2,500 rows is returned. You can change this limit with the -n argument.

As we continue through the notebook, we query the movie reviews table in Hive and store the results in a DataFrame. The Sparkmagic environment allows you to send local data to the remote cluster using %%send_to_spark. We send the S3 location (bucket and key) variables to the remote cluster, then convert the Spark DataFrame to a Pandas DataFrame. Then we upload it to Amazon S3 and use this data as an input to the preprocessing step that creates training and validation data. This data trains a sentiment analysis model using the SageMaker BlazingText algorithm.

Preprocess data and feature engineering

We perform data preprocessing and feature engineering on the data using SageMaker Processing. With SageMaker Processing, you can leverage a simplified, managed experience to run data preprocessing, data postprocessing, and model evaluation workloads on the SageMaker platform. A processing job downloads input from Amazon S3, then uploads output to Amazon S3 during or after the processing job. The script does the required text preprocessing with the movie reviews dataset and splits the dataset into train data and validation data for the model training.
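
The preprocessing script itself is not reproduced in this post, so the following is only a sketch of what such a step might do. It formats each review in the supervised text format that BlazingText expects (a __label__ prefix followed by the text) and splits the rows into training and validation sets. The function names, the lowercase-and-split tokenization, the 80/20 split, and the fixed seed are all assumptions made for illustration.

```python
import random


def to_blazingtext_line(label, text):
    """Format one review as '__label__<label> <lowercased, space-normalized text>'."""
    tokens = text.lower().split()
    return f"__label__{label} " + " ".join(tokens)


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle (label, text) rows and split them into train and validation lines."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_validation = int(len(rows) * validation_fraction)
    lines = [to_blazingtext_line(label, text) for label, text in rows]
    return lines[n_validation:], lines[:n_validation]
```

Using a fixed seed keeps the split reproducible between runs, which makes it easier to compare training jobs against the same validation data.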

The notebook uses the Scikit-learn processor within a Docker image to perform the processing job.

For this post, we use the SageMaker instance type ml.m5.xlarge for processing, training, and model hosting. If you don’t have access to this instance type and get a ResourceLimitExceeded error, use another instance type that you have access to. You can also request a service limit increase using AWS Support Center.

Train a SageMaker model

Amazon SageMaker Experiments allows us to organize, track, and review ML experiments with Studio notebooks. We can log metrics and information as we progress through the training process and evaluate results as we run the models.

We create a SageMaker experiment and trial, a SageMaker estimator, and set the hyperparameters. We then start a training job by calling the fit method on the estimator. We use Spot Instances to reduce the training cost.

Deploy the model and get predictions

When the training is complete, we host the model for real-time inference. The deploy method of the SageMaker estimator allows you to easily deploy the model and create an endpoint.

After the model is deployed, we test the deployed endpoint with test data and get predictions.

Clean up resources

Clean up the resources when you’re done, such as the SageMaker endpoint and the S3 bucket created in the notebook.

The %%cleanup -f command deletes all Livy sessions created by the notebook.
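
The endpoint and S3 cleanup can be scripted as well. The following sketch assumes boto3 SageMaker and S3 clients are passed in, and that the notebook's artifacts live under a single bucket prefix; the function name and parameters are hypothetical. It uses the delete_endpoint call, the list_objects_v2 paginator, and the delete_objects call.

```python
def cleanup(sagemaker_client, s3_client, endpoint_name, bucket, prefix):
    """Delete the inference endpoint and every object under the given S3 prefix."""
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)

    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages without a Contents entry hold no objects for this prefix.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted
```

Deleting the endpoint does not remove the associated endpoint configuration or model, so those would need separate calls if you want to remove them too.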


We have walked you through connecting a notebook backed by the Sparkmagic image to a kerberized EMR cluster. We then explored and queried the sample dataset from a Hive table. We used that dataset to train a sentiment analysis model with SageMaker. Finally, we deployed the model for inference.

For more information and other SageMaker resources, see the SageMaker Spark GitHub repo and Securing data analytics with an Amazon SageMaker notebook instance and Kerberized Amazon EMR cluster.

About the Authors

Graham Zulauf is a Senior Solutions Architect. Graham is focused on helping AWS’ strategic customers solve important problems at scale.




Huong Nguyen is a Sr. Product Manager at AWS. She is leading the user experience for SageMaker Studio. She has 13 years’ experience creating customer-obsessed and data-driven products for both enterprise and consumer spaces. In her spare time, she enjoys reading, being in nature, and spending time with her family.



James Sun is a Senior Solutions Architect with Amazon Web Services. James has over 15 years of experience in information technology. Prior to AWS, he held several senior technical positions at MapR, HP, NetApp, Yahoo, and EMC. He holds a PhD from Stanford University.


Naresh Kumar Kolloju is part of the Amazon SageMaker launch team. He is focused on building secure machine learning platforms for customers. In his spare time, he enjoys hiking and spending time with family.



Timothy Kwong is a Solutions Architect based out of California. During his free time, he enjoys playing music and doing digital art.




Praveen Veerath is a Senior AI Solutions Architect for AWS.




Racing Ahead, Predator Cycling Speeds Design and Development of Custom Bikes with Real-Time Rendering

The world of bicycle racing has changed. Aggressive cyclists expect their bikes to meet their every need, no matter how detailed and precise. And meeting these needs requires an entirely new approach.

Predator Cycling engineers and manufactures high-end custom-built carbon fiber bicycles that have garnered praise from championship cyclists around the world. For the past 15 years, the Predator team has designed every aspect of their frames and conducted intensive engineering simulation, rendering and manufacturing processes in-house.

“I had been following Predator for over a decade, and knew there was only one guy, Aram, who could take my raw idea, refine it, and turn it into a finished product worthy of use on track cycling’s biggest stage,” said Olympic cyclist Bobby Lea.

Recently, Predator Cycling worked on its most innovative project to date — the new RF20 frame. With increasing costs of materials and the complexity of production requirements, the RF20 spent extended time in research and development.

To bring the RF20 to market at a competitive price and minimize the build time of each bicycle, Predator Cycling knew it needed to maximize performance and efficiency. The team found the exact solution it needed with the Lenovo ThinkStation P620, powered by the NVIDIA RTX A6000 GPU.

The RTX A6000 enables Predator Cycling to process more complex models, and render and run engineering simulations in real time. With the ThinkStation P620, the team can easily handle real-time computing and multitasking, allowing them to accelerate production workflows.

RTX Hammers to the Front in Every Stage 

Since using the RTX A6000-powered ThinkStation P620, Predator Cycling has streamlined their manufacturing processes significantly. This allowed them to speed up production workflows and bring the RF20 frame out of R&D and get it out on the road. Overall, the company estimates it has shrunk its go-to-market timelines by 12-16 weeks.

“The NVIDIA RTX A6000 GPU and ThinkStation P620 deliver cutting-edge performance and speed to accelerate design processes and production times,” said Aram Goganian, co-founder of Predator Cycling. “We’re able to do complex wind drag simulations, mechanical and structural testing, and topology optimizations with AI in near real time, enabling us to show customers design changes with minimal delay.”

And it’s not just projects like the RF20 frame that have been significantly accelerated — Predator Cycling’s relationships and interactions with customers have also been enhanced. 

Each of Predator Cycling’s bikes is custom-built, so customers need real-life representations to help them select the components and finishes they want. Previously, this meant building physical prototypes that took months to complete.

Now, the powerful RTX A6000 has allowed the Predator team to provide customers with realistic bike renders long before they create physical prototypes. Clients can provide instant feedback on the photorealistic renders, which allows Predator Cycling to skip continuous iteration cycles and go from prototyping to testing fast.

Predator Cycling produces photorealistic bike renders in real time, saving weeks in their go-to-market timeline from producing physical prototypes. Image courtesy of Predator Cycling.

Predator Cycling has also notably improved internal workflows when it comes to running simulations, and they’ve seen performance gains of 2-6x across a number of applications such as Luxion KeyShot, ANSYS Discovery, ANSYS Fluent and Autodesk Fusion 360.

Discover More Breakthroughs at GTC

Learn about more advanced technologies in manufacturing and product design at the GPU Technology Conference, running from April 12-16. Predator Cycling (S31226) will be at GTC to share more details about their 3D print production experience with NVIDIA technologies.

Check out some featured GTC sessions by customers and partners below:

  • Rimac Automobili (E31424) will share their experience in accelerating the design of the world’s fastest electric car.
  • Polaris (S31512) will discuss locally streaming VR content for automotive design.
  • nTopology (S32033) will present a new approach to mechanical engineering with real-time feedback, thanks to GPU acceleration.
  • BMW (S31367) will share a simulation-first approach to leveraging robots for inspecting BMW vehicles.
  • AWS, Dassault Systèmes, NVIDIA and Renault Group (E31274) will discuss balancing peak rendering needs in the cloud and accurately simulating vehicle designs using a cluster of 4,000 GPUs.

And don’t miss our special keynote address on April 12, when NVIDIA CEO Jensen Huang will share exciting news and announcements.

Register now for free and explore other manufacturing sessions at GTC.

The post Racing Ahead, Predator Cycling Speeds Design and Development of Custom Bikes with Real-Time Rendering appeared first on The Official NVIDIA Blog.

In ‘Genius Makers’ Cade Metz Tells Tale of Those Behind the Unlikely Rise of Modern AI

Call it Moneyball for deep learning.

New York Times writer Cade Metz tells the funny, inspiring — and ultimately triumphant — tale of how a dogged group of AI researchers bet their careers on the long-dismissed technology of deep learning.

In his new book, Genius Makers: The Mavericks Who Brought AI to Google, Facebook, and the World, Metz reveals the human personalities behind the rise of AI, with a cast of well-known characters that includes Geoffrey Hinton, Yann LeCun, Yoshua Bengio and more.

The book begins with Metz’s favorite anecdote: how Hinton, a professor at the University of Toronto, and two students, Alex Krizhevsky and Ilya Sutskever, moved into Google as large technology companies began to see the merits of AI.

Another fascinating story focuses on UC Berkeley professor Sergey Levine’s work on reinforcement learning at Google. He helped set up the “arm farm” — multiple robotic arms that learn how to successfully pick up items by trying and failing repeatedly.

Levine left the arms to do their work over the weekend, and came back Monday to what looked like a crime scene — one of the arms had failed to pick up lipstick correctly, resulting in what looked like bloodstains all over the room.

For more stories about the minds behind today’s technology, Genius Makers is out now.

Key Points From This Episode:

  • Metz’s book captures the nuance and contradictions within the AI community, where experts can have polarizing viewpoints on emerging technology. Metz gives the example of scientists Frank Rosenblatt and Marvin Minsky, who firmly disagreed on what potential neural networks held.
  • One recurring theme throughout Genius Makers is that of “old ideas are new,” a mantra at Hinton’s university lab. It represented his belief that, if a scientist believed an idea could work, even if it seemed like a slim chance, they should keep trying until proven wrong. It’s a belief that’s served him well throughout his long career.


“Part of my mission on earth is to show people that engineers, like my father, are real, interesting, fascinating people.” — Cade Metz [1:37]

“These technologies are continuing to progress at an incredibly fast rate and the questions that they raise have not been solved.” — Cade Metz [31:22]

You Might Also Like

Deep Learning Pioneer Andrew Ng on AI as the New Electricity

Purple shirts, haircuts and cats. How are these three all related? According to deep learning pioneer Andrew Ng, they all played a part in AI’s growing presence in our lives. Ng, formerly of Google and Baidu, shares his thoughts on AI being the new electricity.

NVIDIA Chief Scientist Bill Dally on Where AI Goes Next

Bill Dally, chief scientist at NVIDIA and one of the pillars of the computer science world, talks about his perspective on the world of deep learning and AI in general.

Investor, AI Pioneer Kai-Fu Lee on the Future of AI in the US, China

Kai-Fu Lee, developer of the world’s first speaker-independent continuous speech recognition system in 1988, has led teams at Apple, Silicon Graphics, Microsoft and Google. Lee also started Sinovation Ventures, managing a $2 billion fund focusing on tech startups in China and the U.S. He talks about his latest book, AI Superpowers: China, Silicon Valley, and the New World Order.

Tune in to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn. If your favorite isn’t listed here, drop us a note.

Make the AI Podcast Better

Have a few minutes to spare? Fill out this listener survey. Your answers will help us make a better podcast.

The post In ‘Genius Makers’ Cade Metz Tells Tale of Those Behind the Unlikely Rise of Modern AI appeared first on The Official NVIDIA Blog.

Contactless Sleep Sensing in Nest Hub

Posted by Michael Dixon, Software Engineer and Reena Singhal Lee, Product Manager, Google Health

People often turn to technology to manage their health and wellbeing, whether it is to record their daily exercise, measure their heart rate, or increasingly, to understand their sleep patterns. Sleep is foundational to a person’s everyday wellbeing and can be impacted by (and in turn, have an impact on) other aspects of one’s life — mood, energy, diet, productivity, and more.

As part of our ongoing efforts to support people’s health and happiness, today we announced Sleep Sensing in the new Nest Hub, which uses radar-based sleep tracking in addition to an algorithm for cough and snore detection. While not intended for medical purposes1, Sleep Sensing is an opt-in feature that can help users better understand their nighttime wellness using a contactless bedside setup. Here we describe the technologies behind Sleep Sensing and discuss how we leverage on-device signal processing to enable sleep monitoring (comparable to other clinical- and consumer-grade devices) in a way that protects user privacy.

Soli for Sleep Tracking
Sleep Sensing in Nest Hub demonstrates the first wellness application of Soli, a miniature radar sensor that can be used for gesture sensing at various scales, from a finger tap to movements of a person’s body. In Pixel 4, Soli powers Motion Sense, enabling touchless interactions with the phone to skip songs, snooze alarms, and silence phone calls. We extended this technology and developed an embedded Soli-based algorithm that could be implemented in Nest Hub for sleep tracking.

Soli consists of a millimeter-wave frequency-modulated continuous wave (FMCW) radar transceiver that emits an ultra-low power radio wave and measures the reflected signal from the scene of interest. The frequency spectrum of the reflected signal contains an aggregate representation of the distance and velocity of objects within the scene. This signal can be processed to isolate a specified range of interest, such as a user’s sleeping area, and to detect and characterize a wide range of motions within this region, ranging from large body movements to sub-centimeter respiration.
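
To make the relationship between the reflected signal and distance more concrete, the following sketch encodes the standard textbook FMCW relationships: range resolution depends on chirp bandwidth, and the beat frequency of a stationary reflector is the chirp slope multiplied by the round-trip delay. This is a generic illustration, not Soli's implementation, and any parameter values you pass in are your own assumptions, because the post does not state Soli's chirp parameters.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def range_resolution(bandwidth_hz):
    """Smallest range separation that a chirp of this bandwidth can resolve."""
    return SPEED_OF_LIGHT / (2.0 * bandwidth_hz)


def beat_frequency(distance_m, bandwidth_hz, chirp_duration_s):
    """Beat frequency produced by a stationary reflector at the given distance."""
    slope = bandwidth_hz / chirp_duration_s  # chirp slope in Hz per second
    round_trip_delay = 2.0 * distance_m / SPEED_OF_LIGHT
    return slope * round_trip_delay


def distance_from_beat(beat_hz, bandwidth_hz, chirp_duration_s):
    """Invert beat_frequency to recover the reflector distance."""
    slope = bandwidth_hz / chirp_duration_s
    return SPEED_OF_LIGHT * beat_hz / (2.0 * slope)
```

Because each distance maps to its own beat frequency, selecting a band of beat frequencies is one way a processing chain can isolate a range of interest such as a sleeping area.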

Soli spectrogram illustrating its ability to detect a wide range of motions, characterized as (a) an empty room (no variation in the reflected signal demonstrated by the black space), (b) large pose changes, (c) brief limb movements, and (d) sub-centimeter chest and torso displacements from respiration while at rest.

In order to make use of this signal for Sleep Sensing, it was necessary to design an algorithm that could determine whether a person is present in the specified sleeping area and, if so, whether the person is asleep or awake. We designed a custom machine-learning (ML) model to efficiently process a continuous stream of 3D radar tensors (summarizing activity over a range of distances, frequencies, and time) and automatically classify each feature into one of three possible states: absent, awake, and asleep.

To train and evaluate the model, we recorded more than a million hours of radar data from thousands of individuals, along with thousands of sleep diaries, reference sensor recordings, and external annotations. We then leveraged the TensorFlow Extended framework to construct a training pipeline to process this data and produce an efficient TensorFlow Lite embedded model. In addition, we created an automatic calibration algorithm that runs during setup to configure the part of the scene on which the classifier will focus. This ensures that the algorithm ignores motion from a person on the other side of the bed or from other areas of the room, such as ceiling fans and swaying curtains.

The custom ML model efficiently processes a continuous stream of 3D radar tensors (summarizing activity over a range of distances, frequencies, and time) to automatically compute probabilities for the likelihood of user presence and wakefulness (awake or asleep).
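To make the final classification step concrete, here is a minimal sketch of turning per-time-step model scores into one of the three states. It assumes the model emits one raw score per state, which is a simplification for illustration; the post does not describe the real model's output format, and the names `STATES`, `classify`, and `classify_stream` are hypothetical.

```python
import math

# Assumed ordering of the model's outputs; hypothetical, for illustration only.
STATES = ("absent", "awake", "asleep")


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most likely state and its probability for one time step."""
    probs = softmax(scores)
    best = max(range(len(STATES)), key=lambda i: probs[i])
    return STATES[best], probs[best]


def classify_stream(score_stream):
    """Label each time step in a stream of per-step score triples."""
    return [classify(scores)[0] for scores in score_stream]


if __name__ == "__main__":
    stream = [[3.0, 0.0, 0.0], [0.1, 2.0, 0.3], [0.0, 0.0, 5.0]]
    print(classify_stream(stream))
```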

To validate the accuracy of the algorithm, we compared it to the gold standard for sleep-wake determination, the polysomnogram sleep study, in a cohort of 33 “healthy sleepers” (those without significant sleep issues, like sleep apnea or insomnia) across a broad age range (19-78 years of age). Sleep studies are typically conducted in clinical and research laboratories in order to collect various body signals (brain waves, muscle activity, respiratory and heart rate measurements, body movement and position, and snoring), which can then be interpreted by trained sleep experts to determine stages of sleep and identify relevant events. To account for variability in how different scorers apply the American Academy of Sleep Medicine’s staging and scoring rules, our study used two board-certified sleep technologists to independently annotate each night of sleep and establish a definitive ground truth.

We compared our Sleep Sensing algorithm’s outputs to the corresponding ground-truth sleep and wake labels for every 30-second epoch of time to compute standard performance metrics (e.g., sensitivity and specificity). While not a true head-to-head comparison, this study’s results can be compared against previously published studies in similar cohorts with comparable methodologies in order to get a rough estimate of performance. In “Sleep-wake detection with a contactless, bedside radar sleep sensing system”, we share the full details of these validation results, demonstrating sleep-wake estimation equivalent to or, in some cases, better than current clinical and consumer sleep tracking devices.
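The epoch-by-epoch comparison can be sketched as follows, treating sleep as the positive class so that sensitivity measures detection of sleep and specificity measures detection of wake. The function name and the string labels are assumptions for this example, and the toy data is made up; it is not taken from the study.

```python
def sleep_wake_metrics(reference, predicted):
    """Compare per-epoch labels ("sleep" or "wake") against a reference.

    Sensitivity is the fraction of reference sleep epochs labelled sleep;
    specificity is the fraction of reference wake epochs labelled wake.
    A metric is None when the reference contains no epochs of that class.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(reference, predicted))
    true_sleep = sum(r == "sleep" and p == "sleep" for r, p in pairs)
    missed_sleep = sum(r == "sleep" and p == "wake" for r, p in pairs)
    true_wake = sum(r == "wake" and p == "wake" for r, p in pairs)
    missed_wake = sum(r == "wake" and p == "sleep" for r, p in pairs)

    sleep_total = true_sleep + missed_sleep
    wake_total = true_wake + missed_wake
    return {
        "sensitivity": true_sleep / sleep_total if sleep_total else None,
        "specificity": true_wake / wake_total if wake_total else None,
    }


if __name__ == "__main__":
    # Made-up labels for six 30-second epochs.
    reference = ["sleep", "sleep", "sleep", "sleep", "wake", "wake"]
    predicted = ["sleep", "sleep", "sleep", "wake", "wake", "sleep"]
    print(sleep_wake_metrics(reference, predicted))
```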

Aggregate performance from previously published accuracies for detection of sleep (sensitivity) and wake (specificity) of a variety of sleep trackers against polysomnography in a variety of different studies, accounting for 3,990 nights in total. While this is not a head-to-head comparison, the performance of Sleep Sensing on Nest Hub in a population of healthy sleepers who simultaneously underwent polysomnography is added to the figure for rough comparison. The size of each circle is a reflection of the number of nights and the inset illustrates the mean±standard deviation for the performance metrics.

Understanding Sleep Quality with Audio Sensing
The Soli-based sleep tracking algorithm described above gives users a convenient and reliable way to see how much sleep they are getting and when sleep disruptions occur. However, to understand and improve their sleep, users also need to understand why their sleep is disrupted. To assist with this, Nest Hub uses its array of sensors to track common sleep disturbances, such as light level changes or uncomfortable room temperature. In addition to these, respiratory events like coughing and snoring are also frequent sources of disturbance, but people are often unaware of these events.

As with other audio-processing applications like speech or music recognition, coughing and snoring exhibit distinctive temporal patterns in the audio frequency spectrum, and with sufficient data an ML model can be trained to reliably recognize these patterns while simultaneously ignoring a wide variety of background noises, from a humming fan to passing cars. The model uses entirely on-device audio processing with privacy-preserving analysis, with no raw audio data sent to Google’s servers. A user can then opt to save the outputs of the processing (sound occurrences, such as the number of coughs and snore minutes) in Google Fit, in order to view personal insights and summaries of their nighttime wellness over time.

The Nest Hub displays when snoring and coughing may have disturbed a user’s sleep (top) and can track weekly trends (bottom).

To train the model, we assembled a large, hand-labeled dataset, drawing examples from the publicly available AudioSet research dataset as well as hundreds of thousands of additional real-world audio clips contributed by thousands of individuals.

Log-Mel spectrogram inputs comparing cough (left) and snore (right) audio snippets.
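For readers unfamiliar with the log-Mel representation, the sketch below shows the two ideas behind it: frequencies are spaced on the perceptual mel scale, and band energies are log-compressed. It uses one widely used mel formula; the post does not say which variant or how many bands the model uses, so the formula choice, function names, and parameters are assumptions.

```python
import math


def hz_to_mel(hz: float) -> float:
    """Map a frequency in Hz to the mel scale (a common formula variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(low_hz: float, high_hz: float, n_bands: int):
    """Centre frequencies in Hz of bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_bands)]


def log_compress(energy: float, floor: float = 1e-10) -> float:
    """Natural log of a band energy, floored to avoid log(0)."""
    return math.log(max(energy, floor))


if __name__ == "__main__":
    # Hypothetical setup: 10 bands between 0 Hz and 8 kHz.
    for center in mel_band_centers(0.0, 8000.0, 10):
        print(f"{center:8.1f} Hz")
```

Because the bands are evenly spaced in mel rather than in Hz, they sit closer together at low frequencies and spread out at high frequencies.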

When a user opts in to cough and snore tracking on their bedside Nest Hub, the device first uses its Soli-based sleep algorithms to detect when a user goes to bed. Once it detects that a user has fallen asleep, it then activates its on-device sound sensing model and begins processing audio. The model works by continuously extracting spectrogram-like features from the audio input and feeding them through a convolutional neural network classifier in order to estimate the probability that coughing or snoring is happening at a given instant in time. These estimates are analyzed over the course of the night to produce a report of the overall cough count and snoring duration and highlight exactly when these events occurred.
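The last step, turning per-instant probabilities into a nightly report, might look like the following sketch. It counts a cough as a contiguous run of frames at or above a threshold and measures snoring as total time at or above the threshold. The threshold, the frame duration, and all function names are assumptions for illustration; the post does not describe how the estimates are actually aggregated.

```python
def count_events(probs, threshold=0.5):
    """Count contiguous runs of frames at or above the threshold."""
    events = 0
    active = False
    for p in probs:
        above = p >= threshold
        if above and not active:
            events += 1  # a new run starts on this frame
        active = above
    return events


def active_minutes(probs, frame_seconds, threshold=0.5):
    """Total duration in minutes of frames at or above the threshold."""
    frames = sum(p >= threshold for p in probs)
    return frames * frame_seconds / 60.0


def nightly_report(cough_probs, snore_probs, frame_seconds=1.0, threshold=0.5):
    """Summarise per-frame classifier outputs for one night."""
    return {
        "cough_count": count_events(cough_probs, threshold),
        "snore_minutes": active_minutes(snore_probs, frame_seconds, threshold),
    }


if __name__ == "__main__":
    # Made-up probabilities: two cough bursts and two minutes of snoring.
    coughs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
    snores = [0.9] * 120 + [0.1] * 60
    print(nightly_report(coughs, snores))
```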

The new Nest Hub, with its underlying Sleep Sensing features, is a first step in empowering users to understand their nighttime wellness using privacy-preserving radar and audio signals. We continue to research additional ways that ambient sensing and the predictive ability of consumer devices could help people better understand their daily health and wellness in a privacy-preserving way.

This work involved collaborative efforts from a multidisciplinary team of software engineers, researchers, clinicians, and cross-functional contributors. Special thanks to D. Shin for his significant contributions to this technology and blog post, and Dr. Logan Schneider, visiting sleep neurologist affiliated with the Stanford/VA Alzheimer’s Center and Stanford Sleep Center, whose clinical expertise and contributions were invaluable to continuously guide this research. In addition to the authors, key contributors to this research from Google Health include Jeffrey Yu, Allen Jiang, Arno Charton, Jake Garrison, Navreet Gill, Sinan Hersek, Yijie Hong, Jonathan Hsu, Andi Janti, Ajay Kannan, Mukil Kesavan, Linda Lei, Kunal Okhandiar, Xiaojun Ping, Jo Schaeffer, Neil Smith, Siddhant Swaroop, Bhavana Koka, Anupam Pathak, Dr. Jim Taylor, and the extended team. Another special thanks to Ken Mixter for his support and contributions to the development and integration of this technology into Nest Hub. Thanks to Mark Malhotra and Shwetak Patel for their ongoing leadership, as well as the Nest, Fit, Soli, and Assistant teams we collaborated with to build and validate Sleep Sensing on Nest Hub.

[1] Not intended to diagnose, cure, mitigate, prevent or treat any disease or condition.
