Next-Generation Pose Detection with MoveNet and TensorFlow.js

Posted by Ronny Votel and Na Li, Google Research

Today we’re excited to launch our latest pose detection model, MoveNet, with our new pose-detection API in TensorFlow.js. MoveNet is an ultra fast and accurate model that detects 17 keypoints of a body. The model is offered on TF Hub with two variants, known as Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy. Both models run faster than real time (30+ FPS) on most modern desktops, laptops, and phones, which proves crucial for live fitness, sports, and health applications. This is achieved by running the model completely client-side, in the browser using TensorFlow.js with no server calls needed after the initial page load and no dependencies to install.

Try out the live demo!

MoveNet can track keypoints through fast motions and atypical poses.

Human pose estimation has come a long way in the last five years, but surprisingly hasn’t surfaced in many applications just yet. This is because more focus has been placed on making pose models larger and more accurate, rather than doing the engineering work to make them fast and deployable everywhere. With MoveNet, our mission was to design and optimize a model that leverages the best aspects of state-of-the-art architectures, while keeping inference times as low as possible. The result is a model that can deliver accurate keypoints across a wide variety of poses, environments, and hardware setups.

Unlocking Live Health Applications with MoveNet

We teamed up with IncludeHealth, a digital health and performance company, to understand whether MoveNet can help unlock remote care for patients. IncludeHealth has developed an interactive web application that guides a patient through a variety of routines (using a phone, tablet, or laptop) from the comfort of their own home. The routines are digitally built and prescribed by physical therapists to test balance, strength, and range of motion.

The service requires web-based, locally run pose models (for privacy) that can deliver precise keypoints at high frame rates; these keypoints are then used to quantify and qualify human poses and movements. While a typical off-the-shelf detector is sufficient for easy movements such as shoulder abductions or full-body squats, more complicated poses such as seated knee extensions or supine positions (lying down) cause grief for even state-of-the-art detectors trained on the wrong data.

Comparison of a traditional detector (top) vs MoveNet (bottom) on difficult poses.

We provided an early release of MoveNet to IncludeHealth, accessible through the new pose-detection API. This model is trained on fitness, dance, and yoga poses (see more details about the training dataset below). IncludeHealth integrated the model into their application and benchmarked MoveNet relative to other available pose detectors:

“The MoveNet model has infused a powerful combination of speed and accuracy needed to deliver prescriptive care. While other models trade one for the other, this unique balance has unlocked the next generation of care delivery. The Google team has been a fantastic collaborator in this pursuit.” – Ryan Eder, Founder & CEO at IncludeHealth.

As a next step, IncludeHealth is partnering with hospital systems, insurance plans, and the military to enable the extension of traditional care and training beyond brick and mortar.

IncludeHealth demo application running in browser that quantifies balance and motion using keypoint estimation powered by MoveNet and TensorFlow.js

Installation

There are two ways to use MoveNet with the new pose-detection API:

  1. Through NPM:
    import * as poseDetection from '@tensorflow-models/pose-detection';
  2. Through script tag:
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection"></script>

Try it yourself!

Once the package is installed, you only need to follow the few steps below to start using it:

// Create a detector.
const detector = await poseDetection.createDetector(poseDetection.SupportedModels.MoveNet);

The detector defaults to the Lightning version; to choose the Thunder version, create the detector as below:

// Create a detector that uses the Thunder variant.
const detector = await poseDetection.createDetector(
  poseDetection.SupportedModels.MoveNet,
  {modelType: poseDetection.movenet.modelType.SINGLEPOSE_THUNDER});

// Pass in a video stream to the model to detect poses.
const video = document.getElementById('video');
const poses = await detector.estimatePoses(video);

Each pose contains 17 keypoints, each with absolute x and y coordinates, a confidence score, and a name:

console.log(poses[0].keypoints);
// Outputs:
// [
// {x: 230, y: 220, score: 0.9, name: "nose"},
// {x: 212, y: 190, score: 0.8, name: "left_eye"},
// ...
// ]

Refer to our README for more details about the API.

As you begin to play and develop with MoveNet, we would appreciate your feedback and contributions. If you make something using this model, tag it with #MadeWithTFJS on social so we can find your work, as we would love to see what you create.

MoveNet Deep Dive

MoveNet Architecture

MoveNet is a bottom-up estimation model, using heatmaps to accurately localize human keypoints. The architecture consists of two components: a feature extractor and a set of prediction heads. The prediction scheme loosely follows CenterNet, with notable changes that improve both speed and accuracy. All models are trained using the TensorFlow Object Detection API.

The feature extractor in MoveNet is MobileNetV2 with an attached feature pyramid network (FPN), which allows for a high resolution (output stride 4), semantically rich feature map output. There are four prediction heads attached to the feature extractor, responsible for densely predicting a:

  • Person center heatmap: predicts the geometric center of person instances
  • Keypoint regression field: predicts full set of keypoints for a person, used for grouping keypoints into instances
  • Person keypoint heatmap: predicts the location of all keypoints, independent of person instances
  • 2D per-keypoint offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each keypoint
MoveNet architecture
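
To make the four dense predictions more concrete, here is a rough Python sketch of the output shapes one might expect from each head. It relies on the output stride of 4 and the 17 keypoints stated above, plus the 192×192 (Lightning) and 256×256 (Thunder) input sizes mentioned later in this post. The channel counts (1, 34, 17, 34) and the head names are assumptions inferred from the descriptions, not values confirmed by the model definition.

```python
NUM_KEYPOINTS = 17   # COCO-style body keypoints, as stated in the post
OUTPUT_STRIDE = 4    # feature map is 1/4 of the input resolution


def head_output_shapes(input_size):
    """Return assumed (height, width, channels) shapes for each prediction head."""
    side = input_size // OUTPUT_STRIDE
    return {
        # One score per pixel: is this the center of a person?
        "person_center_heatmap": (side, side, 1),
        # Assumed: a (dy, dx) displacement to every keypoint, per pixel.
        "keypoint_regression_field": (side, side, 2 * NUM_KEYPOINTS),
        # One heatmap channel per keypoint type.
        "person_keypoint_heatmap": (side, side, NUM_KEYPOINTS),
        # Assumed: a (dy, dx) sub-pixel offset per keypoint, per pixel.
        "per_keypoint_offset_field": (side, side, 2 * NUM_KEYPOINTS),
    }


lightning_shapes = head_output_shapes(192)
thunder_shapes = head_output_shapes(256)
```

Under these assumptions, Lightning predicts on a 48×48 grid and Thunder on a 64×64 grid.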

Although these predictions are computed in parallel, one can gain insight into the model’s operation by considering the following sequence of operations:

Step 1: The person center heatmap is used to identify the centers of all individuals in the frame, defined as the arithmetic mean of all keypoints belonging to a person. The location with the highest score (weighted by the inverse-distance from the frame center) is selected.

Step 2: An initial set of keypoints for the person is produced by slicing the keypoint regression output from the pixel corresponding to the object center. Since this is a center-out prediction that must operate over different scales, the regressed keypoints will not be very accurate.

Step 3: Each pixel in the keypoint heatmap is multiplied by a weight which is inversely proportional to the distance from the corresponding regressed keypoint. This ensures that we do not accept keypoints from background people, since they typically will not be in the proximity of regressed keypoints, and hence will have low resulting scores.

Step 4: The final set of keypoint predictions is selected by retrieving the coordinates of the maximum heatmap values in each keypoint channel. The local 2D offset predictions are then added to these coordinates to give refined estimates. The figure below illustrates these four steps.

MoveNet post-processing steps.
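
The four steps can also be sketched in a few lines of plain Python. This is only an illustration of the logic described above, not the model's actual post-processing code: the tensor layouts, the function names, and the specific weighting function `1 / (1 + distance)` are assumptions chosen for readability.

```python
import math


def argmax_2d(grid):
    """Return the (row, col) of the largest value in a 2D list of scores."""
    best_row, best_col = 0, 0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > grid[best_row][best_col]:
                best_row, best_col = r, c
    return best_row, best_col


def inverse_distance_weighted(grid, ref_row, ref_col):
    """Scale each score by 1 / (1 + distance to the reference point) (assumed form)."""
    return [
        [value / (1.0 + math.hypot(r - ref_row, c - ref_col))
         for c, value in enumerate(row)]
        for r, row in enumerate(grid)
    ]


def decode_single_pose(center_heatmap, regression, keypoint_heatmaps, offsets):
    """Decode one pose; returns a list of (row, col, score) per keypoint."""
    height, width = len(center_heatmap), len(center_heatmap[0])

    # Step 1: pick the person center, favoring locations near the frame center.
    weighted_centers = inverse_distance_weighted(
        center_heatmap, (height - 1) / 2, (width - 1) / 2)
    center_row, center_col = argmax_2d(weighted_centers)

    # Step 2: read the coarse, regressed keypoints at the center pixel.
    regressed = [(center_row + dy, center_col + dx)
                 for dy, dx in regression[center_row][center_col]]

    keypoints = []
    for k, (reg_row, reg_col) in enumerate(regressed):
        # Step 3: down-weight heatmap peaks far from the regressed keypoint.
        weighted = inverse_distance_weighted(keypoint_heatmaps[k], reg_row, reg_col)
        # Step 4: take the best remaining peak and refine it with the local offset.
        row, col = argmax_2d(weighted)
        off_y, off_x = offsets[k][row][col]
        keypoints.append((row + off_y, col + off_x, keypoint_heatmaps[k][row][col]))
    return keypoints


# Toy 5x5 example with a single keypoint channel.
SIZE = 5
center = [[0.0] * SIZE for _ in range(SIZE)]
center[2][2] = 0.9    # main subject near the frame center
center[0][4] = 0.95   # background person with a slightly higher raw score
regression = [[[(0.0, 0.0)] for _ in range(SIZE)] for _ in range(SIZE)]
regression[2][2] = [(-1.0, 0.0)]   # keypoint is one pixel above the center
heatmap = [[0.0] * SIZE for _ in range(SIZE)]
heatmap[1][2] = 0.8   # keypoint of the main subject
heatmap[4][4] = 0.9   # same keypoint type on the background person
offsets = [[[(0.0, 0.0) for _ in range(SIZE)] for _ in range(SIZE)]]
offsets[0][1][2] = (0.25, -0.5)

pose = decode_single_pose(center, regression, [heatmap], offsets)
```

In this toy example the background person has the higher raw scores, yet the distance weighting in steps 1 and 3 keeps the decoded keypoint on the main subject.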

Training Datasets

MoveNet was trained on two datasets: COCO and an internal Google dataset called Active. While COCO is the standard benchmark dataset for detection – due to its scene and scale diversity – it is not suitable for fitness and dance applications, which exhibit challenging poses and significant motion blur. Active was produced by labeling keypoints (adopting COCO’s standard 17 body keypoints) on yoga, fitness, and dance videos from YouTube. No more than three frames are selected from each video for training, to promote diversity of scenes and individuals.

Evaluations on the Active validation dataset show a significant performance boost relative to identical architectures trained using only COCO. This isn’t surprising since COCO infrequently exhibits individuals with extreme poses (e.g. yoga, pushups, headstands, and more).

To learn more about the dataset and how MoveNet performs across different categories, please see the model card.

Images from Active keypoint dataset.

Optimization

While a lot of effort went into architecture design, post-processing logic, and data selection to make MoveNet a high-quality detector, an equal focus was given to inference speed. First, bottleneck layers from MobileNetV2 were selected for lateral connections in the FPN. Likewise, the number of convolution filters in each prediction head was slimmed down significantly to speed up execution on the output feature maps. Depthwise separable convolutions are used throughout the network, except in the first MobileNetV2 layer.

MoveNet was repeatedly profiled, uncovering and removing particularly slow ops. For example, we replaced tf.math.top_k with tf.math.argmax, since it executes significantly faster and is adequate for the single-person setting.
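
To see why an argmax is adequate here, the following plain-Python sketch (not TensorFlow code) compares a generic top-k selection with a simple argmax on a flattened score list. The helper names are hypothetical; the point is only that when a single best peak is needed, both approaches return the same index while argmax does less work.

```python
import heapq


def top_k_indices(scores, k):
    """Indices of the k highest scores, best first (general but heavier)."""
    ranked = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [index for index, _ in ranked]


def argmax_index(scores):
    """Index of the single highest score (one linear pass)."""
    return max(range(len(scores)), key=scores.__getitem__)


scores = [0.1, 0.7, 0.3, 0.9, 0.2]
best_via_top_k = top_k_indices(scores, 1)[0]
best_via_argmax = argmax_index(scores)
```

A multi-person model would need several peaks (k > 1), which is where a top-k operation becomes necessary again.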

To ensure fast execution with TensorFlow.js, all model outputs were packed into a single output tensor, so that there is only one download from GPU to CPU.

Perhaps the most significant speedup is the use of 192×192 inputs to the model (256×256 for Thunder). To counteract the lower resolution, we apply intelligent cropping based on detections from the previous frame. This allows the model to devote its attention and resources to the main subject, and not the background.
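
One way such cropping could work is sketched below in Python. This is a hypothetical illustration rather than MoveNet's actual cropping logic: the function name, the `margin` and `min_score` parameters, and the choice of a square crop are all assumptions.

```python
def crop_region_from_keypoints(keypoints, frame_width, frame_height,
                               margin=1.25, min_score=0.2):
    """
    Derive a square crop (left, top, width, height) for the next frame from
    the previous frame's keypoints, given as (x, y, score) tuples in pixels.
    """
    confident = [(x, y) for x, y, score in keypoints if score >= min_score]
    if not confident:
        # Nothing reliable to track: fall back to the full frame.
        return (0.0, 0.0, float(frame_width), float(frame_height))

    xs = [x for x, _ in confident]
    ys = [y for _, y in confident]
    center_x = (min(xs) + max(xs)) / 2
    center_y = (min(ys) + max(ys)) / 2

    # Square side covering the subject plus some margin, limited to the frame.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * margin
    side = min(side, frame_width, frame_height)

    # Keep the crop inside the frame boundaries.
    left = min(max(center_x - side / 2, 0), frame_width - side)
    top = min(max(center_y - side / 2, 0), frame_height - side)
    return (left, top, side, side)


previous_keypoints = [(100, 100, 0.9), (200, 300, 0.8), (500, 500, 0.05)]
crop = crop_region_from_keypoints(previous_keypoints, 640, 480)
```

The low-confidence keypoint at (500, 500) is ignored, so the crop stays tight around the subject and more of the model's limited input resolution is spent on the person.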

Temporal Filtering

Operating on a high FPS camera stream provides the luxury of applying smoothing to keypoint estimates. Both Lightning and Thunder apply a robust, non-linear filter to the incoming stream of keypoint predictions. This filter is tuned to simultaneously suppress high-frequency noise (i.e. jitter) and outliers from the model, while also maintaining high-bandwidth throughput during quick motions. This leads to smooth keypoint visualizations with minimal lag in all circumstances.
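
The post does not name the filter, so as an illustration of this class of non-linear smoothing, here is a minimal Python sketch of a One Euro filter, a well-known filter that adapts its cutoff frequency to the signal's speed. It is a stand-in technique, not necessarily the one MoveNet uses, and the parameter values are arbitrary assumptions.

```python
import math


class OneEuroFilter:
    """Low-pass filter whose cutoff rises with the signal's speed."""

    def __init__(self, min_cutoff=1.0, beta=0.01, d_cutoff=1.0):
        self.min_cutoff = min_cutoff  # smoothing strength when the signal is still
        self.beta = beta              # how quickly smoothing relaxes during fast motion
        self.d_cutoff = d_cutoff      # cutoff used for the speed estimate
        self.x_prev = None
        self.dx_prev = 0.0
        self.t_prev = None

    @staticmethod
    def _alpha(cutoff, dt):
        """Smoothing factor of a first-order low-pass filter."""
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, t, x):
        if self.x_prev is None:
            self.x_prev, self.t_prev = x, t
            return x
        dt = t - self.t_prev
        if dt <= 0:
            return self.x_prev  # ignore out-of-order timestamps

        # Estimate and smooth the speed of the signal.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev

        # Fast motion raises the cutoff (less lag); slow motion lowers it (less jitter).
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1 - a) * self.x_prev

        self.x_prev, self.dx_prev, self.t_prev = x_hat, dx_hat, t
        return x_hat


# Jittery x coordinate of a stationary keypoint, sampled at 30 FPS.
smoother = OneEuroFilter()
noisy_x = [5.1, 4.9] * 5
smoothed = [smoother(i / 30.0, x) for i, x in enumerate(noisy_x)]
```

In practice one filter instance would be kept per keypoint coordinate, so a 17-keypoint pose would use 34 filters.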

MoveNet Browser Performance

To quantify the inference speed of MoveNet, the model was benchmarked across multiple devices. The model latency (expressed in FPS) was measured on GPU with WebGL, as well as WebAssembly (WASM), which is the typical backend for devices with lower-end or no GPUs.

Device                                                                   WebGL (FPS)    WASM with SIMD + Multithread (FPS)
MacBook Pro 15” 2019 (Intel Core i9, AMD Radeon Pro Vega 20 Graphics)    104 | 77       42 | 21
iPhone 12                                                                51 | 43        N/A
Pixel 5                                                                  34 | 12        N/A
Desktop (Intel i9-10900K, Nvidia GTX 1070 GPU)                           87 | 82        71 | 30

Inference speed of MoveNet across different devices and TF.js backends. The first number in each cell is for Lightning, and the second number is for Thunder.
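
Since the figures are reported in FPS, it can help to convert them into per-frame latency. The short Python sketch below does that arithmetic for the WebGL numbers reported above; the shortened device labels and helper name are just for illustration.

```python
def fps_to_latency_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# (Lightning, Thunder) FPS on the WebGL backend, copied from the benchmark above.
WEBGL_FPS = {
    "MacBook Pro 15-inch 2019": (104, 77),
    "iPhone 12": (51, 43),
    "Pixel 5": (34, 12),
    "Desktop (i9-10900K, GTX 1070)": (87, 82),
}

latency_ms = {
    device: tuple(round(fps_to_latency_ms(fps), 1) for fps in pair)
    for device, pair in WEBGL_FPS.items()
}
```

For reference, a 30 FPS target leaves a budget of roughly 33 ms per frame.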

TF.js continuously optimizes its backends to accelerate model execution across all supported devices. We applied several techniques here to help the models achieve this performance, such as implementing a packed WebGL kernel for the depthwise separable convolutions and improving GL scheduling for mobile Chrome.

To see the model’s FPS on your device, try our demo. You can switch the model type and backends live in the demo UI to see what works best for your device.

Looking Ahead

The next step is to extend Lightning and Thunder models to the multi-person domain, so that developers can support applications with multiple people in the camera field-of-view.

We also plan to speed up the TensorFlow.js backends, through repeated benchmarking and backend optimization, to make model execution even faster.

Acknowledgements

We would like to acknowledge the other contributors to MoveNet: Yu-Hui Chen, Ard Oerlemans, Francois Belletti, Andrew Bunner, and Vijay Sundaram, along with those involved with the TensorFlow.js pose-detection API: Ping Yu, Sandeep Gupta, Jason Mayes, and Masoud Charkhabi.


Building a TinyML Application with TF Micro and SensiML

A guest post by Chris Knorowski, SensiML CTO

TinyML reduces the complexity of adding AI to the edge, enabling new applications where streaming data back to the cloud is prohibitive. Some examples of applications that are making use of TinyML right now are:

  • Visual and audio wake words that trigger an action when a person is detected in an image or a keyword is spoken.
  • Predictive maintenance on industrial machines using sensors to continuously monitor for anomalous behavior.
  • Gesture and activity detection for medical, consumer, and agricultural devices, such as gait analysis, fall detection or animal health monitoring.

One common factor for all these applications is the low cost and power usage of the hardware they run on. Sure, we can detect audio and visual wake words or analyze sensor data for predictive maintenance on a desktop computer. But, for a lot of these applications to be viable, the hardware needs to be inexpensive and power efficient (so it can run on batteries for an extended time).

Fortunately, the hardware is now getting to the point where running real-time analytics is possible. It is crazy to think about, but the Arm Cortex-M4 processor can do more FFTs per second than the Pentium 4 processor while using orders of magnitude less power. Similar gains in power/performance have been made in sensors and wireless communication. TinyML allows us to take advantage of these advances in hardware to create all sorts of novel applications that simply were not possible before.

At SensiML our goal is to empower developers to rapidly add AI to their own edge devices, allowing their applications to autonomously transform raw sensor data into meaningful insight. We have taken years of lessons learned in creating products that rely on edge optimized machine learning and distilled that knowledge into a single framework, the SensiML Analytics Toolkit, which provides an end-to-end development platform spanning data collection, labeling, algorithm development, firmware generation, and testing.

So what does it take to build a TinyML application?

Building a TinyML application touches on skill sets ranging from hardware engineering, embedded programming, and software engineering to machine learning, data science, and domain expertise about the application you are building. The steps required to build the application can be broken into four parts:

  1. Collecting and annotating data
  2. Applying signal preprocessing
  3. Training a classification algorithm
  4. Creating firmware optimized for the resource budget of an edge device

This tutorial will walk you through all the steps, and by the end of it you will have created an edge optimized TinyML application for the Arduino Nano 33 BLE Sense that is capable of recognizing different boxing punches in real-time using the Gyroscope and Accelerometer sensor data from the onboard IMU sensor.

Gesture recognition using TinyML. Male punching a punching bag

What you need to get started

We will use the SensiML Analytics Toolkit to handle collecting and annotating sensor data, creating a sensor preprocessing pipeline, and generating the firmware. We will use TensorFlow to train our machine learning model and TensorFlow Lite Micro for inferencing. Before you start, we recommend signing up for SensiML Community Edition to get access to the SensiML Analytics Toolkit.

The Software

The Hardware

  • Arduino Nano 33 BLE Sense
  • Adafruit Li-Ion Backpack Add-On (optional)
  • Lithium-Ion Polymer Battery (3.7 V, 100 mAh)
  • Zebra Byte Case
  • Glove and Double Sided Tape

The Arduino Nano 33 BLE Sense has an Arm Cortex-M4 microcontroller running at 64 MHz with 1MB Flash memory and 256 KB of RAM. If you are used to working with cloud/mobile this may seem tiny, but many applications can run in such a resource-constrained environment.

The Nano 33 BLE Sense also has a variety of onboard sensors which can be used in your TinyML applications. For this tutorial, we are using the motion sensor which is a 9-axis IMU (accelerometer, gyroscope, magnetometer).

For wireless power, we used the Adafruit Li-Ion Battery Pack. If you do not have the battery pack, you can still walk through this tutorial using a suitably long micro USB cable to power the board, though collecting gesture data is not quite as fun when you are wired. The images below show the battery hooked up to the Nano 33 BLE Sense.

Nano 33 BLE Sense
battery connected to boxing glove

Building Your Data Set

For every machine learning project, the quality of the final product depends on the quality of your data set. Time-series data, unlike image and audio, are typically unique to each application. Because of this, you often need to collect and annotate your datasets. The next part of this tutorial will walk you through how to connect to the Nano 33 BLE Sense to stream data wirelessly over BLE as well as label the data so it can be used to train a TensorFlow model.

For this project we are going to collect data for 5 different gestures as well as some data for negative cases which we will label as Unknown. The 5 boxing gestures we are going to collect data for are Jab, Overhand, Cross, Hook, and Uppercut.

boxing gestures

We will also collect data on both the right and left glove, giving us a total of 10 different classes. To simplify things, we will build two separate models: one for the right glove and one for the left. This tutorial will focus on the left glove.

Streaming sensor data from the Nano 33 over BLE

The first challenge of a TinyML project is often to figure out how to get data off of the sensor. Depending on your needs you may choose Wi-Fi, BLE, Serial, or LoRaWAN. Alternatively, you may find storing data to an internal SD card and transferring the files after is the best way to collect data. For this tutorial, we will take advantage of the onboard BLE radio to stream sensor data from the Nano 33 BLE Sense.

We are going to use the SensiML Open Gateway running on our computer to retrieve the sensor data. To download and launch the gateway, open a terminal and run the following commands:

git clone https://github.com/sensiml/open-gateway

cd open-gateway
pip3 install -r requirements.txt
python3 app.py

The gateway should now be running on your machine.

Gateway

Next, we need to connect the gateway server to the Nano 33 BLE Sense. Make sure you have flashed the Data Collection Firmware to your Nano 33. This firmware implements the Simple Streaming Interface specification which creates two topics used for streaming data. The /config topic returns a JSON describing the sensor data and /stream topic streams raw sensor data as a byte array of Int16 values.
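
To illustrate what consuming the /stream payload might look like, here is a minimal Python sketch that unpacks a byte array of Int16 values into per-sample tuples. The little-endian byte order, the interleaved channel layout, and the six-channel (accelerometer plus gyroscope) example are assumptions for illustration; in a real client, the channel count and order would come from the JSON returned by /config.

```python
import struct


def decode_int16_stream(payload, num_channels):
    """
    Split a raw byte payload into samples of `num_channels` Int16 values.
    Assumes little-endian, channel-interleaved data; trailing bytes that do
    not form a complete sample are dropped.
    """
    frame_size = 2 * num_channels              # 2 bytes per Int16 value
    usable = len(payload) - (len(payload) % frame_size)
    values = struct.unpack("<%dh" % (usable // 2), payload[:usable])
    return [values[i:i + num_channels]
            for i in range(0, len(values), num_channels)]


# Two hypothetical 6-axis samples followed by one stray byte.
payload = (struct.pack("<6h", 1, -2, 3, -4, 5, -6)
           + struct.pack("<6h", 7, 8, 9, 10, 11, 12)
           + b"\x01")
samples = decode_int16_stream(payload, num_channels=6)
```

Dropping incomplete trailing bytes keeps the sketch short; a streaming client would more likely buffer them until the rest of the sample arrives.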

To configure the gateway to connect to your sensor:

  1. Go to the gateway address in your browser (defaults to localhost:5555)
  2. Click on the Home Tab
  3. Set Device Mode: Data Capture
  4. Set Connection Type: BLE
  5. Click the Scan button, and select the device named Nano 33 DCL
  6. Click the Connect to Device button
SensiML Gateway

The gateway will pull the configuration from your device, and be ready to start forwarding sensor data. You can verify it is working by going to the Test Stream tab and clicking the Start Stream button.

Setting up the Data Capture Lab Project

Now that we can stream data, the next step is to record and label the boxing gestures. To do that we will use the SensiML Data Capture Lab. If you haven’t already done so, download and install the Data Capture Lab to record sensor data.

We have created a template project to get you started. The project is prepopulated with the gesture labels and metadata information, along with some pre-recorded example gestures files. To add this project to your account:

  1. Download and unzip the Boxing Glove Gestures Demo Project
  2. Open the Data Capture Lab
  3. Click Upload Project
  4. Click Browse which will open the file explorer window
  5. Navigate to the Boxing Glove Gestures Demo folder you just unzipped and select the Boxing Glove Gestures Demo.dclproj file
  6. Click Upload
SensiML Data Capture Lab

Connecting to the Gateway

After uploading the project, you can start capturing sensor data. For this tutorial we will be streaming data to the Data Capture Lab from the gateway over TCP/IP. To connect to the Nano 33 BLE Sense from the Data Capture Lab through the gateway:

  1. Open the Project Boxing Glove Gestures Demo
  2. Click Switch Modes -> Capture Mode
  3. Select Connection Method: Wi-Fi
  4. Click the Find Devices button
  5. Enter the IP Address of your gateway machine, and the port the server is running on (typically 127.0.0.1:5555)
  6. Click Add Device
  7. Select the newly added device
  8. Click the Connect button
wi-fi connection

You should see sensor data streaming across the screen. If you are having trouble with this step, see the full documentation here for troubleshooting.

boxing gesture data streaming

Capturing Boxing Gesture Sensor Data

The Data Capture Lab can also play videos that have been recorded alongside your sensor data. If you want to capture videos and sync them up with sensor data see the documentation here. This can be extremely helpful during the annotation phase to help interpret what is happening at a given point in the time-series sensor waveforms.

Now that data is streaming into the Data Capture Lab, we can begin capturing our gesture data set.

  1. Select “Jab” from the Label dropdown in the Capture Properties screen (this will be the name of the file)
  2. Select the Metadata which captures the context (subject, glove, experience, etc.)
  3. Then click the Begin Recording button to start recording the sensor data
  4. Perform several “Jab” gestures
  5. Click the Stop Recording button when you are finished

After you hit stop recording, the captured data will be saved locally and synced with the cloud project. You can view the file by going to the Project Explorer and double-clicking on the newly created file.

GIF showing male boxing while program collects data

The following video walks through capturing sensor data.

Annotating Sensor Data

To classify sensor data in real-time, you need to decide how much and which portion of the sensor stream to feed to the classifier. On edge devices, it gets even more difficult as you are limited to a small buffer of data due to the limited RAM. Identifying the right segmentation algorithm for an application can save on battery life by limiting the number of classifications performed as well as improving the accuracy by identifying the start and end of a gesture.

Segmentation algorithms work by taking the input from the sensor and buffering the data until they determine a new segment has been found. At that point, they pass the data buffer down to the rest of the pipeline. The simplest segmentation algorithm is a sliding window, which continually feeds a set chunk of data to the classifier. However, there are many drawbacks to the sliding window for discrete gesture recognition, such as performing classifications when there are no events. This wastes battery and runs the risk of having events split across multiple windows, which can lower accuracy.
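
The difference between the two approaches can be sketched in Python. The sliding window below matches the description above, while the threshold-based segmenter is a deliberately simple, hypothetical stand-in for an event-driven algorithm; it is not the segmentation algorithm used by the SensiML toolkit.

```python
def sliding_windows(samples, window_size, step):
    """Yield (start_index, window) for every fixed-size chunk of the signal."""
    for start in range(0, len(samples) - window_size + 1, step):
        yield start, samples[start:start + window_size]


def threshold_segments(samples, threshold, min_length=1):
    """Return (start, end) index pairs where |sample| stays at or above threshold."""
    segments = []
    start = None
    for i, value in enumerate(samples):
        active = abs(value) >= threshold
        if active and start is None:
            start = i                      # an event begins
        elif not active and start is not None:
            if i - start >= min_length:
                segments.append((start, i))  # the event ends before index i
            start = None
    if start is not None and len(samples) - start >= min_length:
        segments.append((start, len(samples)))
    return segments


# One-channel toy signal with two bursts of motion.
samples = [0, 0, 0, 5, 9, 7, 0, 0, 0, 0, 6, 8, 0, 0]
windows = list(sliding_windows(samples, window_size=4, step=4))
events = threshold_segments(samples, threshold=3)
```

Here the sliding window produces three chunks regardless of activity and splits the first burst (indices 3 to 5) across two windows, whereas the event-driven segmenter emits exactly two segments, one per burst.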

Segmenting in the Data Capture Lab

We identify events in the Data Capture Lab by creating Segments around the events in your sensor data. Segments are displayed with a pair of blue and red lines when you open a file and define where an event is located.

The Data Capture Lab has two methods for labeling your events: Manual and Auto. In manual mode you can manually drag and drop a segment onto the graph to identify an event in your sensor data. Auto mode uses a segmentation algorithm to automatically detect events based on customizable parameters. For this tutorial, we are going to use a segmentation algorithm in Auto mode. The segmentation algorithms we use for determining events will also be compiled as part of the firmware so that the on-device model will be fed the same segments of data it was trained against.

We have already created a segmentation algorithm for this project based on the dataset we have collected so far. To perform automatic event detection on a newly captured data file:

  1. Select the file from the Project Explorer
  2. Click on the Detect Segments button
  3. The segmentation algorithm will be run against the capture and the segments it finds will be added to the file
GIF of auto-segmentation

Note: If the events are not matching the real segments in your file, you may need to adjust the parameters of the segmentation algorithm.

Labeling Events in the Data Capture Lab

Keep in mind that automatic event detection only detects that an event has occurred; it does not determine what type of event has occurred. You will need to apply a label to each event that was detected. To do that:

  1. Select one or more of the segments from the graph
  2. Click the Edit button or (Ctrl+E)
  3. Specify which label is associated with that event
  4. Repeat steps 1-3 for all segments in the capture
  5. Click Save
GIF labeling data

Building a TinyML Model

We are going to use Google Colab to train our machine learning model using the data we collected from the Nano 33 BLE Sense in the previous section. Colab provides a Jupyter notebook that allows us to run our TensorFlow training in a web browser. Open the Google Colab notebook and follow along to train your model.

Offline Model Validation

After saving the model, go to the Analytics Studio to perform offline validation. To test the model against any of the captured data files:

  1. Open the Boxing Glove Gestures Demo project in the Summary Tab
    SensiML analytics studio
  2. Go to Test Model Tab
  3. Select your model from the Model Name dropdown
  4. Select one or more of the capture files by clicking on them
  5. Click the Compute Accuracy Button to classify the captures using the selected model
compute accuracy

When you click the Compute Accuracy button, the segmentation algorithm, preprocessing steps, and TensorFlow model are compiled into a single Knowledge Pack. Then the classification results and accuracy for each of the captures you selected are computed using the compiled Knowledge Pack. Click the Results button for the individual capture to see the classifications for all of the detected events and how they compared with the ground truth labels.
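
Conceptually, comparing classifications with ground truth labels boils down to the computation sketched below in Python. This is only an illustration of the metric, not the Analytics Studio implementation; the function name and the example labels are made up.

```python
from collections import Counter


def score_classifications(ground_truth, predicted):
    """Return (accuracy, confusion), where confusion counts (true, predicted) pairs."""
    if len(ground_truth) != len(predicted):
        raise ValueError("Each detected event needs exactly one prediction.")
    confusion = Counter(zip(ground_truth, predicted))
    correct = sum(count for (truth, guess), count in confusion.items()
                  if truth == guess)
    accuracy = correct / len(ground_truth) if ground_truth else 0.0
    return accuracy, confusion


# Hypothetical labels for five detected events in one capture.
truth = ["Jab", "Jab", "Hook", "Cross", "Uppercut"]
guesses = ["Jab", "Cross", "Hook", "Cross", "Unknown"]
accuracy, confusion = score_classifications(truth, guesses)
```

The confusion counts show which gestures get mistaken for one another, which is often more actionable than the single accuracy number.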

Deploy and Test on the Nano 33 BLE Sense

Downloading the model as firmware

Now that you have validated the model offline, it’s time to see how it performs at the edge. To do that, we download and flash the model to the Nano 33 BLE Sense.

  1. Go to the Download Model tab of the Analytics Studio
  2. Select the HW Platform: Arduino CortexM4
  3. Select Format: Library
  4. Click the Download button
  5. The compiled library file should download to your computer
download knowledge pack

Flashing the Firmware

After downloading the library, we will build and upload the firmware to the Nano 33 BLE Sense. For this step, you will need the Nano 33 Knowledge Pack Firmware. To compile the firmware, we are using Visual Studio Code with the PlatformIO plugin. To compile your model and flash the Nano 33 BLE Sense with this firmware:

  1. Open your terminal and run
    git clone https://github.com/sensiml/nano33_knowledge_pack/
  2. Unzip the downloaded Knowledge Pack.
  3. In the folder, you will find the following directories:

    knowledgepack_project/

    libsensiml/

  4. Copy the files from libsensiml to nano33_knowledge_pack/lib/sensiml. You will overwrite the files included in the repository.
  5. Copy the files from knowledgepack_project to nano33_knowledge_pack/src/
  6. Switch to the PlatformIO extension tab in VS Code
  7. Connect your Nano 33 BLE Sense to your computer using the micro USB cable.
  8. Click Upload and Monitor under nano33ble_with_tensorflow in the PlatformIO tab.
    Upload and Monitor tab

When the device restarts, it will boot up and your model will be running automatically. The video below walks through these steps.

Viewing Classification Results

To see the classification results in real time, connect to your device over BLE using the Android TestApp or the SensiML Open Gateway. The device will show up with the name Nano33 SensiML KP when you scan for devices. We have trained two models, one for the left glove and one for the right glove. You can see a demo of both models running at the same time in the following video.

Conclusion

We hope this blog has given you the tools you need to start building an end-to-end TinyML application using TensorFlow Lite for Microcontrollers and the SensiML Analytics Toolkit. For more tutorials and examples of TinyML applications, check out the application examples in our documentation. Follow us on LinkedIn or get in touch with us; we love hearing about all of the amazing TinyML applications the community is working on!


Using TFX inference with Dataflow for large scale ML inference patterns

Posted by Reza Rokni, Snr Staff Developer Advocate

In part I of this blog series we discussed best practices and patterns for efficiently deploying a machine learning model for inference with Google Cloud Dataflow. Amongst other techniques, it showed efficient batching of the inputs and the use of shared.py to make efficient use of a model.

In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), which abstracts us away from manually implementing the patterns described in part I. You can use RunInference to simplify your pipelines and reduce technical debt when building production inference pipelines in batch or stream mode.

The following four patterns are covered:

  • Using RunInference to make ML prediction calls.
  • Post-processing RunInference results. Making predictions is often the first part of a multistep flow in the business process. Here we will process the results into a form that can be used downstream.
  • Attaching a key. Along with the data that is passed to the model, there is often a need for an identifier (for example, an IoT device ID or a customer identifier) that is used later in the process even if it’s not used by the model itself. We show how this can be accomplished.
  • Inference with multiple models in the same pipeline. Often you may need to run multiple models within the same pipeline, be it in parallel or as a sequence of predict-process-predict calls. We walk through a simple example.

Creating a simple model

In order to illustrate these patterns, we’ll use a simple toy model that will let us concentrate on the data engineering needed for the input and output of the pipeline. This model will be trained to approximate multiplication by the number 5.

Please note the following code snippets can be run as cells within a notebook environment.

Step 1 – Set up libraries and imports

%pip install tfx_bsl==0.29.0 --quiet
import argparse

import tensorflow as tf
from tensorflow import keras
from tensorflow_serving.apis import prediction_log_pb2

import apache_beam as beam
import tfx_bsl
from tfx_bsl.public.beam import RunInference
from tfx_bsl.public import tfxio
from tfx_bsl.public.proto import model_spec_pb2

import numpy

from typing import Dict, Text, Any, Tuple, List

from apache_beam.options.pipeline_options import PipelineOptions

project = "<your project>"
bucket = "<your bucket>"

save_model_dir_multiply = f'gs://{bucket}/tfx-inference/model/multiply_five/v1/'
save_model_dir_multiply_ten = f'gs://{bucket}/tfx-inference/model/multiply_ten/v1/'

Step 2 – Create the example data

In this step we create a small dataset that includes a range of values from 0 to 99 and labels that correspond to each value multiplied by 5.

"""
Create our training data which represents the 5 times multiplication table for 0 to 99. x is the data and y the labels.

x is a range of values from 0 to 99.
y is a list of 5x

value_to_predict includes values outside of the training data
"""
x = numpy.arange(0, 100)
y = x * 5

Step 3 – Create a simple model, compile, and fit it

"""
Build a simple linear regression model.
Note the model has a shape of (1,) for its input layer; it expects a single float32 value.
"""
input_layer = keras.layers.Input(shape=(1,), dtype=tf.float32, name='x')
output_layer = keras.layers.Dense(1)(input_layer)

model = keras.Model(input_layer, output_layer)
model.compile(optimizer=tf.optimizers.Adam(), loss='mean_absolute_error')
model.summary()

Let’s teach the model about multiplication by 5.

model.fit(x, y, epochs=2000)

Next, check how well the model performs using some test data.

value_to_predict = numpy.array([105, 108, 1000, 1013], dtype=numpy.float32)
model.predict(value_to_predict)

From the results below it looks like this simple model has learned its 5 times table close enough for our needs!

OUTPUT: 

array([[ 524.9939],
[ 539.9937],
[4999.935 ],
[5064.934 ]], dtype=float32)
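
To put a number on "close enough", one can compare each prediction with the exact product. The sketch below is only a worked check of the four values printed above; the helper name relative_error is ours and is not part of the original notebook.

```python
# Inputs and predictions copied from the output shown above.
inputs = [105, 108, 1000, 1013]
predictions = [524.9939, 539.9937, 4999.935, 5064.934]


def relative_error(predicted: float, expected: float) -> float:
    """Absolute error expressed as a fraction of the expected value."""
    return abs(predicted - expected) / expected


errors = [relative_error(p, 5 * x) for x, p in zip(inputs, predictions)]

for x, p, e in zip(inputs, predictions, errors):
    print(f"5 * {x} = {5 * x}, model predicted {p} (relative error {e:.2e})")
```

Every relative error here is below 0.01%, which is ample for a toy model whose only job is to exercise the pipeline.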

Step 4 – Convert the input to tf.example

In the model we just built, we made use of a simple list to generate the data and pass it to the model. In this next step we make the model more robust by using tf.example objects in the model training.

tf.example is a serializable dictionary (or mapping) from names to tensors, which ensures the model can still function even when new features are added to the base examples. Making use of tf.example also brings with it the benefit of having the data be portable across models in an efficient, serialized format.

To use tf.example for this example, we first need to create a helper class, ExampleProcessor, that is used to serialize the data points.

class ExampleProcessor:

    def create_example_with_label(self, feature: numpy.float32,
                                  label: numpy.float32) -> tf.train.Example:
        return tf.train.Example(
            features=tf.train.Features(
                feature={'x': self.create_feature(feature),
                         'y': self.create_feature(label)
                         }))

    def create_example(self, feature: numpy.float32):
        return tf.train.Example(
            features=tf.train.Features(
                feature={'x': self.create_feature(feature)})
        )

    def create_feature(self, element: numpy.float32):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[element]))

Using the ExampleProcessor class, the in-memory list can now be moved to disk.

# Create our labeled example file for 5 times table

example_five_times_table = 'example_five_times_table.tfrecord'

with tf.io.TFRecordWriter(example_five_times_table) as writer:
    for i in zip(x, y):
        example = ExampleProcessor().create_example_with_label(
            feature=i[0], label=i[1])
        writer.write(example.SerializeToString())

# Create a file containing the values to predict

predict_values_five_times_table = 'predict_values_five_times_table.tfrecord'

with tf.io.TFRecordWriter(predict_values_five_times_table) as writer:
    for i in value_to_predict:
        example = ExampleProcessor().create_example(feature=i)
        writer.write(example.SerializeToString())

With the new examples stored in TFRecord files on disk, we can use the Dataset API to prepare the data so it is ready for consumption by the model.

RAW_DATA_TRAIN_SPEC = {
    'x': tf.io.FixedLenFeature([], tf.float32),
    'y': tf.io.FixedLenFeature([], tf.float32)
}

RAW_DATA_PREDICT_SPEC = {
    'x': tf.io.FixedLenFeature([], tf.float32),
}

With the feature spec in place, we can train the model as before.

dataset = tf.data.TFRecordDataset(example_five_times_table) 
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC))
dataset = dataset.map(lambda t : (t['x'], t['y']))
dataset = dataset.batch(100)
dataset = dataset.repeat()
model.fit(dataset, epochs=500, steps_per_epoch=1)

Note that these steps would be done automatically for us if we had built the model using a TFX pipeline, rather than hand-crafting the model as we did here.

Step 5 – Save the model

Now that we have a model, we need to save it for use with the RunInference transform. RunInference accepts TensorFlow saved model pb files as part of its configuration. The saved model file must be stored in a location that can be accessed by the RunInference transform. In a notebook this can be the local file system; however, to run the pipeline on Dataflow, the file will need to be accessible by all the workers, so here we use a GCP bucket.

Note that the gs:// scheme is directly supported by the tf.keras.models.save_model API.

tf.keras.models.save_model(model, save_model_dir_multiply)

During development it’s useful to be able to inspect the contents of the saved model file. For this, we use the saved_model_cli that comes with TensorFlow. You can run this command from a cell:

!saved_model_cli show --dir {save_model_dir_multiply} --all

Abbreviated output from the saved model file is shown below. Note the signature def 'serving_default', which accepts a tensor of float type. We will change this to accept another type in the next section.

OUTPUT: 

signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['example'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: serving_default_example:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

RunInference will pass a serialized tf.example to the model rather than a tensor of float type as seen in the current signature. To accomplish this we have one more step to prepare the model: creation of a specific signature.

Signatures are a powerful feature as they enable us to control how calling programs interact with the model. From the TensorFlow documentation:

The optional signatures argument controls which methods in obj will be available to programs which consume SavedModels, for example, serving APIs. Python functions may be decorated with @tf.function(input_signature=…) and passed as signatures directly, or lazily with a call to get_concrete_function on the method decorated with @tf.function.

In our case, the following code will create a signature that accepts a tf.string data type with a name of ‘examples’. This signature is then saved with the model, which replaces the previous saved model.

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string, name='examples')])
def serve_tf_examples_fn(serialized_tf_examples):
    """Returns the output to be used in the serving signature."""
    features = tf.io.parse_example(serialized_tf_examples, RAW_DATA_PREDICT_SPEC)
    return model(features, training=False)

signature = {'serving_default': serve_tf_examples_fn}

tf.keras.models.save_model(model, save_model_dir_multiply, signatures=signature)

If you run the saved_model_cli command again, you will see that the input signature has changed to DT_STRING.

Pattern 1: RunInference for Predictions

Step 1 – Use RunInference within the pipeline

Now that the model is ready, the RunInference transform can be plugged into an Apache Beam pipeline. The pipeline below uses TFXIO TFExampleRecord, which it converts to a transform via RawRecordBeamSource(). The saved model location and signature are passed to the RunInference API as a SavedModelSpec configuration object.

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
    _ = (p | tfexample_beam_record.RawRecordBeamSource()
           | RunInference(
               model_spec_pb2.InferenceSpecType(
                   saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
           | beam.Map(print)
        )

Note:

You can perform two types of inference using RunInference:

  • In-process inference from a SavedModel instance. Used when the saved_model_spec field is set in inference_spec_type.
  • Remote inference by using a service endpoint. Used when the ai_platform_prediction_model_spec field is set in inference_spec_type.

Below is a snippet of the output. The values here are a little difficult to interpret as they are in their raw unprocessed format. In the next section the raw results are post-processed.

OUTPUT: 

predict_log {
request {
model_spec { signature_name: "serving_default" }
inputs {
key: "examples"
...
string_val: "\n\022\n\020\n\007example\022\005\032\003\n\001i"
...
response {
outputs {
key: "output_0"
value {
...
float_val: 524.993896484375
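
To see why the raw string_val is hard to read, it can help to decode it by hand. The sketch below is a minimal protobuf wire-format reader in plain Python, taking the string_val bytes to be the octal-escaped literal assigned to raw. The field numbers follow TensorFlow's Example, Features, Feature and Int64List message definitions; the helper names read_varint and read_fields are ours, and the reader handles only the length-delimited fields this payload contains.

```python
from typing import Iterator, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:  # only length-delimited fields occur in this payload
            raise ValueError(f"unexpected wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


# Assumed escaped form of the serialized tf.train.Example shown in the log.
raw = b"\n\022\n\020\n\007example\022\005\032\003\n\001i"

((_, features),) = read_fields(raw)            # Example.features (field 1)
((_, entry),) = read_fields(features)          # one entry of the feature map
entry_fields = dict(read_fields(entry))        # 1 -> key, 2 -> Feature
feature_name = entry_fields[1].decode("utf-8")
((kind, value_list),) = read_fields(entry_fields[2])  # 3 means int64_list
((_, packed),) = read_fields(value_list)       # packed repeated int64 values

values = []
pos = 0
while pos < len(packed):
    value, pos = read_varint(packed, pos)
    values.append(value)

print(feature_name, kind, values)
```

Decoded this way, the payload is a single feature named example holding the integer 105. In a pipeline you would not do this by hand; tf.train.Example.FromString performs the same decoding, as the post-processing step in the next section does.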

Pattern 2: Post-processing RunInference results

The RunInference API returns a PredictionLog object, which contains the serialized input and the output from the call to the model. Having access to both the input and output enables you to create a simple tuple during post-processing for use downstream in the pipeline. Also worth noting is that RunInference transparently batches inputs for performance when the model is amenable to batching.

The PredictionProcessor beam.DoFn takes the output of RunInference and produces formatted text with the questions and answers as output. Of course in a production system, the output would more normally be a Tuple[input, output], or simply the output depending on the use case.

class PredictionProcessor(beam.DoFn):

    def process(
            self,
            element: prediction_log_pb2.PredictionLog):
        predict_log = element.predict_log
        input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
        output_value = predict_log.response.outputs
        yield (f"input is [{input_value.features.feature['x'].float_list.value}] output is {output_value['output_0'].float_val}")

pipeline = beam.Pipeline()

tfexample_beam_record = tfx_bsl.public.tfxio.TFExampleRecord(file_pattern=predict_values_five_times_table)

with pipeline as p:
    _ = (p | tfexample_beam_record.RawRecordBeamSource()
           | RunInference(
               model_spec_pb2.InferenceSpecType(
                   saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
           | beam.ParDo(PredictionProcessor())
           | beam.Map(print)
        )

Now the output contains both the original input and the model’s output values.

OUTPUT: 

input is [[105.]] output is [523.6328735351562]
input is [[108.]] output is [538.5157470703125]
input is [[1000.]] output is [4963.6787109375]
input is [[1013.]] output is [5028.1708984375]

Pattern 3: Attaching a key

One useful pattern is the ability to pass information, often a unique identifier, with the input to the model and have access to this identifier from the output. For example, in an IoT use case you could associate a device ID with the input data being passed into the model. Often this type of key is not useful for the model itself and thus should not be passed into the first layer.

RunInference takes care of this for us by accepting a Tuple[key, value] and outputting a Tuple[key, PredictionLog].
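
The contract is easy to picture in plain Python: keys are set aside, only the values reach the model, and each prediction is paired with its key again afterwards. The sketch below illustrates that idea only; it is not how tfx-bsl implements it, and run_keyed_inference and multiply_by_five are hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

K = TypeVar("K")


def run_keyed_inference(
    keyed_inputs: Iterable[Tuple[K, float]],
    model_fn: Callable[[Sequence[float]], List[float]],
) -> List[Tuple[K, float]]:
    """Strip the keys, run the model on the values only, then re-attach keys."""
    pairs = list(keyed_inputs)
    keys = [key for key, _ in pairs]
    values = [value for _, value in pairs]
    predictions = model_fn(values)  # the model never sees the keys
    return list(zip(keys, predictions))


def multiply_by_five(batch: Sequence[float]) -> List[float]:
    """Stand-in for the trained model (assumption: exact multiplication)."""
    return [5.0 * value for value in batch]


results = run_keyed_inference(
    [(b"first_question", 105.0), (b"second_question", 108.0)],
    multiply_by_five,
)
print(results)
```

Because the key travels alongside the value rather than inside it, the model signature does not need to change when a new identifier is introduced.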

Step 1 – Create a source with attached key

Since we need a key with the data that we are sending in for prediction, in this step we create a table in BigQuery, which has two columns: one holds the key and the second holds the test value.

CREATE OR REPLACE TABLE
maths.maths_problems_1 ( key STRING OPTIONS(description="A unique key for the maths problem"),
value FLOAT64 OPTIONS(description="Our maths problem" ) );
INSERT INTO
maths.maths_problems_1
VALUES
( "first_question", 105.00),
( "second_question", 108.00),
( "third_question", 1000.00),
( "fourth_question", 1013.00)

Step 2 – Modify post processor and pipeline

In this step we:

  • Modify the pipeline to read from the new BigQuery source table
  • Add a map transform, which converts a table row into a Tuple[bytes, Example]
  • Modify the post inference processor to output results along with the key

class PredictionWithKeyProcessor(beam.DoFn):

    def __init__(self):
        beam.DoFn.__init__(self)

    def process(
            self,
            element: Tuple[bytes, prediction_log_pb2.PredictionLog]):
        predict_log = element[1].predict_log
        input_value = tf.train.Example.FromString(predict_log.request.inputs['examples'].string_val[0])
        output_value = predict_log.response.outputs
        yield (f"key is {element[0]} input is {input_value.features.feature['x'].float_list.value} output is {output_value['output_0'].float_val[0]}")

pipeline_options = PipelineOptions().from_dictionary({'temp_location': f'gs://{bucket}/tmp'})
pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
    _ = (p | beam.io.gcp.bigquery.ReadFromBigQuery(table=f'{project}:maths.maths_problems_1')
           | beam.Map(lambda x: (bytes(x['key'], 'utf-8'), ExampleProcessor().create_example(numpy.float32(x['value']))))
           | RunInference(
               model_spec_pb2.InferenceSpecType(
                   saved_model_spec=model_spec_pb2.SavedModelSpec(model_path=save_model_dir_multiply)))
           | beam.ParDo(PredictionWithKeyProcessor())
           | beam.Map(print)
        )

OUTPUT:

key is b'first_question' input is [105.] output is 524.0875854492188
key is b'second_question' input is [108.] output is 539.0093383789062
key is b'third_question' input is [1000.] output is 4975.75830078125
key is b'fourth_question' input is [1013.] output is 5040.41943359375

Pattern 4: Inference with multiple models in the same pipeline

In part I of the series, the “join results from multiple models” pattern covered the various branching techniques in Apache Beam that make it possible to run data through multiple models.

Those techniques are applicable to the RunInference API, which can easily be used by multiple branches within a pipeline, with the same or different models. This is similar in function to cascade ensembling, although here the data flows through multiple models in a single Apache Beam DAG.

Inference with multiple models in parallel

In this example, the same data is run through two different models: the one that we’ve been using to multiply by 5 and a new model, which will learn to multiply by 10.

Example of data being run through 2 different models

"""
Create multiply by 10 table.

x is a range of values from 0 to 999.
y is a list of x * 10

value_to_predict includes values outside of the training data
"""
x = numpy.arange(0, 1000)
y = x * 10

# Create our labeled example file for 10 times table

example_ten_times_table = 'example_ten_times_table.tfrecord'

with tf.io.TFRecordWriter(example_ten_times_table) as writer:
    for i in zip(x, y):
        example = ExampleProcessor().create_example_with_label(
            feature=i[0], label=i[1])
        writer.write(example.SerializeToString())

dataset = tf.data.TFRecordDataset(example_ten_times_table)
dataset = dataset.map(lambda e : tf.io.parse_example(e, RAW_DATA_TRAIN_SPEC))
dataset = dataset.map(lambda t : (t['x'], t['y']))
dataset = dataset.batch(100)
dataset = dataset.repeat()

model.fit(dataset, epochs=500, steps_per_epoch=10, verbose=0)

tf.keras.models.save_model(model,
save_model_dir_multiply_ten,
signatures=signature)

Now that we have two models, we apply them to our source data.

pipeline_options = PipelineOptions().from_dictionary(
    {'temp_location': f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

with pipeline as p:
    questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
        table=f'{project}:maths.maths_problems_1')

    multiply_five = (questions
        | "CreateMultiplyFiveTuple" >>
            beam.Map(lambda x: (bytes('{}{}'.format(x['key'], ' * 5'), 'utf-8'),
                                ExampleProcessor().create_example(x['value'])))
        | "Multiply Five" >> RunInference(
            model_spec_pb2.InferenceSpecType(
                saved_model_spec=model_spec_pb2.SavedModelSpec(
                    model_path=save_model_dir_multiply)))
    )
    multiply_ten = (questions
        | "CreateMultiplyTenTuple" >>
            beam.Map(lambda x: (bytes('{}{}'.format(x['key'], '* 10'), 'utf-8'),
                                ExampleProcessor().create_example(x['value'])))
        | "Multiply Ten" >> RunInference(
            model_spec_pb2.InferenceSpecType(
                saved_model_spec=model_spec_pb2.SavedModelSpec(
                    model_path=save_model_dir_multiply_ten)))
    )
    _ = ((multiply_five, multiply_ten) | beam.Flatten()
         | beam.ParDo(PredictionWithKeyProcessor())
         | beam.Map(print))

Output:

key is b'first_question * 5' input is [105.] output is 524.0875854492188
key is b'second_question * 5' input is [108.] output is 539.0093383789062
key is b'third_question * 5' input is [1000.] output is 4975.75830078125
key is b'fourth_question * 5' input is [1013.] output is 5040.41943359375
key is b'first_question* 10' input is [105.] output is 1054.333984375
key is b'second_question* 10' input is [108.] output is 1084.3131103515625
key is b'third_question* 10' input is [1000.] output is 9998.0908203125
key is b'fourth_question* 10' input is [1013.] output is 10128.0009765625

Inference with multiple models in sequence

In a sequential pattern, data is sent to one or more models in sequence, with the output from each model chaining to the next model.

sequential pattern sending data to one or more models in sequence, with the output from each model chaining to the next model.

Here are the steps:

  1. Read the data from BigQuery
  2. Map the data
  3. RunInference with multiply by 5 model
  4. Process the results
  5. RunInference with multiply by 10 model
  6. Process the results

pipeline_options = PipelineOptions().from_dictionary(
    {'temp_location': f'gs://{bucket}/tmp'})

pipeline = beam.Pipeline(options=pipeline_options)

def process_interim_inference(element: Tuple[
        bytes, prediction_log_pb2.PredictionLog
        ]) -> Tuple[bytes, tf.train.Example]:

    key = '{} original input is {}'.format(
        element[0], str(tf.train.Example.FromString(
            element[1].predict_log.request.inputs['examples'].string_val[0]
        ).features.feature['x'].float_list.value[0]))

    value = ExampleProcessor().create_example(
        element[1].predict_log.response.outputs['output_0'].float_val[0])

    return (bytes(key, 'utf-8'), value)

with pipeline as p:

    questions = p | beam.io.gcp.bigquery.ReadFromBigQuery(
        table=f'{project}:maths.maths_problems_1')

    multiply = (questions
        | "CreateMultiplyTuple" >>
            beam.Map(lambda x: (bytes(x['key'], 'utf-8'),
                                ExampleProcessor().create_example(x['value'])))
        | "MultiplyFive" >> RunInference(
            model_spec_pb2.InferenceSpecType(
                saved_model_spec=model_spec_pb2.SavedModelSpec(
                    model_path=save_model_dir_multiply)))
    )

    _ = (multiply
        | "Extract result " >>
            beam.Map(lambda x: process_interim_inference(x))
        | "MultiplyTen" >> RunInference(
            model_spec_pb2.InferenceSpecType(
                saved_model_spec=model_spec_pb2.SavedModelSpec(
                    model_path=save_model_dir_multiply_ten)))
        | beam.ParDo(PredictionWithKeyProcessor())
        | beam.Map(print)
    )

Output:

key is b"b'first_question' original input is 105.0" input is [524.9771118164062] output is 5249.7822265625
key is b"b'second_question' original input is 108.0" input is [539.9765014648438] output is 5399.7763671875
key is b"b'third_question' original input is 1000.0" input is [4999.7841796875] output is 49997.9453125
key is b"b'fourth_question' original input is 1013.0" input is [5064.78125] output is 50647.91796875
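
The chaining logic can be checked without Beam or TensorFlow. The sketch below mirrors the six steps with exact stand-in functions in place of the two trained models (an assumption; the real models return approximations such as 524.977 rather than 525). The names multiply_by_five and multiply_by_ten are hypothetical.

```python
from typing import List, Tuple


def multiply_by_five(batch: List[float]) -> List[float]:
    """Stand-in for the first model."""
    return [5.0 * value for value in batch]


def multiply_by_ten(batch: List[float]) -> List[float]:
    """Stand-in for the second model."""
    return [10.0 * value for value in batch]


questions: List[Tuple[str, float]] = [
    ("first_question", 105.0),
    ("fourth_question", 1013.0),
]

# Stage 1: predict with the first model.
stage_one = multiply_by_five([value for _, value in questions])

# Interim step: fold the original input into the key so it survives the
# second call, and make the first prediction the next model's input.
interim = [
    (f"{key} original input is {value}", prediction)
    for (key, value), prediction in zip(questions, stage_one)
]

# Stage 2: predict with the second model on the interim values.
stage_two = multiply_by_ten([value for _, value in interim])

final = [
    (key, value, prediction)
    for (key, value), prediction in zip(interim, stage_two)
]
for row in final:
    print(row)
```

Folding the original input into the key is what lets the final output report all three numbers, since the second model only ever sees the first model's prediction.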

Running the pipeline on Dataflow

Until now the pipeline has been run locally, using the direct runner, which is implicitly used when running a pipeline with the default configuration. The same examples can be run using the production Dataflow runner by passing in configuration parameters including --runner. Details and an example can be found here.
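
As a rough sketch of what those configuration parameters look like, the helper below builds a list of command-line style flags. The flag names are standard Apache Beam options; the helper dataflow_flags, the region and the job name are placeholders of ours, and the snippet itself only builds the list so it runs without Beam installed.

```python
from typing import List


def dataflow_flags(project: str, bucket: str, region: str, job_name: str) -> List[str]:
    """Build command-line style flags for running on the Dataflow runner."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location=gs://{bucket}/tmp",
        f"--job_name={job_name}",
    ]


# Placeholder values; substitute your own project, bucket and region.
flags = dataflow_flags("my-project", "my-bucket", "us-central1", "run-inference-demo")
print(flags)

# With Apache Beam installed, the flags can be handed to the pipeline:
#   pipeline = beam.Pipeline(options=PipelineOptions(flags=flags))
```

Workers also need the pipeline's Python dependencies, such as tfx_bsl, to be available; how those are supplied depends on your setup and is not covered here.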

Here is an example of the multimodel pipeline graph running on the Dataflow service:

example of the multimodel pipeline graph running on the Dataflow service

With the Dataflow runner you also get access to pipeline monitoring as well as metrics that have been output from the RunInference transform. The following table shows some of these metrics from a much larger list available from the library.

Table showing Dataflow runner metrics

Conclusion

In this blog, part II of our series, we explored the use of the tfx-bsl RunInference API in some common scenarios, from standard inference to post-processing and the use of RunInference in multiple locations within a pipeline.

To learn more, review the Dataflow and TFX documentation. You can also try out TFX with Google Cloud AI Platform Pipelines.

Acknowledgements

None of this would be possible without the hard work of many folks across the Dataflow, TFX, and TF teams. From the TFX and TF teams we would especially like to thank Konstantinos Katsiapis, Zohar Yahav, Vilobh Meshram, Jiayi Zhao, Zhitao Li, and Robert Crowe. From the Dataflow team I would like to thank Ahmet Altay for his support and input throughout.


Adaptive Framework for On-device Recommendation

Posted by Ellie Zhou, Tian Lin, Shuangfeng Li and Sushant Prakash

Introduction & Motivation

We are excited to announce an adaptive framework to build on-device recommendation ML solutions with your own data and advanced user modeling architecture.

After the previously open-sourced on-device recommendation solution, we received a lot of interest from the community on introducing on-device recommender AI. Motivated and inspired by the feedback, we considered various use cases, and created a framework that could generate TensorFlow Lite recommendation models accommodating different kinds of data, features, and architectures to improve the previous models.

Benefits of this framework:

  • Flexible: The adaptive framework allows users to create a model in a configurable way.
  • Better model representation: To improve on the previous model, our new recommendation models can utilize multiple kinds of features rather than a single feature.

Personalized recommendations play an increasingly important role in digital life nowadays. With more and more user actions being moved to edge devices, supporting recommenders on-device becomes an important direction. Compared with conventional pure server-based recommenders, the on-device solution has unique advantages, such as protecting users’ privacy, providing fast reaction to on-device user actions, leveraging lightweight TensorFlow Lite inference, and bypassing network dependency. We welcome you to try out this framework and create a recommendation experience in your applications.

In this article, we will

  • Introduce the improved model architecture and framework adaptivity.
  • Walk you through how to utilize the framework step-by-step.
  • Provide insights based on research done with a public dataset.

Please find more details on the TensorFlow website.

Model

A recommendation model typically predicts users’ future activities, based on users’ previous activities. Our framework supports models using context information to do the prediction, which can be described in the following architecture:

recommendation code structure
Figure 1: An illustration of the configurable recommendation model. Each module is created according to the user-defined configuration.

At the context side, representations of all user activities are aggregated by the encoder to generate the context embedding. We support three different types of encoders: 1) bag-of-words (a.k.a. BOW), 2) 1-D convolution (a.k.a. CNN), and 3) LSTM. At the label side, the label item (as the positive) and all other items in the vocabulary (as negatives) are encoded to vectors as well. Context and label embeddings are combined with a dot product and fed to a softmax cross-entropy loss.
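
To make the scoring step concrete, here is a toy sketch in plain Python. It assumes the bag-of-words encoder simply averages the activity embeddings and that context and label items share one tiny embedding table; both are simplifications for illustration, and none of the function names come from the framework.

```python
import math
from typing import List


def bow_encode(activity_embeddings: List[List[float]]) -> List[float]:
    """Bag-of-words encoder (assumed): element-wise average of the embeddings."""
    count = len(activity_embeddings)
    return [sum(column) / count for column in zip(*activity_embeddings)]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_cross_entropy(logits: List[float], label_index: int) -> float:
    """Negative log-probability of the label under a softmax over the logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[label_index]


# Toy vocabulary of three items with 2-dimensional embeddings.
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context_items = [0, 2]   # the user's previous activities
label_item = 2           # the positive; items 0 and 1 act as negatives

context = bow_encode([item_embeddings[i] for i in context_items])
logits = [dot(context, embedding) for embedding in item_embeddings]
loss = softmax_cross_entropy(logits, label_item)

print(context, logits, round(loss, 4))
```

The dot product yields one score per vocabulary item, so the same scores that feed the loss during training can be sorted at inference time to rank candidates.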

Inside the framework, we encapsulate tf.keras layers for ContextEncoder, LabelEncoder and DotProductSimilarity as key components in RecommendationModel.

To model each user activity, we could use the ID of the activity item (called ID-based), or multiple features of the item (called feature-based), or a combination of both. The feature-based model utilizes multiple features to collectively encode users’ behavior. With our framework, you could create either ID-based or feature-based models in a configurable way.

Similar to the last version, a TensorFlow Lite model will be exported after training which can directly provide top-K predictions among the recommendation candidates.

Step-by-step

To demonstrate the new adaptive framework, we trained an on-device movie recommendation model with the MovieLens dataset using multiple features, and integrated it into the demo app. (Both the model and the app are for demonstration purposes only.) The MovieLens 1M dataset contains ratings from 6039 users across 3951 movies, with each user rating only a small subset of movies.

Let’s look at how to use the framework step-by-step in this notebook.

(a) Environment preparation

git clone https://github.com/tensorflow/examples
cd examples/lite/examples/recommendation/ml/
pip install -r requirements.txt

(b) Prepare training data

Please prepare your training data by referring to the MovieLens example generation file. Note that TensorFlow Lite input features are expected to be FixedLenFeature, so please pad or truncate your features and set the feature lengths in the input configuration. Feel free to use the following command to process the example dataset.

python -m data.example_generation_movielens \
  --data_dir=data/raw \
  --output_dir=data/examples \
  --min_timeline_length=3 \
  --max_context_length=10 \
  --max_context_movie_genre_length=32 \
  --min_rating=2 \
  --train_data_fraction=0.9 \
  --build_vocabs=True

MovieLens data contains ratings.dat (columns: UserID, MovieID, Rating, Timestamp) and movies.dat (columns: MovieID, Title, Genres). The example generation script takes both files, keeps only ratings higher than 2, forms user movie interaction timelines, and samples activities as labels with the previous user activities as the context for prediction. Please find the generated tf.Example:

0 : {
  features: {
    feature: {
      key  : "context_movie_id"
      value: { int64_list: { value: [ 1124, 2240, 3251, ..., 1268 ] } }
    }
    feature: {
      key  : "context_movie_rating"
      value: { float_list: { value: [ 3.0, 3.0, 4.0, ..., 3.0 ] } }
    }
    feature: {
      key  : "context_movie_year"
      value: { int64_list: { value: [ 1981, 1980, 1985, ..., 1990 ] } }
    }
    feature: {
      key  : "context_movie_genre"
      value: { bytes_list: { value: [ "Drama", "Drama", "Mystery", ..., "UNK" ] } }
    }
    feature: {
      key  : "label_movie_id"
      value: { int64_list: { value: [ 3252 ] } }
    }
  }
}
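
The sketch below shows, in plain Python, one way such examples could be derived from rating rows: filter by rating, order each user's activity by time, then emit a fixed-length context and a label for each later activity. It is not the actual example generation script. The function name, the inclusive rating threshold, the padding ID of 0 and padding at the end of the context are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Rating = Tuple[int, int, int, int]  # (user_id, movie_id, rating, timestamp)


def build_examples(
    ratings: List[Rating],
    min_rating: int,
    min_timeline_length: int,
    max_context_length: int,
    pad_id: int = 0,
) -> List[Dict[str, List[int]]]:
    """Turn rating rows into fixed-length (context, label) examples."""
    timelines: Dict[int, List[Tuple[int, int]]] = {}
    for user_id, movie_id, rating, timestamp in ratings:
        if rating >= min_rating:  # assumption: threshold is inclusive
            timelines.setdefault(user_id, []).append((timestamp, movie_id))

    examples = []
    for events in timelines.values():
        movie_ids = [movie_id for _, movie_id in sorted(events)]
        if len(movie_ids) < min_timeline_length:
            continue  # too little history for this user
        for label_pos in range(1, len(movie_ids)):
            start = max(0, label_pos - max_context_length)
            context = movie_ids[start:label_pos]           # truncate
            context += [pad_id] * (max_context_length - len(context))  # pad
            examples.append({
                "context_movie_id": context,
                "label_movie_id": [movie_ids[label_pos]],
            })
    return examples


ratings = [
    (1, 10, 5, 100), (1, 11, 4, 200), (1, 12, 1, 250), (1, 13, 4, 300),
    (2, 10, 5, 50),
]
examples = build_examples(ratings, min_rating=2, min_timeline_length=3,
                          max_context_length=2)
for example in examples:
    print(example)
```

Every context list has exactly max_context_length entries, which is what allows the feature to be declared as a FixedLenFeature.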

(c) Create input config

Once the data is prepared, please set up the input configuration. For example, this is one configuration for the MovieLens movie recommendation model.

activity_feature_groups {
  features {
    feature_name: "context_movie_id"
    feature_type: INT
    vocab_size: 3953
    embedding_dim: 8
    feature_length: 10
  }
  features {
    feature_name: "context_movie_rating"
    feature_type: FLOAT
    feature_length: 10
  }
  encoder_type: CNN
}
activity_feature_groups {
  features {
    feature_name: "context_movie_genre"
    feature_type: STRING
    vocab_name: "movie_genre_vocab.txt"
    vocab_size: 19
    embedding_dim: 4
    feature_length: 32
  }
  encoder_type: CNN
}
label_feature {
  feature_name: "label_movie_id"
  feature_type: INT
  vocab_size: 3953
  embedding_dim: 8
  feature_length: 1
}

(d) Train model

The model trainer will construct the recommendation model based on the input config, with a simple interface.

python -m model.recommendation_model_launcher -- \
  --training_data_filepattern "data/examples/train_movielens_1m.tfrecord" \
  --testing_data_filepattern "data/examples/test_movielens_1m.tfrecord" \
  --model_dir "model/model_dir" \
  --vocab_dir "data/examples" \
  --input_config_file "configs/sample_input_config.pbtxt" \
  --batch_size 32 \
  --learning_rate 0.01 \
  --steps_per_epoch 2 \
  --num_epochs 2 \
  --num_eval_steps 2 \
  --run_mode "train_and_eval" \
  --gradient_clip_norm 1.0 \
  --num_predictions 10 \
  --hidden_layer_dims "32,32" \
  --eval_top_k "1,5" \
  --conv_num_filter_ratios "2,4" \
  --conv_kernel_size 4 \
  --lstm_num_units 16

Inside the recommendation model, core components are packaged as Keras layers (context_encoder.py, label_encoder.py and dotproduct_similarity.py), each of which can be used on its own. The following diagram illustrates the code structure:

Figure 2: An example of model architecture using context information to predict the next movie. The inputs are the history of (a) movie IDs, (b) ratings and (c) genres, which is specified by the config mentioned above.

With the framework, you can also directly execute the launcher in export mode with the command:

python -m model.recommendation_model_launcher \
  --input_config_file "configs/sample_input_config.pbtxt" \
  --vocab_dir "data/examples" \
  --run_mode "export" \
  --checkpoint_path "model/model_dir/ckpt-1000" \
  --num_predictions 10 \
  --hidden_layer_dims "32,32" \
  --conv_num_filter_ratios "2,4" \
  --conv_kernel_size 4 \
  --lstm_num_units 16

The inference code after exporting to TensorFlow Lite can be found in the notebook, and we refer readers to check out the details there.

Framework Adaptivity

Our framework provides a protobuf interface, through which feature groups, types and other information can be configured to build models accordingly. With the interface, you can configure:

  • Features

    The framework generically categorizes features into 3 types: integer, string, and float. Embedding spaces will be created for both integer and string features; hence, the embedding dimension, vocabulary name and size need to be specified. Float feature values will be used directly. In addition, for on-device models, we suggest using fixed-length features, which can be configured directly.

message Feature {
  optional string feature_name = 1;

  // Supported feature types: STRING, INT, FLOAT.
  optional FeatureType feature_type = 2;

  optional string vocab_name = 3;

  optional int64 vocab_size = 4;

  optional int64 embedding_dim = 5;

  optional int64 feature_length = 6;
}

  • Feature groups

    One feature for one user activity may have multiple values. For instance, one movie can belong to multiple categories, so each movie will have multiple genre feature values. To handle the different feature shapes, we introduced the “feature group” to combine features as a group. Features with the same length can be put in the same feature group to be encoded together. Inside the input config, you can set up global feature groups and activity feature groups.

message FeatureGroup {
  repeated Feature features = 1;

  // Supported encoder types: BOW, CNN, LSTM.
  optional EncoderType encoder_type = 2;
}

  • Input config

    You can use the input config interface to set up all the features and feature groups together.

message InputConfig {
  repeated FeatureGroup global_feature_groups = 1;

  repeated FeatureGroup activity_feature_groups = 2;

  optional Feature label_feature = 3;
}
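
The constraints described in the bullets above can be expressed as a small sanity check. The sketch below mirrors a subset of the Feature message with a Python dataclass and flags two problems: mixed feature lengths within a group, and embedded (integer or string) features that lack a vocabulary size or embedding dimension. The class and function are hypothetical helpers written for this illustration and are not part of the framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    """Hypothetical mirror of a subset of the Feature message fields."""
    feature_name: str
    feature_type: str  # "STRING", "INT" or "FLOAT"
    feature_length: int
    vocab_size: Optional[int] = None
    embedding_dim: Optional[int] = None


def validate_group(features: List[Feature]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    lengths = {feature.feature_length for feature in features}
    if len(lengths) > 1:
        problems.append(f"mixed feature lengths: {sorted(lengths)}")
    for feature in features:
        embedded = feature.feature_type in ("STRING", "INT")
        if embedded and (feature.vocab_size is None or feature.embedding_dim is None):
            problems.append(
                f"{feature.feature_name}: embedded features need "
                "vocab_size and embedding_dim")
    return problems


good_group = [
    Feature("context_movie_id", "INT", 10, vocab_size=3953, embedding_dim=8),
    Feature("context_movie_rating", "FLOAT", 10),
]
bad_group = [
    Feature("context_movie_id", "INT", 10, vocab_size=3953, embedding_dim=8),
    Feature("context_movie_genre", "STRING", 32, vocab_size=19),
]

print(validate_group(good_group))
print(validate_group(bad_group))
```

The first group matches the sample configuration shown earlier and passes; the second mixes lengths 10 and 32 and omits an embedding dimension, which is why the sample configuration places the genre feature in a group of its own.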

The input config is utilized by both input_pipeline.py and recommendation_model.py to process training data into a tf.data.Dataset and construct the model accordingly. Inside ContextEncoder, FeatureGroupEncoders will be created for all feature groups and used to compute feature group embeddings from input features. Concatenated feature group embeddings will be fed through top hidden layers to get the final context embedding. It is worth noting that the final context embedding and label embedding dimensions should be equal.

Please check out the different model graphs produced with the different input configurations in the Appendix section.

Experiments and Analysis

We take this opportunity to analyze the performance of ID-based and feature-based models with various configurations, and provide some empirical results.

For the ID-based model, only movie_id is used as the input feature. For the feature-based model, both movie_id and movie_genre are used. We ran both types of models with 3 encoder types (BOW/CNN/LSTM) and 3 context history lengths (10/50/100).

Comparison between ID-based and Feature-based models. We compare them on BOW/CNN/LSTM encoders and context history lengths 10/50/100.

Since MovieLens is an experimental dataset with ~4000 candidate movies and 19 movie genres, we scaled down the embedding dimensions in the experiments to simulate the production scenario. For the experiment result chart above, the ID embedding dimension is set to 8 and the movie genre embedding dimension is set to 4. Taking context10_cnn as an example, the feature-based model outperforms the ID-based model by 58.6%. On average, feature-based models outperform ID-based models by 48.35%. In this case, the feature-based model does better because the movie_genre feature introduces additional information to the model.

In addition, the underlying features of candidate items mostly have smaller vocabularies, and hence smaller embedding spaces as well. For instance, the movie genre vocabulary is much smaller than the movie ID vocabulary. In this case, using the underlying features could reduce the memory size of the model, making it more on-device friendly.
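To put rough numbers on the vocabulary argument, the sketch below counts embedding table parameters as vocabulary size times embedding dimension. It uses the approximate MovieLens sizes (~4000 movies, 19 genres) and the embedding dimensions (8 and 4) quoted above; the helper name is hypothetical, and the count ignores encoder and hidden-layer weights.

```python
def embedding_params(vocab_size: int, embedding_dim: int) -> int:
    """Number of weights in one embedding table."""
    return vocab_size * embedding_dim


movie_id_params = embedding_params(vocab_size=4000, embedding_dim=8)
movie_genre_params = embedding_params(vocab_size=19, embedding_dim=4)

print(f"movie_id table:    {movie_id_params} weights")
print(f"movie_genre table: {movie_genre_params} weights")
```

Under these assumptions the genre table holds 76 weights against 32,000 for the ID table, which is why adding such features costs little memory and why leaning on them in place of large ID vocabularies can shrink a model.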

Acknowledgement

Special thanks to Cong Li, Josh Gordon, Khanh LeViet, Arun Venkatesan and Lawrence Chan for providing valuable suggestions to this work.


Reconstructing thousands of particles in one go at the CERN LHC with TensorFlow

A guest post by Jan Kieseler from CERN, EP/CMG

Introduction

At large colliders such as the CERN LHC (Large Hadron Collider), high-energy particle beams collide and thereby create massive and possibly yet unknown particles from the collision energy, following the well-known equation E=mc². Most of these newly created particles are not stable and decay to more stable particles almost immediately. Detecting these decay products and measuring their properties precisely is the key to understanding what happened during the high energy collision, and will possibly shed light on big questions such as the origin of dark matter.

Detecting and measuring particles

For this purpose, the collision interaction points are surrounded by large detectors covering as much as possible in all possible directions and energies of the decay products. These detectors are further split into sub-detectors, each collecting complementary information. The innermost detector, closest to the interaction point, is the tracker consisting of multiple layers. Similar to a camera, each layer can detect the spatial position at which a charged particle passed through it, providing access to its trajectory. Combined with a strong magnetic field, this trajectory gives access to the particle charge and the particle momentum.

While the tracker is aimed only at measuring the trajectories, minimising any further interaction with and scattering of the particles, the next sub-detector layer is aimed at stopping them entirely. By stopping the particles completely, these calorimeters can extract the initial particle energy, and can also detect neutral particles. The only particles that pass through these calorimeters are muons, which are identified by additional muon chambers that constitute the outermost detector shell and use the same detection principles as the tracker.

Layout of the CMS detector, showing different particle species interacting with the sub-detectors (Image credit: CERN).

Combining the information from all these sub-detectors to reconstruct the final particles is a challenging task, not only because we want to achieve the best physics performance, but also in terms of computing resources and person-power available to develop and tune the reconstruction algorithms. In particular for the High-Luminosity LHC, the extension of the CERN LHC that aims to collect unprecedented amounts of data, these algorithms need to perform well given a collision rate of 40 MHz and up to 200 simultaneous interactions in each collision, which result in up to about a million signals from all sub-detectors.

Even with triggers (a fast, step-wise filtering of interesting events) in place, the total data collected to disk still comprises several petabytes, making efficient algorithms a must at all stages.

Classic reconstruction algorithms in high energy physics heavily rely on factorisation of individual steps and a lot of domain (physics) knowledge. While they perform reasonably well, the idealistic assumptions that are needed to develop these algorithms limit the performance, such that machine-learning approaches are often used to refine the classically reconstructed particles, and make their way into more and more reconstruction chains. The machine learning approaches benefit from a very precise simulation of all detector components and physics processes, valid over several orders of magnitude, such that large sets of labelled data can be produced very easily in a short amount of time. This led to a rise of neural network based identification and regression algorithms, and to the inclusion of TensorFlow as the standard inference engine in the software framework of the Compact Muon Solenoid (CMS) experiment.

Machine-learning reconstruction approaches also come with the advantage that by construction they have to be automatically optimizable and need a loss function to train that quantifies the final reconstruction target. In contrast, classic approaches are often optimised without the need to define such an inclusive quantitative metric, and parameters are tuned by hand, involving many experts, and taking a lot of person-power that could be spent on developing new algorithms instead of tuning existing ones. Therefore, moving to differentiable machine-learning algorithms such as deep neural networks in general can also help use the human resources more efficiently.

However, extending machine-learning based algorithms to the first step of reconstructing the particles from hits – instead of just refining already reconstructed particles – comes with two main challenges: the data structure and phrasing reconstruction as a minimisation problem.

The detector data is highly irregular, due to the inclusion of multiple sub-detectors, each with its own geometry. But even within a sub-detector, such as the tracker, the geometry is designed based on physics aspects, with a fine resolution close to the interaction point and a coarser one further away. Furthermore, the individual tracker layers are not densely packed, but have a considerable amount of space between them, and in each event only a small fraction of sensors are actually active, changing the number of inputs from event to event. Therefore, neural networks that require a regular grid, such as convolutional neural network architectures, are – despite their good performance and highly optimised implementations – not applicable.

Graph neural networks can bridge this gap and, in principle, allow abstracting from the detector geometry. Recently, several graph neural network proposals from computer science have been studied in the context of refining already reconstructed particles in high energy physics. However, given the high input dimensionality of the data many of these proposals cannot be employed for reconstructing particles directly from hits, and custom solutions are needed. One example is GravNet that – by construction – reduces the resource requirements significantly while maintaining good physics performance by using sparse dynamic adjacency matrices and performing most operations without memory overhead.

This in particular becomes possible through TensorFlow which makes it easy to implement and load custom kernels into the graph and integrate custom analytic gradients for fused operations. Only the combination of these custom kernels and the network structure allows loading a full physics event into the GPU memory, training the network on it, and performing the inference.

GravNet layer architecture (from left to right): point features are projected into a feature space F_LR and a low-dimensional coordinate space S; the k nearest neighbours are determined in S; the mean and maximum of the distance-weighted neighbour features are accumulated; the accumulated features are combined with the original features.
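The four steps in the caption can be sketched in plain Python. This is a toy, brute-force illustration and not the GravNet implementation: the projections are fixed linear maps rather than learned dense layers, the Gaussian distance weight exp(-d²) is an assumption, and none of the sparse or memory-saving custom kernels mentioned above are modelled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Apply a linear map given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def gravnet_like_layer(points: Sequence[Sequence[float]],
                       w_space: Sequence[Sequence[float]],
                       w_feat: Sequence[Sequence[float]],
                       k: int) -> List[Vector]:
    coords = [project(w_space, p) for p in points]  # coordinate space S
    feats = [project(w_feat, p) for p in points]    # feature space F_LR
    outputs = []
    for i, c_i in enumerate(coords):
        # Squared distances in S to every other point, nearest first.
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(c_i, c_j)), j)
            for j, c_j in enumerate(coords) if j != i
        )[:k]
        # Weight each neighbour's features by its distance (assumed Gaussian).
        weighted = [[math.exp(-d2) * f for f in feats[j]] for d2, j in nearest]
        mean = [sum(col) / len(weighted) for col in zip(*weighted)]
        maximum = [max(col) for col in zip(*weighted)]
        # Combine the accumulated features with the original point features.
        outputs.append(list(points[i]) + mean + maximum)
    return outputs


# Four hypothetical hits with two input features each.
hits = [[0.0, 1.0], [0.1, 0.9], [2.0, 0.5], [2.1, 0.4]]
w_space = [[1.0, 0.0], [0.0, 1.0]]            # S is 2-dimensional here
w_feat = [[1.0, 1.0], [1.0, -1.0], [0.5, 0.0]]  # F_LR is 3-dimensional here
out = gravnet_like_layer(hits, w_space, w_feat, k=2)
```

Each output row has 2 + 3 + 3 = 8 entries: the original features followed by the mean and the maximum of the weighted neighbour features.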

Since many of the reconstruction tasks, even the refinement of already reconstructed particles, rely on an unknown number of inputs, the recent addition and support of ragged data structures in TensorFlow in principle opens up a lot of new possibilities. While the integration is not yet sufficient to build full neural network architectures, future full support of ragged data structures would be a significant step forward for integrating TensorFlow even deeper into the reconstruction algorithms and would make some custom kernels obsolete.
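For readers who have not used ragged data structures, here is a minimal sketch of what they look like in a current TensorFlow release. The event contents are made up for illustration: three events with different numbers of hits, each hit reduced to a single energy value.

```python
import tensorflow as tf

# Three hypothetical events with 3, 1, and 2 hits; each value is a hit energy.
hit_energies = tf.ragged.constant([[0.5, 1.5, 1.0], [2.0], [0.25, 0.75]])

# Number of hits per event, without padding to a fixed size.
hits_per_event = hit_energies.row_lengths()

# Per-event reduction over the ragged (hit) dimension.
total_energy = tf.reduce_sum(hit_energies, axis=1)

print(hits_per_event.numpy(), total_energy.numpy())
```

The variable-length rows are stored without padding, so the number of inputs can change from event to event.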

The second challenge when reconstructing particles directly from hits using deep neural networks is to train the network to predict an unknown number of particles from an unknown number of inputs. There is a plethora of algorithms and training methods for detecting dense objects in dense data, such as images, but while the requirement of the dense data can be loosened in some cases, most of these algorithms still rely on the objects being dense or having a clear boundary, making it possible to exploit anchor boxes or central points of the object. Particles in the detector, however, often overlap to a large degree, and their sparsity allows defining neither central points nor clear boundaries. A solution to this problem is Object Condensation, where object properties are condensed in at least one representative condensation point per object that can be chosen freely by the network through a high confidence score.

cluster points

To resolve ambiguities, the other points are clustered around the object they belong to using potential functions (illustrated above). However, these potentials scale with the confidence score in a tunable manner, such that the amount of segmentation the network should perform is adjustable up to the point where all points except the condensation points can be left free floating in case we are only interested in the final object properties.

Some parts of this algorithm are very similar to the method proposed in a previous paper, but the goal is entirely different. While the previous approach constitutes a very powerful segmentation algorithm moving pixels to cluster objects in an image towards a central point, the condensation points here directly carry object properties, and through the choice of the potential functions the clustering space can be completely detached from the input space. The latter has big implications for the applicability to the sparse detector data with overlapping particles, but also does not distinguish conceptually between “stuff” and “things”, providing a new perspective on one-shot panoptic segmentation.

But coming back to particle reconstruction, as shown in the corresponding paper, Object Condensation can outperform classic particle reconstruction algorithms even on simplified problems that are quite close to the idealistic assumptions of the classic algorithm. Therefore, it provides an alternative to classic reconstruction approaches directly from hits.

Based on this promising study, there is work ongoing to extend the approach to simulated events in the High Granularity Calorimeter, a planned new sub-detector of the CMS experiment at CERN, with about 2 million sensors, covering the particularly challenging forward region close to the incident beams, where most particles are produced. Compared to the published proof-of-concept, this realistic environment is much more challenging and requires the optimisation of the network structures and even more custom TensorFlow operations, which can be found in the development repository on GitHub, using DeepJetCore as an interface to data formats commonly used in high-energy physics. Right now, there is a particular focus on implementing fast k-nearest-neighbour algorithms, a crucial building block for GravNet, that can handle the large input dimensionality, but ragged implementations of other operations as well as implementations of the Object Condensation loss can also be found there.

Conclusion

To conclude, the application of deep neural networks to reconstruction tasks is exhibiting a shift from refining classically reconstructed particles to reconstructing the particles and their properties directly, in an optimizable and highly parallelizable way to meet the person-power and computing challenges in the future. This development will give rise to more custom implementations, and will be using more of the bleeding edge features in TensorFlow and tf.keras such as ragged data structures, so that a closer contact between high-energy physics reconstruction developers and TensorFlow developers is foreseeable.

Acknowledgements

I would like to acknowledge the support of Thiru Palanisamy and Josh Gordon at Google for their help with the blog post collaboration and with providing active feedback.


How-to Write a Python Fuzzer for TensorFlow

Posted by Laura Pak

TensorFlow Python Fuzzer graphic

Fuzz testing is a process of testing APIs with generated data. Fuzzing helps ensure that code will not break on the negative path by generating randomized inputs that try to cover every branch of code. A popular choice is to pair fuzzers with sanitizers, which are tools that check for illegal conditions and thus flag the bugs triggered by the fuzzers’ inputs.

In this way, fuzzing can find:

  • Buffer overflows
  • Memory leaks
  • Deadlocks
  • Infinite recursion
  • Round-trip consistency failures
  • Uncaught exceptions
  • And more.

The best way to fuzz is to have your fuzz tests running continuously. The more a test runs, the more inputs can be generated and tested against. In this article, you’ll learn how to add a Python fuzzer to TensorFlow.

The technical how-to

TensorFlow Python fuzzers run via OSS-Fuzz, the continuous fuzzing service for open source projects.

For Python fuzzers, OSS-Fuzz uses Atheris, a coverage-guided Python fuzzing engine. Atheris is based on the fuzzing engine libFuzzer, and it can be used with the dynamic memory error detector Address Sanitizer or the fast undefined behavior detector, Undefined Behavior Sanitizer. Atheris dependencies will be pre-installed on OSS-Fuzz base Docker images.

Here is a barebones example of a Python fuzzer for TF. The runtime will call TestCode with different random data.

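import sys
import atheris_no_libfuzzer as atheris

def TestCode(data):
  # Placeholder: call the TensorFlow API you want to fuzz with the raw bytes.
  DoSomethingWith(data)

def main():
  # Register the entry point, then hand control to the fuzzing engine.
  atheris.Setup(sys.argv, TestCode, enable_python_coverage=True)
  atheris.Fuzz()

if __name__ == "__main__":
  main()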

In the tensorflow repo, in the directory with the other fuzzers, add your own Python fuzzer like above. In TestCode, pick a TensorFlow API that you want to fuzz. In constant_fuzz.py, that API is tf.constant. That fuzzer simply passes data to the chosen API to see if it breaks. No need for code that catches the breakage; OSS-Fuzz will detect and report the bug.

Sometimes an API needs more structured data than just one input. TensorFlow has a Python class called FuzzingHelper that allows you to generate random int lists, a random bool, etc. See an example of its use in sparseCountSparseOutput_fuzz.py, a fuzzer that checks for uncaught exceptions in the API tf.raw_ops.SparseCountSparseOutput.
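The general technique behind such helpers is to carve typed values out of the raw fuzz bytes, so that every input still maps deterministically to one structured test case. The class below is a hypothetical sketch of that technique using only the standard library; it is not FuzzingHelper, and its method names are assumptions made for illustration.

```python
from typing import List


class ByteConsumer:
    """Hypothetical helper that turns raw fuzz bytes into typed values."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def _next_byte(self) -> int:
        # Return 0 once the input is exhausted so callers never crash here.
        if self._pos >= len(self._data):
            return 0
        value = self._data[self._pos]
        self._pos += 1
        return value

    def get_bool(self) -> bool:
        return self._next_byte() % 2 == 1

    def get_int(self, min_value: int, max_value: int) -> int:
        """Map one byte into the inclusive range [min_value, max_value]."""
        span = max_value - min_value + 1
        return min_value + self._next_byte() % span

    def get_int_list(self, max_length: int, min_value: int,
                     max_value: int) -> List[int]:
        length = self.get_int(0, max_length)
        return [self.get_int(min_value, max_value) for _ in range(length)]


helper = ByteConsumer(bytes([3, 12, 25, 37, 1]))
shape = helper.get_int_list(max_length=4, min_value=0, max_value=9)
flag = helper.get_bool()
print(shape, flag)  # [2, 5, 7] True
```

Because the mapping is deterministic, a crashing byte string found by the fuzzer always reproduces the same structured arguments.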

To build and run, your fuzzer needs a fuzzing target of type tf_py_fuzz_target, defined in tf_fuzzing.bzl. Here is an example fuzz target, with more examples here.

tf_py_fuzz_target(
    name = "fuzz_target_name",
    srcs = ["your_fuzzer.py"],
    tags = ["notap"],  # Important: include to run in OSS.
)

Testing your fuzzer with Docker

Make sure that your fuzzer builds in OSS-Fuzz with Docker.

First install Docker. In your terminal, run the command docker image prune to remove any dangling images.

Clone oss-fuzz from GitHub. The project for a Python TF fuzzer, tensorflow-py, contains a build.sh file to be executed in the Docker container defined in the Dockerfile. build.sh defines how to build binaries for fuzz targets in tensorflow-py. Specifically, it builds all the Python fuzzers found in $SRC/tensorflow/tensorflow, including your new fuzzer!

Inside oss-fuzz, run the following commands:

python infra/helper.py shell tensorflow
export FUZZING_LANGUAGE=python
compile

The command compile will run build.sh, which will attempt to build your new fuzzer.

The results

Once your fuzzer is up and running, you can search this dashboard for your fuzzer to see what vulnerabilities your fuzzer has uncovered.

Conclusion

Fuzzing is an exciting way to test software from the unhappy path. Whether you want to dabble in security or gain a deeper understanding of TensorFlow’s internals, we hope this post gives you a good place to start.


TensorFlow Quantum turns one year old

Posted by Michael Broughton, Alan Ho, Masoud Mohseni

Last year we announced TensorFlow Quantum (TFQ) at the 2020 TensorFlow developer summit and on the Google AI Blog. Bringing all of the tools and features that TensorFlow has to offer to the world of quantum computing has led to some great research success stories. In this post, we would like to look back on what’s happened in the last year involving TensorFlow Quantum and how far it’s come. We also discuss the future of quantum computing and machine learning in TensorFlow Quantum.

Since the release of TensorFlow Quantum, we’ve been happy to see increasing use of the library in the academic world as well as inside Alphabet, in particular the Quantum AI team at Google. There have been many research articles published in the last year that made use of TensorFlow Quantum in quantum machine learning or hybrid quantum-classical models, including discriminative models and generative models. With the cross pollination of ideas between the two fields, we are also seeing advanced learning algorithms from classical machine learning being reimagined, such as quantum reinforcement learning, layerwise learning, and neural architecture search. We leverage the scalability and tooling of TensorFlow to run numerical experiments with large numbers of qubits and gates to more faithfully discover algorithms that will be practical in the future.

Here are a few papers published using TFQ if you’d like to check them out:

In our recent publication to quantify the computational advantage of quantum machine learning, experiments were conducted at PetaFLOP/s throughput scales, which is nothing new for classical machine learning, but represents a huge leap forward in the scale seen in quantum machine learning experiments before TensorFlow Quantum came along. We are very excited for the future that quantum computing and machine learning have together and we are happy to see TensorFlow Quantum having such a positive impact already.

The academic world isn’t the only place machine learning and quantum computing have been able to come together. Over the past year, members of the TensorFlow Quantum team helped support the artistic works of Refik Anadol Studios’ “Quantum memories” piece. This combines the random circuit sample data from the 2019 beyond classical experiment and adaptations of StyleGAN to create some truly magnificent works of art.

Quantum memories installation at the NGV (image used with permission).

Next steps

We will soon be releasing TensorFlow Quantum 0.5.0, with more support for distributed workloads as well as lots of new quantum centric features and some small performance boosts. Looking forward, we hope that these features will enable our users to continue to push the boundaries of complexity and scale in quantum computing and machine learning and eventually help lead to groundbreaking quantum computing experiments (not just simulations). Our ultimate goal when we released TensorFlow Quantum was to have it aid in the search for quantum advantage in the field of machine learning. In time, it is our hope to see the world reach that goal, with the help of the continued hard work and dedication of the QML research community. Quantum machine learning is still a very young field and there’s still a long way to go before this happens, but over the past year we’ve seen the community make amazing strides in many different areas and we can’t wait to see what you will accomplish in the years to come.


A Tour of SavedModel Signatures

Posted by Daniel Ellis, TensorFlow Engineer

Note: This blog post is aimed at TensorFlow developers who want to learn the details of how graphs and models are stored. If you are new to TensorFlow, you should check out the TensorFlow Basics guides before reading this article.

TensorFlow can run models without the original Python objects, as demonstrated by TensorFlow Serving and TensorFlow Lite, or when you download a trained model from TensorFlow Hub.

Models and layers can be loaded from this representation without actually making an instance of the Python class that created it. This is desired in situations where you do not have (or want) a Python interpreter, such as serving at scale or on an edge device, or in situations where the original Python code is not available.

Saved models are represented by two separate, but equally important, parts: the graph, which describes the fixed computation described in code, and the weights, which are the dynamic parameters you trained during training. If you aren’t already familiar with this and @tf.function, you should check out the Introduction to graphs and functions guide as well as the section on saving in the modules, layers, and models guide.

From a code standpoint, functions decorated with @tf.function create a Python callable; in the documentation we refer to these as polymorphic functions, as they are Python callables that can take a variety of argument signatures. Each time you call a @tf.function with a new argument signature, TensorFlow traces out a new graph just for that set of arguments. This new graph is then added as a “concrete function” to the callable. Thus, a saved model can be one or more subgraphs, each with a different signature.

A SavedModel is what you get when you call tf.saved_model.save(). Saved models are stored as a directory on disk. The file saved_model.pb, within that directory, is a protocol buffer describing the functional tf.Graph.

In this blog post, we’ll take a look inside this protobuf and see how function signature serialization and deserialization works under the hood. After reading this, you’ll have a greater appreciation for what functions and signatures do, which can help you load, modify, or optimize saved models.
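The tracing behaviour can be observed with a Python side effect, which only runs while TensorFlow is tracing a new graph. The snippet below is a small sketch of that; the function name and the traces list are illustrative, not part of any API.

```python
import tensorflow as tf

traces = []  # Grows once per trace, not once per call.


@tf.function
def triple(x):
    traces.append("traced")  # Python side effect: executes only while tracing.
    return tf.constant(3.0) * x


triple(tf.constant(4.0))              # New signature: scalar float32.
triple(tf.constant([1.0, 2.0, 3.0]))  # New signature: float32 vector of length 3.
triple(tf.constant(5.0))              # Same signature as the first call.

print(len(traces))
```

The third call reuses the concrete function created by the first one, so only two traces are recorded for three calls.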

Background

There are a total of five places inputs to functions are defined in the saved model protobuf. It can be tough to understand and remember what each of these does. This post intends to inventory each of these definitions and what they’re used for. It also goes through a basic example illustrating what a simple model looks like after serialization.

The actual APIs you use will always be carefully versioned (as they have been since 2016), and the models themselves will conform to the version compatibility guide. However, the material in this document lays out a snapshot of the existing state of things. Any links to code will include point-in-time revisions so as not to drift out of date. As with all non-documented implementation details, these details are subject to change in the future.

We’ll occasionally use the term “signatures” to talk about the general concept of describing function inputs (e.g. in the title of this document). In this sense, we will be referring not just to TensorFlow’s specific concept of signatures, but all of the ways TensorFlow defines and validates inputs to functions. Context should make the meaning clear.

What This Is Not About

This document is not intended to describe how signatures or functions work from a user perspective. It is intended for TensorFlow developers working on the internals of TensorFlow. Likewise, this document does not make a statement of the way things “should” be. It aims to simply document the way things are.

Overview of Signature Definitions

There are five protos that store definitions of function inputs in one manner or another. Their names and code locations, as well as their paths within the saved model proto, are as follows:

Proto messages, and their location in SavedModel
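One way to find these places yourself is to parse saved_model.pb directly. The sketch below saves a small stand-in model (TinyModel and its scale method are hypothetical, not the example used later in this post) and then lists where function inputs show up in the proto. Node ordering and generated function names vary between TensorFlow versions, so treat the printed values as illustrative.

```python
import os
import tempfile

import tensorflow as tf
from tensorflow.core.protobuf import saved_model_pb2


class TinyModel(tf.Module):
    def __init__(self):
        super().__init__()
        self.weight = tf.Variable(5.0, name="weight")

    @tf.function(input_signature=[tf.TensorSpec(shape=(), dtype=tf.float32)])
    def scale(self, x):
        return self.weight * x


export_dir = tempfile.mkdtemp()
model = TinyModel()
tf.saved_model.save(model, export_dir, signatures={"scale": model.scale})

# Parse the protocol buffer that describes the saved graph.
saved_model = saved_model_pb2.SavedModel()
with open(os.path.join(export_dir, "saved_model.pb"), "rb") as f:
    saved_model.ParseFromString(f.read())

meta_graph = saved_model.meta_graphs[0]
object_graph = meta_graph.object_graph_def

# FunctionDefs live in the function library of the graph.
function_defs = [fn.signature.name
                 for fn in meta_graph.graph_def.library.function]
# SignatureDefs live in a map on the MetaGraphDef.
signature_defs = sorted(meta_graph.signature_def.keys())
# SavedFunction and SavedBareConcreteFunction are kinds of object graph nodes.
node_kinds = [node.WhichOneof("kind") for node in object_graph.nodes]
# SavedConcreteFunctions live in a map on the object graph.
saved_concrete_functions = sorted(object_graph.concrete_functions.keys())

print(function_defs, signature_defs, node_kinds, saved_concrete_functions)
```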

FunctionDef

Of the five definitions discussed in this document, FunctionDefs are the most core to execution. When loading a saved model, these function definitions are registered in the function library of the runtime and used to create ConcreteFunctions. These functions can then be executed via PartitionedCall or TFE_Py_Execute.

This is where the actual nodes describing execution are defined, as well as what the inputs and outputs to the function are.

SignatureDef

SignatureDefs are generated from signatures passed into @tf.function. We do not save the signature’s TensorSpecs directly, however. Instead, when saving, we call the underlying function using the TensorSpecs in order to generate a concrete function. From there, we inspect the generated concrete function to get the inputs and outputs, storing them on the SignatureDef.

On the loading side, SignatureDefs are essentially ignored. They are primarily used in v1 or C++, where the developer loading the model can inspect the returned SignatureDef protos directly. This allows them to use their desired signature name to look up the placeholder and output names needed for execution.

These input and output names can then be passed into feeds and fetches when calling Session.run in TensorFlow V1 code.

SavedFunction

SavedFunction is one of the many types of SavedObjects in the nodes list of the ObjectGraphDef. SavedFunctions are restored into RestoredFunctions at load time. Like all nodes in this list, they are then attached to the returned model via the hierarchy defined by the children ObjectReference field.

SavedFunction’s main purpose is polymorphism. SavedFunctions support polymorphism by specifying a number of concrete function names defined in the function library above (via FunctionDef). At call time, we iterate through the concrete function names to find the first whose signature matches. If we find a match, we call it; if not, we throw an exception.

There is one more bit of complexity. When a RestoredFunction is called with a particular set of arguments, a new concrete function is created whose sole purpose is to call the matching concrete function. This is done using restored_function_body under the hood and is where the logic lives to find the appropriate concrete function.

This is invisible in the SavedModel proto, but these extra concrete functions are registered at call time in the runtime’s function library just as the other function library functions are.
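The first-match dispatch described above can be illustrated with a toy in plain Python. This is only a model of the idea, not TensorFlow’s implementation: ToySpec, spec_of, and ToyRestoredFunction are hypothetical names, and real signatures carry far more information than an element type and a rank.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToySpec:
    """Drastically simplified stand-in for an argument signature."""
    dtype: type
    rank: int


def spec_of(value) -> ToySpec:
    """Describe a scalar or a non-empty flat list."""
    if isinstance(value, list):
        return ToySpec(type(value[0]), 1)
    return ToySpec(type(value), 0)


class ToyRestoredFunction:
    """Holds several concrete functions and calls the first that matches."""

    def __init__(self, concrete_functions: List[Tuple[ToySpec, Callable]]):
        self._concrete_functions = concrete_functions

    def __call__(self, value):
        wanted = spec_of(value)
        for spec, fn in self._concrete_functions:
            if spec == wanted:
                return fn(value)
        raise ValueError(f"No concrete function matches {wanted}")


polymorphic_fn = ToyRestoredFunction([
    (ToySpec(float, 0), lambda x: 3.0 * x),
    (ToySpec(float, 1), lambda xs: [3.0 * x for x in xs]),
])

scalar_result = polymorphic_fn(4.0)
vector_result = polymorphic_fn([1.0, 2.0, 3.0])
```

Calling polymorphic_fn(4) with an int raises ValueError, mirroring the exception thrown when no saved concrete function matches the arguments.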

The second purpose of SavedFunction is to update the FunctionSpec of all associated ConcreteFunctions using the FunctionSpec stored on the SavedFunction. This function spec is used at call time to

  1. validate passed in structured arguments, and
  2. convert structured arguments into flat ones needed for calling the underlying concrete function.

SavedBareConcreteFunction

Similar to SavedFunctions, SavedBareConcreteFunctions are used to update a specific concrete function’s arguments and function spec. This is done here. Unlike SavedFunctions, they only reference a single specific concrete function.

In practice, SavedBareConcreteFunctions are commonly attached to and accessed via the signatures map (i.e. the signatures attribute on the loaded object). The underlying concrete functions they modify, in this case, are signature_wrapper functions. This wrapping is done to format the output in the way v1 expects (i.e. a dictionary of tensors). Similar to restored_function_body concrete functions, and other than restructuring the output, these concrete functions do nothing but call their associated concrete functions.

SavedConcreteFunction

SavedConcreteFunction objects are not SavedObjectGraph nodes. They are stored in a map directly on the SavedObjectGraph. These objects reference a specific, already-registered concrete function — the key in the map is that concrete function’s registered name.

These objects serve two purposes. The first is handling function “captures” via the bound_inputs field. Captured variables are those a function reads or modifies that were not explicitly passed in when calling into the function. Since functions in the function library do not have a concept of captured variables, any variables used by the function must be passed in as an argument. bound_inputs stores a list of node IDs that should be passed in to the underlying ConcreteFunction when called. We set this up here.
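Captures can also be seen on a concrete function before anything is saved. In the sketch below (the variable and function names are illustrative), the function takes one explicit argument, yet its graph has two inputs because the variable it reads is passed in as an extra resource input.

```python
import tensorflow as tf

weight = tf.Variable(5.0, name="weight")


@tf.function(input_signature=[tf.TensorSpec(shape=(), dtype=tf.float32)])
def scale(x):
    return weight * x  # `weight` is captured, not passed in by the caller.


concrete = scale.get_concrete_function()

# Graph inputs: the explicit argument plus one placeholder per capture.
graph_inputs = concrete.inputs
# External tensors that are fed to those extra placeholders at call time.
captured = concrete.captured_inputs

print([t.dtype.name for t in graph_inputs], len(captured))
```

This mirrors what bound_inputs records in the saved proto: which objects have to be supplied alongside the caller’s arguments.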

The second purpose, similar to SavedFunction and SavedBareConcreteFunction, is modifying the existing concrete function’s FuncGraph structured inputs and outputs. This is also used for argument validation. The setup for this is done here.

Example Walkthrough

A simple example may help illustrate all of this with more clarity. Let’s make a basic model and take a look at the subsequent generated proto to get a better feel for what’s going on.

Basic Model

import tensorflow as tf

class ExampleModel(tf.Module):

  @tf.function(input_signature=[tf.TensorSpec(shape=(), dtype=tf.float32)])
  def capture_fn(self, x):
    # Create the variable on first trace only; later calls reuse it.
    if not hasattr(self, 'weight'):
      self.weight = tf.Variable(5.0, name='weight')
    self.weight.assign_add(x * self.weight)
    return self.weight

  @tf.function
  def polymorphic_fn(self, x):
    return tf.constant(3.0) * x

model = ExampleModel()
# Two calls with different shapes trace two concrete functions.
model.polymorphic_fn(tf.constant(4.0))
model.polymorphic_fn(tf.constant([1.0, 2.0, 3.0]))
tf.saved_model.save(
    model, "/tmp/example-model", signatures={'capture_fn': model.capture_fn})

This model contains the basis for most of the complexity we’ll need to fully explore the intricacies of saving and signatures. This will allow us to look at functions with and without signatures, with and without captures, and with and without polymorphism.

Function with Captures

Let’s start by looking at our function with captures, capture_fn. We can see we have a concrete function defined in the function library, as expected:

Image of concrete function defined in the function library
A FunctionDef located in FunctionDefLibrary of MetaGraphDef.graph_def

Note the expected float input, "x", as well as the additional captured argument, "mul_readvariableop_resource". Since this function has a capture, we should see a variable being referenced in the bound_inputs field of one of our SavedConcreteFunctions:

SavedConcreteFunctions
A SavedConcreteFunction located in the concrete_functions map of the ObjectGraphDef

Indeed, we can see bound_inputs refers to node 1, which is a SavedVariable with the name and dtype we expect:

A SavedVariable located in ObjectGraphDef.nodes

Note that we also are storing on canonicalized_input_signature additional data that will be used to modify the concrete function. The key of this object, "__inference_capture_fn_59", is the same name as the concrete function registered in our function library.

Since we’ve specified a signature, we should also see a SavedBareConcreteFunction:

SavedBareConcreteFunction
A SavedBareConcreteFunction located in ObjectGraphDef.nodes

As discussed above, we use the function spec and argument information to modify the underlying concrete function. But what’s up with the "__inference_signature_wrapper_68" name? And how does this fit in with the rest of the code?

First, note that this is the fifth (5) node in the node list. This will come up again shortly.

Let’s start by looking at the nodes list. If we start at the first node in the nodes list, we’ll see a "signatures" node attached as a child:

SavedUserObject
A SavedUserObject located in ObjectGraphDef.nodes

If we look at node 2, we’ll see this node is a signature map that references one final node: node 5, our SavedBareConcreteFunction.

Node5
A SavedUserObject located in ObjectGraphDef.nodes

Thus, when we access this function via model.signatures["capture_fn"], we will actually be calling into this intermediate signature wrapper function first.

And what does that function, "__inference_signature_wrapper_68", look like?

FunctionDef
A FunctionDef located in FunctionDefLibrary of MetaGraphDef.graph_def

It takes the arguments we expect, and makes a call out to… "__inference_capture_fn_59", our original function! Just as we expect.

But wait… what happens if we don’t access our function via model.signatures["capture_fn"]? After all, we should be able to call it directly via model.capture_fn.

Notice above, we had a child on the top level object named "capture_fn" with a node_id of 3. If we look at node 3, we’ll see a SavedFunction object that references our original concrete function with no signature wrapper intermediary:

Node 3
A SavedFunction located in ObjectGraphDef.nodes

Again, the function spec is used to modify the function spec of our concrete function, "__inference_capture_fn_59". Notice also that concrete_functions here is a list. We only have one item right now, but this will come up again when we take a look at our polymorphic function example.
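To see both access paths side by side, here is a minimal sketch. Since the definition of ExampleModel is not repeated in this section, it uses a hypothetical stand-in, TinyModel, with a single variable and a capture_fn that multiplies its input by that variable. The class name, the variable value, and the temporary export directory are assumptions made for illustration only.

```python
import tempfile

import tensorflow as tf


class TinyModel(tf.Module):
    """Hypothetical stand-in for ExampleModel: one variable, one function."""

    def __init__(self):
        super().__init__()
        self.v = tf.Variable(3.0)  # captured by capture_fn

    @tf.function(input_signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
    def capture_fn(self, x):
        return x * self.v


model = TinyModel()
export_dir = tempfile.mkdtemp()  # assumed scratch location
tf.saved_model.save(
    model, export_dir, signatures={"capture_fn": model.capture_fn})

loaded = tf.saved_model.load(export_dir)

# Path 1: the restored function attached to the object, called directly.
direct = loaded.capture_fn(tf.constant(2.0))

# Path 2: the signature wrapper, called with keyword arguments.
# It returns a dict of named outputs rather than a bare tensor.
via_signature = loaded.signatures["capture_fn"](x=tf.constant(2.0))

print(direct.numpy())
print({name: t.numpy() for name, t in via_signature.items()})
```

Both calls end up in the same underlying concrete function. The difference is the calling convention: the signature path takes keyword arguments and returns a dictionary of named outputs.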

Now, we’ve fully mapped essentially everything needed for execution of this function, but we have one last thing to look at: SignatureDef. We’ve defined a signature, so we expect a SignatureDef to be defined:

SignatureDef
A SignatureDef located in the MetaGraphDef.signature_def map

This is very important for loading in v1 and C++ for serving. Note those funky names: "capture_fn_x:0" and "StatefulPartitionedCall:0". To call this function in v1, we need a way to map our nice argument names to the actual graph placeholder names for passing in as feeds and fetches (and doing validation, if we wish). Looking at this SignatureDef allows us to do just that.

Polymorphic Functions

We’re not quite done yet. Let’s take a look at our polymorphic function. We won’t repeat everything, since a lot of it is the same. We won’t have any signature wrapper functions or signature defs, since we skipped the signature on this one. Let’s look at what’s different.

A Polymorphic FunctionDef
A FunctionDef located in FunctionDefLibrary of MetaGraphDef.graph_def

For one, we now have two concrete functions registered in the function library, each with slightly different input shapes.

We also have two SavedConcreteFunction modifiers:

Two SavedConcreteFunctions
Two SavedConcreteFunctions located in the concrete_functions map of the ObjectGraphDef

And finally, we can see our SavedFunction references two underlying concrete functions instead of one:

SavedFunction
A SavedFunction located in ObjectGraphDef.nodes

The function spec here will be attached to both of these concrete functions at load time. When we call our SavedFunction, it will use the arguments we pass in to find the correct concrete function and execute it.
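The same dispatch behavior can be observed before saving. The sketch below is only an illustration: it defines a hypothetical standalone polymorphic_fn (not the one from ExampleModel), calls it with a scalar and a vector as in the save example, and then looks up the concrete function that matches each kind of argument.

```python
import tensorflow as tf


@tf.function  # no input_signature, so each new input shape gets its own trace
def polymorphic_fn(x):
    return x * 2.0


# Two calls with different shapes, mirroring the calls made before saving.
polymorphic_fn(tf.constant(4.0))
polymorphic_fn(tf.constant([1.0, 2.0, 3.0]))

# Ask for the concrete function that matches each kind of argument.
scalar_fn = polymorphic_fn.get_concrete_function(tf.constant(4.0))
vector_fn = polymorphic_fn.get_concrete_function(tf.constant([1.0, 2.0, 3.0]))

# structured_input_signature is (args, kwargs); args[0] is the spec for x.
scalar_spec = scalar_fn.structured_input_signature[0][0]
vector_spec = vector_fn.structured_input_signature[0][0]
print(scalar_spec.shape, vector_spec.shape)
```

Each concrete function carries its own input spec, which is what lets the right one be selected from the arguments at call time.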

Next Steps

You should now be an expert on how functions and their signatures are saved at a code level. Remember, what’s described in this blog post is how the code works right now. For updated code and examples in the future, see the official documentation on tensorflow.org.

Speaking of documentation, if you want a fast introduction to the basic APIs for saved models, you should read the introductory articles on how functions and modules are traced and saved. For experts, don’t miss this detailed guide on SavedModel itself, as well as a complete discussion of autograph.

And finally, if you do any exciting or useful protobuf surgery, share with us on Twitter. Thanks for reading this far!


Transfer Learning for Audio Data with YAMNet

Posted by Luiz Gustavo Martins, Developer Advocate

Transfer learning is a popular machine learning technique in which you train a new model by reusing information learned by a previous model. The most common applications of transfer learning are in the vision domain, to train accurate image classifiers or object detectors using a small amount of data, and in text, where pre-trained text embeddings or language models like BERT are used to improve natural language understanding tasks like sentiment analysis or question answering. In this article, you’ll learn how to use transfer learning for a new and important type of data, audio, to build a sound classifier.

There are many important use cases of audio classification, including to protect wildlife, to detect whales and even to fight against illegal deforestation.

With YAMNet, you can create a customized audio classifier in a few easy steps:

  • Prepare and use a public audio dataset
  • Extract the embeddings from the audio files using YAMNet
  • Create a simple two layer classifier and train it.
  • Save and test the final model

You can follow the code here in this tutorial.

The YAMNet model

YAMNet (“Yet another Audio Mobilenet Network”) is a pretrained model that predicts 521 audio events based on the AudioSet corpus.

This model is available on TensorFlow Hub including the TFLite and TF.js versions, for running the model on mobile and the web. The code can be found on their repository.

The model has 3 outputs:

  • Class scores that you’d use for inference
  • Embeddings, which are the important part for transfer learning
  • Log Mel Spectrograms to provide a visualization of the input signal

The model takes a waveform represented as 16 kHz samples in the range [-1.0, 1.0], frames it in windows of 0.96 seconds and hop of 0.48 seconds, and then runs the core of the model to extract the embeddings on a batch of these frames.

The 0.96 seconds windows hopping over a waveform
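To make the framing arithmetic concrete, here is a rough sketch that counts how many complete 0.96 second windows fit into a clip when hopping by 0.48 seconds. It assumes no padding at the end of the waveform; the actual model may pad or handle the tail differently, so the number of embeddings it returns can differ slightly. The helper name count_full_frames is hypothetical.

```python
SAMPLE_RATE = 16_000      # samples per second expected by the model
WINDOW_SECONDS = 0.96     # length of each frame
HOP_SECONDS = 0.48        # step between the starts of consecutive frames


def count_full_frames(num_samples, sample_rate=SAMPLE_RATE):
    """Count complete windows that fit in the waveform (no padding assumed)."""
    window = int(round(WINDOW_SECONDS * sample_rate))  # 15360 samples
    hop = int(round(HOP_SECONDS * sample_rate))        # 7680 samples
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


# A 5 second clip, the length of each ESC-50 recording.
frames_in_5s_clip = count_full_frames(5 * SAMPLE_RATE)
print(frames_in_5s_clip)  # 9 complete windows under the no-padding assumption
```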

As an example, trying the model with this audio file [link] will give you these results:

The first graph is the waveform. The second graph is the log-mel spectrogram. The third graph shows the class probability predictions per frame of audio, where darker is more likely.

The ESC-50 dataset

To do transfer learning with the model, you’ll use the Dataset for Environmental Sound Classification, or ESC-50 for short. This is a collection of 2000 environmental audio recordings from 50 classes. Each recording is 5 seconds long and they came originally from the Freesound project.

The ESC-50 has the classes Dog and Cat that you’ll need.

The dataset has two important components: the audio files and a metadata csv file with the metadata about every audio file.

The columns in the metadata csv file contain information that will be used to train the model:

  • Filename gives the name of the .wav audio file
  • Category is the human-readable class name for the numeric target id
  • Target is the unique numeric id of the category
  • Fold ensures that clips originating from the same initial source are always contained in the same group. This is important to avoid cross-contamination when splitting the data into train, validation and test sets and for cross-validation.

For more detailed information you can read the original ESC paper.

Working with the dataset

To load the dataset, you’ll start from the metadata file and load it using the Pandas method read_csv.

With the loaded dataframe, the next step is to filter by the classes that will be used, in this case dog and cat.
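A minimal sketch of the load-and-filter step is shown below. To stay self-contained it reads a tiny in-memory CSV instead of the real metadata file. The column names follow the ones described above, but the rows are made up for illustration. The remapping of class names to ids 0 and 1 is an assumption that keeps the labels contiguous for a two-class classifier.

```python
import io

import pandas as pd

# Made-up rows standing in for the real ESC-50 metadata file.
csv_text = """filename,fold,target,category
a.wav,1,0,dog
b.wav,1,5,cat
c.wav,2,1,rooster
"""

pd_data = pd.read_csv(io.StringIO(csv_text))

my_classes = ["dog", "cat"]
map_class_to_id = {"dog": 0, "cat": 1}  # assumed contiguous ids

# Keep only the rows whose category is one of the classes of interest.
filtered_pd = pd_data[pd_data["category"].isin(my_classes)]

# Replace the original target ids with the contiguous ones.
filtered_pd = filtered_pd.assign(
    target=filtered_pd["category"].map(map_class_to_id))

print(filtered_pd)
```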

The next step would be to load the audio files, but if there are many of them, loading everything into memory at once can be prohibitive and lead to out-of-memory errors. A better approach is to load the audio files lazily, only when they are needed. TensorFlow makes this easy with tf.data.Dataset and its map method.

Let’s create the Dataset from the previously created pandas dataframe and apply the load_wav_for_map function to every file:

filenames = filtered_pd['filename']
targets = filtered_pd['target']
folds = filtered_pd['fold']

main_ds = tf.data.Dataset.from_tensor_slices((filenames, targets, folds))
main_ds = main_ds.map(load_wav_for_map)

At this point no audio file has been loaded into memory, because the mapping has not been evaluated yet. If you request the size of the dataset, for example with len(list(train_ds.as_numpy_iterator())), the map function is evaluated and all the files are loaded.
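The lazy behavior can be checked with a small experiment. In this sketch, fake_load is a hypothetical stand-in for an expensive file read: it records each item it is asked to load. It is wrapped in tf.py_function so that the Python side effect happens each time an element is actually produced.

```python
import tensorflow as tf

loaded = []  # records which items have been "loaded" so far


def fake_load(index):
    # Hypothetical stand-in for reading an audio file from disk.
    loaded.append(int(index.numpy()))
    return index


ds = tf.data.Dataset.from_tensor_slices([10, 20, 30])
ds = ds.map(lambda i: tf.py_function(fake_load, [i], tf.int32))

# Defining the pipeline has not loaded anything yet.
loaded_before = list(loaded)

# Iterating over the dataset is what triggers the map function.
values = [int(v) for v in ds.as_numpy_iterator()]
loaded_after = sorted(loaded)

print(loaded_before, values, loaded_after)
```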

The same technique will be used to extract all the features (embeddings) from each audio file.

Extracting the audio embeddings

Here you are going to load the YAMNet model from TensorFlow Hub. All you need to do is pass the model’s handle to the load method of the tensorflow_hub library.

yamnet_model_handle = 'https://tfhub.dev/google/yamnet/1'
yamnet_model = hub.load(yamnet_model_handle)

This will load the model to memory ready to be used.

For each audio file, you’ll run YAMNet to extract the embeddings. The embeddings output is paired with the same label and fold as the audio file.

def extract_embedding(wav_data, label, fold):
    ''' run YAMNet to extract embedding from the wav data '''
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    num_embeddings = tf.shape(embeddings)[0]
    return (embeddings,
            tf.repeat(label, num_embeddings),
            tf.repeat(fold, num_embeddings))

main_ds = main_ds.map(extract_embedding).unbatch()

These embeddings will be the input for the classification model. According to the model’s documentation, YAMNet frames the waveform of a given audio file into sliding windows of length 0.96 seconds with a hop of 0.48 seconds, and then runs the core of the model. In summary, for each 0.48 seconds of audio the model outputs one embedding array with 1024 float values. This step is also done with map(), so it is lazily evaluated as well, which is why it appears to execute so fast.
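The effect of repeating the label and fold and then unbatching can be pictured with plain NumPy. This is an analogue of the tf.repeat and unbatch() steps, not the tf.data code itself. The number of frames and the label and fold values are hypothetical.

```python
import numpy as np

num_frames = 9          # hypothetical number of frames for one clip
embedding_size = 1024   # one embedding vector per frame

# Placeholder embeddings for a single clip: one row per frame.
embeddings = np.zeros((num_frames, embedding_size), dtype=np.float32)
label, fold = 1, 3      # hypothetical clip-level label and fold

# Repeat the clip-level values so every frame carries its own copy.
labels = np.repeat(label, num_frames)
folds = np.repeat(fold, num_frames)

# After unbatching, each element is one (embedding, label, fold) row.
rows = list(zip(embeddings, labels, folds))
print(len(rows), rows[0][0].shape, rows[0][1], rows[0][2])
```

Each frame becomes its own training example, which is why a small number of clips can still yield a reasonable number of rows for the classifier.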

The final dataset contains the three used columns: embedding, label and fold.

The last dataset operation is to split the data into train, validation and test datasets. To do so, use the filter() method with the fold field (an integer between 1 and 5) as the criterion.

cached_ds = main_ds.cache()
train_ds = cached_ds.filter(lambda embedding, label, fold: fold < 4)
val_ds = cached_ds.filter(lambda embedding, label, fold: fold == 4)
test_ds = cached_ds.filter(lambda embedding, label, fold: fold == 5)

Training the Classifier

With the YAMNet embedding vectors and the labels, the next step is to train a classifier that learns to tell a dog’s sound from a cat’s sound.

The classifier model is very simple with just two dense layers, but as you’ll see this is enough for the amount of data used.

my_model = tf.keras.Sequential([
    # shape must be a tuple, hence the trailing comma
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32, name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(my_classes))  # one logit per class
])
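The post does not show the training call itself, so the following is only a sketch of what compiling and fitting such a model could look like. It trains on random placeholder embeddings rather than real YAMNet output, assumes the labels are already contiguous ids (0 for dog, 1 for cat), and picks the loss, optimizer, batch size, and epoch count as plausible defaults rather than the author’s settings.

```python
import numpy as np
import tensorflow as tf

my_classes = ["dog", "cat"]

# Random placeholder data standing in for (embedding, label) pairs.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(64, 1024)).astype("float32")
labels = rng.integers(0, len(my_classes), size=64)
train_ds = tf.data.Dataset.from_tensor_slices((embeddings, labels)).batch(32)

my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32,
                          name="input_embedding"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(len(my_classes)),  # raw logits, no softmax
])

# from_logits=True because the last layer has no activation.
my_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="adam",
    metrics=["accuracy"],
)

history = my_model.fit(train_ds, epochs=2, verbose=0)

# One row of logits per input embedding.
logits = my_model.predict(embeddings[:4], verbose=0)
print(logits.shape)
```

In the real pipeline, the datasets would first need the fold column dropped and a batch dimension added before being passed to fit.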

Saving the final model

The model that was trained works and has good accuracy but the input it expects is not an audio waveform but an embedding array. To address this problem, the final model will combine YAMNet as the input layer and the model just trained. This way, the final model will accept a waveform and output the class:

# ReduceMeanLayer is a custom Keras layer (not part of tf.keras), defined elsewhere in the tutorial.
input_segment = tf.keras.layers.Input(shape=(), dtype=tf.float32,
                                      name='audio')
embedding_extraction_layer = hub.KerasLayer('https://tfhub.dev/google/yamnet/1', trainable=False)
# YAMNet returns (scores, embeddings, spectrogram); only the embeddings are needed here.
_, embeddings_output, _ = embedding_extraction_layer(input_segment)
serving_outputs = my_model(embeddings_output)
serving_outputs = ReduceMeanLayer(axis=0, name='classifier')(serving_outputs)
serving_model = tf.keras.Model(input_segment, serving_outputs)
serving_model.save(saved_model_path, include_optimizer=False)
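The snippet above relies on a ReduceMeanLayer that is not defined in this post. One plausible definition, offered here as an assumption rather than the author’s exact code, is a small custom Keras layer that averages its input over a given axis. With axis=0 it collapses the per-frame outputs of the classifier into a single prediction for the whole clip.

```python
import tensorflow as tf


class ReduceMeanLayer(tf.keras.layers.Layer):
    """Averages its input over one axis (assumed definition)."""

    def __init__(self, axis=0, **kwargs):
        super().__init__(**kwargs)
        self.axis = axis

    def call(self, inputs):
        return tf.math.reduce_mean(inputs, axis=self.axis)


# Two frames of made-up logits for two classes.
frame_logits = tf.constant([[1.0, 3.0],
                            [3.0, 5.0]])

# Averaging over axis 0 gives one logit per class for the clip.
clip_logits = ReduceMeanLayer(axis=0, name="classifier")(frame_logits)
print(clip_logits.numpy())  # [2. 4.]
```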

To try the reloaded model, you can call it the same way the model was used earlier in the colab:

reloaded_model = tf.saved_model.load(saved_model_path)
reloaded_results = reloaded_model(testing_wav_data)
cat_or_dog = my_classes[tf.argmax(reloaded_results)]

This model can also be used with TensorFlow Serving through its ‘serving_default’ signature:

serving_results = reloaded_model.signatures['serving_default'](testing_wav_data)
cat_or_dog = my_classes[tf.argmax(serving_results['classifier'])]

In this post, you learned how to use the YAMNet model for transfer learning to recognize audio of dogs and cats from the ESC-50 dataset.

Check out the YAMNet model on tfhub.dev and the tutorial on tensorflow.org. You can apply this technique to your own dataset, or to other classes in the ESC-50 dataset.

We would love to know what you can build with this! Share your project with us on social media by using the hashtag #TFHub.

Acknowledgements

We’d like to thank a number of colleagues for their contribution to this work: Dan Ellis, Manoj Plakal and Eduardo Fonseca for an amazing YAMNet model and support with the colab and multiple reviews.

Mark Daoust and Elizabeth Kemp have greatly improved the presentation of the material in this post and the associated tutorial.


Introducing TensorFlow Videos for a Global Audience: Vietnamese

Posted by TensorFlow Team

When the TensorFlow YouTube channel launched in 2018, we had a vision to inform and inspire developers around the world about what was possible with Machine Learning. With series like Coding TensorFlow showing how you can use it, and Made with TensorFlow showing inspirational stories about what people have done with TensorFlow and much more, the channel has grown greatly. But we learned an important lesson: it’s a global phenomenon, and to reach the world effectively, we should provide some of our content in multiple languages with native speakers presenting. Check out the popular Zero to Hero series in Vietnamese!

Introduction to Machine Learning with TensorFlow

It seems that whenever you browse the web or read books and newspapers, news about machine learning and artificial intelligence (AI) is everywhere you look. A good deal of it is information and advertising that overhypes AI. So, from a developer’s point of view, we on the TensorFlow team decided to produce a four-part video series on what machine learning really is, based on Laurence Moroney’s well-known talk at Google IO 2019 titled Machine Learning: Zero to Hero.

In video 1, we introduce a new form of programming: machine learning. Instead of programming explicit instructions for the computer in a language such as Java or C++, in machine learning you create a program that is trained on data, and the computer infers the logic from that data on its own. So what does machine learning actually look like? We’ll walk through a “Hello World” example of building a machine learning model, introducing ideas that we will then apply to a more interesting problem: computer vision.

In video 2, you’ll learn about computer vision based on machine learning. We’ll train a computer to see and recognize different objects.

In video 3, we’ll learn about convolutional neural networks and why they play an important role in computer vision applications. A convolution is a filter that processes an image and extracts its distinctive features. You’ll see how convolutional neural networks work by processing and extracting features from a set of real-world images.

In video 4, you’ll learn how to build an image classification model to play rock, paper, scissors. In part 1, we mentioned the game of rock, paper, scissors and how hard it is to program a computer to recognize images of a hand showing rock, paper, or scissors. Since then, however, we have learned a lot about machine learning, how to build neural networks that detect patterns in pixel data, and how to use convolutions to detect features in an image. In this part, we apply what we learned in the previous three parts to build a neural network that lets the computer play rock, paper, scissors.

We hope this video series helps you get started with machine learning. If you have any feedback, please leave it in the comments on the videos on YouTube. And don’t forget to subscribe to the TensorFlow YouTube channel to see more videos about machine learning!
