Variational Inference with Joint Distributions in TensorFlow Probability

Posted by Emily Fertig, Joshua V. Dillon, Wynn Vonnegut, Dave Moore, and the TensorFlow Probability team

In this post, we introduce new tools for variational inference with joint distributions in TensorFlow Probability, and show how to use them to estimate Bayesian credible intervals for weights in a regression model.

Overview

Variational Inference (VI) casts approximate Bayesian inference as an optimization problem and seeks a ‘surrogate’ posterior distribution that minimizes the KL divergence with the true posterior. Gradient-based VI is often faster than MCMC methods, composes naturally with optimization of model parameters, and provides a lower bound on model evidence that can be used directly for model comparison, convergence diagnosis, and composable inference.

TensorFlow Probability (TFP) offers tools for fast, flexible, and scalable VI that fit naturally into the TFP stack. These tools enable the construction of surrogate posteriors with covariance structures induced by linear transformations or normalizing flows.

VI can be used to estimate Bayesian credible intervals for the parameters of a regression model, quantifying the effects of various treatments or observed features on an outcome of interest. Credible intervals bound the values of an unobserved parameter with a certain probability, according to the posterior distribution of the parameter conditioned on observed data and given an assumed prior distribution for the parameter.

In this post, we demonstrate how to use VI to obtain credible intervals for parameters of a Bayesian linear regression model for radon levels measured in homes (using Gelman et al.’s (2007) Radon dataset; see similar examples in Stan). We demonstrate how TFP JointDistributions combine with bijectors to build and fit two types of expressive surrogate posteriors:

  • a standard Normal distribution transformed by a block matrix. The matrix may reflect independence among some components of the posterior and dependence among others, relaxing the assumption of a mean-field or full-covariance posterior.
  • a more complex, higher-capacity inverse autoregressive flow.

We train both surrogate posteriors and compare the results with a mean-field surrogate posterior baseline. The plots below show credible intervals for four model parameters obtained with the three VI surrogate posteriors, as well as with Hamiltonian Monte Carlo (HMC) for comparison.

credible intervals for four model parameters obtained with the three VI surrogate posteriors

You can follow along and see all the details in this Google Colab.

Example: Bayesian hierarchical linear regression on Radon measurements

Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.

The EPA did a study of radon levels in 80,000 houses. Two important predictors are:

  • Floor on which the measurement was taken (radon higher in basements)
  • County uranium level (positive correlation with radon levels)

Predicting radon levels in houses grouped by county is a classic problem in Bayesian hierarchical modeling, introduced by Gelman and Hill (2006). We are interested in credible intervals for the effect of location (county) on the radon level of houses in Minnesota. To isolate this effect, the effects of floor and uranium level are also included in the model. Additionally, we incorporate a contextual effect corresponding to the mean floor on which measurements were taken in each county, so that variation among counties in the floor on which measurements were taken is not attributed to the county effect.

The regression model is specified as follows:

uranium_weight ~ Normal(0, 1)
county_floor_weight ~ Normal(0, 1)
county_effect_c ~ Normal(0, county_effect_scale),   for each county c
log_radon_i ~ Normal(uranium_weight * log_uranium_i
                     + floor_weight * floor_i
                     + county_floor_weight * floor_by_county_i
                     + county_effect_{county_i}
                     + bias,
                     log_radon_scale)

in which i indexes the observations and county_i is the county in which the i-th observation was taken.

We use a county-level random effect to capture geographical variation. The parameters uranium_weight and county_floor_weight are modeled probabilistically, and floor_weight and the constant bias are deterministic. These modeling choices are largely arbitrary, and are made for the purpose of demonstrating VI on a probabilistic model of reasonable complexity. For a more thorough discussion of multilevel modeling with fixed and random effects in TFP, using the radon dataset, see Multilevel Modeling Primer and Fitting Generalized Linear Mixed-effects Models Using Variational Inference.

The full code for this example is available on Github.

Variables are defined for the deterministic parameters and the Normal distribution scale parameters, with the latter constrained to be positive.

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors

floor_weight = tf.Variable(0.)
bias = tf.Variable(0.)
log_radon_scale = tfp.util.TransformedVariable(1., tfb.Exp())
county_effect_scale = tfp.util.TransformedVariable(1., tfb.Exp())

We specify the probabilistic graphical model for the regression as a TFP JointDistribution.

@tfd.JointDistributionCoroutineAutoBatched
def model():
  uranium_weight = yield tfd.Normal(0., scale=1., name='uranium_weight')
  county_floor_weight = yield tfd.Normal(
      0., scale=1., name='county_floor_weight')
  county_effect = yield tfd.Sample(
      tfd.Normal(0., scale=county_effect_scale),
      sample_shape=[num_counties], name='county_effect')
  yield tfd.Normal(
      loc=(log_uranium * uranium_weight
           + floor_of_house * floor_weight
           + floor_by_county * county_floor_weight
           + tf.gather(county_effect, county, axis=-1)
           + bias),
      scale=log_radon_scale[..., tf.newaxis],
      name='log_radon')

We pin log_radon to the observed radon data to model the unnormalized posterior.

target_model = model.experimental_pin(log_radon=log_radon)
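As a quick sanity check (not part of the original post), we can sample the unpinned parameters from their priors and confirm that the pinned model assigns those samples a finite unnormalized log density:

# Illustrative only: draw 7 joint samples of the unpinned parameters and
# evaluate the unnormalized posterior density at those samples.
prior_samples = target_model.sample_unpinned(7)
print(target_model.unnormalized_log_prob(prior_samples).shape)  # expected: (7,)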

Quick summary of Bayesian Variational Inference

Suppose we have the following generative process, where 𝜃 represents random parameters (uranium_weight, county_floor_weight, and county_effect in the regression model) and ω represents deterministic parameters (floor_weight, log_radon_scale, county_effect_scale, and bias). The x𝑖 are features (log_uranium, floor_of_house, and floor_by_county) and the 𝑦𝑖 are target values (log_radon) for 𝑖 = 1…n observed data points:

θ ~ r(·)    (prior over the random parameters)
y_i ~ p(· | θ, ω, x_i),   i = 1, …, n

VI is then characterized by:

log p(y_{1:n} | x_{1:n}, ω) = log ∫ p(y_{1:n} | θ, ω, x_{1:n}) r(θ) dθ
                            ≥ E_{θ~q} [ log p(y_{1:n} | θ, ω, x_{1:n}) + log r(θ) − log q(θ) ],   for all q in Q

(Technically we’re assuming q is absolutely continuous with respect to r. See also, Jensen’s inequality.)

Since the bound holds for all q, it is obviously tightest for:

q* = argmax_{q ∈ Q} E_{θ~q} [ log p(y_{1:n} | θ, ω, x_{1:n}) + log r(θ) − log q(θ) ]

Regarding terminology, we call

  • q* the “surrogate posterior,” and,
  • Q the “surrogate family.”

ω* represents the values of the deterministic parameters that optimize the VI loss, i.e., their maximum-likelihood estimates under the ELBO. See this survey for more information on variational inference.
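Concretely, tfp.vi.fit_surrogate_posterior (used below) minimizes a Monte Carlo estimate of the negative ELBO, the expectation above. A rough sketch of that estimate, shown here for illustration only rather than as TFP's actual implementation, looks like this:

def negative_elbo_estimate(target_log_prob_fn, surrogate_posterior, sample_size=16):
  # Sample theta ~ q and average log q(theta) minus the unnormalized joint density.
  q_samples = surrogate_posterior.sample(sample_size)
  return tf.reduce_mean(
      surrogate_posterior.log_prob(q_samples)
      - target_log_prob_fn(*q_samples))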

Expressive surrogate posteriors

Next we estimate the posterior distributions of the parameters using VI with two different types of surrogate posteriors:

  • A constrained multivariate Normal distribution, with covariance structure induced by a blockwise matrix transformation.
  • A multivariate standard Normal distribution transformed by an Inverse Autoregressive Flow, which is then split and restructured to match the support of the posterior.

Multivariate Normal surrogate posterior

To build this surrogate posterior, a trainable linear operator is used to induce correlation among the components of the posterior.

We begin by constructing a base distribution with vector-valued standard Normal components, with sizes equal to the sizes of the corresponding prior components. The components are vector-valued so they can be transformed by the linear operator.

# The event shape and its flattened form are also used below when reshaping
# and restructuring the surrogate posterior.
event_shape = target_model.event_shape_tensor()
flat_event_shape = tf.nest.flatten(event_shape)
flat_event_size = tf.nest.map_structure(tf.reduce_prod, flat_event_shape)
base_standard_dist = tfd.JointDistributionSequential(
    [tfd.Sample(tfd.Normal(loc=0., scale=1.), s)
     for s in flat_event_size])

To this distribution, we apply a trainable blockwise lower-triangular linear operator to induce correlation in the posterior. Within the linear operator, a trainable full-matrix block represents full covariance between two components of the posterior, while a block of zeros (or None) expresses independence. Blocks on the diagonal are either lower-triangular or diagonal matrices, so that the entire block structure represents a lower-triangular matrix.

Applying this bijector to the base distribution results in a multivariate Normal distribution with mean 0 and (Cholesky-factored) covariance equal to the lower-triangular block matrix.

operators = (
    (tf.linalg.LinearOperatorDiag,),  # Variance of uranium weight (scalar).
    (tf.linalg.LinearOperatorFullMatrix,  # Covariance between uranium and floor-by-county weights.
     tf.linalg.LinearOperatorDiag),  # Variance of floor-by-county weight (scalar).
    (None,  # Independence between uranium weight and county effects.
     None,  # Independence between floor-by-county and county effects.
     tf.linalg.LinearOperatorDiag)  # Independence among the 85 county effects.
)
block_tril_linop = (
    tfp.experimental.vi.util.build_trainable_linear_operator_block(
        operators, flat_event_size))
scale_bijector = tfb.ScaleMatvecLinearOperatorBlock(block_tril_linop)

Finally, we allow the mean to take nonzero values by applying trainable Shift bijectors.

loc_bijector = tfb.JointMap(
    tf.nest.map_structure(
        lambda s: tfb.Shift(
            tf.Variable(tf.random.uniform(
                (s,), minval=-2., maxval=2., dtype=tf.float32))),
        flat_event_size))

The resulting multivariate Normal distribution, obtained by transforming the standard Normal distribution with the scale and location bijectors, must be reshaped and restructured to match the prior, and finally constrained to the support of the prior.

reshape_bijector = tfb.JointMap(
    tf.nest.map_structure(tfb.Reshape, flat_event_shape))
unflatten_bijector = tfb.Restructure(
    tf.nest.pack_sequence_as(
        event_shape, range(len(flat_event_shape))))
event_space_bijector = target_model.experimental_default_event_space_bijector()

Now, put it all together — chain the trainable bijectors together and apply them to the base standard Normal distribution to construct the surrogate posterior.

surrogate_posterior = tfd.TransformedDistribution(
    base_standard_dist,
    bijector=tfb.Chain([  # Chained bijectors are applied in reverse order.
        event_space_bijector,  # Constrain to the support of the prior.
        unflatten_bijector,  # Pack components into the event_shape structure.
        reshape_bijector,  # Reshape the vector-valued components.
        loc_bijector,  # Allow for nonzero mean.
        scale_bijector  # Apply the block matrix transformation.
    ]))

Train the multivariate Normal surrogate posterior.

optimizer = tf.optimizers.Adam(learning_rate=1e-2)

@tf.function(jit_compile=True)
def run_vi():
  return tfp.vi.fit_surrogate_posterior(
      target_model.unnormalized_log_prob,
      surrogate_posterior,
      optimizer=optimizer,
      num_steps=10**4,
      sample_size=16)

mvn_loss = run_vi()
mvn_samples = surrogate_posterior.sample(1000)

Since the trained surrogate posterior is a TFP distribution, we can take samples from it and process them to produce posterior credible intervals for the parameters.
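For example, here is a small sketch of how such intervals can be computed from the samples drawn above. The percentile levels are chosen to match the plots, and the sketch assumes the sampled structure exposes the uranium_weight field by name:

# 50% and 95% credible intervals for the uranium weight, from posterior samples.
uranium_samples = mvn_samples.uranium_weight
interval_50 = tfp.stats.percentile(uranium_samples, q=[25., 75.])
interval_95 = tfp.stats.percentile(uranium_samples, q=[2.5, 97.5])
print('50% credible interval:', interval_50.numpy())
print('95% credible interval:', interval_95.numpy())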

The box-and-whiskers plots below show 50% and 95% credible intervals for the county effect of the two largest counties and the regression weights on soil uranium measurements and mean floor by county. The posterior credible intervals for county effects indicate that location in St. Louis county is associated with lower radon levels, after accounting for other variables, and that the effect of location in Hennepin county is near neutral.

Posterior credible intervals on the regression weights show that higher levels of soil uranium are associated with higher radon levels, and counties where measurements were taken on higher floors (likely because the house didn’t have a basement) tend to have higher levels of radon, which could relate to soil properties and their effect on the type of structures built.

The (deterministic) coefficient of floor is -0.7, indicating that lower floors have higher radon levels, as expected.

Chart: 50% and 95% posterior credible intervals for the county effects and regression weights

Inverse autoregressive flow surrogate posterior

Inverse autoregressive flows (IAFs) are normalizing flows that use neural networks to capture complex, nonlinear dependencies among components of the distribution. Next we build an IAF surrogate posterior to see whether this higher-capacity, more flexible model outperforms the constrained multivariate Normal.

We begin by building a standard Normal distribution with vector event shape, of length equal to the total number of degrees of freedom in the posterior.

base_distribution = tfd.Sample(
    tfd.Normal(loc=0., scale=1.),
    sample_shape=[tf.reduce_sum(flat_event_size)])

A trainable IAF transforms the Normal distribution.

num_iafs = 2
iaf_bijectors = [
    tfb.Invert(tfb.MaskedAutoregressiveFlow(
        shift_and_log_scale_fn=tfb.AutoregressiveNetwork(
            params=2,
            hidden_units=[256, 256],
            activation='relu')))
    for _ in range(num_iafs)
]

The IAF bijectors are chained together with other bijectors to build a surrogate posterior with the same event shape and support as the prior.

iaf_surrogate_posterior = tfd.TransformedDistribution(
    base_distribution,
    bijector=tfb.Chain([
        event_space_bijector,  # Constrain to the support of the prior.
        unflatten_bijector,  # Pack components into the event_shape structure.
        reshape_bijector,  # Reshape the vector-valued components.
        tfb.Split(flat_event_size),  # Split into parts, same size as prior.
    ] + iaf_bijectors))  # Apply a flow model.

Like the multivariate Normal surrogate posterior, the IAF surrogate posterior is trained using tfp.vi.fit_surrogate_posterior. The credible intervals for the IAF surrogate posterior appear similar to those of the constrained multivariate Normal.
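For completeness, here is a sketch of that training call, assuming the same optimizer settings used for the multivariate Normal surrogate (the Colab may use different hyperparameters):

iaf_loss = tfp.vi.fit_surrogate_posterior(
    target_model.unnormalized_log_prob,
    iaf_surrogate_posterior,
    optimizer=tf.optimizers.Adam(learning_rate=1e-2),
    num_steps=10**4,
    sample_size=16)
iaf_samples = iaf_surrogate_posterior.sample(1000)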

IAF Surrogate Posterior

Mean-field surrogate posterior

VI surrogate posteriors are often assumed to be mean-field (independent) Normal distributions, with trainable means and variances, that are constrained to the support of the prior with a bijective transformation. We define a mean-field surrogate posterior in addition to the two more expressive surrogate posteriors, using the same general formula as the multivariate Normal surrogate posterior. Instead of a blockwise lower triangular linear operator, we use a blockwise diagonal linear operator, in which each block is diagonal:

operators = (
    tf.linalg.LinearOperatorDiag,
    tf.linalg.LinearOperatorDiag,
    tf.linalg.LinearOperatorDiag,
)
block_diag_linop = (
    tfp.experimental.vi.util.build_trainable_linear_operator_block(
        operators, flat_event_size))
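The rest of the construction mirrors the multivariate Normal surrogate. A sketch of the assembly, shown for illustration (the Colab builds its own trainable variables):

# Diagonal scale plus fresh trainable shifts, chained with the same reshape,
# restructure, and event-space bijectors defined above.
mean_field_scale = tfb.ScaleMatvecLinearOperatorBlock(block_diag_linop)
mean_field_loc = tfb.JointMap(
    tf.nest.map_structure(
        lambda s: tfb.Shift(
            tf.Variable(tf.random.uniform(
                (s,), minval=-2., maxval=2., dtype=tf.float32))),
        flat_event_size))
mean_field_surrogate_posterior = tfd.TransformedDistribution(
    base_standard_dist,
    bijector=tfb.Chain([
        event_space_bijector,
        unflatten_bijector,
        reshape_bijector,
        mean_field_loc,
        mean_field_scale
    ]))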

In this case, the mean field surrogate posterior gives similar results to the more expressive surrogate posteriors, indicating that this simpler model may be adequate for the inference task. As a “ground truth”, we also take samples with Hamiltonian Monte Carlo (see the Colab for the full example). All three surrogate posteriors produced credible intervals that are visually similar to the HMC samples, though sometimes under-dispersed due to the effect of the ELBO loss, as is common in VI.

Chart: credible intervals from the mean-field, multivariate Normal, and IAF surrogate posteriors, compared with HMC samples

Conclusion

In this post, we built VI surrogate posteriors using joint distributions and multipart bijectors, and fit them to estimate credible intervals for weights in a regression model on the radon dataset. For this simple model, more expressive surrogate posteriors appeared to perform similarly to a mean-field surrogate posterior. The tools we demonstrated, however, can be used to build a wide range of flexible surrogate posteriors suitable for more complex models.

Check out our code, documentation, and further examples on the TFP home page.


How OpenX Trains and Serves for a Million Queries per Second in under 15 Milliseconds

A guest post by Larry Price, OpenX

Edited by Robert Crowe, Anusha Ramesh – TensorFlow


Overview

Adtech is an industry built on latency at scale. At OpenX this means that during peak traffic periods our exchange processes more than one million requests for ads every second, most of which require a response in under 300 milliseconds. Under such high volume and strict time budgets, it’s crucial to prioritize traffic so that we simultaneously help publishers get top dollar for their inventory and ensure buyers hit their campaign goals.

To accomplish this, we’ve leveraged several products in the TensorFlow ecosystem and Google Cloud, including TensorFlow Extended (TFX), TF Serving, and Kubeflow Pipelines, to build a service that prioritizes traffic to our buyers (demand-side platforms, or DSPs in adtech lingo) and, more specifically, to the brands and agencies within those DSPs.

About OpenX

OpenX operates the world’s largest independent advertising exchange. At a basic level, the exchange is a marketplace connecting tens of thousands of top brands to consumers across the most-visited websites and mobile apps.

The fundamental means of transacting is the auction, where buyers representing brands bid on publishers’ inventory, that is, ad impressions on websites and mobile apps. The auctions themselves are fairly straightforward, but two facts make this system incredibly complicated:

  1. Scale: At peak traffic our systems process more than one million requests for ads every second. A typical day sees more than 1.5 trillion bid transactions, resulting in petabytes of raw data.
  2. Latency: Preserving user experience on both the web and mobile apps is crucial to publishers, so most of the requests we process have strict time limits of 300 milliseconds or less, most of which is spent asking for and receiving the buyers’ bids. This means that any overhead introduced by machine learning models at auction time must be limited to at most about 15 milliseconds, otherwise we risk not giving buyers enough time to submit their bids.

This need for low latency coupled with the high throughput requirement is fairly atypical for machine learning systems. Before we get to the details of how we built a machine learning infrastructure capable of dealing with both requirements, we’ll dig a little deeper into how we got here and what problem we’re trying to solve.

Cloud Transformation: A rare opportunity

In 2019 OpenX undertook the ambitious task of moving off of on-premise computing resources to Google Cloud Platform (GCP). We completed the process over a span of seven months. As a company, we were empowered to utilize managed services and modify our stack as we transitioned, so it wasn’t just a simple “lift-and-shift”. We really took this to heart on the Data Science team.

Prior to the move to GCP, our legacy machine learning infrastructure followed a pattern where models trained by scientists had to be re-implemented by engineers in the components that needed to execute the models. This scenario satisfies the scale and latency requirements but comes with a whole host of other issues:

  • It takes a long time to get models to production because the scientist’s work (typically in Python) now has to be reproduced by an engineer in the native language of the component that has to call it.
  • The same is true for changes to model architecture, or even the way data transformations are performed.
  • It’s essentially a recipe for training-serving skew.
  • QA was challenging.

For these and several other reasons we decided to start from scratch. At the same time, we were working on a new problem and decided to tie the two efforts together and develop a new framework as part of the new project.

Our problem

The OpenX marketplace is not completely unlike an equities market or stock exchange. And much like high volume financial markets, to ensure the buyers fulfill their campaign goals and simultaneously help publishers monetize appropriately on their inventory, there’s a need to prioritize traffic. Fundamentally, this means we need a model that can accurately value and hence rank every single request that hits the exchange.

Why TensorFlow

As we looked for a solution for our next-generation platform we had a couple of goals in mind. We were looking primarily to drastically reduce the time and effort required to put a model into production and, as part of getting there, to use managed services wherever possible. TensorFlow had already been in use at OpenX for a while prior to our migration to GCP, but our legacy infrastructure involved a number of custom scripts for data transformation and pipelining. At the same time as we were researching our options, both TensorFlow Extended (TFX) and Kubeflow Pipelines (KFP) were reaching a level of maturity that made them interesting for our purposes. It was a no-brainer to adopt these technologies into our stack.

How we solved it

Training Terabytes of Data Every Day

Our pipeline looks something like this.

TFX pipeline

It’s useful to spend some time breaking down the topology of the pipeline (a simplified code sketch follows this breakdown):

  • Raw Data – Our data consists of transaction logs that are streamed directly from StackDriver into a BigQuery sink as they arrive. To help avoid bias in our model we train on a fraction of the total data that is held out from our prioritization system, resulting in roughly 50TB of new data daily. This was a simple design choice as it was very straightforward to implement, and the big benefit is that we can use BigQuery on the data directly without an additional ETL.
  • BigQueryExampleGen – The first place we leverage BigQuery is using builtin functions to preprocess the data. By embedding our own specific processes into the query calls made by the ExampleGen component, we were able to avoid building out a separate ETL that would exist outside the scope of a TFX pipeline. This ultimately proved to be a good way to get the model in production more quickly. This preprocessed data is then split into training and test sets and converted to tf.Examples via the ExampleGen component.
  • Transform – This component does the feature engineering and transformations necessary to handle strings, normalize values, set up embeddings, etc. The major benefit here is that the resulting transformation is ultimately prepended to the computational graph, so that the exact same code is used for training and serving.
  • Trainer – The Trainer component does just that. We leverage parallel training on AI Platform to speed things up.
  • Evaluator – The Evaluator compares the existing production model to the model produced by the Trainer and blesses the “better” one for use in production. The decision criteria are based on custom metrics aligned with business requirements (as opposed to, e.g., precision and recall), which were easy to implement owing to the extensibility of the Evaluator component.
  • Pusher – The Pusher’s primary function is to send the blessed model to our TFServing deployment for production. However, we added functionality to use the custom metrics produced in the Evaluator to determine decisioning criteria to be used in serving, and attach that to the computational graph. The level of abstraction available in TFX components made it easy to make this custom modification. Overall, the modification allows the pipeline to operate without a human in the loop so that we are able to make model updates frequently, while continuing to deliver consistent performance on metrics that are important to our business.
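A heavily simplified sketch of a pipeline with this topology, written against the public TFX API, is shown below. The query, module files, label key, bucket paths, and pipeline name are placeholders, and OpenX's production pipeline includes custom logic (for example in the Evaluator and Pusher) that is not reproduced here.

from tfx import v1 as tfx
import tensorflow_model_analysis as tfma

# Placeholder query, modules, and paths; swap in your own.
example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(
    query='SELECT * FROM `project.dataset.transaction_logs`')
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs['examples'])
schema_gen = tfx.components.SchemaGen(
    statistics=statistics_gen.outputs['statistics'])
transform = tfx.components.Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='preprocessing.py')
trainer = tfx.components.Trainer(
    module_file='trainer.py',
    examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=tfx.proto.TrainArgs(num_steps=100000),
    eval_args=tfx.proto.EvalArgs(num_steps=10000))
evaluator = tfx.components.Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    eval_config=tfma.EvalConfig(  # custom business metrics/thresholds omitted
        model_specs=[tfma.ModelSpec(label_key='label')],
        slicing_specs=[tfma.SlicingSpec()]))
pusher = tfx.components.Pusher(
    model=trainer.outputs['model'],
    model_blessing=evaluator.outputs['blessing'],
    push_destination=tfx.proto.PushDestination(
        filesystem=tfx.proto.PushDestination.Filesystem(
            base_directory='gs://your-bucket/serving_models')))

pipeline = tfx.dsl.Pipeline(
    pipeline_name='traffic_prioritization',
    pipeline_root='gs://your-bucket/pipeline_root',
    components=[example_gen, statistics_gen, schema_gen, transform,
                trainer, evaluator, pusher])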

Overall, out-of-the-box TFX components provided most of the functionality we require. The biggest need we had to address is that our marketplace changes constantly, which requires frequent model updates. As mentioned previously, the design of TFX made those augmentations straightforward to implement.

However, this really only solves the model-training part of our problem. Serving up a million queries per second, each in under 15 milliseconds, is a major challenge. For that we turned to TensorFlow Serving.

Serving Over a Million Queries Per Second (QPS)

TensorFlow Serving enabled us to quickly take our TensorFlow models and serve them in production in a performant and scalable way. Using TensorFlow Serving provided us with a number of benefits. First, because it natively supports Google Cloud Storage as a model warehouse, we can automatically update our models used in serving simply by uploading to a GCS bucket. This allows us to quickly refresh our models with the newest data and have them instantly served in production. Next, TensorFlow Serving supports a batching mode that drastically increases throughput by queuing up several requests and processing them in a single graph run at the same time. This was an essential feature that massively helped us achieve our throughput goals just by setting a single option. Finally, TensorFlow Serving exposes metrics out of the box that allow us to monitor the throughput and latency of our requests and observe any scaling bottlenecks and inefficiencies.
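As a point of reference, querying a TensorFlow Serving endpoint is itself very simple. The snippet below is an illustrative REST client with placeholder host, model name, and features; it is not OpenX's production client, which at these latencies would typically use gRPC.

import json
import requests

# One prediction request against a locally running TensorFlow Serving instance.
payload = {'instances': [{'feature_a': 1.0, 'feature_b': 0.5}]}
response = requests.post(
    'http://localhost:8501/v1/models/prioritization:predict',
    data=json.dumps(payload))
print(response.json()['predictions'])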

All of these out of the box features in TensorFlow Serving were a massive win for us and helped us achieve our goals, but scaling it to millions of requests a second was not without challenges. By using large virtual machines with many CPUs we were able to hit our target goal of 15 millisecond predictions, but it did not scale very cost effectively and we knew we could do better. Luckily, TensorFlow Serving has several knobs and parameters that we used to tune our production VMs for better efficiency and scalability. By setting things like the number of batch threads, inter- and intra-op parallelism, and batch timeout, we were able to efficiently autoscale on custom sized VMs while still maintaining our throughput and latency goals.

The end result was a TensorFlow Serving deployment running on Google Kubernetes Engine serving 2.5 million prediction requests per second, under 15 milliseconds each. This deployment spans over 25 Kubernetes clusters across 10 different GCP regions and is able to scale up and down seamlessly to respond to spikes in traffic and save costs by scaling down during quiet periods. With around 500 TensorFlow Serving instances running around the world at peak times, each 8-CPU deployment is able to handle 5000 requests per second.

Building on Success

In the few months since implementing this we’ve been able to make dozens of improvements to the model – everything from changing the architecture of the original model, to changing the way certain features are processed – without support from any other engineering team. Changes at this pace were all but impossible with our legacy architecture. Moreover, each of these improvements brings new value to our customers – the buyers and sellers in our marketplace – more quickly than we’ve been able to in the past.

Since our initial implementation of this reference architecture, we’ve used it as a template for both new projects and the migration of existing models. It’s quite remarkable how many of the existing TFX components that we have in place carry over to new projects, and even more so how drastically we’ve reduced the time it takes to get a model in production. As a result, data scientists are able to spend more of their time optimizing the parameters and architectures of the models they produce, understanding their impact on the business, and ultimately delivering more value to our customers.

Acknowledgements

None of this would have been possible without the hard work of Michal Brys, Andy Gooden, Junbo Park, and Paul Selden, along with the rest of the OpenX Data Science and Engineering Teams as well as the support of Paul Ryan. We’re also grateful for the support of strategic cloud engineers Will Beebe and Leonid Kuligin, as well as Dillon Do, Iman Kafarah, and Kyle Winn from the GCP account management team. Many thanks to the TensorFlow (TFX, TF Serving), and Kubeflow Teams, particularly Robert Crowe and Anusha Ramesh for helping to bring this case study to life.


Accelerated inference on Arm microcontrollers with TensorFlow Lite for Microcontrollers and CMSIS-NN

A guest post by Fredrik Knutsson of Arm


The MCU universe

Microcontrollers (MCUs) are the tiny computers that power our technological environment. There are over 30 billion of them manufactured every year, embedded in everything from household appliances to fitness trackers. If you’re in a house right now, there are dozens of microcontrollers all around you. If you drive a car, there are dozens riding with you on every drive. Using TensorFlow Lite for Microcontrollers (TFLM), developers can deploy TensorFlow models to many of these devices, enabling entirely new forms of on-device intelligence.

While ubiquitous, microcontrollers are designed to be inexpensive and energy efficient, which means they have small amounts of memory and limited processing power. A typical microcontroller might have a few hundred kilobytes of RAM, and a 32-bit processor running at less than 100 MHz. With advances in machine learning enabled by TFLM, it has become possible to run neural networks on these devices.


With minimal computational resources, it is important that microcontroller programs are optimized to run as efficiently as possible. This means making the most of the features of their microprocessor hardware, which requires carefully tuned application code.

Many of the microcontrollers used in popular products are built around Arm’s Cortex-M based processors, the industry leader in 32-bit microcontrollers with more than 47 billion shipped. Arm’s open source CMSIS-NN library provides optimized implementations of common neural network functions that maximize performance on Cortex-M processors. This includes making use of DSP and M-Profile Vector Extension (MVE) instructions for hardware acceleration of operations such as matrix multiplication.

Benchmarks for key use cases

Arm’s engineers have worked closely with the TensorFlow team to develop optimized versions of the TensorFlow Lite kernels that use CMSIS-NN to deliver blazing fast performance on Arm Cortex-M cores. Developers using TensorFlow Lite can take advantage of these optimized kernels with no additional work, just by using the latest version of the library. Arm has made these optimizations available in open source, and they are free and easy for developers to use today!

The following benchmarks show the performance uplift when using CMSIS-NN optimized kernels versus reference kernels for several key use cases featured in the TFLM example applications. The tests have been performed on an Arm Cortex-M4 based FPGA platform:

Table showing performance uplift when using CMSIS-NN kernels

The Arm Cortex-M4 processor supports DSP extensions that enable it to execute DSP-like instructions for faster inference. To improve inference performance even further, the new Arm Cortex-M55 processor supports MVE, also known as Helium technology.

Improving performance with CMSIS-NN

So far, the following optimized CMSIS-NN kernels have been integrated with TFLM:

 Table showing CMSIS-NN kernels integrated with TFLM

The CMSIS-NN library will receive regular updates to expand its coverage of optimized kernels; the key driver for adding support is whether it gives a significant performance increase for a given use case. For discussion regarding kernel optimizations, a good starting point is to raise a ticket on the TensorFlow or CMSIS GitHub repository describing your use case.

Most of the optimizations are implemented specifically for 8-bit quantized (int8) operations, and this will be the focus of future improvements.

It’s easy to try the optimized kernels yourself by following the instructions that accompany the examples. For example, to build the person detection example for the SparkFun Edge with CMSIS-NN kernels, you can use the following command:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=sparkfun_edge OPTIMIZED_KERNEL_DIR=cmsis_nn person_detection_int8_bin

The latest version of the TensorFlow Lite Arduino library includes the CMSIS-NN optimizations and all of the example applications, which are compatible with the Cortex-M4 based Arduino Nano 33 BLE Sense.

Next leap in neural processing

Looking ahead into 2021, we can expect a dramatic increase in neural processing from the introduction of devices that include a microNPU (Neural Processing Unit) working alongside a microcontroller. These microNPUs are designed to accelerate ML inference within the constraints of embedded and IoT devices, with devices pairing the Arm Cortex-M55 processor with the new Ethos-U55 microNPU delivering up to a 480x performance increase compared to previous microcontrollers.

This unprecedented level of ML processing capability within smaller, power constrained devices will unlock a huge amount of innovation across a range of applications, from smart homes and cities to industrial, retail, and healthcare. The potential for innovation within each of these different areas is huge, with hundreds of sub segments and thousands of potential applications that will make a real difference to people’s lives.


Join us at TensorFlow Everywhere

Posted by Biswajeet Mallik, Program Manager

Developer communities have played a strong part in TensorFlow’s success over the years. Our 70+ user groups, 170+ ML GDEs, and 12 special interest groups all play a critical role in education, advancing the art of machine learning, and supporting ML communities around the world.

To continue this momentum, we’re excited to announce a new event series: TensorFlow Everywhere, a series of global events led by TensorFlow and machine learning communities around the world.

Events planned by our community leads will be specific to each locale (with talks in local languages). These events are for everyone from machine learning and data science beginners to advanced developers; check with your event organizers for details on what will be presented. Be sure to join as members of the TensorFlow team may be dropping in virtually to say hi and answer questions.

And if you enjoyed and benefited from these sessions, you can stay in touch with the organizing communities for future events.

Find an event and join the conversation on social #TFEverywhere2021.


Leveraging TensorFlow-TensorRT integration for Low latency Inference

Posted by Jonathan Dekhtiar (NVIDIA), Bixia Zheng (Google), Shashank Verma (NVIDIA), Chetan Tekur (NVIDIA)

TensorFlow-TensorRT (TF-TRT) is an integration of TensorFlow and TensorRT that leverages inference optimization on NVIDIA GPUs within the TensorFlow ecosystem. It provides a simple API that delivers substantial performance gains on NVIDIA GPUs with minimal effort. The integration lets you take advantage of the optimizations that are possible in TensorRT while providing a fallback to native TensorFlow for segments of the model that TensorRT does not support.

In our previous blog on TF-TRT integration, we covered the workflow for TensorFlow 1.13 and earlier releases. This blog will introduce TensorRT integration in TensorFlow 2.x, and demonstrate a sample workflow with the latest API. Even if you are new to this integration, this blog contains all the information you need to get started. Using the TensorRT integration has been shown to improve performance by 2.4X compared to native TensorFlow inference on Nvidia T4 GPUs.

TF-TRT Integration

When TF-TRT is enabled, in the first step, the trained model is parsed in order to partition the graph into TensorRT-supported subgraphs and unsupported subgraphs. Then each TensorRT-supported subgraph is wrapped in a single special TensorFlow operation (TRTEngineOp). In the second step, for each TRTEngineOp node, an optimized TensorRT engine is built. The TensorRT-unsupported subgraphs remain untouched and are handled by the TensorFlow runtime. This is illustrated in Figure 1.

TF-TRT allows for leveraging TensorFlow’s flexibility while also taking advantage of the optimizations that can be applied to the TensorRT supported subgraphs. Only portions of the graph are optimized and executed with TensorRT, and TensorFlow executes the remaining graph.

In the inference example shown in Figure 1, TensorFlow executes the Reshape op and the Cast op. Then TensorFlow passes the execution of TRTEngineOp_0, the pre-built TensorRT engine, to the TensorRT runtime.

Figure 1: An example of graph partitioning and building TRT engine in TF-TRT

Workflow

In this section, we will take a look at the typical TF-TRT workflow using an example.

Figure 2: Workflow diagram when performing inference in TensorFlow only, and in TensorFlow-TensorRT using a converted SavedModel

Figure 2 shows a standard inference workflow in native TensorFlow and contrasts it with the TF-TRT workflow. The SavedModel format contains all the information required to share or deploy a trained model. In native TensorFlow, the workflow typically involves loading the saved model and running inference using TensorFlow runtime. In TF-TRT, there are a few additional steps involved, including applying TensorRT optimizations to the TensorRT supported subgraphs of the model, and optionally pre-building the TensorRT engines.

First, we create an object to hold the conversion parameters, including a precision mode. The precision mode is used to indicate the minimum precision (for example FP32, FP16 or INT8) that TF-TRT can use to implement the TensorFlow operations. Then we create a converter object which takes the conversion parameters and input from a saved model. Note that in TensorFlow 2.x, TF-TRT only supports models saved in the TensorFlow SavedModel format.

Next, when we call the converter’s convert() method, TF-TRT converts the graph by replacing TensorRT-compatible portions of the graph with TRTEngineOps. For better performance at runtime, the converter’s build() method can be used to create the TensorRT execution engines ahead of time. The build() method requires the input data shapes to be known before the optimized TensorRT execution engines are built. If the input data shapes are not known, the TensorRT execution engine can instead be built at runtime, when the input data becomes available. The TensorRT execution engine should be built on a GPU of the same device type as the one on which inference will be executed, as the building process is GPU specific. For example, an execution engine built for an Nvidia A100 GPU will not work on an Nvidia T4 GPU.

Finally, the TF-TRT converted model can be saved to disk by calling the save method. The code corresponding to the workflow steps mentioned in this section are shown in the codeblock below:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Conversion Parameters
conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.<FP32 or FP16>)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)

# Converter method used to partition and optimize TensorRT compatible segments
converter.convert()

# Optionally, build TensorRT engines before deployment to save time at runtime
# Note that this is GPU specific, and as a rule of thumb, we recommend building at runtime
converter.build(input_fn=my_input_fn)

# Save the model to the disk
converter.save(output_saved_model_dir)

As can be seen from the code example above, the build() method requires an input function corresponding to the shape of the input data. An example of an input function is shown below:

# input_fn: a generator function that yields input data as a list or tuple,
# which will be used to execute the converted signature to generate TensorRT
# engines. Example:
import numpy as np

def my_input_fn():
    # Let's assume a network with 2 input tensors. We generate 3 sets
    # of dummy input data:
    input_shapes = [[(1, 16), (2, 16)],  # min and max range for 1st input list
                    [(2, 32), (4, 32)],  # min and max range for 2nd list of two tensors
                    [(4, 32), (8, 32)]]  # 3rd input list
    for shapes in input_shapes:
        # return a list of input tensors
        yield [np.zeros(x).astype(np.float32) for x in shapes]

Support for INT8

Compared to FP32 and FP16, INT8 requires additional calibration data to determine the best quantization thresholds. When the precision mode in the conversion parameters is INT8, we need to provide an input function to the convert() method call. This input function is similar to the input function provided to the build() method, and it should generate calibration data that is statistically similar to the actual data seen during inference.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

conversion_params = trt.TrtConversionParams(
    precision_mode=trt.TrtPrecisionMode.INT8)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=conversion_params)

# requires some data for calibration
converter.convert(calibration_input_fn=my_input_fn)

# Optionally build TensorRT engines before deployment.
# Note that this is GPU specific, and as a rule of thumb we recommend building at runtime
converter.build(input_fn=my_input_fn)

converter.save(output_saved_model_dir)

Example: ResNet-50

The rest of this blog will show the workflow of taking a TensorFlow 2.x ResNet-50 model, training it, saving it, optimizing it with TF-TRT and finally deploying it for inference. We will also compare inference throughputs using TensorFlow native vs TF-TRT in three precision modes, FP32, FP16, and INT8.

Prerequisites for the example:


Training ResNet-50 using the TensorFlow 2.x container:

First, the latest release of the ResNet-50 model needs to be downloaded from the TensorFlow github repository:

# Clone the TensorFlow models repository (shallow clone)
$ git clone --depth 1 https://github.com/tensorflow/models.git .

# List the files and directories present in our working directory
$ ls -al

rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 ./
rwxrwxr-x user user 4 KiB Wed Sep 30 15:30:45 2020 ../
rw-rw-r-- user user 337 B Wed Sep 30 15:31:05 2020 AUTHORS
rw-rw-r-- user user 1015 B Wed Sep 30 15:31:05 2020 CODEOWNERS
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 community/
rw-rw-r-- user user 390 B Wed Sep 30 15:31:05 2020 CONTRIBUTING.md
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:15 2020 .git/
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 .github/
rw-rw-r-- user user 1 KiB Wed Sep 30 15:31:05 2020 .gitignore
rw-rw-r-- user user 1 KiB Wed Sep 30 15:31:05 2020 ISSUES.md
rw-rw-r-- user user 11 KiB Wed Sep 30 15:31:05 2020 LICENSE
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 official/
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:05 2020 orbit/
rw-rw-r-- user user 3 KiB Wed Sep 30 15:31:05 2020 README.md
rwxrwxr-x user user 4 KiB Wed Sep 30 15:31:06 2020 research/

As noted in the earlier section, for this example we will be using the latest TensorFlow container available in the Docker repository. The user does not need any additional installation steps as TensorRT integration is already included in the container. The steps to pull the container and launch it are as follows:

$ docker pull tensorflow/tensorflow:latest-gpu

# Please ensure that the Nvidia Container Toolkit is installed before running the following command.
# The /data/ mount is the path that will hold the training data.
$ docker run -it --rm \
    --gpus="all" \
    --shm-size=2g --ulimit memlock=-1 --ulimit stack=67108864 \
    --workdir /workspace/ \
    -v "$(pwd):/workspace/" \
    -v "</path/to/save/data/>:/data/" \
    tensorflow/tensorflow:latest-gpu

From inside the container, we can then verify that we have access to the relevant files and the Nvidia GPU we would like to target:

# Let's first test that we can access the ResNet-50 code that we previously downloaded
$ ls -al
drwxrwxr-x 8 1000 1000 4096 Sep 30 22:31 .git
drwxrwxr-x 3 1000 1000 4096 Sep 30 22:31 .github
-rw-rw-r-- 1 1000 1000 1104 Sep 30 22:31 .gitignore
-rw-rw-r-- 1 1000 1000 337 Sep 30 22:31 AUTHORS
-rw-rw-r-- 1 1000 1000 1015 Sep 30 22:31 CODEOWNERS
-rw-rw-r-- 1 1000 1000 390 Sep 30 22:31 CONTRIBUTING.md
-rw-rw-r-- 1 1000 1000 1115 Sep 30 22:31 ISSUES.md
-rw-rw-r-- 1 1000 1000 11405 Sep 30 22:31 LICENSE
-rw-rw-r-- 1 1000 1000 3668 Sep 30 22:31 README.md
drwxrwxr-x 2 1000 1000 4096 Sep 30 22:31 community
drwxrwxr-x 12 1000 1000 4096 Sep 30 22:31 official
drwxrwxr-x 3 1000 1000 4096 Sep 30 22:31 orbit
drwxrwxr-x 23 1000 1000 4096 Sep 30 22:31 research

# Let's verify we can see our GPUs:
$ nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.XX.XX Driver Version: 450.XX.XX CUDA Version: 11.X |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:1A:00.0 Off | Off |
| 38% 52C P8 14W / 70W | 1MiB / 16127MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

We can now start training ResNet-50. To avoid spending hours training a deep learning model, this article will use the smaller MNIST dataset. However, the workflow will not change with a more state-of-the-art dataset like ImageNet.

# Install dependencies
$ pip install tensorflow_datasets tensorflow_model_optimization

# Download MNIST data and Train
$ python -m "official.vision.image_classification.mnist_main" \
    --model_dir=./checkpoints \
    --data_dir=/data \
    --train_epochs=10 \
    --distribution_strategy=one_device \
    --num_gpus=1 \
    --download

# Let’s verify that we have the trained model saved on our machine.
$ ls -al checkpoints/

-rw-r--r-- 1 root root 87 Sep 30 22:34 checkpoint
-rw-r--r-- 1 root root 6574829 Sep 30 22:34 model.ckpt-0001.data-00000-of-00001
-rw-r--r-- 1 root root 819 Sep 30 22:34 model.ckpt-0001.index
[...]
-rw-r--r-- 1 root root 6574829 Sep 30 22:34 model.ckpt-0010.data-00000-of-00001
-rw-r--r-- 1 root root 819 Sep 30 22:34 model.ckpt-0010.index
drwxr-xr-x 4 root root 4096 Sep 30 22:34 saved_model
drwxr-xr-x 3 root root 4096 Sep 30 22:34 train
drwxr-xr-x 2 root root 4096 Sep 30 22:34 validation


Obtaining a SavedModel to be used by TF-TRT

After training, Google’s ResNet-50 code exports the model in the SavedModel format at the following path: checkpoints/saved_model/.

The following sample code can be used as a reference in order to export your own trained model as a TensorFlow SavedModel.

import numpy as np

import tensorflow as tf
from tensorflow import keras


def get_model():
    # Create a simple model.
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


model = get_model()

# Train the model.
test_input = np.random.random((128, 32))
test_target = np.random.random((128, 1))
model.fit(test_input, test_target)

# Calling `save('my_model')` creates a SavedModel folder `my_model`.
model.save("my_model")

We can verify that the SavedModel generated by Google’s ResNet-50 script is readable and correct:

$ ls -al checkpoints/saved_model

drwxr-xr-x 2 root root 4096 Sep 30 22:49 assets
-rw-r--r-- 1 root root 118217 Sep 30 22:49 saved_model.pb
drwxr-xr-x 2 root root 4096 Sep 30 22:49 variables

$ saved_model_cli show --dir checkpoints/saved_model/ --tag_set serve --signature_def serving_default

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

The given SavedModel SignatureDef contains the following input(s):
inputs['input_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 28, 28, 1)
name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 10)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

Now that we have verified that our SavedModel has been properly saved, we can proceed with loading it with TF-TRT for inference.


Inference

ResNet-50 Inference using TF-TRT

In this section, we will go over the steps for deploying the saved ResNet-50 model on the NVIDIA GPU using TF-TRT. As previously described, we first convert a SavedModel into a TF-TRT model using the convert method and then load the model.

# Convert the SavedModel
converter = trt.TrtGraphConverterV2(input_saved_model_dir=path)
converter.convert()

# Save the converted model
converter.save(converted_model_path)

# Load converted model and infer
model = tf.saved_model.load(converted_model_path)
func = model.signatures['serving_default']
output = func(input_tensor)

For simplicity, we will use a script to perform inference (tf2_inference.py). We will download the script from github.com and put it in the working directory “/workspace/” of the same docker container as before. After this, we can execute the script:

$ wget https://raw.githubusercontent.com/tensorflow/tensorrt/master/tftrt/blog_posts/Leveraging%20TensorFlow-TensorRT%20integration%20for%20Low%20latency%20Inference/tf2_inference.py

$ ls
AUTHORS CONTRIBUTING.md LICENSE checkpoints data orbit tf2_inference.py
CODEOWNERS ISSUES.md README.md community official research

$ python tf2_inference.py --use_tftrt_model --precision fp16

=========================================
Inference using: TF-TRT …
Batch size: 512
Precision: fp16
=========================================

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TrtConversionParams(rewriter_config_template=None, max_workspace_size_bytes=8589934592, precision_mode='FP16', minimum_segment_size=3, is_dynamic_op=True, maximum_cached_engines=100, use_calibration=True, max_batch_size=512, allow_build_at_runtime=True)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


Processing step: 0100 ...
Processing step: 0200 ...
[...]
Processing step: 9900 ...
Processing step: 10000 ...

Average step time: 2.1 msec
Average throughput: 244248 samples/sec

Similarly, we can run inference for INT8 and FP32:

$ python tf2_inference.py --use_tftrt_model --precision int8

$ python tf2_inference.py --use_tftrt_model --precision fp32

Inference using native TensorFlow (GPU) FP32

You can also run the unmodified SavedModel without any TF-TRT acceleration.

$ python tf2_inference.py --use_native_tensorflow

=========================================
Inference using: Native TensorFlow …
Batch size: 512
=========================================

Processing step: 0100 ...
Processing step: 0200 ...
[...]
Processing step: 9900 ...
Processing step: 10000 ...

Average step time: 4.1 msec
Average throughput: 126328 samples/sec

This run was executed with an NVIDIA T4 GPU. The same workflow will work on any NVIDIA GPU.


Comparing Native Tensorflow 2.x performance vs TF-TRT for Inference

Making minimal code changes to take advantage of TF-TRT can result in a significant performance boost. For example, using the inference script in this blog, with a batch-size of 512 on an NVIDIA T4 GPU, we observe almost 2x speedup with TF-TRT FP16, and a 2.4x speedup with TF-TRT INT8 over native TensorFlow. The amount of speedup obtained may differ depending on various factors like the model used, the batch size, the size and format of images in the dataset, and any CPU bottlenecks.

In conclusion, in this blog we show the acceleration provided by TF-TRT. Additionally, with TF-TRT we can use the full TensorFlow Python API and interactive environments like Jupyter Notebooks or Google Colab.

Supported Operators

The TF-TRT user guide lists operators that are supported in TensorRT-compatible subgraphs. Operators outside this list will be executed by the native TensorFlow runtime.
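One way to check how much of a converted model TensorRT will actually execute is to count the TRTEngineOp nodes in the converted graph. The snippet below is an illustrative sketch, reusing the converted_model_path from the example above:

import tensorflow as tf

loaded = tf.saved_model.load(converted_model_path)
graph_def = loaded.signatures['serving_default'].graph.as_graph_def()

# TRTEngineOps can appear at the top level or inside library functions.
nodes = list(graph_def.node)
for func in graph_def.library.function:
    nodes.extend(func.node_def)
print('TRTEngineOp nodes found:',
      sum(1 for n in nodes if n.op == 'TRTEngineOp'))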

We encourage you to try it yourself and if you encounter problems, please open an issue here.


Custom object detection in the browser using TensorFlow.js

A guest post by Hugo Zanini, Machine Learning Engineer

Object detection is the task of determining where objects are located in an image and classifying every object of interest. In computer vision, this technique is used in applications such as picture retrieval, security cameras, and autonomous vehicles.

One of the most famous families of Deep Convolutional Neural Networks (DNNs) for object detection is YOLO (You Only Look Once).

In this post, we are going to develop an end-to-end solution using TensorFlow to train a custom object-detection model in Python, then put it into production, and run real-time inferences in the browser through TensorFlow.js.

This post is going to be divided into four steps, as follows:

Object detection pipeline

Prepare the data

The first step to train a great model is to have good quality data. When developing this project, I did not find a suitable (and small enough) object detection dataset, so I decided to create my own.

I looked around and saw a Kangaroo sign that I have in my bedroom — a souvenir that I bought to remember my Aussie days. So I decided to build a Kangaroo detector.

To build my dataset, I downloaded 350 kangaroo images from an image search for kangaroos and labeled all of them by hand using the LabelImg application. As we can have more than one animal per image, the process resulted in 520 labeled kangaroos.

Labelling example

In that case, I chose just one class, but the software can be used to annotate multiple classes as well. It’s going to generate an XML file per image (Pascal VOC format) that contains all annotations and bounding boxes.

<annotation>
  <folder>images</folder>
  <filename>kangaroo-0.jpg</filename>
  <path>/home/hugo/Documents/projects/tfjs/dataset/images/kangaroo-0.jpg</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>3872</width>
    <height>2592</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>kangaroo</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>60</xmin>
      <ymin>367</ymin>
      <xmax>2872</xmax>
      <ymax>2399</ymax>
    </bndbox>
  </object>
</annotation>

XML Annotation example

To facilitate the conversion to the TFRecord format (below), I then converted the XML files generated above into two CSV files containing the data already split into train and test sets (80%-20%). These files have 9 columns (a conversion sketch follows the list):

  • filename: Image name
  • width: Image width
  • height: Image height
  • class: Image class (kangaroo)
  • xmin: Minimum bounding box x coordinate value
  • ymin: Minimum bounding box y coordinate value
  • xmax: Maximum value of the x coordinate of the bounding box
  • ymax: Maximum value of the y coordinate of the bounding box
  • source: Image source
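A minimal sketch of such a conversion is shown below. This is not the exact script I used; the annotation folder, output file names, and source value are placeholders:

import glob
import xml.etree.ElementTree as ET

import pandas as pd

rows = []
for xml_path in glob.glob('dataset/annotations/*.xml'):
    root = ET.parse(xml_path).getroot()
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append({
            'filename': root.find('filename').text,
            'width': int(root.find('size/width').text),
            'height': int(root.find('size/height').text),
            'class': obj.find('name').text,
            'xmin': int(box.find('xmin').text),
            'ymin': int(box.find('ymin').text),
            'xmax': int(box.find('xmax').text),
            'ymax': int(box.find('ymax').text),
            'source': 'kangaroo-dataset',  # placeholder source value
        })

# Shuffle and split 80%-20% into train and test CSVs.
df = pd.DataFrame(rows).sample(frac=1, random_state=42)
split = int(0.8 * len(df))
df.iloc[:split].to_csv('train_labels.csv', index=False)
df.iloc[split:].to_csv('test_labels.csv', index=False)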

Using LabelImg makes it easy to create your own dataset, but feel free to use my kangaroo dataset; I’ve uploaded it on Kaggle:

Kangaroo Dataset

Training the model

With a good dataset, it’s time to think about the model. TensorFlow 2 provides an Object Detection API that makes it easy to construct, train, and deploy object detection models. In this project, we’re going to use this API and train the model using a Google Colaboratory Notebook. The remainder of this section explains how to set up the environment, the model selection, and training. If you want to jump straight to the Colab Notebook, click here.

Setting up the environment

Create a new Google Colab notebook and select a GPU as hardware accelerator:

 Runtime > Change runtime type > Hardware accelerator: GPU 

Clone, install, and test the TensorFlow Object Detection API:

Getting and processing the data

As mentioned before, the model is going to be trained using the Kangaroo dataset on Kaggle. If you want to use it as well, it’s necessary to create a user, go into the account section of Kaggle, and get an API Token:

Getting an API Token

Then, you’re ready to download the data:

Now, it’s necessary to create a labelmap file to define the classes that are going to be used. Kangaroo is the only one, so right-click in the File section on Google Colab and create a New file named labelmap.pbtxt as follows:

item {
  name: "kangaroo"
  id: 1
}

The last step is to convert the data into a sequence of binary records so that they can be fed into Tensorflow’s object detection API. To do so, transform the data into the TFRecord format using the generate_tf_records.py script available in the Kangaroo Dataset:

Choosing the model

We’re ready to choose the model that’s going to be the Kangaroo Detector. TensorFlow 2 provides 40 pre-trained detection models on the COCO 2017 Dataset. This collection is the TensorFlow 2 Detection Model Zoo and can be accessed here.

Every model is listed with its speed, mean average precision (mAP), and output format. Generally, a higher mAP implies a lower speed, but as this project is based on a one-class object detection problem, the fastest model (SSD MobileNet v2 320×320) should be enough.

Besides the Model Zoo, TensorFlow provides a Models Configs Repository as well. There, it’s possible to get the configuration file that has to be modified before the training. Let’s download the files:
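A sketch of those downloads, assuming the URL pattern used by the TF2 Detection Model Zoo and the configs directory at the time of writing:

 # Download and extract the pre-trained SSD MobileNet v2 320x320 checkpoint
 !wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
 !tar -xf ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz

 # Copy the matching base configuration file so it can be edited
 !cp /content/models/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config /content/mobilenet_v2.config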

Configure training

As mentioned before, the downloaded weights were pre-trained on the COCO 2017 dataset, but the focus here is to train the model to recognize one class, so these weights are going to be used only to initialize the network. This technique is known as transfer learning, and it’s commonly used to speed up the learning process.

From here, what has to be done is to set up the mobilenet_v2.config file and start the training. I highly recommend reading the MobileNetV2 paper (Sandler, Mark, et al. – 2018) to get the gist of the architecture.

Choosing the best hyperparameters is a task that requires some experimentation. As resources are limited in Google Colab, I am going to use the same batch size as the paper, set the number of steps to reach a reasonably low loss, and leave all the other values as defaults. If you want to try something more sophisticated to find the hyperparameters, I recommend Keras Tuner, an easy-to-use framework that applies Bayesian Optimization, Hyperband, and Random Search algorithms.

With the parameters set, start the training:
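A sketch of the training cell, using the model_main_tf2.py entry point from the Object Detection API (paths follow the Colab layout used above):

 !python /content/models/research/object_detection/model_main_tf2.py \
     --pipeline_config_path=/content/mobilenet_v2.config \
     --model_dir=/content/training \
     --alsologtostderr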

To identify how well the training is going, we use the loss value. Loss is a number indicating how bad the model’s prediction was on the training samples. If the model’s prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples (Descending into ML: Training and Loss | Machine Learning Crash Course).

From the logs, it’s possible to see a downward trend in the values, so we can say that “the model is converging”. In the next section, we’re going to plot these values for all training steps and the trend will be even clearer.

The model took around 4 hours to train (with a Colab GPU), but by setting different parameters you can make the process faster or slower. Everything depends on the number of classes you are using and your precision/recall target. A highly accurate network that recognizes multiple classes will take more steps and require more careful parameter tuning.

Validate the model

Now let’s evaluate the trained model using the test data:
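Evaluation uses the same entry point; passing --checkpoint_dir switches model_main_tf2.py into evaluation mode (paths as above):

 !python /content/models/research/object_detection/model_main_tf2.py \
     --pipeline_config_path=/content/mobilenet_v2.config \
     --model_dir=/content/training \
     --checkpoint_dir=/content/training \
     --alsologtostderr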

The evaluation was done on 89 images and provides three metrics based on the COCO detection evaluation metrics: precision, recall, and loss.

Recall measures how good the model is at hitting the positive class. That is, of the positive samples, how many did the algorithm get right?

Recall

Precision defines how much you can rely on the positive class predictions: of the samples the model classified as positive, how many actually are?

Precision

To set a practical example: imagine we have an image containing 10 kangaroos and our model returns 5 detections, with 3 real kangaroos (TP = 3, FN = 7) and 2 wrong detections (FP = 2). In that case, we have 30% recall (the model detected 3 of the 10 kangaroos in the image) and 60% precision (3 of the 5 detections were correct).

The precision and recall were evaluated at several Intersection over Union (IoU) thresholds. The IoU is defined as the area of the intersection divided by the area of the union of a predicted bounding box and a ground-truth box (Zeng, N. – 2018):

Intersection over Union
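To make the definition concrete, here is a minimal sketch of the IoU computation for two boxes given as (xmin, ymin, xmax, ymax) pixel coordinates:

 def iou(box_a, box_b):
     # Intersection rectangle
     ix_min = max(box_a[0], box_b[0])
     iy_min = max(box_a[1], box_b[1])
     ix_max = min(box_a[2], box_b[2])
     iy_max = min(box_a[3], box_b[3])
     intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
     # Union is the sum of both areas minus the intersection
     area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
     area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
     return intersection / float(area_a + area_b - intersection)

 # A detection that almost matches the ground truth gives an IoU close to 1
 print(iou((60, 367, 2872, 2399), (100, 400, 2800, 2300)))  # ~0.9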

For simplicity, it’s possible to consider that the IoU thresholds are used to determine whether a detection is a true positive (TP), a false positive (FP), or a false negative (FN). See an example below:

IoU threshold examples

With these concepts in mind, we can analyze some of the metrics we got from the evaluation. From the TensorFlow 2 Detection Model Zoo, the SSD MobileNet v2 320×320 has an mAP of 0.202. Our model presented the following average precisions (AP) for different IoUs:

 
AP@[IoU=0.50:0.95 | area=all | maxDets=100] = 0.222
AP@[IoU=0.50 | area=all | maxDets=100] = 0.405
AP@[IoU=0.75 | area=all | maxDets=100] = 0.221

That’s pretty good! And we can compare the obtained APs with the SSD MobileNet v2 320×320 mAP, keeping in mind this note from the COCO dataset documentation:

We make no distinction between AP and mAP (and likewise AR and mAR) and assume the difference is clear from context.

The Average Recall (AR) was split by the maximum number of detections per image (1, 10, 100). When we allow just one detection per image, the recall is around 30%, while allowing up to 100 detections pushes it to around 51%. These values are not great, but they are reasonable for the kind of problem we’re trying to solve.

 
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 1] = 0.293
(AR)@[ IoU=0.50:0.95 | area=all | maxDets= 10] = 0.414
(AR)@[ IoU=0.50:0.95 | area=all | maxDets=100] = 0.514

The loss analysis is very straightforward; we’ve got 4 values:

 
INFO:tensorflow: + Loss/localization_loss: 0.345804
INFO:tensorflow: + Loss/classification_loss: 1.496982
INFO:tensorflow: + Loss/regularization_loss: 0.130125
INFO:tensorflow: + Loss/total_loss: 1.972911

The localization loss computes the difference between the predicted bounding boxes and the labeled ones. The classification loss measures how well the predicted class of each bounding box matches the labeled class. The regularization loss is generated by the network’s regularization function and helps to drive the optimization algorithm in the right direction. The last term is the total loss, the sum of the three previous ones.

TensorFlow provides a tool to visualize all these metrics in an easy way. It’s called TensorBoard and can be initialized with the following command:

 
%load_ext tensorboard
%tensorboard --logdir '/content/training/'

TensorBoard will be displayed inline, and you can explore all training and evaluation metrics.

Tensorboard — Loss

In the IMAGES tab, it’s possible to find side-by-side comparisons between the predictions and the ground truth, a very interesting resource to explore during the validation process.

Tensorboard — Testing images

Exporting the model

Now that the training is validated, it’s time to export the model. We’re going to convert the training checkpoints to a protobuf (pb) file. This file is going to have the graph definition and the weights of the model.
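Exporting is done with the exporter_main_v2.py script from the Object Detection API (paths follow the Colab layout used during training):

 !python /content/models/research/object_detection/exporter_main_v2.py \
     --input_type=image_tensor \
     --pipeline_config_path=/content/mobilenet_v2.config \
     --trained_checkpoint_dir=/content/training \
     --output_directory=/content/inference-graph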

As we’re going to deploy the model using TensorFlow.js and Google Colab has a maximum session lifetime of 12 hours, let’s download the trained weights and save them locally. When running the command files.download('/content/saved_model.zip'), Colab will prompt you to download the file automatically.
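A minimal sketch of that download step (the zip path is illustrative):

 # Zip the exported SavedModel and download it from Colab
 !zip -r /content/saved_model.zip /content/inference-graph/saved_model
 from google.colab import files
 files.download('/content/saved_model.zip')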

If you want to check whether the model was saved properly, load and test it. I’ve created some functions to make this process easier, so feel free to clone the inferenceutils.py file from my GitHub to test some images.

Everything is working well, so we’re ready to put the model in production.

Deploying the model

The model is going to be deployed in a way that anyone can open a PC or mobile camera and perform inference in real time through a web browser. To do that, we’re going to convert the saved model to the TensorFlow.js format, load the model in a JavaScript application, and make everything available on Glitch.

Converting the model

At this point, you should have something similar to this structure saved locally:

 
├── inference-graph
│   └── saved_model
│       ├── assets
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index

Before we start, let’s create an isolated Python environment to work in an empty workspace and avoid any library conflicts. Install virtualenv, then open a terminal in the inference-graph folder and create and activate a new virtual environment:

 
virtualenv -p python3 venv
source venv/bin/activate

Install the TensorFlow.js converter:

  pip install tensorflowjs[wizard]

Start the conversion wizard:

  tensorflowjs_wizard

Now the tool will guide you through the conversion, providing explanations for each choice you need to make. The image below shows all the choices that were made to convert the model. Most of them are the defaults, but options like the shard size and compression can be changed according to your needs.

To enable the browser to cache the weights automatically, it’s recommended to split them into shard files of around 4 MB. Also, to guarantee that the conversion is going to work, don’t skip the op validation; not all TensorFlow operations are supported, so some models can be incompatible with TensorFlow.js. See this list for which ops are currently supported.

Model conversion using the TensorFlow.js converter (full resolution image here)

If everything worked well, you’re going to have the model converted to the TensorFlow.js format in the web_model directory. The folder contains a model.json file and a set of sharded weight files in a binary format. The model.json has both the model topology (aka “architecture” or “graph”: a description of the layers and how they are connected) and a manifest of the weight files (Lin, Tsung-Yi, et al).

  
└── web_model
    ├── group1-shard1of5.bin
    ├── group1-shard2of5.bin
    ├── group1-shard3of5.bin
    ├── group1-shard4of5.bin
    ├── group1-shard5of5.bin
    └── model.json

Configuring the application

The model is ready to be loaded in JavaScript. I’ve created an application to perform inference directly from the browser. Let’s clone the repository to figure out how to use the converted model in real time. This is the project structure:

 
├── models
│   └── kangaroo-detector
│       ├── group1-shard1of5.bin
│       ├── group1-shard2of5.bin
│       ├── group1-shard3of5.bin
│       ├── group1-shard4of5.bin
│       ├── group1-shard5of5.bin
│       └── model.json
├── package.json
├── package-lock.json
├── public
│   └── index.html
├── README.MD
└── src
    ├── index.js
    └── styles.css

For the sake of simplicity, I already provide a converted kangaroo-detector model in the models folder, but you can also put the web_model generated in the previous section in the models folder and test it.

The first thing to do is define how the model is going to be loaded in the load_model function (lines 10–15 in the file src > index.js). There are two options.

The first option is to create a local HTTP server that makes the model available at a URL, where it can be requested like a REST API. When loading the model, TensorFlow.js will make the following requests:

 
GET /model.json
GET /group1-shard1of5.bin
GET /group1-shard2of5.bin
GET /group1-shard3of5.bin
GET /group1-shard4of5.bin
GET /group1-shard5of5.bin

If you choose this option, define the load_model function as follows:

  async function load_model() {
      // It's possible to load the model locally or from a repo.
      // You can choose whatever IP and port you want in "http://127.0.0.1:8080/model.json";
      // just make sure it matches the HTTP server you start below.
      const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
      // const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
      return model;
  }

Then install the http-server:

  npm install http-server -g

Go to models > web_model and run the command below to make the model available at http://127.0.0.1:8080. This is a good choice when you want to keep the model weights in a safe place and control who can request inferences from it. The -c1 parameter disables caching, and the --cors flag enables cross-origin resource sharing, allowing the hosted files to be used by client-side JavaScript from a given domain.

  http-server -c1 --cors .

Alternatively, you can upload the model files somewhere else; in my case, I chose my own GitHub repo and referenced the model.json URL in the load_model function:

 

async function load_model() {
    // It's possible to load the model locally or from a repo
    // const model = await loadGraphModel("http://127.0.0.1:8080/model.json");
    const model = await loadGraphModel("https://raw.githubusercontent.com/hugozanini/TFJS-object-detection/master/models/web_model/model.json");
    return model;
}

This is a good option because it gives more flexibility to the application and makes it easier to run on platforms such as Glitch.

Running locally

To run the app locally, install the required packages:

 npm install

And start it:

 npm start

The application is going to run at http://localhost:3000 and you should see something similar to this:

Application running locally

The model takes 1 to 2 seconds to load and, after that, you can show kangaroo images to the camera and the application will draw bounding boxes around them.

Publishing in Glitch

Glitch is a simple tool for creating web apps where we can upload the code and make the application available for everyone on the web. Since the model files are uploaded to a GitHub repo and referenced in the load_model function, we can simply log into Glitch, click on New project > Import from GitHub, and select the app repository.

Wait a few minutes for the packages to install and your app will be available at a public URL. Click on Show > In a new window and a tab will open. Copy this URL and paste it into any web browser (PC or mobile), and your object detector will be ready to run. See some examples in the video below:

Running the model on different devices

First, I did a test showing a kangaroo sign to verify the robustness of the application. It showed that the model focuses specifically on kangaroo features and did not specialize in irrelevant characteristics that were present in many training images, such as pale colors or shrubs.

Then, I opened the app on my mobile phone and showed some images from the test set. The model runs smoothly and identifies most of the kangaroos. If you want to test my live app, it is available here (Glitch takes a few minutes to wake up).

Besides the accuracy, an interesting part of these experiments is the inference time: everything runs in real time in the browser via JavaScript. Object detection models that run in the browser with few computational resources are a must in many applications, especially in industry. Putting the machine learning model on the client side also means lower costs and safer applications, as user privacy is preserved: there is no need to send the information to any server to perform the inference.

Next steps

Object detection in the browser can solve a lot of real-world problems, and I hope this article serves as a basis for new projects involving Computer Vision, Python, TensorFlow, and JavaScript.

As next steps, I’d like to run more detailed training experiments. Due to the lack of resources, I could not try many different parameters, and I’m sure there is a lot of room for improvement in the model.

I’m more focused on model training, but I’d like to see a better user interface for the app. If someone is interested in contributing to the project, feel free to create a pull request in the project repo. It would be nice to make the application more user-friendly.

If you have any questions or suggestions, you can reach out to me on LinkedIn. Thanks for reading!


ML Metadata: Version Control for ML

Posted by Ben Mathes and Neoklis Polyzotis, on behalf of the TFX Team

When you write code, you need version control to keep track of it. What’s the ML equivalent of version control? If you’re building production ML systems, you need to be able to answer questions like these:

  • Which dataset was this model trained on?
  • What hyperparameters were used?
  • Which pipeline was used to create this model?
  • Which versions of TensorFlow (and other libraries) were used to create this model?
  • What caused this model to fail?
  • What version of this model was last deployed?

Engineers at Google have learned, through years of hard-won experience, that this history and lineage of ML artifacts is far more complicated than a simple, linear log. You use Git (or similar) to track your code; you need something analogous to track your models, datasets, and more. Git, for example, may simplify your life a lot, but under the hood it is still a graph of many things! The complexity of ML code and artifacts like models, datasets, and much more requires a similar approach.

That’s why we built Machine Learning Metadata (MLMD). It’s a library to track the full lineage of your entire ML workflow: all the steps from data ingestion, data preprocessing, validation, training, and evaluation through deployment and beyond. MLMD is a standalone library, and it also comes integrated with TensorFlow Extended. There’s also a demo notebook to see how you can integrate MLMD into your ML infrastructure today.
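For a flavor of the API, here is a minimal sketch based on the MLMD getting-started guide that records a dataset artifact in a local SQLite-backed store (the type, property, and path names are illustrative):

 from ml_metadata.metadata_store import metadata_store
 from ml_metadata.proto import metadata_store_pb2

 # Use a local SQLite file as the metadata backend
 config = metadata_store_pb2.ConnectionConfig()
 config.sqlite.filename_uri = 'mlmd.sqlite'
 config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
 store = metadata_store.MetadataStore(config)

 # Register an artifact type once...
 dataset_type = metadata_store_pb2.ArtifactType()
 dataset_type.name = 'DataSet'
 dataset_type.properties['split'] = metadata_store_pb2.STRING
 dataset_type_id = store.put_artifact_type(dataset_type)

 # ...then record artifacts of that type as your pipeline runs
 dataset = metadata_store_pb2.Artifact()
 dataset.type_id = dataset_type_id
 dataset.uri = 'path/to/train_data'
 dataset.properties['split'].string_value = 'train'
 [dataset_id] = store.put_artifacts([dataset])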

ML Metadata icon
Beyond versioning your model, ML Metadata captures the full lineage of the training process, including the dataset, hyperparameters, and software dependencies.

Here’s how MLMD can help you:

  • If you’re an ML engineer: You can use MLMD to trace bad models back to their dataset, or trace from a bad dataset to the models trained on it, and so on.
  • If you’re working in ML infrastructure: You can use MLMD to record the current state of your pipeline and enable event-based orchestration. You can also enable optimizations like skipping a step if its inputs and code are unchanged, memoizing steps in your pipelines. You can integrate MLMD into your training system so that it automatically creates logs for querying later. We’ve found that this auto-logging of the full lineage as a side effect of training is the best way to use MLMD: you get the full history without extra effort.

MLMD is more than a TFX research project. It’s a key foundation of multiple internal MLOps solutions at Google. Furthermore, Google Cloud integrates tools like MLMD into its core MLOps platform:

The foundation of all these new services is our new ML Metadata Management service in AI Platform. This service lets AI teams track all the important artifacts and experiments they run, providing a curated ledger of actions and detailed model lineage. This will enable customers to determine model provenance for any model trained on AI Platform for debugging, audit, or collaboration. AI Platform Pipelines will automatically track artifacts and lineage and AI teams can also use the ML Metadata service directly for custom workloads, artifact and metadata tracking.

Want to know where your models come from? What training data was used? Did anyone else train a model on this dataset already, and was their performance better? Are there any tainted datasets we need to clean up after?

If you want to answer these questions for your users, check out MLMD on GitHub, as part of TensorFlow Extended, or in our demo notebook.


Join the TensorFlow Special Interest Groups (SIGs)

Posted by Joana Carrasqueira, TensorFlow Program Manager and Thea Lamkin, Open Source Program Manager, in collaboration with TensorFlow SIG Leads.

TensorFlow SIGs (Special Interest Groups) organize community contributions to key parts of the TensorFlow ecosystem, and enable community members to contribute and maintain new features in important areas.

SIG leads and members work together to build and support important TensorFlow use cases, and are a vital part of our open source community. It all started with SIG Build, and we now have 13 active SIGs, with more on the way.

In this article, you’ll learn about the SIGs that exist today, and how you can get involved. Many SIGs are led by members of the open source community, from industry collaborators to Machine Learning Google Developer Experts (ML GDEs). TensorFlow’s success is due in large part to the hard work and contributions of our vibrant community. We welcome contributors to join the SIGs working on the parts of TensorFlow’s ecosystem they are most excited to collaborate on. Here is an overview of the SIGs and their areas of focus, contributed by their leads:

SIG Addons

In a fast-moving field like machine learning, there are many new developments that cannot be integrated into core TensorFlow. SIG Addons was created to tackle this problem by maintaining a repository of bleeding-edge contributions that conform to well-established API patterns but implement new functionality not available in core TensorFlow; it also adopted some parts of tf.contrib.

To contribute to TensorFlow Addons, join the conversation at our monthly meeting.

SIG Build

Started as a forum for development topics like new architecture support and packaging improvements, SIG Build grew into a discussion center dedicated to building, testing, packaging, and distributing TensorFlow, bridging internal and external TensorFlow development. The goal of this group is to ensure TensorFlow is a good citizen in the wider OSS ecosystem (Python, C++, Linux, Windows, macOS).

To contribute to TensorFlow Build, join the conversation at our monthly meeting.

SIG IO

SIG IO is a repository of dataset, streaming, and file system extension support for TensorFlow. Recent accomplishments include the release of v0.13.0 (with TF 2.2), a new Visual Studio Code tutorial, and added AVIF image file format support.

To contribute to TensorFlow IO, join the conversation at our monthly meeting.

SIG JVM

SIG JVM provides comprehensive support for building, training and serving TensorFlow models on top of Java Virtual Machine (JVM). This group focuses on using Java but also includes other popular JVM languages, like Kotlin and Scala. Some of the recent accomplishments include adding n-dimensional data access in native memory and the creation of a high-level API similar to Keras for building models.

To contribute to TensorFlow JVM, join the conversation at our monthly meeting.

SIG Keras

This group focuses on the care and feeding of the tf.keras API (new features, docs, guides), Keras Tuner, AutoKeras, and Keras applications.

To contribute to TensorFlow Keras, join the conversation at our bi-monthly meeting.

SIG Micro

SIG Micro is a discussion and collaboration group around running TensorFlow models on microcontrollers, DSPs, and other highly resource-constrained embedded devices.

To contribute to TensorFlow Micro, join the conversation at our monthly meeting.

SIG MLIR

The goal of this group is to foster an open discussion on high performance compilers and how optimization techniques can be applied to TensorFlow graphs. Ultimately this project aims to create a common intermediate representation that reduces the cost of new hardware and improves usability for existing TensorFlow users.

To contribute to TensorFlow MLIR, join the conversation at our monthly meeting.

SIG Networking

SIG Networking aims to add support for different network fabrics and protocols. The group evaluates proposals and designs in this area and maintains code in the tensorflow/networking repository. Join us, if you are interested in improving TensorFlow on different types of networks or underlying drivers and libraries!

To contribute to TensorFlow Networking, join the conversation at our monthly meeting.

SIG Recommenders (New!)

SIG Recommenders was created to drive discussion and collaborations around using TensorFlow for large scale recommendation systems (Recommenders). We hope to encourage sharing of best practices in the industry, get consensus and product feedback to help evolve TensorFlow support for recommenders, and facilitate the contributions of RFCs and PRs in this domain.

To contribute to TensorFlow Recommenders, join the mailing list to get updates about our upcoming meetings.

SIG Rust

SIG Rust was created for users and contributors of the TensorFlow Rust binding project. The binding provides stable support for running models created in other languages, and can both train and evaluate models.

To contribute to TensorFlow Rust, join the conversation at our monthly meeting.

SIG Swift

The purpose of SIG Swift is to host design reviews, discuss upcoming API changes, share project roadmap, and encourage collaboration in the Swift for TensorFlow (S4TF) open-source community.

To contribute to TensorFlow Swift, join the conversation at our monthly meeting.

SIG Tensorboard

SIG TensorBoard was created for discussion and collaboration around TensorBoard, the visualization tool for TensorFlow. The goals of this group are to engage the TensorBoard user and developer communities and gather feedback; encourage development of new TensorBoard plugins; promote collaboration on ML via TensorBoard.dev; and encourage community improvements to TensorBoard.

To contribute to TensorFlow TensorBoard, join the conversation at our monthly meeting.

SIG TF.js (New!)

SIG TF.js was created to facilitate community-contributed components to tensorflow/tfjs (and potential community-maintained libraries). The core TensorFlow.js engineering team has been building the infrastructure and tooling to enable ML to run in JavaScript-powered applications, and has an active contributor community of individual developers, GDEs, and enterprise users. We want to accelerate community involvement in the project to help continue to meet users’ needs and drive new directions for the project.

To contribute to TensorFlow TF.js, join the conversation at our monthly meeting.

Thank you to our SIG Leads for their work and leadership:

Picture: 1st TensorFlow Contributor Summit, Santa Clara, 2019.

Sean Morgan, Tzu-Wei Sung | SIG Addons

Jason Zaman, Austin Anderson | SIG Build

Yong Tang, Anthony Dmitriev, Derek Murray | SIG IO

Karl Lessard, Adam Pocock, Rajagopal Ananthanarayanan | SIG JVM

Francois Chollet | SIG Keras

Neil Tan, Pete Warden | SIG Micro

Tatiana Shpeisman, Pankaj Kanwar | SIG MLIR

Bairen Yi, Jeroen Bedorf | SIG Networking

Bo Liu, Haidong Rong, Yong Li, Wei Wei | SIG Recommenders

Adam Crume | SIG Rust

Ewa Matejska | SIG Swift

Mani Varadarajan, Gal Oshri | SIG TensorBoard

Sandeep Gupta, Ping Yu | SIG TF.js


InSpace: A new video conferencing platform that uses TensorFlow.js for toxicity filters in chat

A guest post by Narine Hall, Assistant Professor at Champlain College, CEO of InSpace

InSpace is a communication and virtual learning platform that gives people the ability to interact, collaborate, and educate in familiar physical ways, but in a virtual space. InSpace is built by educators for educators, putting education at the center of the platform.

  • InSpace is designed to mirror the fluid, personal, and interactive nature of a real classroom. It allows participants to break free of “Brady Bunch” boxes in existing conference solutions to create a fun, natural, and engaging environment that fosters interaction and collaboration.
  • Each person is represented in a video circle that can freely move around the space. When people are next to each other, they can hear and engage in conversation, and as they move away, the audio fades, allowing them to find new conversations.
  • As participants zoom out, they can see the entire space, which provides visual social cues. People can seamlessly switch from class discussions to private conversations or group/team-based work, similar to the format of a lab or classroom.
  • Teachers can speak to everyone when needed, move between individual students and groups for more private discussion, and place groups of students in audio-isolated rooms for collaboration while still belonging to one virtual space.

As InSpace is a collaboration platform, chat is a fundamental feature. From day one, we wanted to provide a mechanism to warn users about sending or receiving toxic messages. For example, a teacher in a classroom setting with young people may want a way to prevent students from typing inappropriate comments. Or a moderator of a large discussion may want a way to reduce inappropriate spam. Or individual users may want to filter toxic messages out on their own.

A simple way to identify toxic comments would be to check for the presence of a list of words, including profanity. Moving a step beyond this, we did not want to identify toxic messages just by the words they contain; we also wanted to consider the context. So we decided to use machine learning to accomplish that goal.

After some research, we found a pre-trained model for toxicity detection in TensorFlow.js that could be easily integrated into our platform. Importantly, this model runs entirely in the browser, which means that we can warn users against sending toxic comments without their messages ever being stored or processed by a server.

Performance-wise, we found that running toxicity detection in the browser’s main thread would be detrimental to the user experience, so we decided to use the Web Workers API to separate message toxicity detection from the main application, keeping the two processes independent and non-blocking.

GIF moderating toxic comments

Web Workers connect to the main application by sending and receiving messages, in which you can wrap your data. Whenever a user sends a chat message, it is added to a queue and passed from the main application to the web worker. When the web worker receives the message from the main app, it classifies it and, when the output is ready, sends the results back to the main application. Based on the results from the web worker, the main app either sends the message to all participants or warns the user that it is toxic.

Chart showing how toxicity filter works

Below is the pseudocode for the main application, where we initialize the web worker by providing its path as an argument, set the callback that will be called each time the worker sends a message, and declare the callback that will be called when the user submits a message.

// main application
// initializing the web worker
const toxicityFilter = new Worker('toxicity-filter.worker.js');
// set the callback which will process the data coming back from the worker
toxicityFilter.onmessage = ({ data: { message, isToxic } }) => {
  if (isToxic) {
    markAsToxic(message);
  } else {
    sendToAll(message);
  }
};

When the user sends the message, we pass it to the web worker:

onMessageSubmit = message => {
  toxicityFilter.postMessage(message);
  addToQueue(message);
};

After the worker is initialized, it starts listening for data messages from the main app and handles them using the declared onmessage callback, which then sends a message back to the main app.

// toxicity-filter worker
// here we import dependencies
importScripts(
  // the main library to run TensorFlow.js in the browser
  'https://cdn.jsdelivr.net/npm/@tensorflow/tfjs',
  // trained models for toxicity detection
  'https://cdn.jsdelivr.net/npm/@tensorflow-models/toxicity',
);
// threshold for the prediction confidence
const threshold = 0.9;
// the model promise which will be used to classify messages
const modelPromise = toxicity.load(threshold);
// registered callback to run when the main app sends a data message
onmessage = ({ data: message }) => {
  modelPromise.then(model => {
    model.classify([message.body]).then(predictions => {
      // as we want to check the toxicity for all labels,
      // `predictions` will contain the results for all 7 labels,
      // so we check whether there is a match for any of them
      const isToxic = predictions.some(prediction => prediction.results[0].match);
      // send the data message back to the main app with the results
      postMessage({ message, isToxic });
    });
  });
};

As you can see, the toxicity detector is straightforward to integrate with an app and does not require significant changes to the existing architecture. The main application only needs a small “connector,” and the logic of the filter lives in a separate file.

To learn more about InSpace visit https://inspace.chat.


How to generate super resolution images using TensorFlow Lite on Android

Posted by Wei Wei, TensorFlow Developer Advocate

The task of recovering a high resolution (HR) image from its low resolution counterpart is commonly referred to as Single Image Super Resolution (SISR). While interpolation methods, such as bilinear or cubic interpolation, can be used to upsample low resolution images, the quality of the resulting images is generally less appealing. Deep learning, especially Generative Adversarial Networks, has successfully been applied to generate more photo-realistic images, for example, SRGAN and ESRGAN. In this blog, we are going to use a pre-trained ESRGAN model from TensorFlow Hub and generate super resolution images using TensorFlow Lite in an Android app. The final app looks like the screenshot below, and the complete code has been released in the TensorFlow examples repo for reference.

Screencap of TensorFlow Lite

First, we can conveniently load the ESRGAN model from TFHub and easily convert it to a TFLite model. Note that here we are using dynamic range quantization and fixing the input image dimensions to 50×50. The converted model has been uploaded to TFHub but we want to demonstrate how to do it just in case you want to convert it yourself (for example, try a different input size in your own app):

model = hub.load("https://tfhub.dev/captain-pool/esrgan-tf2/1")
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, 50, 50, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the TF Lite model.
with tf.io.gfile.GFile('ESRGAN.tflite', 'wb') as f:
  f.write(tflite_model)

esrgan_model_path = './ESRGAN.tflite'

You can also convert the model without hardcoding the input dimensions at conversion time and later resize the input tensor at runtime, as TFLite now supports inputs of dynamic shapes. Please refer to this example for more information.
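For a model converted with dynamic input shapes, the Python interpreter can resize the input before allocating tensors; a minimal sketch (the target shape below is illustrative):

interpreter = tf.lite.Interpreter(model_path=esrgan_model_path)
# Resize the input tensor to match the low resolution image before allocating
input_index = interpreter.get_input_details()[0]['index']
interpreter.resize_tensor_input(input_index, [1, 100, 100, 3])
interpreter.allocate_tensors()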

Once the model is converted, we can quickly verify that the ESRGAN TFLite model does generate a much better image than bicubic interpolation. We also have another tutorial on ESRGAN if you want to better understand the model.

lr = cv2.imread(test_img_path)
lr = cv2.cvtColor(lr, cv2.COLOR_BGR2RGB)
lr = tf.expand_dims(lr, axis=0)
lr = tf.cast(lr, tf.float32)

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=esrgan_model_path)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run the model
interpreter.set_tensor(input_details[0]['index'], lr)
interpreter.invoke()

# Extract the output and postprocess it
output_data = interpreter.get_tensor(output_details[0]['index'])
sr = tf.squeeze(output_data, axis=0)
sr = tf.clip_by_value(sr, 0, 255)
sr = tf.round(sr)
sr = tf.cast(sr, tf.uint8)

ESRGAN model with low res and high res image
LR: low resolution input image cropped from a butterfly image in the DIV2K dataset. ESRGAN (x4): super resolution output image generated using the ESRGAN model with upscale_ratio=4. Bicubic: output image generated using bicubic interpolation. As can be seen here, the bicubic-interpolated image is much blurrier than the ESRGAN-generated one. PSNR is also higher on the ESRGAN-generated image.

As you may already know, TensorFlow Lite is the official framework for running inference with TensorFlow models on edge devices. It is deployed on more than 4 billion edge devices worldwide, supporting Android, iOS, Linux-based IoT devices, and microcontrollers. You can use TFLite from Java, C/C++, or other languages to build Android apps. In this blog we will use the TFLite C API, since many developers have asked for such an example.

We distribute TFLite C header files and libraries in prebuilt AAR files (the core library and the GPU library). To set up the Android build correctly, the first thing we need to do is download the AAR files and extract the header files and shared libraries. Have a look at how this is done in the download.gradle file.

Since we are using the Android NDK to build the app (NDK r20 is confirmed to work), we need to let Android Studio know how the native files should be handled. This is done in the CMakeLists.txt file:

We included 3 sample images in the app, so a user may easily run the same model multiple times, which means we need to cache the interpreter for better efficiency. This is done by passing the interpreter pointer from C++ code to Java code, after the interpreter is successfully created:

extern "C" JNIEXPORT jlong JNICALL
Java_org_tensorflow_lite_examples_superresolution_MainActivity_initWithByteBufferFromJNI(
    JNIEnv *env, jobject thiz, jobject model_buffer, jboolean use_gpu) {
  const void *model_data = static_cast<void *>(env->GetDirectBufferAddress(model_buffer));
  jlong model_size_bytes = env->GetDirectBufferCapacity(model_buffer);
  SuperResolution *super_resolution = new SuperResolution(
      model_data, static_cast<size_t>(model_size_bytes), use_gpu);
  if (super_resolution->IsInterpreterCreated()) {
    LOGI("Interpreter is created successfully");
    return reinterpret_cast<jlong>(super_resolution);
  } else {
    delete super_resolution;
    return 0;
  }
}

Once the interpreter is created, running the model is fairly straightforward as we can follow the TFLite C API documentation. We first need to carefully extract RGB values from each pixel. Now we can run the interpreter:

// Feed input into model
status = TfLiteTensorCopyFromBuffer(input_tensor, input_buffer, kNumberOfInputPixels * kImageChannels * sizeof(float));
…...

// Run the interpreter
status = TfLiteInterpreterInvoke(interpreter_);
…...

// Extract the output tensor data
const TfLiteTensor* output_tensor = TfLiteInterpreterGetOutputTensor(interpreter_, 0);
float output_buffer[kNumberOfOutputPixels * kImageChannels];
status = TfLiteTensorCopyToBuffer(output_tensor, output_buffer, kNumberOfOutputPixels * kImageChannels * sizeof(float));
…...

With the model’s results at hand, we can pack the RGB values back into each pixel.

There you have it: a reference Android app using TFLite to generate super resolution images on device. More details can be found in the code repository. Hopefully this is useful as a reference for Android developers who are getting started with TFLite in C/C++ to build amazing ML apps.

Feedback

We are looking forward to seeing what you have built with TensorFlow Lite, as well as hearing your feedback. Share your use cases with us directly or on Twitter with hashtags #TFLite and #PoweredByTF. To report bugs and issues, please reach out to us on GitHub.

Acknowledgements

The author would like to thank @captain__pool for uploading the ESRGAN model to TFHub, and Tian Lin and Jared Duke for the helpful feedback.

[1] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi. 2016. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

[2] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang. 2018. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks.

[3] TensorFlow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

[4] @captain__pool’s ESRGAN code implementation

[5] Eirikur Agustsson, Radu Timofte. 2017. NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study.
