Portrait Depth API: Turning a Single Image into a 3D Photo with TensorFlow.js

Posted by Ruofei Du, Yinda Zhang, Ahmed Sabie, Jason Mayes, Google.

A depth map is essentially an image (or image channel) that contains information about the distance of the surfaces of objects in the scene from a given viewpoint (in this case, the camera itself) for every pixel in that image. Depth maps are a fundamental building block for a variety of computer graphics and computer vision applications, such as augmented reality, portrait mode, and 3D reconstruction. Despite recent advances in depth sensing with the ARCore Depth API, the majority of photographs on the web still lack associated depth maps. This, combined with the web community's growing interest in having depth capabilities within JavaScript to enhance existing web apps, such as bringing images to life, applying real-time AR effects to a human face and body, or even reconstructing items for use in VR environments, helped shape the path for what you see today.

Today we are introducing the Depth API, the first depth estimation API from TensorFlow.js. With this new API, we are also introducing the first depth model for portraits, ARPortraitDepth, which estimates a depth map for a single portrait image. To demonstrate one of many possible uses of depth information, we also present a computational photography application, 3D photo, which utilizes the predicted depth and enables a 3D parallax effect on the given portrait image. Try the live demo below to easily make your own social media profile photo 3D, as shown below.


Try out the 3D portrait demo for yourself!

Examples generated from the 3D photo application.

ARPortraitDepth: Single Image Depth Estimation

At the core of the Portrait Depth API is a deep learning model, named ARPortraitDepth, that takes a single color portrait image as the input and produces a depth map. For the sake of computational efficiency, we adopt a light-weight U-Net architecture. As shown below, the encoder gradually downscales the image or feature map resolution by half, and the decoder increases the feature resolution to the same as the input. Deep learning features from the encoder are concatenated to the corresponding layers with the same spatial resolution in the decoder to bring high resolution signals for depth estimation. During training, we force the decoder to produce depth predictions with increasing resolutions at each layer, and add a loss for each of them with the ground truth. This empirically helps the decoder to predict accurate depth by gradually adding details.

Abundant and diverse training data is critical for the machine learning model to achieve overall decent performance, e.g. accuracy and robustness. We synthetically render pairs of color and depth images with various camera configurations, e.g. focal length, camera pose, from 3D digital humans captured by a high quality performance capture system, and run relighting augmentation with High Dynamic Range environment illumination maps to increase the realism and diversity of the color images, e.g. shadows on the face. We also collect real data using mobile phones equipped with a front facing depth sensor, e.g. Google Pixel 4, where the depth quality, as the training ground truth, is not as accurate and complete as that in our synthetic data, but the color images are effective in improving the performance of our model when running on images in the wild.

Single image depth estimation pipeline.

To enhance robustness against background variation, in practice we run an off-the-shelf body segmentation model with MediaPipe and TensorFlow.js before sending the image into the depth estimation neural network.

The portrait depth model could enable a whole host of creative applications oriented around the human body that could drive next generation web apps. We refer readers to ARCore Depth Lab for more inspiration.

For the 3D photo application, we created a high-performance rendering pipeline. It first generates a segmentation mask using the existing TensorFlow.js body segmentation API. Next, we pass the masked portrait into the Portrait Depth API and obtain a depth map on the GPU. Then, we generate a depth mesh in three.js, with vertices arranged in a regular grid and displaced by re-projecting the corresponding depth values (see the figure below for generating the depth mesh). Finally, we apply texture projection to the depth mesh and rotate the camera around the z axis in a circle. Users can download the animations in GIF or WebM format.

Generating the depth mesh from the depth map for the 3D photo application.

Portrait Depth API Installation

The portrait depth API is currently offered as one variant of the new depth API.

To install the API and runtime library, you can either use the <script> tag in your HTML file or use NPM.

Through script tag:

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-converter"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/body-segmentation"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/depth-estimation"></script>

Through NPM:

yarn add @tensorflow/tfjs-core @tensorflow/tfjs-backend-webgl
yarn add @tensorflow/tfjs-converter
yarn add @tensorflow-models/body-segmentation
yarn add @tensorflow-models/depth-estimation

How you reference the API in your JS code depends on how you installed the library.

If installed through script tag, you can reference the library through the global namespace depthEstimation.

If installed through NPM, you need to import the libraries first:

import '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-webgl';
import '@tensorflow/tfjs-converter';
import '@tensorflow-models/body-segmentation';
import * as depthEstimation from '@tensorflow-models/depth-estimation';

Try it yourself!

First, you need to create an estimator:

const model = depthEstimation.SupportedModels.ARPortraitDepth;

const estimatorConfig = {
  minDepth: 0, // The minimum depth value outputted by the estimator.
  maxDepth: 1, // The maximum depth value outputted by the estimator.
};

const estimator = await depthEstimation.createEstimator(model, estimatorConfig);

Once you have an estimator, you can pass in a video stream, static image, or TensorFlow.js tensors to estimate depth:

const video = document.getElementById('video');
const depthMap = await estimator.estimateDepth(video);

How to use the output?

The depthMap result above contains depth values for each pixel in the image.

The depthMap is an object that stores the underlying depth values. You can then utilize the provided asynchronous conversion functions such as toCanvasImageSource, toArray, and toTensor, depending on the output type you want.

Note that different models have different internal representations of data, so converting from one form to another may be expensive. For efficiency, you can call getUnderlyingType to determine what form the depth map is already in, and keep it in that form for faster results.

The semantics of the depthMap are as follows: the depth map is the same size as the input image. For array and tensor representations, there is one depth value per pixel. For CanvasImageSource, the green and blue channels are always set to 0, whereas the red channel stores the depth value.

See below output snippet for example:

{
  toCanvasImageSource(): ...
  toArray(): ...
  toTensor(): ...
  getUnderlyingType(): ...
}

Browser Performance

Portrait Depth model, TFJS runtime with WebGL backend (FPS):

MacBook M1 Pro 2021: 51 FPS
iPhone 13 Pro: 22 FPS
Pixel 6 Pro: 5 FPS
Desktop PC (Intel i9-10900K, Nvidia GTX 1070 GPU): 47 FPS

Acknowledgements

We would like to acknowledge our colleagues who participated in or sponsored creating Portrait Depth API in TensorFlow.js: Na Li, Xiuxiu Yuan, Rohit Pandey, Abhishek Kar, Sergio Orts Escolano, Christoph Rhemann, Idris Aleem, Sean Fanello, Adarsh Kowdle, Ping Yu, Alex Olwal‎. We would also like to acknowledge the body segmentation model provided by MediaPipe, and The Relightables for high quality synthetic data.


Why Karrot Uses TFX, and How to Improve Productivity on ML Pipeline Development

Posted by Ukjae Jeong, Gyoung-yoon Yoo, and Myeonghyeon Song from Karrot

Karrot (the global service of Danggeun Market in Korea) is a local community service app that connects neighbors based on a secondhand marketplace. Danggeun Market was launched in 2015, and over 23 million people in Korea are using Danggeun Market in their local communities. Currently, Karrot is operated in 440 local communities in four countries: the U.K., the U.S., Canada, and Japan. In our service, scrolling through feeds to find inexpensive and useful items has become a daily pleasure for users. For better user experiences, we’ve been applying several machine learning models such as recommendation models.

We are also working on ways to effectively and efficiently apply ML models. In particular, we’re putting lots of effort into building machine learning pipelines for periodic deployment, rapid experiments, and continuous model improvement.

For the ML pipelines, we’ve been using TFX (TensorFlow Extended) for production. So in this article, we will briefly introduce why we use TFX, and how we utilize TFX to improve productivity.

Machine Learning in Karrot

There are many ML projects inside Karrot, and ML models run inside our services. For example, we use automated models to detect fraud, and there are recommendation models to improve the user experience in our app feed. If you are interested in detailed descriptions of the models, please refer to our team blogs, which are written in Korean.

As we had been using Kubeflow for our ML models, we were able to periodically train, experiment with, and deploy models, but we still had some pain points. When we started to use TFX with Kubeflow last year, TFX pushed this line further and made it easy for the team to work with our production pipelines.

How TFX helps us with production ML

TFX helps build and deploy production ML pipelines easily with an open and extensible design.

TFX, completely open-sourced in 2019, is an end-to-end platform for production ML pipelines. It supports writing ML workflows in component units, which then can be run in multiple environments – Apache Beam, Dataflow, Kubeflow, and Airflow. It also comes with well-written standard components for data ingestion/transformation, training, and deployment.

Standard Components

TFX provides several standard components. If you are looking for components for data ingestion, there is CsvExampleGen for local CSV files, as well as PrestoExampleGen and BigQueryExampleGen, which ingest data directly from Presto and BigQuery; many other sources can be supported with some customization. So you can easily process data from multiple sources just by connecting pre-built components to your TFX pipelines.
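For reference, wiring one of these ingestion components into a pipeline is usually a one-liner. The sketch below is not taken from the Karrot post; it assumes the TFX 1.x API, and the input path and query are placeholders.

from tfx import v1 as tfx

# Ingest from a directory of local CSV files...
example_gen = tfx.components.CsvExampleGen(input_base="data/train_csvs")

# ...or directly from BigQuery (requires the GCP extension; the query is a placeholder).
example_gen = tfx.extensions.google_cloud_big_query.BigQueryExampleGen(
    query="SELECT * FROM `my_project.my_dataset.my_table`")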

It can also handle large-scale data processing smoothly. Since the Transform component that performs feature engineering is implemented on Apache Beam, you can execute it on GCP Dataflow or another compute cluster in a distributed manner.
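For context (this is not Karrot's actual code), the user-facing part of the Transform component is a preprocessing_fn built from tensorflow_transform analyzers, which Apache Beam then executes in a distributed manner; the feature names below are made up.

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Feature engineering executed by the Transform component on Apache Beam."""
    outputs = {}
    # Scale a numeric feature to zero mean and unit variance.
    outputs["price_scaled"] = tft.scale_to_z_score(inputs["price"])
    # Map a string feature to integer ids using a vocabulary computed over the full dataset.
    outputs["category_id"] = tft.compute_and_apply_vocabulary(inputs["category"])
    return outputs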

Of course, many other convenient components exist and are added constantly.

Component Reusability

To adapt TFX to our product, we needed custom components. TFX has a well-structured component design that enables us to create custom components naturally and easily connect them to existing TFX pipelines. A simple Python function or container can be transformed into a TFX component, or you can write the whole component in exactly the same way as the standard components are written. For more details, check out the custom component guide.
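To illustrate the Python-function route, here is a minimal sketch adapted from the TFX custom component guide; the component name, its logging behavior, and the message parameter are hypothetical and not part of Karrot's internal library.

import logging

from tfx.dsl.component.experimental.annotations import InputArtifact, Parameter
from tfx.dsl.component.experimental.decorators import component
from tfx.types.standard_artifacts import Examples


@component
def ExamplesLogger(examples: InputArtifact[Examples],
                   message: Parameter[str] = "Ingested examples at:"):
    """Logs where the ingested examples landed; purely illustrative."""
    logging.info("%s %s", message, examples.uri)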

To spread these advantages and enhance our productivity, we share custom components that have similar use cases across our ML pipelines as an internal library of Karrot Market.

Various Runners are Supported

TFX is compatible with a variety of environments. It can run locally on your laptop, or on Dataflow, GCP's batch data processing service compatible with Apache Beam. You can visualize the output by manually running each component in a Jupyter Notebook. TFX also supports Kubeflow and Vertex AI, which have recently gained new features as well. Therefore, the pipeline code is written once and can then be run almost anywhere, and we can create development, experiment, and deployment environments at once. For these reasons, the burden of deploying models to production was significantly reduced by using TFX for our services.

Technical lessons

As we set up our ML pipelines with TFX, code quality and our model development experience improved.

However, there were difficulties. We didn't have a uniform project structure or best practices within our team, perhaps because TFX itself is relatively new and we had been using it since before version 1.0. It became harder to understand the code and start contributing. As the pipelines became larger and more complex, it got harder to understand the meaning of custom components, their corresponding config values, and their dependencies. In addition, it was difficult to introduce some of the latest features to the team.

Improving the Development Experience

We decided to create and use a template for TFX pipelines to make it easier to understand each other’s code, implement pipelines with the same pattern, and share know-how with each other. We merged components frequently used in Karrot and put them in a shared library so that ML pipelines can be developed very quickly.

It was expected that the template would accelerate the development of new projects. In addition, as mentioned above, we expected that each project would have a similar structure, making it easier to understand each other’s projects.

So far, we have briefly introduced the template project. Here are some of our considerations to make better use of TFX in this project.

Configuration first

We put configuration first: it should be possible to understand how a pipeline works just by reading its configuration. If specific settings are easy to understand, we can set up various experiments and take them through to A/B testing.

example_gen_config.proto, written in Protocol Buffers (Protobuf), defines the specification of the config; config.pbtxt holds the values, and pipeline.py builds up the pipeline.

// config.pbtxt
example_gen_config {
  big_query_example_gen_config {
    query: "# query for example gen"
  }
  ...
}
...

// example_gen_config.proto
message ExampleGenConfig {
  oneof config {
    BigQueryExampleGenConfig big_query_example_gen_config = 1;
    CsvExampleGenConfig csv_example_gen_config = 2;
  }
  ...
}

// When BigQueryExampleGen is used
message BigQueryExampleGenConfig {
  optional string query = 1;
}

// When CsvExampleGenConfig is used
message CsvExampleGenConfig {
  optional string input_base = 1;
}

# pipeline.py
def create_pipeline(config):
    ...
    example_gen = _create_example_gen(config.example_gen_config)
    ...


def _create_example_gen(config: example_gen_config_pb2.ExampleGenConfig):
    ...

    if config.HasField("big_query_example_gen_config"):
        ...
        return ...

    if config.HasField("csv_example_gen_config"):
        ...
        return ...

    raise ...

All configurations of ExampleGen are determined by a single ExampleGenConfig message. Similarly, all pipeline components depend only on their configs and are created from them. This way, you can understand the structure of the pipeline just by looking at the configuration file. Separating out the part that defines each component also makes customization and understanding the code easier.
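To make the flow concrete, here is a minimal sketch of how a pbtxt file like config.pbtxt could be parsed and handed to create_pipeline() from pipeline.py above; the generated module name (pipeline_config_pb2) and the top-level PipelineConfig message are assumptions for illustration, not Karrot's actual code.

from google.protobuf import text_format

import pipeline_config_pb2  # hypothetical module generated from the pipeline's .proto files


def load_config(path: str):
    # Hypothetical top-level message that wraps example_gen_config and friends.
    config = pipeline_config_pb2.PipelineConfig()
    with open(path) as f:
        text_format.Parse(f.read(), config)
    return config


config = load_config("config.pbtxt")
pipeline = create_pipeline(config)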

For example, let's assume the following situation: to test data transformations later, the Transform component needs to support various data processing methods. If you want to add a data augmentation step to the Transform component, you do so by adding a config for the data augmentation function. In the same way, you can extend the predefined Protobuf specification to support multiple processing methods and make it easy to see which processing method is in use.

Managing Configs with Protobuf

Looking at the example code above, some people may wonder why we use Protobuf as a configuration tool. There are several reasons, and we will compare its advantages with YAML, one of the most common choices for configuration.

First, Protobuf has a robust interface, and validation such as type checking is convenient. There is no need to check whether any field is defined, as Protobuf defines the object structure in advance. In addition, it is useful to support backward/forward compatibility in a project under active development.

Also, you can easily check the pipeline structure. YAML has a hierarchical structure too, but in the case of Hydra, which is often used in the machine learning ecosystem, the stage settings (e.g. production, dev, alpha) are divided across several files, so we think Protobuf has better stability and visibility.

If you use Protobuf as your project setup tool, many of the Protobuf definitions defined in TFX can be reused.

TensorFlow Ecosystem with Bazel

Bazel is a language-independent build system that is easy to extend and supports a variety of languages and tools. From simple projects to large projects using multiple languages and tools, it can be used quickly and concisely in most situations. For more information, please refer to Bazel Vision on the Bazel documentation page.

Using Bazel in a Python project is an uncommon choice, but we used Bazel as the build system of the TFX template project. Briefly, the reasons are as follows.

First of all, it works really well with Protobuf. Because Bazel is a language-independent build system, you can easily tie your Protobuf build artifacts as dependencies with other builds without worry. In addition, the Protocol Buffer repository itself uses Bazel, so it is easy to integrate it into the Bazel-based project.

The second reason is the special environment of the TensorFlow ecosystem. Many projects in the TensorFlow ecosystem use Bazel, and TFX also uses Bazel, so you can easily link builds with other projects (TensorFlow, TFX) using Bazel.

Internal Custom TFX Modules

As mentioned before, we’ve been building an internal library for the custom TFX modules (especially the custom components) that are frequently used across multiple projects. Anyone in Karrot can add their components and share them with the team.

For example, we are using ArgoCD to manage applications (e.g. TF Serving) in Kubernetes clusters, so if someone develops a component for deploying with ArgoCD, we can easily share it via the internal library. The library now contains several custom modules that boost our team's productivity.

We can share our custom features as an internal library largely thanks to the modular structure of TFX. Through this, we easily improved the overall productivity of the team: we can reuse most of the components developed across several projects and develop new projects very quickly.

Conclusion

TFX provides lots of great features to develop production ML pipelines. We’re using TFX on Kubeflow for ML pipelines to develop, experiment, and deploy in a better way, and it brings us many benefits. So we decided to introduce how we are using TFX in this blog post.

To learn more about Karrot, check out our website (Korea, US, and Canada). For the TFX, check out the TFX documentation page.


Video Classification on Edge Devices with TensorFlow Lite and MoViNet

Posted by Dan Kondratyuk, Liangzhe Yuan, Google Research and Khanh LeViet, TensorFlow Developer Relations

We are excited to announce MoViNets (pronounced “movie nets”), a family of new mobile-optimized model architectures for video classification. The models are trained on the Kinetics-600 dataset to recognize 600 different human actions (such as playing trumpet, robot dancing, bowling, and more) and can classify video streams captured on a modern smartphone in real time. You can download the pre-trained TensorFlow Lite models from TensorFlow Hub or try them out using our Android and Raspberry Pi demo apps, as well as fine-tune your own MoViNets with the Colab demo and the code in the TensorFlow Model Garden.

Demo from the TensorFlow Lite video classification reference app

Video classification is a machine learning task that takes video frames as input and predicts a single class from a larger set of classes. Video action recognition is a type of video classification where the set of predicted classes consists of human actions that happened in the frames. Video action recognition is similar to image recognition in that both take input images and output the probabilities of the images belonging to each of the predefined classes. However, a video action recognition model has to look at both the content of each frame and the temporal relationships between adjacent frames to understand the actions in the video. For example, if you look at these still images, it’s difficult to tell what the person is doing.

But if you look at the full video, it becomes clear that the person is performing jumping jacks.

MoViNet Model Architecture

MoViNets are a family of convolutional neural networks which efficiently process video streams, outputting accurate predictions with a fraction of the latency of convolutional video classifiers like 3D ResNets or transformer-based classifiers like ViT.

Frame-based classifiers output predictions on each 2D frame independently, resulting in sub-optimal performance due to their lack of temporal reasoning. On the other hand, 3D video classifiers offer high accuracy predictions by processing all frames in a video clip simultaneously, at a cost of significant memory and latency penalties as the number of input frames increases. MoViNets offer key advantages from both 2D frame-based classifiers and 3D video classifiers while mitigating their disadvantages.

The following figure shows a typical approach to using 3D networks with multi-clip evaluation, where the predictions of multiple overlapping subclips are averaged together. Shorter subclips result in lower latency, but reduce the overall accuracy.

Diagram illustrating Multi-Clip Evaluation for 3D Video Networks

MoViNets take a hybrid approach, using causal convolutions in place of 3D convolutions so that intermediate activations can be cached across frames with a Stream Buffer. The Stream Buffer copies the input activations of all 3D operations, which are output by the model and then fed back into the model with the next clip input.

Diagram illustrating Streaming Evaluation for MoViNets

The result is that MoViNets can receive one frame input at a time, reducing peak memory usage while resulting in no loss of accuracy, with predictions equivalent to inputting all frames at once like a 3D video classifier. MoViNets additionally leverage Neural Architecture Search (NAS) by searching for efficient configurations of models on video datasets (specifically Kinetics 600) across network width, depth, and resolution.
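To make the streaming idea concrete, here is a hedged sketch of running a MoViNet-Stream TFLite model frame by frame while carrying the stream-buffer states forward. The model path and the "image"/"logits" tensor names are assumptions; inspect your model's signature for the exact names.

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="movinet_a0_stream.tflite")  # placeholder path
runner = interpreter.get_signature_runner()

# Initialize every state input to zeros; only the image changes per frame.
states = {
    name: np.zeros(detail["shape"], dtype=detail["dtype"])
    for name, detail in runner.get_input_details().items()
    if name != "image"
}

# Placeholder clip: 8 frames of shape [1, 1, 172, 172, 3] (assumes a float input model).
frames = np.random.rand(8, 1, 1, 172, 172, 3).astype(np.float32)

for frame in frames:
    outputs = runner(**states, image=frame)
    logits = outputs.pop("logits")  # per-frame class scores
    states = outputs                # feed the updated states back in for the next frame
    print(int(np.argmax(logits)))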

The result is a set of action classifiers that can output temporally-stable predictions that smoothly transition based on frame content. Below is an example plot of MoViNet-A2 making predictions on each frame on a video clip of skateboarding. Notice how the initial scene with a small amount of motion has relatively constant predictions, while the next scene with much larger motion causes a dramatic shift in predicted classes.

MoViNets need a few modifications to run effectively on edge devices. We start with MoViNet-A0-Stream, MoViNet-A1-Stream, and MoViNet-A2-Stream, the smaller models that can feasibly run in real time (20 fps or higher). To quantize MoViNet effectively, we apply a few modifications to the model architecture: the hard swish activation is replaced by ReLU6, and the Squeeze-and-Excitation layers of the original architectures are removed, which results in a 3-4 percentage point (p.p.) accuracy drop on Kinetics-600. We then convert the models to TensorFlow Lite and use integer-based post-training quantization (as well as float16 quantization) to reduce the model sizes and make them run faster on mobile CPUs. The integer-based post-training quantization process introduces a further 2-3 p.p. accuracy loss. Compared to the original MoViNets, quantized MoViNets lag behind in accuracy on full 10-second Kinetics 600 clips (5-7 p.p. accuracy reduction in total), but in practice they are able to provide very accurate predictions on everyday human actions, e.g., push ups, dancing, and playing piano. In the future, we plan to train with quantization-aware training to bridge this accuracy gap.
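As a general reference (not the exact MoViNet conversion script), the float16 variant of post-training quantization mentioned above takes only a few lines with the TFLite converter; the SavedModel path and output file name below are placeholders.

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("movinet_a0_stream_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as float16
tflite_model = converter.convert()

with open("movinet_a0_stream_float16.tflite", "wb") as f:
    f.write(tflite_model)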

A video plotting the top-5 predictions of MoViNet-A2 over time on an example 8-second (25 fps) skateboarding video clip. Create your own plots with this Colab notebook.

We benchmark quantized A0, A1, and A2 on real hardware, and model inference achieves 200, 120, and 60 fps respectively on a Pixel 4 CPU. In practice, due to input pipeline overhead, we see latency closer to 20-60 fps when running on Android with a camera as input.

Model       Quantization   Top-1 Accuracy (%)   Latency (ms, Pixel 4 CPU)   Model Size (MB)   Recommended Input
A0-Stream   int8           65.0                 4.80                        3.1               172 x 172, 5 fps
A1-Stream   int8           70.1                 8.35                        4.5               172 x 172, 5 fps
A2-Stream   int8           72.2                 15.76                       5.1               224 x 224, 5 fps
A0-Stream   float16        71.5                 17.47                       7.6               172 x 172, 5 fps
A1-Stream   float16        76.0                 34.82                       13                172 x 172, 5 fps
A2-Stream   float16        77.5                 76.31                       15                224 x 224, 5 fps

Train a Custom Model

You can train your own video classifier model using the MoViNet codebase in the TensorFlow Model Garden. The Colab notebook walks through the specific steps to fine-tune a pretrained video classifier on another dataset.

Future Steps

We are excited to see on-device online video action recognition powered by MoViNets, which demonstrate highly efficient performance. In the future, we plan to support quantization-aware training for MoViNets to mitigate the quantization accuracy loss. We are also interested in extending MoViNets as the backbone for more on-device video tasks, e.g. video object detection, video object segmentation, visual tracking, pose estimation, and more.

Acknowledgement

We would like to extend a big thanks to Yeqing Li for supporting MoViNets in TensorFlow Model Garden, Boqing Gong, Huisheng Wang, and Ting Liu for project guidance, Lu Wang for code reviews, and the TensorFlow Hub team for hosting our models.


How to migrate from BoostedTrees Estimators to TensorFlow Decision Forests

Posted by Mathieu Guillame-Bert and Josh Gordon for the TensorFlow team

Decision forest models like random forests and gradient boosted trees are often the most effective tools available for working with tabular data. They provide many advantages over neural networks, including being easier to configure, and faster to train. Using trees greatly reduces the amount of code required to prepare your dataset, as they natively handle numeric, categorical, and missing features. And they often give good results out-of-the-box, with interpretable properties.

Although we usually think of TensorFlow as a library to train neural networks, a popular use case at Google is to use TensorFlow to create decision forests.

An animation of a decision tree classifying data.

This article provides a migration guide if you were previously creating tree-based models using tf.estimator.BoostedTrees, which was introduced in 2019. The Estimator API took care of much of the complexity of working with models in production, including distributed training and serialization. However, it is no longer recommended for new code.

If you are starting a new project, we recommend that you use TensorFlow Decision Forests (TF-DF). This library provides state-of-the-art algorithms for training, serving and interpreting decision forest models, with many benefits over the previous approach, notably regarding quality, speed, and ease of use.

To start, here are equivalent examples using the Estimator API and TF-DF to create a boosted tree model.

Previously, this is how you would train a gradient boosted trees model with tf.estimator.BoostedTrees (no longer recommended):

import tensorflow as tf

# Dataset generators
def make_dataset_fn(dataset_path):
    def make_dataset():
        data = ...  # read dataset
        return tf.data.Dataset.from_tensor_slices(...data...).repeat(10).batch(64)
    return make_dataset

# List the possible values for the feature "f_2".
f_2_dictionary = ["NA", "red", "blue", "green"]

# The feature columns define the input features of the model.
feature_columns = [
    tf.feature_column.numeric_column("f_1"),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "f_2",
            f_2_dictionary,
            # A special value "missing" is used to represent missing values.
            default_value=0)
    ),
]

# Configure the estimator
estimator = tf.estimator.BoostedTreesClassifier(
    n_trees=1000,
    feature_columns=feature_columns,
    n_classes=3,
    # Rule of thumb proposed in the BoostedTreesClassifier documentation.
    n_batches_per_layer=max(2, int(len(train_df) / 2 / FLAGS.batch_size)),
)

# Stop the training if the validation loss stops decreasing.
early_stopping_hook = tf.estimator.experimental.stop_if_no_decrease_hook(
    estimator,
    metric_name="loss",
    max_steps_without_decrease=100,
    min_steps=50)

tf.estimator.train_and_evaluate(
    estimator,
    train_spec=tf.estimator.TrainSpec(
        make_dataset_fn(train_path),
        hooks=[
            # Early stopping needs a CheckpointSaverHook.
            tf.train.CheckpointSaverHook(
                checkpoint_dir=input_config.raw.temp_dir, save_steps=500),
            early_stopping_hook,
        ]),
    eval_spec=tf.estimator.EvalSpec(make_dataset_fn(valid_path)))

How to train the same model using TensorFlow Decision Forests

import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Load the datasets
# This code is similar to the estimator example.
def make_dataset(dataset_path):
    data = ...  # read dataset
    return tf.data.Dataset.from_tensor_slices(...data...).batch(64)

train_dataset = make_dataset(train_path)
valid_dataset = make_dataset(valid_path)

# List the input features of the model.
features = [
    tfdf.keras.FeatureUsage("f_1", tfdf.keras.FeatureSemantic.NUMERICAL),
    tfdf.keras.FeatureUsage("f_2", tfdf.keras.FeatureSemantic.CATEGORICAL),
]

model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.CLASSIFICATION,
    num_trees=1000,
    features=features,
    exclude_non_specified_features=True)

model.fit(train_dataset, validation_data=valid_dataset)

# Export the model to a SavedModel.
model.save("project/model")

Remarks

  • While not explicit in this example, early stopping is automatically enabled and configured.
  • The dictionary of the “f_2” features is automatically built and optimized (e.g. rare values are merged into an out-of-vocabulary item).
  • The number of classes (3 in this example) is automatically determined from the dataset.
  • The batch size (64 in this example) has no impact on the model training. Larger values are often preferable as they make reading the dataset more efficient.

TF-DF is all about ease of use, and the previous example can be further simplified and improved, as shown next.

How to train a TensorFlow Decision Forests model (recommended solution)

import tensorflow_decision_forests as tfdf
import pandas as pd

# Pandas datasets can be used easily with pd_dataframe_to_tf_dataset.
train_df = pd.read_csv("project/train.csv")

# Convert the Pandas dataframe into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")

model = tfdf.keras.GradientBoostedTreesModel(num_trees=1000)
model.fit(train_ds)

Remarks

  • We did not specify the semantics (e.g. numerical or categorical) of the features. In this case, the semantics will be automatically inferred.
  • We also didn’t list which input features to use. In this case, all the columns (except for the label) will be used. The list and semantics of the input features are visible in the training logs, or with the model inspector API (see the sketch after this list).
  • We did not specify any validation dataset. Each algorithm can extract a validation dataset from the training examples if it needs one. For example, by default, GradientBoostedTreesModel uses 10% of the training data for validation if no validation dataset is provided.
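As a quick illustration of that inspector API (method names are from the TF-DF documentation, applied to the model trained above):

# A minimal sketch of inspecting the trained model; `model` is the
# GradientBoostedTreesModel fitted above.
inspector = model.make_inspector()
print(inspector.features())              # input features and their inferred semantics
print(inspector.evaluation())            # metrics on the internal validation dataset
print(inspector.variable_importances())  # which features the model relied on most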

Now, let’s look at a couple differences between the Estimator API and TF-DF.

Differences between the Estimator API and TF-DF

Type of algorithms

TF-DF is a collection of decision forest algorithms. This includes (but is not limited to) the Gradient Boosted Trees available with the Estimator API. Notably, TF-DF also supports Random Forest (great for noisy datasets) and a CART implementation (great for model interpretation).

In addition, for each of those algorithms, TF-DF includes many variations found in the literature and validated experimentally [1, 2, 3].

Exact vs approximate splits

The TF1 GBT Estimator uses an approximate tree learning algorithm. Informally, the Estimator builds trees by only considering a random subset of examples and a random subset of the conditions at each step.

By default, TF-DF uses an exact tree training algorithm. Informally, TF-DF considers all the training examples and all the possible splits at each step. This is a more common and often better performing solution.

While sometimes faster on larger datasets (>10B examples x features), the Estimator's approximation is often less accurate (as more trees need to be grown to reach the same quality). On a small dataset (<100M examples x features), the form of approximate training implemented in the Estimator can even be slower than exact training.

TF-DF also supports various types of "approximate" tree training. The recommended approach is to use exact training, and optionally test approximate training on large datasets.

Inference

The Estimator runs model inference using the top-down tree routing algorithm. TF-DF uses an extension of the QuickScorer algorithm.

While both algorithms return the exact same results, the top-down algorithm is less efficient because of excessive branch mispredictions and cache misses. TF-DF inference is generally 10x faster on the same model.

For latency-critical applications, TF-DF offers a C++ API. It often provides ~1µs/example/core inference time. This is often a 50x-1000x speed-up over TF SavedModel inference (especially on small batches).

Multi-head models

The Estimator supports multi-head models (a model that outputs multiple predictions). TF-DF (currently) does not support multi-head models directly, however, using the Keras Functional API, multiple TF-DF models trained in parallel can be assembled into a multi-head model.

Learning more

You can learn more about TensorFlow Decision Forests by visiting the website. If you’re new to this library, the beginner example is a good place to start. Experienced TensorFlow users can visit this guide for important details about the difference between using decision forests and neural networks in TensorFlow, including how to configure your training pipeline, and tips on Dataset I/O. You can also see Migrate from Estimator to Keras APIs for more info on migrating from Estimators to Keras in general.


How LinkedIn Personalized Performance for Millions of Members using TensorFlow.js

A guest post by LinkedIn

Mark Pascual, Sr. Staff Engineer

Nitin Pasumarthy, Staff Engineer

Introduction

The Performance team at LinkedIn optimizes the latency to load web and mobile pages. Faster sites improve customer engagement and, eventually, revenue for LinkedIn. This concept is well documented by many other companies who have had similar experiences, but how do you define the optimal trade-off between page load times and engagement?

The relationship between speed and engagement is non-linear. After a point, further reducing the load time of an already fast site may not increase engagement. At LinkedIn we have used this relationship between engagement and speed to selectively customize the features on LinkedIn Lite – a lighter, faster version of LinkedIn, specifically built for mobile web browsers.

To do this, we trained a deep neural network to identify, in real time, whether a request to LinkedIn would result in a fast page load. Based on the performance quality predicted by this model, we change the resolution of all images on a given user's news feed before the resulting webpage is sent to the client. This led to billions of extra Feed Viral Actions taken (+0.23%), millions more Engaged Feed Users (+0.16%), and a significant increase in Sponsored Revenue (+0.76%).

Image Quality Comparison: The image on the left uses 4x more memory than the one on the right, which is less than ideal to send to users on slow network connections or when the device may be low on resources. Prior to using an ML model, we only showed the low resolution image, which was not great for users whose newer devices had capacity for higher quality images.

We described in great detail why many of our performance optimization experiments failed back in 2017 and how we used those learnings to build a Performance Quality Model (PQM) in our Personalizing Performance: Adapting Application in real time to member environments blog.

PQM’s bold goal is to predict various performance metrics (e.g. page load time) of any web / mobile page using both device and network characteristics of end users to empower (web) developers to build impactful application features that are otherwise tricky to implement (like the one we described above).

We are happy to announce that we are open sourcing our first performance quality model, trained on millions of RUM (real user monitoring) data samples from around the world and free to use for your own website performance optimizations! Learn more and get started here.

In the rest of this blog, we will go over how our team of full stack developers deployed this PQM in production at LinkedIn scale. We hope to show that deploying TensorFlow.js ML models today is both easy and beneficial for those working on the Node.js stack.

TensorFlow.js: Model Deployment in Node.js

At the time of our production deployment, LinkedIn’s TensorFlow model deployment machinery was still being developed. Furthermore, using TensorFlow Serving was not yet a feasible option for us. So even though we had a model ready for use, we needed to figure out a way to deploy it.

As LinkedIn is primarily a Java/JVM stack for serving external traffic, it might seem like TensorFlow Java would be ideal, but it was still experimental and didn’t have the API stability guarantees that we require.

We looked at our options and realized that we already use Node.js (behind the JVM) as part of our frontend web serving stack in order to perform server side optimizations when serving HTML pages. The architecture for this is unique in that we use the JVM to manage an external pool of Node.js processes to perform “work,” e.g., the previously mentioned server side optimizations. The “work” can really be anything that Node.js can perform. In our use case, this enables us to use TensorFlow.js in an architecture that was already proven.

We repurposed our frontend stack to use Node.js to deploy our custom model and ended up with great results. In terms of performance, our mixed stack of Java and Node.js easily met our SLAs. The 50th and 90th percentile production latencies as measured (a) from a client (within the datacenter), (b) from on host instrumentation, and (c) in terms of only Node.js performing inference using TensorFlow.js are shown in the table below.

                                  50th Percentile   90th Percentile
From client (within datacenter)   10 ms             12 ms
On host                           8 ms              10 ms
Inference in Node.js              2 ms              3 ms

The resulting architecture is shown in Figure 1 below.

The API request that requires a prediction is received by the JVM server and is routed to our Rest.li infrastructure which in turn routes the request to our performance prediction resource. To handle the request, the PaaS resource performs some feature generation based on the inputs and then makes an RPC call out to the Node.js process for the prediction.

The N Node.js processes are long-lived. They are started upon JVM startup and have already loaded the desired model using tf.node.loadSavedModel(). When a process receives a request for a prediction, it simply takes the input features, calls tf_model.predict(), and returns the result. Here is a simplified version of the Node.js code:

const tf = require('@tensorflow/tfjs-node');

async function main() {
  // Load the model when the process starts so it's always ready.
  const model = await tf.node.loadSavedModel('model_dir');

  function predict(rawInput) {
    return tf.tidy(() => {
      // Prepare the inputs as tensor arrays.
      const x = {};
      for (const feature of Object.keys(rawInput)) {
        x[feature] = tf.tensor([rawInput[feature]], [1, 1]);
      }

      const output = model.predict(x, {});
      const probs = Array.from(output.probabilities.dataSync());
      const classes = Array.from(output.all_class_ids.dataSync());
      const result = Object.fromEntries(classes.map((classId, i) => [classId, probs[i]]));
      return result; // {0: 0.8, 1: 0.15, 2: 0.05} probability of each performance quality
    });
  }

  // Register our 'predict' RPC handler (pseudo-code).
  // `process` is an abstraction of the Node.js side of the communication
  // channel with the JVM.
  process.registerHandler('predict', input => {
    const result = predict(input);
    return Promise.resolve(result);
  });
}

main();


In a homogeneous Node.js stack, Express could replace Rest.li's role and the feature generation pieces would need to be ported to Node.js, but everything else remains the same. As you can see, that architecture is cleaner and requires fewer mental hoops than managing both Java and Node.js in the same stack.

An Unexpected Win

In the architecture we described above, the external processes do not have to be Node.js. The library that we use to manage the external process is pretty straightforward to implement in most technologies. In fact, we could have chosen Python for the external processes as it's popular for this ML use case. So what are the reasons we stuck with Node.js? Well, there are two: (1) we already had a Node.js implementation for the external process infrastructure and would have had to develop a new one for Python, and (2) it turns out that Node.js is also slightly faster at making the predictions, due to the pre/post processing benefitting from JavaScript's JIT compiler.

In order to prove this to ourselves, we took samples (~100k) from our real world prediction inputs and ran them against both Node.js and Python. The test bed was not exactly our production stack (we didn’t include the JVM side), but it was close enough for our purposes. The results are shown below:

Stack     50th percentile   Delta (from Python)
Python    1.872 ms          0%
Node.js   1.713 ms          -8.47%

The results show that Node.js is almost 10% faster at performing inference for our specific model. Of course, performance may vary based on model architectures and the amount of pre and post processing being performed in Node. These results were from our model running on a typical production machine. Results may also vary due to model complexity, machine differences, and so on.

Check out the README in our open source repo to find out how we tested the model in Python and Node.js.

Looking to the future

Our current unique architecture does have some areas for improvement. Probably the biggest opportunity is to address the uniqueness of this multi-stack architecture itself. The mix of Java and Node.js technologies adds cognitive overhead and complexity during design, development, debugging, operations, and maintenance; however, as previously stated, you could move the whole stack to Node.js to simplify matters, so this is a solvable problem.

Another potential area for improvement comes from the currently single-threaded architecture on the Node.js side. Because of this, only a single prediction occurs at a time, so latency sometimes includes some amount of queueing time. This could be worked around by using Node worker threads for parallel execution, which may be considered in future versions of this implementation. In our particular case, however, prediction is very fast even as it stands, so we do not feel the need to explore this right now.

Summary

The availability of TensorFlow.js gave us an easy option to deploy our model to serve production use cases when other options were not quite suitable or available to us. While our unique requirements resulted in using non-standard architecture (the mixture of the JVM and Node.js), TensorFlow.js can be used to an even greater effect in a homogeneous Node.js serving stack resulting in a very clean and performant architecture. With our open source performance quality model, a full stack JS engineer can personalize performance and improve their user engagement and we look forward to seeing how others use our open sourced model to do just that on their own websites.

Acknowledgements

This success would not be possible without the tremendous work by Prasanna Vijayanathan and Niranjan Sharma. A huge thank you to Ritesh Maheshwari and Rahul Kumar for their support and encouragement. Special thanks to Ping Yu (Google) and Jason Mayes (Google) for their continued support and feedback.


Introducing mPOD DxTrack: A low-cost healthcare device using TensorFlow Lite Micro

A guest post by Jeffrey Ly, CEO & Joanna Ashby, CMO of mPOD, Inc.

mPOD is a NIH-funded pre-seed startup headquartered out of Johnson & Johnson’s Innovation (JLABS) in New York City. In this article, we’d like to share with you a hardware device we have developed independently at mPOD leveraging TensorFlow Lite Micro (TFLM) as a core technology, called DxTrack.

mPOD DxTrack leverages TFLM and low cost hardware to enable accurate, rapid and objective interpretation of currently available lateral flow assays (LFAs) in less than 10 seconds. LFAs serve as diagnostic tools because they are low-cost and simple to use without specialized skills or equipment. Most recently popularized by COVID-19 rapid antigen tests, LFAs are also used extensively for pregnancy testing, disease tracking, STD testing, food intolerances, and therapeutic drug monitoring, along with an extensive array of biomarkers, totaling billions of tests sold each year. The mPOD DxTrack can be used with any type of visually read lateral flow assay, demonstrating a healthcare use case for TFLM that can directly impact our everyday lives.

The LFA begins with a sample (nasal swab, saliva, urine, blood, etc.) loaded at (1) in the figure below. Once the sample has flowed to the green conjugate zone (2), it is labeled with a signaling moiety. Through capillary action, the sample continues flowing until it is immobilized at (3). With these LFA tests, two lines indicate a positive result and one line indicates a negative result.

Figure 1. Side (A) & Top (B) view of a lateral flow assay (LFA) sample where at (1) the sample (nasal swab, saliva, urine, blood, etc) is loaded before flowing to the green zone (2), where the target is labeled with a signaling moiety. Through capillary action, the sample will continue flowing until it is immobilized at (3) to form the test line. Excess material is absorbed at (4).
Figure 2. These are the 3 possible classes results for a lateral flow assay (LFA) test.
Figure 3. This is a diagram of the NOWDiagnostics ADEXUSDx lateral flow assay (LFA), designed to collect and run a saliva sample in point-of-care (POC) and over-the-counter (OTC) settings.

When used correctly, these tests are very effective; however, self-testing presents interpretation challenges for the lay user. Significant variability is present between devices, making it difficult to tell whether the test line you see is negative... or a faint positive.

Figure 4. A visualization of how the TinyML model on the mPOD DxTrack interprets and classifies different lateral flow assay (LFA) results.

To address this challenge, we developed mPOD DxTrack, an over-the-counter (OTC) LFA reader that improves the utility of lateral flow assays by enabling rapid and objective readings with a simple, under $5 (Cost-of-Goods) globally-deployable device. The mPOD DxTrack aims to read lateral flow assay tests using ML to accomplish two goals: 1) enable rapid and objective readings of LFAs and 2) streamline digital reporting. Critically, TinyML allows for the software on the mPOD DxTrack to be deployed on low-cost (less-than $5) hardware that can be widely distributed – which is difficult with existing LFA readers which rely on high-cost/high complexity hardware that cost hundreds to thousands of dollars per unit. Ultimately, we believe that TinyML will enable the mPOD DxTrack to catch missed positive test results by removing human bias and increasing confidence in lateral flow device testing, reducing user error, and increasing overall result accuracy.

Figure 5. Assembly view of the mPOD DxTrack with lateral flow assay (LFA) cassette.

Technical Dive

Key Considerations

  • Achieving 99% overall accuracy (99% sensitivity, 99% specificity) for model performance when interpreting live-run LFA strips.
  • Ensuring the model can maintain that level of performance while fitting the hardware constraints.

Model size constraints for TinyML

Deployment of the DxTrack TinyML model on the Pico4ML Dev Kit is constrained by two hardware resources: flash memory and SRAM. The Pico4ML Dev Kit has 2 MB of flash memory to host the .uf2 file and 264 KB of SRAM to accommodate the intermediate arrays (among other things) of the model. Ensuring the model size stays within these bounds is critical because, even if the code successfully compiles, runs on the host machine, and flashes onto the Pico4ML Dev Kit, an oversized model will hang during set-up and never execute the main loop.

Rather than guessing and checking the size of the intermediate arrays (an approach we initially took with little reproducible success), we developed a workflow that quantifies the model's arena size using the interpreter's arena_used_bytes() function. See below, where this function is called during setup:

TfLiteStatus setup_status = ScreenInit(error_reporter);
if (setup_status != kTfLiteOk) {
  while (1) { TF_LITE_REPORT_ERROR(error_reporter, "Set up failed\n"); };
}
arena_size = interpreter->arena_used_bytes();
printf("Arena_Size Used: %zu \n", arena_size);

When printed out, this is what the value from the interpreter function looks like during Pico4ML Dev Kit boot-up:

DEV_Module_Init OK                                                              
Arena_Size Used: 93500
sd_spi_go_low_frequency: Actual frequency: 122070
V2-Version Card
R3/R7: 0x1aa
R3/R7: 0xff8000
R3/R7: 0xc0ff8000
Card Initialized: High Capacity Card
SD card initialized
SDHC/SDXC Card: hc_c_size: 15237
Sectors: 15603712
Capacity: 7619 MB
sd_spi_go_high_frequency: Actual frequency: 12500000

With this value available to us, we are then able to set the appropriate tensor arena size. As you can see from above, the model uses 93500 bytes of SRAM. By setting kTensorArenaSize to just above that amount (99 × 1024 = 101376 bytes), we allocate enough memory to host the model without going over the hardware limit (exceeding it also causes the Pico4ML Dev Kit to freeze).

// An area of memory to use for input, output, and intermediate arrays.
constexpr int kTensorArenaSize = 99 * 1024;
static uint8_t tensor_arena[kTensorArenaSize];

Transforming from Unquantized to Quantized Models

Now that we have a reproducible methodology to quantify the arena size and deploy the model onto the Pico4ML Dev Kit, our next challenge is ensuring that the model achieves the accuracy we require while still fitting within the size constraints of the hardware. For reference, the mPOD DxTrack platform is designed to interpret a 96x96 image. In the original model design, we were able to achieve >99.999% accuracy with our model, but the intermediate layer is 96x96x32 at fp32, which requires over 1 MB of memory – it would never fit in the Pico4ML Dev Kit's 264 KB of SRAM. To meet the size requirement, we needed to take the model from unquantized to quantized; our best option was full int8 quantization. In essence, instead of treating the tensor values as floating point (float32), we map those values to integers (int8). This decreased the model size 4-fold, allowing it to fit onto the Pico4ML Dev Kit. Unfortunately, the rounding error from fp32 to int8 compounded, resulting in dramatically reduced model performance.


To combat this drop in model performance, we examined the effect of two different quantization strategies to improve performance: Post-training quantization (PTQ) and Quantization-aware training (QAT).

Below, we compare 3 different models to understand which quantization strategy is best. For reference:

  • Model 1: 2-layer convolutional network
  • Model 2: 3-layer convolutional network
  • Model 3: 4-layer convolutional network

As we can see, quantization-aware training (QAT) uniformly beats the post-training quantization (PTQ) method, and it became part of our workflow moving forward.
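For readers who want to try this themselves, here is a hedged sketch of quantization-aware training followed by full int8 conversion using the TensorFlow Model Optimization Toolkit; the tiny CNN, the placeholder data, and the file names are illustrative and not mPOD's actual model.

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder training data: 96x96 RGB crops with 3 classes (negative / positive / invalid).
train_images = np.random.rand(256, 96, 96, 3).astype(np.float32)
train_labels = np.random.randint(0, 3, size=(256,))

base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Wrap the model so fake quantization is simulated during training (QAT).
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
qat_model.fit(train_images, train_labels, epochs=3, batch_size=32)

# Convert to a fully int8 TFLite model suitable for TFLM on a microcontroller.
def representative_dataset():
    for image in train_images[:100]:
        yield [image[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("dxtrack_model_int8.tflite", "wb") as f:
    f.write(converter.convert())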

What performance can we achieve now?

Tested across over 800 real-world test runs, the mPOD DxTrack preliminarily achieves an overall accuracy of 98.7%. This version of the model is currently being evaluated by the network of manufacturing partners we work closely with. We are also assembling a unique dataset of images as part of a patient-focused data pipeline, learning from each manufacturing partnership and building bespoke models.

Our preliminary work has also helped us correlate model performance with dataset size, so we can reach accuracy high enough for our healthcare application. Per the figure attached, the model needs to be trained on a quality dataset of at least 15,000 images. Our commercial-ready target is likely to require datasets of more than 100,000 images.


To learn more about mPOD Inc, please visit our website at www.mpod.io. If you’re interested in learning more about TinyML, we recommend checking out this book and this course.


Accelerating TensorFlow Lite Micro on Cadence Audio Digital Signal Processors

Posted by Raj Pawate (Cadence) and Advait Jain (Google)

Digital Signal Processors (DSPs) are a key part of any battery-powered device, offering a way to process audio data with very low power consumption. These chips run signal processing algorithms such as audio codecs, noise canceling and beam forming.

Increasingly these DSPs are also being used to run neural networks such as wake-word detection, speech recognition, and noise suppression. A key part of enabling such applications is the ability to execute these neural networks as efficiently as possible.

However, productization paths for machine learning on DSPs can often be ad-hoc. In contrast, speech, audio, and video codecs have worldwide standards bodies such as ITU and 3GPP creating algorithms for compression and decompression addressing several aspects of quality measurement, fixed point arithmetic considerations and interoperability.

TensorFlow Lite Micro (TFLM) is a generic open-sourced inference framework that runs machine learning models on embedded targets, including DSPs. Similarly, Cadence has invested heavily in PPA-optimized hardware-software platforms such as Cadence Tensilica HiFi DSP family for audio and Cadence Tensilica Vision DSP family for vision.

Google and Cadence – A Multi-Year Partnership for Enabling AI at the Edge

This was the genesis of the collaboration between the TFLM team and the Audio DSP teams at Cadence, starting in 2019. The TFLM team is focusing on leveraging the broad TensorFlow framework and developing a smooth path from training to embedded and DSP deployment via an interpreter and reference kernels. Cadence is developing a highly optimized software library, called NeuralNet library (NNLIB), that leverages the SIMD and VLIW capabilities of their low-power HiFi DSPs. This collaboration started with three optimized kernels for one Xtensa DSP, and now encompasses over 50 kernels across a variety of platforms such as HiFi 5, HiFi 4, HiFi 3z, Fusion F1 as well as Vision DSPs such as P6, and includes the ability to offload to an accelerator, if available.

Additionally, we have collaborated to add continuous integration for all the optimized code targeted for the Cadence DSPs. This includes infrastructure that checks that every pull request to the TFLM repository passes all the unit tests for the Tensilica toolchain across various HiFi and Vision P6 cores. As such, we ensure that the combined TFLM and NNLIB open source software is both tightly integrated and has good automated test coverage.

Performance Improvements

Most recently, we have collaborated on adding optimizations for models that are quantized with int16 activations. Specifically in the domain of audio, int16 activations can be critical for the quality of quantized generative models. We expect that these optimized kernels will enable a new class of ML-powered audio signal processing. The table below shows a few operators that are required for implementing a noise suppression neural net. We show a 267x improvement in cycle count for a variant of SEANet, an example noise suppression neural net.

The following table shows the improvement with the optimized kernels relative to the reference implementations as measured with the Xtensa instruction set simulation tool.

Operator           Improvement
Transpose Conv     458x
Conv2D             287x
Sub                39x
Add                24x
Leaky ReLU         18x
Strided_Slice      10x
Pad                6x
Overall Network    267x

How to use these optimizations

All of the code is available in the TFLite Micro GitHub repository.

To use HiFi 3z targeted TFLM optimizations, the following conditions need to be met:

  • the TensorFlow Lite (TFLite) flatbuffer model is quantized with int16 activations and int8 weights (see the conversion sketch after this list)
  • it uses one or more of the operators listed in the table above
  • TFLM is compiled with OPTIMIZED_KERNEL_DIR=xtensa
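For the first condition, a minimal sketch of producing an int16-activation / int8-weight flatbuffer with the TFLite converter might look like the following (the SavedModel path and calibration generator are placeholders):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("noise_suppression_model")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # placeholder calibration generator
# Request the 16x8 quantization scheme: int16 activations with int8 weights.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()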

For example, you can run Conv2D kernel integration tests with reference C++ code with:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=hifi4 XTENSA_CORE= test_integration_tests_seanet_conv

And compare that to the optimized kernels by adding OPTIMIZED_KERNEL_DIR=xtensa:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=xtensa TARGET_ARCH=hifi4 OPTIMIZED_KERNEL_DIR=xtensa XTENSA_CORE= test_integration_tests_seanet_conv

Looking Ahead

While the work thus far has been primarily focused on convolutional neural networks, Google and Cadence are also working together to develop an optimized LSTM operator and have released a first example of an LSTM-based key-word recognizer. We expect to expand on this and continue to bring optimized and production-ready implementations of the latest developments in AI/ML to Tensilica Xtensa DSPs.

Acknowledgements

We would like to acknowledge a number of our colleagues who have contributed to making this collaboration successful.

Cadence: Manjunath CP, Bhanu Prakash Bandaru Venkata, Anirban Mandal

Google: Advait Jain, Deiqang Chen, Lawrence Chan, Marco Tagliasacchi, Nat Jeffries, Nick Kreeger, Pete Warden, Rocky Rhodes, Ting Yan, Yunpeng Li, Victor Ungureanu


Highlights from TensorFlow’s 2021 exploreCSR awards

Posted by Josh Gordon, Jocelyn Becker, and Sloan Davis for the TensorFlow team

Increasing the number of students pursuing computer science research is a priority at Google, especially for students from historically marginalized groups in the field. Since 2018, Google’s exploreCSR awards have aided higher education efforts that support students interested in pursuing graduate studies and research careers in computing.

The TensorFlow team is proud to provide additional funding to support this important program. To date, we have awarded more than 20 professors with funding to support their education and outreach work in machine learning.

We’d like to highlight examples of the many (and often, unique) outreach programs the 2021 award recipients have created so far. These range from research experiences with robotics, aquatic vehicles, federated learning, and offline digital libraries to mentored small group workshops on data science and programming skills. They’re sorted alphabetically by university below.

If you’re interested in creating your own programs like these with support from Google, keep an eye on the exploreCSR website for the next round of applications opening in June 2022.

Laura Hosman and Courtney Finkbeiner, Arizona State University

The SolarSPELL initiative at Arizona State University will host a workshop series thanks to support from exploreCSR to encourage students underrepresented in computer science research in their academic journey. The SolarSPELL initiative produces an offline, solar-powered digital library designed to bring educational content to resource-constrained locations that may lack electricity, internet connectivity, and/or traditional libraries.

The exploreCSR workshop series, titled “SolarSPELL exploreCSR: Computing for Good”, involves 6 weeks of sessions using SolarSPELL as a case study for how students can apply machine learning to tackle real-world problems and develop solutions for social good. Students will meet SolarSPELL’s co-director and learn about the history of the SolarSPELL initiative; learn about graduate programs available at ASU; and hear from guest panelists from industry.

A solar-powered, offline digital library.

Aside from the information sessions, students will also gain hands-on experience working in teams and problem solving for real-world topics. The SolarSPELL team will present the students with three different challenges for student teams to develop a proposed solution using machine learning. Students will then be eligible to apply for paid summer fellowship positions with SolarSPELL to develop and integrate one of the proposed machine learning models into SolarSPELL’s technology.

SolarSPELL is a student-driven initiative, so the solutions that the exploreCSR students develop will be implemented in our digital libraries to improve hundreds of library users’ experiences around the world. With libraries in 10 countries in the Pacific Islands and East Africa, and plans to expand to Latin America and the Middle East, these students will have a far-reaching impact.

Daehan Kwak, Kean University

My colleague Xudong Zhang and I created an undergraduate research study group centered on computer vision, with projects underway on student attention detection, mask and social distancing detection, and pill recognition for healthcare scenarios. As one example, a student created a pill detection application using data from the National Library of Medicine pillbox. This can be used, for example, by high-volume distribution pharmacies to be more efficient and accurate, or by retirement homes to verify the pills a resident is taking. We’re pleased to share that the pill recognition project won third place in the Kean Business Plan Competition and was accepted to be presented at Posters on the Hill 2022.

Matthew Roberts, Macquarie University

The School of Computing at Macquarie University is working to lower the barrier to entry for students who are new to experimenting with ML by employing real-world examples. This month, around fifty students will spend the week testing their ideas for solving autonomous aquatic vehicle challenges (for example, navigation) under guidance from Macquarie University researchers. They will develop their ideas in a sophisticated simulation environment, and the best solutions will be ready for deployment to real hardware testing in the water later in the year.

A MacSim simulation of the Sydney Regatta Center (created with VRX); a placeholder machine learning model makes random predictions, ready for the improvements the students come up with.

Accurately simulated sensors like cameras and LIDAR can be subjected to various models, allowing people to experiment with even more sophisticated ideas to solve complex problems. After our first year in exploreCSR, the adjustments we made to our simulator and the workshop will generate new ideas and light a spark for machine learning research early in students’ careers.

Pooyan Fazli, San Francisco State University

60+ students from 10 universities and colleges attended our 2-day virtual exploreCSR workshop. Participants were from San Francisco State University, CSU East Bay, CSU San Marcos, CSU Stanislaus, Foothill College, Northwestern University, San Diego State University, Sonoma State University, UC San Diego, and the University of San Francisco.

We had two invited speakers and two panels on mentorship and career pathways with 10 panelists from Google Research, Stanford, Emory University, Virginia Tech, and the University of Copenhagen.

As part of this workshop, we organized hands-on activities to introduce students to different aspects of AI and its applications for social good, such as with climate change. We also had mini-presentations and discussions on AI fairness, accountability, transparency and ethics in different areas, such as robotics, educational data mining, and impacts on underserved communities.

Following the workshop, selected students will participate in a research project under the guidance of graduate students and faculty during the spring semester. Through the research projects, we have a two-fold aim: to help students develop a sense of belonging in the AI and machine learning research community, and to illuminate a pathway for them to pursue graduate studies in AI/ML that explores the potential of developing responsible AI toward social good.

The research projects will begin with eight weekly meetups and hands-on training on Python programming with open-source publicly available materials. Then, students will engage in applied research projects that focus on AI applications for social good, such as health, environment, safety, education, climate change, and accessibility.

Farzana Rahman, Syracuse University

Earlier this year, the Electrical Engineering and Computer Science department of Syracuse University hosted RESORC (REsearch Exposure in Socially Relevant Computing), an exploreCSR program, for the second time. The program provided research exposure to 78 undergraduate students from SU and nearby institutions, targeting populations historically underrepresented in computing. The goal of these two workshops was to give students an opportunity to learn machine learning using open-source tools and to gain experience with data science workflows, including collecting and labeling data, training a model, and carefully evaluating it. The ML workshops were the most highly rated sessions of the RESORC program.

Erin Hestir and Leigh Bernacchi, University of California, Merced

Since 2019, University of California, Merced has partnered with Merced College and California State University Stanislaus on the Google exploreCSR program ¡Valle! Get Your Start in Tech!, serving 32 Central Valley of California undergraduates in STEM annually to build a sense of belonging, practice professional networking, and develop technical skills. Participants convene on Zoom and in-person this semester. Valle students typically come from historically underrepresented groups, and the program is designed to support their pursuits of computational research, graduate school and computer science related careers. Many have gone on to achieve just that!

This year we added additional training thanks to Google Research to support machine learning applications for social good. This program is open to all Valle participants as well as partner schools, inclusive of graduate and undergraduate students in all STEM fields, and will be taught by creative graduate students in computer science from UC Merced. Each workshop will be taught by a near-peer mentor—a practice that supports mutual success in academics—and the mentor will coach teams to develop ML projects for social good.

The goal of the program is to overcome some of the trepidation scientists and students may have about computational science and machine learning through teamwork, fun and a higher purpose. Students will be able to develop their skills and interest, focusing on ML applications to climate, sustainability, agriculture and food, and diversity in tech and aviation.

Basak Guler, University of California, Riverside

At the University of California, Riverside, we created an undergraduate research study group focused on federated and distributed machine learning. Federated learning has become widely popular in recent years due to its communication efficiency and on-device learning architecture. Our study group meets on a weekly basis, and students learn about the principles of federated and distributed learning, state-of-the-art federated learning algorithms, recent applications from financial services to healthcare, as well as recent challenges and advances in privacy, security, and fairness. Student projects provide opportunities for undergraduate students to be involved in machine learning research, and learn from the experiences of both faculty and graduate students. This program can facilitate their transition from undergraduate to graduate degrees, and prepare them for positions of leadership in industry, government, public service, and academia.

Gonzalo A. Bello, University of Illinois at Chicago

The computer science department is hosting a series of exploreCSR workshops, including exploreCSR: Exploring Data Science Research, to introduce students to data science and machine learning research. These workshops aim to encourage students from historically underrepresented groups to pursue graduate studies and careers in research in the field of computer science. UIC students from all majors were encouraged to apply, including those who haven’t taken any computer science courses. Each semester, 60 students were selected out of more than 120 who applied, and 10 teaching assistants and a professor mentored students. In addition to lectures, students work on hands-on projects together where they explore, visualize, and build models using real-world data from the city of Chicago.

Melanie Moses and Humayra Tasnim, The University of New Mexico

The UNM Google exploreCSR activity for 2021-2022 is a semester-long course called Swarmathon: The Next Generation. The students will learn technical skills like developing machine learning models for object recognition in robots, and soft skills including team building, research skills, and discussions with faculty and external speakers. The UNM exploreCSR program builds on 5 years of training students in a NASA-sponsored robotics competition called the Swarmathon (2014-2019). In 2019/2020 we developed a series of exploreCSR Swarmathon: TNG workshops which included a faculty panel, an industry mentor, an open-source tutorial, and a day-long workshop to enable “Swarmie” robots to classify and automatically retrieve objects.

A glimpse of our robots in action.

This year, in our exploreCSR Swarmathon: TNG course, students will have additional time to actively engage in developing and tuning their own machine learning models to test in the Swarmie robots. They will develop object detection models using convolutional neural networks (CNNs). They will be provided with a dataset of images of objects (shown below) taken from the robot camera and a simple model. The students will further develop the model and training data and then test their models on actual robots in real-time to see how much they can improve object recognition models to classify and retrieve the proper specified objects.

Different shaped cubes for detection.

Students will learn first-hand the reality gap between simulations and real-world experiments. This will encourage them to develop their own mini-research projects to enhance their model performance to resolve that gap. The exploreCSR-funded Swarmathon: TNG course will provide students with the opportunity to actively engage in hands-on robotics research. We hope the experience of defining a research objective, conducting a set of experiments, testing a model, and seeing results play out in our robotics arena will motivate students to attend graduate school and consider research careers.

Swarmie with a cube in its gripper.

Daniel Mejía, The University of Texas at El Paso

We’re building a series of workshops open to undergraduate students of all majors to introduce them to applied machine learning and research topics, starting with foundational concepts in math and a new way of approaching problems through the eyes of a data scientist. These workshops are open to all students, including those who do not have any prior experience. We hope to encourage students to consider pursuing graduate studies, especially those who may not have previously considered it. I believe that the earlier students are exposed, the more likely they are to pursue a graduate degree.

Henry Griffith, The University of Texas at San Antonio

At the University of Texas at San Antonio, we’re creating a portfolio of programs to enhance the persistence of first year Electrical and Computer Engineering students into research computing pathways. By integrating our programming with our Introduction to Electrical and Computer Engineering course, which has a total annual enrollment of approximately 200 students, we have the opportunity to achieve tremendous scale with our efforts. Our programs include an undergraduate research experience, a near-peer mentoring program, and group study projects – all designed to develop students’ professional and technical skills and to accelerate their progression into research opportunities.

John Akers, University of Washington

Our exploreCSR workshop, CSNext, is scheduled to begin this April. It’s a 4-week online program of workshops, seminars, and project work designed to encourage undergraduate students – particularly those from historically underrepresented groups – to consider and successfully apply to graduate schools in computer science. Participants will hear presentations from several University of Washington labs, such as Computer Vision/Graphics (GRAIL), Security and Privacy, and Human-Computer Interaction. There will be presentations on deep learning and on current graduate-level research, a panel discussion from current UW CSE grad students from varying backgrounds, opportunities to meet current graduate students from several UW CSE labs, and participants will be led through small-group exercises learning about active research from graduate student mentors. Participants will also learn about graduate school application processes and resources, led by staff from UW CSE Graduate Student Services.

Learning more

If you’re interested in creating your own programs like these with support from Google, keep an eye on the exploreCSR website for the next round of applications opening in June 2022.


Boost your model’s accuracy using self-supervised learning with TensorFlow Similarity

Posted by Elie Bursztein and Owen Vallis, Google

TensorFlow Similarity now supports key self-supervised learning algorithms to help you boost your model’s accuracy when you don’t have a lot of labeled data.

Basic Self-Supervised Training.

Often when training a new machine learning classifier, we have far more unlabeled data, such as photos, than labeled examples. Self-supervised learning techniques aim to leverage that unlabeled data: a pre-training phase on the unlabeled examples learns useful data representations that boost classifier accuracy. The ability to tap into abundant unlabeled data can significantly improve model accuracy in some cases.

Perhaps the best-known examples of successful self-supervised training are transformer models, such as BERT, which learn meaningful language representations by pre-training on very large quantities of text, e.g., Wikipedia or the web.

Self-supervised learning can be applied to any type of data and at various data scales. For example, if you have only a few hundred labeled images, self-supervised learning can boost your model accuracy by pre-training on a medium-sized dataset such as ImageNet. SimCLR, for instance, trains its representations on the ImageNet ILSVRC-2012 dataset and then evaluates transfer learning performance on 12 other image datasets such as CIFAR, Oxford-IIIT Pets, and Food-101. Self-supervised learning also works at larger scales, where pre-training on billions of examples further improves accuracy, as seen with text and vision transformers.

High level overview of how self-supervised learning works for images.

At its core, self-supervised learning works by contrasting two augmented “views” of the same example. The model objective is to maximize the similarity between these views, learning representations that are useful for downstream tasks such as training a supervised classifier. In practice, after pre-training on a large corpus of unlabeled images, an image classifier is trained by adding a single dense layer with a softmax activation on top of the frozen pre-trained representation and training as usual using a small number of labeled examples.
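As a minimal Keras sketch of that last step (the backbone path, class count, and labeled dataset are assumed placeholders, not code from the notebook):

import tensorflow as tf

# `backbone` is assumed to be the pre-trained (self-supervised) representation model.
backbone = tf.keras.models.load_model("pretrained_backbone")  # hypothetical path
backbone.trainable = False  # freeze the learned representation

classifier = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 classes, e.g. CIFAR10
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# Fine-tune only the dense head on the small labeled set.
classifier.fit(small_labeled_ds, epochs=10)  # placeholder dataset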

Examples of pairs of augmented views on CIFAR10 from the hello world notebook.

TensorFlow Similarity currently provides three key approaches for learning self-supervised representations: SimCLR, SimSiam, and Barlow Twins, all of which work out of the box. TensorFlow Similarity also provides all the necessary components to implement additional forms of unsupervised learning, including callbacks, metrics, and data samplers.

You can start exploring self-supervised learning with the hello world notebook, which demonstrates how to double the accuracy on CIFAR10.


TFRT: A Progress Update

Posted by Mingsheng Hong, TFRT Tech Lead/Manager & Eric Johnson, TFRT Product Manager

Roughly two years ago, we announced an ambitious new Machine Learning (ML) runtime effort called TFRT (short for TensorFlow Runtime). We simultaneously provided a deep dive of the initial technical design and open-sourced its codebase.

Driven by trends in the ML ecosystem – larger and larger models, ML being deployed to more diverse execution environments, and the need to keep up with continued research and modeling innovations – TFRT was started with the following set of goals in mind:

  • Deliver faster and cheaper execution for ML models
  • Enable more flexible deployment
  • Provide more modular and extensible infrastructure to facilitate innovations in ML infra and modeling

In this post, we share our progress to date, the experiences and lessons we’ve learned over the past two years of development, as well as what you can expect going forward.

Progress to Date

The last two years of development have largely been focused on implementing and validating our ambitious ideas by enabling Google’s most important internal workloads for users such as Ads and Search. To date, we have deployed TFRT broadly inside Google on a variety of training and inference workloads, and obtained great results.

Technical Lessons

How have we been able to achieve the above? Here are some interesting technical lessons that we learned, beyond what was in the original design:

First, async support is important for some of the key workloads (e.g. overlapping compute and I/O, and driving heterogeneous devices), while fast sync execution is critical for many other workloads, including small, “embedded” ML models.

We spent a lot of effort designing and refining AsyncValue, a key low-level abstraction in TFRT that allows the host runtime to asynchronously drive devices as well as invoke kernels. This led to improved device utilization due to the ability to overlap more computation and communication across hosts and devices. For example, we were able to successfully run bulk inference of an 80B-parameter model on one TPU chip with good performance by splitting the model into multiple stages and using TFRT to overlap the variable transfer of the next stage with the TPU computation of the current stage.
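To illustrate just the overlap pattern (this is an illustrative Python sketch, not TFRT code; stages, transfer, and compute are hypothetical callables):

from concurrent.futures import ThreadPoolExecutor

def run_pipelined(stages, transfer, compute):
    # Overlap the variable transfer for stage i+1 with the computation of stage i.
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(transfer, stages[0])
        output = None
        for i, stage in enumerate(stages):
            weights = pending.result()  # wait for this stage's variables
            if i + 1 < len(stages):
                pending = io.submit(transfer, stages[i + 1])  # prefetch the next stage
            output = compute(stage, weights, output)  # runs while the prefetch is in flight
        return output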

On the other hand, small CPU models that are embedded in application servers, invoked within the application process instead of via RPC/REST calls, remain critical for some of Google’s business workloads from users like Ads. For these models, the async-first internal design of TFRT initially caused a performance and resource regression. We worked with the Ads team to successfully address it, by extending the TFRT design with a synchronous interpreter, as well as an experimental memory planning optimization, to avoid heap allocation during kernel execution. We are working on productizing this extension.

The diagram below showcases the impact of the resulting TFRT design on a benchmark, as compared to “Current TF”, which ran the old runtime before TFRT’s deployment. The benchmark focuses on executing a tiny CPU model in which a large number of small matmuls execute sequentially. Notably, the optimized execution in TFRT (265 ns) approaches the optimal baseline we set up (204 ns) via hand-written C++ code without any ML runtime overhead.

Second, while faster runtime execution is critical, optimizing the input program to reduce execution complexity is important as well.

Note that while compiler-based graph optimization should be performed whenever possible when the TF SavedModel is saved to disk, there are also important inference-time compiler optimizations that can only be performed with the knowledge that we are in an inference context (e.g. when training variables remain constant).

As we were onboarding ML models onto TFRT, we had the chance to examine some of the models in depth, and identified new ways of rewriting and simplifying the program, before its execution. The simplified program, along with a faster execution of each kernel in the graph program, led to a nice compounding effect in the reduction of the execution latency and resource cost.

For example, in the left-hand-side graph program below, we were able to hoist the scalar normalization computation (e.g. dividing a float value by the max value of its domain), which is identical across all 18 input scalars, above the “concat” op, thereby enabling vectorized execution of the normalization over a concatenated 1D float tensor.
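A toy TensorFlow sketch of the rewrite (the domain maximum and the use of 18 scalar inputs here are illustrative, not taken from the production model):

import numpy as np
import tensorflow as tf

DOMAIN_MAX = 255.0  # illustrative maximum of the scalar domain

# Original program: normalize each of the 18 input scalars separately, then concatenate.
def per_scalar(scalars):
    return tf.concat([s / DOMAIN_MAX for s in scalars], axis=0)

# Rewritten program: concatenate first, then run one vectorized normalization
# over the resulting 1D float tensor.
def vectorized(scalars):
    return tf.concat(scalars, axis=0) / DOMAIN_MAX

scalars = [tf.constant([float(i)]) for i in range(18)]
assert np.allclose(per_scalar(scalars).numpy(), vectorized(scalars).numpy())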

While it is possible to perform this optimization at model training time as well, the compiler+runtime used to produce the trained model did not include this optimization.

We also find it critical to hoist computation from model execution time to load time whenever possible (e.g. const folding).

Third, cost-based execution is not just for SQL queries.

We developed a simple compile-time cost model (analogous to a SQL query optimizer’s cost model) for TF op kernels and applied cost-based optimization to ML model execution (see stream analysis), achieving better load balancing of kernel execution across a set of threadpool threads. In contrast, TF1 has a runtime-based cost model, in which each operation’s runtime cost is profiled and used to guide that operation’s scheduling. In TFRT, we moved the cost analysis to compile time, thus removing that runtime cost. Also, our compiler approach allows the entire computational graph to be analyzed, resulting in scheduling decisions that are optimal at a more global scope.
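As a toy illustration of compile-time, cost-based load balancing (a greedy heuristic of our own for exposition, not TFRT’s actual stream analysis):

import heapq

def assign_kernels(kernel_costs, num_threads):
    # Greedily place each kernel on the currently least-loaded thread,
    # using compile-time cost estimates rather than runtime profiling.
    loads = [(0.0, t) for t in range(num_threads)]
    heapq.heapify(loads)
    assignment = {}
    # Consider the most expensive kernels first (longest-processing-time heuristic).
    for name, cost in sorted(kernel_costs.items(), key=lambda kv: -kv[1]):
        load, thread = heapq.heappop(loads)
        assignment[name] = thread
        heapq.heappush(loads, (load + cost, thread))
    return assignment

print(assign_kernels({"matmul_0": 8.0, "matmul_1": 7.0, "bias_add": 1.0, "relu": 0.5}, num_threads=2))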

See this tech talk for more similarities between data and ML infra.

Looking Ahead

While we’ve certainly made strong progress, especially with respect to our first goal – faster and cheaper execution – we admittedly still have work to do on delivering a more modular design and on enabling more flexible deployments via hardware integration.

In terms of modularity, with the initial integration successes such as JAX’s adoption of TFRT device runtimes (e.g. CPU), we will continue to explore how TFRT could support workloads beyond just TensorFlow. We expect some of the TFRT components will also benefit the PyTorch/XLA workloads going forward.

Moreover, we have successfully integrated CPU and TPU (with upcoming integration into Cloud TPU), the two most important device types at Google for ML computation, and NVIDIA GPU integration is also in progress.

With respect to training workloads, TFRT is being used as a building block for Google’s large-scale distributed training frameworks, which are currently in active development.

As we look to the future, our organization has been exploring TFRT’s integration with Pixel’s hardware SoC devices such as Google Tensor. In addition, due to TFRT’s proven success with Google’s internal workloads, it is also being integrated into new venues such as GCP’s Vertex AI and Waymo.

Special Thanks

The TFRT team has really enjoyed working on this new, ambitious infrastructure project. It has often felt like bootstrapping a new startup. With that in mind, we would like to give a huge shout out to everyone who has advised, contributed to and supported TFRT through this incredible 2-year journey:

(alphabetically) Adi Agrawal, Andrew Bernard, Andrew Leaver, Andy Selle, Ayush Dubey, Bangda Zhou, Bramandia Ramadhana, Catherine Payne, Ce Zheng, Chiachen Chou, Chao Xie, Christina Sorokin, Chuanhao Zhuge, Dan Hurt, Dong Lin, Eugene Zhulenev, Ewa Matejska, Hadi Hashemi, Haoliang Zhang, HanBin Yoon, Haoyu Zhang, Hongmin Fan, Jacques Pienaar, Jeff Dean, Jeremy Lau, Jordan Soyke, Jing Dong, Juanli Shen, Kemal El Moujahid, Kuangyuan Chen, Mehdi Amini, Ning Niu, Peter Gavin, Phil Sun, Pulkit Bhuwalka, Qiao Zhang, Raziel Alvarez, Russell Power, Sanjoy Das, Shengqi Zhu, Smit Hinsu, Tatiana Shpeisman, Tianrun Li, Tim Davis, Tom Black, Victor Akabutu, Vilobh Meshram, Xiao Yu, Xiaodan Song, Yiming Zhang, YC Ling, Youlong Chen, and Zhuoran Liu.

We would like to give special thanks to Chris Lattner for his initial technical leadership in bootstrapping this project, to Martin Wicke for his support of TFRT throughout the first year, and to Alex Zaks for his support of TFRT during the second year and for seeing through its impactful landing for Google’s ML serving workloads.
