How to Create a Cartoonizer with TensorFlow Lite

A guest post by ML GDEs Margaret Maynard-Reid (Tiny Peppers) and Sayak Paul (PyImageSearch)

This is an end-to-end tutorial on how to convert a TensorFlow model to TensorFlow Lite (TFLite) and deploy it to an Android app for cartoonizing an image captured by the camera.

We created this end-to-end tutorial to help developers with these objectives:

  • Provide a reference for developers looking to convert models written in TensorFlow 1.x to their TFLite variants using new features of the latest (v2) converter, such as the MLIR-based converter, more supported ops, and improved kernels.
    (To convert TensorFlow 2.x models to TFLite, please follow this guide.)
  • Show how to download the .tflite models directly from TensorFlow Hub if you are only interested in using the models for deployment.
  • Demonstrate how to use TFLite tools such as the Android Benchmark Tool, Model Metadata, and Codegen.
  • Guide developers on how to easily create a mobile application with TFLite models, using the ML Model Binding feature from Android Studio.

Please follow along with the notebooks here for model saving/conversion and metadata population, and with the Android code on GitHub here. If you are not familiar with the SavedModel format, please refer to the TensorFlow documentation for details. While this tutorial discusses the steps for creating the TFLite models, feel free to download them directly from TensorFlow Hub here and get started using them in your own applications.
White-box CartoonGAN is a type of generative adversarial network that is capable of transforming an input image (preferably a natural image) into its cartoonized representation. The goal here is to produce a cartoonized image from an input image that is visually and semantically aesthetic. For more details about the model, check out the paper Learning to Cartoonize Using White-box Cartoon Representations by Xinrui Wang and Jinze Yu. For this tutorial, we used the generator part of White-box CartoonGAN.

Create the TensorFlow Lite Model

The authors of White-box CartoonGAN provide pre-trained weights that can be used for running inference on images. However, those weights are not ideal for a mobile application, where we want to run the model on-device without making API calls to fetch predictions. This is why we will first convert these pre-trained weights to TFLite, a format much better suited for use inside a mobile application. All of the code discussed in this section is available on GitHub here. Here is a step-by-step summary of what we will be covering in this section:

  • Generate a SavedModel out of the pre-trained model checkpoints.
  • Convert SavedModel with post-training quantization using the latest TFLiteConverter.
  • Run inference in Python with the converted model.
  • Add metadata to enable easy integration with a mobile app.
  • Run model benchmark to make sure the model runs well on mobile.

Generate a SavedModel from the pre-trained model weights

The pre-trained weights of White-box CartoonGAN come in the following format (also referred to as checkpoints) –

├── checkpoint
├── model-33999.data-00000-of-00001
└── model-33999.index

As the original White-box CartoonGAN model is implemented in TensorFlow 1, we first need to generate a single self-contained model file in the SavedModel format using TensorFlow 1.15. Then we will switch to TensorFlow 2 later to convert it to the lightweight TFLite format. To do this we can follow this workflow –

  • Create a placeholder for the model input.
  • Instantiate the model instance and run the input placeholder through the model to get a placeholder for the model output.
  • Load the pre-trained checkpoints into the current session of the model.
  • Finally, export to SavedModel.

Note that this workflow is based on TensorFlow 1.x. Here is how it looks in code:

with tf.Session() as sess:
    input_photo = tf.placeholder(tf.float32, [1, None, None, 3], name='input_photo')

    network_out = network.unet_generator(input_photo)
    final_out = guided_filter.guided_filter(input_photo, network_out, r=1, eps=5e-3)
    final_out = tf.identity(final_out, name='final_output')

    all_vars = tf.trainable_variables()
    gene_vars = [var for var in all_vars if 'generator' in var.name]
    saver = tf.train.Saver(var_list=gene_vars)
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, tf.train.latest_checkpoint(model_path))

    # Export to SavedModel
    tf.saved_model.simple_save(
        sess,
        saved_model_directory,
        inputs={input_photo.name: input_photo},
        outputs={final_out.name: final_out}
    )

Now that we have the original model in the SavedModel format, we can switch to TensorFlow 2 and proceed toward converting it to TFLite.

Convert SavedModel to TFLite

TFLite provides support for three post-training quantization strategies:

  • Dynamic range
  • Float16
  • Integer

The right strategy depends on your use case. In this tutorial, however, we will cover all of these quantization strategies to give you a fair idea of how they compare.

TFLite models with dynamic-range and float16 quantization

The steps to convert models to TFLite using these two quantization strategies are almost identical, except that for float16 quantization you need to specify an extra option. The steps for model conversion are demonstrated in the code below –

# Create a concrete function from the SavedModel
model = tf.saved_model.load(saved_model_dir)
concrete_func = model.signatures[
    tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

# Specify the input shape
concrete_func.inputs[0].set_shape([1, IMG_SHAPE, IMG_SHAPE, 3])

# Convert the model and export
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # Only for float16
tflite_model = converter.convert()
open(tflite_model_path, 'wb').write(tflite_model)

A couple of things to note from the code above –

  • Here, we are specifying the input shape of the model that will be converted to TFLite. Note, however, that TFLite supports dynamically shaped models starting from TensorFlow 2.3. We used fixed-shaped inputs in order to restrict the memory usage of the models running on mobile devices.
  • In order to convert the model using dynamic-range quantization, simply comment out the line converter.target_spec.supported_types = [tf.float16].

TFLite models with integer quantization

In order to convert the model using integer quantization, we need to pass a representative dataset to the converter so that the activation ranges can be calibrated accordingly. TFLite models generated using this strategy are known to sometimes work better than the other two that we just saw. Integer quantized models are generally smaller as well.
For the sake of brevity, we are going to skip the representative dataset generation part but you can refer to it in this notebook.
To let the TFLiteConverter take advantage of this strategy, we just need to pass converter.representative_dataset = representative_dataset_gen and remove converter.target_spec.supported_types = [tf.float16], as sketched below.
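
As a rough sketch (not the exact notebook code), the representative dataset and converter setup for integer quantization might look like the following; calibration_images is an assumed list of preprocessed sample images, and concrete_func and tflite_model_path are reused from the earlier snippet.

# A minimal sketch of integer quantization. `calibration_images` is assumed to be
# a small collection of preprocessed sample images (see the notebook for the
# actual representative dataset generation).
def representative_dataset_gen():
    for image in calibration_images:
        # Each yielded element is a list of input tensors for one calibration step.
        yield [image]

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()
open(tflite_model_path, 'wb').write(tflite_model)
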
After generating these different models, we compared them in terms of model size. You might feel tempted to just go with the integer-quantized model, but you should also consider the following before finalizing that decision –

  • Quality of the end results of the models.
  • Inference time (the lower the better).
  • Hardware accelerator compatibility.
  • Memory usage.

We will get to these in a moment. If you want to dig deeper into these different quantization strategies refer to the official guide here.
These models are available on TensorFlow Hub and you can find them here.

Running inference in Python

After you have generated the TFLite models, it is important to make sure that the models perform as expected. A good way to ensure that is to run inference with the models in Python before integrating them into mobile applications.
Before feeding an image to our White-box CartoonGAN TFLite models, it's important to make sure that the image is preprocessed well. Otherwise, the models might perform unexpectedly. The original model was trained on BGR images, so we need to account for this in the preprocessing steps as well. You can find all of the preprocessing steps in this notebook.
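
As a rough illustration (not the exact notebook code), the preprocessing might look like the following sketch; the use of OpenCV, the target size, and the helper name are our assumptions.

import cv2
import numpy as np

def preprocess_image(image_path, img_shape=512):
    # OpenCV loads images in BGR order, which matches how the model was trained.
    image = cv2.imread(image_path)
    # Resize to a fixed input shape (512 is an assumed value; use the shape set during conversion).
    image = cv2.resize(image, (img_shape, img_shape))
    # Scale pixels from [0, 255] to [-1, 1].
    image = image.astype(np.float32) / 127.5 - 1.0
    # Add the batch dimension expected by the TFLite interpreter.
    return np.expand_dims(image, axis=0)
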
Here is the code to run inference with a TFLite model on a preprocessed input image –

interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
input_details = interpreter.get_input_details()

interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'],
                       preprocessed_source_image)
interpreter.invoke()

raw_prediction = interpreter.tensor(
    interpreter.get_output_details()[0]['index'])()

As mentioned above, the output is an image with BGR channel ordering, which is not visually appropriate for display. So, we need to account for that fact in the postprocessing steps.
After the postprocessing steps are incorporated, the final cartoonized image can be displayed alongside the original input image. Again, you can find all of the postprocessing steps in this notebook.
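
As a rough illustration (not the exact notebook code), the postprocessing might look like this sketch; it assumes the model outputs values in [-1, 1] with BGR channel ordering, as described above.

import cv2
import numpy as np

def postprocess_output(raw_prediction):
    # Remove the batch dimension and scale pixels from [-1, 1] back to [0, 255].
    output = (np.squeeze(raw_prediction) + 1.0) * 127.5
    output = np.clip(output, 0, 255).astype(np.uint8)
    # Convert BGR to RGB so the image displays with natural colors.
    return cv2.cvtColor(output, cv2.COLOR_BGR2RGB)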

Add metadata for easy integration with a mobile app

Model metadata in TFLite makes the life of mobile application developers much easier. If your TFLite model is populated with the right metadata, then it becomes a matter of only a few keystrokes to integrate that model into a mobile application. Discussing the code to populate a TFLite model with metadata is out of scope for this tutorial; please refer to the metadata guide. In this section, however, we are going to provide you with some important pointers about metadata population for the TFLite models we generated. You can follow this notebook to refer to all the code. Two of the most important parameters we discovered during metadata population are the mean and standard deviation with which the inputs and results should be processed. In our case, the mean and standard deviation need to be specified for both preprocessing and postprocessing. For normalizing the input image, the metadata configuration should look like the following –

input_image_normalization.options.mean = [127.5]
input_image_normalization.options.std = [127.5]

This maps the pixel range of an input image from [0, 255] to [-1, 1]. During postprocessing, the pixels need to be scaled back to the range of [0, 255]. For this, the configuration goes as follows –

output_image_normalization.options.mean = [-1]
output_image_normalization.options.std = [0.00784313] # 1/127.5
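
To see how these two configurations relate, here is a quick sanity check (ours, in plain NumPy, rather than part of the metadata tooling) showing that the output normalization, computed as (value - mean) / std, inverts the input normalization:

import numpy as np

pixels = np.array([0.0, 127.5, 255.0])

# Input normalization: (x - 127.5) / 127.5 maps [0, 255] to [-1, 1].
normalized = (pixels - 127.5) / 127.5

# Output normalization: (y - (-1)) / 0.00784313 maps [-1, 1] back to roughly [0, 255].
restored = (normalized - (-1.0)) / 0.00784313

print(normalized)  # [-1.  0.  1.]
print(restored)    # approximately [0.  127.5  255.]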

There are two files created from the “add metadata process”:

  • A .tflite file with the same name as the original model, with metadata added, including the model name, description, version, input and output tensor info, etc.
  • To help display the metadata, we also export it into a .json file so that you can print it out. When you import the model into Android Studio, the metadata can be displayed as well.

The models that have been populated with metadata are really easy to import into Android Studio, which we will discuss later in the “Model deployment to Android” section.

Benchmark models on Android (Optional)

As an optional step, we used the TFLite Android Model Benchmark tool to get an idea of the models' runtime performance on Android before deploying them.
There are two options for using the benchmark tool: a C++ binary that runs in the background, and an Android APK that runs in the foreground.
Here is a high-level summary using the benchmark C++ binary:
1. Configure Android SDK/NDK prerequisites
2. Build the benchmark C++ binary with bazel

bazel build -c opt \
  --config=android_arm64 \
  tensorflow/lite/tools/benchmark:benchmark_model

3. Use adb (Android Debug Bridge) to push the benchmarking tool binary to the device and make it executable

adb push benchmark_model /data/local/tmp
adb shell chmod +x /data/local/tmp/benchmark_model

4. Push the whitebox_cartoon_gan_dr.tflite model to device

adb push whitebox_cartoon_gan_dr.tflite /data/local/tmp

5. Run the benchmark tool

adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/whitebox_cartoon_gan_dr.tflite \
  --num_threads=4

You will see the benchmark results printed in the terminal. Repeat the above steps for the other two TFLite models: the float16 and int8 variants.
In summary, the benchmark tool gave us average inference times for each variant on a Pixel 4. Refer to the documentation of the benchmark tool (C++ binary | Android APK) for details and additional options, such as how to reduce variance between runs and how to profile operators. You can also see the performance values of some popular ML models in the TensorFlow official documentation here.

Model deployment to Android

Now that we have the quantized TensorFlow Lite models with metadata, either by following the previous steps or by downloading them directly from TensorFlow Hub here, we are ready to deploy them to Android. Follow along with the Android code on GitHub here.
The Android app uses the Jetpack Navigation component for UI navigation and CameraX for image capture. We use the new ML Model Binding feature to import the tflite model, and Kotlin coroutines for async handling of the model inference so that the UI is not blocked while waiting for the results.
Let’s dive into the details step by step:

  • Download Android Studio 4.1 Preview.
  • Create a new Android project and set up the UI navigation.
  • Set up the CameraX API for image capture.
  • Import the .tflite models with ML Model Binding.
  • Putting everything together.

Download Android Studio 4.1 Preview

We first need to install the Android Studio Preview (4.1 Beta 1) in order to use the new ML Model Binding feature, which imports a .tflite model and generates code automatically. You can then explore the tflite models visually and, most importantly, use the generated classes directly in your Android projects.
Download the Android Studio Preview here. You should be able to run the Preview version side by side with a stable version of Android Studio. Make sure to update your Gradle plug-in to at least 4.1.0-alpha10; otherwise the ML Model Binding menu may be inaccessible.

Create a new Android Project

First let’s create a new Android project with an empty Activity called MainActivity.kt, which contains a companion object defining the output directory where the captured image will be stored.
Use Jetpack Navigation Component to navigate the UI of the app. Please refer to the tutorial here to learn more details about this support library.
There are 3 screens in this sample app:

  • PermissionsFragment.kt handles checking the camera permission.
  • CameraFragment.kt handles camera setup, image capture and saving.
  • CartoonFragment.kt handles the display of input and cartoon image in the UI.

The navigation graph in nav_graph.xml defines the navigation of the three screens and data passing between CameraFragment and CartoonFragment.

Set up CameraX for image capture

CameraX is a Jetpack support library which makes camera app development much easier.
The Camera1 API was simple to use but lacked a lot of functionality. The Camera2 API provides finer control than Camera1, but it’s very complex: a very basic example requires almost 1000 lines of code.
CameraX, on the other hand, is much easier to set up, with roughly a tenth of the code. In addition, it’s lifecycle-aware, so you don’t need to write extra code to handle the Android lifecycle.
Here are the steps to set up CameraX for this sample app:

  • Update build.gradle dependencies
  • Use CameraFragment.kt to hold the CameraX code
  • Request camera permission
  • Update AndroidManifest.xml
  • Check permission in MainActivity.kt
  • Implement a viewfinder with the CameraX Preview class
  • Implement image capture
  • Capture an image and convert it to a Bitmap

CameraSelector is configured so that both the front-facing and rear-facing cameras can be used, since the model can stylize any kind of face or object, not just selfies.
Once we capture an image, we convert it to a Bitmap, which is passed to the TFLite model for inference. We then navigate to a new screen, CartoonFragment.kt, where both the original image and the cartoonized image are displayed.

Import the TensorFlow Lite models

Now that the UI code has been completed, it’s time to import the TensorFlow Lite model for inference. ML Model Binding takes care of this with ease. In Android Studio, go to File > New > Other > TensorFlow Lite Model:

  • Specify the .tflite file location.
  • “Auto add build feature and required dependencies to gradle” is checked by default.
  • Make sure to also check “Auto add TensorFlow Lite gpu dependencies to gradle”, since the GAN models are complex and slow and we need to enable the GPU delegate.

This import accomplishes two things:

  • Automatically creates an ml folder and places the .tflite model file under it.
  • Auto-generates a Java class under app/build/generated/ml_source_out/debug/[package-name]/ml, which handles tasks such as model loading, image pre- and post-processing, and running model inference to stylize the input image.

Once the import completes, Android Studio displays the .tflite model’s metadata info as well as code snippets in both Kotlin and Java that can be copied and pasted in order to use the model. Repeat the steps above to import the other two .tflite model variants.

Putting everything together

Now that we have set up the UI navigation, configured CameraX for image capture, and imported the tflite models, let’s put all the pieces together!

  • Model input: capture a photo with CameraX and save it
  • Run inference on the input image and create a cartoonized version
  • Display both the original photo and the cartoonized photo in the UI
  • Use Kotlin coroutine to prevent the model inference from blocking UI main thread

First we capture a photo with CameraX in CameraFragment.kt under imageCapture?.takePicture(); then, in ImageCapture.OnImageSavedCallback{}.onImageSaved(), we convert the .jpg image to a Bitmap, rotate it if necessary, and save it to the output directory defined in MainActivity earlier.
With the Jetpack Navigation component, we can easily navigate to CartoonFragment.kt and pass the image directory location as a string argument and the type of tflite model as an integer. Then, in CartoonFragment.kt, we retrieve the file directory where the photo was stored, create an image file, and convert it to a Bitmap that can be used as the input to the tflite model.
In CartoonFragment.kt, we also retrieve the type of tflite model that was chosen for inference, run model inference on the input image to create a cartoon image, and display both the original image and the cartoonized image in the UI.
Note: the inference takes time, so we use a Kotlin coroutine to prevent the model inference from blocking the UI main thread, and show a ProgressBar until the model inference completes.
Once all the pieces are put together, the app captures a photo and displays the cartoonized images created by the model. This brings us to the end of the tutorial. We hope you have enjoyed reading it and will apply what you learned to your real-world applications with TensorFlow Lite. If you have created any cool samples with what you learned here, please remember to add them to awesome-tflite, a repo with TensorFlow Lite samples, tutorials, tools and learning resources.

Acknowledgments

This Cartoonizer with TensorFlow Lite project, with its end-to-end tutorial, was created through a great collaboration between ML GDEs and the TensorFlow Lite team. It is one of a series of end-to-end TensorFlow Lite tutorials. We would like to thank Khanh LeViet and Lu Wang (TensorFlow Lite), Hoi Lam (Android ML), Trevor McGuire (CameraX) and Soonson Kwon (ML GDEs Google Developers Experts Program) for their collaboration and continuous support.
Also thanks to the authors of the paper Learning to Cartoonize Using White-box Cartoon Representations: Xinrui Wang and Jinze Yu.
When developing applications, it’s important to consider recommended practices for responsible innovation; check out Responsible AI with TensorFlow for resources and tools you can use.

Supercharging the TensorFlow.js WebAssembly backend with SIMD and multi-threading

Posted by Ann Yuan and Marat Dukhan, Software Engineers at Google

In March we introduced a new WebAssembly (Wasm) accelerated backend for TensorFlow.js (scroll further down to learn more about Wasm and why this is important). Today we are excited to announce a major performance update: as of TensorFlow.js version 2.3.0, our Wasm backend has become up to 10X faster by leveraging SIMD (vector) instructions and multithreading via XNNPACK, a highly optimized library of neural network operators.

Benchmarks

SIMD and multithreading bring major performance improvements to our Wasm backend. Below are benchmarks in Google Chrome that demonstrate the improvements on BlazeFace – a light model with 0.1 million parameters and about 20 million multiply-add operations:

(times listed are milliseconds per inference)

Larger models, such as MobileNet V2, a medium-sized model with 3.5 million parameters and roughly 300 million multiply-add operations, attain even greater speedups:

*Note: Benchmarks for the TF.js multi-threaded Wasm backend are not available for Pixel 4 because multi-threading support in mobile browsers is still a work-in-progress. SIMD support in iOS is also still under development.

**Note: Node support for the TF.js multi-threaded Wasm backend is coming soon.

The performance gains from SIMD and multithreading are independent of each other. These benchmarks show that SIMD brings a 1.7-4.5X performance improvement to plain Wasm, and multithreading brings another 1.8-2.9X speedup on top of that.

Usage

SIMD is supported as of TensorFlow.js 2.1.0, and multithreading is supported as of TensorFlow.js 2.3.0.

At runtime we test for SIMD and multithreading support and serve the appropriate Wasm binary. Today we serve a different binary for each of the following cases:

  • Default: The runtime does not support SIMD or multithreading
  • SIMD: The runtime supports SIMD but not multithreading
  • SIMD + multithreading: The runtime supports SIMD and multithreading

Since most runtimes that support multithreading also support SIMD, we decided not to ship a multithreading-only binary, to keep our bundle size down. This means that if your runtime supports multithreading but not SIMD, you will be served the default binary. There are two ways to use the Wasm backend:

  1. With NPM
    // Import @tensorflow/tfjs or @tensorflow/tfjs-core
    const tf = require('@tensorflow/tfjs');
    // Add the WAsm backend to the global backend registry.
    require('@tensorflow/tfjs-backend-wasm');

    // Set the backend to WAsm and wait for the module to be ready.
    tf.setBackend('wasm').then(() => main());

    The library expects the Wasm binaries to be located relative to the main JS file. If you’re using a bundler such as parcel or webpack, you may need to manually indicate the location of the Wasm binaries with our setWasmPaths helper:

    import {setWasmPaths} from '@tensorflow/tfjs-backend-wasm';
    setWasmPaths(yourCustomFolder);
    tf.setBackend('wasm').then(() => {...});

    See the “Using bundlers” section in our README for more information.

  2. With script tags
    <!-- Import @tensorflow/tfjs or @tensorflow/tfjs-core -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>

    <!-- Adds the WAsm backend to the global backend registry -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-wasm/dist/tf-backend-wasm.js"></script>

    <script>
    tf.setBackend('wasm').then(() => main());
    </script>

    NOTE: TensorFlow.js defines a priority for each backend and will automatically choose the best supported backend for a given environment. Today, WebGL has the highest priority, followed by Wasm, then the vanilla JS backend. To always use the Wasm backend, we need to explicitly call tf.setBackend('wasm').

Demo

To see the performance improvements for yourself, check out this demo of our BlazeFace model, which has been updated to use the new Wasm backend: https://tfjs-wasm-simd-demo.netlify.app/ To compare against the unoptimized binary, try this version of the demo, which manually turns off SIMD and multithreading support.

What is Wasm?

WebAssembly (Wasm) is a cross-browser binary format that brings near-native code execution speed to the web. Wasm serves as a compilation target for programs written in statically typed high level languages, such as C, C++, Go, and Rust. In TensorFlow.js, we implement our Wasm backend in C++ and compile with Emscripten. The XNNPACK library provides a heavily optimized implementation of neural network operators underneath.
Wasm has been supported by Chrome, Safari, Firefox, and Edge since 2017, and is supported by 90% of devices worldwide.
The WebAssembly specification is evolving quickly and browsers are working hard to support a growing number of experimental features. You can visit this site to see which features are supported by your runtime, including:

  1. SIMD
    SIMD stands for Single Instruction, Multiple Data, which means that SIMD instructions operate on small fixed-size vectors of elements rather than individual scalars. The Wasm SIMD proposal makes the SIMD instructions supported by modern processors usable inside Web browsers, unlocking significant performance gains.

    Wasm SIMD is a phase 3 proposal, and is available via an origin trial in Chrome 84-86. This means developers can opt in their websites to Wasm SIMD and all their visitors will enjoy its benefits without needing to explicitly enable the feature in their browser settings. Besides Google Chrome, Firefox Nightly supports Wasm SIMD by default.

  2. Multi-threading
    Nearly all modern processors have multiple cores, each of which is able to carry out instructions independently and concurrently. WebAssembly programs can spread their work across cores via the threads proposal for performance. This proposal allows multiple Wasm instances in separate web workers to share a single WebAssembly.Memory object for fast communication between workers.

    Wasm threads is a phase 2 proposal, and has been available in Chrome desktop by default since version 74. There is an ongoing cross-browser effort to enable this functionality for mobile devices as well.

To see which browsers support SIMD, threads, and other experimental features, check out the WebAssembly roadmap.

Other improvements

Since the original launch of our Wasm backend in March, we have extended operator coverage and now support over 70 operators. Many of the new operators are accelerated through the XNNPACK library, and unlock support for additional models, like the HandPose model.

Looking ahead

We expect the performance of our Wasm backend to keep improving. We’re closely following the progress of several evolving specifications in WebAssembly, including flexible vectors for wider SIMD, quasi fused multiply-add, and pseudo-minimum and maximum instructions. We’re also looking forward to ES6 module support for WebAssembly modules. As with SIMD and multithreading, we intend to take advantage of these features as they become available with no implications for TF.js user code.

Acknowledgements

We would like to thank Daniel Smilkov and Nikhil Thorat for laying the groundwork of the WebAssembly backend and the integration with XNNPACK, Matsvei Zhdanovich for collecting Pixel 4 benchmark numbers, and Frank Barchard for implementing low-level Wasm SIMD optimizations in XNNPACK.

Fast Supernovae Detection using Neural Networks

A guest post by Rodrigo Carrasco-Davis & The ALeRCE Collaboration, Millennium Institute of Astrophysics, Chile

Introduction

Astronomy is the study of celestial objects, such as stars, galaxies or black holes. Studying celestial objects is a bit like having a natural physics laboratory – where the most extreme processes in nature occur – and most of them cannot be reproduced here on Earth. Observing extreme events in the universe allows us to test and improve our understanding by comparing what we know about physics to what we observe in the universe.

There is a particular type of event that is very interesting for astronomers that occurs at the end of the life of massive stars. Stars are made by the concentration of hydrogen that is pulled together by gravity, and when the density is high enough, the fusion of hydrogen atoms begins, generating light and creating elements such as helium, carbon, oxygen, neon, etc. The fusion process generates an outwards pressure while gravity causes an inward pressure, maintaining the star stable while it’s burning its fuel. This changes when the star tries to fuse iron atoms, which instead of generating energy must extract energy from the star, causing the core of the star to collapse and a supernova explosion to happen.

Crab Nebula, remnant of a supernova. Space Telescope Science Institute/NASA/ESA/J. Hester/A. Loll (Arizona State University). This image is from hubblesite.org.

This process is very important for astronomers. Due to the extreme conditions during the explosion, astronomers can observe the synthesis of heavy elements, test the behavior of matter under intense pressure and temperature, and also observe the product of the explosion, which could be a neutron star or a black hole.

Supernovae can also be used as standard candles. A typical problem in astronomy is measuring distances to celestial objects. Because stars are very far from the Earth, it is difficult to know whether a star is faint and close to us, or far away and very bright. Most of the supernova explosions in the universe occur in a similar fashion; therefore, astronomers use supernovae to measure distances, which is important for cosmologists studying, for instance, the expansion of the universe and dark energy.

Even though supernova explosions are very bright (compared to the brightness of their own host galaxy), these events are hard to find due to their distance from the Earth, their low occurrence rates (roughly one supernova per galaxy per century), and the transient nature of the explosion, which can last from a few days to a couple of weeks. Also, to obtain useful information from a supernova, it is necessary to perform follow-up, that is, observing the supernova with an instrument called a spectrograph to measure the energy emitted during the explosion at multiple frequencies. Early follow-up is desired because many of the interesting physical processes occur within a few hours of the beginning of the explosion. So how can we find these supernova explosions fast, among all the other observed astronomical objects in the universe?

Astronomy Today

A few decades ago, astronomers had to choose and point at a specific object in the sky to study it. Now, modern telescopes such as the Zwicky Transient Facility (ZTF), which is currently operating, or the upcoming Vera C. Rubin Observatory, take large images of the sky at a very high rate, observing the visible sky every three days and creating a movie of the sky. Today, the ZTF telescope generates 1.4TB of data per night, identifying and sending information about interesting changing objects in the sky in real time.

When something changes its brightness, these telescopes are able to detect the change and generate an alert. These alerts are sent through a data stream, where each alert contains three cropped images of 63 by 63 pixels, called the science, reference and difference images. The science image is the most recent observation of that particular location; the reference (or template) image is usually taken at the beginning of the survey and is used for comparison against the science image. Everything that changed between the science and reference images should appear in the difference image, which is computed by subtracting the reference from the science image after some image processing. The ZTF telescope is currently streaming up to one million alerts per night, around one hundred thousand on average. If a human wanted to check each alert manually, at 3 seconds per alert, reviewing a regular night's worth of alerts would take approximately 3.5 days (100,000 alerts × 3 s ≈ 83 hours).

Science, reference and difference images, from left to right. These three images, plus extra data such as observation conditions and information about the object, make up an alert. The fourth image is a colored version from PanSTARRS using the Aladin Sky Atlas. You can see the full evolution of the supernova's brightness over time in the ALeRCE frontend.

Organizing all the incoming alerts in the stream is a massive task. When a new alert arrives, the type of astronomical objects that generated the alert is not necessarily known. Therefore, we need to check if we already know this object from other observations (cross-match). We also need to figure out which kind of astronomical object generated the alert (classification), and lastly, we need to organize the data and make it available to the community. This is the duty of astronomical broker systems, such as ALeRCE, Lasair, Antares.

Since these alerts are basically everything that changes in the sky, we should be able to find supernovae among all the alerts sent by the ZTF telescope. The problem is that other astronomical objects also produce alerts, such as stars that change their brightness (variable stars), active galactic nuclei (AGNs), asteroids and errors in the measurement (bogus alerts). Fortunately, there are some distinguishable features in the science, reference and difference images that can help us identify which alerts correspond to supernovae and which to other objects. We would like to effectively discriminate among these five classes of objects.

Five classes of astronomical objects that can be separated using only the first alert. These are five examples per class, with science, reference and difference image respectively.

In summary, active galactic nuclei tend to occur at the center of galaxies. Supernovae occur usually close to a host galaxy. Asteroids are observed near the solar system plane, and they do not appear in the template image. Variable stars are found in images crowded with other stars since these are found mostly within the Milky Way. Bogus alerts have different causes, some of them are bad pixels in the camera, bad subtraction to generate the difference image, cosmic rays (very bright, concentrated and sharp regions of the image in the center of the alert), etc. As I mentioned before, there is no way a human could possibly check every alert by hand, so we need an automatic way to classify them so astronomers can check the most interesting sources that are more likely to be a supernova.

Finding Supernovae using Neural Networks

Since we roughly understand the differences between the images of the five mentioned classes, in principle we could compute specific features to correctly classify them. However, handcrafting features is usually very hard and takes a long period of trial and error. This is why we decided to train a convolutional neural network (CNN) to solve the classification problem (Carrasco-Davis et al. 2020). In this work, we used only the first alert to quickly find supernovae.

Our architecture provides rotational invariance by making 90° rotated copies of each image in the training set, to then apply average pooling to the dense representation of each rotated version of the image. Imposing rotational invariance in this problem is very helpful, since there is no particular orientation in which structures may appear in the images of the alert (Cabrera-Vives et al. 2017, E. Reyes et al. 2018). We also added part of the metadata contained in the alert, such as the position in sky coordinates, distance to other known objects, and atmospheric condition metrics. After training the model using cross-entropy, the probabilities were highly concentrated around values of 0 or 1, even in cases when the classifier was wrong in its predicted class. This is not so convenient when an expert further filters supernovae candidates after the model made a prediction. Saturated values of 0 or 1 do not give any insight about the chances of a wrong classification and second or third class guess made by the model.

Therefore, in addition to the cross-entropy term in the loss function, we added an extra term that maximizes the entropy of the prediction, in order to spread the values of the output probabilities (Pereyra et al. 2017). This improves the granularity or definition of the predictions, producing probabilities over the whole range from 0 to 1 instead of concentrated at the extremes, and much more interpretable predictions that assist the astronomer in choosing good supernovae candidates to report for follow-up.
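
As a rough sketch of this kind of objective (following the confidence-penalty formulation of Pereyra et al. 2017; the weight $\beta$ is our notation and its value is not given in the post):

$$\mathcal{L}(\theta) = -\sum_{i} \log p_\theta(y_i \mid x_i) \;-\; \beta \sum_{i} H\big(p_\theta(y \mid x_i)\big), \qquad H(p) = -\sum_{k} p_k \log p_k$$

The first term is the usual cross-entropy; the second rewards higher-entropy (less peaked) output distributions, which spreads the predicted probabilities over the [0, 1] range instead of saturating them at 0 or 1.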

Convolutional neural network with enhanced rotational invariance. Rotated copies for each input are created and fed to the same CNN architecture, to then apply average pooling in the dense layer before concatenating with the metadata. Finally, two other fully connected layers, and a softmax are applied to obtain the predictions.

We performed inference on 400,000 objects uniformly distributed in space over the full coverage of ZTF, as a sanity check of the model predictions. It turns out that each class predicted by the CNN is spatially distributed as expected given the nature of each astronomical object. For instance, AGNs and supernovae (SNe) are mostly found outside the Milky Way plane (extragalactic objects), since it is less likely that distant objects can be seen through the Milky Way plane due to occlusion. The model correctly predicts fewer objects close to the Milky Way plane (Galactic latitudes closer to 0). Variable stars are correctly found with higher density within the Galactic plane. Asteroids are found near the solar system plane, also called the ecliptic (marked as a yellow line), as expected, and bogus alerts are spread everywhere. Running inference on a large unlabeled set gave us very important clues regarding biases in our training set and helped us identify important metadata used by the CNN.

We found that the information within the images (science, reference and difference) is enough to obtain a good classification in the training set, but integrating the information from the metadata was critical to obtain the right spatial distribution of the predictions.

Spatial distribution of unlabeled set of astronomical objects. Each plot is in galactic coordinates. The galactic latitude is centered in the Milky Way, so latitudes closer to 0 are also closer to the Milky Way plane. The galactic longitude indicates which part of the disk we are seeing within the Milky Way plane. The yellow line represents the solar system plane (ecliptic).

Supernova Hunter

A vital part of this project is the web interface that allows astronomers to explore the candidates sorted by our neural network model's certainty of being a supernova. The Supernova Hunter is a visualization tool that shows important information about each alert so the astronomer can choose which objects to report as supernovae. It also has a button to report wrong classifications made by our model, so we can add these hand-labeled examples to the training set and later improve the model.

Supernova Hunter: User interface for exploration of supernovae candidates. It shows a list with the alerts with a high probability of being a supernova. For each alert, the images of the alert, the position of the object and metadata are displayed on the web page.

Using the neural network classifier and the Supernova Hunter, we have been able to confirm 394 supernovae spectroscopically and report 3060 supernova candidates to the Transient Name Server, from June 26, 2019 to July 21, 2020, at a rate of 9.2 supernova candidates reported per day. This rate of discovery is drastically increasing the number of supernovae observed in the early stages of the explosion.

The Future

We are currently working on improving the classification performance of our model to produce better supernova candidates and require less expert assistance to report them. Ideally, we would like a system that is good enough to automatically report every possible supernova candidate with high confidence.

We would also like to extend our model so it can use more than a single stamp. We developed a neural network model that is able to receive a sequence of images instead of a single stamp, so every time a new image is available for a specific object, the model is able to integrate the new arriving information so it can improve the certainty of its prediction for each class.

Another key point of our effort is focused on finding rare objects using outlier detection techniques. This is a crucial task since these new telescopes will possibly reveal new kinds of astronomical objects due to the unprecedented sampling rate and the spatial depth of each observation.

We think this new way of analyzing massive amounts of astronomical data will be not only helpful but necessary. The organization, classification and redistribution of the data for the scientific community is an important part of doing science with astronomical data. This task requires expertise from different fields, such as computer science, astronomy, engineering and mathematics. The construction of new modern telescopes such as The Vera C. Rubin Observatory will drastically change the way astronomers study celestial objects, and as the ALeRCE broker we will be ready to make this possible. For more information, please visit our website, or take a look at our papers: the ALeRCE presentation paper, which describes the complete processing pipeline; the stamp classifier (the work described in this blog post); and the light curve classifier, which provides a more complex classification with a larger taxonomy of classes by using a time series called a light curve.

Announcing TensorFlow Lite Micro support on the ESP32

A guest article by Vikram Dattu, Aditya Patwardhan, Kedar Sovani of Espressif Systems

Introducing ESP32: The Wi-Fi MCU

We are glad to announce TensorFlow Lite Micro support for the ESP32 chipset.

The ESP32 is a Wi-Fi/BT/BLE enabled MCU (micro-controller) that is widely used by hobbyists and makers to build cool and interesting projects that sense or modify real-world data/objects. It is also commonly deployed in smart home appliances like light bulbs, switches, refrigerators, and air conditioners to provide connectivity.
The interesting part of the ESP32 is that it’s a unique SoC that can be used all the way from quick prototypes to high-volume production. A wide community, numerous development kits, and a plethora of tutorials/SDKs make it a great vehicle for quick prototypes in almost any vertical you might be interested in. The all-in-one package (Wi-Fi/BT/MCU) and existing high-volume deployments in the field make it ideal for building end products with.

ESP32 is already being used in a number of smart-home/connected-device projects with a variety of sensors and actuators connected to the microcontroller to sense the environment and act accordingly. With TensorFlow Lite for Microcontrollers executing on ESP32, this opens up scenarios for all kinds of use-cases that are triggered by local inference. ESP32 has 2 CPU cores and a bunch of optimizations, making it easier to run heavy TF Micro workloads. The Wi-Fi backhaul helps to raise remote events and trigger actions based on the inferences made.

Person Detection or a Door-Bell Camera?

As an example, we have modified the person_detection example that you all might be familiar with to make it a smart door-bell camera. We use the ESP-EYE developer kit for this demonstration. Note that this example uses person detection (it detects when a face is in front of the camera), and not person identification (identifying who the person is).

The ESP-EYE dev-kit includes the ESP32 Wi-Fi/BT MCU coupled with a 2MP camera.

In Action

In our example, we will use this camera to observe and send out an email notification if we detect a person in the vicinity.

Building it for yourself

  1. Order the ESP-EYE: You can get the ESP-EYE Development Kit from your favourite distributor, or from here. You will need a USB to micro-USB cable for connecting this to your Windows/Linux/macOS host.
  2. Clone the repository: https://github.com/espressif/tensorflow/
  3. Setup your development host: Setup your development host with toolchains and utilities required to cross-build for ESP32. Follow the instructions of the ESP-IDF get started guide to set up the toolchain and the ESP-IDF itself.
  4. Generate the example: The example project can be generated with the following command:
    make -f tensorflow/lite/micro/tools/make/Makefile TARGET=esp generate_doorbell_camera_esp_project
  5. Build the example:

    a. Go to the example project directory

    cd tensorflow/lite/micro/tools/make/gen/esp_xtensa-esp32/prj/doorbell_camera/esp-idf

    b. Clone the esp32-camera component with following command:

    $ git clone https://github.com/espressif/esp32-camera components/esp32-camera

    c. Configure the camera and the email address:

    idf.py menuconfig

    d. Enter the Camera Pins configuration and SMTP Configuration menus to select the camera details, and also the email details.

    e. Build the example:

    idf.py build
  6. Flash and Run the program: Use the following command to flash and run the program:
    idf.py --port /dev/ttyUSB0 flash monitor
  7. Now, whenever a person’s face is detected, the program will send out an email to the configured email address.

What Next?

Now that you have tried the door bell camera example, you may try the other applications that are part of the TF Micro repository: hello_world and micro_speech.
ESP32 is pretty powerful for a microcontroller. Clocked at 240MHz, with just a single core it can do the detection well under 1 second (roughly ~700ms; additional optimizations are on the way to reduce this even further). This leaves the second core free for other tasks from your application.
The TinyML book is an excellent resource for a thorough understanding of TensorFlow Lite for Microcontrollers.

Introducing TF-Coder, a tool that writes tricky TensorFlow expressions for you!

Posted by Kensen Shi, Google Research

When manipulating tensors, one must keep track of multiple dimensions, tensor shape and DType compatibility, and of course mathematical correctness. Additionally, there are hundreds of TensorFlow operations, and finding the right ones to use can be a challenge.

Instead of coding your tensor manipulation directly, what if you could just demonstrate it through an illustrative example and get the corresponding code automatically? TensorFlow Coder (TF-Coder) makes this possible!

TF-Coder is a program synthesis tool that helps you write TensorFlow code. First, the tool asks for an input-output example of the desired tensor transformation. Then, it runs a combinatorial search to find TensorFlow expressions that perform that transformation. TF-Coder’s output is real TensorFlow code that you can include in your projects.

The following one-minute video introduces TF-Coder, and this Colab notebook allows you to use the TF-Coder tool for your own tensor manipulation problems.

In this blog post, we’ll illustrate various scenarios where TF-Coder can help you write TensorFlow code.

Programming in TensorFlow by example

Suppose you want to “add” an M-element vector with an N-element vector in a broadcasted way to produce an M x N matrix containing all pairwise sums. Instead of digging through TensorFlow documentation to figure out how to do this, you can instead provide an input-output example (using M = 3 and N = 4):

Input tensors, as a dict mapping input variable names to example tensor values:

inputs = {
'rows': [10, 20, 30],
'cols': [1, 2, 3, 4],
}

The desired output tensor, corresponding to the provided input tensors:

output = [[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34]]

Given this information (already entered into the TF-Coder Colab by default), the TF-Coder tool will find the appropriate TensorFlow code automatically in a fraction of a second:

tf.add(cols, tf.expand_dims(rows, 1))

The above problem was pretty simple just to illustrate the idea of programming by example. TF-Coder can be useful for harder problems as well, as we’ll see below.

TF-Coder helps you find the right function to use

Let’s suppose you are working with a numerical feature such as the price of an item. The prices in your dataset have a wide range, e.g., from under $10 to over $1000. If these prices are used directly as features, your model may overfit to specific prices in the training data, and it may also have difficulty with outlier prices during evaluation.

To deal with these issues, you may want to use bucketing to transform the numerical prices into categorical features. For example, using bucket boundaries of [10, 50, 100, 1000] means that prices under $10 should fall into bucket 0, prices between $10 and $50 fall into bucket 1, and so on.

After choosing bucket boundaries, how do you actually map the numerical prices to the bucket indices using TensorFlow? For example, given the following bucket boundaries and item prices:

# Input tensors
boundaries = [10, 50, 100, 1000]
prices = [15, 3, 50, 90, 100, 1001]

you want to compute the bucket number for each item:

# Output tensor
bucketed_prices = [1, 0, 2, 2, 3, 4]

Although TensorFlow comes with various bucketing operations, it may be tricky to figure out which specific operation does this exact kind of bucketing. Since TF-Coder can identify hundreds of Tensor operations by behavior, you can look up the correct operation by providing an input-output example:

# Input-output example
inputs = {
'boundaries': [10, 50, 100, 1000],
'prices': [15, 3, 50, 90, 100, 1001],
}
output = [1, 0, 2, 2, 3, 4]

Within seconds, TF-Coder outputs the following solution:

tf.searchsorted(boundaries, prices, side='right')

This gives us a useful hint, and the documentation for tf.searchsorted confirms that this code indeed performs the bucketing as desired.
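
As a quick sanity check (ours, not part of the TF-Coder output), you can run the suggested expression on the example tensors and compare against the expected buckets:

import tensorflow as tf

boundaries = tf.constant([10, 50, 100, 1000])
prices = tf.constant([15, 3, 50, 90, 100, 1001])

# side='right' places values equal to a boundary into the next bucket.
bucketed_prices = tf.searchsorted(boundaries, prices, side='right')
print(bucketed_prices.numpy())  # [1 0 2 2 3 4]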

TF-Coder helps you combine functions in clever ways

Now let’s consider another problem: compute a 0-1 tensor that identifies the maximum element of each row of the input tensor.

# Input tensor
scores = [[0.7, 0.2, 0.1],
[0.4, 0.5, 0.1],
[0.4, 0.4, 0.2],
[0.3, 0.4, 0.3],
[0.0, 0.0, 1.0]]

# Output tensor
top_scores = [[1, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1]]

Note that if the same largest element appears multiple times within a row, such as in the third row of scores, then only the first such largest element should be marked, so that every row of top_scores has exactly one entry of 1.

Unlike in the last problem, there is no single TensorFlow function that performs this computation. If you search the documentation for “max”, you may find that tf.reduce_max, tf.argmax, and tf.maximum are relevant, but which one should you use? tf.reduce_max produces [0.7, 0.5, 0.4, 0.4, 1.0], tf.argmax produces [0, 1, 0, 1, 2], and tf.maximum isn’t right because it takes two arguments. None of these look close to our desired output.

TF-Coder can help solve tricky problems like this. You can write the problem in the form of an input-output example:

# Input-output example
inputs = {
'scores': [[0.7, 0.2, 0.1],
[0.4, 0.5, 0.1],
[0.4, 0.4, 0.2],
[0.3, 0.4, 0.3],
[0.0, 0.0, 1.0]],
}
output = [[1, 0, 0],
[0, 1, 0],
[1, 0, 0],
[0, 1, 0],
[0, 0, 1]]

TF-Coder uses a combination of tf.one_hot and tf.argmax in a short solution to this problem:

tf.cast(tf.one_hot(tf.argmax(scores, axis=1), 3), tf.int32)

Through a detailed search over combinations of TensorFlow operations, TF-Coder often finds elegant solutions like this, which may simplify and speed up your TensorFlow programs.

TF-Coder helps you write correct code with less debugging

Consider normalizing lists of integer counts into probability distributions by dividing each row by the sum of that row. For instance:

# Input tensor
counts = [[0, 1, 0, 0],
[0, 1, 1, 0],
[1, 1, 1, 1]]

# Output tensor
normalized = [[0.0, 1.0, 0.0, 0.0],
[0.0, 0.5, 0.5, 0.0],
[0.25, 0.25, 0.25, 0.25]]

Even if you know relevant functions to use (tf.reduce_sum followed by tf.divide), writing the correct code is still nontrivial. A first attempt may look like this:

# First attempt
normalized = tf.divide(counts, tf.reduce_sum(counts, axis=1))

Is this right? There are many potential pitfalls to think about:

  • Is the summation axis correct, or should it be axis=0?
  • Are the shapes of counts and tf.reduce_sum(counts, axis=1) compatible for division, or do you need to reshape or transpose either of these?
  • counts and tf.reduce_sum(counts, axis=1) are both tf.int32 tensors. Can tf.int32 tensors be divided, or do you need to cast them to a float DType first?
  • Are the two arguments in the correct order, or should they be swapped?
  • Does the output have type tf.int32, tf.float32, or something else?
  • Is there a simpler or better way that was not considered?

You can give this task to TF-Coder with the following input-output example:

# Input-output example
inputs = {
'counts': [[0, 1, 0, 0],
[0, 1, 1, 0],
[1, 1, 1, 1]],
}
output = [[0.0, 1.0, 0.0, 0.0],
[0.0, 0.5, 0.5, 0.0],
[0.25, 0.25, 0.25, 0.25]]

TF-Coder’s solution is:

tf.cast(tf.divide(counts, tf.expand_dims(tf.reduce_sum(counts, axis=1), axis=1)), tf.float32)

By using TF-Coder to solve this problem, the mental burden of the exercise is reduced. When TF-Coder produces the solution above, it is guaranteed that the code correctly produces the example output when run on the example input. TF-Coder’s solution will also avoid any unnecessary steps. Thus, you can quickly deduce the answers to most of the questions above: an extra tf.expand_dims step is needed to make the shapes compatible for division, and the result of tf.divide must be cast to tf.float32 (in fact tf.divide returns a tf.float64 tensor when dividing two tf.int32 tensors). In this way, TF-Coder helps you write simple and correct code without painful debugging cycles.
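
As a quick check (ours, not part of TF-Coder), running the suggested expression on the example input reproduces the example output and shows why the explicit tf.expand_dims and tf.cast are needed:

import tensorflow as tf

counts = tf.constant([[0, 1, 0, 0],
                      [0, 1, 1, 0],
                      [1, 1, 1, 1]])

row_sums = tf.expand_dims(tf.reduce_sum(counts, axis=1), axis=1)  # shape (3, 1), broadcasts per row
ratio = tf.divide(counts, row_sums)       # dividing two tf.int32 tensors yields tf.float64
normalized = tf.cast(ratio, tf.float32)   # cast to the desired tf.float32 output
print(normalized.numpy())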

Caveats

There are limitations to TF-Coder. It can currently find solutions involving 3-4 operations within a minute of searching, but solutions involving 6 or more operations are too complex to find in a reasonable amount of time. Furthermore, TF-Coder currently does not support complex or string tensors, or RaggedTensors. The full list of supported operations can be found in the Colab notebook.
In addition, TF-Coder only guarantees that its solutions work for the given input-output example. The tool searches for a simple TensorFlow expression that matches the provided input-output example, but sometimes this solution is too simple and doesn’t generalize in the intended way. It can be helpful to make the example as unambiguous as possible, which can often be achieved by adding more numbers to the input and output tensors. Please review TF-Coder’s solutions to ensure that they correctly implement the intended behavior.

Try TF-Coder yourself!

Be sure to give TF-Coder a try! Even experienced TensorFlow users at Google are learning new things with the help of TF-Coder.
You can access the tool using this Colab notebook — no download or installation is required. Follow this tutorial for a detailed walkthrough. You can also take a look at our code and documentation on GitHub and our research paper.

Note: in the Colab tool, we would like to log the problems given to TF-Coder and the resulting solutions, so that we can improve the tool and build a dataset that will accelerate program synthesis research in general, but this data collection is completely optional.

Introducing Danfo.js, a Pandas-like Library in JavaScript

A guest post by Rising Odegua, Independent Researcher; Stephen Oni, Data Science Nigeria

Danfo.js is an open-source JavaScript library that provides high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data. Danfo.js is heavily inspired by the Python Pandas library and provides a similar interface/API. This means that users who are familiar with the Pandas API and know JavaScript can pick it up easily.
One of the main goals of Danfo.js is to bring data processing, machine learning, and AI tools to JavaScript developers. This is in line with our vision, and essentially the vision of the TensorFlow.js team, which is to bring ML to the web. Open-source libraries like NumPy and Pandas revolutionized how easily data can be manipulated in Python, and the many tools built around them drive the thriving ecosystem of ML in Python.

Danfo.js is built on TensorFlow.js: just as NumPy powers Pandas' arithmetic operations, we leverage TensorFlow.js to power our low-level arithmetic operations.

Some of the main features of Danfo.js

Danfo.js is fast. It is built on TensorFlow.js and supports tensors out of the box. This means you can load tensors into Danfo and convert Danfo data structures to tensors. With these two libraries, you have a data processing library (Danfo.js) on one hand and a powerful ML library (TensorFlow.js) on the other.

In the example below, we show you how to create a Danfo DataFrame from a tensor object:

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

let data = tf.tensor2d([[20,30,40], [23,90, 28]])
let df = new dfd.DataFrame(data)
let tf_tensor = df.tensor
console.log(tf_tensor);
tf_tensor.print()

Output:

Tensor {
  kept: false,
  isDisposedInternal: false,
  shape: [ 2, 3 ],
  dtype: 'float32',
  size: 6,
  strides: [ 3 ],
  dataId: {},
  id: 3,
  rankType: '2'
}
Tensor
    [[20, 30, 40],
     [23, 90, 28]]

You can easily convert Arrays, JSONs, or Objects to DataFrame objects for manipulation.

JSON object to DataFrame:

const dfd = require("danfojs-node")

let json_data = [{ A: 0.4612, B: 4.28283, C: -1.509, D: -1.1352 },
                 { A: 0.5112, B: -0.22863, C: -3.39059, D: 1.1632 },
                 { A: 0.6911, B: -0.82863, C: -1.5059, D: 2.1352 },
                 { A: 0.4692, B: -1.28863, C: 4.5059, D: 4.1632 }]
let df = new dfd.DataFrame(json_data)
df.print()

Output:

Object array with column labels to DataFrame:

const dfd = require("danfojs-node")

let obj_data = {'A': ["A1", "A2", "A3", "A4"],
                'B': ["bval1", "bval2", "bval3", "bval4"],
                'C': [10, 20, 30, 40],
                'D': [1.2, 3.45, 60.1, 45],
                'E': ["test", "train", "test", "train"]
               }
let df = new dfd.DataFrame(obj_data)
df.print()

Output:

You can easily handle missing data (represented as NaN) in floating point as well as non-floating point data:

const dfd = require("danfojs-node")

let data = {"Name": ["Apples", "Mango", "Banana", undefined],
            "Count": [NaN, 5, NaN, 10],
            "Price": [200, 300, 40, 250]}
let df = new dfd.DataFrame(data)
let df_filled = df.fillna({columns: ["Name", "Count"],
                           values: ["Apples", df["Count"].mean()]})
df_filled.print()

Output:

Intelligent label-based slicing, fancy indexing, and querying of large data sets:

const dfd = require("danfojs-node")

let data = { "Name": ["Apples", "Mango", "Banana", "Pear"],
             "Count": [21, 5, 30, 10],
             "Price": [200, 300, 40, 250] }

let df = new dfd.DataFrame(data)
let sub_df = df.loc({ rows: ["0:2"], columns: ["Name", "Price"] })
sub_df.print()

Output:

Robust IO tools for loading data from flat files (CSV and delimited), both in full and in chunks:

const dfd = require("danfojs-node")

// read the first 10000 rows
dfd.read_csv("file:///home/Desktop/bigdata.csv", chunk=10000)
    .then(df => {
        df.tail().print()
    }).catch(err => {
        console.log(err);
    })

Robust data preprocessing functions like OneHotEncoders, LabelEncoders, and scalers like StandardScaler and MinMaxScaler are supported on DataFrame and Series:

const dfd = require("danfojs-node")
let data = ["dog","cat","man","dog","cat","man","man","cat"]
let series = new dfd.Series(data)
let encode = new dfd.LabelEncoder()
encode.fit(series)
let sf_enc = encode.transform(series)
let new_sf = encode.transform(["dog","man"])

Output:

Interactive, flexible and intuitive API for plotting DataFrames and Series in the browser:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="https://cdn.jsdelivr.net/npm/danfojs@0.1.1/dist/index.min.js"></script>
    <title>Document</title>
</head>
<body>
    <div id="plot_div"></div>
    <script>
        dfd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv")
            .then(df => {
                var layout = {
                    title: 'A financial chart',
                    xaxis: {title: 'Date'},
                    yaxis: {title: 'Count'}
                }
                new_df = df.set_index({ key: "Date" })
                new_df.plot("plot_div").line({ columns: ["AAPL.Open", "AAPL.High"], layout: layout })
            }).catch(err => {
                console.log(err);
            })
    </script>
</body>
</html>

Output:

Titanic Survival Prediction using Danfo.js and TensorFlow.js
Below we show a simple end-to-end classification task using Danfo.js and TensorFlow.js. We use Danfo.js for loading, manipulating, and preprocessing the dataset, and then export the processed data as tensors.

const dfd = require("danfojs-node")
const tf = require("@tensorflow/tfjs-node")

async function load_process_data() {
    let df = await dfd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")

    // Feature engineering: extract the title from the Name column
    let title = df['Name'].apply((x) => { return x.split(".")[0] }).values
    // replace the Name column in df with the extracted titles
    df.addColumn({ column: "Name", value: title })

    // Label-encode the Sex and Name features
    let encoder = new dfd.LabelEncoder()
    let cols = ["Sex", "Name"]
    cols.forEach(col => {
        encoder.fit(df[col])
        let enc_val = encoder.transform(df[col])
        df.addColumn({ column: col, value: enc_val })
    })

    let Xtrain, ytrain;
    Xtrain = df.iloc({ columns: [`1:`] })
    ytrain = df['Survived']

    // Scale the features with MinMaxScaler
    let scaler = new dfd.MinMaxScaler()
    scaler.fit(Xtrain)
    Xtrain = scaler.transform(Xtrain)

    return [Xtrain.tensor, ytrain.tensor] // return the data as tensors
}

Next, we create a simple neural network using TensorFlow.js.

function get_model() {
    const model = tf.sequential();
    model.add(tf.layers.dense({ inputShape: [7], units: 124, activation: 'relu', kernelInitializer: 'leCunNormal' }));
    model.add(tf.layers.dense({ units: 64, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
    model.add(tf.layers.dense({ units: 1, activation: "sigmoid" }))
    model.summary();
    return model
}

Finally, we perform training by first loading the model and the processed data as tensors, which can be fed directly to the neural network.

async function train() {
    const model = get_model()
    const data = await load_process_data()
    const Xtrain = data[0]
    const ytrain = data[1]

    model.compile({
        optimizer: "rmsprop",
        loss: 'binaryCrossentropy',
        metrics: ['accuracy'],
    });

    console.log("Training started....")
    await model.fit(Xtrain, ytrain, {
        batchSize: 32,
        epochs: 15,
        validationSplit: 0.2,
        callbacks: {
            onEpochEnd: async (epoch, logs) => {
                console.log(`EPOCH (${epoch + 1}): Train Accuracy: ${(logs.acc * 100).toFixed(2)}, Val Accuracy: ${(logs.val_acc * 100).toFixed(2)}\n`);
            }
        }
    });
};

train()

The reader will notice that the API of Danfo.js is very similar to Pandas, and a non-JavaScript programmer can easily read and understand the code. You can find the full source code of the demo above here (https://gist.github.com/risenW/f54e4e5b6d92e7b1b9b1f30e884ca83c).

Closing Remarks

As web-based machine learning has matured, it is imperative to have efficient data science tools built specifically for it. Tools like Danfo.js will enable web-based applications to easily support ML features, thus opening the space to an ecosystem of exciting applications. TensorFlow.js started the revolution by bringing ML capabilities, previously available only in Python, to the web, and we hope to see Danfo.js as an efficient partner in this journey. We can’t wait to see what Danfo.js grows into! Hopefully, it becomes indispensable to the web community as well.

  • Play with Danfo.js on CodePen
  • Link to the official getting started guide
  • Link to Github repository

Introducing TensorFlow Videos for a Global Audience: Japanese

Posted by the TensorFlow Team

When the TensorFlow YouTube channel launched in 2018, we had a vision to inform and inspire developers around the world about what was possible with Machine Learning. With series like Coding TensorFlow showing how you can use it, Made with TensorFlow sharing inspirational stories about what people have done with TensorFlow, and much more, the channel has grown greatly. But we learned an important lesson: our audience is global, and to reach the world effectively, we should provide content in multiple languages, presented by native speakers. Check out the popular Zero to Hero series in Japanese!

Machine Learning: Zero to Hero with TensorFlow

These days, whether you are browsing the web, a newspaper, or a book, it is hard to avoid buzzwords like machine learning and AI. Because the topic comes up in so many fields, there is no shortage of information about it. But what does machine learning actually look like from a developer's point of view? To answer that question, Laurence Moroney of the TensorFlow team created the four-part video series "Machine Learning: Zero to Hero with TensorFlow," based on his popular talk at Google I/O 2019.

In part one, you will learn the difference between traditional programs, which follow explicit rules written in languages such as Java or C++, and machine learning, a system that infers the rules from the data itself. To answer questions such as "what kind of code makes up machine learning?", the video walks through the steps of building a machine learning model using a simple, concrete example. Several of the concepts introduced here are applied again in part two's computer vision video.

Part two explains the basics of computer vision with machine learning, that is, having a computer "see" and recognize different objects. You can also run the code yourself at this link: https://goo.gle/34cHkDk

Part three explains why convolutional neural networks excel in the field of computer vision. Passing an image through the filters used in a convolution captures features that reveal similarities across images. In the video, you can watch the process of applying filters to an image and extracting its features.
You can review the contents of the video at this link: http://bit.ly/2lGoC5f

In part four, you will learn how to build a rock-paper-scissors classifier. Part one explained how difficult it would be to hand-write code that recognizes rock-paper-scissors gestures; by bringing together everything covered in the first three videos, you can build such a classifier yourself simply by creating a neural network that finds patterns in the image pixels, classifies the images, and extracts features using convolutions.

Colab notebook: http://bit.ly/2lXXdw5

Dataset: http://bit.ly/2kbV92O

Did you enjoy the video series? If you would like to see more, please let us know in the feedback!

Introducing Semantic Reactor: Explore NLP in Google Sheets

Posted by Dale Markowitz, Applied AI Engineer

Editor’s note: An earlier version of this article was published on Dale’s blog.

Machine learning can be tricky, so being able to prototype ML apps quickly is a boon. If you’re building a language-powered app — like a video game with characters players can talk to or a customer service bot — the Semantic Reactor is a tool that will help you do just that.

The Semantic Reactor is a new plugin for Google Sheets that lets you run natural language understanding (NLU) models (variations of the Universal Sentence Encoder) on your own data, right from a spreadsheet.
In this post, I’ll show you how to work with the tool and the NLU models it uses, but first, how does NLP actually work? What’s going on under the hood? (Want to skip straight to the tool? Scroll to the next section.)

Understanding Embeddings

What are Word Embeddings?

One simple (but powerful) technique for building natural-language-powered software is to use “embeddings.”

In machine learning, embeddings are a learned way of representing data in space (i.e. points plotted on an n-dimensional grid) such that the distances between points are meaningful. Word vectors are one popular example:
The picture above is a rough visual example of how words can be closer to or further away from each other. Note that the words “Austin,” “Texas,” and “barbecue” have a close relationship with each other, as do “pet” and “dog,” and “walk” and “run.” Each word is represented by a set of coordinates (a vector) and is placed on a graph where we can see relationships. For instance, we can see that the word “rat” is close to both “pet” and “cat”.

Where do these numbers come from? They’re learned by a machine learning model from large amounts of conversational and language data. By seeing all those examples, the model learns which words tend to occur in the same spots in sentences.

Consider these two sentences:

  • “My mother gave birth to a son.”
  • “My mother gave birth to a daughter.”

Because the words “daughter” and “son” are often used in similar contexts, the model will learn that they should be represented close to each other in space. Word embeddings are useful in natural language processing. They can be used to find synonyms (“semantic similarity”), to solve analogies, or as a preprocessing step for a more complicated model. You can quickly train your own basic word embeddings with TensorFlow here.
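
If you are curious what "learning" an embedding looks like in TensorFlow, here is a minimal Keras sketch of our own (the vocabulary size, embedding dimension, and data names are arbitrary): an Embedding layer learns an 8-dimensional vector per word while a small classifier trains on top of it.

import tensorflow as tf

# A toy text classifier: the Embedding layer learns an 8-dimensional vector
# for each of 10,000 vocabulary words as the model trains.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=8),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(padded_word_id_sequences, labels, epochs=10)  # your own data here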

What are Sentence Embeddings?

It turns out that entire sentences (and even short paragraphs) can be effectively embedded in space too, using a type of model called a universal sentence encoder. Using sentence embeddings, we can figure out if two sentences are similar. This is useful, for example, if you’re building a chatbot and want to know if a question a user asked (e.g. “When will you wake me up?”) is semantically similar to a question you – the chatbot programmer – have anticipated and written a response to (“What time is my alarm?”).

Semantic Reactor: Prototype using NLP in a Google Sheet

Alright, now onto the fun part: Building things! There are three NLP models available in the Semantic Reactor:

  • Local – A small TensorFlow.js version of the Universal Sentence Encoder that can run entirely within a webpage.
  • Basic Online – A full-sized, general-use version of the Universal Sentence Encoder.
  • Multilingual Online – A full-sized Universal Sentence Encoder model trained on question/answer pairs in 16 languages.

Each model offers two ranking methods:

  • Semantic Similarity: How similar are two blocks of text?

    Great for applications where you can anticipate what users might ask, like an FAQ bot. (Many customer service bots use semantic similarity to help deliver good answers to users.)

  • Input / Response: How good of a response is one block of text to another?

    Useful for when you have a large, and constantly changing, set of texts and you don’t know what users might ask. For instance, Talk to Books, a semantic search tool for a regularly updated collection of 100,000 books, uses input / response.

You can use the Semantic Reactor to test a response list against each model and ranking method. Sometimes it takes a good bit of experimenting before you land on a response list and model selection that you think will work for your application. The good news is that doing that work in a Google Sheet makes it fast and easy.
Once you have your response list, model selection and ranking method decided on, you can then begin writing code, and if you want to keep all operations within a website or on device (without requiring online API calls), you can use the newly updated TensorFlow.js model.
As mentioned, there are lots of great uses for NLU tech, and more interesting applications come out almost every day. Every digital assistant, customer service bot, and search engine is likely using some flavor of machine learning. Smart Reply and Smart Compose in Gmail are two widely used features that make good use of semantic tech.
However, it’s fun and helpful to play with the tech within applications where the quality demands aren’t so high, where failure is okay and even entertaining. To that end, we’ve used the same tech that’s within the Semantic Reactor to create a couple of example games. Semantris is a word association game that uses the input-response ranking method, and The Mystery of the Three Bots uses semantic similarity.
Playing those two games, and finding out where they work and where they don’t, might give you ideas on what experiences you might create.

Semantris, a word-association game powered by word embeddings.
The Mystery of the Three Bots is a simple game powered by NLU and available as open source code. (It’s also playable here.)

One of the coolest applications of this tech comes from Anna Kipnis, a former game designer at Double Fine who now works with Stadia. She used Semantic Reactor to prototype a video game world that infers how the environment should react to player inputs using ML. Check out our conversation here.
In Anna’s game, players interact with a virtual fox by asking any question they think of:

  • “Fox, can I have some coffee?”

Then, using Semantic ML, the game engine (or the utility system) considers all of the possible ways the game might respond:

  • “Fox turns on lights.”
  • “Fox turns on radio.”
  • “Fox move to you.”
  • “Fox brings you mug.”

Using a sentence encoder model, the game decides what the best response is and executes it (in this case, the best response is “Fox brings you a mug,” so the game animates the Fox bringing you a mug). If that sounds a little abstract, definitely watch the video linked above.
Let’s see how you might build something like Anna’s game with Semantic Reactor (for all the nitty gritties of the fox demo, check out her original post).
First, create a new Google sheet and write some sentences in the first column. I put these sentences in the first column of my Google sheet:

  • I grab a ball
  • I go to you
  • I play with a ball
  • I go to school.
  • I go to the mug.
  • I bring you the mug.
  • I turn on music.
  • I take a nap.
  • I go for a hike.
  • I tell you a secret.
  • I snuggle with you.
  • I ask for a belly rub.
  • I send a text.
  • I answer the phone.
  • I make a sandwich.
  • I drink some water.
  • I play a board game.
  • I do some coding.

You’ll have to use your imagination here and think of these “actions” that a potential character (e.g. a chatbot or an actor in a video game) might take.
Once you’ve applied for and been given access to Semantic Reactor, you’ll be able to enable it by clicking “Add-ons -> Semantic Reactor -> Start”. Clicking “Start” will open a panel that allows you to type in an input and hit “React”.
When you hit “React”, Semantic Reactor uses a model to embed all of the responses you’ve written in that first column, calculate a score (how good a response is this sentence to the query?), and sort the results. For example, when my input was “I want some coffee,” the top-ranked responses from my spreadsheet were “I go to the mug” and “I bring you the mug.”
You’ll also notice that there are two different ways to rank sentences using this tool: “Input/Response” and “Semantic Similarity.” As the name implies, the former ranks sentences by how good they are as responses to the given query, whereas “Semantic Similarity” simply rates how similar the sentences are to the query.

From Spreadsheet to Code with TensorFlow.js

Underneath the hood, Semantic Reactor is powered by the open-source TensorFlow.js models found here.
Let’s take a look at how to use those models in JavaScript, so that you can convert your spreadsheet prototype into a working app.
1 – Create a new Node project and install the module:

npm init
npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

2 – Create a new file (use_demo.js) and require the library:

require('@tensorflow/tfjs');
const encoder = require('@tensorflow-models/universal-sentence-encoder');

3 – Load the model:

const model = await encoder.loadQnA();

4 – Encode your sentences and query:

const input = {
  queries: ["I want some coffee"],
  responses: [
    "I grab a ball",
    "I go to you",
    "I play with a ball",
    "I go to school.",
    "I go to the mug.",
    "I bring you the mug."
  ]
};

const embeddings = await model.embed(input);

5 – Voila! You’ve transformed your responses and query into vectors. Unfortunately, vectors are just points in space. To rank the responses, you’ll want to compare the query’s point with each response’s point; a simple way to do that is to compute the dot product between the query embedding and each response embedding, which serves as a relevance score here (the higher the score, the better the response):

// zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
const zipWith = (f, xs, ys) => {
  const ny = ys.length;
  return (xs.length <= ny ? xs : xs.slice(0, ny)).map((x, i) => f(x, ys[i]));
}

// Calculate the dot product of two vector arrays.
const dotProduct = (xs, ys) => {
  const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

  return xs.length === ys.length
    ? sum(zipWith((a, b) => a * b, xs, ys))
    : undefined;
}

// Score each response against the query and print the ranking.
// (The QnA model's embed() returns `queryEmbedding` and `responseEmbedding` tensors.)
const queryVec = embeddings.queryEmbedding.arraySync()[0];
const responseVecs = embeddings.responseEmbedding.arraySync();
const ranking = input.responses.map((response, i) => ({
  response,
  score: dotProduct(queryVec, responseVecs[i]),
}));
console.log(ranking);

If you run this code, you should see output like:

[
  { response: 'I grab a ball', score: 10.788130270345432 },
  { response: 'I go to you', score: 11.597091717283469 },
  { response: 'I play with a ball', score: 9.346379028479209 },
  { response: 'I go to school.', score: 10.130473646521292 },
  { response: 'I go to the mug.', score: 12.475453722603106 },
  { response: 'I bring you the mug.', score: 13.229019199245684 }
]

Check out the full code sample here.
And that’s it–that’s how you go from a Semantic ML spreadsheet to code fast!
An earlier version of this post was published at https://daleonai.com/semantic-ml.

Optimizing Peptides in TensorFlow 2

A guest post by Somesh Mohapatra, Rafael Gómez-Bombarelli of MIT

Introduction

A polymer is a material made up of long repeating chains of molecules, like plastic or rubber. Polymers are made up of subunits (monomers) that are chemically bound to one another. The chemical composition and arrangement of monomers dictate the properties of the polymer. A few examples of polymers in everyday use are water bottles, non-stick teflon coatings, and adhesives.

Figure 1. Conceptually, you can think of Peptimizer as generating a sequence of amino acids, then predicting a property of the peptide, then optimizing the sequence.

Peptides are short polymer chains made up of amino acids, analogous to words composed of letters. They are widely used for therapeutic applications, such as for the delivery of gene therapy by cell-penetrating peptides. Thanks to their modular chemistry amenable to automated synthesis and expansive design space, peptides are increasingly preferred over more conventional small molecule drugs, which are harder to synthesize. However, the vast sequence space (in terms of the amino acid arrangement) acts as an impediment to the design of functional peptides.

Synthetic accessibility, apart from functionality optimization, is a challenge. Peptides and other functional polymers with a precise arrangement of monomers are synthesized using methods such as flow chemistry. The synthesis involves monomer-by-monomer addition to a growing polymer chain. This process necessitates a high reaction yield for every step, thus making the accessibility of longer chains challenging.

Conventional approaches for optimization of functional polymers, such as peptides, in a lab environment involve the heuristic exploration of chemical space by trial-and-error. However, the number of possible polymers grows exponentially as m^n, where m is the number of possible monomers and n is the polymer length.

As an alternative to doing an experiment in a lab, you can design functional polymers using machine learning. In our work on optimizing cell-penetrating activity and synthetic accessibility, we design peptides using Peptimizer, a machine learning framework based on TensorFlow. Conceptually, you can think of Peptimizer as generating a sequence of amino acids, then predicting a property of the peptide, then optimizing the sequence.

Peptimizer can be used to optimize both functionality (beyond cell-penetrating activity) and the synthetic accessibility of polymers. We use topological representations of monomers (amino acids) and matrix representations of polymer chains (peptide sequences) to develop interpretable machine learning models, i.e., models that can attribute a gain in a property to a specific monomer and/or chemical substructure. The choice of representation and model architecture enables inference of biochemical design principles, such as monomer composition, sequence length, or net charge of the polymer, by using gradient-based attribution methods.

Key challenges for applying machine learning to advance functional peptide design include limited dataset size (usually less than 100 data points), choosing effective representations, and the ability to explain and interpret models.

Here, we use a dataset of peptides received from our experimental collaborators to demonstrate the utility of the codebase.

Optimization of functionality

Based on our work on designing novel and highly efficient cell-penetrating peptides, we present a framework for the discovery of functional polymers (Figure 1). The framework consists of a recurrent neural network generator, convolutional neural network predictor, and genetic algorithm optimizer.

The generator is trained on a dataset of peptide sequences using Teacher Forcing, and enables sampling of novel sequences similar to the ones in the training dataset. The predictor is trained over matrix representations of sequences and experimentally determined biological activity. The optimizer is seeded with sequences sampled utilizing the generator. It optimizes by evaluating an objective function that involves the predicted activity and other parameters such as length and arginine content. The outcome is a list of optimized sequences with high predicted activity, which may be validated in wet-lab experiments.
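
As a rough sketch of what such a predictor could look like (our illustration, not Peptimizer's actual architecture; the input dimensions and layer sizes are arbitrary), a 1D convolutional network over matrix-encoded sequences might be built as follows:

import tensorflow as tf

# Illustrative only: a small 1D CNN that maps a matrix-encoded peptide
# (sequence_length x fingerprint_dim) to a predicted activity score.
def build_predictor(sequence_length=30, fingerprint_dim=128):
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu",
                               input_shape=(sequence_length, fingerprint_dim)),
        tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

predictor = build_predictor()
predictor.compile(optimizer="adam", loss="mse")

In Peptimizer itself, the predictor is trained on matrix representations of sequences paired with experimentally determined activity, as described above.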

Each of these components can be accessed from the tutorial notebook to train on a custom dataset. The scripts for the individual components have been designed in a modular fashion and can be modified with relative ease.

Optimization of synthetic accessibility

Apart from functionality optimization, Peptimizer allows for the optimization of synthetic accessibility of a wild-type sequence (Figure 2). The framework consists of a multi-modal convolutional neural network predictor and a brute force optimizer. The predictor is trained over experimental synthesis parameters such as pre-synthesized chain, incoming monomer, temperature, flow rate, and catalysts. The optimizer evaluates single-point mutants of the wild-type sequence for higher theoretical yield.

The choice of a brute force optimizer for the optimization of synthetic accessibility is based on the linearly growing sequence space (m × n) for variations of the wild-type sequence. This sequence space is relatively small in comparison to the exponentially growing sequence space (m^n) encountered in the optimization of functionality.
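
To make the difference concrete (our numbers, for illustration only): with m = 20 amino acids and a wild-type sequence of length n = 30, the single-point mutants number only 20 × 30 = 600 sequences, whereas the full sequence space contains 20^30 ≈ 10^39 possibilities.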

This framework may be adapted for other stepwise chemical reaction platforms with in-line monitoring by specifying the different input and output variables and respective data types. It can be accessed using a tutorial notebook.

Figure 2. Outline of synthetic accessibility optimization.

Interpretability of models

A key feature of Peptimizer is gradient-based attribution for interpreting model predictions (Figure 3). Taking the gradient of the predicted activity with respect to the input sequence representation, we visualize both positive and negative activations for each input feature. Fingerprint indices corresponding to substructures that contribute positively to the activity have higher activation in the heatmap. This activation heatmap is averaged along the topological-fingerprint axis to find key substructures or chemical motifs that contribute positively or negatively to the predicted activity. Averaging over the monomer-position axis, we obtain the relative contribution of each monomer to the predicted functionality of the polymer. These visualizations provide in-depth insight into sequence-activity relationships and add to the contemporary understanding of biochemical design principles.
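
In TensorFlow 2, this style of attribution can be sketched with tf.GradientTape. The snippet below is our simplified illustration, not Peptimizer's code; the predictor and the input encoding are assumed to follow the shapes described above:

import tensorflow as tf

def attribution_map(predictor, x):
    """Gradient of the predicted activity with respect to the input features.

    x is a (sequence_length, fingerprint_dim) matrix encoding one peptide.
    """
    x = tf.convert_to_tensor(x, dtype=tf.float32)[tf.newaxis, ...]  # add batch dim
    with tf.GradientTape() as tape:
        tape.watch(x)
        activity = predictor(x)
    grads = tape.gradient(activity, x)[0]             # (sequence_length, fingerprint_dim)
    per_substructure = tf.reduce_mean(grads, axis=0)  # averaged over monomer positions
    per_monomer = tf.reduce_mean(grads, axis=1)       # averaged over fingerprint indices
    return grads, per_substructure, per_monomer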

Figure 3. (left) Positive gradient activation heatmap, and (right) activated chemical substructure, for functional peptide sequence.

Outlook

Optimization of functional polymers using Peptimizer can inform experimental strategies and lead to significant savings in terms of time and costs. We believe that the tutorial notebooks will help bench scientists in chemistry, materials science, and the broader field of sequence design to run machine learning models over custom datasets, such as Khazana. In addition, the attribution methods will provide insights into the high-dimensional sequence-activity relationships and elucidation of design principles.

Experimental collaboration

This work was done in collaboration with the lab of Bradley Pentelute (Department of Chemistry, MIT). The collaborators for the optimization of functionality and synthetic accessibility were Carly Schissel and Dr. Nina Hartrampf, respectively. We thank them for providing the dataset, experimental validation, and the discussion during the development of the models.

Acknowledgment

We would like to acknowledge the support of Thiru Palanisamy and Josh Gordon at Google for their help with the blog post collaboration and with providing active feedback.

Even Faster Mobile GPU Inference with OpenCL

Posted by Juhyun Lee and Raman Sarokin, Software Engineers

While the TensorFlow Lite (TFLite) GPU team continuously improves the existing OpenGL-based mobile GPU inference engine, we also keep investigating other technologies. One of those experiments turned out to be quite successful, and we are excited to announce the official launch of our OpenCL-based mobile GPU inference engine for Android, which offers up to ~2x speedup over the existing OpenGL backend on reasonably sized neural networks that have enough workload for the GPU.

Figure 1. Duo’s AR effects are powered by our OpenCL backend.

Improvements over the OpenGL Backend

Historically, OpenGL is an API designed for rendering vector graphics. Compute shaders were added with OpenGL ES 3.1, but its backward-compatible API design decisions limited us from reaching the full potential of the GPU. OpenCL, on the other hand, was designed for computation with various accelerators from the beginning and is thus more relevant to our domain of mobile GPU inference. Therefore, we looked into an OpenCL-based inference engine, and it brings quite a lot of features that let us optimize our mobile GPU inference engine.

Performance Profiling: Optimizing the OpenCL backend was much easier than OpenGL, because OpenCL offers good profiling features and Adreno supports them well. With these profiling APIs, we are able to measure the performance of each kernel dispatch very precisely.

Optimized Workgroup Sizes: We have observed that the performance of TFLite GPU on Qualcomm Adreno GPUs is very sensitive to workgroup sizes; picking the right workgroup size can boost performance, whereas picking the wrong one can degrade it by an equal amount. Unfortunately, picking the right workgroup size is not trivial for complex kernels with complicated memory access patterns. With the help of the aforementioned performance profiling features in OpenCL, we were able to implement an optimizer for workgroup sizes, which resulted in up to a 50% speedup over the average.

Native 16-bit Precision Floating Point (FP16): OpenCL supports FP16 natively and requires the accelerator to specify the data type’s availability. Being a part of the official spec, even some of the older GPUs, e.g. Adreno 305 from 2012, can operate at their full capabilities. OpenGL, on the other hand, relies on hints which the vendors can choose to ignore in their implementations, leading to no performance guarantees.

Constant Memory: OpenCL has a concept of constant memory. Qualcomm added a physical memory with properties that make it ideal for use with OpenCL’s constant memory. This turned out to be very efficient for certain special cases, e.g. very thin layers at the beginning or at the end of the neural network. OpenCL on Adreno is able to greatly outperform OpenGL thanks to the synergy between this physical constant memory and the aforementioned native FP16 support.

Performance Evaluation

Below, we show the performance of TFLite on the CPU (single-threaded on a big core), on the GPU using our existing OpenGL backend, and on the GPU using our new OpenCL backend. Figure 2 and Figure 3 depict the performance of the inference engine on select Android devices with OpenCL on two well-known neural networks, MNASNet 1.3 and SSD MobileNet v3 (large), respectively. Each group of three bars should be read independently; it shows the relative speedup among the TFLite backends on a single device. Our new OpenCL backend is roughly twice as fast as the OpenGL backend, and it does particularly well on Adreno devices (annotated with SD), because we have tuned the workgroup sizes with Adreno’s performance profilers mentioned earlier. Also, the difference between Figure 2 and Figure 3 shows that OpenCL performs even better on larger networks.

Figure 2. Inference latency of MNASNet 1.3 on select Android devices with OpenCL.
Figure 3. Inference latency of SSD MobileNet v3 (large) on select Android devices with OpenCL.

Seamless Integration through the GPU Delegate

One major hurdle in employing the OpenCL inference engine is that OpenCL is not a part of the standard Android distribution. While major Android vendors include OpenCL as part of their system library, it is possible that OpenCL is not available for some users. For these devices, one needs to fall back to the OpenGL backend which is available on every Android device.

To make developers’ lives easier, we have added a couple of modifications to the TFLite GPU delegate. At runtime, we first check the availability of OpenCL. If it is available, we employ the new OpenCL backend, as it is much faster than the OpenGL backend; if it is unavailable or cannot be loaded, we fall back to the existing OpenGL backend. In fact, the OpenCL backend has been in the TensorFlow repository since mid-2019 and is seamlessly integrated through the TFLite GPU delegate v2, so you might already be using it through the delegate’s fallback mechanism.

Acknowledgements

Andrei Kulik, Matthias Grundman, Jared Duke, Sarah Sirajuddin, and special thanks to Sachin Joglekar for his contributions to this blog post.