CircularNet: Reducing waste with Machine Learning

Posted by Sujit Sanjeev, Product Manager, Robert Little, Sustainability Program Manager, Umair Sabir, Machine Learning Engineer

Have you ever been confused about how to file your taxes? Perplexed when assembling furniture? Unsure about how to understand your partner? It turns out that many of us find the act of recycling more confusing than all of the above. As a result, we do a poor job of recycling right: less than 10% of our global resources are recycled, and about 1 of every 5 items (~17%) tossed into a recycling bin shouldn't be there. That's bad news for everyone: recycling facilities catch fire, we lose billions of dollars in recyclable material every year, and at an existential level, we miss an opportunity to leverage recycling as an impactful tool to combat climate change. With this context in mind, we asked ourselves: how might we use the power of technology to ensure that we recycle more and recycle right?

As the world population grows and urbanizes, waste production is estimated to reach 2.6 billion tons a year in 2030, an increase from its current level of around 2.1 billion tons. Efficient recycling strategies are critical to foster a sustainable future.

The facilities where our waste and recyclables are processed are called “Material Recovery Facilities” (MRFs). Each MRF processes tens of thousands of pounds of our societal “waste” every day, separating valuable recyclable materials like metals and plastics from non-recyclable materials. A key inefficiency within the current waste capture and sorting process is the inability to identify and segregate waste into high quality material streams. The accuracy of the sorting directly determines the quality of the recycled material; for high-quality, commercially viable recycling, the contamination levels need to be low. Even though the MRFs use various technologies alongside manual labor to separate materials into distinct and clean streams, the exceptionally cluttered and contaminated nature of the waste stream makes automated waste detection challenging to achieve, and the recycling rates and the profit margins stay at undesirably low levels.

Enter what we call “CircularNet”, a set of models that lowers barriers to AI/ML tech for waste identification and all the benefits this new level of transparency can offer.

Our goal with CircularNet is to develop a robust and data-efficient model for waste/recyclables detection, which can support the way we identify, sort, manage, and recycle materials across the waste management ecosystem. Models such as this could potentially help with:

  • Better understanding and capturing more value from recycling value chains
  • Increasing landfill diversion of materials
  • Identifying and reducing contamination in inbound and outbound material streams

Challenges

Processing tens of thousands of pounds of material every day, Material Recovery Facility waste streams present a unique and ever-changing challenge: a complex, cluttered, and diverse flow of materials at any given moment. Additionally, there is a lack of comprehensive and readily accessible waste imagery datasets to train and evaluate ML models.

The models should be able to accurately identify different types of waste in “real world” conditions of a MRF – meaning identifying items despite severe clutter and occlusions, high variability of foreground object shapes and textures, and severe object deformation.

In addition to these challenges, others that need to be addressed are visual diversity of foreground and background objects that are often severely deformed, and fine-grained differences between the object classes (e.g. brown paper vs. cardboard; or soft vs. rigid plastic).

There also needs to be consistency while tracking recyclables through the recycling value chain e.g. at point of disposal, within recycling bins and hauling trucks, and within material recovery facilities.

Solution

The CircularNet model is built to perform instance segmentation by training on thousands of images with the Mask R-CNN algorithm. Mask R-CNN was implemented from the TensorFlow Model Garden, which is a repository consisting of multiple models and modeling solutions for TensorFlow users.

By collaborating with experts in the recycling industry, we developed a customized and globally applicable taxonomy of material types (e.g., “paper”, “metal”, “plastic”) and material forms (e.g., “bag”, “bottle”, “can”), which is used to annotate training data for the model. Models were developed to identify material types, material forms, and plastic types (HDPE, PETE, etc.). Separate models were trained for different purposes, which helps achieve better accuracy when harmonized and provides the flexibility to cater to different applications. The models are trained with various backbones such as ResNet, MobileNet, and SpineNet.

To train the model on distinct waste and recyclable items, we have collaborated with several MRFs and have started to accumulate real-world images. We plan to continue growing the number and geographic locations of our MRF and waste management ecosystem partnerships in order to continue training the model across diverse waste streams.

Here are a few details on how our model was trained.

  • Data importing, cleaning and pre-processing
    • Once the data was collected, the annotation files had to be converted into COCO JSON format. All noise, errors and incorrect labels were removed from the COCO JSON file. Corrupt images were also removed both from the COCO JSON and dataset to ensure smooth training.
    • The final file was then converted to the TFRecord format for faster training
  • Training
    • Mask RCNN was trained using the Model Garden repository on Google Cloud Platform.
    • Hyperparameter optimization was done by varying the image size, batch size, learning rate, number of training steps, epochs, and data augmentation steps
  • Model conversion 
    • The final checkpoints produced by training were converted to both SavedModel and TFLite formats to support server-side and edge deployments (see the conversion sketch after this list)
  • Model deployment 
    • We are deploying the model on Google Cloud for server side inferencing and on edge computing devices
  • Visualization
    • Three ways in which the CircularNet model characterizes recyclables: Form, Material, & Plastic Type


      • Model identifying the material type (Ex. “Plastic”)
      • Model identifying the product form of the material (Ex. “Bottle”)
      • Model identifying the types of plastics (Ex. “HDPE”)
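
    To make the model conversion step concrete, here is a minimal sketch of converting an exported SavedModel to TFLite with the standard TensorFlow Lite converter; the paths and filenames are placeholders rather than the actual CircularNet export locations.

    import tensorflow as tf

    # Convert an exported SavedModel to TFLite for edge deployments.
    # "exported_model/saved_model" and "circularnet.tflite" are hypothetical paths.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("circularnet.tflite", "wb") as f:
        f.write(tflite_model)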

    How to use the CircularNet model

    All of the models are available in the TensorFlow Model Garden repository, along with guides and their respective Colab scripts for pre-processing, training, model conversion, inference, and visualization. Pre-trained models for direct use from servers, browsers, or mobile devices are available on TensorFlow Hub.
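
    As a quick sanity check before wiring a downloaded model into your own pipeline, you can load the SavedModel and inspect its serving signature. This is a generic sketch; the path below is a placeholder.

    import tensorflow as tf

    # Load a downloaded CircularNet SavedModel (placeholder path) and inspect
    # the serving signature to see the expected inputs and returned outputs.
    model = tf.saved_model.load("circularnet_saved_model")
    infer = model.signatures["serving_default"]
    print(infer.structured_input_signature)
    print(infer.structured_outputs)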

    Conclusion

    We hope the model can be deployed, tinkered with, and improved upon by various stakeholders across the waste management ecosystem. We are in the early days of model development. By collaborating with a diverse set of stakeholders throughout the material recovery value chain, we can create a more globally applicable model. If you are interested in collaborating with us on this journey, please reach out to waste-innovation-external@google.com.

    Acknowledgement

    A huge thank you to everyone whose hard work made this project possible! We couldn’t have done this without partnering with the recycling ecosystem.

    Special thanks to Mark McDonald, Fan Yang, Vighnesh Birodkar and Jeff Rechtman

    Read More

    Building a reinforcement learning agent with JAX, and deploying it on Android with TensorFlow Lite

    Posted by Wei Wei, Developer Advocate

    In our previous blog post Building a board game app with TensorFlow: a new TensorFlow Lite reference app, we showed you how to use TensorFlow and TensorFlow Agents to train a reinforcement learning (RL) agent to play a simple board game ‘Plane Strike’. We also converted the trained model to TensorFlow Lite and then deployed it into a fully-functional Android app. In this blog, we will demonstrate a new path: train the same RL agent with Flax/JAX and deploy it into the same Android app we have built before. The complete code has been open sourced in the tensorflow/examples repository for your reference.

    To refresh your memory, our RL-based agent needs to predict a strike position based on the human player’s board position so that it can finish the game before the human player does. For more detailed game rules, please refer to our previous blog.

    Demo game play in ‘Plane Strike’

    Background: JAX and TensorFlow

    JAX is a NumPy-like library developed by Google Research for high performance computing. It uses XLA to compile programs optimized for GPUs and TPUs. Flax is a popular neural network library built on top of JAX. Researchers have been using JAX/Flax to train very large models with billions of parameters (such as PaLM for language understanding and generation, or Imagen for image generation), making full use of modern hardware. If you’re new to JAX and Flax, start with this JAX 101 tutorial and this Flax Getting Started example.

    TensorFlow started as a library for ML towards the end of 2015 and has since become a rich ecosystem that includes tools for productionizing ML pipelines (TFX), data visualization (TensorBoard), deploying ML models to edge devices (TensorFlow Lite), and running models in a web browser or on any device capable of executing JavaScript (TensorFlow.js). Models developed in JAX or Flax can tap into this rich ecosystem by first being converted to the TensorFlow SavedModel format, and then using the same tooling as if they had been developed in TensorFlow natively.

    If you already have a JAX-trained model and want to deploy it today, we have put together a list of resources for you:

    • This blog post demonstrates how to convert a Flax/JAX model to TFLite and run it in a native Android app

    Overall, no matter what your deployment target is (server, web, or mobile), we’ve got you covered.
    Implementing the game agent with Flax/JAX

    Coming back to our board game, to implement our RL agent, we will leverage the same gym environment as before. We will train the same policy gradient model using Flax/JAX this time. Recall that mathematically the policy gradient is defined as:

    \nabla_\theta J(\theta) = \mathbb{E}\Big[\sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(s_t, a_t)\Big]

    where:

    • T: the number of timesteps per episode, which can vary per episode
    • s_t: the state at timestep t
    • a_t: the action chosen at timestep t, given state s_t
    • π_θ: the policy parameterized by θ
    • R(*): the reward gathered, given the policy

    We define a 3-layer MLP as our policy network, which predicts the agent’s next strike position.

    class PolicyGradient(nn.Module):
      """Neural network to predict the next strike position."""

      @nn.compact
      def __call__(self, x):
        dtype = jnp.float32
        x = x.reshape((x.shape[0], -1))
        x = nn.Dense(
            features=2 * common.BOARD_SIZE**2, name='hidden1', dtype=dtype)(
                x)
        x = nn.relu(x)
        x = nn.Dense(features=common.BOARD_SIZE**2, name='hidden2', dtype=dtype)(x)
        x = nn.relu(x)
        x = nn.Dense(features=common.BOARD_SIZE**2, name='logits', dtype=dtype)(x)
        policy_probabilities = nn.softmax(x)
        return policy_probabilities

    In our main training loop, in each iteration we use the neural network to play a round of the game, gather the trajectory information (game board positions, actions taken and rewards), discount the rewards, and then train the model with the trajectories.

    for i in tqdm(range(iterations)):
      predict_fn = functools.partial(run_inference, params)
      board_log, action_log, result_log = common.play_game(predict_fn)
      rewards = common.compute_rewards(result_log)
      optimizer, params, opt_state = train_step(optimizer, params, opt_state,
                                                board_log, action_log, rewards)

    In the train_step() method, we first compute the loss using the trajectories. Then we use jax.grad() to compute the gradients. Lastly we use Optax, a gradient processing and optimization library for JAX, to update the model parameters.


    def compute_loss(logits, labels, rewards):
      one_hot_labels = jax.nn.one_hot(labels, num_classes=common.BOARD_SIZE**2)
      loss = -jnp.mean(
          jnp.sum(one_hot_labels * jnp.log(logits), axis=-1) * jnp.asarray(rewards))
      return loss


    def train_step(model_optimizer, params, opt_state, game_board_log,
                   predicted_action_log, action_result_log):
      """Run one training step."""

      def loss_fn(model_params):
        logits = run_inference(model_params, game_board_log)
        loss = compute_loss(logits, predicted_action_log, action_result_log)
        return loss

      def compute_grads(params):
        return jax.grad(loss_fn)(params)

      grads = compute_grads(params)
      updates, opt_state = model_optimizer.update(grads, opt_state)
      params = optax.apply_updates(params, updates)
      return model_optimizer, params, opt_state


    @jax.jit
    def run_inference(model_params, board):
      logits = PolicyGradient().apply({'params': model_params}, board)
      return logits

    That’s it for the training loop. We can visualize the training progress in TensorBoard as below; here we use the proxy metric ‘game_length’ (the number of steps to finish the game) to track the progress. The intuition is that when the agent becomes smarter, it can finish the game in fewer steps.
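
    The exact logging code lives in the example repository; as a rough sketch, the ‘game_length’ scalar could be written with tf.summary inside the training loop above (the log directory name here is arbitrary):

    import tensorflow as tf

    summary_writer = tf.summary.create_file_writer("./logs")

    # Inside the training loop shown earlier: action_log comes from
    # common.play_game() and i is the iteration index.
    game_length = len(action_log)
    with summary_writer.as_default():
        tf.summary.scalar("game_length", game_length, step=i)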


    Converting the Flax/JAX model to TensorFlow Lite and integrating with the Android app

    After the model is trained, we use jax2tf, a TensorFlow-JAX interoperation tool, to convert the JAX model into a TensorFlow concrete function. The final step is to call the TensorFlow Lite converter to convert the concrete function into a TFLite model.

    # Convert to tflite model
    model = PolicyGradient()
    jax_predict_fn = lambda input: model.apply({'params': params}, input)

    tf_predict = tf.function(
        jax2tf.convert(jax_predict_fn, enable_xla=False),
        input_signature=[
            tf.TensorSpec(
                shape=[1, common.BOARD_SIZE, common.BOARD_SIZE],
                dtype=tf.float32,
                name='input')
        ],
        autograph=False,
    )

    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [tf_predict.get_concrete_function()], tf_predict)

    tflite_model = converter.convert()

    # Save the model
    with open(os.path.join(modeldir, 'planestrike.tflite'), 'wb') as f:
      f.write(tflite_model)

    The JAX-converted TFLite model behaves exactly like any TensorFlow-trained TFLite model. You can visualize it with Netron:

    Visualizing TFLite model converted from Flax/JAX using Netron
    We can use exactly the same Java code as before to invoke the model and get the prediction.

    convertBoardStateToByteBuffer(board);
    tflite.run(boardData, outputProbArrays);
    float[] probArray = outputProbArrays[0];
    int agentStrikePosition = -1;
    float maxProb = 0;
    for (int i = 0; i < probArray.length; i++) {
      int x = i / Constants.BOARD_SIZE;
      int y = i % Constants.BOARD_SIZE;
      if (board[x][y] == BoardCellStatus.UNTRIED && probArray[i] > maxProb) {
        agentStrikePosition = i;
        maxProb = probArray[i];
      }
    }

    Conclusion

    In summary, this article walks you through how to train a simple reinforcement learning model with Flax/JAX, leverage jax2tf to convert it to TensorFlow Lite, and integrate the converted model into an Android app.

    Now you have learned how to build neural network models with Flax/JAX, and tap into the powerful TensorFlow ecosystem to deploy your models pretty much anywhere you want. We can’t wait to see the fantastic apps you build with both JAX and TensorFlow!

    Read More

    Fast Reduce and Mean in TensorFlow Lite

    Fast Reduce and Mean in TensorFlow Lite

    Posted by Alan Kelly, Software Engineer

    We are happy to share that TensorFlow Lite version 2.10 has optimized Reduce (All, Any, Max, Min, Prod, Sum) and Mean operators. These common operators replace one or more dimensions of a multi-dimensional tensor with a scalar. Sum, Product, Min, Max, Bitwise And, Bitwise Or and Mean variants of reduce are available. Reduce is now fast for all possible inputs.

    Benchmark for Reduce Mean on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [1, 3], Output is a 2D tensor of shape [32, 5].

    Benchmark for Reduce Prod on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [1, 3], Output is a 2D tensor of shape [32, 5].

    Benchmark for Reduce Sum on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [0, 2], Output is a 2D tensor of shape [256, 128].


    These speed-ups are available by default using the latest version of TFLite on all architectures.
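
    For reference, the Reduce Mean benchmark above corresponds to a call like the following (input shape and axes chosen to match the benchmark configuration):

    import tensorflow as tf

    # A [32, 256, 5, 128] tensor reduced over axes [1, 3] yields a [32, 5] tensor.
    x = tf.random.uniform([32, 256, 5, 128])
    y = tf.reduce_mean(x, axis=[1, 3])
    print(y.shape)  # (32, 5)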

    How does this work?

    To understand how these improvements were made, we need to look at the problem from a different perspective. Let’s take a 3D tensor of shape [3, 2, 5].

    Let’s reduce this tensor over axes [0] using Reduce Max. This will give us an output tensor of shape [2, 5] as dimension 0 will be removed. Each element in the output tensor will contain the max of the three elements in the same position along dimension 0. So the first element will be max{0, 10, 20} = 20. This gives us the following output:

    To simplify things, let’s reshape the original 3D tensor as a 2D tensor of shape [3, 10]. This is the exact same tensor, just visualized differently.

    Reducing this over dimension 0 by taking the max of each column gives us:

    Which we then reshape back to its original shape of [2, 5]

    This demonstrates how simply changing how we visualize the tensor dramatically simplifies the implementation. In this case, dimensions 1 and 2 are adjacent and not being reduced over. This means that we can fold them into one larger dimension of size 2 x 5 = 10, transforming the 3D tensor into a 2D one. We can do the same to adjacent dimensions which are being reduced over.
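
    The equivalence is easy to verify numerically; this small NumPy sketch reduces the same [3, 2, 5] tensor directly and after folding the non-reduced dimensions:

    import numpy as np

    x = np.arange(30).reshape(3, 2, 5)

    # Reduce Max over axis 0 directly: output shape is (2, 5).
    direct = x.max(axis=0)

    # Fold the two non-reduced dimensions (2 x 5 = 10), reduce the columns of the
    # resulting [3, 10] view, then reshape back to (2, 5).
    folded = x.reshape(3, 10).max(axis=0).reshape(2, 5)

    assert np.array_equal(direct, folded)
    print(direct[0, 0])  # max{0, 10, 20} = 20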

    Let’s take a look at all possible Reduce permutations for the same 3D tensor of shape [3, 2, 5].

    Of all 8 permutations, only two 3D permutations remain after we re-visualize the input tensor. For any number of dimensions, there are only two possible reduction permutations: the rows or the columns. All other ones simplify to a lower dimension.

    This is the trick to an efficient and simple reduction operator as we no longer need to calculate input and output tensor indices and our memory access patterns are much more cache friendly.

    This also allows the compiler to auto-vectorize the integer reductions. The compiler won’t auto-vectorize floats, as float addition is not associative. You can see the code which removes redundant axes here and the reduction code here.

    Changing how we visualize tensors is a powerful code simplification and optimization technique which is used by many TensorFlow Lite operators.

    Next steps

    We are always working on adding new operators and speeding up existing ones. We’d love to hear about models of yours that have benefited from this work. Get in touch via the TensorFlow Forum. Thanks for reading!

    Read More

    Colab’s ‘Pay As You Go’ Offers More Access to Powerful NVIDIA Compute for Machine Learning

    Posted by Chris Perry, Google Colab Product Lead

    Google Colab is launching a new paid tier, Pay As You Go, giving anyone the option to purchase additional compute time in Colab with or without a paid subscription. This grants access to Colab’s  powerful NVIDIA GPUs and gives you more control over your machine learning environment.

    Colab is fully committed to supporting all of our users whether or not they pay for additional compute, and our free-of-charge tier stays in its current form. Today’s announcement reflects additions to paid options only.

    Colab helps you accomplish more with machine learning

    Google Colab is the easiest way to start machine learning. From the Colab notebooks powering TensorFlow’s tutorials and guides to Deepmind’s AlphaFold example, Colab is helping the world learn ML and share the results broadly, democratizing machine learning.

    Colab Pay As You Go further expands the potential for using Colab. Pay As You Go allows anyone to purchase more compute time with Colab, regardless of whether or not they have a monthly subscription. Customers can use this feature to dramatically increase their usage allotments of Colab over what was possible before. Try it out at colab.research.google.com/signup.

    Previously, Colab’s paid quota service throttled compute usage to smooth out quota exhaustion over the entire month of a subscription, ensuring that a paid user could access Colab compute as much as possible throughout their month’s subscription: we didn’t want users to fully exhaust their quota on day one and spend the rest of the month frustrated by lack of access to runtimes. Now, with Pay As You Go, we are relaxing usage throttling for all paid users (throttling will remain in place for users in our free-of-charge tier).

    Paid users now have the flexibility to exhaust compute quota, measured in compute units, at whatever rate they choose. As compute units are exhausted, a user can choose to purchase more with Pay As You Go at their discretion. Once a user has exhausted their compute units their Colab usage quota will revert to our free of charge tier limits.

    Increasing your power with NVIDIA GPUs

    Paid Colab users can now choose between a standard or premium GPU in Colab, giving you the ability to upgrade your GPU when you need more power. Standard GPUs are typically NVIDIA T4 Tensor Core GPUs, while premium GPUs are typically NVIDIA V100 or A100 Tensor Core GPUs. Getting a specific GPU chip type assignment is not guaranteed and depends on a number of factors, including availability and your paid balance with Colab. If you want guaranteed access to a specific machine configuration, we recommend purchasing a VM on GCP Marketplace.

    When you need more power, select premium GPU in your runtime settings: Runtime > Change runtime type > GPU class > Premium. Premium GPUs will deplete your paid balance in Colab faster than standard GPUs.
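
    To confirm which GPU your runtime was actually assigned, you can query TensorFlow from a notebook cell. This is a small sketch; the reported name depends on availability and your tier.

    import tensorflow as tf

    # List the GPUs visible to the runtime and print the assigned device name
    # (e.g. a T4 on the standard tier, or a V100/A100 on the premium tier).
    gpus = tf.config.list_physical_devices("GPU")
    print(gpus)
    if gpus:
        details = tf.config.experimental.get_device_details(gpus[0])
        print(details.get("device_name"))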

    Colab is the right choice for ML projects

    Colab is the right choice for your machine learning project: TensorFlow and many excellent ML libraries come pre-installed, pre-warmed GPUs are a click away, and sharing your notebook with a collaborator is as easy as sharing a Google doc. Collaborators can access runtimes with GPU accelerators without need for payment. Pay As You Go makes Colab an even more useful product for any ML project you’re looking into.

    Read More

    Automated Deployment of TensorFlow Models with TensorFlow Serving and GitHub Actions

    Posted by Chansung Park and Sayak Paul (ML-GDEs)


    If you are an applications developer, or if your organization doesn’t have a dedicated ML Engineering team, it is common to deploy a machine learning model without worrying about the end to end machine learning pipeline or MLOps. TFX and TensorFlow Serving can help you create the heart of an MLOps infrastructure. 

    In this post, we will share how we serve a TensorFlow image classification model as RESTful and gRPC based services with TensorFlow Serving on a Kubernetes (k8s) cluster running on Google Kubernetes Engine (GKE) through a set of GitHub Actions workflows. 

    Overview

    In any GitHub project, you can make releases, with up to 2 GB of assets included in each release when using a free account. This is a good place to manage different versions of machine learning models for various reasons. One can also replace this with a more private component for managing model versions such as Google Cloud Storage buckets. For our purposes, the 2 GB space provided by GitHub Releases will be enough.

    Figure 1. Three steps to deploy TF Serving on GKE (original).

    The basic idea is to:

    1. Automatically detect a newly released version of a TensorFlow-based ML model in GitHub Releases
    2. Build a custom TensorFlow Serving Docker image containing the released ML model
    3. Deploy it on a k8s cluster running on GKE through a set of GitHub Actions.

    The entire workflow can be logically divided into three subtasks, so it’s a good idea to write three separate composite GitHub Actions:

    • First subtask handles the environmental setup
      • GCP Authentication (GCP credentials are injected from the GitHub Action Secret)
      • Install gcloud CLI toolkit to access the GKE cluster for the third subtask
      • Authenticate Docker to push images to the Google Cloud Registry (GCR)
      • Connect to a designated GKE cluster for further accesses
    • Second subtask builds a custom TensorFlow Serving image
      • Download and extract your latest released SavedModel from your GitHub repository
      • Run the official or a custom built TensorFlow Serving docker image
      • Copy the extracted SavedModel into the running TensorFlow Serving docker container
      • Commit the changes to the running container and give it a new name, tagged with the GCR prefix, the GCP project ID, and latest
      • Push the committed image to the GCR
    • Third subtask deploys the custom built TensorFlow Serving image to the GKE cluster
      • Download the Kustomize toolkit to handle overlay configurations
      • Pick one of the scenarios from the various experiments
      • Apply Deployment, Service, and ConfigMap according to the selected experiment to the currently connected GKE cluster
        • ConfigMap is used for batching-enabled scenarios to inject batching configurations dynamically into the Deployment.

    There are a number of parameters that you can customize such as the GCP project ID, GKE cluster name, the repository where the ML model will be released, and so on. The full list of parameters can be found here. As noted above, the GCP credentials should be set as a GitHub Action Secret beforehand. If the entire workflow goes without any errors, you will see something similar to the output below.

    NAME         TYPE            CLUSTER-IP      EXTERNAL-IP     PORT(S)                            AGE
    tfs-server   LoadBalancer    xxxxxxxxxx      xxxxxxxxxx       8500:30869/TCP,8501:31469/TCP      23m

    The combination of the EXTERNAL-IP and the PORT(S) values represents the endpoints where external users can connect to the TensorFlow Serving pods in the k8s cluster. As you see, two ports are exposed: 8500 and 8501 are for the gRPC and RESTful services, respectively. One thing to note is that we used LoadBalancer as the service type, but you may want to consider including Ingress controllers such as GKE Ingress for securing the k8s clusters with SSL/TLS and defining more flexible routing rules in production. You can check out the complete logs from the past runs.
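
    Once the service is up, querying the RESTful endpoint from Python follows the standard TensorFlow Serving REST API. The external IP, model name, and payload below are placeholders that depend on your deployment and the exported model signature.

    import json
    import requests

    # 8501 is the RESTful port exposed by the LoadBalancer service above;
    # "resnet" stands in for whatever MODEL_NAME the image was built with.
    url = "http://EXTERNAL-IP:8501/v1/models/resnet:predict"
    payload = {"instances": [{"b64": "<base64-encoded image bytes>"}]}

    response = requests.post(url, data=json.dumps(payload))
    print(response.json())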

    Build a Custom TensorFlow Serving Image within a GitHub Action

    As described in the overview and the official document, a custom TensorFlow Serving Docker image can be built in five steps. We also provide a notebook for local testing of these steps. In this section, we show how to write a composite GitHub Action for this partial subtask of the whole workflow (note that .inputs, .env, and ${{ }} for the environment variables are omitted for brevity).

    First, a model can be downloaded by an external robinraju/release-downloader GitHub Action with custom information about the URL of the GitHub repository and the filename in the list of assets from the latest release. The default filename is saved_model.tar.gz.

    Second, the downloaded file should be decompressed to fetch the actual SavedModel that TensorFlow Serving can understand.

    runs:
      using: "composite"
      steps:
          - name: Download the latest SavedModel release
            uses: robinraju/release-downloader@v1.3
            with:
              repository: $MODEL_RELEASE_REPO
              fileName: $MODEL_RELEASE_FILE
              latest: true

          - name: Extract the SavedModel
            run: |
              mkdir $MODEL_NAME
              tar -xvf $MODEL_RELEASE_FILE --strip-components=1 --directory $MODEL_NAME

          - name: Run the CPU Optimized TensorFlow Serving container
            run: |
              docker run -d --name serving_base $BASE_IMAGE_TAG

          - name: Copy the SavedModel to the running TensorFlow Serving container
            run: |
              docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME

          - id: push-to-registry
            name: Commit and push the changed running TensorFlow Serving image
            run: |
              export NEW_IMAGE_NAME=tfserving-$MODEL_NAME:latest
              export NEW_IMAGE_TAG=gcr.io/$GCP_PROJECT_ID/$NEW_IMAGE_NAME
              echo "::set-output name=NEW_IMAGE_TAG::$(echo $NEW_IMAGE_TAG)"
              docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
              docker push $NEW_IMAGE_TAG

    Third, we can modify a running TensorFlow Serving Docker container by placing a custom SavedModel inside. In order to do this, we need to run the base TensorFlow Serving container instantiated either from the official image or a custom-built image. We have used the CPU-optimized version as the base image by compiling from source, and it is publicly available here.

    Fourth, the SavedModel should be copied to the /models directory inside the running TensorFlow Serving container. In the last step, we set the MODEL_NAME environment variable to let TensorFlow Serving know which model to expose as services, and commit the two changes that we made to the base image. Finally, the updated TensorFlow Serving Docker image can be pushed into the designated GCR.

    Notes on the TensorFlow Serving Parameters

    We consider three TensorFlow Serving specific parameters in this post: tensorflow_intra_op_parallelism, tensorflow_inter_op_parallelism, and the batching option. Here, we provide brief overviews of each of them.

    Parallelism threads: tensorflow_intra_op_parallelism controls the number of threads used to parallelize the execution of an individual operation. tensorflow_inter_op_parallelism controls the number of threads used to parallelize the execution of multiple independent operations. To know more, refer to this resource.

    Batching: As mentioned above, we can allow TensorFlow Serving to batch requests by setting the enable_batching parameter to True. If we do so, we also need to define the batching configurations for TensorFlow in a separate file (passed via the batching_parameters_file argument). Please refer to this resource for more information about the options we can specify in that file.

    Configuring TensorFlow Serving

    Once you have a custom TensorFlow Serving Docker image, you can deploy it with the k8s resource objects Deployment and ConfigMap, as shown below. This section shows how to write a ConfigMap that holds the batching configurations and a Deployment that adds TensorFlow Serving specific runtime options. We also show you how to mount the ConfigMap to inject the batching configurations into TensorFlow Serving’s batching_parameters_file option.

    apiVersion: apps/v1

    kind: Deployment

        spec:
          containers:
          - image: gcr.io/gcp-ml-172005/tfs-resnet-cpu-opt:latest
            name: tfs-k8s
            imagePullPolicy: Always
            args: ["--tensorflow_inter_op_parallelism=2",
                   "--tensorflow_intra_op_parallelism=8",
                   "--enable_batching=true",
                   "--batching_parameters_file=/etc/tfs-config/batching_config.txt"]
            …
            volumeMounts:
              - mountPath: /etc/tfs-config/batching_config.txt
                subPath: batching_config.txt
                name: tfs-config

    The URI of the custom built TensorFlow Serving Docker image can be specified in spec.containers.image, and the behavior of TensorFlow Serving can be customized by providing arguments in the spec.containers.args in the Deployment. This post shows how to configure three kinds of custom behavior: tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, and enable_batching.

    apiVersion: v1

    kind: ConfigMap
    metadata:
      name: tfs-config
    data:
      batching_config.txt: |
        max_batch_size { value: 128 }
        batch_timeout_micros { value: 0 }
        max_enqueued_batches { value: 2 }
        num_batch_threads { value: 2 }

    When enable_batching is set to true, we can further customize the batch inference by defining its specific batching-related configurations in a ConfigMap. Then, the ConfigMap can be mounted as a file with spec.containers.volumeMounts, and we can specify which file to look up for the batching_parameters_file argument in Deployment.

    Kustomize to Manage Various Experiments

    As you see, there are lots of parameters to determine the behavior of TensorFlow Serving, and the optimal values for them are usually found by running experiments. Indeed, we have experimented with various parameters within a number of different environmental setups: different numbers of nodes, different numbers of vCPU cores, and different RAM capacity.

    ├── base
    |   ├── kustomization.yaml
    |   ├── deployment.yaml
    |   └── service.yaml
    └── experiments
        ├── 2vCPU+4GB+inter_op2
        …
        ├── 4vCPU+8GB+inter_op2
        …
        ├── 8vCPU+64GB+inter_op2_w_batch
        |   ├── kustomization.yaml
        |   ├── deployment.yaml
        |   └── tfs-config.yaml
        …

    We used kustomize to manage the YAML files of various experiments. We keep common YAML files of Deployment and Service in the base directory while having specific YAML files for certain experimental environments and configurations under the experiments directory. With this and kustomize, the contents of the base YAML files could be easily overlaid with different numbers of replicas, different values of tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, enable_batching, and batch configurations.

    runs:
      using: “composite”
      steps:
        – name: Setup Kustomize
          …

        – name: Deploy to GKE
          working-directory: .kube/
          run: |-
            ./kustomize build experiments/$TARGET_EXPERIMENT | kubectl apply -f -

    You can simply select the experiment that you want to test, or the one you think is optimal, by setting $TARGET_EXPERIMENT. For example, the best experiment that we found was “8vCPU+16GB+inter_op4”, which means each VM is configured with 8 vCPUs and 16GB of RAM while tensorflow_inter_op_parallelism is set to 4. The kustomize build command then provisions the YAML files for the selected experiment for the k8s clusters.

    Costs

    We used the GCP cost estimator for this purpose. Pricing for each experiment configuration was estimated assuming it would be live for 24 hours per month (which was sufficient for our experiments).


    Machine Configuration (E2 series)    Pricing (USD)
    2vCPUs, 4GB RAM, 8 Nodes             11.15
    4vCPUs, 8GB RAM, 4 Nodes             11.15
    8vCPUs, 16GB RAM, 2 Nodes            11.15
    8vCPUs, 64GB RAM, 2 Nodes            18.21

    Conclusion

    In this post, we discussed how to automatically deploy and experiment with an already trained model with various configurations. We leveraged TensorFlow Serving, Kubernetes, and GitHub Actions to streamline the deployment and experiments. We hope that you found this setup useful and reliable and that you will use this in your own model deployment projects.


    Acknowledgements

    We are grateful to the ML Developer Programs team that provided GCP credits for supporting our experiments. We also thank Hannes Hapke and Robert Crowe for providing us with helpful feedback and guidance.

    Read More

    Bridging communities: TensorFlow Federated (TFF) and OpenMined

    Posted by Krzys Ostrowski (Research Scientist), Alex Ingerman (Product Manager), and Hardik Vala (Software Engineer)

    Since the announcement of TensorFlow Federated (TFF) on this blog 3.5 years ago, a number of organizations have developed frameworks for Federated Learning (FL). While growing attention to privacy and investments in FL are a welcome trend, one challenge that arises is fragmentation of community and industry efforts, which leads to code duplication and reinvention. One way we can address this as a community is by investing in interoperability mechanisms that could enable our platforms and developers to work together and leverage each other’s strengths.

    In this context, we’re excited to announce the collaboration between TFF and OpenMined – an OSS community dedicated to development of privacy-preserving technologies. OpenMined’s PySyft framework has attracted a vibrant community of hundreds of OSS contributors, and includes tools and APIs to facilitate containerized deployment and integrations with diverse data sources that complement the capabilities we offer in TFF.

    OpenMined is joining the Special Interest Group (SIG) Federated (see the charter, forum, meeting notes, and the Discord server) that we’ve recently established to enable developers of TFF, together with a growing set of OSS and industry partners, to openly engage in conversations about how to jointly evolve the TFF ecosystem and grow the adoption of FL.

    Introducing PySyTFF

    To kick off the collaboration, we – the developers of TFF and OpenMined’s PySyft – decided to focus our initial efforts on building a new platform together, with the endearing name PySyTFF, that combines elements of TFF and PySyft to support what we believe will be an increasingly common scenario, illustrated below.

    In this scenario, an owner of a sensitive dataset would like to invite researchers to experiment with training and evaluating ML models on their dataset to advance the current understanding of what model architectures, parameters, etc., work best, while protecting the data and adhering to policies that may govern its use. In practice, such scenarios often end up involving negotiating data usage contracts. On the one hand, these can be tedious to set up, and on the other hand, they largely rely on goodwill.

    What we’d like instead is a platform that offers structural safeguards that limit the disclosure of sensitive information and ensure policy compliance by construction; this is our goal for PySyTFF.

    As an aside, note that even though this blog post is about FL, we aren’t necessarily talking here about scenarios where data is physically siloed across separate locations – the data can also be hosted in a datacenter and logically siloed. More on this below.

    Developer experience

    The initial proof-of-concept implementation of PySyTFF offers an early glimpse of what the developer experience for the data scientist will look like. Note how we combine the advantages of both frameworks – e.g., TFF’s ability to define models in Keras, and PySyft’s access control mechanism and APIs for data access:


    domain = sy.login(email="sam@stargate.net", password="changethis", port=8081)

    model_fn = lambda: tf.keras.models.Sequential(...)

    params = {
        'rounds': 10,
        'no_clients': 3,
        'noise_multiplier': 0.05,
        'clients_per_round': 2,
        'train_data_id': domain.datasets[0]['images'].id_at_location.to_string(),
        'label_data_id': domain.datasets[0]['labels'].id_at_location.to_string()
    }

    model, metrics = sy.tff.train_model(model_fn, params, domain, timeout=5000)

    Here, the data scientist logs into a PySyft domain node – an infrastructure component provisioned by or on behalf of the data provider – and gains a limited, access-control-guarded ability to enumerate the available resources and perform actions on them. This includes obtaining references to datasets managed by the node and their metadata (but not their content) and issuing train_model calls, wherein the data scientist can supply a Keras model they wish to train, along with the various parameters that control the training process and affect the privacy guarantees of the computed result, such as the number of rounds or the amount of noise added to make the results of the model training more private. In return, the researcher may get computed outputs such as a set of evaluation metrics or the trained model parameters.

    Exactly what ranges of parameters supplied by the researcher are accepted by the platform, and what results the researcher can get, will in general depend on the policies defined by the data owner. These might, for example, mandate the use of privacy-preserving algorithms and constrain the allowed privacy budget, which in turn may constrain parameters such as the number of training rounds, clients per round, or the noise multiplier. While PySyTFF does not yet offer policy engine integration at the current stage of development, this is an important part of the future development plans.

    Under the hood

    The domain node is a docker-based environment that bundles together a web-based frontend that you can securely log into, with a mechanism for authenticating and authorizing users, and a set of internal services that includes database connectivity, as illustrated below.

    The train_model call in the code snippet above, perhaps embedded in the data scientist’s Python colab notebook, is implemented as a network request, carrying a serialized representation of the TensorFlow code of the model to train, along with the training parameters, and the references to the PySyft datasets to use for training and evaluation.

    Inside the domain node, the call is relayed to a PySyTFF service, a new component introduced to the PySyft ecosystem to orchestrate the training process. This involves interacting with PySyft’s data backend to obtain handles to shards of user data, calling TFF APIs to construct TFF computations to run, and passing the constructed TFF computations and data handles to an embedded instance of TFF runtime that loads the data using the supplied handles and runs the FL algorithms.

    FL on logically-siloed data

    At this point, some of you may be wondering how exactly FL fits into the picture. After all, FL is mostly known as a technology that supports computations on data that’s distributed across a set of devices, or (in what’s called a cross-silo flavor of FL) a set of data centers owned by a group of institutions, yet here, we’re talking about a scenario where the data is already in the customer’s PySyft database.

    To explain this, let’s pop up a level and consider the high level objective – to enable researchers to perform ML computations on sensitive data with platform-level, structural and formal privacy guarantees. In order to do so, the platform should ideally uphold formal privacy principles, such as data minimization (a guarantee on how the computation is executed and how sensitive data is handled), and anonymous aggregation (a guarantee on what is being computed and released).

    Federated Learning is a great fit in this context because it structurally embodies these principles, and provides a framework for implementing algorithms that provably achieve user-level Differential Privacy (DP) – the current gold standard. The FL algorithms that enable us to achieve these guarantees can be used to process data in datacenter deployments, even in scenarios where – as is the case here with the PySyft database – all of that data resides in a single administrative domain.

    To see this, just imagine that for each user in the database, we draw a virtual boundary around all their data, and think of it as a kind of virtual silo. We can treat such virtual silos of user data in the same way as how we treat “client” devices in a more traditional FL setting, and orchestrate FL algorithms to run across virtual silos as clients.

    Thus, for example, when training an ML model, we’d repeatedly pick sets of users from the database, locally and independently train local model updates on their data – separately for each user, add clipping to each local update and noise for privacy, aggregate these local updates across users to produce an updated global model, and repeat this process for thousands of rounds until the ML model converges, as shown below.
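
    The snippet below is a purely conceptual NumPy sketch of one such round over virtual silos, with per-user clipping and Gaussian noise added at aggregation; it is not the TFF implementation, just an illustration of the structure described above.

    import numpy as np

    def clip(update, clip_norm=1.0):
        # Scale the update down so its L2 norm is at most clip_norm.
        norm = np.linalg.norm(update)
        return update * min(1.0, clip_norm / (norm + 1e-12))

    def federated_round(global_model, user_data, sampled_users, local_train,
                        clip_norm=1.0, noise_multiplier=0.05):
        # Each sampled "virtual silo" trains locally and independently.
        updates = [clip(local_train(global_model, user_data[u]) - global_model, clip_norm)
                   for u in sampled_users]
        # Average the clipped updates and add noise calibrated to the clip norm.
        mean_update = np.mean(updates, axis=0)
        noise_std = noise_multiplier * clip_norm / len(sampled_users)
        return global_model + mean_update + np.random.normal(0.0, noise_std, mean_update.shape)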

    Whereas the data may be only logically partitioned, following this approach enables us to achieve the very same types of formal guarantees, including provable user-level differential privacy, as those cited above – and indeed, TFF enables us to leverage the same FL algorithm implementation – literally the same TFF code – as that which powers Google’s mobile/IoT production deployments.

    Collaborate with us!

    As noted earlier, the initial version of PySyTFF is still missing a number of components – and this, dear reader, is where you come in. If the vision laid out above excites you, we – the TFF and PySyft teams – would love to work with you to evolve this platform together. In addition to policy engine integration, we plan to augment PySyTFF with the ability to spawn distributed instances of the TFF runtime on cloud or compute clusters to power very compute-intensive workloads, a system of charging for the use of resources, and to extend the scope of PySyTFF to include classical types of cross-silo FL deployments, to name just a few.

    There are a great many ways to go about this – from joining the TFF and PySyft’s collaborative efforts and directly helping us build and deploy this platform, to helping design and build generic components and APIs that can enable TFF and PySyft/PyGrid to interoperate.

    Ready to get started? You can visit the SIG Federated forum and join the Discord server, or you can reach out directly – see the contact info in the SIG charter, and the engagement channels created by OpenMined’s PySyft team. We’re looking forward to hearing from you!

    Acknowledgments

    On behalf of the TFF team at Google, we’d like to thank our OpenMined partners Andrew Trask, Tudor Cebere, and Teo Milea for the productive collaboration leading up to this announcement.

    Read More

    Optimizing TF, XLA and JAX for LLM Training on NVIDIA GPUs

    Posted by Douglas Yarrington (Google TPgM), James Rubin (Google PM), Neal Vaidya (NVIDIA TME), Jay Rodge (NVIDIA PMM)

    Together, NVIDIA and Google are delighted to announce new milestones and plans to optimize TensorFlow and JAX for the Ampere and recently announced Hopper GPU architectures by leveraging the power of XLA: a performant, flexible and extensible ML compiler built by Google. We will deepen our ongoing collaboration with dedicated engineering teams focused on delivering improved performance in currently available A100 GPUs. NVIDIA and Google will also jointly support unique features in the recently announced H100 GPU, including the Transformer Engine with support for hardware-accelerated 8-bit floating-point (FP8) data types and the transformer library.

    We are announcing improved performance in TensorFlow, new NVIDIA GPU-specific features in XLA and the first release of JAX for multi-node, multi-GPU training, which will significantly improve large language model (LLM) training. We expect the Hopper architecture to be especially popular for LLMs.

    NVIDIA H100 Tensor Core GPU

    XLA for GPU

    Google delivers high performance with LLMs on NVIDIA GPUs because of a notable technology, XLA, which supports all leading ML frameworks, such as TensorFlow, JAX, and PyTorch. Over 90% of Google’s ML compilations, across research and production, happen on XLA. These span the gamut of ML use cases, from ultra-large scale model training at DeepMind and Google Research, to optimized deployments across our products, to edge inferencing at Waymo.

    XLA’s deep feature set accelerates large language model performance and is solving most large model challenges seen in the industry today. For example, a feature unique to XLA, SPMD, automates most of the work needed to partition models across multiple cores and devices, making large model training significantly more scalable and performant. XLA can also automatically recognize and select the most optimal hand-written library implementation for your target backend, like cuDNN for CUDA chipsets. Otherwise, XLA can natively generate optimized code for performant execution.

    We’ve been collaborating with NVIDIA on several exciting features and integrations that will further optimize LLMs for GPUs. We recently enabled collectives such as all-reduce to run in parallel to compute. This has resulted in a significant reduction in end to end latency for customers. Furthermore, we enabled support for bfloat16, which has resulted in compute gains of 4.5x over 32 bit floating point while retaining the same dynamic range of values.

    Our joint efforts mean that XLA integrates even more deeply with NVIDIA’s AI tools and can better leverage NVIDIA’s suite of AI hardware optimized libraries. In Q1 2023, we will release an XLA-cuDNN Graph API integration, which provides customers with optimized fusion of convolution/matmul operations and multi-headed attention in transformers for improved use of memory and faster GPU kernel execution. As a result, overheads drop significantly and performance improves notably.

    TensorFlow for GPU

    TensorFlow recently released distributed tensors (or DTensors) to enable Tensor storage across devices like NVIDIA GPUs while allowing programs to manipulate them seamlessly. The goal of DTensor is to make parallelizing large-scale TensorFlow models across multiple devices easy, understandable, and fast. DTensors are a drop-in replacement for local TensorFlow tensors and scale well to large clusters. In addition, the DTensor project improves the underlying TensorFlow execution and communication primitives, and they are available for use today!

    We are also collaborating with NVIDIA on several exciting new features in TensorFlow that leverage GPUs, including supporting the new FP8 datatype which should yield a significant improvement in training times for transformer models, when using the Hopper H100 GPU.

    JAX for GPU

    Google seeks to empower every developer with purpose-built tools for every step of the ML workflow. That includes TensorFlow for robust, production-ready models and JAX with highly optimized capabilities for cutting-edge research. We are pleased to announce the unique collaboration between NVIDIA and Google engineering teams to enhance TensorFlow and JAX for large deep-learning models, like LLMs. Both frameworks fully embrace NVIDIA A100 GPUs, and will support the recently-announced H100 GPUs in the future.

    One of the key advantages of JAX is the ease of achieving superior hardware utilization with industry-leading FLOPs across the accelerators. Through our collaboration with NVIDIA, we are translating these advantages to GPU using some XLA compiler magic. Specifically, we are leveraging XLA for operator fusion, improving GSPMD for GPU to support generalized data and model parallelism and optimizing for cross-host NVLink.

    Future Plans

    NVIDIA and Google are pleased with all the progress shared in this post, and are excited to hear from community members about their experience using TensorFlow and JAX, by leveraging the power of XLA for Ampere (A100) and Hopper (H100) GPUs.

    Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum.

    TensorFlow is also available in the NVIDIA GPU Cloud (NGC) as a Docker container that contains a validated set of libraries that enable and optimize GPU performance, with a JAX NGC container coming later this year.

    Thank you!

    Contributors: Frederic Bastien (NVIDIA), Abhishek Ratna (Google), Sean Lee (NVIDIA), Nathan Luehr (NVIDIA), Ayan Moitra (NVIDIA), Yash Katariya (Google), Peter Hawkins (Google), Skye Wanderman-Milne (Google), David Majnemer (Google), Stephan Herhut (Google), George Karpanov (Google), Mahmoud Soliman (NVIDIA), Yuan Lin (NVIDIA), Vartika Singh (NVIDIA), Vinod Grover (NVIDIA), Pooya Jannaty (NVIDIA), Paresh Kharya (NVIDIA), Santosh Bhavani (NVIDIA)

    Read More

    September Machine Learning Updates

    Posted by the TensorFlow team

    On September 14, at the Google Developers Summit in Shanghai, China, members of Google’s open-source ML teams will be on stage to talk about updates to our growing ecosystem, and we’d love to share them here with you.

    MediaPipe Studio

    We recognize that creating and productionizing custom on-device ML solutions can be challenging, so we’re reinventing how you develop them by leveraging simple-to-use abstraction APIs and no-code GUIs. We’re excited to give you a sneak peek at MediaPipe Studio, our low-code and no-code solution that gets you from data to modeling to deployment on Android or iOS with native code integration libraries that make it easy to build ML-powered apps.

    General Availability of TensorFlow Lite in Google Play Services

    We recently launched the general availability of TensorFlow Lite in Google Play services. With this, the TensorFlow Lite runtime is automatically managed and updated by Google Play services, meaning you no longer need to ship it as part of your application. Your apps get smaller, and you get regular updates in the background, so your users will always have the latest version. This is nice for you as an app developer, because your user will get updates and bug fixes to the framework automatically, reducing the burden on you to provide them. And TensorFlow Lite in Google Play Services is production ready, already running over 100 billion daily inferences.

    Tensor Projects

    At Google, we are creating a world-class family of ML tools across all hardware and device types. Because we are committed to building tools that are fit for purpose, from cutting-edge research to tried-and-true planet-scale deployments, we are sharing our vision of an open ML ecosystem of the future: Tensor Projects.

    Tensor Projects is an ecosystem of ML technologies and platforms that bring together Google’s ML tools, and organize efforts across our world-class engineering and research teams. It creates a space and a promise of continued innovation and support to enable researchers, developers, MLOps, and business teams to build responsible and cutting edge ML, from novel model development to scaled production ML in any data center or on any device.

    These tools, like TensorFlow, Keras, JAX, and MediaPipe Studio, will work well independently, with each other, and/or with other industry-leading tools and standards. We want to give you full flexibility and choice to build powerful, performant infrastructure for all of your ML use cases. And it’s just the beginning. Tensor Projects will evolve and grow as ML continues to advance. Watch the summary video here:

       

    Updates to Tensorflow.org

    We have an updated experience on tensorflow.org for new or advanced users to easily find resources. You can quickly identify the right TensorFlow tool for your task, explore pre-built artifacts for faster model creation, find ideas and inspiration, get involved in the community, discover quick start guides for common scenarios and much more.

    PyTorch Foundation

    We believe in the power of choice for ML developers and continue to invest resources to make it easy to train, deploy, and manage models. Our investment aims to bring machine learning to every developer’s toolbox and spans a broad spectrum of offerings: from TensorFlow and Keras, free and open source tools that help millions of developers succeed with ML, to JAX, which empowers researchers across Alphabet.

    Additionally, in the spirit of openness, we support PyTorch developers with Cloud TPU using XLA. To continue to help all developers succeed with Google Cloud, and to better position Google to make meaningful contributions to the community, we’re delighted to announce our role as a founding member of the newly formed PyTorch Foundation. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive the adoption of AI and ML through open source platforms.

    Thank you for reading! To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow.

    Read More

    Content moderation using machine learning: the server-side part

    Content moderation using machine learning: the server-side part

    Posted by Jen Person, Senior Developer Relations Engineer, TensorFlow

    Welcome to part 2 of my dual approach to content moderation! In this post, I show you how to implement content moderation using machine learning in a server-side environment. If you’d like to see how to implement this moderation client-side, check out part 1.

    Remind me: what are we doing here again?

    In short, anonymity can create some distance between people in a way that allows them to say things they wouldn’t say in person. That is to say, there are tons of trolls out there. And let’s be honest: we’ve all typed something online we wouldn’t actually say IRL at least once! Any website that takes public text input can benefit from some form of moderation. Client-side moderation has the benefit of instant feedback, but server-side moderation cannot be bypassed like client-side might, so I like to have both.

    This project picks up where part 1 left off, but you can also start here with a fresh copy of the Firebase Text Moderation demo code. The website in the Firebase demo showcases content moderation through a basic guestbook using a server-side content moderation system implemented through a Realtime Database-triggered Cloud Function. This means that the guestbook data is stored in the Firebase Realtime Database, a NoSQL database. The Cloud Function is triggered whenever data is written to a certain area of the database. We can choose what code runs when that event is triggered. In our case, we will use the Text Toxicity Classifier model to determine if the text written to the database is inappropriate, and then remove it from the database if needed. With this model, you can evaluate text on different labels of unwanted content, including identity attacks, insults, and obscenity. You can try out the demo to see the classifier in action.

    If you prefer to start at the end, you can follow along in a completed version of the project on GitHub.

    Server-side moderation

    The Firebase text moderation example I used as my starting point doesn’t include any machine learning. Instead, it checks for the presence of profanity from a list of words and then replaces them with asterisks using the bad-words npm package. I thought about blending this approach with machine learning (more on that later), but I decided to just wipe the slate clean and replace the code of the Cloud Function altogether. Start by navigating to the Cloud Functions folder of the Text Moderation example:

    cd textmoderation/functions

    Open index.js and delete its contents. In index.js, add the following code:

    const functions = require('firebase-functions');
    const toxicity = require('@tensorflow-models/toxicity');

    exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
      const message = snapshot.val();

      // Verify that the snapshot has a value
      if (!message) {
        return;
      }
      functions.logger.log('Retrieved message content: ', message);

      // Run moderation checks on the message and delete if needed.
      const moderateResult = await moderateMessage(message.text);
      functions.logger.log(
        'Message has been moderated. Does message violate rules? ',
        moderateResult
      );
    });

    This code runs any time a message is added to the database. It gets the text of the message, passes it to a function called `moderateMessage`, and logs the result. If you’re interested in learning more about Cloud Functions and the Realtime Database, then check out the Firebase documentation.

    Add the Text Toxicity Classifier model

    Depending on your development environment, you probably have some sort of error now since we haven’t actually written a function called moderateMessage yet. Let’s fix that. Below your Cloud Function trigger function, add the following code:

    exports.moderator = functions.database.ref('/messages/{messageId}').onCreate(async (snapshot, context) => {
      // ...
      // Your other function code is here.
    });

    async function moderateMessage(message) {
      const threshold = 0.9;

      let model = await toxicity.load(threshold);

      const messages = [message];

      let predictions = await model.classify(messages);

      for (let item of predictions) {
        for (let i in item.results) {
          if (item.results[i].match === true) {
            return true;
          }
        }
      }
      return false;
    }

    This function does the following:

    1. Sets the threshold for the model to 0.9. The threshold is the minimum prediction confidence required for the model to mark a prediction as true or false; that is, how confident the model must be that the text does or does not contain the given type of toxic content. The scale runs from 0 to 1.0, so a threshold of 0.9 means the model only returns true or false when it is at least 90% confident in its finding.
    2. Loads the model, passing in the threshold, and stores it in the `model` variable.
    3. Puts the message into an array called messages, since an array is the input type that the classify function accepts.
    4. Calls classify on the messages array.
    5. Iterates through the prediction results. predictions is an array of objects, each representing a different language label. You may want to check only specific labels rather than iterating through them all; for example, if your use case is a website for hosting the transcripts of rap battles, you probably don’t want to detect and remove insults (see the sketch at the end of this section).
    6. Checks if the content is a match for that label. If the match value is true, then the model has detected the given type of unwanted language, and the function returns true. There’s no need to keep checking the rest of the results, since the content has already been deemed inappropriate.
    7. If the function iterates through all the results and no label’s match is set to true, then the function returns false, meaning no undesirable language was found. The match value can also be null; in that case it isn’t true, so the text is treated as acceptable. I will talk more about the null option in a future post.
    If you completed part 1 of this tutorial, then these steps probably sound familiar. The server-side code is very similar to the client-side code. This is one of the things that I like about TensorFlow.js: it’s often straightforward to transition code from the client to server and vice versa.
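
    If you only want to enforce certain labels, one option is to filter the predictions by label name before checking for matches. Here is a minimal sketch, assuming the Text Toxicity Classifier returns label strings such as 'identity_attack' and 'toxicity'; the moderateMessageForLabels helper and the labelsToEnforce list are illustrative additions, not part of the demo code.

    const toxicity = require('@tensorflow-models/toxicity');

    // Only the labels listed here will cause a message to be flagged.
    // Adjust the list for the norms of your own community.
    const labelsToEnforce = ['identity_attack', 'toxicity'];

    async function moderateMessageForLabels(message) {
      const model = await toxicity.load(0.9);
      const predictions = await model.classify([message]);
      // Flag the message only when a label we care about has a confident match.
      return predictions.some((prediction) =>
        labelsToEnforce.includes(prediction.label) &&
        prediction.results.some((result) => result.match === true)
      );
    }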

    Complete the Cloud Functions code

    Back in your Cloud Function, you now know that based on the code we wrote for moderateMessage, the value of moderateResult will be true or false: true if the message is considered toxic by the model, and false if it does not detect toxicity with certainty greater than 90%. Now add code to delete the message from the database if it is deemed toxic:

      // Run moderation checks on the message and delete if needed.
      const moderateResult = await moderateMessage(message.text);
      functions.logger.log(
        'Message has been moderated. Does message violate rules? ',
        moderateResult
      );

      if (moderateResult === true) {
        const modRef = snapshot.ref;
        try {
          await modRef.remove();
        } catch (error) {
          functions.logger.error('Remove failed: ' + error.message);
        }
      }

    This code does the following:

    1. Checks if moderateResult is true, meaning that the message written to the guestbook is inappropriate.
    2. If the value is true, it removes the data from the database using the remove function from the Realtime Database SDK.
    3. Logs an error if one occurs.

    Deploy the code

    To deploy the Cloud Function, you can use the Firebase CLI. If you don’t have it, you can install it using the following npm command:

    npm install -g firebase-tools

    Once installed, use the following command to log in:

    firebase login

    Run this command to connect the app to your Firebase project:

    firebase use --add

    From here, you can select your project in the list, connect Firebase to an existing Google Cloud project, or create a new Firebase project.

    Once the project is configured, use the following command to deploy your Cloud Function:

    firebase deploy

    Once deployment is complete, the logs include the link to your hosted guestbook. Write some guestbook entries. If you followed part 1 of the blog, you will need to either delete the moderation code from the website and deploy again, or manually add guestbook entries to the Realtime Database in the Firebase console.

    You can view your Cloud Functions logs in the Firebase console.

    Building on the example

    I have a bunch of ideas for ways to build on this example. Here are just a few. Let me know which ideas you would like to see me build, and share your suggestions as well! The best ideas come from collaboration.

    Get a queue

    I mentioned that the “match” value of a language label can be true, false, or null without going into detail on the significance of the null value. If the label is null, then the model cannot determine if the language is toxic within the given threshold. One way to limit the number of null values is to lower this threshold. For example, if you change the threshold value to 0.8, then the model will label the match value as true if it is at least 80% certain that the text contains language that fits the label. My website example assigns labels of value null the same as those labeled false, allowing that text through the filter. But since the model isn’t sure if that text is appropriate, it’s probably a good idea to get some eyes on it. You could add these posts to a queue for review, and then approve or deny them as needed. I said “you” here, but I guess I mean “me”. If you think this would be an interesting use case to explore, let me know! I’m happy to write about it if it would be useful.
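
    To make the idea concrete, here is a rough sketch of how the Cloud Function could route uncertain posts into a review queue. It assumes moderateMessage is changed to return the raw predictions array instead of a boolean; the needsReview helper and the /review path are hypothetical additions, not part of the Firebase sample.

    // Treat a post as "needs review" when no label matched as toxic but at
    // least one label came back null (the model was unsure at this threshold).
    function needsReview(predictions) {
      const anyMatch = predictions.some((prediction) =>
        prediction.results.some((result) => result.match === true));
      const anyNull = predictions.some((prediction) =>
        prediction.results.some((result) => result.match === null));
      return !anyMatch && anyNull;
    }

    // Copy the post under a hypothetical /review/{messageId} path and remove
    // the original so it stays out of the public guestbook until approved.
    // `snapshot` is the DataSnapshot passed to the onCreate trigger.
    async function queueForReview(snapshot, message) {
      const reviewRef = snapshot.ref.root.child(`review/${snapshot.key}`);
      await reviewRef.set(message);
      await snapshot.ref.remove();
    }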

    What’s in ‘store

    The Firebase moderation sample that I used as the foundation of my project uses Realtime Database. I prefer to use Firestore because of its structure, scalability, and security. Firestore’s structure is well suited for implementing a queue because I could have a collection of posts to review within the collection of posts. If you’d like to see the website using Firestore, let me know.
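
    For reference, here is a rough sketch of what the same trigger could look like on Firestore. The guestbook and review collection names are placeholders, and moderateMessage is the helper defined earlier in this post.

    const functions = require('firebase-functions');

    exports.moderatorFirestore = functions.firestore
      .document('guestbook/{messageId}')
      .onCreate(async (snapshot, context) => {
        const message = snapshot.data();
        if (!message || !message.text) {
          return;
        }
        const isToxic = await moderateMessage(message.text);
        if (isToxic) {
          // Park the post in a separate collection for human review
          // instead of deleting it outright.
          await snapshot.ref.firestore
            .collection('review')
            .doc(context.params.messageId)
            .set(message);
          await snapshot.ref.delete();
        }
      });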

    Don’t just eliminate – moderate!

    One of the things I like about the original Firebase moderation sample is that it sanitizes the text rather than just deleting the post. You could run text through the sanitizer before checking for toxic language through the text toxicity model. If the sanitized text is deemed appropriate, then it could overwrite the original text. If it still doesn’t meet the standards of decent discourse, then you could still delete it. This might save some posts from otherwise being deleted.
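
    As a rough sketch of that idea, you could run the text through the bad-words filter from the original sample before classifying it; the moderateAndSanitize helper and its overwrite-or-delete logic are illustrative, not part of the sample.

    const Filter = require('bad-words');
    const badWordsFilter = new Filter();

    async function moderateAndSanitize(snapshot, message) {
      // Replace listed profanity with asterisks before classifying.
      const sanitized = badWordsFilter.clean(message.text);
      // Reuse the moderateMessage helper defined earlier in this post.
      const stillToxic = await moderateMessage(sanitized);
      if (stillToxic) {
        // Even the sanitized text fails the toxicity check, so delete the post.
        await snapshot.ref.remove();
      } else if (sanitized !== message.text) {
        // The cleaned-up text passes, so overwrite the original instead.
        await snapshot.ref.update({ text: sanitized });
      }
    }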

    What’s in a name?

    You’ve probably noticed that my moderation functionality doesn’t extend to the name field. This means that even a halfway-clever troll could easily get around the filter by cramming all of their expletives into the name field. It’s a fair point, and I trust that you will use some type of moderation on all fields that users interact with. Perhaps you use an authentication method to identify users so they aren’t provided a field for their name. Either way, you get the idea: I didn’t add moderation to the name field, but in a production environment, you definitely want moderation on every field.
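
    If you do want to cover every field, one minimal sketch is to run each user-supplied value through the same check. This assumes the guestbook entry stores a name field alongside text; the moderateEntry helper is a hypothetical addition.

    async function moderateEntry(snapshot, message) {
      // Check every user-supplied field, not just the message body.
      const fields = [message.name, message.text].filter(Boolean);
      for (const value of fields) {
        // Reuse the moderateMessage helper defined earlier in this post.
        if (await moderateMessage(value)) {
          await snapshot.ref.remove();
          return;
        }
      }
    }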

    Build a better fit

    When you test out real-world text samples on your website, you might find that the text toxicity classifier model doesn’t quite fit your needs. Since each social space is unique, there will be specific language that you are looking to include and exclude. You can address these needs by training the model on new data that you provide.

    If you enjoyed this article and would like to learn more about TensorFlow.js, then there are a ton of things you can do.

    Read More

    Announcing TensorFlow Official Build Collaborators

    Announcing TensorFlow Official Build Collaborators

    Posted by Rostam Dinyari, Nitin Srinivasan, Douglas Yarrington and Rishika Sinha of the TensorFlow team

    Starting with TensorFlow 2.10, we are excited to announce our collaboration with Intel, AWS, ARM, and Linaro to develop official TensorFlow builds. This means that when you pip install TensorFlow on Windows Native and Linux Aarch64 hosts, you will receive a build of TensorFlow that has been reviewed and vetted by these platform experts. This happens transparently, and there are no changes to your workflow: we’ve updated the pip install scripts so it’s automatic for you.

    Official builds are TensorFlow releases that follow the rigorous functional and performance testing standards that Google engineers and our collaborators publish with each release, in line with our published support expectations under the SIG Build forum. Collaborators monitor the builds daily and publish artifacts to the community in coordination with the overall TensorFlow release schedule.

    For the majority of use cases, there will be no changes to the behavior of pip install or pip uninstall TensorFlow. However, for Windows Native and Linux Aarch64 based systems, an additional pip uninstall step may be needed. You can find details about install, uninstall, and other best practices at tensorflow.org/install/pip.

    Over time, we expect the number of collaborators to expand but for now we want to share with you the progress we have made together to release increasingly performant and robust builds for these important platforms. You can learn more about each of the collaborations below.

    Intel Collaboration

    We are pleased to share that Intel has joined the 3P Official Build program and is taking ownership of Windows Native CPU builds. This includes responsibility for managing both nightly and final production releases. We and Intel do not expect this to disrupt end user experiences; users simply install TensorFlow as usual, and the Intel-produced Python binary artifacts (wheel files) will be correctly installed.

    AWS, ARM and Linaro Collaboration

    We are especially pleased to announce the availability of official builds for ARM Aarch64, specifically tuned for AWS Graviton instances. Together, the experts at Linaro have supported Google, AWS and ARM to ensure a highly performant version of TensorFlow is available on the emerging class of Aarch64 devices.

    Next steps

    These changes should be transparent for most users. You can learn more at tensorflow.org/install.

    Read More