Building the Future of TensorFlow

Posted by the TensorFlow team

We’ve started planning the future of TensorFlow! In this article, we’d like to share our vision.

We open-sourced TensorFlow nearly seven years ago, on November 9, 2015. Since then, thanks to thousands of open-source contributors and our incredible community of Google Developer Experts, community organizers, researchers, and educators around the globe, TensorFlow has come to define its category. 

Today, TensorFlow is the most-used machine learning platform, adopted by millions of developers. It’s the 3rd most-starred software repository on GitHub (right behind Vue and React) and the most-downloaded machine learning package on PyPI. It has brought machine learning to the mobile ecosystem: TFLite now runs on four billion devices (maybe on yours, too!). TensorFlow has also brought machine learning to the Web: TensorFlow.js is now downloaded 170 thousand times weekly.

Across Google’s product lineup, TensorFlow powers virtually all production machine learning, including Search, Gmail, YouTube, Maps, Play, Ads, Photos, and many more. Beyond Google, at other Alphabet companies, TensorFlow and Keras enable the machine intelligence in Waymo’s self-driving cars.

In the broader industry, TensorFlow powers machine learning systems at thousands of companies, including most of the largest machine learning users in the world – Apple, ByteDance, Netflix, Tencent, Twitter, and countless more. And in the research world, Google Scholar indexes over 3,000 new scientific publications every month that mention TensorFlow or Keras.

Today, our user base and developer ecosystem are larger than ever, and growing!

We see the growth of TensorFlow not just as an achievement to celebrate, but as an opportunity to go further and deliver more value for the machine learning community.

Our goal is to provide the best machine learning platform on the planet. Software that will become a new superpower in the toolbox of every developer. Software that will turn machine learning from a niche craft into an industry as mature as web development.

To achieve this, we listen to the needs of our users, anticipate new industry trends, iterate on our APIs, and work to make it increasingly easy for you to innovate at scale. In the same way that TensorFlow originally helped the rise of deep learning, we want to continue to facilitate the evolution of machine learning by giving you the platform that lets you push the boundaries of what’s possible. Machine learning is evolving rapidly, and so is TensorFlow.

Today, we’re excited to announce we’ve started working on the next iteration of TensorFlow that will enable the next decade of machine learning development. We are building on TensorFlow’s class-leading capabilities, and focusing on four pillars.

Four pillars of TensorFlow

Fast and scalable

  • XLA Compilation. We are focusing on XLA compilation and aim to make most model training and inference workflows faster on GPU and CPU, building on XLA’s performance wins on TPU. We intend for XLA to become the industry-standard deep learning compiler, and we’ve opened it up to open-source collaboration as part of the OpenXLA initiative.
  • Distributed computing. We are investing in DTensor, a new API for large-scale model parallelism. DTensor unlocks the future of ultra-large model training and deployment and allows you to develop your model as if you were training on a single device, even while using multiple clients. DTensor will be unified with the tf.distribute API, allowing for flexible model and data parallelism.
  • Performance optimization. Besides compilation, we are also investing further in algorithmic performance optimization techniques such as mixed-precision and reduced-precision computation, which can deliver considerable speedups on GPUs and TPUs (a minimal usage sketch follows this list).
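As an aside, mixed precision is already available through the existing Keras API; here is a minimal sketch (the model architecture is just a placeholder, not a recommendation):

import tensorflow as tf

# Compute in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    # Keep the output layer in float32 for numerically stable softmax outputs.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")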

Applied ML

  • New tools for CV and NLP. We are investing in our ecosystem for applied ML, in particular via the KerasCV and KerasNLP packages which offer modular and composable components for applied CV and NLP use cases, including a large array of state-of-the-art pretrained models.
  • Developer resources. We are adding more code examples, guides, and documentation for popular and emerging applied ML use cases. We aim to increasingly reduce the barrier to entry of ML and turn it into a tool in the hands of every developer.

Ready to deploy

  • Easier exporting. We are making it even easier to export to mobile (Android or iOS), edge (microcontrollers), server backends, or JavaScript. Exporting your model to TFLite and TF.js and optimizing its inference performance will be as easy as a call to `model.export()` (a sketch of today’s export path follows this list).
  • C++ API for applications. We are developing a public TF2 C++ API for native server-side inference as part of a C++ application.
  • Deploy JAX models. We are making it easier for you to deploy models developed using JAX with TensorFlow Serving, and to mobile and the web with TensorFlow Lite and TensorFlow.js. 
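Until `model.export()` lands, here is a hedged sketch of what the export path to TFLite looks like today with the existing converter API (the model and file name below are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])

# Current workflow: convert the Keras model to TFLite explicitly.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)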

Simplicity

  • NumPy API. As the field of ML has expanded over the last few years, TensorFlow’s API surface has also grown, not always in ways that are consistent or simple to understand. We are actively working on consolidating and simplifying these APIs. For example, we will be adopting the NumPy API standard for numerics.
  • Easier debugging. A framework isn’t just its API surface, it’s also its debugging experience. We aim to minimize the time-to-solution for developing any applied ML system by focusing on better debugging capabilities.

The future of TensorFlow will be 100% backwards-compatible

We want TensorFlow to serve as a bedrock foundation for the machine learning industry to build upon. We see API stability as our most important feature. Whether you are an engineer who relies on TensorFlow as part of their product or a builder of a TensorFlow ecosystem package, you should be able to upgrade to the latest TensorFlow version and immediately start benefiting from its new features and performance improvements – without fear that your existing codebase might break. As such, we commit to full backwards compatibility from TensorFlow 2 to the next version – your TensorFlow 2 code will run as-is. There will be no conversion script to run, no manual changes to apply.

Timeline

We plan to release a preview of the new TensorFlow capabilities in Q2 2023 and will release the production version later in the year. We will publish regular updates on our progress in the meantime. You can follow our progress via the TensorFlow blog, and on the TensorFlow YouTube channel.

Your feedback is welcome

We want to hear from you! For questions or feedback, please reach out via the TensorFlow forum.

How startups can benefit from TFX

Posted by Hannes Hapke and Robert Crowe

Startup companies building Machine Learning-based services and products require production-level infrastructure for training and serving their models. This can be especially challenging for small teams that are spread thin and need to innovate and grow quickly. TFX (TensorFlow Extended) provides a range of options to mitigate these challenges. In this blog post, you will learn how the San Francisco-based FinTech startup Digits has benefitted from applying TFX early, how TFX helps Digits grow, and how other startups can benefit from TFX too.

TFX is a set of libraries that streamline the development and deployment of production machine learning models, including implementing automated training pipelines. You might already be aware of major companies like Alphabet (including Google and Waze), Spotify, or Twitter successfully leveraging TFX to manage their machine learning pipelines. But TFX also has enormous benefits for medium-stage startups, like Digits.

Before we dive into how we are using TFX at Digits, let’s introduce a conceptual software design question that every startup will face: Choosing between tactical and strategic programming (introduced by John Ousterhout in “A Philosophy of Software Design”). In his analysis, Ousterhout shows that strategic programming is a much more sustainable approach for long-term success: even though it takes more time to get to an initial release, strategic programming will help make the complexity of a growing codebase more manageable.

Source: “A Philosophy of Software Design”, John Ousterhout, 2018

At Digits, we found that the same concept applies to machine learning. While we could train machine learning models in a minimal Jupyter notebooks-based setup, this system would become increasingly hard to manage as complexity increases. In this scenario, any initial wins of a rapidly trained machine learning model would dwindle as the company grows. Therefore, we invested heavily in our ML engineering setup from the start:

    1. We developed ML-specific workflows and created a clear distinction between ML experiments and production-ready ML.
    2. We invested heavily in ensuring we use tools like TFX, ML Metadata Store, and Google Cloud’s Vertex AI as efficiently as possible.
    3. We automated our model deployment processes to remove human shortcuts and errors.

Ousterhout found that strategic programming requires more upfront time, but developers will benefit from lower system complexity. For example, we have spent roughly 2-3 months setting up all the ML tooling and workflows, and we recognize that it is a substantial investment.

While this might not be feasible for startups that are still trying to establish a product-market-fit, we believe that this ML strategy is the right path for startups with a growing customer base. Furthermore, it has been our experience that applying strategic programming to machine learning problems will add to the developers’ job satisfaction and increase retention among the data team in the long run (fewer rushed hotfixes, systematic model retraining, etc.).

Growing our business with TFX, we have identified three key benefits that have allowed us to optimize our ML model training and deployment in ways that have been crucial to our success as a startup:

Key benefit 1: Standardization

At Digits, we distinguish between machine learning experiments and production machine learning. The objective of an ML experiment is to develop a proof of concept model. Our engineers are free to use any framework and tooling for ML experiments as long as our security requirements are met.

When we bring a model to production and customers rely on consistent predictions, we convert these experiments to production ML models. Every time we create a production ML model, we follow a consistent project structure and use the same steps for data and model analysis as well as feature engineering. TFX is crucial in standardizing those aspects.

Because each production model follows the same standards, we can detect potential synergies between projects early. This approach enables us to share code between projects even in the earliest development stages. Standardization has increased code reusability, and new projects have a much faster ramp-up time.

Another benefit of standardizing our workflows with TFX is that we can now apply our software engineering and DevOps principles to ML projects: Pipelines that run non-periodically can be triggered by our continuous integration system. TFX pipelines then register the newly produced model with our model registry. Based on this, the continuous integration system can also update our ML-serving endpoints and automatically deploy our ML models. This way, all changes to our ML systems are tracked in our Git repository.

System components including CI

Key benefit 2: Growth

In contrast to Keras’ preprocessing layers, TFX supports feature engineering, model analysis, and data validation via Apache Beam tasks. This way we only need to implement the feature engineering once – with TFX, we can simply swap out the Apache Beam configuration when our datasets grow and we need more processing capabilities.

Startups can begin with the TFX default setup based on Apache Beam’s DirectRunner. The DirectRunner mode doesn’t allow any parallelized execution of pipeline tasks but is available without any setup time. As the startup grows, the engineering team can swap out the underlying Apache Beam Runner for a more performant system like Google Cloud’s Dataflow, Apache Spark, or Apache Flink, with minimal code changes – often only one line. While Dataflow is only available to Google Cloud customers, Apache Spark and Flink are open-source, and all major cloud providers offer managed services.
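For illustration, here is a rough sketch of what that swap can look like in a TFX pipeline definition; the project ID, bucket, and pipeline names are placeholders, and the exact Beam flags depend on your setup (this is not Digits’ actual configuration):

from tfx.orchestration import pipeline

# Local development: Beam's DirectRunner needs no extra infrastructure.
direct_runner_args = ["--direct_running_mode=multi_processing", "--direct_num_workers=0"]

# As data volume grows, the same pipeline can run on Dataflow by swapping only the Beam args.
dataflow_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder project ID
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",  # placeholder bucket
]

my_pipeline = pipeline.Pipeline(
    pipeline_name="my-production-pipeline",        # placeholder name
    pipeline_root="gs://my-bucket/pipeline_root",  # placeholder root
    components=[],  # ExampleGen, Transform, Trainer, Evaluator, Pusher, ...
    beam_pipeline_args=direct_runner_args,         # or dataflow_args when scaling up
)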

We successfully employed this strategy at Digits: We started out with Apache Beam’s DirectRunner for our initial pipelines, a setup that helped us understand how TFX can improve our ML workflows. As our company grew, the volume of data to process grew as well. To handle the increasing volume of data, TFX allowed us to switch to a different Beam runner without any friction. By building our pipelines in two phases, we didn’t have to implement TFX and the more performant but complex orchestration dependencies all at once, which saved our small initial team considerable strain.

Different Beam Runner options, depending on the data volume

Another advantage that was useful to us is how easily TFX integrates with the Google Cloud ecosystem. Google Cloud’s Vertex AI Pipeline natively supports TFX and provides all necessary pipeline infrastructure as a managed service. Instead of managing our own Kubernetes clusters, we can easily switch back and forth between pipeline runs in different Google Cloud projects. We are also not limited by cluster compute and memory limitations since we can access both GPUs and TPUs with Vertex Pipelines.

Key benefit 3: Reproducibility & Repeatability

Keeping track of all ML artifacts is key for the sustainable management of production ML models. Our goal was to track all relevant data points for all our production models. We needed to store artifacts like datasets, data splits, data validation results, feature transformations, trained models, and model analysis results. But we also didn’t want to slow down the ML team with extensive record keeping.

TFX is tightly integrated with the ML Metadata Store (MLMD) which helps us to keep track of all model details in one place. Under the hood, each TFX component in our ML pipelines records all intermediate pipeline results and metadata. We can generate model lineages for each model produced by our ML pipelines without any additional overhead. This has proven to be an indispensable tool when things move fast.
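For illustration, here is a minimal sketch of querying MLMD directly for recorded model artifacts; the SQLite path is a placeholder, and in practice the connection config points at whichever metadata backend your TFX pipelines use (this is not Digits’ actual setup):

import ml_metadata as mlmd
from ml_metadata.proto import metadata_store_pb2

connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = "metadata.sqlite"  # placeholder path
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = mlmd.MetadataStore(connection_config)

# List every model artifact recorded by past pipeline runs.
for model in store.get_artifacts_by_type("Model"):
    print(model.id, model.uri)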

Model lineage

Digits’ Lessons Learned

While adapting TFX to our needs did take some time, we have seen this initial investment pay off over time. We are now able to convert machine learning experiments within minutes into production pipelines and continuously produce and deploy new versions of our models.

  • TFX helps us to make our ML codebase more modular. We have developed several custom TFX components (e.g. for model deployments, model annotations, or model tracking). Due to the modularity of the TFX components, all projects can benefit from enhancements made in a single project.
  • At the same time, we benefited from standardizing our production ML codebase with TFX. As a growing startup company, we found this standardization especially useful as it helped us stay on track as complexity increased. New projects now follow a highly optimized cookie-cutter approach, which has resulted in major time and labor savings. Those standardizations also allowed us to automate large parts of the model deployment processes, which in turn helped free up engineering capacities. We have found that these savings are vital for the small, flexible ML teams which are common in startups. 
  • Using TFX also has allowed us to future-proof our MLOps tooling. The fact that TFX uses Apache Beam under the hood gave us confidence that we don’t need to reengineer our MLOps setup as the company grows. 
  • TFX, its metadata store, and its Google Cloud integrations have helped us reproduce models from given artifacts and made it much easier to accurately recreate any previous ML models whenever needed.

The experience of growing Digits with TFX has convinced us that any company that is serious about machine learning can benefit from TFX – at every step along the way, from small startups to large corporations.

For more information

To learn more about TFX, check out the TFX website, join the TFX discussion group, dive into other posts in the TFX blog, watch our TFX playlist on YouTube, or subscribe to the TensorFlow channel.

CircularNet: Reducing waste with Machine Learning

Posted by Sujit Sanjeev, Product Manager, Robert Little, Sustainability Program Manager, Umair Sabir, Machine Learning Engineer

Have you ever been confused about how to file your taxes? Perplexed when assembling furniture? Unsure about how to understand your partner? It turns out that many of us find the act of recycling more confusing than all of the above. As a result, we do a poor job of recycling right, with less than 10% of our global resources recycled, and we toss 1 of every 5 items (~17%) into the recycling bin when it doesn’t belong there. That’s bad news for everyone — recycling facilities catch fire, we lose billions of dollars in recyclable material every year — and at an existential level, we miss an opportunity to leverage recycling as an impactful tool to combat climate change. With this context in mind, we asked ourselves – how might we use the power of technology to ensure that we recycle more and recycle right?

As the world population grows and urbanizes, waste production is estimated to reach 2.6 billion tons a year in 2030, an increase from its current level of around 2.1 billion tons. Efficient recycling strategies are critical to foster a sustainable future.

The facilities where our waste and recyclables are processed are called “Material Recovery Facilities” (MRFs). Each MRF processes tens of thousands of pounds of our societal “waste” every day, separating valuable recyclable materials like metals and plastics from non-recyclable materials. A key inefficiency within the current waste capture and sorting process is the inability to identify and segregate waste into high quality material streams. The accuracy of the sorting directly determines the quality of the recycled material; for high-quality, commercially viable recycling, the contamination levels need to be low. Even though the MRFs use various technologies alongside manual labor to separate materials into distinct and clean streams, the exceptionally cluttered and contaminated nature of the waste stream makes automated waste detection challenging to achieve, and the recycling rates and the profit margins stay at undesirably low levels.

Enter what we call “CircularNet”, a set of models that lowers barriers to AI/ML tech for waste identification and all the benefits this new level of transparency can offer.

Our goal with CircularNet is to develop a robust and data-efficient model for waste/recyclables detection, which can support the way we identify, sort, manage, and recycle materials across the waste management ecosystem. Models such as this could potentially help with:

  • Better understanding and capturing more value from recycling value chains
  • Increasing landfill diversion of materials
  • Identifying and reducing contamination in inbound and outbound material streams

Challenges

Processing tens of thousands of pounds of material every day, Material Recovery Facility waste streams present a unique and ever-changing challenge: a complex, cluttered, and diverse flow of materials at any given moment. Additionally, there is a lack of comprehensive and readily accessible waste imagery datasets to train and evaluate ML models.

The models should be able to accurately identify different types of waste in “real world” conditions of a MRF – meaning identifying items despite severe clutter and occlusions, high variability of foreground object shapes and textures, and severe object deformation.

In addition to these challenges, others that need to be addressed are visual diversity of foreground and background objects that are often severely deformed, and fine-grained differences between the object classes (e.g. brown paper vs. cardboard; or soft vs. rigid plastic).

There also needs to be consistency while tracking recyclables through the recycling value chain e.g. at point of disposal, within recycling bins and hauling trucks, and within material recovery facilities.

Solution

    The CircularNet model is built to perform instance segmentation by training on thousands of images with the Mask R-CNN algorithm. Mask R-CNN was implemented from the TensorFlow Model Garden, which is a repository consisting of multiple models and modeling solutions for TensorFlow users.

    By collaborating with experts in the recycling industry, we developed a customized and globally applicable taxonomy of material types (e.g. “paper”, “metal”, “plastic”, etc.) and material forms (e.g. “bag”, “bottle”, “can”, etc.), which is used to annotate training data for the model. Models were developed to identify material types, material forms, and plastic types (HDPE, PETE, etc.). Unique models were trained for different purposes, which helps achieve better accuracy when they are harmonized and gives us the flexibility to cater to different applications. The models are trained with various backbones such as ResNet, MobileNet, and SpineNet.

To train the model on distinct waste and recyclable items, we have collaborated with several MRFs and have started to accumulate real-world images. We plan to continue growing the number and geographic locations of our MRF and waste management ecosystem partnerships in order to continue training the model across diverse waste streams.

Here are a few details on how our model was trained.

  • Data importing, cleaning and pre-processing
    • Once the data was collected, the annotation files had to be converted into COCO JSON format. All noise, errors and incorrect labels were removed from the COCO JSON file. Corrupt images were also removed both from the COCO JSON and dataset to ensure smooth training.
    • The final file was converted to the TFRecord format for faster training.
  • Training
    • Mask RCNN was trained using the Model Garden repository on Google Cloud Platform.
    • Hyperparameter optimization was done by changing the image size, batch size, learning rate, training steps, epochs, and data augmentation steps.
  • Model conversion 
    • Final checkpoints achieved after training the model were converted to both SavedModel and TFLite formats to support server-side and edge-side deployments (a minimal conversion sketch follows this list).
  • Model deployment 
    • We are deploying the model on Google Cloud for server side inferencing and on edge computing devices
  • Visualization
    • Three ways in which the CircularNet model characterizes recyclables: Form, Material, & Plastic Type


      • Model identifying the material type (Ex. “Plastic”)
      • Model identifying the product form of the material (Ex. “Bottle”)
      • Model identifying the types of plastics (Ex. “HDPE”)
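
    A minimal sketch of the model conversion step referenced above, assuming an exported SavedModel directory (the paths and optimization settings are placeholders, not the exact CircularNet configuration):

    import tensorflow as tf

    saved_model_dir = "exported_model/saved_model"  # placeholder path to the exported model

    # The SavedModel serves server-side inference directly; for edge deployments,
    # convert it to TFLite.
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training optimization
    tflite_model = converter.convert()

    with open("circularnet.tflite", "wb") as f:
        f.write(tflite_model)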

    How to use the CircularNet model

    All the models, along with guides and their respective Colab scripts for pre-processing, training, model conversion, inference, and visualization, are available in the TensorFlow Model Garden repository. Pre-trained models for direct use from servers, browsers, or mobile devices are available on TensorFlow Hub.

    Conclusion

    We hope the model can be deployed, tinkered with, and improved upon by various stakeholders across the waste management ecosystem. We are in the early days of model development. By collaborating with a diverse set of stakeholders throughout the material recovery value chain, we can create a more globally applicable model. If you are interested in collaborating with us on this journey, please reach out to waste-innovation-external@google.com.

    Acknowledgement

    A huge thank you to everyone whose hard work made this project possible! We couldn’t have done this without partnering with the recycling ecosystem.

    Special thanks to Mark McDonald, Fan Yang, Vighnesh Birodkar and Jeff Rechtman

    Building a reinforcement learning agent with JAX, and deploying it on Android with TensorFlow Lite

    Posted by Wei Wei, Developer Advocate

    In our previous blog post Building a board game app with TensorFlow: a new TensorFlow Lite reference app, we showed you how to use TensorFlow and TensorFlow Agents to train a reinforcement learning (RL) agent to play a simple board game ‘Plane Strike’. We also converted the trained model to TensorFlow Lite and then deployed it into a fully-functional Android app. In this blog, we will demonstrate a new path: train the same RL agent with Flax/JAX and deploy it into the same Android app we have built before. The complete code has been open sourced in the tensorflow/examples repository for your reference.

    To refresh your memory, our RL-based agent needs to predict a strike position based on the human player’s board position so that it can finish the game before the human player does. For more detailed game rules, please refer to our previous blog.

    Demo game play in ‘Plane Strike’

    Background: JAX and TensorFlow

    JAX is a NumPy-like library developed by Google Research for high performance computing. It uses XLA to compile programs optimized for GPUs and TPUs. Flax is a popular neural network library built on top of JAX. Researchers have been using JAX/Flax to train very large models with billions of parameters (such as PaLM for language understanding and generation, or Imagen for image generation), making full use of modern hardware. If you’re new to JAX and Flax, start with this JAX 101 tutorial and this Flax Getting Started example.

    TensorFlow started as a library for ML towards the end of 2015 and has since become a rich ecosystem that includes tools for productionizing ML pipelines (TFX), data visualization (TensorBoard), deploying ML models to edge devices (TensorFlow Lite), and running models in a web browser or any environment capable of executing JavaScript (TensorFlow.js). Models developed in JAX or Flax can tap into this rich ecosystem by first converting such a model to the TensorFlow SavedModel format, and then using the same tooling as if they had been developed in TensorFlow natively.

    If you already have a JAX-trained model and want to deploy it today, we have put together a list of resources for you:

    • This blog post demos how to convert a Flax/JAX model to TFLite and run it in a native Android app

    Overall, no matter what your deployment target is (server, web, or mobile), we’ve got you covered.

    Implementing the game agent with Flax/JAX

    Coming back to our board game, to implement our RL agent, we will leverage the same gym environment as before. We will train the same policy gradient model using Flax/JAX this time. Recall that mathematically the policy gradient is defined as:

    $$\nabla_\theta J(\pi_\theta) = \mathbb{E}\left[\sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\cdot)\right]$$

    where:

    • T: the number of timesteps per episode, which can vary per episode
    • s_t: the state at timestep t
    • a_t: the chosen action at timestep t, given state s_t
    • π_θ: the policy parameterized by θ
    • R(·): the reward gathered, given the policy

    We define a 3-layer MLP as our policy network, which predicts the agent’s next strike position.

    class PolicyGradient(nn.Module):
      """Neural network to predict the next strike position."""

      @nn.compact
      def __call__(self, x):
        dtype = jnp.float32
        x = x.reshape((x.shape[0], -1))
        x = nn.Dense(
            features=2 * common.BOARD_SIZE**2, name='hidden1', dtype=dtype)(x)
        x = nn.relu(x)
        x = nn.Dense(features=common.BOARD_SIZE**2, name='hidden2', dtype=dtype)(x)
        x = nn.relu(x)
        x = nn.Dense(features=common.BOARD_SIZE**2, name='logits', dtype=dtype)(x)
        policy_probabilities = nn.softmax(x)
        return policy_probabilities

    In our main training loop, in each iteration we use the neural network to play a round of the game, gather the trajectory information (game board positions, actions taken and rewards), discount the rewards, and then train the model with the trajectories.

    for i in tqdm(range(iterations)):
      predict_fn = functools.partial(run_inference, params)
      board_log, action_log, result_log = common.play_game(predict_fn)
      rewards = common.compute_rewards(result_log)
      optimizer, params, opt_state = train_step(optimizer, params, opt_state,
                                                board_log, action_log, rewards)

    In the train_step() method, we first compute the loss using the trajectories. Then we use jax.grad() to compute the gradients. Lastly we use Optax, a gradient processing and optimization library for JAX, to update the model parameters.


    def compute_loss(logits, labels, rewards):
      one_hot_labels = jax.nn.one_hot(labels, num_classes=common.BOARD_SIZE**2)
      loss = -jnp.mean(
          jnp.sum(one_hot_labels * jnp.log(logits), axis=-1) * jnp.asarray(rewards))
      return loss


    def train_step(model_optimizer, params, opt_state, game_board_log,
                   predicted_action_log, action_result_log):
      """Run one training step."""

      def loss_fn(model_params):
        logits = run_inference(model_params, game_board_log)
        loss = compute_loss(logits, predicted_action_log, action_result_log)
        return loss

      def compute_grads(params):
        return jax.grad(loss_fn)(params)

      grads = compute_grads(params)
      updates, opt_state = model_optimizer.update(grads, opt_state)
      params = optax.apply_updates(params, updates)
      return model_optimizer, params, opt_state


    @jax.jit
    def run_inference(model_params, board):
      logits = PolicyGradient().apply({'params': model_params}, board)
      return logits

    That’s it for the training loop. We can visualize the training progress in TensorBoard as below; here we use the proxy metric ‘game_length’ (the number of steps to finish the game) to track the progress. The intuition is that when the agent becomes smarter, it can finish the game in fewer steps.


    Converting the Flax/JAX model to TensorFlow Lite and integrating with the Android app

    After the model is trained, we use jax2tf, a TensorFlow-JAX interoperation tool, to convert the JAX model into a TensorFlow concrete function. The final step is to call the TensorFlow Lite converter to convert the concrete function into a TFLite model.

    # Convert to tflite model
    model = PolicyGradient()
    jax_predict_fn = lambda input: model.apply({'params': params}, input)

    tf_predict = tf.function(
        jax2tf.convert(jax_predict_fn, enable_xla=False),
        input_signature=[
            tf.TensorSpec(
                shape=[1, common.BOARD_SIZE, common.BOARD_SIZE],
                dtype=tf.float32,
                name='input')
        ],
        autograph=False,
    )

    converter = tf.lite.TFLiteConverter.from_concrete_functions(
        [tf_predict.get_concrete_function()], tf_predict)

    tflite_model = converter.convert()

    # Save the model
    with open(os.path.join(modeldir, 'planestrike.tflite'), 'wb') as f:
      f.write(tflite_model)

    The JAX-converted TFLite model behaves exactly like any TensorFlow-trained TFLite model. You can visualize it with Netron:

    Visualizing TFLite model converted from Flax/JAX using Netron

    We can use exactly the same Java code as before to invoke the model and get the prediction.

    convertBoardStateToByteBuffer(board);
    tflite.run(boardData, outputProbArrays);
    float[] probArray = outputProbArrays[0];
    int agentStrikePosition = -1;
    float maxProb = 0;
    for (int i = 0; i < probArray.length; i++) {
      int x = i / Constants.BOARD_SIZE;
      int y = i % Constants.BOARD_SIZE;
      if (board[x][y] == BoardCellStatus.UNTRIED && probArray[i] > maxProb) {
        agentStrikePosition = i;
        maxProb = probArray[i];
      }
    }

    Conclusion

    In summary, this article walks you through how to train a simple reinforcement learning model with Flax/JAX, leverage jax2tf to convert it to TensorFlow Lite, and integrate the converted model into an Android app.

    Now you have learned how to build neural network models with Flax/JAX, and tap into the powerful TensorFlow ecosystem to deploy your models pretty much anywhere you want. We can’t wait to see the fantastic apps you build with both JAX and TensorFlow!

    Fast Reduce and Mean in TensorFlow Lite

    Posted by Alan Kelly, Software Engineer

    We are happy to share that TensorFlow Lite version 2.10 has optimized Reduce (All, Any, Max, Min, Prod, Sum) and Mean operators. These common operators replace one or more dimensions of a multi-dimensional tensor with a scalar. Sum, Product, Min, Max, Bitwise And, Bitwise Or and Mean variants of reduce are available. Reduce is now fast for all possible inputs.

    Benchmark for Reduce Mean on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [1, 3], Output is a 2D tensor of shape [32, 5].

    Benchmark for Reduce Prod on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [1, 3], Output is a 2D tensor of shape [32, 5].

    Benchmark for Reduce Sum on Google Pixel 6 Pro Cortex A55 (small core). Input tensor is 4D of shape [32, 256, 5, 128] reduced over axis [0, 2], Output is a 2D tensor of shape [256, 128].


    These speed-ups are available by default using the latest version of TFLite on all architectures.

    How does this work?

    To understand how these improvements were made, we need to look at the problem from a different perspective. Let’s take a 3D tensor of shape [3, 2, 5].

    Let’s reduce this tensor over axes [0] using Reduce Max. This will give us an output tensor of shape [2, 5] as dimension 0 will be removed. Each element in the output tensor will contain the max of the three elements in the same position along dimension 0. So the first element will be max{0, 10, 20} = 20. This gives us the following output:

    To simplify things, let’s reshape the original 3D tensor as a 2D tensor of shape [3, 10]. This is the exact same tensor, just visualized differently.

    Reducing this over dimension 0 by taking the max of each column gives us:

    Which we then reshape back to its original shape of [2, 5]

    This demonstrates how simply changing how we visualize the tensor dramatically simplifies the implementation. In this case, dimensions 1 and 2 are adjacent and not being reduced over. This means that we can fold them into one larger dimension of size 2 x 5 = 10, transforming the 3D tensor into a 2D one. We can do the same to adjacent dimensions which are being reduced over.
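
    Here is a small sketch of that re-visualization trick using plain TensorFlow ops (not the actual TFLite kernel), applied to the same [3, 2, 5] example:

    import tensorflow as tf

    x = tf.reshape(tf.range(30), [3, 2, 5])  # values 0..29; x[0, 0, 0]=0, x[1, 0, 0]=10, x[2, 0, 0]=20

    # Direct reduction over dimension 0.
    direct = tf.reduce_max(x, axis=0)        # shape [2, 5], first element is 20

    # Same result by folding the adjacent, non-reduced dimensions 1 and 2
    # into a single dimension of size 2 * 5 = 10.
    folded = tf.reshape(x, [3, 10])
    via_reshape = tf.reshape(tf.reduce_max(folded, axis=0), [2, 5])

    assert bool(tf.reduce_all(direct == via_reshape))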

    Let’s take a look at all possible Reduce permutations for the same 3D tensor of shape [3, 2, 5].

    Of all 8 permutations, only two 3D permutations remain after we re-visualize the input tensor. For any number of dimensions, there are only two possible reduction permutations: the rows or the columns. All other ones simplify to a lower dimension.

    This is the trick to an efficient and simple reduction operator as we no longer need to calculate input and output tensor indices and our memory access patterns are much more cache friendly.

    This also allows the compiler to auto-vectorize the integer reductions. The compiler won’t auto-vectorize float reductions because float addition is not associative, so reordering the additions could change the result. You can see the code which removes redundant axes here and the reduction code here.

    Changing how we visualize tensors is a powerful code simplification and optimization technique which is used by many TensorFlow Lite operators.

    Next steps

    We are always working on adding new operators and speeding up existing ones. We’d love to hear about models of yours which have benefited from this work. Get in touch via the TensorFlow Forum. Thanks for reading!

    Colab’s ‘Pay As You Go’ Offers More Access to Powerful NVIDIA Compute for Machine Learning

    Posted by Chris Perry, Google Colab Product Lead

    Google Colab is launching a new paid tier, Pay As You Go, giving anyone the option to purchase additional compute time in Colab with or without a paid subscription. This grants access to Colab’s  powerful NVIDIA GPUs and gives you more control over your machine learning environment.

    Colab is fully committed to supporting all of our users whether or not they pay for additional compute, and our free-of-charge tier stays in its current form. Today’s announcement reflects additions to paid options only.

    Colab helps you accomplish more with machine learning

    Google Colab is the easiest way to start machine learning. From the Colab notebooks powering TensorFlow’s tutorials and guides to DeepMind’s AlphaFold example, Colab is helping the world learn ML and share the results broadly, democratizing machine learning.

    Colab Pay As You Go further expands the potential for using Colab. Pay As You Go allows anyone to purchase more compute time with Colab, regardless of whether or not they have a monthly subscription. Customers can use this feature to dramatically increase their usage allotments of Colab over what was possible before. Try it out at colab.research.google.com/signup.

    Previously, Colab’s paid quota service throttled compute usage to smooth out quota exhaustion over the entire month of a subscription: we didn’t want users to fully exhaust their quota on day one and spend the rest of the month frustrated by a lack of access to runtimes. Now, with Pay As You Go, we are relaxing usage throttling for all paid users (throttling remains in place for users in our free-of-charge tier).

    Paid users now have the flexibility to exhaust compute quota, measured in compute units, at whatever rate they choose. As compute units are exhausted, a user can choose to purchase more with Pay As You Go at their discretion. Once a user has exhausted their compute units their Colab usage quota will revert to our free of charge tier limits.

    Increasing your power with NVIDIA GPUs

    Paid Colab users can now choose between a standard or premium GPU in Colab, giving you the ability to upgrade your GPU when you need more power. Standard GPUs are typically NVIDIA T4 Tensor Core GPUs, while premium GPUs are typically NVIDIA V100 or A100 Tensor Core GPUs. Getting a specific GPU chip type assignment is not guaranteed and depends on a number of factors, including availability and your paid balance with Colab. If you want guaranteed access to a specific machine configuration, we recommend purchasing a VM on GCP Marketplace.

    When you need more power, select premium GPU in your runtime settings: Runtime > Change runtime type > GPU class > Premium. Premium GPUs will deplete your paid balance in Colab faster than standard GPUs.

    Colab is the right choice for ML projects

    Colab is the right choice for your machine learning project: TensorFlow and many excellent ML libraries come pre-installed, pre-warmed GPUs are a click away, and sharing your notebook with a collaborator is as easy as sharing a Google doc. Collaborators can access runtimes with GPU accelerators without need for payment. Pay As You Go makes Colab an even more useful product for any ML project you’re looking into.

    Automated Deployment of TensorFlow Models with TensorFlow Serving and GitHub Actions

    Posted by Chansung Park and Sayak Paul (ML-GDEs)


    If you are an application developer, or if your organization doesn’t have a dedicated ML Engineering team, it is common to deploy a machine learning model without worrying about the end-to-end machine learning pipeline or MLOps. TFX and TensorFlow Serving can help you create the heart of an MLOps infrastructure.

    In this post, we will share how we serve a TensorFlow image classification model as RESTful and gRPC based services with TensorFlow Serving on a Kubernetes (k8s) cluster running on Google Kubernetes Engine (GKE) through a set of GitHub Actions workflows. 

    Overview

    In any GitHub project, you can make releases, with up to 2 GB of assets included in each release when using a free account. This is a good place to manage different versions of machine learning models for various reasons. One can also replace this with a more private component for managing model versions such as Google Cloud Storage buckets. For our purposes, the 2 GB space provided by GitHub Releases will be enough.

    Figure 1. Three steps to deploy TF Serving on GKE (original).

    The basic idea is to:

    1. Automatically detect a newly released version of a TensorFlow-based ML model in GitHub Releases
    2. Build a custom TensorFlow Serving Docker image containing the released ML model
    3. Deploy it on a k8s cluster running on GKE through a set of GitHub Actions.

    The entire workflow can be logically divided into three subtasks, so it’s a good idea to write three separate composite GitHub Actions:

    • The first subtask handles the environment setup
      • GCP authentication (GCP credentials are injected from a GitHub Action Secret)
      • Install the gcloud CLI toolkit to access the GKE cluster for the third subtask
      • Authenticate Docker to push images to the Google Container Registry (GCR)
      • Connect to a designated GKE cluster for further access
    • The second subtask builds a custom TensorFlow Serving image
      • Download and extract your latest released SavedModel from your GitHub repository
      • Run the official or a custom-built TensorFlow Serving docker image
      • Copy the extracted SavedModel into the running TensorFlow Serving docker container
      • Commit the changes to the running container and tag the new image with the GCR host, the GCP project ID, and latest
      • Push the committed image to GCR
    • The third subtask deploys the custom-built TensorFlow Serving image to the GKE cluster
      • Download the Kustomize toolkit to handle overlay configurations
      • Pick one of the scenarios from the various experiments
      • Apply the Deployment, Service, and ConfigMap for the selected experiment to the currently connected GKE cluster
        • The ConfigMap is used for batching-enabled scenarios to inject batching configurations dynamically into the Deployment.

    There are a number of parameters that you can customize such as the GCP project ID, GKE cluster name, the repository where the ML model will be released, and so on. The full list of parameters can be found here. As noted above, the GCP credentials should be set as a GitHub Action Secret beforehand. If the entire workflow goes without any errors, you will see something similar to the output below.

    NAME         TYPE            CLUSTER-IP      EXTERNAL-IP     PORT(S)                            AGE
    tfs-server   LoadBalancer    xxxxxxxxxx      xxxxxxxxxx       8500:30869/TCP,8501:31469/TCP      23m

    The combinations of the EXTERNAL-IP and the PORT(S) represent endpoints where external users can connect to the TensorFlow Serving pods in the k8s cluster. As you see, two ports are exposed, and 8500 and 8501 are for RESTful and gRPC services respectively. One thing to note is that we used LoadBalancer as the service type, but you may want to consider including Ingress controllers such as GKE Ingress for securing the k8s clusters with SSL/TLS and defining more flexible routing rules in production. You can check out the complete logs from the past runs.
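
    For example, once the service is up, a prediction request can be sent to the RESTful endpoint along these lines; the IP, model name, and input shape below are placeholders that depend on your deployment:

    import json
    import requests

    EXTERNAL_IP = "34.123.45.67"   # placeholder: use the EXTERNAL-IP from `kubectl get svc`
    MODEL_NAME = "resnet"          # placeholder: the MODEL_NAME baked into the serving image

    # TensorFlow Serving exposes the RESTful predict API on port 8501.
    url = f"http://{EXTERNAL_IP}:8501/v1/models/{MODEL_NAME}:predict"
    image = [[[0.0, 0.0, 0.0]] * 224] * 224  # placeholder 224x224x3 input; real preprocessing depends on the model
    payload = {"instances": [image]}

    response = requests.post(url, data=json.dumps(payload))
    print(response.json())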

    Build a Custom TensorFlow Serving Image within a GitHub Action

    As described in the overview and the official document, a custom TensorFlow Serving Docker image can be built in five steps. We also provide a notebook for local testing of these steps. In this section, we show how to write a composite GitHub Action for this partial subtask of the whole workflow (note that .inputs, .env, and ${{ }} for the environment variables are omitted for brevity).

    First, a model can be downloaded by an external robinraju/release-downloader GitHub Action with custom information about the URL of the GitHub repository and the filename in the list of assets from the latest release. The default filename is saved_model.tar.gz.

    Second, the downloaded file should be decompressed to fetch the actual SavedModel that TensorFlow Serving can understand.

    runs:
      using: "composite"
      steps:
          - name: Download the latest SavedModel release
            uses: robinraju/release-downloader@v1.3
            with:
              repository: $MODEL_RELEASE_REPO
              fileName: $MODEL_RELEASE_FILE
              latest: true

          - name: Extract the SavedModel
            run: |
              mkdir $MODEL_NAME
              tar -xvf $MODEL_RELEASE_FILE --strip-components=1 --directory $MODEL_NAME

          - name: Run the CPU Optimized TensorFlow Serving container
            run: |
              docker run -d --name serving_base $BASE_IMAGE_TAG

          - name: Copy the SavedModel to the running TensorFlow Serving container
            run: |
              docker cp $MODEL_NAME serving_base:/models/$MODEL_NAME

          - id: push-to-registry
            name: Commit and push the changed running TensorFlow Serving image
            run: |
              export NEW_IMAGE_NAME=tfserving-$MODEL_NAME:latest
              export NEW_IMAGE_TAG=gcr.io/$GCP_PROJECT_ID/$NEW_IMAGE_NAME
              echo "::set-output name=NEW_IMAGE_TAG::$(echo $NEW_IMAGE_TAG)"
              docker commit --change "ENV MODEL_NAME $MODEL_NAME" serving_base $NEW_IMAGE_TAG
              docker push $NEW_IMAGE_TAG

    Third, we can modify a running TensorFlow Serving Docker container by placing a custom SavedModel inside. In order to do this, we need to run the base TensorFlow Serving container instantiated either from the official image or a custom-built image. We have used the CPU-optimized version as the base image by compiling from source, and it is publicly available here.

    Fourth, the SavedModel should be copied to the /models directory inside the running TensorFlow Serving container. In the last step, we set the MODEL_NAME environment variable to let TensorFlow Serving know which model to expose as services, and commit the two changes that we made to the base image. Finally, the updated TensorFlow Serving Docker image can be pushed into the designated GCR.

    Notes on the TensorFlow Serving Parameters

    We consider three TensorFlow Serving specific parameters in this post: tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, and the batching option. Here, we provide brief overviews of each of them.

    Parallelism threads: tensorflow_intra_op_parallelism controls the number of threads to parallelize the execution of an individual operation. tensorflow_inter_op_parallelism controls the number of threads to parallelize the execution of multiple independent operations. To know more, refer to this resource.

    Batching: As mentioned above, we can allow TensorFlow Serving to batch requests by setting the enable_batching parameter to True. If we do so, we also need to define the batching configurations for TensorFlow in a separate file (passed via the batching_parameters_file argument). Please refer to this resource for more information about the options we can specify in that file.

    Configuring TensorFlow Serving

    Once you have a custom TensorFlow Serving Docker image, you can deploy it with the k8s resource objects: Deployment and ConfigMap as shown below. This section shows how to write ConfigMap to write batching configurations and Deployment to add TensorFlow Serving specific runtime options. We also show you how to mount the ConfigMap to inject batching configurations into TensorFlow Serving’s batching_parameters_file option.

    apiVersion: apps/v1
    kind: Deployment

        spec:
          containers:
          - image: gcr.io/gcp-ml-172005/tfs-resnet-cpu-opt:latest
            name: tfs-k8s
            imagePullPolicy: Always
            args: ["--tensorflow_inter_op_parallelism=2",
                   "--tensorflow_intra_op_parallelism=8",
                   "--enable_batching=true",
                   "--batching_parameters_file=/etc/tfs-config/batching_config.txt"]
            ...
            volumeMounts:
              - mountPath: /etc/tfs-config/batching_config.txt
                subPath: batching_config.txt
                name: tfs-config

    The URI of the custom built TensorFlow Serving Docker image can be specified in spec.containers.image, and the behavior of TensorFlow Serving can be customized by providing arguments in the spec.containers.args in the Deployment. This post shows how to configure three kinds of custom behavior: tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, and enable_batching.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: tfs-config
    data:
      batching_config.txt: |
        max_batch_size { value: 128 }
        batch_timeout_micros { value: 0 }
        max_enqueued_batches { value: 2 }
        num_batch_threads { value: 2 }

    When enable_batching is set to true, we can further customize the batch inference by defining its specific batching-related configurations in a ConfigMap. Then, the ConfigMap can be mounted as a file with spec.containers.volumeMounts, and we can specify which file to look up for the batching_parameters_file argument in Deployment.

    Kustomize to Manage Various Experiments

    As you see, there are lots of parameters to determine the behavior of TensorFlow Serving, and the optimal values for them are usually found by running experiments. Indeed, we have experimented with various parameters within a number of different environmental setups: different numbers of nodes, different numbers of vCPU cores, and different RAM capacity.

    ├── base
    |   ├── kustomization.yaml
    |   ├── deployment.yaml
    |   └── service.yaml
    └── experiments
        ├── 2vCPU+4GB+inter_op2
        ...
        ├── 4vCPU+8GB+inter_op2
        ...
        ├── 8vCPU+64GB+inter_op2_w_batch
        |   ├── kustomization.yaml
        |   ├── deployment.yaml
        |   └── tfs-config.yaml
        ...
    We used kustomize to manage the YAML files of various experiments. We keep common YAML files of Deployment and Service in the base directory while having specific YAML files for certain experimental environments and configurations under the experiments directory. With this and kustomize, the contents of the base YAML files could be easily overlaid with different numbers of replicas, different values of tensorflow_inter_op_parallelism, tensorflow_intra_op_parallelism, enable_batching, and batch configurations.

    runs:
      using: "composite"
      steps:
        - name: Setup Kustomize
          ...

        - name: Deploy to GKE
          working-directory: .kube/
          run: |-
            ./kustomize build experiments/$TARGET_EXPERIMENT | kubectl apply -f -

    You can simply select the experiment that you want to test or that you think is optimal by setting $TARGET_EXPERIMENT. For example, the best experiment that we found was “8vCPU+16GB+inter_op4” which means each VM is configured with an 8vCPU and 16GB RAM while tensorflow_inter_op_parallelism is set to 4. Then the kustomize build command will provision the YAML files for the selected experiment for the k8s clusters.

    Costs

    We used the GCP cost estimator for this purpose. Pricing for each experiment configuration was estimated assuming it would be live for 24 hours per month (which was sufficient for our experiments).


    Machine Configuration (E2 series)    Pricing (USD)
    2vCPUs, 4GB RAM, 8 Nodes             11.15
    4vCPUs, 8GB RAM, 4 Nodes             11.15
    8vCPUs, 16GB RAM, 2 Nodes            11.15
    8vCPUs, 64GB RAM, 2 Nodes            18.21

    Conclusion

    In this post, we discussed how to automatically deploy and experiment with an already trained model with various configurations. We leveraged TensorFlow Serving, Kubernetes, and GitHub Actions to streamline the deployment and experiments. We hope that you found this setup useful and reliable and that you will use this in your own model deployment projects.


    Acknowledgements

    We are grateful to the ML Developer Programs team that provided GCP credits for supporting our experiments. We also thank Hannes Hapke and Robert Crowe for providing us with helpful feedback and guidance.

    Bridging communities: TensorFlow Federated (TFF) and OpenMined

    Posted by Krzys Ostrowski (Research Scientist), Alex Ingerman (Product Manager), and Hardik Vala (Software Engineer)

    Since the announcement of TensorFlow Federated (TFF) on this blog 3.5 years ago, a number of organizations have developed frameworks for Federated Learning (FL). While growing attention to privacy and investments in FL are a welcome trend, one challenge that arises is fragmentation of community and industry efforts, which leads to code duplication and reinvention. One way we can address this as a community is by investing in interoperability mechanisms that could enable our platforms and developers to work together and leverage each other’s strengths.

    In this context, we’re excited to announce the collaboration between TFF and OpenMined – an OSS community dedicated to development of privacy-preserving technologies. OpenMined’s PySyft framework has attracted a vibrant community of hundreds of OSS contributors, and includes tools and APIs to facilitate containerized deployment and integrations with diverse data sources that complement the capabilities we offer in TFF.

    OpenMined is joining the Special Interest Group (SIG) Federated (see the charter, forum, meeting notes, and the Discord server) that we’ve recently established to enable developers of TFF, together with a growing set of OSS and industry partners, to openly engage in conversations about how to jointly evolve the TFF ecosystem and grow the adoption of FL.

    Introducing PySyTFF

    To kick off the collaboration, we – the developers of TFF and OpenMined’s PySyft – decided to focus our initial efforts on building together a new platform, with the endearing name PySyTFF, that combines elements of TFF and PySyft to support what we believe will be an increasingly common scenario, illustrated below.

    In this scenario, an owner of a sensitive dataset would like to invite researchers to experiment with training and evaluating ML models on their dataset to advance the current understanding of what model architectures, parameters, etc., work best, while protecting the data and adhering to policies that may govern its use. In practice, such scenarios often end up involving negotiating data usage contracts. On the one hand, these can be tedious to set up, and on the other hand, they largely rely on goodwill.

    What we’d like instead is a platform that offers structural safeguards which limit the disclosure of sensitive information and ensure policy compliance by construction – this is our goal for PySyTFF.

    As an aside, note that even though this blog post is about FL, we aren’t necessarily talking here about scenarios where data is physically siloed across physical locations – the data can also be hosted in a datacenter and logically siloed. More on this below.

    Developer experience

    The initial proof-of-concept implementation of PySyTFF offers an early glimpse of what the developer experience for the data scientist will look like. Note how we combine the advantages of both frameworks – e.g., TFF’s ability to define models in Keras, and PySyft’s access control mechanism and APIs for data access:


    import syft as sy
    import tensorflow as tf

    # Log into the data provider's domain node (credentials and port are placeholders).
    domain = sy.login(email="sam@stargate.net", password="changethis", port=8081)

    # The Keras model to train; the architecture is elided here.
    model_fn = lambda: tf.keras.models.Sequential(...)

    # Training parameters, including the knobs that affect the privacy guarantees.
    params = {
        'rounds': 10,
        'no_clients': 3,
        'noise_multiplier': 0.05,
        'clients_per_round': 2,
        'train_data_id': domain.datasets[0]['images'].id_at_location.to_string(),
        'label_data_id': domain.datasets[0]['labels'].id_at_location.to_string()
    }

    model, metrics = sy.tff.train_model(model_fn, params, domain, timeout=5000)

    Here, the data scientist logs into a PySyft domain node – an infrastructure component provisioned by or on behalf of the data provider – and gains a limited, access-control-guarded ability to enumerate the available resources and perform actions on them. This includes obtaining references to datasets managed by the node and their metadata (but not their content), and issuing train_model calls. In a train_model call, the data scientist supplies the Keras model they wish to train, along with the parameters that control the training process and affect the privacy guarantees of the computed result, such as the number of rounds or the amount of noise added to make the results more private. In return, the researcher may receive computed outputs such as a set of evaluation metrics, or the trained model parameters.
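
    For intuition, here is a minimal sketch of that resource-enumeration step. It only uses the elements that appear in the snippet above; any behavior beyond those (such as printing the dataset listing) is our assumption and may differ in the actual PySyft API:

    import syft as sy

    # Log into the domain node (credentials and port are placeholders).
    domain = sy.login(email="sam@stargate.net", password="changethis", port=8081)

    # Browse the datasets the node exposes; only metadata is visible here –
    # the underlying data never leaves the domain node.
    print(domain.datasets)

    # Obtain a reference to an asset; its contents remain remote, and only
    # the reference id is handed to train_model later on.
    images_ref = domain.datasets[0]["images"]
    print(images_ref.id_at_location.to_string())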

    Exactly what ranges of parameters the platform accepts from the researcher, and what results the researcher can get back, will in general depend on the policies defined by the data owner. These might, for example, mandate the use of privacy-preserving algorithms and constrain the allowed privacy budget, which in turn constrains parameters such as the number of training rounds, the number of clients per round, or the noise multiplier. While PySyTFF does not yet offer policy engine integration at the current stage of development, it is an important part of our future development plans.

    Under the hood

    The domain node is a Docker-based environment that bundles together a web-based frontend you can securely log into, a mechanism for authenticating and authorizing users, and a set of internal services that includes database connectivity, as illustrated below.

    The train_model call in the code snippet above, perhaps issued from the data scientist’s Python Colab notebook, is implemented as a network request carrying a serialized representation of the TensorFlow code of the model to train, along with the training parameters and references to the PySyft datasets to use for training and evaluation.
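
    For intuition, a Keras model can be turned into a portable, serialized description of its architecture along the following lines; this is a generic illustration, and we don’t detail here the exact serialization PySyTFF uses on the wire:

    import tensorflow as tf

    # Build the model the data scientist wants trained.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # A JSON description of the architecture (no weights) that can be shipped
    # over the network and reconstructed inside the domain node.
    model_json = model.to_json()
    restored = tf.keras.models.model_from_json(model_json)
    restored.summary()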

    Inside the domain node, the call is relayed to a PySyTFF service, a new component introduced to the PySyft ecosystem to orchestrate the training process. This involves interacting with PySyft’s data backend to obtain handles to shards of user data, calling TFF APIs to construct TFF computations to run, and passing the constructed TFF computations and data handles to an embedded instance of TFF runtime that loads the data using the supplied handles and runs the FL algorithms.

    FL on logically-siloed data

    At this point, some of you may be wondering how exactly FL fits into the picture. After all, FL is mostly known as a technology that supports computations on data distributed across a set of devices, or (in what’s called the cross-silo flavor of FL) across a set of data centers owned by a group of institutions – yet here, we’re talking about a scenario where the data is already in the customer’s PySyft database.

    To explain this, let’s pop up a level and consider the high-level objective: to enable researchers to perform ML computations on sensitive data with platform-level, structural, and formal privacy guarantees. To do so, the platform should ideally uphold formal privacy principles, such as data minimization (a guarantee on how the computation is executed and how sensitive data is handled) and anonymous aggregation (a guarantee on what is being computed and released).

    Federated Learning is a great fit in this context because it structurally embodies these principles, and provides a framework for implementing algorithms that provably achieve user-level Differential Privacy (DP) – the current gold standard. The FL algorithms that enable us to achieve these guarantees can be used to process data in datacenter deployments, even in scenarios where – as is the case here with the PySyft database – all of that data resides in a single administrative domain.

    To see this, imagine that for each user in the database, we draw a virtual boundary around all of their data and think of it as a kind of virtual silo. We can treat such virtual silos of user data the same way we treat “client” devices in a more traditional FL setting, and orchestrate FL algorithms to run across virtual silos as clients.

    Thus, for example, when training an ML model, we’d repeatedly pick sets of users from the database and, separately for each user, train a local model update on their data. We’d then clip each local update, add noise for privacy, and aggregate the updates across users to produce an updated global model – repeating this process for thousands of rounds until the ML model converges, as shown below.

    Even though the data may be only logically partitioned, following this approach enables us to achieve the very same types of formal guarantees, including provable user-level differential privacy, as those cited above – and indeed, TFF enables us to leverage the same FL algorithm implementation – literally the same TFF code – as the one that powers Google’s mobile/IoT production deployments.
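
    As a rough sketch of what such code looks like in TFF – the exact algorithm and aggregator configuration PySyTFF ends up using may differ – one can build a federated averaging process with a differentially private aggregator and drive it over the virtual silos. Here, sample_virtual_silos() is a placeholder for the logic that samples users from the database each round:

    import tensorflow as tf
    import tensorflow_federated as tff

    def model_fn():
        # A toy Keras model wrapped for TFF; the input spec stands in for the
        # per-user "virtual silo" datasets described above.
        keras_model = tf.keras.Sequential([
            tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
        ])
        return tff.learning.models.from_keras_model(
            keras_model,
            input_spec=(
                tf.TensorSpec(shape=(None, 784), dtype=tf.float32),
                tf.TensorSpec(shape=(None,), dtype=tf.int32),
            ),
            loss=tf.keras.losses.SparseCategoricalCrossentropy())

    # Differentially private aggregation: clip each per-user update and add noise.
    aggregator = tff.learning.dp_aggregator(
        noise_multiplier=0.05, clients_per_round=2)

    process = tff.learning.algorithms.build_unweighted_fed_avg(
        model_fn,
        client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.1),
        model_aggregator=aggregator)

    state = process.initialize()
    for round_num in range(10):
        # sample_virtual_silos() is a placeholder returning a list of
        # tf.data.Dataset objects, one per user sampled this round.
        result = process.next(state, sample_virtual_silos())
        state, metrics = result.state, result.metrics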

    Collaborate with us!

    As noted earlier, the initial version of PySyTFF is still missing a number of components – and this, dear reader, is where you come in. If the vision laid out above excites you, we – the TFF and PySyft teams – would love to work with you to evolve this platform together. In addition to policy engine integration, we plan to augment PySyTFF with the ability to spawn distributed instances of the TFF runtime on cloud or compute clusters to power very compute-intensive workloads, to add a system of charging for the use of resources, and to extend the scope of PySyTFF to include classical types of cross-silo FL deployments, to name just a few directions.

    There are a great many ways to go about this – from joining the TFF and PySyft teams’ collaborative efforts and directly helping us build and deploy this platform, to helping design and build generic components and APIs that enable TFF and PySyft/PyGrid to interoperate.

    Ready to get started? You can visit the SIG Federated forum and join the Discord server, or you can reach out directly – see the contact info in the SIG charter and the engagement channels created by OpenMined’s PySyft team. We’re looking forward to hearing from you!

    Acknowledgments

    On behalf of the TFF team at Google, we’d like to thank our OpenMined partners Andrew Trask, Tudor Cebere, and Teo Milea for the productive collaboration leading up to this announcement.

    Read More

    Optimizing TF, XLA and JAX for LLM Training on NVIDIA GPUs

    Optimizing TF, XLA and JAX for LLM Training on NVIDIA GPUs

    Posted by Douglas Yarrington (Google TPgM), James Rubin (Google PM), Neal Vaidya (NVIDIA TME), Jay Rodge (NVIDIA PMM)

    Together, NVIDIA and Google are delighted to announce new milestones and plans to optimize TensorFlow and JAX for the Ampere and recently announced Hopper GPU architectures by leveraging the power of XLA: a performant, flexible and extensible ML compiler built by Google. We will deepen our ongoing collaboration with dedicated engineering teams focused on delivering improved performance on currently available A100 GPUs. NVIDIA and Google will also jointly support unique features of the recently announced H100 GPU, including the Transformer Engine with support for hardware-accelerated 8-bit floating-point (FP8) data types and the transformer library.

    We are announcing improved performance in TensorFlow, new NVIDIA GPU-specific features in XLA, and the first release of JAX for multi-node, multi-GPU training, all of which will significantly improve large language model (LLM) training. We expect the Hopper architecture to be especially popular for LLMs.

    NVIDIA H100 Tensor Core GPU

    XLA for GPU

    Google delivers high performance with LLMs on NVIDIA GPUs because of a notable technology, XLA, which supports all leading ML frameworks, such as TensorFlow, JAX, and PyTorch. Over 90% of Google’s ML compilations – across research and production – happen on XLA. These span the gamut of ML use cases, from ultra-large-scale model training at DeepMind and Google Research, to optimized deployments across our products, to edge inferencing at Waymo.

    XLA’s deep feature set accelerates large language model performance and addresses most of the large-model challenges seen in the industry today. For example, a feature unique to XLA, SPMD, automates most of the work needed to partition models across multiple cores and devices, making large model training significantly more scalable and performant. XLA can also automatically recognize and select the optimal hand-written library implementation for your target backend, such as cuDNN for CUDA chipsets; otherwise, XLA natively generates optimized code for performant execution.
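
    TensorFlow users can opt into XLA compilation today with jit_compile=True; a minimal sketch:

    import tensorflow as tf

    @tf.function(jit_compile=True)  # compile this function with XLA
    def dense_layer(x, w, b):
        # XLA can fuse the matmul, bias add, and activation into fewer kernels.
        return tf.nn.relu(tf.matmul(x, w) + b)

    x = tf.random.normal((8, 128))
    w = tf.random.normal((128, 64))
    b = tf.zeros((64,))
    y = dense_layer(x, w, b)
    print(y.shape)  # (8, 64)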

    We’ve been collaborating with NVIDIA on several exciting features and integrations that will further optimize LLMs for GPUs. We recently enabled collectives such as all-reduce to run in parallel with compute, which has resulted in a significant reduction in end-to-end latency for customers. Furthermore, we enabled support for bfloat16, which has resulted in compute gains of 4.5x over 32-bit floating point while retaining the same dynamic range of values.
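
    One common way to opt into bfloat16 compute in TensorFlow is Keras mixed precision; here is a minimal sketch (a generic illustration – whether you see gains of the magnitude quoted above depends on your model and hardware):

    import tensorflow as tf

    # Run compute in bfloat16 while keeping variables in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    print(model.layers[0].compute_dtype)   # bfloat16
    print(model.layers[0].variable_dtype)  # float32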

    Our joint efforts mean that XLA integrates even more deeply with NVIDIA’s AI tools and can better leverage NVIDIA’s suite of AI hardware-optimized libraries. In Q1 2023, we will release an XLA-cuDNN Graph API integration, which provides customers with optimized fusion of convolution/matmul operations and multi-headed attention in transformers for improved use of memory and faster GPU kernel execution. As a result, overheads drop significantly and performance improves notably.

    TensorFlow for GPU

    TensorFlow recently released distributed tensors (or DTensors) to enable Tensor storage across devices like NVIDIA GPUs while allowing programs to manipulate them seamlessly. The goal of DTensor is to make parallelizing large-scale TensorFlow models across multiple devices easy, understandable, and fast. DTensors are a drop-in replacement for local TensorFlow tensors and scale well to large clusters. In addition, the DTensor project improves the underlying TensorFlow execution and communication primitives, and they are available for use today!
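
    A minimal sketch of the DTensor API – creating a mesh over two GPUs and sharding a tensor’s batch dimension across it (the device count here is illustrative):

    import tensorflow as tf
    from tensorflow.experimental import dtensor

    # A 1-D mesh over two GPUs; the "batch" dimension will be sharded across them.
    mesh = dtensor.create_mesh([("batch", 2)], device_type="GPU")

    # Shard the first tensor axis over the mesh's "batch" dimension,
    # replicate the second axis.
    layout = dtensor.Layout(["batch", dtensor.UNSHARDED], mesh)

    # Create a DTensor directly with the requested layout.
    d = dtensor.call_with_layout(tf.ones, layout, shape=(8, 4))
    print(dtensor.fetch_layout(d))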

    We are also collaborating with NVIDIA on several exciting new features in TensorFlow that leverage GPUs, including support for the new FP8 datatype, which should yield a significant improvement in training times for transformer models when using the Hopper H100 GPU.

    JAX for GPU

    Google seeks to empower every developer with purpose-built tools for every step of the ML workflow. That includes TensorFlow for robust, production-ready models and JAX with highly optimized capabilities for cutting-edge research. We are pleased to announce the unique collaboration between NVIDIA and Google engineering teams to enhance TensorFlow and JAX for large deep-learning models, like LLMs. Both frameworks fully embrace NVIDIA A100 GPUs and will support the recently announced H100 GPUs in the future.

    One of the key advantages of JAX is the ease of achieving superior hardware utilization, with industry-leading FLOPs across accelerators. Through our collaboration with NVIDIA, we are bringing these advantages to GPUs using some XLA compiler magic. Specifically, we are leveraging XLA for operator fusion, improving GSPMD for GPU to support generalized data and model parallelism, and optimizing for cross-host NVLink.
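
    As a small illustration of the kind of fusion XLA performs under jax.jit (a toy example, not representative of LLM-scale workloads):

    import jax
    import jax.numpy as jnp

    @jax.jit  # XLA compiles and fuses the whole function into optimized kernels
    def attention_scores(q, k):
        # The matmul, scaling, and softmax can be fused by XLA rather than
        # materializing each intermediate in GPU memory.
        scores = jnp.einsum("bqd,bkd->bqk", q, k) / jnp.sqrt(q.shape[-1])
        return jax.nn.softmax(scores, axis=-1)

    key = jax.random.PRNGKey(0)
    q = jax.random.normal(key, (2, 16, 64))
    k = jax.random.normal(key, (2, 16, 64))
    print(attention_scores(q, k).shape)  # (2, 16, 16)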

    Future Plans

    NVIDIA and Google are pleased with all the progress shared in this post, and are excited to hear from community members about their experience using TensorFlow and JAX with XLA on Ampere (A100) and Hopper (H100) GPUs.

    Check out the release notes for more information. To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow. If you’ve built something you’d like to share, please submit it for our Community Spotlight at goo.gle/TFCS. For feedback, please file an issue on GitHub or post to the TensorFlow Forum.

    TensorFlow is also available in the NVIDIA GPU Cloud (NGC) as a Docker container that contains a validated set of libraries that enable and optimize GPU performance, with a JAX NGC container coming later this year.

    Thank you!

    Contributors: Frederic Bastien (NVIDIA), Abhishek Ratna (Google), Sean Lee (NVIDIA), Nathan Luehr (NVIDIA), Ayan Moitra (NVIDIA), Yash Katariya (Google), Peter Hawkins (Google), Skye Wanderman-Milne (Google), David Majnemer (Google), Stephan Herhut (Google), George Karpanov (Google), Mahmoud Soliman (NVIDIA), Yuan Lin (NVIDIA), Vartika Singh (NVIDIA), Vinod Grover (NVIDIA), Pooya Jannaty (NVIDIA), Paresh Kharya (NVIDIA), Santosh Bhavani (NVIDIA)

    Read More

    September Machine Learning Updates

    September Machine Learning Updates

    Posted by the TensorFlow team

    On September 14, at the Google Developers Summit in Shanghai, China, members of Google’s open-source ML teams will be on stage to talk about updates to our growing ecosystem, and we’d love to share them here with you.

    MediaPipe Studio

    We recognize that creating and productionizing custom on-device ML solutions can be challenging, so we’re reinventing how you develop them by leveraging simple-to-use abstraction APIs and no-code GUIs. We’re excited to give you a sneak peek at MediaPipe Studio, our low-code and no-code solution that gets you from data to modeling to deployment on Android or iOS with native code integration libraries that make it easy to build ML-powered apps.

    General Availability of TensorFlow Lite in Google Play Services

    We recently launched the general availability of TensorFlow Lite in Google Play services. With this, the TensorFlow Lite runtime is automatically managed and updated by Google Play services, meaning you no longer need to ship it as part of your application. Your apps get smaller, and your users always have the latest version thanks to regular background updates. This reduces the burden on you as an app developer, because your users get framework updates and bug fixes automatically. And TensorFlow Lite in Google Play services is production-ready, already running over 100 billion daily inferences.

    Tensor Projects

    At Google, we are creating a world-class family of ML tools across all hardware and device types. Because we are committed to building tools that are fit for purpose, from cutting-edge research to tried-and-true planet-scale deployments, we are sharing our vision of an open ML ecosystem of the future: Tensor Projects.

    Tensor Projects is an ecosystem of ML technologies and platforms that bring together Google’s ML tools, and organize efforts across our world-class engineering and research teams. It creates a space and a promise of continued innovation and support to enable researchers, developers, MLOps, and business teams to build responsible and cutting edge ML, from novel model development to scaled production ML in any data center or on any device.

    These tools, like TensorFlow, Keras, JAX, and MediaPipe Studio, will work well independently, with each other, and with other industry-leading tools and standards. We want to give you full flexibility and choice to build powerful, performant infrastructure for all of your ML use cases. And it’s just the beginning: Tensor Projects will evolve and grow as ML continues to advance. Watch the summary video here.

    Updates to tensorflow.org

    We have an updated experience on tensorflow.org for new or advanced users to easily find resources. You can quickly identify the right TensorFlow tool for your task, explore pre-built artifacts for faster model creation, find ideas and inspiration, get involved in the community, discover quick start guides for common scenarios and much more.

    PyTorch Foundation

    We believe in the power of choice for ML developers and continue to invest resources to make it easy to train, deploy, and manage models. Our investment intends to bring machine learning to every developer’s toolbox and covers a broad spectrum of offerings: from TensorFlow and Keras, which provide free and open-source tools that allow millions of developers to succeed with ML, to JAX, which empowers researchers across Alphabet.

    Additionally, in the spirit of openness, we support PyTorch developers with Cloud TPU using XLA. To continue to help all developers succeed with Google Cloud, and to better position Google to make meaningful contributions to the community, we’re delighted to announce our role as a founding member of the newly formed PyTorch Foundation. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive the adoption of AI and ML through open source platforms.
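
    As a small illustration of that PyTorch-on-XLA path, here is a generic torch_xla sketch (not specific to any particular Cloud TPU setup):

    import torch
    import torch_xla.core.xla_model as xm

    # Acquire an XLA device (a Cloud TPU core when run on a TPU VM).
    device = xm.xla_device()

    x = torch.randn(4, 4, device=device)
    y = (x @ x).sum()
    xm.mark_step()  # materialize the lazily traced XLA graph
    print(y.item())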

    Thank you for reading! To stay up to date, you can read the TensorFlow blog, follow twitter.com/tensorflow, or subscribe to youtube.com/tensorflow.

    Read More