Counterfactual Logit Pairing

Posted by Bhaktipriya Radharapu, Software Engineer

TensorFlow Model Remediation is an open source toolkit that showcases solutions to help mitigate unfair bias in Machine Learning models. The toolkit offers resources to build fairer models for everyone – in line with Google’s AI Principles. Today, we’re excited to announce a new technique within the TensorFlow Model Remediation Library called Counterfactual Logit Pairing (CLP) to address unintended bias in ML models.

ML models are prone to changing their predictions when a sensitive attribute referenced in an input is removed or replaced, leading to unintended bias. For instance, the Perspective API, used to identify offensive or toxic text in comments, revealed a positive correlation between identity terms referencing race or sexual orientation and the predicted toxicity score. For example, the phrase “I am a lesbian” received a toxicity score of 0.51, while “I am a man” received a lower score of 0.2. This correlation resulted in higher toxicity scores for some identity terms, even when used non-pejoratively. For more information on the Perspective API, see the blog post on unintended bias and identity terms.

Counterfactual Logit Pairing (CLP) is a technique that addresses such issues to ensure that a model’s prediction doesn’t change when a sensitive attribute referenced in an example is either removed or replaced. It improves a model’s robustness to such perturbations, and can positively influence a model’s stability, fairness, and safety.

CLP mitigates such counterfactual fairness issues at training time. It does so by adding an additional loss to the model’s training loss, which penalizes the difference in the model’s outputs between training examples and their counterfactuals.
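Conceptually, the paired penalty looks like the following sketch (a simplified illustration, not the library’s internal implementation):

import tensorflow as tf

def counterfactual_penalty(original_logits, counterfactual_logits):
    # Penalize any gap between predictions on an example and its
    # counterfactual; pairwise mean squared error is one common choice.
    return tf.reduce_mean(tf.square(original_logits - counterfactual_logits))

# Schematically, the training objective becomes:
# total_loss = task_loss + loss_weight * counterfactual_penalty(...)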

Another advantage of CLP is that it can be applied even to unlabelled data: as long as the model treats original and counterfactual examples similarly, you can validate that your model adheres to counterfactual fairness.

For an in-depth discussion on this topic, see research on counterfactual fairness, adversarial logit pairing, and counterfactual logit pairing.

Counterfactual Logit Pairing Walkthrough:

The CLP with Keras codelab provides an end-to-end example. In this overview, we’ll emphasize key points from the notebook, while providing additional context.

The notebook trains a text classifier to identify toxic content. This type of model attempts to identify content that is rude, disrespectful or otherwise likely to make someone leave a discussion, and assigns the content a toxicity score. For this task, our baseline model will be a simple Keras sequential model pre-trained on the Civil Comments dataset.

We will use CLP to avoid having identity terms unfairly skew what is classified as offensive. We consider a narrow class of counterfactuals that involves removing gender and sexual orientation related identity tokens in the input, such as removing “gay” in the input “I’m a gay person” to create the counterfactual example “I’m a person.”
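A naive term-matching ablation of this kind can be sketched as follows (illustrative only; the library helper introduced below handles this for you):

import re

def ablate_terms(text, terms):
    # Remove whole-word occurrences of the identity terms, then collapse
    # the leftover whitespace.
    pattern = r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.sub(r"\s+", " ", re.sub(pattern, "", text)).strip()

print(ablate_terms("I'm a gay person", ["gay"]))  # I'm a person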

The high-level steps will be to:

  1. Calculate flip rate and flip count of the classifier on original and counterfactual examples.
  2. Build a counterfactual dataset using CounterfactualPackedInputs by performing a naive ablation based on term matching.
  3. Improve performance on flip rate and flip count by training with CLP.
  4. Evaluate the new model’s performance on flip rate and flip count.

Be aware that this is a minimal workflow to demonstrate usage of the CLP technique, and not a complete approach to fairness in machine learning. CLP addresses one specific challenge that may impact fairness in machine learning. See the Responsible AI toolkit for additional information on responsible AI and tools that can be used to complement CLP.

In a production setting, you would want to approach each of these steps with more rigor. For example:

  • Consider the fairness goals of your model. What qualifies as “fair” for your model? Which definitions of fairness are you trying to achieve?
  • Consider when counterfactual pairs should have the same prediction. Many syntactic counterfactuals generated by token substitution may not require identical output. Consider the application space and the potential societal impact of your model and understand when the outputs should be the same and when they shouldn’t be.
  • Consider using semantically and grammatically grounded counterfactuals instead of heuristic based ablations.
  • Experiment with the configuration of CLP by tuning hyperparameters to get optimal performance.

Let’s begin by examining the flip count and flip rate of the original model on the counterfactual examples. The flip count measures the number of times the classifier gives a different decision when the identity term in a given example is changed. The flip rate measures the fraction of such decision changes: the flip count divided by the total number of counterfactual examples.
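In code, these two metrics amount to the following (an illustrative sketch with a hypothetical 0.5 decision threshold):

import numpy as np

def flip_count_and_rate(original_scores, counterfactual_scores, threshold=0.5):
    # A "flip" occurs when the thresholded decision differs between an
    # example and its counterfactual.
    original = np.asarray(original_scores) >= threshold
    counterfactual = np.asarray(counterfactual_scores) >= threshold
    flips = original != counterfactual
    return int(flips.sum()), float(flips.mean())  # (flip count, flip rate)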

Let’s use the Fairness Indicators widget in the notebook to measure the flip rate and flip count. Select flip_rate/overall in the widget. Notice that the overall flip rate for the female group is about 13% and for the male group about 14%, both higher than the 8% rate for the overall dataset. This means that the model is likely to change its classification based on the presence of gender-related terms.

We’ll now use CLP to try to reduce the model’s flip rate and flip count for gender-related terms in our dataset. We start by creating an instance of CounterfactualPackedInputs, which packs the original_input and counterfactual_data.

CounterfactualPackedInputs(
    original_input=(x, y, sample_weight),
    counterfactual_data=(original_x, counterfactual_x,
                         counterfactual_sample_weight)
)

We next remove instances of gender-specific terms using the helper function build_counterfactual_data. Note that we only include non-pejorative terms, as pejorative terms should have a different toxicity score. Requiring equal predictions across examples with pejorative terms would both weaken the model’s ability to perform its task and potentially increase harm to vulnerable groups.


sensitive_terms_to_remove = [
'aunt', 'boy', 'brother', 'dad', 'daughter', 'father', 'female', 'gay',
'girl', 'grandma', 'grandpa', 'grandson', 'grannie', 'granny', 'he',
'heir', 'her', 'him', 'his', 'hubbies', 'hubby', 'husband', 'king',
'knight', 'lad', 'ladies', 'lady', 'lesbian', 'lord', 'man', 'male',
'mom', 'mother', 'mum', 'nephew', 'niece', 'prince', 'princess',
'queen', 'queens', 'she', 'sister', 'son', 'uncle', 'waiter',
'waitress', 'wife', 'wives', 'woman', 'women'
]

# Convert the Pandas DataFrame to a TF Dataset.
dataset_train_main = tf.data.Dataset.from_tensor_slices(
    (data_train[TEXT_FEATURE].values, labels_train)).batch(BATCH_SIZE)

counterfactual_data = counterfactual.keras.utils.build_counterfactual_data(
    original_input=dataset_train_main,
    sensitive_terms_to_remove=sensitive_terms_to_remove)

counterfactual_packed_input = counterfactual.keras.utils.pack_counterfactual_data(
    dataset_train_main,
    counterfactual_data)

To train with a Counterfactual model, simply take the original model and wrap it in a CounterfactualModel with a corresponding loss and loss_weight. This will co-train the model on the main classification task and on the debiasing task using the CLP loss.

We are using 1.0 as the default loss_weight, but this is a parameter that can be tuned for your use case, since it depends on your model and product requirements. Increasing the value causes the model to penalize differences on the counterfactual examples more heavily, so experiment with changing it to see how it impacts your model. You can test a range of values to explore the trade-off between task performance and the flip rate.

Here, we use the Pairwise Mean Squared Error loss. You can experiment with the other losses in the library to see which option offers the best results.

counterfactual_weight = 1.0

counterfactual_model = counterfactual.keras.CounterfactualModel(
    baseline_model,
    loss=counterfactual.losses.PairwiseMSELoss(),
    loss_weight=counterfactual_weight)

# Compile the model normally after wrapping the original model.
# Note that this means we use the baseline model's loss here.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss = tf.keras.losses.BinaryCrossentropy()
counterfactual_model.compile(optimizer=optimizer, loss=loss,
                             metrics=['accuracy'])

counterfactual_model.fit(counterfactual_packed_input,
                         epochs=1)

Once again, we evaluate the results by looking at the flip count and flip rate. Select “flip_rate/overall” within Fairness Indicators and compare the results for female and male between the two models. You should notice that the flip rates for overall, female, and male have all decreased by about 90%, which leaves the final flip rate for female at approximately 1.3% and for male at approximately 1.4%.

You can get started with Counterfactual by visiting TensorFlow Responsible AI, and learn more about evaluating fairness with Fairness Indicators.

Acknowledgements

The Counterfactual framework was developed in collaboration with
  • Amy Wang, Ben Packer, Bhaktipriya Radharapu, Christina Greer, Nick Blumm, Parker Barnes, Piyush Kumar, Sean O’Keefe, Shivam Jindal, Shivani Poddar, Summer Misherghi, Thomas Greenspan.
This research effort was jointly led by
  • Alex Beutel, Jilin Chen, Tulsee Doshi in collaboration with Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi.
Further, this work was pursued in collaboration with
  • Andrew Smart, Francois Chollet, Molly FitzMorris, Tomer Kaftan, Mark Daoust, Daniel ‘Wolff’ Dobson, Soo Sung.

Read More

Celebrating Google Summer of Code Responsible AI Projects

Posted by Bhaktipriya Radharapu, Software Engineer, Google Research

One of the key goals of Responsible AI is to develop software ethically and in a way that is responsive to the needs of society and takes into account the diverse viewpoints of users. Open source software helps address this by providing a way for a wide range of stakeholders to contribute.

To continue making Responsible AI development more inclusive and transparent, and in line with our AI Principles, Google’s Responsible AI team partnered with Google Summer of Code (GSoC), to provide students and professionals with the opportunity to contribute to open source projects that promote Responsible AI resources and practices. GSoC is a global, online program focused on bringing new contributors into open source software development. GSoC contributors work with an open source organization on a 12+ week programming project under the guidance of mentors. By bringing in new contributors and ideas, we saw that GSoC helped to foster a more innovative and creative environment for Responsible AI development.

This was also the first time several of Google’s Responsible AI tools, such as The Learning Interpretability Tool (LIT), TensorFlow Model Remediation and Data Cards Playbook, pulled in contributions from third-party developers across the globe, bringing in diverse and new developers to join us in our journey for building Responsible AI for all.

We’re happy to share the work completed by GSoC participants: what they learned about working with state-of-the-art fairness and interpretability techniques, what we learned as mentors, and how rewarding Summer of Code was for each of us and for the Responsible AI community.

We had the opportunity to mentor four developers – Aryan Chaurasia, Taylor Lee, Anjishnu Mukherjee, Chris Schmitz. Aryan successfully implemented XAI tutorials for LIT under the mentorship of Ryan Mullins, software engineer at Google. These showcase how LIT can be used to evaluate the performance of (multi-lingual) question-answering models, and understand behavioral patterns in text-to-image generation models.

Anjishnu also implemented tutorials for LIT under the mentorship of Ryan Mullins. Anjishnu’s work influenced in-review research assessing professionals’ interpretability practices in production settings.

Chris, under the technical guidance of Jenny Hamer, a software engineer at Google, created two tutorials for TensorFlow Model Remediations’ experimental technique, Fair Data Reweighting. The tutorials help developers apply a fairness-enforcing data reweighting algorithm, a pre-processing bias remediation technique that is model architecture agnostic.

Finally, Taylor, under the guidance of Mahima Pushkarna, a senior UX designer at Google Research, and Andrew Zaldivar, a Responsible AI Developer Advocate at Google, designed the information architecture and user experience for activities from the Data Cards Playbook. This project translated a manual calculator that helps groups assess the reader-centricity of their Data Card templates into virtual experiences to foster rich discussion.

The participants learned a lot about working with state-of-the-art fairness and interpretability techniques. They also learned about the challenges of developing Responsible AI systems, and about the importance of considering the social implications of their work. What is also unique about GSoC is that it wasn’t just code and development – mentees were exposed to code-adjacent work, such as design and technical writing, that is essential for the success of software projects and critical for cutting-edge Responsible AI projects, giving them a 360º view into the lifecycle of Responsible AI projects.

The program was open to participants from all over the world and saw participation from 14 countries. We set up several community channels for participants and professionals to discuss Responsible AI topics and Google’s Responsible AI tools and offerings, which organically grew to 300+ members. The community engaged in various hands-on starter projects for GSoC in the areas of fairness, interpretability, and transparency, guided by a team of 8 Google Research mentors and organizers.

We were able to underscore the importance of community and collaboration in open source software development, especially in a field like Responsible AI, which thrives on transparent, inclusive development. Overall, the Google Summer of Code program has been a valuable tool for democratizing the responsible development of AI technologies. By providing a platform for mentorship, and innovation, GSoC has helped us improve the quality of open source software and to guide developers with tools and techniques to build AI in a safe and responsible way.

We’d like to say a heartfelt thank you to all the participants, mentors, and organizers who made Summer of Code a success. We’re excited to see how our developer community continues to work on the future of Responsible AI, together.

We encourage you to check out Google’s Responsible AI toolkit and share what you have built with us by tagging #TFResponsibleAI on your social media posts, or share your work for the community spotlight program.

If you’re interested in participating in the Summer of Code with TensorFlow in 2023, you can find more information about our organization and suggested projects here.

Acknowledgements:

Mentors and Organizers:

Andrew Zaldivar, Mahima Pushkarna, Ryan Mullins, Jenny Hamer, Pranjal Awasthi, Tesh Goyal, Parker Barnes, Bhaktipriya Radharapu

Sponsors and champions:

Special thanks to Shivani Poddar, Amy Wang, Piyush Kumar, Donald Gonzalez, Nikhil Thorat, Daniel Smilkov, James Wexler, Stephanie Taylor, Thea Lamkin, Philip Nelson, Christina Greer, Kathy Meier-Hellstern and Marian Croak for enabling this work.

Read More

How Adobe used Web ML with TensorFlow.js to enhance Photoshop for web

Guest post by Joseph Hsieh (Principal Scientist, Project Lead at Adobe), Devin Fernandez (Director of Product Management, Adobe), and Jason Mayes (Web ML Lead, Google)

Introduction

Photoshop Web is a browser-based version of the popular desktop image editing software, Adobe Photoshop. This online tool offers a wide range of features and capabilities for editing, enhancing, and manipulating images, all through a web browser.

In this post, we will explore how Adobe plans to bring advanced ML features from desktop to web, such as the Object Selection tool. We will also look at how web-based machine learning in JavaScript can improve the performance and user experience of Photoshop Web, and what we can expect in the future.

Challenge

Photoshop was recently brought to the web through WebAssembly in our first attempt to port our tooling to the browser. However, advanced ML features such as the Object Selection tool currently rely on a cloud inference solution, which requires the user to be online and to send data to the cloud service to perform the machine learning task. This means the web app cannot run offline, user privacy is not preserved, and each call to the cloud adds latency and monetary cost, as we need to run those models on our own hardware.

Moving image of screenshot illustrating responsive UI in Object Selection in Adobe Photoshop

When it comes to the Object Selection tool, relying on cloud inference can sometimes result in suboptimal performance due to network latency. To provide a better user experience, Adobe Photoshop Web eliminates this latency by developing an on-device inference solution, resulting in faster predictions and a more responsive UI.

TensorFlow.js is an open-source machine learning library from Google aimed at JavaScript developers that’s able to run client side in the browser. It’s the most mature option for web ML, with comprehensive WebGL and WebAssembly backend operator support, and in the future there will also be an option for a WebGPU backend for faster performance as new web standards evolve. Adobe collaborated with Google to bring TensorFlow.js to Photoshop Web and enable advanced tasks such as object selection using ML running in the browser; the details of the collaboration are explained below.

When we first started to convert to a web solution, we noticed that there were synchronization issues between WebAssembly (which our core ported Photoshop code was running in) and TensorFlow.js (for running the ML models in the browser). Essentially, we needed to load and run the TensorFlow.js models synchronously instead of asynchronously to work with our WebAssembly port of Photoshop. One potential third-party solution was not an option due to its drawbacks – such as large code size overhead and unpredictable performance across devices. So, a new solution was required.

To tackle these challenges, Google and Adobe first collaborated to bring a proxying API to Emscripten, an LLVM-based compiler toolchain that compiles C or C++ code to WebAssembly so it can run in the browser and interact with JavaScript libraries. The proxying API resolves the issues that the third-party solution suffered from and allows for seamless integration between Photoshop’s WebAssembly implementation and the TensorFlow.js ML models.

Next, once communication between WebAssembly and TensorFlow.js was possible, Adobe ported key ML models, such as the one used in object selection shown above, to the TensorFlow.js format. The TensorFlow.js team aided in model optimization by focusing on common ops the models utilized, such as the Conv2D operation, to ensure the converted models ran as fast as possible in the browser.

With both cloud and on-device solutions now a possibility, Photoshop Web can choose the optimal option for delivering the best user experience and deploy ML models accordingly. While on-device inference offers superior user interaction with low latency and privacy for frequently used tasks, not all ML models can run locally due to the limited memory per browser tab (currently around 4GB in Chrome). On the other hand, cloud inference can accommodate larger ML models for tasks where network latency may be acceptable, with the tradeoffs of less perceived privacy by the end user and the associated cost to host and execute such models on server side hardware.

Performance Improvement

The Google team has improved TensorFlow.js hardware execution performance via its various supported backends (WebGL, WASM, WebGPU), resulting in models seeing anywhere from 30% to 200% performance improvements (the larger models tend to see the biggest gains), enabling close to real-time performance right in the browser.

Looking Ahead

Photoshop Web’s Select Subject and Object Selection tools demonstrate how machine learning can help enhance user workflow and experience. As web-based machine learning technology continues to evolve and TensorFlow.js backend support and efficiency continue to make performance gains, Photoshop Web will be able to bring more advanced models to the edge on device in the browser, pushing the limits of what is possible and enabling even more advanced features to delight users.

Try it out

Try out Photoshop Web right now for yourself at https://photoshop.adobe.com and see the power of machine learning in the browser that brings the best of Web ML (coming soon) and Cloud ML inference in action!

Adobe offerings and trademarks belong to Adobe Inc and are not associated with Google.

Read More

Enabling Optimal Inference Performance on AMD EPYC™ Processors with the ZenDNN Library

Posted by Sarina Sit, AMD

AMD launched the 4th Generation of AMD EPYC™ processors in November of 2022. 4th Gen AMD EPYC processors include numerous hardware improvements over the prior generation, such as AVX-512 and VNNI instruction set extensions, that are well-suited for improving inference performance. However, hardware is only one piece of the puzzle; software is a crucial component for effectively taking advantage of the underlying hardware.

We are happy to announce the new availability of the TensorFlow-ZenDNN plug-in for TensorFlow v2.12 and above, which represents the ongoing and focused effort by AMD to improve the accessibility of ZenDNN optimizations for the community via framework upstreaming. This plug-in enables neural network inferencing on AMD EPYC CPUs with the AMD ZenDNN library. 

ZenDNN 

ZenDNN, which is available open-source from GitHub, is a low-level AMD deep neural network library that includes basic neural network building blocks optimized for AMD EPYC CPUs. ZenDNN is purpose-built to help deep learning application and framework developers improve inference performance on AMD EPYC CPUs across an array of workloads, including computer vision, natural language processing, and recommender systems.

TF-ZenDNN 

We have integrated ZenDNN into high-level AI frameworks for ease of use. Our prototype integration with TensorFlow, called TF-ZenDNN, is done by forking the TensorFlow repository at a specific version and directly modifying TensorFlow code. TF-ZenDNN is available as a binary package for direct integration from AMD’s ZenDNN developer resources page (diagram 1 below), with installation instructions available in our TensorFlow + ZenDNN User Guide.

AMD Zen Deep Neural Network (ZenDNN) Resources page
Diagram 1. The ZenDNN v4.0 binary package available on our ZenDNN developer resources page is referred to in this blog as our TF-ZenDNN direct integration version.

TF-ZenDNN optimizes graphs at the network level and provides tuned primitive implementations at a library level, including Convolution, MatMul, Elementwise, and Pooling (Max and Average). We have seen performance benefits across a variety of neural network models, including the breadth of convolutional neural networks depicted by the orange line in Graph 1 below. Optimizing Tencent’s AI Applications with the ZenDNN AI Inference Library and TF-ZenDNN impact on TinyDefectNet demonstrate the high performance of ZenDNN and its integration with TensorFlow, respectively.

Graph 1. Performance uplift of the TensorFlow-ZenDNN plug-in v0.1 and TF-ZenDNN direct integration v4.0 compared to TF-vanilla (without ZenDNN). As optimizations continue to be added to the TensorFlow-ZenDNN plug-in, the extent of performance uplift is expected to compare to that of TF-ZenDNN direct integration. Please see endnotes ZD-045 through ZD-051 at the end of this blog.

TensorFlow-ZenDNN Plug-in 

TF-ZenDNN direct integration, as in the binary form described in the section above, requires significant changes in the TensorFlow code. Upstreaming such changes to the TensorFlow repository would be cumbersome and unsustainable. TensorFlow v2.5 provides a PluggableDevice mechanism that enables modular, plug-and-play integration of device-specific code. AMD adopted PluggableDevice when implementing the TensorFlow-ZenDNN plug-in for AMD EPYC CPUs. TensorFlow-ZenDNN plug-in adds custom kernel implementations and operations specific to AMD EPYC CPUs to TensorFlow through its kernel and op registration C API (diagram 2 below).

Diagram 2. The TensorFlow-ZenDNN plug-in upstreamed into TFv2.12 enables the addition of custom kernels and operations developed by AMD for performance improvement on AMD EPYC processors.

The main difference between the TensorFlow-ZenDNN plug-in and TF-ZenDNN direct integration is compatibility with standard TensorFlow packages. TF-ZenDNN direct integration is a standalone package which replaces standard TensorFlow packages. TensorFlow-ZenDNN plug-in is a supplemental package to be installed alongside standard TensorFlow packages starting from TF version 2.12 onwards.

From a TensorFlow developer’s perspective, the TensorFlow-ZenDNN plug-in approach simplifies the process of leveraging ZenDNN optimizations compared to the TF-ZenDNN direct integration approach. With TF-ZenDNN direct integration, the developer needs to download the foundational TensorFlow build and navigate separately to the AMD ZenDNN developer resources page to download the specific TF-ZenDNN binary for integration. In contrast, with the TensorFlow-ZenDNN plug-in approach, everything that a user needs to take advantage of ZenDNN resides on TensorFlow pages, as described further in the next section, “Step-by-Step Guide to using ZenDNN on AMD EPYC Processors”.

The TensorFlow-ZenDNN plug-in, in its first iteration (v0.1), currently offers 16 common ZenDNN ops, including Conv2D, MatMul, BatchMatMul, FusedBatchNorm, AvgPool, and MaxPool. Other ops that are not covered will fall back to TensorFlow’s native kernels. TensorFlow-ZenDNN plug-in provides competitive performance with TF-ZenDNN direct integration package for models such as ResNet, Inception, and VGG variants, as represented in Graph 1 above, with the blue bars representing TensorFlow-ZenDNN plug-in performance and the orange line representing TF-ZenDNN direct integration performance. However, TF-ZenDNN direct integration still outperforms the plug-in for other models, such as MobileNet and EfficientNet, because the plug-in does not yet support graph optimizations that are currently featured in TF-ZenDNN direct integration packages. We expect the performance to be closer once the TensorFlow-ZenDNN plug-in reaches feature parity with TF-ZenDNN direct integration.

Step-by-Step Guide to using ZenDNN on AMD EPYC Processors

Taking advantage of ZenDNN optimizations in TensorFlow is straightforward:  

1. Download ZenDNN Plug-in CPU wheel file from the TensorFlow Community Supported Builds webpage.

2. Pip install the ZenDNN plug-in using the following commands:

pip install tensorflow-cpu==2.12.0 

pip install tensorflow_zendnn_plugin-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

 
3. Enable ZenDNN optimizations in your inference flow by setting the following environment variables:

export TF_ENABLE_ZENDNN_OPTS=1

export TF_ENABLE_ONEDNN_OPTS=0

To disable ZenDNN optimizations in your inference execution, you can set the corresponding ZenDNN environment variable to 0:

export TF_ENABLE_ZENDNN_OPTS=0

TensorFlow-ZenDNN plug-in is supported with ZenDNN v3.3. Please see Chapter 5 of the TensorFlow-ZenDNN Plug-in User Guide for performance tuning guidelines. 
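As a quick sanity check of the setup, you might run a small inference workload with the optimizations enabled (an illustrative sketch; the model choice is arbitrary, and the variables can equally be set in the shell as shown above):

import os

# Set before importing TensorFlow so the options are picked up.
os.environ["TF_ENABLE_ZENDNN_OPTS"] = "1"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

import tensorflow as tf

# Run one batch of inference through an arbitrary vision model.
model = tf.keras.applications.ResNet50(weights=None)
images = tf.random.uniform((8, 224, 224, 3))
print(model(images, training=False).shape)  # (8, 1000)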

For optimal inference performance, AMD recommends using the TF-ZenDNN direct integration binaries available on the AMD ZenDNN developer resources page.

What’s Coming Next with ZenDNN 

TensorFlow v2.12 marks the first release of our TensorFlow-ZenDNN plug-in. AMD intends to continue improving the performance of the TensorFlow-ZenDNN plug-in on current- and future-generation AMD EPYC processors by supporting more ZenDNN ops, graph optimizations, and quantization in subsequent TensorFlow-ZenDNN plug-in releases. Such enhancements include a planned plug-in version transition from ZenDNN v3.3 to ZenDNN v4.0 to enable optimizations that take advantage of the AVX-512 and VNNI capability in 4th Gen EPYC processors.

With our aim of continuously improving the TensorFlow-ZenDNN plug-in for the community, we encourage TensorFlow developers to test this new TensorFlow-ZenDNN plug-in and share comments and concerns on our ZenDNN GitHub page. Technical support resources can also be reached via the following email address: zendnnsupport@amd.com.

We are excited to continue collaborating with TensorFlow to improve the ZenDNN experience for the wider TensorFlow developer community!

Acknowledgements

The development and upstreaming of the TensorFlow-ZenDNN plug-in is the work of many people from AMD and the TensorFlow team at Google.

From AMD: Chandra Kumar Ramasamy, Aakar Dwivedi, Savan Anadani, Arun Ramachandran, Avinash-Chandra Pandey, Ratan Prasad, Aditya Chatterjee, Alok Ranjan Srivastava, Prakash Raghavendra, Pradeep Kumar Sinha, Vincent Dee.

From Google: Penporn Koanantakool, Eugene Zhulenev, Douglas Yarrington.

Legal Endnotes

ZD-045 through ZD-051:

Testing conducted by AMD Performance Labs as of Tuesday, February 7, 2023 on test systems comprising:

AMD System: AMD Eng Sample of the AMD EPYC™ 9004 96-core processor, dual socket, with hyperthreading on, 2150 MHz CPU frequency (Max 3700 MHz), 768GB RAM, 768MB L3 Cache, NPS1 mode, Ubuntu® 20.04.5 LTS version, kernel version 5.4.0-131-generic, BIOS TQZ1000F, GCC/G++ version 11.1.0, GNU ID 2.31, Python 3.8.15. For no ZenDNN, Tensorflow 2.12. For the ZenDNN plug-in, AOCL BLIS 3.0.6, Tensorflow 2.12, ZenDNN version 3.3; for Direct Integration AOCL BLIS 4.0, Tensorflow Version 2.10, ZenDNN 4.0.

Tests run all from Unified Inference Frontend 1.1 (UIF1.1) model zoo:

  • FP32 EfficientNet
  • FP32 Inception v3
  • FP32 MobileNet v1
  • FP32 VGG16
  • FP32 RefineDet
  • FP32 BERT Base
  • FP32 ResNet50

Results may vary based on factors such as software versions and BIOS settings. ZD-045 through ZD-051.

Read More

What’s new in TensorFlow 2.12 and Keras 2.12?

Posted by the TensorFlow & Keras teams

TensorFlow 2.12 and Keras 2.12 have been released! Highlights of this release include the new Keras model saving and exporting format, the keras.utils.FeatureSpace utility, SavedModel fingerprinting, Python 3.11 wheels for TensorFlow, and much more.

TensorFlow Core

SavedModel Fingerprinting

Models saved with tf.saved_model.save now come with a fingerprint file containing hashes to uniquely identify the SavedModel. Multiple fingerprints are derived from the model content, allowing you to compare the structure, graph, signatures, and weights across models. Read more about fingerprinting in the RFC and check out the read_fingerprint API and Fingerprint class.
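For example, reading a fingerprint back might look like this minimal sketch (the path is illustrative):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
tf.saved_model.save(model, "/tmp/fingerprinted_model")

# Read the fingerprint written alongside the SavedModel; fields such as
# saved_model_checksum can be compared across saved models.
fingerprint = tf.saved_model.experimental.read_fingerprint(
    "/tmp/fingerprinted_model")
print(fingerprint.saved_model_checksum)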

tf.function

tf.function now uses the Python inspect library to consistently mimic the decorated function’s signature. WYSIWYG: decorated and non-decorated behavior is identical, even for complex uses like wrapping (functools.wraps) and partial application (functools.partial).

We now detect incompatible tf.function input types (such as mismatched functools.wraps calls). Additionally, we have improved type constraining logic (input_signature) for better error messages and consistency (e.g. a function with no parameters now automatically has input_signature=[]).
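A small sketch of this stricter signature behavior (illustrative; the exact error text may differ):

import tensorflow as tf

@tf.function
def get_answer():
    return tf.constant(42)

# With no parameters, the function behaves as if input_signature=[]
# had been passed, so extra arguments are rejected up front.
print(get_answer())  # tf.Tensor(42, shape=(), dtype=int32)
# get_answer(1)      # raises a TypeError: too many positional arguments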

Additionally, we have added experimental.extension_type.as_dict() to convert tf.experimental.ExtensionTypes to Python dicts.

Keras

New model format

The biggest new Keras feature in this release is the new model export formats. We’ve completely reworked Keras saving and serialization to cleanly separate two key use cases:

1. Python saving & reloading. This is when you save a Keras model to re-instantiate it later in a Python runtime, exactly as it was. We achieve this with a new file format, called the “Keras v3” format (.keras). You can start using it by calling model.save("your_model.keras", save_format="keras_v3").

 

2. Model export for inference in a runtime that might not support Python at all (e.g. the TF Serving runtime). You can create a lightweight (single-file) export via model.export("your_model") – and reload it in TF Serving or Python via tf.saved_model.load("your_model"). By default, this format only preserves a single serving endpoint, the forward pass of the model, available upon reload as .serve(). Further customization is available through the keras.export.ExportArchive class.

In the 2.13 release, keras_v3 will become the default for all files with the .keras extension. The format supports non-numerical state such as vocabulary files and lookup tables, and it is easy to save custom layers with exotic state elements (such as a FIFOQueue). The format does not rely on loading arbitrary code through bytecode or pickling, so it is safe by default. This is a big advance for secure ML. Note that due to this safety-first mindset, Python lambdas are disallowed at loading time. If you want to use a lambda, and you trust the source of the model, you can pass safe_mode=False to the loading method.
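A minimal save-and-reload round trip might look like this sketch (the file name is arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.save("my_model.keras", save_format="keras_v3")

# Reloads exactly as it was; pass safe_mode=False only when the model
# contains a Lambda layer and you trust its source.
reloaded = tf.keras.models.load_model("my_model.keras")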

The legacy formats (“h5” and “Keras SavedModel” format based on TF SavedModel) will stay supported in perpetuity. However, we recommend that you consider adopting the new Keras v3 format for richer Python-side model saving/reloading, and using export() for inference-optimized model export.

FeatureSpace

Another exciting feature is the introduction of the keras.utils.FeatureSpace utility. It enables one-step indexing and preprocessing of structured data – including feature hashing and feature crossing. See the feature space tutorial (https://keras.io/examples/structured_data/structured_data_classification_with_feature_space/) for more information.

Like all Keras APIs, FeatureSpace is built with progressive disclosure of complexity in mind, so it is fully customizable – you can even specify custom feature types that rely on your own preprocessing layers. For instance, if you want to create a feature that encodes a text paragraph, that’s just two lines of code:

from tensorflow.keras import layers
from tensorflow.keras.utils import FeatureSpace

custom_layer = layers.TextVectorization(output_mode="tf_idf")
feature_space = FeatureSpace(
    features={
        "text": FeatureSpace.feature(
            preprocessor=custom_layer, dtype="string", output_mode="float"
        ),
    },
    output_mode="concat",
)

These are just the release highlights – there are many more Keras-related improvements included, so be sure to check out the release notes!

tf.data

Warm starting

tf.data has added support for warm-starting input processing. If warm_start=True (on tf.data.experimental.OptimizationOptions), tf.data will preemptively start background threads during iterator creation, instead of waiting for the first call to GetNext. This allows users to improve latency on the initial GetNext call at the expense of higher memory usage.
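Opting in might look like the following sketch (based on the description above):

import tensorflow as tf

ds = tf.data.Dataset.range(1000).map(lambda x: x * 2).prefetch(2)

options = tf.data.Options()
# Start background threads at iterator creation rather than on the
# first GetNext call, trading memory for lower startup latency.
options.experimental_optimization.warm_start = True
ds = ds.with_options(options)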

Re-randomizing across epochs

tf.data added a new rerandomize_each_iteration argument to tf.data.Dataset.random() to control whether the sequence of generated random numbers should be re-randomized every epoch or not (the default behavior). If seed is set and rerandomize_each_iteration=True, random() will produce a different (deterministic) sequence of numbers every epoch. This can be useful when training over a relatively small number of input examples, to ensure that the model doesn’t learn the sequence itself.
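For example (a small sketch):

import tensorflow as tf

# With a fixed seed and rerandomize_each_iteration=True, each pass over
# the dataset yields a different, but still deterministic, sequence.
ds = tf.data.Dataset.random(seed=42, rerandomize_each_iteration=True).take(3)

print([x.numpy() for x in ds])  # epoch 1
print([x.numpy() for x in ds])  # epoch 2: different values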

Infra Updates

  • Protobuf python runtime version was upgraded to 4.21.9. All protobuf *_pb2.py stubs are generated now with protoc 3.21.9. The minimal supported protobuf runtime version is 3.20.3.
  • We released Python 3.11 wheels for the TensorFlow packages with this release!
  • We removed Python 3.7 support from this version. Going forward, we will no longer release patches for Python 3.7.

Read More

People of AI

Posted by Ashley Oldacre and Laurence Moroney

Throughout the years, we have seen some incredible ways AI has had an impact on our careers and daily lives. From solving some really challenging problems like predicting air quality through apps like Air Cognizer, helping parents of deaf children learn sign language, protecting the Great Barrier Reef and bringing culture and people together through projects like Sounds of India and Shadow Art, the sky’s the limit.

But who are the people behind it all?

To answer this question, I joined forces with my co-host, Laurence Moroney, to launch the People of AI podcast. We want to share the stories of some of the incredible people behind this technology. Through our interviews, we learn from a handful of current AI/ML leaders and practitioners and invite them to share their stories, what they are building, lessons learned along the way, and how they see the future of the industry. Through our conversations, we uncover the passion and creativity behind AI and ML development, and potential applications and use cases for good.

There is no doubt that AI is front and center in our lives today. It’s changing the future and shaping our conversations – whether it’s with family or the latest chat app. Throughout this podcast we will connect with some of the people behind the technology, share in their enthusiasm, concerns and learn from them.

Starting today, we will release one new episode of “People of AI” per week. Listen to the first episode on the People of AI site, also available on Spotify, Apple podcasts, Google podcasts, Deezer and Stitcher.

  • Episode 1: meet your hosts, Ashley Oldacre and Laurence Moroney, as we uncover what it means to be a person of AI.
  • Episode 2: learn about entrepreneurship with Sharon Zhou, CS Faculty at Stanford and MIT Technology Review’s 35 Under 35.
  • Episode 3: learn about the amazing ways you can use ML on the web with Jason Mayes, the public face of Web ML at Google, Web Engineer, and Creative Innovator.
  • Episode 4: learn how to pivot mid-career into the field of ML with Catherine Nelson, Principal Data Scientist at SAP Concur.
  • Episode 5: learn how to follow your passion and bring it into your work with Gant Laborde, Chief Innovation Officer at Infinite Red, Inc. and Google Developer Expert.
  • Episode 6: we talk with Joana Carrasqueira, Head of Community for Developer Relations in ML at Google, about empowering and connecting our communities around AI.

Whether you’re just getting started in AI/ML, or looking to expand your established experience, these stories are for you. We hope you will tune in!

This podcast is sponsored by Google. Any remarks made by the speakers are their own and are not endorsed by Google.

Read More

TensorFlow with MATLAB

Posted by Sivylla Paraskevopoulou, Product Marketing Manager at MathWorks

In this blog post I will show you how to use TensorFlow™ with MATLAB® for deep learning applications. More specifically, I will show you how to convert pretrained TensorFlow models to MATLAB models, convert models from MATLAB to TensorFlow, and use MATLAB and TensorFlow together.

These interoperability features, offered by MATLAB, enable collaboration between colleagues, teams, and communities that work on different platforms. Today’s post will show you how to use these features, give you examples of when you might want to use them, and explain how they connect the work of AI developers and engineers to enable domain-specific AI system design.

Introduction

What is MATLAB?

MATLAB is a computing platform tailored for engineering and scientific applications like data analysis, signal and image processing, control systems, wireless communications, and robotics. MATLAB includes a programming language, interactive apps, and tools for automatically generating embedded code. MATLAB is also the foundation for Simulink®, a block diagram environment for simulating complex multi-domain systems.

Similarly to Python® libraries, MATLAB provides toolboxes for achieving different goals. More specifically, MATLAB provides the Deep Learning Toolbox™ for deep learning workflows. Deep Learning Toolbox provides a framework for designing and implementing deep neural networks with algorithms, pretrained models, and apps. It can be combined with domain-specific toolboxes in areas such as computer vision, signal processing, and audio applications.

Flow chart depicting the correlation between Python and MATLAB as programming languages to TensorFlow and Deep Learning Toolbox as deep learning platforms respectively
Figure: Python and MATLAB are programming languages; Python can leverage the TensorFlow library for deep learning workflows, while MATLAB provides the Deep Learning Toolbox.

Why TensorFlow and MATLAB?

Both TensorFlow and MATLAB are widely used for deep learning. Many MATLAB customers are interested in integrating TensorFlow models into their AI design, for creating customized tools, simulating complex systems, or optimizing data modeling. TensorFlow users can also leverage MATLAB to generate, analyze, and visualize training data, post-process model output, and deploy trained neural networks to desktop, web apps, or embedded hardware.

For example, engineers have integrated TensorFlow models into Simulink (the MATLAB simulation environment) to develop a battery state-of-charge estimator for an electric vehicle, and scientists have used MATLAB with TensorFlow to build a custom toolbox for reading climate data. For more details on these examples, see Integrate TensorFlow Model into Simulink for Simulation and Code Generation and Climate Data Store Toolbox for MATLAB.

What’s Next?

Now you have started to see the benefits of using TensorFlow with MATLAB. Let’s get into more of the technical details on how to use TensorFlow with MATLAB in the following three sections.

You will see how straightforward it is to use TensorFlow with MATLAB and why I (and other engineers) like having the option to combine them for deep learning applications. Why choose when you don’t have to?

Convert Model from TensorFlow to MATLAB

Flow chart showing the conversion of a model `importTensorFlowNetwork` from TensorFlow to MATLAB

You can convert a pretrained model from TensorFlow to MATLAB by using the MATLAB function importTensorFlowNetwork. A scenario where this function might be useful: a data scientist creates a model in TensorFlow, and then an engineer integrates this model into an AI system created in MATLAB.

We will show you here how to import an image classification TensorFlow model into MATLAB and (1) use it for prediction and (2) integrate it into an AI system.

Convert model from TensorFlow to MATLAB
Before importing a pretrained TensorFlow model into MATLAB, you must save the TensorFlow model in the SavedModel format.

Python code:

import tensorflow as tf
tf.saved_model.save(model, "EfficientNetV2L")

Then, you can import the TensorFlow model into MATLAB by using the MATLAB function importTensorFlowNetwork. You only need one line of code!

MATLAB code:

modelFolder = "EfficientNetV2L";
net = importTensorFlowNetwork(modelFolder,OutputLayerType="classification")

Classify Image

Read the image you want to classify. Resize the image to the input size of the network.

MATLAB code:

Im = imread("mydoc.jpg");
InputSize = net.Layers(1).InputSize;
Im = imresize(Im,InputSize(1:2));

Before you classify the image, it might require further preprocessing or changing the dimension ordering from TensorFlow to MATLAB. To learn more and get answers to common questions about importing models, see Tips on Importing Models from TensorFlow.

Predict and plot the image with its classified label.

MATLAB code:

label = classify(net,Im);
imshow(Im)
title("Predicted label: " + string(label));

Image of a Pomeranian with the text 'Predicted label: Pomeranian'

To see the full example on how to import an image classification TensorFlow model into MATLAB and use the model for prediction, see Image Classification in MATLAB Using Converted TensorFlow Model. To learn more about importing TensorFlow models into MATLAB, check out the blog post Importing Models from TensorFlow, PyTorch, and ONNX.

Transfer Learning
A common reason to import a pretrained TensorFlow model into MATLAB is to perform transfer learning. Transfer learning is the process of taking a pretrained deep learning model and fine-tuning it to fit a new problem. For example, suppose you are doing object detection in MATLAB and you find a TensorFlow model that can improve the detection accuracy, but you need to retrain the model with your data. Using transfer learning is usually faster and easier than training a network from scratch.

In MATLAB, you can perform transfer learning programmatically or interactively by using the Deep Network Designer (DND) app. It’s easy to do model surgery (prepare a network to train on new data) with a few lines of MATLAB code by using built-in functions that replace, remove, or add layers at any part of the network architecture. For an example, see Train Deep Learning Network to Classify New Images. With DND, you can interactively prepare the network for training, train the network, export the retrained network, and then use it for the new task. For an example, see Transfer Learning with Deep Network Designer.

Screen grab showing editing of a pretrained model in Deep Network Designer
Figure: Edit a pretrained model with a low-code app for transfer learning.

AI System Design in Simulink
Simulink is a block diagram environment used to design systems with multi-domain models, simulate systems before moving to hardware, and deploy without writing code. Simulink users have expressed interest in the ability to bring in AI models and simulate entire systems. In fact, this is very easy to do with Simulink blocks.

In the following figure, you can see a very simple AI system that reads and classifies an image using an imported TensorFlow model. Essentially, the Simulink system executes the same workflow shown above. To learn more about how to design and simulate such a system, see Classify Images in Simulink with Imported TensorFlow Network.

Screen grab of using image_classifier in Simulink
Figure: Simple Simulink system for predicting an image label

Of course, Simulink capabilities extend far beyond classifying an image of my dog after I gave him a bad haircut and trying to predict his breed. For example, you can use deep neural networks inside a Simulink model to perform lane and vehicle detection. To learn more, see Machine Learning with Simulink and NVIDIA Jetson.

Moving image showing lane and vehicle detection output in Simulink
Lane and vehicle detection in Simulink using deep learning

Convert Model from MATLAB to TensorFlow

Flow chart showing conversion of `exportNetworkToTensorFlow` from MATLAB to TensorFlow

You can convert a trained or untrained model from MATLAB to TensorFlow by using the MATLAB function exportNetworkToTensorFlow. In MATLAB, we refer to trained models as networks and to untrained models as layer graphs. The Pretrained Deep Neural Networks documentation page shows you all the options for getting a pretrained network. You can alternatively create your own network.

Create Untrained Model

Create a bidirectional long short-term memory (BiLSTM) network to classify sequence data. An LSTM network takes sequence data as input and makes predictions based on the individual time steps of the sequence data.

Architecture of LSTM model
Figure: Architecture of the LSTM model

MATLAB code:

inputSize = 12;
numHiddenUnits = 100;
numClasses = 9;

layers = [
    sequenceInputLayer(inputSize)
    bilstmLayer(numHiddenUnits,OutputMode="last")
    fullyConnectedLayer(numClasses)
    softmaxLayer];

lgraph = layerGraph(layers);

To learn how to create the training data set for this model, see Export Untrained Layer Graph to TensorFlow. An important step is to permute the sequence data from the Deep Learning Toolbox ordering (CSN) to the TensorFlow ordering (NSC), where C is the number of features of the sequence, S is the sequence length, and N is the number of sequence observations. To learn more about the dimension ordering of the input data for different deep learning platforms, see Input Dimension Ordering.
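In Python, that permutation is a single transpose (a sketch; XTrain_matlab is a hypothetical array holding the MATLAB-ordered data):

import numpy as np

# Hypothetical data in Deep Learning Toolbox ordering: C x S x N.
XTrain_matlab = np.zeros((12, 50, 100))

# TensorFlow expects N x S x C, so reverse the axes before training.
XTrain = np.transpose(XTrain_matlab, (2, 1, 0))
print(XTrain.shape)  # (100, 50, 12)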

Export Model to TensorFlow

Export the layer graph to TensorFlow. The exportNetworkToTensorFlow function saves the TensorFlow model in the Python package myModel.

MATLAB code:

exportNetworkToTensorFlow(lgraph,"myModel")

Train TensorFlow Model

Run the following code in Python to load the exported model from the Python package myModel. You can then compile and train the exported model in Python. To train the model, use the training data in training_data.mat that you previously created.

Python code:

import myModel
model = myModel.load_model()

Load the training data.

Python code:

import scipy.io as sio

data = sio.loadmat("training_data.mat")
XTrain = data["XTrain"]
YTrain = data["TTrain"]

Compile and train the model.

Python code:

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
r = model.fit(XTrain, YTrain, epochs=100, batch_size=27)

To learn more on how to export MATLAB models to TensorFlow, check out our blog post.

Moving image showing how to export an untrained model from MATLAB to TensorFlow and train on Google Colab
Export an untrained model from MATLAB to TensorFlow and train on Google Colab

Run TensorFlow and MATLAB Together

TensorFlow + MATLAB

You’ve seen so far how to convert models between TensorFlow and MATLAB. You also have the option to use TensorFlow and MATLAB together (run from the same environment) by either calling Python from MATLAB or calling MATLAB from Python. This way you can take advantage of the best capabilities of each environment by creating an integrated workflow.

For example, TensorFlow might offer newer models, but you like MATLAB apps for labeling data, or you might want to train your TensorFlow model under multiple initial conditions using the Experiment Manager app (see example).

Call Python from MATLAB

Instead of importing a TensorFlow model into MATLAB, you have the option to directly use the TensorFlow model in your MATLAB workflow by calling Python from MATLAB. You can access Python libraries by adding the py. prefix and execute any Python statement from MATLAB by using the pyrun function. For an example that shows how to call a TensorFlow model in MATLAB, see Image Classification in MATLAB Using TensorFlow.

Here is a use case where this option might be useful: you have created an object detection workflow in MATLAB and want to quickly compare TensorFlow models to find the best-suited model for your task before importing one into MATLAB. Calling TensorFlow from MATLAB lets you run such an inference test quickly.

Call MATLAB from Python

You can use the MATLAB Engine API to call MATLAB from a Python environment and thus integrate MATLAB tools and apps into your existing Python workflow. MATLAB is convenient for labeling and exploring data for domain-specific (e.g., radar, wireless, audio, and biomedical) signal processing using low-code apps. For an example, see our GitHub repo Co-Execution for Training a Speech Command Recognition System.

Conclusion

The bottom line is that both TensorFlow and MATLAB offer excellent tools that enable applying deep learning to your application. MATLAB integrates with TensorFlow to take full advantage of these tools and enable access to hundreds of deep learning models. Choose between the interoperability features (convert models between TensorFlow and MATLAB, or use TensorFlow and MATLAB together) to create a deep learning workflow that bridges platforms and teams.

If you have questions about how, when, and why to use the described interoperability, email me at sparaske@mathworks.com. I would love to hear more about your workflow and discuss how working across deep learning platforms accelerates the application of deep learning to your domain.

Read More

How Vodafone Uses TensorFlow Data Validation in their Data Contracts to Elevate Data Governance at Scale

Posted by Amandeep Singh (Vodafone), Max Vökler (Google Cloud)

Vodafone leverages Google Cloud to deploy AI/ML use cases at scale

As one of the largest telecommunications companies worldwide, Vodafone is working with Google Cloud to advance their entire data landscape, including their data lake, data warehouse (DWH), and in particular AI/ML strategies. While Vodafone has used AI/ML in production for some time, the growing number of use cases has posed challenges for industrialization and scalability. For Vodafone, it is key to rapidly build and deploy ML use cases at scale in a highly regulated industry. While Vodafone’s AI Booster Platform – built on top of Google Cloud’s Vertex AI – has been a huge step toward achieving that, this blog post will dive into how TensorFlow Data Validation (TFDV) helps advance data governance at scale.

High-quality Data is a Prerequisite for ML Use Cases, yet not Easily Achieved

Excelling in data governance is a key enabler to utilize the AI Booster Platform at scale. As Vodafone works in distributed teams and has shared responsibilities when developing use cases, it is important to avoid disruptions across the involved parties:

  • Machine Learning Engineer at Vodafone Group level (works on global initiatives and provides best practices on productionizing ML at scale)
  • Data Scientist in a local market (works on a concrete implementation for their specific country, needs to ensure proper data quality and feature engineering)
  • Data Owner in a local market (needs to ensure data schemas do not change)

An issue that often arises is that table schemas are modified, or feature names and data types change. This can happen for a variety of reasons, for example when the data engineering process, which is owned by IT teams, is revised.

Data Contracts Define the Expected Form and Shape of Data

Data Contracts in machine learning are a set of rules that define the structure, data types, and constraints of the data that your models are trained on. The contracts provide a way to specify the expected schema and statistics of your data. The following can be included as part of your Data Contract:

  • Feature names
  • Data types
  • Expected distribution of values in each column

It can also include constraints on the data, such as:

  • Minimum and maximum values for numerical columns
  • Allowed values for categorical columns

Before a model is productionized, the Contract is agreed upon by the stakeholders working on the pipeline, such as the ML Engineers, Data Scientists, and Data Owners. Once the Data Contract is agreed upon, it cannot change. If a pipeline breaks due to a change, the error can be traced back to the responsible party. If the Contract needs amending, it goes through a review between the stakeholders, and once agreed upon, the changes can be implemented in the pipeline. This helps ensure the quality of data going into our model in production.

      Vodafone Benefits from TFDV Data Contracts as a Way to Streamline Data Governance

As part of Vodafone’s efforts to streamline data governance, we made use of Data Contracts. A Data Contract ensures all teams work in unison, helping to maintain quality throughout the data lifecycle. These contracts are a powerful tool for managing and validating data used for machine learning. They provide a way to ensure that data is of high quality, free of errors, and follows the expected distribution. This blog post covers the basics of Data Contracts, discusses how they can be used to validate and understand your data better, and shows you how to use them in combination with TFDV to improve the accuracy and performance of your ML models. Whether you’re a data scientist, an ML engineer, or a developer working with machine learning, understanding Data Contracts is essential for building high-quality, accurate models.

      How Vodafone Uses Data Contracts

Utilizing such a Data Contract, both in training and prediction pipelines, we can detect and diagnose issues such as outliers, inconsistencies, and errors in the data before they can cause problems with the models. Another benefit of Data Contracts is that they help us detect data drift, the most common reason for performance degradation in ML models. Data drift occurs when the input data to your model diverges from the data it was trained on, leading to errors and inaccuracies in your predictions. Using Data Contracts can help you identify this issue.
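As a sketch of how this can look with TFDV – the feature name is borrowed from the anomaly example later in this post, while the threshold and statistics paths are hypothetical – a drift check is declared directly in the schema:

import tensorflow_data_validation as tfdv

schema = tfdv.load_schema_text('schema.pbtxt')

# Allow at most an L-infinity distance of 0.01 between the value
# distributions of this categorical feature in consecutive batches.
tfdv.get_feature(
    schema, 'producer_used').drift_comparator.infinity_norm.threshold = 0.01

# Compare today's statistics against yesterday's; any feature whose
# drift exceeds its threshold is reported as an anomaly.
previous_stats = tfdv.load_statistics('stats/2023-01-01.pb')
current_stats = tfdv.load_statistics('stats/2023-01-02.pb')
drift_anomalies = tfdv.validate_statistics(
    statistics=current_stats,
    schema=schema,
    previous_statistics=previous_stats)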

Data Contracts are just one example of the many KPIs we have within Vodafone regarding AI governance and scalability. Since the development and release of AI Booster, more and more markets are using the platform to productionize their use cases, and as part of this, we have the resources to scale components vertically. Examples of this, apart from Data Contracts, include specialized logging, agreed-upon ways of calculating model metrics, and model testing strategies such as Champion/Challenger and A/B testing.

      How TensorFlow Data Validation (TFDV) Brings Data Contracts to Life

TFDV is a library provided by the TensorFlow team for analyzing and validating machine learning data. It provides a way to check that the data conforms to a Data Contract. TFDV also provides visualization options to help developers understand the data, such as histograms and summary statistics. It allows the user to define data constraints and detect errors, anomalies, and drift between datasets. This helps detect and diagnose issues such as outliers, inconsistencies, and errors in your data before they can cause problems with your models.

      When you use TFDV to validate your data, it will check that the data has the expected schema. If there are any discrepancies, such as a missing column or a column with the wrong datatype, TFDV will raise an error and provide detailed information about the problem.
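Outside of Vodafone’s custom pipeline components, this check is a single call in stock TFDV. A minimal sketch, with hypothetical file paths:

import tensorflow_data_validation as tfdv

# Validate the statistics of a new data batch against the contract.
schema = tfdv.load_schema_text('schema.pbtxt')
new_stats = tfdv.generate_statistics_from_csv(data_location='new_batch.csv')
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)

# In a notebook, render a table listing each anomaly, e.g. a missing
# column, an unexpected data type, or an out-of-domain value.
tfdv.display_anomalies(anomalies)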

      At Vodafone, before a pipeline is put into production, a schema is agreed upon for the input data. The agreement is between the Product Manager/Use Case Owner, Data Owner, Data Scientist and ML Engineer. The first thing we do in our pipeline, as seen in Figure 1, is to generate statistics about our data.

Figure 1: Components of a Data Contract in a typical Vodafone training pipeline

The code below uses TFDV to generate statistics for the training dataset and visualize them (step 2), making it easy to understand the distribution of the data and how it has been transformed. The output of this step is an HTML file displaying general statistics about the input dataset. The HTML page also offers a range of interactive features for exploring the statistics and gaining a deeper understanding of the data.

# generate training statistics
gen_statistics = generate_statistics(
    dataset=train_dataset.output,
    file_pattern=file_pattern,
).set_display_name("Generate data statistics")

# visualise statistics
visualised_statistics = visualise_statistics(
    statistics=gen_statistics.output,
    statistics_name="Training Statistics",
).set_display_name("Visualise data statistics")
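Note that generate_statistics and visualise_statistics are Vodafone’s custom pipeline components (more on those below). For reference, a rough stock-TFDV equivalent in a notebook – assuming a hypothetical TFRecord of tf.Examples as input – would be:

import tensorflow_data_validation as tfdv

# Compute summary statistics over the training data and render the same
# interactive overview that the pipeline exports as an HTML file.
stats = tfdv.generate_statistics_from_tfrecord(data_location='train.tfrecord')
tfdv.visualize_statistics(stats)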

Step 3 is concerned with validating the schema. Within our predefined schema, we also define thresholds for certain data fields. We can specify domain constraints in our Data Contract, such as minimum and maximum values for numerical columns or allowed values for categorical columns. When you validate your data, TFDV will check that all the values in the dataset are within the specified domain. If any values are out of range, TFDV will provide a warning and give you the option to either discard or correct the data. You can also specify the expected distribution of values for each feature in the Data Contract. TFDV will compute the actual statistics of your data, as shown in Figure 2, and compare them to the expected distribution. If there are any significant discrepancies, TFDV will provide a warning and give you the option to investigate the data further.

Furthermore, this allows us to detect outliers and anomalies in the data (step 4) by comparing the actual statistics of the data to the expected statistics. TFDV can flag any data points that deviate significantly from the expected distribution and provide visualizations to help you understand the nature of the anomaly.

Figure 2: Example visualization of the dataset statistics created by TFDV

The code below uses the TFDV library to validate the data schema and detect any anomalies. The validate_schema function takes two arguments: statistics and schema_path. The statistics argument is the output of the statistics-generation step above, and schema_path is the path to the schema file constructed in the first line. The function checks whether the data conforms to the schema specified in the schema file.

# Construct schema_path from base GCS path + filename
tfdv_schema_path = (
    f"{pipeline_files_gcs_path}/{tfdv_schema_filename}")

# validate data schema
validated_schema = validate_schema(
    statistics=gen_statistics.output,
    schema_path=tfdv_schema_path,
).set_display_name("Validate data schema")

# show anomalies and fail if any anomalies were detected
anomalies = show_anomalies(
    anomalies=validated_schema.output,
    fail_on_anomalies=True,
).set_display_name("Show anomalies")

The next block calls the show_anomalies function, which takes two arguments: anomalies and fail_on_anomalies. The anomalies argument is the output of the preceding validate_schema function, which includes any detected anomalies. The fail_on_anomalies argument is a flag that, when set to True, fails the pipeline if any anomalies are detected. The function displays the detected anomalies, which looks something like this:

anomaly_info {
  key: "producer_used"
  value {
    description: "Examples contain values missing from the schema: Microsoft (<1%), Sony Ericsson (<1%), Xiaomi (<1%), Samsung (<1%), IPhone (<1%). "
    severity: ERROR
    short_description: "Unexpected string values"
    reason {
      type: ENUM_TYPE_UNEXPECTED_STRING_VALUES
      short_description: "Unexpected string values"
      description: "Examples contain values missing from the schema: Microsoft (<1%), Sony Ericsson (<1%), Xiaomi (<1%), Samsung (<1%), IPhone (<1%). "
    }
    path {
      step: "producer_used"
    }
  }
}

All of the above steps were implemented internally as custom KFP components built on TFDV.
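As an illustration only – this is a minimal sketch of such a component, not Vodafone’s actual implementation – a lightweight KFP v2 component wrapping TFDV validation could look like this:

from kfp.v2.dsl import component

@component(
    base_image="python:3.9",
    packages_to_install=["tensorflow-data-validation"],
)
def validate_schema(statistics_path: str, schema_path: str) -> str:
    """Validates dataset statistics against the agreed Data Contract."""
    import tensorflow_data_validation as tfdv

    statistics = tfdv.load_statistics(statistics_path)
    schema = tfdv.load_schema_text(schema_path)
    anomalies = tfdv.validate_statistics(statistics=statistics, schema=schema)
    # Return the serialized anomalies proto so a downstream component
    # (such as show_anomalies) can render it or fail the pipeline.
    return str(anomalies)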

      How Vodafone Industrialized the Approach on its AI Booster Platform

As part of the AI Booster platform, we have also provided templates for different modeling libraries such as XGBoost, TensorFlow, AutoML and BigQuery ML. These templates, which are based on Kubeflow Pipelines (KFP), offer a wide range of customizable components that can be easily integrated into your machine learning workflow.

      Our templates provide a starting point for our Data Scientists and ML Engineers, but they are fully customizable to fit their specific needs. However, we do enforce the inclusion of certain components in the pipeline when it is being productionized. As shown in Figure 1, we require that all production pipelines include Data Contract components. These components are not specific to a particular model and are intended to be used whenever data is being ingested for training or prediction.

Automating this step helps with our data validation process, making it more efficient and less prone to human error. It gives all stakeholders confidence that whenever the model is in production, the data it consumes is up to standard and free of surprises. In addition, it helps with the reproducibility of use cases in different markets using local data. Most importantly, it helps with compliance and privacy: it ensures that our data is used in compliance with company policies and regulations, and provides a framework for tracking and monitoring the usage of the data to make sure it is being used appropriately.

      Data Contracts with TFDV Helped Vodafone Industrialize their ML Workflow

      Data Contracts play a critical role in ensuring the quality and integrity of the data used in machine learning models. Data Contracts provide:

      • a set of rules and guidelines for how data should be collected, stored, and used, and help to ensure that the data is of high quality and free of errors
      • a framework for identifying and resolving issues with the data, such as outliers, inconsistencies, and errors, before they can cause problems with the models
      • a way to ensure compliance with company policies and regulations
      • a way to trace back the origin and history of the data, which can be useful for auditing and troubleshooting purposes

      They also help to ensure that the data is being used consistently and in a reproducible way, which can help to improve the accuracy and performance of the models and reduce the risk of errors and inaccuracies in the predictions. Data contracts used in conjunction with tools like TFDV help automate the data validation process, making it more efficient and less prone to human error. Applying this concept in AI Booster helped us at Vodafone to make a key step forward in industrializing our AI/ML use cases.

      Find Out More

      For more information about TFDV, see the user guide and tutorials on tensorflow.org. Special thanks to Amandeep Singh of Vodafone and Max Vökler of Google Cloud for their work to create this design and for writing this post.

TensorFlow Hub ❤️ Kaggle

Posted by Luiz Gustavo Martins, Google AI Developer Advocate

      We’re excited to announce our new integration with Kaggle Models, a recently launched pre-trained model platform. All 2,300+ TensorFlow models published on TFHub by Google, DeepMind, and more are now discoverable and usable on Kaggle Models, with documentation and sample code.

      Why Kaggle?

      Kaggle is a global community of over 12 million machine learners who test their knowledge in competitions and share machine learning resources, including over 200,000 public datasets. Over the past 10+ years, Kaggle’s competitions have become a proving ground for what works well and what doesn’t across a multitude of ML use cases. This is why Kaggle recently launched its open model hub, Kaggle Models, to better enable the ML community to stress test and validate models publicly and at scale.

      Hosting TensorFlow models on Kaggle makes them more easily accessible to the broader ML community, democratizing model building and advancement. We can’t wait to see what solutions come from this partnership.

      How to Get Started

A great place to check out the new integration is the live Kaggle competition BirdCLEF 2023, which uses the recently published Bird Vocalization Classifier model. Participants are challenged to build a model that identifies bird species by sound. Bird populations around the world are falling alarmingly, with approximately 48% of existing species experiencing population declines. The results of this competition contribute to scaling the critical work of bird species monitoring, which allows researchers to better evaluate whether interventions are working.

The Bird Vocalization Classifier model was just open-sourced by the Google Research team on TFHub (and subsequently Kaggle Models 🙌). It’s a global bird embedding and classification model that can identify the vocalizations of more than 10,000 bird species, and it also creates embedding vectors that can be used for other tasks.

      To try the model on Kaggle:
      1. Navigate to the model here.
      2. Click the “New Notebook” button, which will open a Kaggle Notebooks editor.
      3. Click the “Copy Code” button on the right-hand side of the editor, which will copy sample code that loads the model using the TensorFlow Hub library.
4. Paste the code into the notebook’s cell.
5. Click the “Add Model” button at the bottom to attach the model to your notebook, and you’re ready to go!
Animation: trying out the Bird Vocalization Classifier model on Kaggle

The snippet below imports the TensorFlow Hub library and loads the newly published Bird Vocalization Classifier model. To find more information about this model, check its documentation, and play with a full example that demonstrates how to use the model in the competition here.

import tensorflow_hub as hub

keras_layer = hub.KerasLayer('https://kaggle.com/models/google/bird-vocalization-classifier/frameworks/TensorFlow2/variations/bird-vocalization-classifier/versions/1')
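As a quick sanity check of the loaded model – assuming the 5-second, 32 kHz mono input format and the infer_tf entry point described in the model’s documentation – a minimal inference sketch looks like this:

import numpy as np
import tensorflow_hub as hub

model = hub.load('https://kaggle.com/models/google/bird-vocalization-classifier/frameworks/TensorFlow2/variations/bird-vocalization-classifier/versions/1')

# The model expects 5-second mono waveforms sampled at 32 kHz; a silent
# clip stands in for real audio here.
waveform = np.zeros((1, 5 * 32000), dtype=np.float32)

# Returns per-species logits plus an embedding vector usable for
# downstream tasks such as clustering or transfer learning.
logits, embeddings = model.infer_tf(waveform)
print(logits.shape, embeddings.shape)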

      For more information on Kaggle Models including its current feature set and future roadmap, check the official announcement here. We look forward to seeing what you build as a result of this integration!

A new Kaggle competition to help parents of deaf children learn sign language

      By Sam Sepah, ML Research Program Manager

As a Deaf person who learned sign language from my family at an early age, I consider myself lucky. Every day, 33 babies are born with permanent hearing loss in the U.S. (kdhe.ks.gov, deafchildren.org). The majority of deaf children are born to hearing parents who do not know how to sign, like my own. My parents were determined to provide me with the ability to communicate effectively anytime, anywhere. Because of this rich language environment, today I can achieve my dreams and live the life I want to live.

      But most hearing parents do not know how to sign and might not have the resources to learn. Because of this, they may not be able to have a conversation with their deaf child, even at the family dinner table.

Deaf children who grow up in homes where sign language is not used are at risk for language deprivation. Language deprivation is a delay in language development that occurs when sufficient exposure to language, spoken or signed, is not provided in the first few years of a child’s life. Language deprivation is very common in deaf children, but it can happen to hearing children as well. It often leads to a lifetime of challenges with employment, relationships, and achieving one’s life goals.

      So, what can be done?

      You can help reduce the risk of language deprivation for deaf children by joining our new Isolated Sign Language Recognition competition on Kaggle and training an accurate, real-time sign language recognition (TensorFlow Lite) model!

      We plan to open source the winning model and add it to the PopSign smartphone game app. PopSign* is a smartphone app that makes learning American Sign Language (ASL) fun and interactive. Players match videos of ASL signs with bubbles containing written English words to pop them.

      PopSign is designed to help parents with deaf children learn ASL, but it’s open to anyone who wants to learn sign language vocabulary. The app is a great resource for parents who want to learn sign language and help their children develop language and social skills. By adding a sign language recognizer from this competition, PopSign players will be able to sign the type of bubble they want to shoot, providing the player with the opportunity to form the sign instead of just watching videos of other people signing.

      We are grateful to our partners, the National Technical Institute for the Deaf at Rochester Institute of Technology, the Georgia Institute of Technology, and Deaf Professional Arts Network, for developing the PopSign game app, creating the dataset and helping us prepare for this competition. The game, dataset, and model you train will help us improve access to communication for so many families!

      Join the competition today and together, we can make a difference for deaf children worldwide.

      *PopSign is an app developed by the Georgia Institute of Technology and the National Technical Institute for the Deaf at Rochester Institute of Technology. The app is available in beta on Android and iOS.
