Google I/O 2023: What’s new in TensorFlow and Keras?

Posted by Ayush Jain, Carlos Araya, and Mani Varadarajan for the TensorFlow team

Welcome to TensorFlow and Keras at Google I/O!

The world of machine learning is changing faster than ever. The rise of Large Language Models (LLMs) is sparking the imagination of developers everywhere, with new generative AI applications reaching hundreds of millions of people around the world. These models are trained on massive datasets and used to solve a variety of tasks, from natural language processing to image generation.

Powering all these new capabilities requires new levels of model efficiency and performance, as well as support for seamless deployment across a growing number of devices – be it on a server, the web, mobile devices, or beyond. As stewards of one of the largest machine learning communities in the world, the TensorFlow team is continually asking how we can better serve you.

To that end, this post covers a few of the many improvements and additions coming this year to the TensorFlow ecosystem. Let’s dive in!

A Growing Ecosystem

New functionality we’re covering today:

KerasCV and KerasNLP allow you to access pre-trained, state-of-the-art models in just a few lines of code.

DTensor helps you scale up your models and train them efficiently by combining different parallelism techniques.

With JAX2TF, models written with the JAX numerical library can be used in the TensorFlow ecosystem.

We also preview the TF Quantization API, which enables you to make your models more cost and resource-efficient without compromising on accuracy.

Applied ML with KerasCV & KerasNLP

KerasCV and KerasNLP are powerful, modularized libraries that give you direct access to the state-of-the-art in computer vision and natural language processing.

The KerasCV + KerasNLP suite, at a glance.

Whether you want to classify images, generate text from prompts as Bard does, or anything in between, KerasCV and KerasNLP make it easy with just a few lines of code. And since they’re part of Keras, they’re fully integrated with the TensorFlow ecosystem.

Let’s look at some code for image generation. KerasCV is designed to support many models, and in this case we’ll use a diffusion model. Despite the complexity of the underlying architecture, you can get it up and running with just a few lines of code.

from keras_cv.models import (
    StableDiffusion,
)

model = StableDiffusion(
    img_width=512,
    img_height=512,
)

With one line to import and another to initialize the model, you can generate completely new images:

images = model.text_to_image(
    "photograph of an astronaut "
    "riding a horse",
    batch_size=3,
)
KerasCV-generated images of an astronaut riding a horse!
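
Text generation is just as compact. Here’s a hedged sketch using KerasNLP’s GPT-2 causal language model (the preset name and generate() arguments follow the public KerasNLP API, but check keras.io/keras_nlp for the exact options in your version):

from keras_nlp.models import GPT2CausalLM

# Load a pre-trained causal language model from a preset.
gpt2_lm = GPT2CausalLM.from_preset("gpt2_base_en")

# Generate a continuation of a prompt; max_length caps the output length in tokens.
output = gpt2_lm.generate("The TensorFlow ecosystem is", max_length=64)
print(output)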

This is just one of many examples. To learn more, check out our full talk on KerasCV and KerasNLP or in-depth toolkit guides at keras.io/keras_cv and keras.io/keras_nlp.

Machine Learning at Scale with DTensor

DTensor enables larger and more performant model training by giving developers the flexibility to combine and fine-tune multiple parallelism techniques.

Traditionally, ML developers have scaled up models through data parallelism, which splits up your data and feeds it to horizontally-scaled model instances. This scales up training but has an important limitation: it requires that the model fits within a single hardware device.

As models get bigger, fitting into a single device is no longer a guarantee — developers need to be able to scale their models across hardware devices. This is where model parallelism becomes important, allowing for the model to be split up into shards that can be trained in parallel.

With DTensor, data and model parallelism are not only both supported, they can also be combined directly to scale models even more efficiently. And it’s completely accelerator-agnostic, whether you use TPUs, GPUs, or something else.

Mixed (data + model) parallelism, with DTensor.
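
To make the combination concrete at the tensor level, here is a minimal sketch using the tf.experimental.dtensor API. The 8-device CPU mesh, the layouts, and the tensor shapes are illustrative assumptions, not part of the OPT walkthrough that follows:

import tensorflow as tf
from tensorflow.experimental import dtensor

# A hypothetical 8-device mesh: 2-way "batch" (data) x 4-way "model" parallelism.
# (Assumes 8 visible devices; on a single host you would first configure logical
# CPU devices, or point this at real GPUs/TPUs instead.)
devices = [f"CPU:{i}" for i in range(8)]
mesh = dtensor.create_mesh([("batch", 2), ("model", 4)], devices=devices)

# Shard a weight matrix along the "model" axis; replicate it across "batch".
weight_layout = dtensor.Layout([dtensor.UNSHARDED, "model"], mesh)
weights = dtensor.call_with_layout(tf.zeros, weight_layout, shape=(1024, 4096))

# Shard activations along "batch", so each data-parallel replica sees a slice.
act_layout = dtensor.Layout(["batch", dtensor.UNSHARDED], mesh)
activations = dtensor.call_with_layout(tf.ones, act_layout, shape=(64, 1024))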

Now let’s walk through a fuller example. Say you’re building with a transformer model, like the Open Pre-trained Transformer (OPT) available through KerasNLP, and training it on some input dataset:

opt_lm = keras_nlp.models.OPTCausalLM.from_preset("opt_6.7b_en")
opt_lm.compile(...)
opt_lm.fit(wiki_text_dataset)

But here’s the thing about OPT: it’s big. With variants of up to 175 billion parameters, traditional data parallelism would fail outright, since there are simply too many weights to replicate on a single hardware device. That’s where DTensor comes in.

To work with DTensor, we need to define two things:

First is a mesh, where you define (a) a set of hardware devices and (b) a topology, here the batch and model dimensions.

mesh_dims = [("batch", 2), ("model", 4)]
mesh = dtensor.create_distributed_mesh(mesh_dims, device_type="GPU")
dtensor.initialize_accelerator_system("GPU")

Second is a layout, which defines how to shard the tensor dimensions across your defined mesh. Through our Keras domain package integrations, you can do this in just one line.

layout_map = keras_nlp.models.OPTCausalLM.create_layout_map(mesh)

From there, you create the DTensor layout’s context and include your model creation code within it. Note that at no point did we have to make any changes to the model itself!

with layout_map.scope():
    opt_lm = keras_nlp.models.OPTCausalLM.from_preset("opt_6.7b_en")
    opt_lm.compile(...)
    opt_lm.fit(wiki_text_dataset)

Performance for DTensor today is already on par with industry benchmarks, nearly matching the gold-standard implementation of model parallelism offered by NVIDIA’s Megatron for GPUs. Further improvements are in the works to raise the bar even further, across hardware devices.

In the future, DTensor will be fully integrated with key interfaces like tf.distribute and Keras as a whole, with one entry point regardless of hardware and a number of other quality of life features. If you want to learn more, check out the DTensor overview or the Keras integration guide!

Bringing Research to Production with JAX2TF

Many of the ML advancements that are now household names had their beginnings in research. For example, the Transformer architecture, created and published by Google AI, underpins the fantastic advances in language models.

JAX has emerged as a trusted tool for much of this kind of discovery, but productionizing it is hard. To that end, we’ve been thinking about how to bring research more easily into TensorFlow, giving innovations built on JAX the full strength of TensorFlow’s uniquely robust and diverse production ecosystem.

That’s why we’ve built JAX2TF, a lightweight API that provides a pathway from the JAX ecosystem to the TensorFlow ecosystem. There are many examples of how this can be useful; here are just a few:

  • Inference: Taking a model written for JAX and deploying it either on a server using TF Serving or on-device using TFLite.
  • Fine-tuning: Taking a model trained in JAX, bringing its components into TF using JAX2TF, and continuing to train it in TensorFlow with your existing training data and setup.
  • Fusion: Combining parts of models that were trained using JAX with those trained using TensorFlow for maximum flexibility.

The key to enabling this kind of interoperation between JAX and TensorFlow is baked into jax2tf.convert, which takes in model components created on top of JAX (e.g. your loss function, prediction function, etc.) and creates equivalent representations of them as TensorFlow functions, which can then be exported as a TensorFlow SavedModel.
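
To give a sense of how small that surface area is, here’s a hedged sketch: a toy JAX function converted with jax2tf.convert and exported as a SavedModel. The function, shapes, and output path are made up for illustration:

import tensorflow as tf
import jax.numpy as jnp
from jax.experimental import jax2tf

# A toy JAX prediction function, standing in for a real model's apply fn.
def predict(params, x):
    return jnp.tanh(x @ params["w"] + params["b"])

# Convert it to a TensorFlow-traceable function and give it a signature.
tf_predict = tf.function(
    jax2tf.convert(predict),
    autograph=False,
    input_signature=[
        {"w": tf.TensorSpec((4, 2), tf.float32), "b": tf.TensorSpec((2,), tf.float32)},
        tf.TensorSpec((None, 4), tf.float32),
    ],
)

# Export as a regular SavedModel, ready for TF Serving, TFLite, or TF.js tooling.
module = tf.Module()
module.predict = tf_predict
tf.saved_model.save(module, "/tmp/jax2tf_demo")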

We’ve created a code walkthrough for one of the examples above: a quick fine-tuning setup, creating a simple model using modeling libraries in the JAX ecosystem (like Flax and Optax) and bringing it into TF to finish training. Check it out here.

JAX2TF is already baked into various tools in the TensorFlow ecosystem, under the hood. For example, here are code guides for simple conversion from JAX to TFLite for mobile devices and from JAX to TF.js for web deployment!
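
The TFLite path is essentially the same conversion followed by the standard TFLite converter. A minimal sketch, again with a toy function and illustrative shapes:

import tensorflow as tf
import jax.numpy as jnp
from jax.experimental import jax2tf

# Toy JAX function standing in for a real model's inference fn.
def predict(x):
    return jnp.tanh(x)

tf_fn = tf.function(
    jax2tf.convert(predict),
    autograph=False,
    input_signature=[tf.TensorSpec((1, 4), tf.float32)],
)

# Hand the traced concrete function to the standard TFLite converter.
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [tf_fn.get_concrete_function()]
)
tflite_model = converter.convert()
with open("/tmp/jax_model.tflite", "wb") as f:
    f.write(tflite_model)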

Coming Soon: The TensorFlow Quantization API

ML developers today face a wide variety of real-world constraints introduced by the settings they’re working in, like the size of a model or where it gets deployed.

With TensorFlow, we want developers to be able to quickly adjust and accommodate for these kinds of constraints, and to do so without sacrificing model quality. To do this, we’re building the TF Quantization API, a native quantization toolkit for TF2 which will be available publicly later in 2023.

Briefly, quantization is a group of techniques designed to make models faster, smaller, and generally less resource- and infrastructure-intensive to train and serve.

Quantization does this by reducing the precision of a model’s parameters, just like reducing pixel depth in an image like the one of Albert Einstein below. Note that even with reduced precision, we can still make out the key details:

Renderings of a photograph of Albert Einstein with increasingly reduced bit precision, from 8-bit to 1-bit.

At a high level, this works by taking a range of values in your starting precision, and mapping that range to a single bucket in your ending precision. Let’s illustrate this with an example:

Quantizing float representation to 4-bit integers.

Take a look at the range [0.27, 0.49] on the x-axis: in float32, the blue line actually represents 7,381,976 unique numbers! The red line represents the int4 quantization of this range, condensing all of those numbers into a single bucket: 1001 (the number 9 in decimal).

By lowering precision through quantization, we can store model weights in a much more efficient, compressed form.
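
To see the mechanics, here’s a tiny illustrative sketch of uniform affine quantization in NumPy. This is just the math described above, not the TF Quantization API:

import numpy as np

def quantize(x, num_bits=8):
    # Map the float range [x.min(), x.max()] onto the integer range [0, 2^bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # small reconstruction error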

There are a few different ways to quantize:

  • Post-Training Quantization (PTQ): Convert to a quantized model after training. This is as simple as it gets and most readily accessible, but there can be a small quality drop.
  • Quantization-Aware Training (QAT): Simulate quantization during just the forward pass, providing for maximal flexibility with a minimal quality tradeoff.
  • Quantized Training: Quantize all computations while training. This is still nascent, and needs a lot more testing, but is a powerful tool we want to make sure TensorFlow users have access to.

TensorFlow has previously offered a few tools for developers to quantize their models, like this guide for PTQ and this one for QAT. However, these have been limited, with PTQ depending on conversion to TFLite for mobile deployment and QAT requiring you to rewrite your model.
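
For reference, here’s roughly what today’s QAT path looks like with the TensorFlow Model Optimization toolkit. This is a minimal sketch; the layer sizes and the (commented-out) training data are placeholders:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap an existing Keras model so training simulates quantization in the forward pass.
base_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(1),
])
qat_model = tfmot.quantization.keras.quantize_model(base_model)

qat_model.compile(optimizer='adam', loss='mse')
# qat_model.fit(x_train, y_train, epochs=1)  # train as usual, with quantization simulated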

The TF Quantization API is different – it’s designed to work regardless of where you’re deploying, and without you having to rewrite a single line of existing modeling code. We’re building it with flexibility and fidelity in mind, so you get the benefits of a smaller quantized model with new levels of fine-grained control and without any concerns about how it’ll all fit into your stack.

Since you’ve made it this far into the blog, here’s a sneak peek at how it’ll look. We’ll start with a typical setup for a TensorFlow model, just a few layers in Keras. From there, we can load in a predefined quantization schema to apply as a config map to our model.

# Step 1: Define your model, just like always.
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, 3, strides=1, padding='same', activation='relu'),
    ...
])

# Step 2: Set up the quantization config, using a predefined schema.
scheme = scheme_registry.get_scheme('pixel_8bit_qat')
config_map = QuantizationConfigurationMap(scheme)

But if you need more flexibility, the TF Quantization API will also let you fully customize how you quantize. There’s built-in support for you to curate your schema to apply different behaviors for every layer, operation, or tensor!

# ...or just as easily configure your own, whether per-layer:
layer_config = LayerConfiguration(
    weight_config=..., activation_config=...)
config_map.set_config(model.layers[0], layer_config)

# per-op:
config_map.set_config(model.layers[0], op_type='matmul', config={
    'a': ..., 'b': ...,
    'return': ...
})

# even per-tensor:
_8bit_moving_average = QuantizationConfiguration(...)
per_tensor_config = LayerConfiguration(
    weight_config=..., activation_config=_8bit_moving_average)
config_map.set_config(model.layers[0], per_tensor_config)

With that, we can directly apply quantization and train or save within a quantization context. Our model still has natural compatibility with the rest of the TF ecosystem, where quantization truly bears fruit.

# Now you can generate a quantization-aware model!
tf.quantization.apply_quantization_on_model(model, config_map, …)

# From here, you can train and save just as always.
with tf.quantization.scope(config_map):
    model.fit()
    model.save()

# You can also export to TFLite, without any changes!
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

We ran a bunch of tests using the MobileNetV2 model on the Pixel 7, and saw up to 16.7x gains in serving throughput versus the non-quantized baseline. This gain comes without any noticeable detriment to quality: both the float32 baseline and the int8 quantized model reported 73% accuracy.

The TF Quantization API isn’t public just yet, but will be available very soon and will continue to evolve to provide even more benefits.

That’s a wrap!

Today, we’ve shown you just a few of the key things we’ve been working on, and there’s a lot more to come.

We can’t wait to see what you’ll build, and we’re always inspired by our community’s enduring enthusiasm and continued partnership. Thanks for stopping by!

Acknowledgements

Special thanks to George Necula, Francois Chollet, Jonathan Bischof, Scott Zhu, Martin Gorner, Dong Li, Adam Koch, Bruce Fontaine, Laurence Moroney, Josh Gordon, Lauren Usui, and numerous others for their contributions to this post.

AI and Machine Learning @ I/O Recap

Posted by Lauren Usui and Joe Fernandez

Artificial intelligence is a topic of kitchen table conversations around the world today, and as AI becomes more accessible for users and developers, we want to make it easier and more useful for everyone. This year at Google I/O, we highlighted how we are helping developers like you build with generative AI, use machine learning in spreadsheets and applications, create ML models from the ground up, and scale them up to serve millions of users.

While AI technology is advancing rapidly, we must continue to ensure it is used responsibly. So we also took some time to explain how Google is taking a principled approach to applying generative AI and how you can apply our guidelines and tools to make sure your AI-powered products and projects are built responsibly to serve all your users.

If you are new to AI and want to get a quick overview of the technology, check out the getting started video from Google’s AI advocate lead, Laurence Moroney.

Develop generative AI apps with PaLM 2

Everyone seems to be chatting with—or about—generative AI recently, and we want you to be able to use Google’s latest large language model, PaLM 2, to power new and helpful experiences for your users with the PaLM API. Our session on Generative AI reveals more about how you can easily prompt models with MakerSuite to quickly prototype generative AI applications. We demonstrate how you can use the PaLM API for prompting with examples, for conversational chat interactions, and for embeddings that compress and compare text data in useful ways. We also show how to use the PaLM API in Google Colab notebooks with a simple, magical syntax. Check out this talk and sign up to request access to the PaLM API and MakerSuite!

Crunch numbers with AI-powered spreadsheets

Hundreds of millions of people use spreadsheets to organize, manage, and analyze data for everything from business transactions, to inventory accounting, to family budgets. We’re making it easy for everyone to bring the power of AI into spreadsheets with Simple ML for Sheets, a Google Sheets add-on. We recently updated this tool to include anomaly detection and forecasting features. Check out the demonstration of how to predict missing data values and forecast sales with the tool. No coding required!

Simplify on-device ML applications with MediaPipe

AI is finding its way into applications across multiple platforms and MediaPipe makes it easy to build, customize, and deploy on-device ML solutions. We upgraded MediaPipe Solutions this year, improving existing solutions and adding new ones, including interactive segmentation to blur the background behind a selected subject and face stylization to render that selfie in your favorite graphic style.

Do more with Web ML

Every week, hundreds of thousands of developers build AI-powered applications to run in the browser or Node.js using JavaScript and web technologies. Web ML has advanced in multiple areas, and we provide a roundup of the top updates in this year’s I/O talk. We announced Visual Blocks for ML, an open JavaScript framework for quickly and interactively building custom ML pipelines. You can now run machine learning models even faster with improved WebGL performance and the release of WebGPU in Chrome. More tools and resources are also now available for web ML developers, including TensorFlow Decision Forest support, a visual debugger for models, JAX to JS conversion support, and a new Zero to Hero training course to grow your skills in Web ML.

Find pre-trained models fast with Kaggle Models

Building machine learning models can take a huge amount of time and effort: collecting data, training, evaluating, and optimizing. Kaggle is making it a whole lot easier for developers to discover and use pretrained models. With Kaggle Models, you can search thousands of open-licensed models from leading ML researchers for multiple ML platforms. Find the model you need quickly with filters for tasks, supported data types, model architecture, and more. Combine this new feature with Kaggle’s huge repository of over 200K datasets and accelerate your next ML project.

Apply ML to vision and text with Keras

Lots of developers are exploring AI technologies and many of you are interested in working on computer vision and natural language processing applications. Keras released new, easy-to-use libraries for computer vision and natural language processing with KerasCV and KerasNLP. Using just a few lines of code, you can apply the latest techniques and models for data augmentation, object detection, image and text generation, and text classification. These new libraries provide modular implementations that are easy to customize and are tightly integrated with the broader TensorFlow ecosystem including TensorFlow Lite, TPUs, and DTensor.

Build ML flexibly and scalably with TensorFlow

With one of the largest ML development communities in the world, the TensorFlow ecosystem helps hundreds of thousands of developers like you build, train, deploy, and manage machine learning models. ML technology is rapidly evolving, and we’re upgrading TensorFlow with new tools to give you more flexibility, scalability, and efficiency. If you’re using JAX, you can now bring your model components into the TensorFlow ecosystem with JAX2TF. We also improved DTensor support for model parallelization, allowing you to scale up execution of larger models by running portions of a single model, or shards, across multiple machines. We also announced a toolkit for applying quantization techniques to practically any TensorFlow model, helping you gain substantial efficiency improvements for your AI applications. The quantization toolkit will be available later this year.

Scale large language models with Google Cloud

When it’s time to deploy your AI-powered applications to your business, enterprise, or the world, you need reliable tools and services that scale with you. Google Cloud’s Vertex AI is an end-to-end ML platform that helps you develop ML models quickly and easily, and deploy them at any scale. To help you build generative AI technology for your product or business, we’ve introduced Model Garden and the Generative AI Studio as part of the Vertex AI platform. Model Garden gives you quick access to the latest foundation models such as Google PaLM 2, and many more to build AI-powered applications for text processing, imagery, and code. Generative AI Studio lets you quickly prototype generative AI applications right in your browser, and when you are ready to deploy, Vertex AI and Google Cloud services enable you to scale up to hundreds, thousands, or millions of users.

Explore new resources to build with Google AI

As tools, technology, and techniques for AI development rapidly advance, finding what you need to get started or take the next step with your project can be challenging. We’re making it easier to find the right resources to accelerate your AI development at Build with Google AI. This new site brings together tools, guidance, and community for building, deploying, and managing ML. Whether you are creating AI for on-device apps or deploying AI at scale, we help you navigate the options and find your path. Check out our latest toolkits on Building an LLM on Android and Text Classification with Keras.

Making Generative AI safe and responsible

AI is a powerful tool, and it’s up to all of us to ensure that it is used responsibly and for the benefit of all. We’re committed to ensuring Google’s AI systems are developed according to our AI principles. This year at Google I/O, we shared how we’ve created guidelines and tools for building generative AI safely and responsibly, and how you can apply those same guidelines and tools for your own projects.

Aaannnd that’s a wrap! Check out the full playlist of all the AI-related sessions we mentioned above. We are excited to share these new tools, resources, and technologies with you, and we can’t wait to see what you build with them!

Meet the Omnivore: Creative Studio Aides Fight Against Sickle Cell Disease With AI-Animated Short

Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds.

Creative studio Elara Systems doesn’t shy away from sensitive subjects in its work.

Part of its mission for a recent client was to use fun, captivating visuals to help normalize what could be considered a touchy health subject — and boost medical outcomes as a result.

In collaboration with Boston Scientific and the Sickle Cell Society, the Elara Systems team created a character-driven 3D medical animation using the NVIDIA Omniverse development platform for connecting 3D pipelines and building metaverse applications.

The video aims to help adolescents experiencing sickle cell disease understand the importance of quickly telling an adult or a medical professional if they’re experiencing symptoms like priapism — a prolonged, painful erection that could lead to permanent bodily damage.

“Needless to say, this is something that could be quite frightening for a young person to deal with,” said Benjamin Samar, technical director at Elara Systems. “We wanted to make it crystal clear that living with and managing this condition is achievable and, most importantly, that there’s nothing to be ashamed of.”

To bring their projects to life, the Elara Systems team turns to the USD Composer app, generative AI-powered Audio2Face and Audio2Gesture, as well as Omniverse Connectors to Adobe Substance 3D Painter, Autodesk 3ds Max, Autodesk Maya and other popular 3D content-creation tools like Blender, Epic Games Unreal Engine, Reallusion iClone and Unity.

For the sickle cell project, the team relied on Adobe Substance 3D Painter to organize various 3D environments and apply custom textures to all five characters. Adobe After Effects was used to composite the rendered content into a single, cohesive short film.

It’s all made possible thanks to the open and extensible Universal Scene Description (USD) framework on which Omniverse is built.

“USD is extremely powerful and solves a ton of problems that many people may not realize even exist when it comes to effectively collaborating on a project,” Samar said. “For example, I can build a scene in Substance 3D Painter, export it to USD format and bring it into USD Composer with a single click. Shaders are automatically generated and linked, and we can customize things further if desired.”

An Animation to Boost Awareness

Grounding the sickle cell awareness campaign in a relatable, personal narrative was a “uniquely human approach to an otherwise clinical discussion,” said Samar, who has nearly two decades of industry experience spanning video production, motion graphics, 3D animation and extended reality.

The team accomplished this strategy through a 3D character named Leon — a 13-year-old soccer lover who shares his experiences about a tough day when he first learned how to manage his sickle cell disease.

3D character Leon in the Omniverse viewport.

The project began with detailed discussions about Sickle Cell Society’s goals for the short, followed by scripting, storyboarding and creating various sketches. “Once an early concept begins to crystallize in the artists’ minds, the creative process is born and begins to build momentum,” Samar said.

Sketches for Leon’s story.

Then, the team created rough 2D mockups using the illustration app Procreate on a tablet. This stage of the artistic process centered on establishing character outfits, proportions and other details. The final concept art was used as a clear reference to drive the rest of the team’s design decisions.

Rough animations of Leon made with Procreate.

Moving to 3D, the Elara Systems team tapped Autodesk Maya to build, rig and fully animate the characters, as well as Adobe Substance 3D Painter and Autodesk 3ds Max to create the short’s various environments.

“I’ve found the animated point cache export option in the Omniverse Connector for Maya to be invaluable,” Samar said. “It helps ensure that what we’re seeing in Maya will persist when brought into USD Composer, which is where we take advantage of real-time rendering to create high-quality visuals.”

The real-time rendering enabled by Omniverse was “critically important, because without it, we would have had zero chance of completing and delivering this content anywhere near our targeted deadline,” the technical artist said.

Leon brought to life in 3D using Adobe Substance 3D Painter, Autodesk Maya and USD Composer.

“I’m also a big fan of the Reallusion to Omniverse workflow,” he added.

The Connector allows users to easily bring characters created using Reallusion iClone into Omniverse, which helps to deliver visually realistic skin shaders. And USD Composer can enable real-time performance sessions for iClone characters when live-linked with a motion-capture system.

“Omniverse offers so much potential to help streamline workflows for traditional 3D animation teams, and this is just scratching the surface — there’s an ever-expanding feature set for those interested in robotics, digital twins, extended reality and game design,” Samar said. “What I find most assuring is the sheer speed of the platform’s development — constant updates and new features are being added at a rapid pace.”

Join In on the Creation

Creators and developers across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.

Check out artwork from other “Omnivores” and submit projects in the gallery. Connect your workflows to Omniverse with software from Adobe, Autodesk, Epic Games, Maxon, Reallusion and more.

Follow NVIDIA Omniverse on Instagram, Medium, Twitter and YouTube for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.


From Robustness to Privacy and Back

We study the relationship between two desiderata of algorithms in statistical inference and machine learning—differential privacy and robustness to adversarial data corruptions. Their conceptual similarity was first observed by Dwork and Lei (STOC 2009), who observed that private algorithms satisfy robustness, and gave a general method for converting robust algorithms to private ones. However, all general methods for transforming robust algorithms into private ones lead to suboptimal error rates. Our work gives the first black-box transformation that converts any… (Apple Machine Learning Research)

Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data

Self-training has been shown to be helpful in addressing data scarcity for many domains, including vision, speech, and language. Specifically, self-training, or pseudo-labeling, labels unsupervised data and adds that to the training pool. In this work, we investigate and use pseudo-labeling for a recently proposed novel setup: joint transcription and translation of speech, which suffers from an absence of sufficient parallel data resources. We show that under such data-deficient circumstances, the unlabeled data can significantly vary in domain from the supervised data, which results in… (Apple Machine Learning Research)

Generalization on the Unseen, Logic Reasoning and Degree Curriculum

This paper considers the learning of logical (Boolean) functions with focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an ‘extrapolating’ or ‘reasoning’ learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence… (Apple Machine Learning Research)

Operationalize ML models built in Amazon SageMaker Canvas to production using the Amazon SageMaker Model Registry

You can now register machine learning (ML) models built in Amazon SageMaker Canvas with a single click to the Amazon SageMaker Model Registry, enabling you to operationalize ML models in production. Canvas is a visual interface that enables business analysts to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code. Although it’s a great place for development and experimentation, to derive value from these models, they need to be operationalized—namely, deployed in a production environment where they can be used to make predictions or decisions. Now with the integration with the model registry, you can store all model artifacts, including metadata and performance metrics baselines, to a central repository and plug them into your existing model deployment CI/CD processes.

The model registry is a repository that catalogs ML models, manages various model versions, associates metadata (such as training metrics) with a model, manages the approval status of a model, and deploys them to production. After you create a model version, you typically want to evaluate its performance before you deploy it to a production endpoint. If it performs to your requirements, you can update the approval status of the model version to approved. Setting the status to approved can initiate CI/CD deployment for the model. If the model version doesn’t perform to your requirements, you can update the approval status to rejected in the registry, which prevents the model from being deployed into an escalated environment.

A model registry plays a key role in the model deployment process because it packages all model information and enables the automation of model promotion to production environments. The following are some ways that a model registry can help in operationalizing ML models:

  • Version control – A model registry allows you to track different versions of your ML models, which is essential when deploying models in production. By keeping track of model versions, you can easily revert to a previous version if a new version causes issues.
  • Collaboration – A model registry enables collaboration among data scientists, engineers, and other stakeholders by providing a centralized location for storing, sharing, and accessing models. This can help streamline the deployment process and ensure that everyone is working with the same model.
  • Governance – A model registry can help with compliance and governance by providing an auditable history of model changes and deployments.

Overall, a model registry can help streamline the process of deploying ML models in production by providing version control, collaboration, monitoring, and governance.
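
For example, once model versions are registered into a model group, an MLOps engineer can enumerate them programmatically. A hedged boto3 sketch, with a placeholder group name:

import boto3

sm_client = boto3.client('sagemaker')

# List all versions registered in a model package group, newest first.
response = sm_client.list_model_packages(
    ModelPackageGroupName='<your-model-group>',
    SortBy='CreationTime',
    SortOrder='Descending',
)
for pkg in response['ModelPackageSummaryList']:
    print(pkg['ModelPackageVersion'], pkg['ModelApprovalStatus'], pkg['ModelPackageArn'])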

Overview of solution

For our use case, we are assuming the role of a business user in the marketing department of a mobile phone operator, and we have successfully created an ML model in Canvas to identify customers with the potential risk of churn. Thanks to the predictions generated by our model, we now want to move this from our development environment to production. However, before our model gets deployed to a production endpoint, it needs to be reviewed and approved by a central MLOps team. This team is responsible for managing model versions, reviewing all associated metadata (such as training metrics) with a model, managing the approval status of every ML model, deploying approved models to production, and automating model deployment with CI/CD. To streamline the process of deploying our model in production, we take advantage of the integration of Canvas with the model registry and register our model for review by our MLOps team.

The workflow steps are as follows:

  1. Upload a new dataset with the current customer population into Canvas. For the full list of supported data sources, refer to Import data into Canvas.
  2. Build ML models and analyze their performance metrics. Refer to the instructions to build a custom ML model in Canvas and evaluate the model’s performance.
  3. Register the best performing versions to the model registry for review and approval.
  4. Deploy the approved model version to a production endpoint for real-time inferencing.

You can perform Steps 1–3 in Canvas without writing a single line of code.

Prerequisites

For this walkthrough, make sure that the following prerequisites are met:

  1. To register model versions to the model registry, the Canvas admin must give the necessary permissions to the Canvas user, which you can manage in the SageMaker domain that hosts your Canvas application. For more information, refer to the Amazon SageMaker Developer Guide. When granting your Canvas user permissions, you must choose whether to allow the user to register their model versions in the same AWS account.
  2. Implement the prerequisites mentioned in Predict customer churn with no-code machine learning using Amazon SageMaker Canvas.

You should now have three model versions trained on historical churn prediction data in Canvas:

  • V1, trained with all 21 features and the quick build configuration, with a model score of 96.903%
  • V2, trained with 19 features (the phone and state features removed) and the quick build configuration, with an improved model score of 97.403%
  • V3, trained with the standard build configuration, with a 97.03% model score

Use the customer churn prediction model

Enable Show advanced metrics and review the objective metrics associated with each model version so that we can select the best performing model for registration to the model registry.

Based on the performance metrics, we select version 2 to be registered.

The model registry tracks all the model versions that you train to solve a particular problem in a model group. When you train a Canvas model and register it to the model registry, it gets added to a model group as a new model version.

At the time of registration, a model group within the model registry is automatically created. Optionally, you can rename it to a name of your choice or use an existing model group in the model registry.

For this example, we use the autogenerated model group name and choose Add.

Our model version should now be registered to the model group in the model registry. If we were to register another model version, it would be registered to the same model group.

The status of the model version should have changed from Not Registered to Registered.

When we hover over the status, we can review the model registry details, which include the model group name, model registry account ID, and approval status. Right after registration, the status changes to Pending Approval, which means that this model is registered in the model registry but is pending review and approval from a data scientist or MLOps team member; it can only be deployed to an endpoint once approved.

Now let’s navigate to Amazon SageMaker Studio and assume the role of an MLOps team member. Under Models in the navigation pane, choose Model registry to open the model registry home page.

We can see the model group canvas-Churn-Prediction-Model that Canvas automatically created for us.

Choose the model to review all the versions registered to this model group and then review the corresponding model details.

If we open the details for version 1, we can see that the Activity tab keeps track of all the events happening on the model.

On the Model quality tab, we can review the model metrics, precision/recall curves, and confusion matrix plots to understand the model performance.

On the Explainability tab, we can review the features that influenced the model’s performance the most.

After we have reviewed the model artifacts, we can change the approval status from Pending to Approved.
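
If you prefer to flip the status from code rather than the Studio UI, a minimal boto3 sketch looks like this (the model package ARN is a placeholder):

import boto3

sm_client = boto3.client('sagemaker')

# Approve a specific model version so downstream CI/CD can pick it up.
sm_client.update_model_package(
    ModelPackageArn='<your-model-package-arn>',
    ModelApprovalStatus='Approved',
)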

We can now see the updated activity.

The Canvas business user will now be able to see that the registered model status changed from Pending Approval to Approved.

As the MLOps team member, because we have approved this ML model, let’s deploy it to an endpoint.

In Studio, navigate to the model registry home page and choose the canvas-Churn-Prediction-Model model group. Choose the version to be deployed and go to the Settings tab.

Browse the settings to find the model package ARN for the selected model version in the model registry.

Open a notebook in Studio and run the following code to deploy the model to an endpoint. Replace the model package ARN with your own model package ARN.

from sagemaker import ModelPackage
import boto3
import sagemaker

boto_session = boto3.session.Session()
sagemaker_client = boto_session.client("sagemaker")
sagemaker_session = sagemaker.session.Session(
    boto_session=boto_session, sagemaker_client=sagemaker_client
)
role = sagemaker.get_execution_role()

# Replace with your own model package ARN from the model registry.
model_package_arn = 'arn:aws:sagemaker:us-west-2:1234567890:model-package/canvas-churn-prediction-model/3'
model = ModelPackage(role=role,
                     model_package_arn=model_package_arn,
                     sagemaker_session=sagemaker_session)
model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')

After the endpoint gets created, you can see it tracked as an event on the Activity tab of the model registry.

You can double-click on the endpoint name to get its details.
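
You can also check the endpoint’s status programmatically; here is a short boto3 sketch with a placeholder endpoint name:

import boto3

sm_client = boto3.client('sagemaker')

# Confirm the endpoint has finished creating before sending traffic.
desc = sm_client.describe_endpoint(EndpointName='<your-endpoint-name>')
print(desc['EndpointStatus'])  # e.g. 'Creating', 'InService', 'Failed'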

Now that we have an endpoint, let’s invoke it to get a real-time inference. Replace your endpoint name in the following code snippet:

import boto3

# Replace with the name of the endpoint you deployed.
endpoint_name = '3-2342-12-21-12-07-09-075'
sm_rt = boto3.Session().client('runtime.sagemaker')

# One row of customer features, serialized as CSV for the endpoint.
payload = [163, 806, 'no', 'yes', 300, 8.162204, 3, 3.933, 2.245779, 4, 6.50863, 6.05194, 5, 4.948816088, 1.784764, 2, 5.135322, 8]
body = ",".join([str(p) for p in payload])

response = sm_rt.invoke_endpoint(EndpointName=endpoint_name, ContentType='text/csv', Accept='text/csv', Body=body)
print(response['Body'].read().decode("utf-8"))

Clean up

To avoid incurring future charges, delete the resources you created while following this post. This includes logging out of Canvas and deleting the deployed SageMaker endpoint. Canvas bills you for the duration of the session, and we recommend logging out of Canvas when you’re not using it. Refer to Logging out of Amazon SageMaker Canvas for more details.
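
If you deployed the endpoint from the notebook above, a minimal cleanup sketch with boto3 might look like the following (the endpoint name is a placeholder):

import boto3

sm_client = boto3.client('sagemaker')

endpoint_name = '<your-endpoint-name>'
endpoint_config_name = sm_client.describe_endpoint(
    EndpointName=endpoint_name)['EndpointConfigName']

# Delete the endpoint first, then its endpoint configuration.
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)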

Conclusion

In this post, we discussed how Canvas can help operationalize ML models to production environments without requiring ML expertise. In our example, we showed how an analyst can quickly build a highly accurate predictive ML model without writing any code and register it to the model registry. The MLOps team can then review it and either reject the model or approve the model and initiate the downstream CI/CD deployment process.

To start your low-code/no-code ML journey, refer to Amazon SageMaker Canvas.

Special thanks to everyone who contributed to the launch:

Backend:

  • Huayuan (Alice) Wu
  • Krittaphat Pugdeethosapol
  • Yanda Hu
  • John He
  • Esha Dutta
  • Prashanth

Front end:

  • Kaiz Merchant
  • Ed Cheung

About the Authors

Janisha Anand is a Senior Product Manager in the SageMaker Low/No Code ML team, which includes SageMaker Autopilot. She enjoys coffee, staying active, and spending time with her family.

Krittaphat Pugdeethosapol is a Software Development Engineer at Amazon SageMaker and mainly works with SageMaker low-code and no-code products.

Huayuan (Alice) Wu is a Software Development Engineer at Amazon SageMaker. She focuses on building ML tools and products for customers. Outside of work, she enjoys the outdoors, yoga, and hiking.
