Posted by Khanh LeViet, TensorFlow Developer Advocate
TensorFlow Lite is the official framework for running TensorFlow models on mobile and edge devices. It is used in many of Google’s major mobile apps, as well as in applications by third-party developers. When deploying TensorFlow Lite models in production, you may need supporting features that the framework does not provide out of the box, such as:
- deploying models over-the-air
- measuring model inference speed on user devices
- A/B testing multiple model versions in production
In these cases, instead of building your own solutions, you can leverage Firebase to quickly implement these features in just a few lines of code.
Firebase is Google’s comprehensive app development platform. It provides infrastructure and libraries that make app development easier on both Android and iOS. Firebase Machine Learning offers multiple solutions for using machine learning in mobile applications.
In this blog post, we show you how to use Firebase to enhance your deployment of TensorFlow Lite models in production. We also have codelabs for both Android and iOS that walk you step by step through integrating these Firebase features into your TensorFlow Lite app.
Deploy model over-the-air instantly
You may want to deploy your machine learning model to your users over-the-air instead of bundling it into your app binary. For example, the machine learning team that builds the model may have a different release cycle from the mobile app team and want to release new models independently of app releases. As another example, you may want to lazy-load machine learning models to save device storage for users who don’t need the ML-powered feature, and to reduce your app size for faster downloads from the Play Store and App Store.
With Firebase Machine Learning, you can deploy models instantly. You can upload your TensorFlow Lite model to Firebase from the Firebase Console, or programmatically with the Firebase ML Model Management API. The API is especially useful when you have a machine learning pipeline that automatically retrains models with new data and uploads them directly to Firebase. Here is a Python snippet that uses the Firebase Admin SDK to upload a TensorFlow Lite model to Firebase ML.
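import firebase_admin
from firebase_admin import credentials, ml

# Initialize the Admin SDK. The key path and bucket name are placeholders.
firebase_admin.initialize_app(
    credentials.Certificate('/path/to/service_account_key.json'),
    options={'storageBucket': 'your-storage-bucket'})
# Load a tflite file and upload it to Cloud Storage.
source = ml.TFLiteGCSModelSource.from_tflite_model_file('example.tflite')
# Create the model object. The display name is what the app uses to load the model.
tflite_format = ml.TFLiteFormat(model_source=source)
model = ml.Model(display_name="example_model", model_format=tflite_format)
# Add the model to your Firebase project and publish it.
new_model = ml.create_model(model)
ml.publish_model(new_model.model_id)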
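When uploads are driven by an automated retraining pipeline, it can help to sanity-check the artifact before handing it to `from_tflite_model_file`. The sketch below is a hypothetical pre-upload check, not part of the Firebase Admin SDK. It assumes the standard TensorFlow Lite FlatBuffer layout, in which the file identifier `TFL3` occupies bytes 4 to 7 of the file; the helper names `looks_like_tflite` and `check_before_upload` are illustrative.

```python
from pathlib import Path

# Assumption: TensorFlow Lite models are FlatBuffers whose 4-byte file
# identifier ("TFL3") follows the 4-byte root-table offset at the start.
TFLITE_FILE_IDENTIFIER = b"TFL3"


def looks_like_tflite(path: Path) -> bool:
    """Cheap sanity check: True if the file carries the TFLite identifier."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Bytes 0-3 hold the root offset; bytes 4-7 hold the file identifier.
    return len(header) == 8 and header[4:8] == TFLITE_FILE_IDENTIFIER


def check_before_upload(path: Path) -> Path:
    """Raise instead of uploading a file that is missing or not a TFLite model."""
    if not path.is_file():
        raise FileNotFoundError(path)
    if not looks_like_tflite(path):
        raise ValueError(f"{path} does not look like a TensorFlow Lite model")
    return path
```

A pipeline could call `check_before_upload(Path("example.tflite"))` right before the upload step, so that a broken conversion fails the pipeline instead of being published to users.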
Once your TensorFlow Lite model has been uploaded to Firebase, you can download it in your mobile app at any time and initialize a TensorFlow Lite interpreter with the downloaded model. Here is how you do it on Android.
val remoteModel = FirebaseCustomRemoteModel.Builder("example_model").build()
// Get the latest downloaded (cached) model file.
FirebaseModelManager.getInstance().getLatestModelFile(remoteModel)
    .addOnCompleteListener { task ->
        // The result is null if no model has been downloaded yet.
        val modelFile = if (task.isSuccessful) task.result else null
        if (modelFile != null) {
            // Initialize a TF Lite interpreter with the downloaded model.
            interpreter = Interpreter(modelFile)
        }
    }
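Because the latest model file may not exist yet, the snippet checks `modelFile` before creating the interpreter. The Python sketch below illustrates that decision in a platform-neutral way, under an assumption that is not part of the original example: the app also ships a bundled fallback model. `choose_model_file` and its `bundled` parameter are hypothetical.

```python
from pathlib import Path
from typing import Optional


def choose_model_file(latest_downloaded: Optional[Path],
                      bundled: Optional[Path] = None) -> Optional[Path]:
    """Prefer the latest downloaded model; otherwise fall back to a bundled copy.

    `latest_downloaded` plays the role of the (possibly null) latest model file;
    `bundled` is a hypothetical model shipped inside the app.
    """
    if latest_downloaded is not None and latest_downloaded.is_file():
        return latest_downloaded
    if bundled is not None and bundled.is_file():
        return bundled
    # No model available yet: the caller should wait for a download to finish.
    return None
```

Keeping this decision in one place makes it easy to switch between a lazy-loaded model and a bundled one without touching the inference code.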
Measure inference speed on user devices
A diverse range of mobile devices is available on the market today, from flagship devices with powerful chips optimized for machine learning to inexpensive devices with low-end CPUs. As a result, your model’s inference speed may vary widely across your user base, and you may wonder whether your model is too slow, or even unusable, for users with low-end devices.
You can use Firebase Performance Monitoring to measure how long model inference takes across all of your users’ devices. Because it is impractical to test on every device on the market in advance, the best way to find out how your model performs in production is to measure it directly on user devices. Performance Monitoring is a general-purpose tool for measuring the performance of mobile apps, so you can also measure any arbitrary process in your app, such as pre-processing or post-processing code. Here is how you do it on Android.
// Create and start a Firebase Performance Monitoring trace.
val modelInferenceTrace = FirebasePerformance.getInstance().newTrace("model_inference")
modelInferenceTrace.start()
// Run inference with TensorFlow Lite.
interpreter.run(...)
// Stop the trace so that the elapsed time is recorded.
modelInferenceTrace.stop()
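A trace only measures what happens between `start()` and `stop()`, so stopping it on every code path matters. As a platform-neutral illustration, here is a Python sketch of the same pattern written as a context manager. The `trace` helper and the in-memory `recorded` dictionary are hypothetical stand-ins for the Performance Monitoring SDK and backend, not Firebase APIs.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Hypothetical in-memory store standing in for the Performance Monitoring backend.
recorded: Dict[str, List[float]] = {}


@contextmanager
def trace(name: str) -> Iterator[None]:
    """Record the wall-clock duration (in ms) of the wrapped block under `name`."""
    start = time.perf_counter()  # plays the role of trace.start()
    try:
        yield
    finally:
        # Plays the role of trace.stop(); runs even if the wrapped block raises.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        recorded.setdefault(name, []).append(elapsed_ms)
```

Wrapping pre-processing, inference, and post-processing in separate `with trace("model_inference"):`-style blocks, each with its own name, records every stage separately, which mirrors creating one custom trace per stage.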
Performance data measured on each user device is uploaded to Firebase servers and aggregated to give you a big picture of your model’s performance across your user base. From the Firebase console, you can easily identify devices that show slow inference, or see how inference speed differs between OS versions.
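To illustrate what this aggregation amounts to, the sketch below groups hypothetical `(device_model, duration_ms)` samples, computes the median duration per device model, and flags device models above a latency budget. The function names and the budget are illustrative assumptions; the Firebase console performs its own aggregation.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class DeviceStats:
    count: int        # number of inference samples from this device model
    median_ms: float  # median inference duration in milliseconds


def summarize_by_device(samples: Iterable[Tuple[str, float]]) -> Dict[str, DeviceStats]:
    """Group (device_model, duration_ms) samples and summarize each group."""
    by_device: Dict[str, List[float]] = defaultdict(list)
    for device_model, duration_ms in samples:
        by_device[device_model].append(duration_ms)
    return {
        device_model: DeviceStats(len(durations), statistics.median(durations))
        for device_model, durations in by_device.items()
    }


def slow_devices(summary: Dict[str, DeviceStats], budget_ms: float) -> List[str]:
    """Device models whose median inference time exceeds `budget_ms`, sorted by name."""
    return sorted(d for d, s in summary.items() if s.median_ms > budget_ms)
```

The median is used here because a few unusually slow runs (for example, the first inference after app start) would otherwise skew a per-device average.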
A/B test multiple model versions
When you iterate on your machine learning model and come up with an improved version, you may be eager to release it to production right away. However, it is not uncommon for a model to perform well on test data but poorly in production. The best practice is therefore to roll out the new model to a smaller set of users, A/B test it against the original model, and closely monitor how it affects your important business metrics before releasing it to all of your users.
Firebase A/B Testing enables you to run this kind of A/B test with minimal effort. The steps required are:
- Upload all TensorFlow Lite model versions that you want to test to Firebase, giving each one a different name.
- Set up Firebase Remote Config in the Firebase console to manage the TensorFlow Lite model name used in the app.
- Update the client app to fetch the TensorFlow Lite model name from Remote Config and download the corresponding TensorFlow Lite model from Firebase.
- Set up A/B testing in the Firebase console:
  - Decide the testing plan (e.g. what percentage of your user base receives each model version).
  - Decide the metric(s) that you want to optimize for (e.g. number of conversions, user retention, etc.).
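Firebase A/B Testing assigns users to variants for you. Purely to illustrate what a testing plan such as "50% per model version" means, here is a sketch of deterministic, hash-based percentage bucketing. It is not Firebase’s assignment algorithm, and `assign_model`, `validate_plan`, and the model names in the usage note are hypothetical.

```python
import hashlib
from typing import Dict


def validate_plan(plan: Dict[str, int]) -> None:
    """Check a {model_name: percent} plan (dict keys are already distinct names)."""
    if any(percent < 0 for percent in plan.values()):
        raise ValueError("percentages must be non-negative")
    if sum(plan.values()) != 100:
        raise ValueError("percentages must add up to 100")


def assign_model(user_id: str, experiment: str, plan: Dict[str, int]) -> str:
    """Deterministically map a user to one model name according to `plan`."""
    validate_plan(plan)
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable value in [0, 100)
    cumulative = 0
    for model_name, percent in plan.items():  # dicts keep insertion order
        cumulative += percent
        if bucket < cumulative:
            return model_name
    raise AssertionError("unreachable when percentages sum to 100")
```

For example, `assign_model("user-123", "digit_model_test", {"model_v1": 50, "model_v2": 50})` returns the same model name every time it is called for that user and experiment, and roughly half of all users land on each version. Hashing the experiment name together with the user ID keeps assignments independent across experiments.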
Here is an example of setting up an A/B test with TensorFlow Lite models: each of two model versions is delivered to 50% of the user base, with the goal of optimizing for multiple metrics. Then we change our app to fetch the model name from Firebase Remote Config and use it to download the TensorFlow Lite model assigned to each device.
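val remoteConfig = Firebase.remoteConfig
remoteConfig.fetchAndActivate()
    .addOnCompleteListener(this) {
        // Get the model name from Firebase Remote Config.
        val modelName = remoteConfig["model_name"].asString()
        // Download the model from Firebase ML (default download conditions).
        val remoteModel = FirebaseCustomRemoteModel.Builder(modelName).build()
        val conditions = FirebaseModelDownloadConditions.Builder().build()
        val manager = FirebaseModelManager.getInstance()
        manager.download(remoteModel, conditions).addOnCompleteListener {
            // Read the file that was just downloaded (or the cached copy).
            manager.getLatestModelFile(remoteModel).addOnSuccessListener { modelFile ->
                if (modelFile != null) {
                    // Initialize a TF Lite interpreter with the downloaded model.
                    interpreter = Interpreter(modelFile)
                }
            }
        }
    }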
After you start the A/B test, Firebase automatically aggregates metrics on how your users respond to each version of your model and shows you which version performs better. Once you are confident in the A/B test result, you can roll out the better version to all of your users with just one click.
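Firebase A/B Testing runs its own statistical analysis. To give a feel for the kind of comparison involved, the sketch below applies a classic two-proportion z-test to conversion counts from two model variants. This swapped-in frequentist test is an illustration only, not the method Firebase uses, and `two_proportion_z_test` is a hypothetical helper.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    if min(n_a, n_b) <= 0:
        raise ValueError("each variant needs at least one user")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pool both groups to estimate the common rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        # All users converted, or none did: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

A positive `z` means variant B converts better than variant A; a small p-value suggests the difference is unlikely to be due to chance alone. Note that repeatedly checking such a test while an experiment is still running inflates false positives, which is one reason to rely on the analysis Firebase reports.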
Next steps
Check out this codelab (Android version or iOS version) to learn step by step how to integrate these Firebase features into your app. It starts with an app that uses a TensorFlow Lite model to recognize handwritten digits and shows you:
- How to upload a TensorFlow Lite model to Firebase via the Firebase Console and the Firebase Model Management API.
- How to dynamically download a TensorFlow Lite model from Firebase and use it.
- How to measure pre-processing, post-processing, and inference time on user devices with Firebase Performance Monitoring.
- How to A/B test two versions of a handwritten digit classification model with Firebase A/B Testing.
Acknowledgements
Amy Jang, Ibrahim Ulukaya, Justin Hong, Morgan Chen, Sachin Kotwani