Easy Machine Learning for On-Device Audio

Posted by Luiz Gustavo Martins, Developer Advocate

At Google I/O, we shared a set of tutorials to help you use machine learning on audio. In this blog post you’ll find resources to help you develop and customize an audio classification model for your app, along with a couple of real-world examples for inspiration.

GIF of dog with audio waves picking up sound

Machine learning for audio

The terms sound and audio are sometimes used interchangeably, but there is a key difference. Sound is, in essence, what you can hear, while audio is the sound’s electronic representation. That’s why we usually use the term audio when talking about machine learning.
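To make the distinction concrete, here is a minimal Python sketch of what audio looks like to a program: a list of numbers sampled at a fixed rate. The 16 kHz sample rate, the 440 Hz tone, and the helper names are assumptions chosen for illustration; they are not prescribed by any of the models mentioned in this post.

```python
import math
from typing import List

SAMPLE_RATE_HZ = 16_000  # assumed sample rate, for illustration only
TONE_HZ = 440.0          # assumed test tone


def sine_wave(freq_hz: float, duration_s: float, sample_rate_hz: int) -> List[float]:
    """Return a sampled sine tone as floats in [-1.0, 1.0]."""
    n_samples = int(duration_s * sample_rate_hz)
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


def to_pcm16(samples: List[float]) -> List[int]:
    """Quantize floats in [-1.0, 1.0] to 16-bit signed integers."""
    return [max(-32768, min(32767, round(s * 32767))) for s in samples]


# Half a second of "sound" becomes a fixed-length sequence of numbers.
waveform = sine_wave(TONE_HZ, duration_s=0.5, sample_rate_hz=SAMPLE_RATE_HZ)
pcm = to_pcm16(waveform)
print(len(waveform), "samples for 0.5 s of audio")
```

A model never hears the sound itself; it only sees sequences like `waveform` or `pcm`, which is why the sample rate and bit depth of a recording have to match what the model expects.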

Machine learning for audio can be used to:

  • Understand speech
  • Understand musical instruments
  • Classify events (which bird is that?)
  • Detect pitch
  • Generate music

In this post we will focus on audio classification of events, a common scenario with many real-world applications. For example, NOAA created a humpback whale acoustic detector, and the Zoological Society of London uses audio recognition to protect wildlife.

A number of classification models are available for you to try right now on TensorFlow Hub (YAMNet, Whale detection).

Audio recognition can also run completely on-device. For example, Android has a sound notifications feature that provides push notifications for important sounds around you. It can also detect which music is playing, and an ML-powered audio recorder app can even transcribe conversations on-device.

Having the models is only the beginning. Now you might ask:

  • How do I use them in my app?
  • How do I customize them for my audio use case?

Deploying machine learning models on-device

Imagine you have an audio classification model ready, such as a pretrained one from TensorFlow Hub. How would you use it in a mobile app? To help you integrate audio classification into your app, we created the TensorFlow Lite Task Library. Its Audio Classifier component has been released, and you only need a few lines of code to add audio classification to your application:

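import org.tensorflow.lite.task.audio.classifier.AudioClassifier

// Initialization (here `this` is an Android Context and modelPath is the path to your model file)
val classifier = AudioClassifier.createFromFile(this, modelPath)

// Start recording with a recorder configured for the model's input format
val record = classifier.createAudioRecord()
record.startRecording()

// Load the latest audio samples into the model's input tensor
val tensor = classifier.createInputTensorAudio()
tensor.load(record)

// Run inference
val output = classifier.classify(tensor)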

The library takes care of loading the model into memory, creating an audio recorder that matches the model’s specifications (sample rate, bit rate), and providing the classification method that returns the model’s inference results. A full sample is also available if you need some inspiration.
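What you do with the inference results is up to your app. As a rough sketch, written in Python to stay independent of the Android API, the snippet below keeps only the confident predictions and sorts them by score. The `Category` class, the threshold, and the scores are hypothetical stand-ins, not the Task Library's own types or real model output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    """Hypothetical stand-in for one (label, score) pair in a classifier result."""
    label: str
    score: float


def top_categories(
    categories: List[Category],
    min_score: float = 0.3,
    max_results: int = 3,
) -> List[Category]:
    """Keep categories scoring at least min_score, highest first, capped at max_results."""
    confident = [c for c in categories if c.score >= min_score]
    confident.sort(key=lambda c: c.score, reverse=True)
    return confident[:max_results]


# Made-up scores for illustration only; real values come from the model.
example_output = [
    Category("Speech", 0.05),
    Category("Dog", 0.82),
    Category("Bark", 0.67),
    Category("Music", 0.12),
]

for category in top_categories(example_output):
    print(f"{category.label}: {category.score:.2f}")
```

A threshold like this is a design choice: set it too low and the app reacts to noise, too high and it misses quiet events, so it is worth tuning on recordings from your own use case.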

Customizing the models

What if you need to recognize audio events that are not in the set provided by the pretrained models? Or what if you need to specialize them to fewer classes? In these situations, you need to fine-tune the model using a technique called transfer learning.
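The idea behind transfer learning can be sketched in a few lines: keep a pretrained feature extractor frozen and fit only a small classifier on top of its output. The toy example below is an illustration under loud assumptions, not how any real tool implements it: `frozen_embedding` is a hand-written stand-in for a pretrained network, the "head" is a simple nearest-centroid classifier rather than a trained neural layer, and the clips and labels are synthetic.

```python
import math
from typing import Dict, List, Sequence, Tuple

SAMPLE_RATE_HZ = 16_000  # assumed sample rate for the toy clips


def tone(freq_hz: float, duration_s: float = 0.25, amplitude: float = 0.5) -> List[float]:
    """Generate a synthetic clip: a pure sine tone."""
    n_samples = int(duration_s * SAMPLE_RATE_HZ)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
        for n in range(n_samples)
    ]


def frozen_embedding(waveform: Sequence[float]) -> Tuple[float, float]:
    """Stand-in for a pretrained, frozen feature extractor.

    A real model would output a learned embedding; this toy version returns
    (RMS energy, zero-crossing rate) so the example runs without dependencies.
    """
    rms = math.sqrt(sum(s * s for s in waveform) / len(waveform))
    crossings = sum(1 for a, b in zip(waveform, waveform[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / (len(waveform) - 1))


def fit_head(examples: List[Tuple[Sequence[float], str]]) -> Dict[str, Tuple[float, float]]:
    """Fit only the small head: one mean embedding (centroid) per label."""
    grouped: Dict[str, List[Tuple[float, float]]] = {}
    for waveform, label in examples:
        grouped.setdefault(label, []).append(frozen_embedding(waveform))
    return {
        label: (
            sum(e[0] for e in embs) / len(embs),
            sum(e[1] for e in embs) / len(embs),
        )
        for label, embs in grouped.items()
    }


def predict(head: Dict[str, Tuple[float, float]], waveform: Sequence[float]) -> str:
    """Return the label whose centroid is closest to the clip's embedding."""
    embedding = frozen_embedding(waveform)
    return min(head, key=lambda label: math.dist(embedding, head[label]))


# A handful of labeled clips is enough because the extractor is never retrained.
training_clips = [
    (tone(180), "low_tone"),
    (tone(220), "low_tone"),
    (tone(1800), "high_tone"),
    (tone(2200), "high_tone"),
]
head = fit_head(training_clips)
print(predict(head, tone(200)))
print(predict(head, tone(2000)))
```

The point of the sketch is the division of labor: the expensive, general-purpose part is reused as is, and only the small task-specific part is learned from your own data.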

This is a very popular process, and you don’t need to be a machine learning expert to do it. You can use Model Maker to help you with this:

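from tflite_model_maker import audio_classifier

# DATA_DIR (your training audio folder) and models_path (the export folder)
# are placeholders that you need to define yourself.

# Use YAMNet as the pretrained base model
spec = audio_classifier.YamNetSpec()
data = audio_classifier.DataLoader.from_folder(spec, DATA_DIR)

# Train on 80% of the data and validate on the remaining 20%
train_data, validation_data = data.split(0.8)
model = audio_classifier.create(train_data, spec, validation_data)

model.export(models_path)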

The complete code is also available. The output model can be loaded directly by the Task Library. Model Maker can customize models not only for audio but also for image, text, and recommendation systems.
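Before training, it can help to sanity-check the dataset folder. The sketch below assumes a layout with one subfolder per label, each containing `.wav` clips; that layout, the helper name, and the demo labels are assumptions for illustration, so check the Model Maker documentation for the exact format it expects.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_wav_files_per_label(data_dir: str) -> Counter:
    """Count .wav files under each immediate subfolder (folder name = label)."""
    counts: Counter = Counter()
    for label_dir in sorted(Path(data_dir).iterdir()):
        if label_dir.is_dir():
            counts[label_dir.name] = sum(1 for _ in label_dir.rglob("*.wav"))
    return counts


# Demo on a throwaway folder with empty placeholder files (hypothetical labels).
with tempfile.TemporaryDirectory() as tmp:
    for label, n_clips in (("dog_bark", 2), ("doorbell", 1)):
        folder = Path(tmp) / label
        folder.mkdir()
        for i in range(n_clips):
            (folder / f"clip_{i}.wav").touch()
    demo_counts = count_wav_files_per_label(tmp)

print(dict(demo_counts))
```

Running a check like this on your real data folder makes heavily unbalanced or empty classes visible before you spend time on training.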

Summary

Machine learning for audio is an exciting field with many possibilities, and it enables many new features. Doing ML on-device is getting easier and faster with tools like the TensorFlow Lite Task Library, and with Model Maker, customization can be done without expertise in the field.

You can learn more on our new On-Device Machine Learning website, which includes a dedicated audio path. You’ll find tutorials, codelabs, and lots of resources not only for audio-related tasks but also for image (classification, object detection) and text (classification, entity extraction, question and answer) tasks.

You can share what you build with us by adding #TensorFlow to your social network post about your project, or by submitting it to the TensorFlow community spotlight program. And if you have any questions, you can ask them on discuss.tensorflow.org.
