Posted by Thad Starner (Professor, Georgia Tech and Staff Research Scientist, Google), Sam Sepah (ML Research Program Manager), Manfred Georg (Software Engineer, Google), Mark Sherwood (Senior Product Manager, Google), Glenn Cameron (Product Marketing Manager, Google)
Over 70 million deaf people around the world use sign language to communicate. Collectively, they use more than 300 different sign languages worldwide. And over 1.5 billion people are affected by hearing loss globally. Most Deaf and Hard of Hearing people cannot use their voice to initiate a search or perform actions due to speech limitations. Additionally, the interfaces used by smart home devices and mobile platforms to respond to speech are generally audio based.
Signed languages are sophisticated systems of communication, each with a complete set of language features. On a surface level, handshapes along with four other “parameters” form the basis of signed communication. An open hand or a closed hand while making the same motion can completely change the meaning of a sign. Likewise, palm orientation, motion/contact, location, and non-manual markers (typically mouth movements and facial expressions) define individual signs. A number of grammatical constructs, some of which have no analog in spoken languages, allow a signer to produce complex phrases.
As we develop translation systems for American Sign Language (ASL) and other sign languages, it is natural to break apart various aspects of the language and attempt to perform tasks using those parts.
To that end, we’re excited to announce the release of one of the largest datasets of ASL fingerspelling and a Kaggle ML competition that will award $200k in prizes to ML engineers who develop the most accurate ASL fingerspelling recognition models using MediaPipe and TensorFlow Lite. The winning models will be open sourced to help developers add support for fingerspelling to their apps.
Watch These Hands (Kaggle remix) Performed by Sean Forbes, Co-Founder, Deaf Professional Arts Network |
Fingerspelling communicates words using hand shapes that represent individual letters. While fingerspelling is only a part of sign languages, it is often used for communicating names, addresses, phone numbers, names, and other information that is commonly entered on a mobile phone. Many Deaf smartphone users can fingerspell words faster than they can type on mobile keyboards. In fact, in our dataset, ASL fingerspelling of phrases averages 57 words per minute, which is substantially faster than the US average of 36 words per minute for an on screen keyboard. But, sign language recognition AI for text entry lags far behind voice-to-text or even gesture-based typing, as robust datasets didn’t previously exist.
Although fingerspelling is just a small part of sign languages, there are many reasons to produce systems which specifically focus on it, even while maintaining an ultimate goal of full translation. While fingerspelling at full speed (which can peak over 80 words per minute) the handshapes in the fingerspelling co-articulate together and entire words can become lexicalized into different shapes from their slowed down version. The resulting movements are visually among the fastest used in ASL, and thus stretch particular aspects of any visual recognition system which seeks to perform full translation.
Big Steps Forward
Google Research and the Deaf Professional Arts Network have worked together to create a massive fingerspelling dataset that we will release for this competition to help move sign language recognition forward. The dataset includes over 3 million fingerspelled characters produced by over 100 Deaf signers in the form of continuous phrases, names, addresses, phone numbers, and URLs. This signing was captured using the selfie camera of a smartphone with a variety of backgrounds and lighting conditions and is the largest dataset collection of its kind to date.
Large language models show increasing promise in a variety of language and speech tasks. Everything from chat agents to assistant technology is progressing at breathtaking speed. It is time to ensure that gesture and visual based systems also produce usable interfaces. Fingerspelling recognition models are part of this larger solution, which will address the widening gap in accessibility for Deaf and Hard of Hearing individuals.
How to Get Involved
Join the Kaggle competition today to help us make AI more accessible for the Deaf and hard of hearing community.