Google’s DeepMind and the University of Oxford are working on a lip-reading system powered by artificial intelligence, New Scientist reports.
The AI system already outperforms professional lip readers by a wide margin, opening the door to new opportunities in consumer technology.
The two organizations trained a deep-learning system on a large dataset of BBC programs: 5,000 hours of video from six TV shows that aired between January 2010 and December 2015, containing 118,000 sentences in total. The system was then tested on live broadcasts that aired between March and September 2016.
In a controlled test, the AI blew away a professional (human) lip reader. Tasked with transcribing 200 randomly selected clips from the dataset, the professional correctly annotated just 12.4% of the words, while the AI got 46.8% of all words correct. The system is also said to be more accurate than any previous automated lip-reading system.
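As an illustration of what a figure like "46.8% of words correct" measures, here is a minimal sketch of a word-accuracy calculation of the kind commonly used to score transcripts against reference captions. It is not DeepMind's or Oxford's evaluation code, and the sentences in the example are hypothetical; it simply aligns a transcript to the reference caption with a word-level edit distance and reports the share of reference words transcribed without error.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference words transcribed correctly (1 - word error rate)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    # Word-level edit distance (insertions, deletions, substitutions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    errors = d[len(ref)][len(hyp)]
    return max(0.0, 1.0 - errors / len(ref))


# Hypothetical clip: the transcriber misses one word of the true caption.
print(word_accuracy("we will be right back after the break",
                    "we will be back after the break"))  # 0.875
```

By a measure like this, a transcript that misses or misreads roughly half of a clip's words would score near the AI's reported 46.8%.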
This system is relevant to any context that uses speech recognition and a camera, such as:
- Adding speech recognition to hearing aids. Lip-reading systems could improve hearing aids by transcribing conversations in real time. Around 20% of Americans suffer from hearing loss, according to the Hearing Loss Association of America, and by age 65, one in three people has hearing loss. With the aging population, demand for hearing aids or lip-reading devices is only going to increase.
- Augmenting camera-equipped sunglasses. This technology could complement products like Spectacles, Snap's camera-equipped sunglasses. Anyone wearing them could theoretically receive full transcriptions of conversations in real time, provided the camera gets a close enough view of the speaker's lips. This could be especially useful in loud environments.
- Enabling silent dictation and voice commands. Another exciting use case for lip-reading technology is letting people silently mouth commands to their devices. Users would no longer have to speak out loud to Siri; a big reason consumers are reluctant to use voice assistants is that they're shy about talking to their devices, especially in public. It also opens the door to visual passwords, since every person's lips move differently.