Lip reading AI smashes humans at interpreting silent sentences

Luke Dormehl
Digital Trends

One of the most memorable parts of Stanley Kubrick’s sci-fi masterpiece 2001: A Space Odyssey is a plotline in which two members of the Discovery One spaceship crew grow increasingly suspicious about the behaviour of the ship’s AI assistant, HAL 9000.

Knowing that HAL is constantly listening to what they are saying, they retreat to a place where they know HAL cannot hear them and agree to disconnect him. HAL rumbles their plan anyway, because the two astronauts fail to take into account the AI’s superior lip-reading capabilities.

Futuristic stuff, eh? Not according to research carried out by investigators at Oxford University. They’ve developed an artificial intelligence program called LipNet, which is able to accurately interpret what people are saying, based purely on the way they move their mouths when speaking.

“LipNet performs lip-reading at the sentence-level using machine learning,” Brendan Shillingford, one of the researchers on the paper, told Digital Trends. “A neural network similar to state-of-the-art speech recognition models processes a sequence of video frames, mapping these to a sentence. Previous approaches worked by predicting individual words rather than sentences.”

LipNet compares incredibly favorably with human lip-reading experts on the GRID corpus, the largest publicly available sentence-level lip-reading dataset. Where human experts managed just 52 percent accuracy, LipNet scored 93 percent. Its sentence-based approach to lip-reading also smashed the best previous attempt by a machine, which managed 79.6 percent accuracy on the same dataset.

However, while the fictitious HAL 9000 uses his lip-reading powers for no good, the team behind LipNet have other goals for their creation. Around 360 million people worldwide have disabling hearing loss. Tools like LipNet could be highly significant for these individuals, helping to interpret speech accurately in a way that makes their lives easier.

“Other applications that we are interested in include silent dictation in public spaces, covert conversations, speech recognition in noisy environments, biometric identification, and silent-movie processing,” Shillingford continued.

While surveillance is going to be an issue with any technology like this, Nando de Freitas, who also worked on the project, said that it is not an application they have focused on. However, he said that it “would not be surprising” if other labs tried to build on such work for that purpose in the future.

“The public must be aware of this, and rely on our legal democratic institutions to establish appropriate laws that protect our privacy and dignity,” de Freitas continued. “It is our hope that by publishing this work, we help raise awareness, while still emphasizing the usefulness of this tech to help people in need.”