
Lip-reading is an extremely difficult task for humans, but artificial intelligence could make it much easier to discern speech. Researchers from Google's DeepMind and the University of Oxford have created a lip-reading AI that outperforms professional lip-readers at recognizing speech.
According to the research paper published this month, the lip-reading AI was trained on a total of 118,000 sentences from six different TV programs, including BBC Breakfast, Newsnight, and Question Time.
Working only from video of each speaker's lips, the system outperformed a professional lip-reader. The lip-reading AI annotated 46.8 percent of all words without error, while the professional lip-reader managed only 12.4 percent.
“It’s a big step for developing fully automatic lip-reading systems,” says Ziheng Zhou at the University of Oulu in Finland. “Without that huge data set, it’s very difficult for us to verify new technologies like deep learning.”
Two weeks ago, a team from the University of Oxford's Department of Computer Science unveiled a similar deep learning system called LipNet, which outperformed human lip-readers on a lip-reading data set called GRID. However, GRID contains a vocabulary of only 51 unique words, whereas the BBC data set contains nearly 17,500. That made the task considerably harder for Google's lip-reading AI.
Such AI systems have a range of potentially useful applications. Video calls could become much easier, even in a crowded room or on a busy road. You might not have to shout at Siri all the time, because the digital assistant could read your lips instead. According to Zhou, lip-reading AI could also be built into consumer devices to help them work out what their users are trying to say.