How Good Is Machine Learning at Lipreading? DeepMind's LipNet Helps Machines "See" What People Are Saying

November 17, 2017 10:51 pm

Sohu Technology, selected and compiled by Synced (机器之心); translation: Wu Pan.

Remember the shipboard computer HAL from the classic sci-fi film 2001: A Space Odyssey? It could read a speaker's lips to understand what they were saying, an ability that plays a vital role in the film's plot. Recently, the University of Oxford, Google DeepMind, and the Canadian Institute for Advanced Research (CIFAR) jointly published a similarly significant paper introducing LipNet, a machine-learning system for sentence-level automatic lipreading. The technology advances the state of the art in automatic lipreading to an unprecedented level. The original paper can be downloaded via the "Read the original" link at the end of this article.

Lipreading is the task of decoding text from a speaker's lip movements. The traditional approach divides the problem into two steps: designing or learning visual features, and then predicting the text. More recent deep lipreading methods are trained end to end (Wand et al., 2016; Chung & Zisserman, 2016a). However, all existing methods can only classify individual words rather than perform sentence-level sequence prediction. Studies have shown that humans lipread better on longer utterances (Easton & Basala, 1982), which indicates the importance of capturing temporal context when the communication channel is ambiguous. Motivated by this observation, we propose LipNet, a model that maps a variable-length video sequence to text using spatiotemporal convolutions, an LSTM recurrent network, and a connectionist temporal classification (CTC) loss, trained entirely end to end.
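The CTC loss mentioned above scores every frame-level alignment that collapses to the target transcript, which is what lets LipNet be trained without per-frame labels. As a minimal illustration (not DeepMind's code), here is a sketch of the CTC forward algorithm in log space; the function name and the tiny two-frame example are our own assumptions:

```python
import numpy as np

def ctc_forward(log_probs, labels, blank=0):
    """Log-probability of `labels` under CTC, given per-frame
    log distributions `log_probs` of shape (T, vocab_size)."""
    def logsumexp(*xs):
        m = max(xs)
        if m == -np.inf:
            return -np.inf
        return m + np.log(sum(np.exp(x - m) for x in xs))

    # Extended label sequence with blanks interleaved: ^ l1 ^ l2 ^ ...
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    T, S = log_probs.shape[0], len(ext)

    alpha = np.full((T, S), -np.inf)
    alpha[0, 0] = log_probs[0, blank]     # start with a blank...
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]  # ...or the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]                       # stay
            if s > 0:
                a = logsumexp(a, alpha[t - 1, s - 1])  # advance one
            # Skip a blank, allowed only between distinct labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]
    # Valid paths end on the last label or the trailing blank
    return logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])

# Two frames, vocab {blank, 'a'}, uniform 0.5 probabilities, target "a":
# the paths (a,_), (_,a), (a,a) each have probability 0.25, so 0.75 total.
p = ctc_forward(np.log(np.full((2, 2), 0.5)), [1])  # → log(0.75)
```

In practice a framework implementation (e.g. a built-in CTC loss in a deep-learning library) would be used for training; this sketch only shows the dynamic program that such a loss computes.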
To the best of our knowledge, LipNet is the first sentence-level lipreading model, using a single end-to-end, speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous state-of-the-art accuracy of 79.6%.

1 Introduction

Lipreading plays a key role in human communication and speech understanding, as highlighted by the "McGurk effect" (McGurk & MacDonald, 1976): when the audio of one phoneme is dubbed over video of a person mouthing a different phoneme, the listener perceives a third, distinct phoneme.
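The accuracy figures on GRID are naturally read as word-level accuracy, i.e. 1 minus the word error rate (WER) computed by edit distance over word tokens, which is how such sentence-level results are conventionally scored. A minimal sketch under that assumption (the helper name `wer` and the GRID-style example sentence are ours):

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance between word sequences,
    normalized by the length of the reference."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j  # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(r)][len(h)] / len(r)

# One substituted word out of six: WER = 1/6, accuracy ≈ 83.3%
e = wer("bin blue at f two now", "bin blue by f two now")
```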
