
MIT's new CSAIL (Computer Science and Artificial Intelligence Laboratory) "Lecture Browser" may be raising the bar for searching and indexing the spoken audio in videos. It already receives over 20,000 hits per day, even though it indexes only lectures so far.

Originally funded by Microsoft and first announced in August, the Lecture Browser presents results as video or audio timeline sections. The section containing the search term is highlighted, and snippets of the surrounding text are displayed. The searcher can also "jump" directly from the index to the relevant section of the video.

There are some impressive features built into this rather advanced application.

Optimized Speech Transcription:
- The speech recognizer has been trained and configured to transcribe accented speech accurately, using short snippets of speech recorded in various accents.
Accurate Recognition of Uncommon Words:
- A massive vocabulary has been built into the system's lexicon, allowing it to recognize extremely uncommon scientific terms and the like.

The system includes software, designed by MIT, that groups long runs of sentences sharing a common topic into high-level concepts.
- "Topical transitions are very subtle," says Regina Barzilay, professor of Computer Science at MIT. "Lectures aren't like normal text."
  The software takes blocks of roughly 100 words and compares them, counting the words that the blocks share. Key terms that repeat often are given more weight, and the chunks with the highest rate of shared words are grouped together.
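
For the curious, the block comparison described above can be sketched roughly as follows. This is only a minimal illustration in Python, not MIT's software: the function names, the use of cosine similarity over word counts as the "weighting", the choice to compare only neighboring blocks, and the 0.2 threshold are all assumptions made for the example. A real system would presumably also filter out common words such as "the" and "of".

```python
import math
import re
from collections import Counter


def split_into_blocks(text, block_size=100):
    """Lowercase the text and cut it into consecutive blocks of block_size words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def block_similarity(a, b):
    """Cosine similarity of word counts.

    A shared word that repeats often in both blocks contributes more,
    which is one simple way to give repeated key terms extra weight.
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_blocks(blocks, threshold=0.2):
    """Group neighboring blocks into topic segments (lists of block indices).

    A block joins the current segment when it is similar enough to the
    block just before it; otherwise it starts a new segment.
    The threshold is an arbitrary value chosen for this sketch.
    """
    if not blocks:
        return []
    segments = [[0]]
    for i in range(1, len(blocks)):
        if block_similarity(blocks[i - 1], blocks[i]) >= threshold:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments


if __name__ == "__main__":
    # Tiny made-up "lecture" with 4-word blocks so the topic change is easy to see.
    sample = ("neuron synapse neuron cortex synapse neuron cortex neuron "
              "market price market trade")
    blocks = split_into_blocks(sample, block_size=4)
    print(group_blocks(blocks))
```

With real transcripts you would use the default 100-word blocks and tune the threshold by hand.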

MIT's efforts to optimize the user experience are ongoing. In the future, users will be able to contribute transcript corrections, much like the "Wikipedia process", further improving transcription accuracy.

Even more impressive: MIT's plans include the ability for the system to learn from these corrections as they propagate to other transcribed lectures.

A more comprehensive overview can also be read here.

Labels: accented speech, browsing, MIT, speech recognition, transcription, transcription learning, Videos