Wireless Speech Recognition

Speech recognition is now primarily wireless; we've migrated quickly to universal wireless communication devices.

Often, the speech recognition is remote-based, and the better the signal we send it, the better it performs.

Here, we hope you'll find ideas, technology, and projects that use hands-free and/or mobile devices to make wireless speech recognition a rewarding and useful universal tool!

Wednesday, February 06, 2008

Dynamic advertising via speech recognition in videos

 
 Cheap shots are often fired at Microsoft for being a little behind the "cutting edge" curve.

 Not so, as we well know. The world's leader in speech recognition technology has taken another huge leap forward: dynamically served contextual advertising, via automated speech recognition, on the spoken audio in videos, from Microsoft's adCenter Labs.

 This technology enables ads to be rendered based on the contextual spoken audio in a video. For example, if the topic of a video is gardening, ads related to gardening or lawn improvement could be served in an adjacent text-based ad space as the video plays. For advertisers, this provides new and unique access to consumers.
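The matching idea can be sketched very simply: count keyword hits in the speech-recognized transcript and serve the ads whose keywords score highest. The ad inventory, keywords, and scoring below are purely illustrative assumptions, not adCenter Labs' actual system.

```python
from collections import Counter

# Hypothetical ad inventory: keyword -> ad text. Illustrative only;
# a real system would use a far richer taxonomy than exact keywords.
AD_INVENTORY = {
    "gardening": "Shop spring seed kits",
    "lawn": "Lawn care services near you",
    "mower": "Mower tune-up specials",
}

def pick_ads(transcript, max_ads=2):
    """Count keyword hits in a (speech-recognized) transcript and
    return the ads whose keywords appear most often."""
    words = Counter(w.strip(".,!?:;").lower() for w in transcript.split())
    scored = [(words[kw], ad) for kw, ad in AD_INVENTORY.items() if words[kw]]
    scored.sort(reverse=True)          # most-mentioned keywords first
    return [ad for _, ad in scored[:max_ads]]

transcript = "Today we talk gardening: lawn prep, gardening tools, and more gardening tips."
print(pick_ads(transcript))
```

Because "gardening" dominates the transcript, its ad is served first, with the lawn-care ad alongside it.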

 The YouTube video below makes a "video worth a thousand words.."




  Wow. What else could we say?

 


Wednesday, December 12, 2007

MIT's Browsing through speech inside videos

 
 MIT's new CSAIL (Computer Science and Artificial Intelligence Laboratory) "Lecture Browser" may be raising the bar on searching and indexing the spoken audio in videos. In fact, it's receiving over 20,000 hits per day, and to date it only indexes lectures.

 Originally funded by Microsoft and first announced in August, the Lecture Browser returns results as sections of a video or audio timeline: the section containing the search term is highlighted, and snippets of the surrounding text are displayed. The searcher can also jump to the relevant section of the video directly from the index.

 [MIT Lecture Browser screenshot]
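The search-and-jump behavior described above can be sketched as a lookup over a time-aligned transcript, assuming the recognizer emits per-word timestamps; the data layout and snippet size here are illustrative assumptions, not MIT's actual format.

```python
# Sketch of "jump to the matching section": given a transcript whose
# words carry start times, find each occurrence of a query term and
# return its timestamp (the seek target) plus a text snippet.

def find_hits(timed_words, query, context=3):
    """timed_words: list of (seconds, word). Returns (seconds, snippet)
    for each occurrence of query, with `context` words on each side."""
    query = query.lower()
    hits = []
    for i, (t, w) in enumerate(timed_words):
        if w.lower() == query:
            lo, hi = max(0, i - context), i + context + 1
            snippet = " ".join(w2 for _, w2 in timed_words[lo:hi])
            hits.append((t, snippet))
    return hits

lecture = [(0.0, "Today"), (0.4, "we"), (0.6, "study"), (1.1, "entropy"),
           (1.8, "and"), (2.0, "why"), (2.3, "entropy"), (2.9, "grows")]
print(find_hits(lecture, "entropy"))
```

Each returned timestamp is exactly what a player would seek to, which is the essence of jumping to the relevant section from a search result.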



There are some impressive features built into this rather advanced application.

  • Optimized Speech Transcription:
    • The speech recognizer has been trained and tuned to accurately transcribe accented speech, using short snippets of recorded speech spoken in a variety of accents.

  • Accurate recognition of uncommon words
    • A massive vocabulary has been trained into the system's lexicon, allowing it to recognize even extremely uncommon scientific terms.


  • Topic segmentation:
    • The system includes software, designed at MIT, that groups long strings of sentences sharing a common topic into high-level concepts.
    • "Topical transitions are very subtle," says Regina Barzilay, professor of Computer Science at MIT. "Lectures aren't like normal text."
       The software takes roughly 100-word blocks of text and compares them to calculate the number of overlapping words shared between blocks. Heavily repeated key terms are given more weight, and blocks with the highest rates of shared words are grouped together.
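That block-comparison idea can be sketched in the style of TextTiling-like topic segmentation: split the text into fixed-size word blocks, score adjacent blocks by weighted word overlap, and merge blocks that clear a threshold. The block size, weighting, and threshold below are guesses for illustration, not MIT's actual parameters.

```python
from collections import Counter

def segment(text, block_size=100, threshold=5.0):
    """Group fixed-size word blocks into topical segments by the
    weighted overlap between adjacent blocks."""
    words = text.lower().split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    segments, current = [], blocks[:1]
    for prev, nxt in zip(blocks, blocks[1:]):
        a, b = Counter(prev), Counter(nxt)
        # Repeated key terms carry more weight: each shared word is
        # scored by how often it occurs in both blocks.
        overlap = sum(min(a[w], b[w]) for w in a.keys() & b.keys())
        if overlap >= threshold:
            current.append(nxt)          # same topic: keep growing the segment
        else:
            segments.append(current)     # topic shift: start a new segment
            current = [nxt]
    segments.append(current)
    return segments

# Tiny demo: 35 gardening words followed by 35 networking words.
text = ("the garden soil needs compost and water " * 5 +
        "the network protocol sends packets over sockets " * 5)
parts = segment(text, block_size=8, threshold=5.0)
print(len(parts))
```

With these toy parameters the demo splits cleanly at the vocabulary shift, yielding one gardening segment and one networking segment.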

MIT's efforts to optimize the user experience are ongoing. In the future, users will have the ability to contribute transcript corrections, much like the "Wikipedia process," further improving transcription accuracy.

  Even more impressive: MIT's plans include the ability for the system to learn from these corrections as they propagate to other transcribed lectures.

A more comprehensive overview can also be read here.
 


Saturday, December 08, 2007

 
 In a recent interview, Everyzing CEO Tom Wilde discusses a valuable step forward in letting Web users find the videos they are interested in: using speech recognition to parse and publish the spoken audio streams of videos posted to the Web.

 This powerful, useful implementation opens up "discoverability of multimedia within a Web search," says Wilde, and he's right. No longer does a searcher find results based on just the metadata of a video, but on the actual audio content of that video as well.

 Everyzing's two-part speech recognition technology first turns the audio of videos into text; it then analyzes, extracts, and indexes key terms, entities, and concepts within that text. This enables new multimedia category indexes containing search-related terms that may appear only briefly, or just in passing, inside any video's spoken audio!
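As a rough illustration of that two-stage flow: stage 1 (speech-to-text) is represented here by plain transcript strings, since a real recognizer is out of scope; stage 2 extracts salient terms and builds an inverted index. None of this is Everyzing's or BBN's actual API, just a toy sketch of the idea.

```python
import re
from collections import Counter

def extract_terms(text, top_n=5):
    """Stage 2a: pull the most salient terms out of a transcript
    (a naive frequency count standing in for real term extraction)."""
    stopwords = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "then"}
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return [w for w, _ in counts.most_common(top_n)]

def index_video(index, video_id, transcript):
    """Stage 2b: map each extracted term to the videos mentioning it,
    so a term spoken only once is still discoverable in search."""
    for term in extract_terms(transcript):
        index.setdefault(term, set()).add(video_id)

index = {}
index_video(index, "v1", "A review of the new hybrid engine and hybrid batteries.")
index_video(index, "v2", "Cooking pasta: boil the pasta, then season.")
print(sorted(index.get("hybrid", set())))
```

A query for "hybrid" now surfaces the first video even though that word never appears in any title or metadata, which is exactly the discoverability gain Wilde describes.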

 The core of the recognition, analysis, and indexing process comes from BBN Technologies: Everyzing combines BBN's Byblos speech recognition engine with BBN's information-extraction technologies for both speech and text. BBN is a leader in speaker-independent recognition accuracy across different environments, including telephony and broadcast news.

 CEO Tom Wilde has been frank about the present shortcomings; he notes that accuracy drops (understandably) against background music and/or multiple speakers. But for the infotainment and news markets he's targeting right now, the technology should offer a significant improvement over what's currently available, he says. "I think we'll look back in a couple of years and say, 'Of course the content of multimedia files needs to be searchable,'" says Wilde. We agree!

 Technology Review's comprehensive look at the core technology can also be found here.
 

