Wireless Speech Recognition ..

Speech recognition is now primarily wireless; we've migrated fast to universal wireless access-communication devices.

Often, the speech recognition is remote-based, and the better the signal we send it, the better it performs.

Here, we hope you'll find ideas, technology or projects using hands free and/or mobile devices to make wireless speech recognition a rewarding and useful universal tool!

Wednesday, July 30, 2008

Robotic tongues improving speech recognition..

The article we found today can be read here.

But, the video below kind of "tells the story"!

Pretty cool... Eh?

Labels: animatronic tongue, Anton, robots, speech recognition

Wednesday, June 25, 2008

A speech-driven PIM on steroids, from Speereo!

Speereo's PR Manager, who left a kind comment on our last post about their super-cool Voice Translator, has alerted us to another product, titled "Sapie", which has received rather glowing reviews from third parties.

We haven't tested it yet, but here is a pretty comprehensive overview/review of Sapie that is well worth reading.

In the meantime.. Take a look at this performance comparison chart!

We hope to be speaking with Speereo's PR Manager, Gleb Klimshin, fairly soon, and he's promised to discuss some wireless speech programs they've developed. If they are up to par with Speereo's other products...

Labels: Mobile PIM, performance, Sapie, speech recognition, Speereo

Saturday, June 07, 2008

Speech recognition's "mainstream", in unusual fields

Cited primarily from an article at http://computer.getmash.net/:

Speech recognition has long languished in the no-man’s land between sci-fi fantasy (“Computer, engage warp drive!”) and everyday usage reality.

But that’s changing fast, as advances in computing power and artificial intelligence, powerful APIs and newly available WSR Macros make speech recognition the next powerful step for everyday use by "non-geek" users, for user-interface design and now for electronic voice-based security.

As to voice-based security: A whole host of highly advanced speech technologies, including emotion and lie detection, are moving from the lab to the marketplace.

“This is not a new technology,” says Daniel Hong, an analyst at Datamonitor who specializes in speech technology. “But it took a long time for Moore’s Law to make it viable.”

Mr. Hong estimates that the speech technology market is worth more than $2 billion, with plenty of growth in embedded and network apps.

And it’s about time. Speech recognition technology has been around since the 1950s, but only recently have computer processors and accompanying artificial intelligence become powerful enough to handle the complex algorithms required to recognize our speech, both locally & remotely, improve our lives & productivity, and open our eyes to the long tail of speech recognition fields.

A few examples:
There are already several capable voice-controlled technologies on the market. You can issue spoken commands to devices like Motorola’s Mobile TV DH01n, a mobile TV with navigation capabilities, and a host of telematics GPS devices. Microsoft recently announced a deal to slip voice-activation software into cars manufactured by Hyundai and Kia, and its TellMe division is investigating voice-recognition applications for the iPhone. And Indesit, Europe’s second-largest home appliances manufacturer, just introduced the world’s first voice-controlled oven.

Yet as promising as this year’s crop of speech-controlled devices is, it’s just the beginning.

Speech technology comes in several flavors, including the speech recognition that drives voice-activated mobile devices; network systems that power IVRs using automated speech recognition; the unequalled desktop Vista Speech Recognition, now with available macros (which we use to post & write articles); and the long-standing standard in the Healthcare industry, the highly impressive network-based Philips SpeechMagic systems.

Voice biometrics (the true technical description of the often mis-used phrase "voice recognition") is a particularly hot area. Every individual has a unique voice print that is determined by the physical characteristics of his or her vocal tract. By analyzing speech samples for telltale acoustic features, voice biometrics can verify a speaker’s identity either in person or over the phone, without the specialized hardware required for fingerprint or retinal scanning.

The technology can also have unanticipated consequences. When the Australian social services agency Centrelink began using voice biometrics to authenticate users of its automated phone system, the software started to identify welfare fraudsters who were claiming multiple benefits — something a simple password system could never do.

The Federal Financial Institutions Examination Council has issued guidance requiring stronger security than simple ID and password combinations, which is expected to drive widespread adoption of voice verification by U.S. financial institutions in coming years. Ameritrade, Volkswagen and European banking giant ABN AMRO all employ voice-authentication systems already.

Advanced voice-recognition systems that can tell if a speaker is agitated, anxious or lying are also in the pipeline.

Computer scientists (e.g. at Carnegie Mellon) have already developed software that can identify emotional states and even truthfulness by analyzing acoustic features like pitch and intensity, and lexical ones like the use of contractions and particular parts of speech. And they are honing their algorithms using the massive amounts of real-world speech data collected by call centers and free 411 speech-driven services such as the extremely popular Goog411.

A reliable, speech-based lie detector would be a boon to law enforcement and the military. But broader emotion detection could be useful as well. Our host company which developed the now-standard Law Enforcement "Mobile Prosecutor" is presently experimenting with embedding it with voice-stress analysis.

In another example, a virtual call center agent that could sense a customer’s mounting frustration and route her to a live agent would save time, money and customer loyalty.

“It’s not quite ready, but it’s coming pretty soon,” says James Larson, an independent speech application consultant who co-chairs the W3C Voice Browser Working Group.

Companies like Autonomy eTalk claim to have functioning anger and frustration detection systems already, but experts are skeptical. According to Julia Hirschberg, a computer scientist at Columbia University, “The systems in place are typically not ones that have been scientifically tested.”

According to Hirschberg, lab-grade systems are currently able to detect anger with accuracy rates in “the mid-70s to the low 80s.”

They are even better at detecting uncertainty, which could be helpful in automated training contexts. (Imagine a computer-based tutorial that was sufficiently savvy to drill you in areas you seemed unsure of.)

Lie detection via voice stress & syntax-pattern deviation analysis is a tougher nut to crack, but progress is being made.

In a study funded by the National Science Foundation and the Department of Homeland Security, Hirschberg and several colleagues used software tools developed by SRI to scan statements that were known to be either true or false. Scanning for 250 different acoustic and lexical cues, “We were getting accuracy maybe around the mid- to upper-60s,” she says.

That may not sound so hot, but it’s a lot better than the commercial speech-based lie detection systems currently on the market. According to independent researchers, such “voice stress analysis” systems are no more reliable than a coin-toss.

It may be awhile before industrial-strength emotion and lie detection come to a call center near you. But make no mistake: They are just around the proverbial corner. And they will be preceded by a mounting tide of gadgets that you can talk to, argue with and intelligently discuss topics with.

Don’t be surprised if, some day soon, your Bluetooth headset tells you to calm down. Or informs you that your last caller was lying through his teeth.

Now that Windows Speech Recognition Macros for Windows Vista™ are in feverish development, both in-house (Microsoft Speech Components Group [ listen_+at+_microsoft.com ]) and in the beta group inside the Microsoft Speech Yahoo Technical Group, desktop speech recognition is advancing daily by leaps and bounds.

Powerful WSR macros are already evolving and being used & improved daily. They can, for example:

Open e-mail messages from a specific (non-Inbox) account with the TO: / CC: / BCC: and Subject: fields already completed;

Move large blocks of existing text in and out of specific locations inside different applications;

Navigate & move items in and out of various folders inside Vista Explorer;

Perform spoken database lookups.

It will not be long before speech recognition becomes "what we just use" for most of our daily work & living activities.

(A detailed post on the powerful new WSR Macro Tool & evolving macros is coming soon; we're gathering data, useful macros and research to be sure it is both interesting & useful to all types of speech recognition users.)

Labels: artificial intelligence, automated speech recognition, Columbia University, Daniel Hong, Datamonitor, Macros, remote speech recognition, speech recognition, Voice biometrics, Windows Vista

Wednesday, June 04, 2008

Speech-Recognition, translation style!

Speereo, a leading developer of voice recognition software, offers a new "Free UEFA Euro 2008 Guide", complete with its popular application, Speereo Voice Translator.

· Click to visit the translation software's website ·

Speereo Voice Translator is a perfect solution for travelers and understands every spoken word. Once the user pronounces a phrase in his native language, Speereo Voice Translator immediately reads back the same phrase in one of the 14 languages included in the application. It is designed for devices running Windows Mobile (Pocket PC and Smartphones) or Symbian OS.

Via the Speereo web page:
"Speereo Voice Translator is available in two versions: multilanguage and two language. Multilanguage version of Speereo Voice Translator for business communication and traveling, running on smartphones and Pocket PCs, is an innovative phrasebook that provides translation among 14 languages: English, Spanish, German, French, Italian, Russian, Chinese Simplified, Chinese Traditional, Turkish, Portuguese, Korean, Japanese, formal Arabic, Finnish"

It seems speech recognition becomes more ubiquitous.. every week!

Labels: automated speech recognition, speech recognition, Speereo, translation

Monday, May 05, 2008

Speech recognition grows by 100% in the military

According to a Businesswire.com article today,
Nuance's Dragon NaturallySpeaking® Medical software has seen one hundred percent growth over the last year.

Via the businesswire.com article:
"Dragon Medical has extended its value into the Federal government through its seamless integration with AHLTA, the military’s electronic health record, to enable clinical documentation through accurate voice-to-text speech recognition. According to a survey performed across 17 distinct Army, Navy, Air Force and Marine Corps medical treatment facilities, 79.9 percent of those surveyed chose Dragon Medical as their preferred method for documenting care within AHLTA."

Craig Rohan, Staff Pediatrician at the Peterson Air Force base in Colorado is quoted as saying:
“Speech recognition technology allows for more thorough and expedited documentation of our patient encounters. Since using Dragon Medical I find that medical statements are more easily captured during routine clinic workflow. The comprehensive medical vocabulary Dragon Medical works off of ensures that the symptoms or diagnosis we say are correctly documented”
and
“More advanced users of Dragon Medical can take advantage of features such as 'macros', 'shortcuts' and other tools that further streamline documentation by producing a partially, or in some cases fully populated medical record when prompted by the command ‘normal study,’ for example. We’ve seen significant productivity improvements across those who have embraced speech recognition to document care.”

President Bush has set a goal for Americans to have electronic health records by 2014. The DOD began AHLTA, an always-available electronic health record for all active military, retirees and their families, and expects to have it fully implemented by 2011.

Labels: AHLTA, Dragon Medical Software, Healthcare industry, speech recognition, US Military

Monday, April 21, 2008

More speech recognition enabled robots..

The original REEM-A and the new REEM-B, being shown for the first time in Abu Dhabi in April 2008, offer speech recognition as well as other advanced robotic features! (Speech recognition is featured in the last segments below.)

More from the manufacturer can be read here

Labels: PAL Robotics, REEM-A, REEM-B, speech recognition

Sunday, April 13, 2008

Pioneer's LINC releases, with "intent" recognition

Pioneer's Mobile Entertainment Division (Long Beach, Ca.) is releasing the promised AVIC-F500BT LINC (Lifestyle Innovation Network Console), a portable navigation and speech recognition unit.

· Click to see CNet's article on the LINC at CES ·

The LINC's main function: an in-dash GPS device with 1.2 million points of interest and an SD slot, using MSN for traffic, weather and gas price updates.

What's cool is that the device incorporates Pioneer's "VoiceBox Conversational Voice Search Platform," a nicely developed speech recognition system that enables voice control of an iPod or other MP3 player and of your Bluetooth-connected phone.

VoiceBox's innovation is its extraction algorithm, which allows what Pioneer terms "conversational commands" and "intent recognition", together with very advanced noise-canceling that deals quite well with ambient vehicle noise and the presence of extra voices.

The conversational element is its ability to deduce various forms of basic commands. "I want to hear the artist Herbie Hancock" or "Play Herbie Hancock" will produce the same result.
(** The LINC offers iPod-specific playback recognitions such as album name, playlist name or music genre.)

Pioneer's "conversational recognition" spreads its wings with the ability to extract a relevant phrase from inside a long utterance that contains irrelevant words: "Uhh, play, hmm, let's see, that album Abbey Road". The unit has the capacity to ignore extra words it decides are superfluous.
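To make the idea concrete, here is a toy Python sketch of filler-word stripping. It is only our illustration, not VoiceBox's actual algorithm: the filler list, the trigger words and the function name `parse_play_command` are all assumptions we made up for the example.

```python
import re

# Hypothetical filler list, invented for illustration only.
FILLERS = {"uhh", "uh", "um", "hmm", "let's", "see", "that", "the", "album", "artist"}
# Hypothetical trigger words that all map to the same "play" command.
TRIGGERS = ("play", "hear")


def parse_play_command(utterance):
    """Return ("play", target) for a playback request, or None if no trigger is found."""
    words = re.findall(r"[a-z']+", utterance.lower())
    kept = [w for w in words if w not in FILLERS]
    for trigger in TRIGGERS:
        if trigger in kept:
            # Everything after the trigger word is treated as the thing to play.
            target = kept[kept.index(trigger) + 1:]
            return ("play", " ".join(target))
    return None


print(parse_play_command("Uhh, play, hmm, let's see, that album Abbey Road"))
print(parse_play_command("I want to hear the artist Herbie Hancock"))
print(parse_play_command("Play Herbie Hancock"))
```

The last two calls give the same result, which is the "conversational" part. A real system would obviously need something far smarter than a fixed word list (a song titled "See" would break this one).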

Pioneer's "Intent Recognition" is an artificial intelligence that responds at a higher level to enhance the user's interactive abilities. With typical command & control, pre-defined, specific commands like "Call Phil Donahue at home" or "Call George Bush on the mobile phone" are a prerequisite. Pioneer's AI prompts for additional information if it appears necessary for a positive recognition; e.g. "I have 2 numbers for James Caan - Home and Mobile. Which one would you like to call?"
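The prompting behaviour can be sketched in a few lines of Python. Again, this is our own rough illustration under stated assumptions, not Pioneer's implementation: the contact book, the phone numbers and the `handle_call` function are hypothetical.

```python
# Hypothetical contact book; names and numbers are made up for the example.
CONTACTS = {
    "james caan": {"home": "555-0100", "mobile": "555-0101"},
    "phil donahue": {"home": "555-0102"},
}


def handle_call(name, label=None):
    """Decide whether to dial straight away or ask the user a follow-up question."""
    numbers = CONTACTS.get(name.lower())
    if numbers is None:
        return ("unknown", f"I have no numbers for {name}.")
    if label is not None and label in numbers:
        # The user already said which number, as in classic command & control.
        return ("dial", numbers[label])
    if len(numbers) == 1:
        # Only one candidate, so there is nothing to ask.
        return ("dial", next(iter(numbers.values())))
    options = " and ".join(kind.capitalize() for kind in numbers)
    question = (f"I have {len(numbers)} numbers for {name} - {options}. "
                "Which one would you like to call?")
    return ("ask", question)


print(handle_call("James Caan"))
print(handle_call("James Caan", "mobile"))
print(handle_call("Phil Donahue"))
```

The point of the design is that an under-specified command is not rejected; the system only asks when the missing detail actually matters.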

An excellent cnet.com video of the LINC from CES 2008 can be seen here.

Labels: artificial intelligence, command and control, conversational recognition, intent recognition, navigation, Pioneer, speech recognition, VoiceBox Conversational Voice Search Platform

Thursday, April 10, 2008

Speech recognition moves into Flight Simulators

As of March 7th, the flight simulation game add-in First Officer has been released. First Officer is 100% command & control speech recognition, complete with a training interface and spoken confirmation of commands!

· Click to visit First Officer's website ·

Via their website:
"Gone are the days of having to read 45 minutes of text to a computer; speech
recognition technology has come a long way in the last 5 years.
Lengthy speech recognition training is no longer required and individual
commands can be trained when needed."

Labels: command and control, First Officer, Flight Simulation, speech recognition

Tuesday, April 08, 2008

Satoru Iehira - A study of Vista's Speech recognition

Satoru Iehira works at the Japan Center for Persons with Disabilities, and he doesn't use his keyboard. At age 15, Satoru Iehira received severe cervical spinal cord injuries.

· Click here to view the original article at Microsoft.com! ·

"The operating system is easier to use. Speech recognition makes it so much more efficient than just using the keyboard," says Satoru, in reference to the Windows Vista™ Speech Recognition system. Using his wireless headset, Satoru performs all his work tasks every day in Vista without any keyboarding at all.

Via the Microsoft.com original article:
"Satoru especially likes the new mouse grid feature of Windows Speech Recognition in Windows Vista. With mouse grid, the computer screen is divided into a grid of nine, with each area numbered sequentially. The user selects an area by voicing the number, which then moves the cursor. The selected area is then further divided into a grid of nine, and the selection process is repeated in order to pinpoint the desired icon or button."

The complete article at Microsoft.com can be read here.
It's very cool to see real-world examples of Vista's speech recognition
working so well for the disabled...
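
For the curious, the mouse grid described in the quote above boils down to repeatedly splitting a rectangle into nine. Here is a minimal Python sketch of that idea. It assumes the cells are numbered 1 to 9 left to right, top to bottom (the article only says "numbered sequentially"), and the function name `mouse_grid_target` is ours, not anything from Windows Speech Recognition.

```python
def mouse_grid_target(width, height, spoken_numbers):
    """Return the (x, y) centre of the cell reached after a series of spoken digits."""
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for n in spoken_numbers:
        if not 1 <= n <= 9:
            raise ValueError("mouse grid cells are numbered 1 to 9")
        # Assumed row-major numbering: 1-3 top row, 4-6 middle row, 7-9 bottom row.
        row, col = divmod(n - 1, 3)
        # Each spoken number shrinks the active area to one ninth of its size.
        w /= 3
        h /= 3
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)


# Saying "one", then "nine" on a 900 x 900 screen.
print(mouse_grid_target(900, 900, [1, 9]))
```

Each number spoken cuts the target area by a factor of nine, so three or four numbers are enough to land on a single icon on a typical screen.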

Labels: disability, mouse grid, Satoru Iehira, speech recognition, Windows Vista

Thursday, April 03, 2008

Rob's Rhapsody Alert - 2 New Programming Videos

Rob Chambers (Speech Program Manager for Microsoft) has posted a notice in his Rob's Rhapsody blog about two new videos on MSDN.
They are:

"How Do I: Get Started with Speech Recognition?"
Will DePalo, Microsoft MVP, "explains the runtime and development requirements for doing speech development natively with Visual C++.
In addition he discusses creating grammar files and how to compile them."

"How Do I: Use Speech Recognition in an Application?"
"Will explains how to use speech recognition in the applications that you build with Visual C++ so that your users can control them with their voice.
In addition, he explains how to set up the development environment and how to use the Platform SDK's grammar compiler."

The MSDN pages offer video and audio downloads in all formats as well as embedded Silverlight videos.

Labels: C++, MSDN, Rob Chambers, Rob's Rhapsody, speech recognition, Vista, Will DePalo

Tuesday, April 01, 2008

Navigating with speech, a la AT&T

Via SlashPhone.com:
"AT&T announced today at CTIA Wireless 2008, the immediate availability of its company-branded GPS-enabled navigation service, AT&T Navigator. The AT&T Navigator service features audible and visual turn-by-turn driving directions, including full-color moving maps, using GPS directly from your wireless phone. Working with TeleNav, the service is available on GPS-capable PDAs and handsets."

"The service also provides integrated speech recognition for address entry and points of interest search. You simply press a single button on your phone and speak the name of the business or address you want to find, and AT&T Navigator will provide voice and on-screen turn-by-turn directions to your destination. Integrated speech recognition is currently available on BlackBerry devices, but AT&T plans to make this feature, along with other value-added services, available on more handsets in 2008."

Cool. Very cool.

Labels: ATT, Blackberry, Navigator, speech recognition

Tuesday, March 25, 2008

Speak, Speak.. Shoot and Kill!

Good news for both speech recognition and virtual gun enthusiasts: “Tom Clancy’s Rainbow Six Vegas 2” will be released this week. The game, according to a Gamepro.com review, casts players as a “highly trained anti-terrorist specialist who roams around with his squadmates, picking off evildoers with an impressive array of firepower.”

Fonix provides the speech technology that fuels the game’s voice command system — one of a handful of companies involved in the gaming vertical market.

Datamonitor analyst Daniel Hong says speech recognition will continue its upswing in gaming, because a) headsets are standard features for the Xbox 360 and Playstation 3, and b) recent CPUs are far more powerful and can handle CPU-hungry speech interfaces.
"User interface [UI] design is paramount in gaming," Hong says. "Speech recognition provides differentiation and opens up UI choices for the gamer."

Labels: Fonix, gaming, headsets, Rainbow Six Vegas 2, speech recognition

Thursday, March 13, 2008

Bill Gates & speech recognition, once again!

(Per the BostonHerald.com) In a speech to the Northern Virginia Technology Council, Gates speculated that some of the most important advances will come in the ways people interact with computers: speech-recognition technology, tablets that will recognize handwriting and touch-screen surfaces that will integrate a wide variety of information.

"I don’t see anything that will stop the rapid advance," Gates said, noting that technological change driven by academia and corporate researchers continued even after the Internet stock bubble burst in 2000.

Neither do we, and Bill has shown the global "Doubting Thomas" community just how effective speech recognition can be with Windows Vista™.
And if there are to be near-future advances, just imagine!

Labels: Bill Gates, rapid advances, speech recognition

Recognizing -unspoken- speech!

Imagine thinking something, and having it turn into continuous speech.. without saying it.

It's not science fiction, it's not "vaporware".. Ambient Corporation of Champaign, Illinois recently unveiled an incredible breakthrough - a device that recognizes and responds to words you don't speak.

· Click here to view the Ambient technology page ·

The technology uses a neckband that picks up nerve signals directed to the user's vocal cords.
In the video below, Michael Callahan, co-founder of Ambient Corporation, uses the device, called the Audeo, to demonstrate the world's first "voiceless" phone call:

Even more incredible:
"I can still talk verbally at the same time," Callahan says. "We can differentiate between when you want to talk silently, and when you want to talk out loud." That could be useful in certain situations, he says, for example when making a private call while out in public.

The system demonstrated above can only recognize about 150 words and phrases. At the end of the year, Ambient plans to release an improved version able to recognize the individual phonemes that make up complete words and phrases, producing true continuous speech recognition.

Callahan also notes that the device doesn't carry any risk of divulging inner thoughts; he says producing signals for the Audeo to decipher requires "a level above thinking": users must think specifically about uttering specific speech for the Audeo to discern it.

Pardon the pun, but..
What will be thought up next?

Labels: Ambient, Audeo thought recognition, neckband, nerve signals, speech recognition

Tuesday, March 11, 2008

Speech recognition for iPods?

Patrick Lor has posted a small blurb in his Entrepreneur In Action blog about a speech recognition device "that will automatically categorize your music library", to be made by Ivoxx Corporation.

This sounds extremely cool, and we've emailed Ivoxx to learn more..

Labels: iPod, Ivoxx, sorting music, speech recognition

Wednesday, March 05, 2008

"Speech Recognition" improves for deaf kids

The March 2008 issue of Otolaryngology – Head and Neck Surgery demonstrates that deaf children with Cochlear implants experience a 59% variance in later reading skills; the data was collected by assessing early speech perception/recognition and production performance!

The speech perception and production skills at the vowel, consonant, phoneme, and word level of 72 children with prelingual, profound hearing loss were assessed after 48 mos of Cochlear implant use.

The children’s reading skills were subsequently assessed using word and passage comprehension measures after an average of 89.5 mos of CI use. The results indicated that the early speech perception and production skills of children with profound hearing loss who received advanced Cochlear implants predict their future reading achievement skills...

The original paper's abstract is available here.

Labels: Cochlear implants, deaf children, reading skills, speech perception, speech recognition

Tuesday, March 04, 2008

A speech recognition tablet PC, from Motion PC

Motion Computing, maker of the impressive C5 medical tablet computer, has released a new rugged mobile tablet PC, the F5.

It includes very good speech recognition; you can also catch a quick chuckle at Engadget, where they compare the handle-held device to that other bit of hardware brilliance, the infamous Speak N’ Spell.

Labels: Engadget, Motion PC, speech recognition, Tablet

Monday, March 03, 2008

Speech-driven "Anywhere-anytime" data entry..

A research team with the Canadian National Research Council's Institute for Information Technology (NRC-IIT) is working on a way to enter data using speech; they've created a multimodal field data entry (MFDE) application to help with data collection during concrete inspections. The researchers say that if workers are using instruments or taking measurements, they can use speech to enter data or information at the same time.

· Click here for original NRC IIT article ·

While background noise was somewhat of a problem, the researchers have gone back to the drawing board to identify a better microphone for the application. Even with the noise issue, respondents claimed they were able to complete tasks faster while at the same time being more aware of their environment.

They've worked on actual mobile microphone improvements; their research on microphone performance in noisy environments won "Best Paper" at the prestigious British Human-Computer Interaction conference, HCI 2007.

They are also seeking the best speech engine for noisy environments:
Per NRC IIT:
"We also plan to look at different speech recognition engines to see if we can improve the accuracy at higher noise levels."

Labels: Construction Industry, Data Entry, Mobile Phones, speech recognition, wireless remote speech recognition

Monday, January 21, 2008

Speech Recognition 2007, By Stephen Potter

Stephen Potter has written a "What a Year for Microsoft Speech Recognition" blog post that's worth every word.

· An Excellent Paper, on 'Speech Server Tuning', by Stephen Potter - 2004 ·

· Steve Ballmer and Mike McCue of Tellme Networks ·

Congratulations on a great article and the year's resounding success!

Labels: Microsoft, progressive speech recognition, speech recognition

Wednesday, December 12, 2007

MIT's Browsing through speech inside videos

MIT's new CSAIL (Computer Science and Artificial Intelligence Laboratory) "Lecture Browser" may be raising the bar on searching the spoken audio in videos, for indexing. In fact, it's receiving over 20,000 hits per day - and it is to date only indexing lectures.

Originally funded by Microsoft and first announced in August, the Lecture Browser offers results in either video or audio timeline sections: the section containing the search term is highlighted, and snippets of surrounding text are displayed. The searcher can also "jump" to the relevant section of the video directly from the index.
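As a rough picture of how a timestamped transcript supports this kind of search, here is a small Python sketch. It is our own simplification, not the Lecture Browser's code: the `(start_seconds, text)` segment format and the `search_transcript` name are assumptions.

```python
def search_transcript(segments, term, context=3):
    """Find segments containing `term`; return (start_seconds, snippet) pairs.

    `segments` is assumed to be a list of (start_seconds, text) tuples.
    """
    needle = term.lower()
    hits = []
    for start, text in segments:
        words = text.split()
        for i, word in enumerate(words):
            if word.lower().strip(".,;:!?") == needle:
                # Keep a few words either side of the match as the snippet.
                lo = max(0, i - context)
                snippet = " ".join(words[lo:i + context + 1])
                hits.append((start, snippet))
                break  # one hit per segment is enough to offer a jump point
    return hits


lecture = [
    (0.0, "Welcome to the lecture on linear algebra"),
    (42.5, "An eigenvalue tells us how a matrix scales a vector"),
    (90.0, "Next week we cover probability"),
]
print(search_transcript(lecture, "eigenvalue", context=2))
```

The start time in each hit is what a player would seek to when the user clicks "jump".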

There are some impressive features built into this rather advanced application.

Optimized speech transcription:
- The speech recognition has been trained and configured to accurately transcribe accented speech, using short snippets of recorded speech spoken in various accents.
Accurate recognition of uncommon words:
- A massive vocabulary has been trained into the system's lexicon, allowing it to recognize extremely uncommon scientific terms and the like.

The system includes software designed by MIT that segregates long strings of sentences with common topics into high-level concepts.
- "Topical transitions are very subtle," says Regina Barzilay, professor of Computer Science at MIT. "Lectures aren't like normal text."
  The software takes (approx) 100-word blocks of text and compares them to calculate the number of overlapping words shared between the text blocks. High repetitions of key terms are given more weight, and chunks with the highest rate of similar words are grouped together.
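
The block-comparison idea above can be sketched in Python as follows. This is only our approximation under stated assumptions, not MIT's software: we use cosine similarity over word counts as the overlap measure (so repeated terms count for more), and the `threshold` value and function names are invented for the example.

```python
import math
import re
from collections import Counter


def split_blocks(text, size=100):
    """Cut the text into consecutive blocks of roughly `size` words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def similarity(block_a, block_b):
    """Cosine similarity of word counts; repeated shared terms weigh more."""
    ca, cb = Counter(block_a), Counter(block_b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def group_blocks(text, size=100, threshold=0.3):
    """Group neighbouring blocks whose word overlap is at least `threshold`."""
    blocks = split_blocks(text, size)
    groups = [[blocks[0]]] if blocks else []
    for prev, cur in zip(blocks, blocks[1:]):
        if similarity(prev, cur) >= threshold:
            groups[-1].append(cur)   # same topic continues
        else:
            groups.append([cur])     # likely topic change
    return groups


sample = "speech speech audio model " * 2 + "cake flour sugar oven " * 2
print(len(group_blocks(sample, size=4)))
```

A real system would at least drop common words like "the" and "of" before comparing, otherwise every pair of blocks looks similar.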

MIT's efforts to optimize the user experience are on-going. In the future, users will have the ability to contribute transcript corrections much like the "Wikipedia process", further improving transcription accuracy.

Even more impressive: MIT's plans include the ability for the system to learn from these corrections, as they propagate to other transcribed lectures.

A more comprehensive overview can also be read here.

Labels: accented speech, browsing, MIT, speech recognition, transcription, transcription learning, Videos