Wireless Speech Recognition ..

Speech recognition is now primarily wireless; we've migrated fast to universal wireless access and communication devices.

Often, the speech recognition is remote-based, and the better the signal we send it, the better it performs.

Here, we hope you'll find ideas, technology or projects using hands-free and/or mobile devices to make wireless speech recognition a rewarding and useful universal tool!

Wednesday, April 30, 2008

A comment about Fujitsu's DSR from a reader

It's no secret we are proponents of Distributed Speech Recognition (DSR), and one of our readers posted an encouraging comment about the NTT DoCoMo F884i.

· Click to read about DoCoMo's technologies ·

The comment from Mike:
"I spoke to President (and soon chairman) Suzuki of Advanced Media last week about DSR and the handset by Fujitsu uses the Fujitsu/AM DSR and not the ETSI Standard that has been implemented on other DoCoMo 905i series phones and all future FOMA phones in Japan. Very nifty indeed!"

It's great news to hear that DSR is gaining well-deserved footholds!

Labels: Advanced Media, Distributed Speech Recognition, DoComo, F884i, Fujitsu

Garmin's nuvi 880 gets great recognition!

The Garmin nuvi 880, the first navigation unit with full bragging rights to great speech recognition, has garnered some glowing reviews from various websites.

· Click to read Garmin's original Press release ·

One particular review by PC Magazine's Craig Ellison gives some detailed results from a former Dragon speech recognition user, and all in all, our hats are off to a great product with real speech recognition!

Labels: Garmin, GPS, Interactive speech recognition, navigation

Monday, April 21, 2008

More speech recognition enabled robots..

The original REEM-A and the new REEM-B, being shown for the first time in Abu Dhabi in April 2008, offer speech recognition as well as other advanced robotic features! {Speech recognition featured in the last segments below}

More from the manufacturer can be read here

Labels: PAL Robotics, REEM-A, REEM-B, speech recognition

Sunday, April 20, 2008

Microsoft's new Response Point partners..

It's been sort of a "slow" week for speech recognition, but yesterday Microsoft announced two new SIP partners for its cool new Response Point, a speech recognition-enabled, software-based PBX.

· Click to view Response Point web pages at Microsoft's site ·

The two firms, Cbeyond and New Global Telecom (NGT), will provide SIP services for the Microsoft Response Point digital voice/IP-VoIP PBX phone systems designed for small businesses.

After completing extensive interoperability testing, NGT became the first certified, Microsoft recommended service provider offering industry-standard SIP phone services that work superbly with the Response Point systems.

Cbeyond and NGT were selected as partners because of their nationwide coverage, high quality of voice service, and customer satisfaction. Each service provider will roll out unique partner programs for VARs to boost their ability to reach and service small businesses with Response Point.

Labels: automated speech recognition, Cbeyond, IP PBX, Microsoft Response Point, New Global Telecom, VoIP PBX

Monday, April 14, 2008

EveryZing rolls out RAMP

EveryZing, a leader in the new technology of indexing the spoken audio inside video clips, today announced RAMP (Reach, Access, Monetization and Protection), which gives media companies control over how their content is discovered, distributed, presented and consumed.

· Click to read more about EveryZing's technology ·

For those who don't know about EveryZing's technology:
Via EveryZing's website:

"EveryZing’s patented speech-to-text technology wraps every piece of audio and video from your site in a rich layer of metadata, including a full text output of the spoken word track."

and

"At the heart of EveryZing’s solutions is our core speech-to-text technology – the fruit of $100 million of government-funded research by BBN Technologies (inventor of the internet’s ubiquitous “@” symbol). EveryZing’s speech-to-text technology enables multimedia clips to be robustly indexed, increasing their “discoverability” by the web search engines and boosting online advertising opportunities. EveryZing uses its technology to ensure that every piece of audio and video from each client’s web site is wrapped in a rich layer of metadata, including a full text output of the spoken word track, so it can be searched and accessed easily and precisely by consumers, just like text… and, as a result, online advertisers can now place contextually relevant messages within and along side multimedia content, just like text."

More on RAMP, via their website:

"RAMP is EveryZing’s point-and-click web multimedia management console, giving you control over your content, context and brand. RAMP makes it easy for “infotainment” web site operators to control every aspect of the audio and video content on your site – from full-text indexing the clips, to publishing them in any format you choose, to monitoring visitor traffic. RAMP ensures reach, access, monetization and protection of your online audio and video.

Based in Cambridge, Mass., EveryZing is a pioneer in next-generation universal search technology and video search engine optimization (video SEO). EveryZing was originally founded by BBN Technologies, creators of the email @ symbol. The company’s core intellectual property and capabilities include speech-to-text technology and natural language processing.

EveryZing's automated speech recognition is impressive, to say the least; it even outputs line-by-line transcriptions. We've blogged about them before, and they continue to impress us as their technology evolves.

Labels: automated speech recognition, EveryZing, indexing videos, natural language processing, RAMP, speech-to-text, spoken audio transcription

DoCoMo's new handset; built-in speech recognition

Japan's NTT DoCoMo Inc (NYSE: DCM) is releasing the FOMA "Raku-Raku Phone Premium" F884i mobile phone today, with built-in speech recognition for remote transcription of email text.

The handset, made by Fujitsu Ltd, contains new & proprietary technology to enter e-mail text using remote speech recognition. When the user presses the "voice input" button in the e-mail editing display, software that extracts the characteristics of the user's speech (and performs analog-to-digital conversion) starts up and accesses DoCoMo's i-mode site.

When users say what they want transcribed into an email, the in-box software sends the dictation to the i-mode server. There, speech recognition software manufactured by Advanced Media Inc outputs transcribed text. The F884i receives the transcribed text and displays it in the email's text display interface.
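To make the handset/server split a little more concrete, here is a toy sketch in Python. It is our own illustration, not Fujitsu's or Advanced Media's front end: the frame size, the single log-energy feature and the function names are all assumptions (real DSR front ends, such as the ETSI standard one, compute mel-cepstral features). The point is only that the handset sends a compact description of each speech frame rather than the raw audio.

```python
import math

FRAME_SIZE = 160  # assumption: 20 ms frames at an 8 kHz sampling rate


def extract_features(samples, frame_size=FRAME_SIZE):
    """Handset side: reduce raw samples to one log-energy value per frame."""
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        features.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return features


def build_payload(features):
    """Handset side: package the compact features for the recognition server."""
    return {"frame_count": len(features),
            "features": [round(f, 3) for f in features]}


# One second of a synthetic 440 Hz tone stands in for microphone input.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
payload = build_payload(extract_features(tone))
print(payload["frame_count"], "frames sent instead of", len(tone), "samples")
```

In this toy version the server would receive 50 numbers for one second of speech; the actual recognition step on the server side is not shown.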

As best we can tell without a direct response from DoCoMo or Advanced Media, this appears to be DSR (Distributed Speech Recognition), which we've blogged about in the past; we are tremendous fans of DSR as a global answer to near-perfect mobile speech recognition.

We've also emailed David Pearce, founder and Chief Developer of this emerging technology, to see if he has any information on whether this may, in fact, be DSR.
Check back later for details!

Labels: Distributed Speech Recognition, DoComo, Fujitsu, mobile speech recognition, remote speech recognition

Sunday, April 13, 2008

Pioneer's LINC releases, with "intent" recognition

Pioneer's Mobile Entertainment Division (Long Beach, Ca.) is releasing the promised AVIC-F500BT LINC (Lifestyle Innovation Network Console), a portable navigation and speech recognition unit.

· Click to see CNet's article on the LINC at CES ·

The LINC's main function is as an in-dash GPS device with 1.2 million points of interest and an SD slot; it uses MSN for traffic, weather and gas price updates.

What's cool is the device incorporates Pioneer's "VoiceBox Conversational Voice Search Platform," a nicely developed speech recognition system that enables voice control of an iPod or other MP3 player and of your Bluetooth-connected phone.

VoiceBox's innovation is its extraction algorithm, which allows what Pioneer terms "conversational commands" and "intent recognition", together with very advanced noise-canceling that deals quite well with ambient vehicle noise and the presence of extra voices.

The conversational element is its ability to deduce various forms of basic commands. "I want to hear the artist Herbie Hancock" or "Play Herbie Hancock" will produce the same result.
(** The LINC offers iPod-specific playback recognitions such as album name, playlist name or music genre.)

Pioneer's "conversational recognition" spreads its wings with the ability to extract a relevant phrase from inside a long utterance that contains irrelevant words: "Uhh, play, hmm, let's see, that album Abbey Road". The unit has the capacity to ignore extra words it decides are superfluous.
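For the curious, the general idea can be sketched in a few lines of Python. This is a deliberately crude illustration of ours, not VoiceBox's algorithm: the filler-word list, the single pattern and the function names are all hypothetical, and it works on already-transcribed text rather than audio.

```python
import re

# Hypothetical filler words and command pattern, for illustration only.
FILLERS = {"uhh", "uh", "um", "hmm", "let's", "see", "that", "please"}
PLAY_PATTERN = re.compile(
    r"^(?:i want to hear|play)(?: the)?(?: artist| album)? (?P<target>.+)$"
)


def normalize(utterance):
    """Lower-case the utterance, strip punctuation and drop filler words."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    return " ".join(w for w in words if w not in FILLERS)


def parse_command(utterance):
    """Map differently phrased requests onto one (action, target) pair."""
    match = PLAY_PATTERN.match(normalize(utterance))
    if match:
        return ("play", match.group("target"))
    return None  # not a command this sketch understands


for spoken in ("I want to hear the artist Herbie Hancock",
               "Play Herbie Hancock",
               "Uhh, play, hmm, let's see, that album Abbey Road"):
    print(parse_command(spoken))
```

The first two phrasings come out as the same command, and the third loses its "uhh" and "hmm" before matching. A fixed filler list is far too blunt for real use; it is only there to show the shape of the problem.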

Pioneer's "Intent Recognition" is an artificial intelligence that responds at a higher level to enhance the user's interactive abilities. With typical command & control, pre-defined, specific commands like "Call Phil Donahue at home" or "Call George Bush on the mobile phone" are required. Pioneer's AI prompts for additional information if it appears necessary for a positive recognition; e.g., "I have 2 numbers for James Caan - Home and Mobile. Which one would you like to call?"

An excellent cnet.com video of the LINC from CES 2008 can be seen here.

Labels: artificial intelligence, command and control, conversational recognition, intent recognition, navigation, Pioneer, speech recognition, VoiceBox Conversational Voice Search Platform

Thursday, April 10, 2008

Speech recognition moves into Flight Simulators

As of March 7th, the flight simulation game add-in First Officer has been released. First Officer is 100% command-and-control speech recognition, complete with a training interface and spoken confirmation of commands!

· Click to visit First Officer's website ·

Via their website:
"Gone are the days of having to read 45 minutes of text to a computer; speech
recognition technology has come a long way in the last 5 years.
Lengthy speech recognition training is no longer required and individual
commands can be trained when needed."

Labels: command and control, First Officer, Flight Simulation, speech recognition

Wednesday, April 09, 2008

A terrific blog about digital dictation..

We've received a comment from the bloggers at the Acappella Conference Audio Recorder Blog on our recent post about Sony's new handheld recorders.

We visited their site, and were quite impressed with both the quality and quantity of the information we found there. Their apparent sponsors, Acappella, seem to have an impressive digital recorder, that even integrates with SharePoint!

Their website offers an extremely well-written & researched whitepaper.

Their Technology page notes that the Acappella digital recorder supports:

- Integration into many popular digital dictation packages

- Unlimited number of attendees

- Laptop and Desktop PCs

- Standalone and server installations

- Individual or batch submission

- Desktop and lapel microphones

  Here's an interesting feature:
"The Acappella playback assistant floats above your chosen word processing application and can be positioned anywhere on the screen as well as minimised for convenience."

  Very, very cool stuff.
  We hope to hear more from this obviously very advanced firm.

Labels: Acappella, Digital recorder, Digital Transcription, Microsoft, SharePoint, Transcription Technology

AT&T Navigator gets ho-hum user responses..

AT&T launched its "Navigator" service at CTIA Wireless 2008 and it's off to a rocky start. The new service features interactive speech recognition, which seems to work well, but the GPS and routing configurations are already getting some complaints in popular user forums.

It's a built-in feature for these AT&T handset models:

BlackJack™ by Samsung
BlackJack™ II by Samsung (Real-Time Traffic compatible)
HP iPAQ hw6510/6515
HP iPAQ hw 6920/hw6925
Palm® Treo™ 680
Palm® Treo™ 750
HTC Tilt 8925 (Real-Time Traffic compatible)
AT&T 8125
Nokia E62
BlackBerry® 8700c (Real-Time Traffic compatible)
BlackBerry® Pearl™ (Real-Time Traffic compatible)
BlackBerry® 8800 (Real-Time Traffic compatible)
BlackBerry® Curve™ 8300 (Real-Time Traffic compatible)
BlackBerry® Curve™ 8310 (Real-Time Traffic compatible)
Motorola L6
Motorola L7
Motorola v3i
Motorola v365
Motorola v557
Motorola MOTO Q™ 9h Global (Real-Time Traffic compatible)
Nokia 6682
Pantech Duo™
Sony Ericsson W300i
Sony Ericsson w600i
Sony Ericsson W810i
Sony Ericsson Z520a
Sony Ericsson Z525a

and is available through AT&T to these devices, as an account add-on:

Palm® Treo™ 650
AT&T 2125
HTC 3125
AT&T 8525

and/or other "AT&T Navigator approved mobile devices".

Navigator also offers updated estimated arrival times and traffic information, but those services appear to have a few bugs, too.

Another valid & common complaint we're seeing:
The product is obviously TeleNav's, but TeleNav refuses to support it. Calls to AT&T turn up no dedicated support department, and AT&T Customer Service doesn't seem to be aware that there is such a product, or where to direct calls.

Labels: ATT, GPS, Interactive speech recognition, Navigator

Jott enables dictated email replies, on Blackberry

Jott (we've blogged about them before) has rolled out speech recognition to several BlackBerry models (the 8800, 8300 and 8100) that allows users to reply to emails by speech alone. It's a free beta download for now, and offers options for “Reply with Jott” and “Reply All with Jott.”

Jott also supplies users with a copy of the transcribed email, which can be pretty helpful in tweaking recognition accuracy and spotting/preventing (inevitable) transcription nuances.

The Jott web page can be viewed here.

Labels: Blackberry, mobile speech recognition, Jott, wireless remote speech recognition

Tuesday, April 08, 2008

Fluency Voice gets transcription accuracy patent

Fluency Voice Technology announced on March 7th that they have been granted an interesting patent on tweaking recognizer accuracy.

· Click to see original patent press release ·

The patent revolves around sending the same speech to the recognizer multiple times, distorted a bit differently each time. All the resulting recognitions are then compared, and an evaluation decides which result is most likely to be correct; that one is returned to the application. Fluency says they have applied this technique to all leading speech recognizers.
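Here is how we picture the basic loop, as a minimal Python sketch. To be clear about what is assumed: the press release doesn't say how the results are evaluated, so the simple majority vote below is our stand-in, and the recognizer and the distortions are toy placeholders, not Fluency's.

```python
from collections import Counter


def recognize_with_voting(audio, recognizer, distortions):
    """Run the recognizer once per distortion and return the most common result.

    Returns (best_result, share_of_votes). Majority voting is our assumption;
    the patented evaluation step is not described in the press release.
    """
    results = [recognizer(distort(audio)) for distort in distortions]
    best, votes = Counter(results).most_common(1)[0]
    return best, votes / len(results)


def fake_recognizer(samples):
    """Toy stand-in for a real recognizer: 'yes' for loud input, else 'no'."""
    level = sum(abs(s) for s in samples) / len(samples)
    return "yes" if level > 0.5 else "no"


# Hypothetical distortions: unmodified, a slight gain boost, a gain cut.
distortions = [
    lambda a: list(a),
    lambda a: [s * 1.1 for s in a],
    lambda a: [s * 0.7 for s in a],
]

audio = [0.6, -0.6, 0.6, -0.6]
result, agreement = recognize_with_voting(audio, fake_recognizer, distortions)
print(result, round(agreement, 2))
```

Two of the three distorted copies are recognized as "yes", so that result wins. The share of votes could also double as a rough confidence score for the application.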

Via the press release:
"Dr Trevor Thomas, the inventor and Chief Scientist at Fluency, stated 'This invention will deliver important improvements to recognition accuracy and will increase the performance of our spoken dialogue systems when compared to similar dialogue systems that just make conventional use of a speech recognizer'."

Interesting stuff. We're waiting on feedback from some of our trusted sources to see just how far-reaching they think this technology is, re: transcription accuracy of continuous speech!

Labels: Fluency Voice Technology, improving recognition accuracy, press release, recognizers

Sony announces enhanced Handheld Recorders

Sony-Europe’s IT Peripherals division has announced seven new handheld recorders that declare its desire to dominate the digital dictation marketplace.

They each incorporate high audio fidelity, increased recording times, memory capacity and impressive playback features at nicely competitive price points to capture student, business and professional consumers.

The 'Professional' model numbers are the ICDSX68, ICDSX78 and ICDSX78DR9; the 'Business' model number is ICDUX60B and the 'Student' model numbers are ICDB600, ICDP620, ICDP630F.

Via the Sony-Europe press release page:
"The market for digital dictation machines is still very important, particularly with voice recognition software making transcription easier," said Mikuni Shikada, product manager, Sony Europe's IT Peripherals division.

Labels: digital revolution, Handheld recorders, Sony, transcription

Satoru Iehira - A study of Vista's Speech recognition

Satoru Iehira works at the Japan Center for Persons with Disabilities, and he doesn't use his keyboard. At age 15, Satoru Iehira received severe cervical spinal cord injuries.

· Click here to view the original article at Microsoft.com! ·

"The operating system is easier to use. Speech recognition makes it so much more efficient than just using the keyboard," says Satoru, in reference to the Windows Vista™ Speech Recognition system. Using his wireless headset, Satoru performs all his work tasks every day in Vista without any keyboarding at all.

Via the Microsoft.com original article:
"Satoru especially likes the new mouse grid feature of Windows Speech Recognition in Windows Vista. With mouse grid, the computer screen is divided into a grid of nine, with each area numbered sequentially. The user selects an area by voicing the number, which then moves the cursor. The selected area is then further divided into a grid of nine, and the selection process is repeated in order to pinpoint the desired icon or button."
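The arithmetic behind the mouse grid is easy to sketch. The Python below is our own illustration, not Microsoft's code; it assumes the nine areas are numbered left to right, top to bottom, and that the cursor lands in the center of the selected area.

```python
def mouse_grid_target(width, height, choices):
    """Return the (x, y) cursor position after a sequence of spoken numbers.

    Each number from 1 to 9 picks one cell of a 3x3 grid, and the chosen cell
    is then divided into a new 3x3 grid. The numbering order (left to right,
    top to bottom) is an assumption made for this sketch.
    """
    left, top, w, h = 0.0, 0.0, float(width), float(height)
    for choice in choices:
        if not 1 <= choice <= 9:
            raise ValueError("grid numbers run from 1 to 9")
        row, col = divmod(choice - 1, 3)
        w, h = w / 3, h / 3                      # each cell is a third as wide and tall
        left, top = left + col * w, top + row * h
    return (left + w / 2, top + h / 2)           # center of the selected area


# Saying "nine", then "one", on a 900 x 900 pixel screen:
print(mouse_grid_target(900, 900, [9, 1]))
```

Each spoken number cuts the target area to one ninth of its size, so three or four numbers are usually enough to land on an icon or button.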

The complete article at Microsoft.com can be read here.
It's very cool to see real-world examples of Vista's speech recognition working so well for the disabled...

Labels: disability, mouse grid, Satoru Iehira, speech recognition, Windows Vista

Monday, April 07, 2008

SendChat - new universal speech-to-text messaging!

TMCnet reports today that SR Virtual has developed SendChat, a "state-of-the-art, voice-to-text SMS" that mobile users can download to any mobile phone, regardless of which wireless carrier they are using. SR Virtual notes that SendChat will install as easily as a new ringtone (welcome news to those of us who don't fancy tiresome fiddling with mobile phones).

TMC further reports that SendChat uses smart technology that continues to learn the user’s vernacular and diction; thereby increasing its transcription accuracy with every use.
Very Cool.

SendChat is server-based, which is why it will work with phones on any wireless network; it also spares users the tedious chore of installing new versions.

We don't have a download address, yet, but we will contact them shortly and see if we can glean further info; this promises to be a godsend to those of us who do text, but hate the "thumb typing" that goes along with it!

Labels: remote speech recognition, SendChat, speech to text messaging, SR Virtual

Friday, April 04, 2008

BlueAnt's new speech recognition gadgets - cool!

Via PCMag.com, today - At this year's CTIA Wireless conference:

"BlueAnt showed off three new Bluetooth-enabled products—two of which offer voice recognition—that the company had originally announced back in January at CES 2008. Its V1 Voice Controlled Headset features Sensory, Inc.'s BlueGenie Voice Interface, which eliminates having to memorize complicated sequences of button pushes. Instead, you talk to the device and (in theory) it tries to figure out what you want it to do. The V1 also includes dual microphones along with Voice Isolation Technology for more effective noise cancellation, wind noise reduction, and better sound quality.

The second model, the BlueAnt Supertooth 3, is an upgrade to last year's Supertooth Light hands-free device. Like the older model, it either clips to your car's sun visor or sits on your desk. It builds on the former model's text-to-speech technology, which announces incoming callers names and caller ID information. The Supertooth 3 lets you activate voice dialing, answer calls, reject calls, and redial all with just the sound of your voice, and also includes a DSP chip for noise cancellation."

Labels: automated speech recognition, BlueAnt, Bluetooth

Thursday, April 03, 2008

Comparing SpeechMagic to Dragon (sort of)

The SpeechRecognition blog, an excellent site that focuses on speech recognition in the Healthcare Industry, has posted a comparison of sorts between Philips SpeechMagic (a network solution) and Dragon NaturallySpeaking (a desktop solution).

A synopsis of the conclusions of its author, Claire Betis:

"Both products offer a similar choice of medical dictionaries, covering general medicine and a number of specialties, in a wide range of languages."

"SpeechMagic was designed as a network solution while Dragon was originally made for individual users in the consumer world... I am even tempted to conclude this SpeechMagic-Dragon comparison thread by saying both products shouldn’t be compared in the first place."

We agree with No. 2, wholeheartedly. SpeechMagic is sold through various industry-specific vendors, and the support structure is accordingly different; it is network-based and not really comparable to a desktop speech recognition program that neither offers nor claims to offer any file management or workflow features, or any server-side implementation whatsoever, for that matter.

** Additionally, two of our members have dealt with Philips directly regarding a project for Windows Vista™ speech recognition, and we have to say that Philips' responsiveness re: support and communication really put Nuance to shame, IOHO.

Labels: Claire Betis, comparison, Dragon Naturally Speaking, Speech Recognition Blog, SpeechMagic

Rob's Rhapsody Alert - 2 New Programming Videos

Rob Chambers (Speech Program Manager for Microsoft) has posted a notice in his Rob's Rhapsody blog about two new videos in MSDN.
They are:

"How Do I: Get Started with Speech Recognition?"
Will DePalo, Microsoft MVP, "explains the runtime and development requirements for doing speech development natively with Visual C++.
In addition he discusses create grammar files and how to compile them."

"How Do I: Use Speech Recognition in an Application?"
"Will explains how to use speech recognition in the applications that you build with Visual C++ so that your users can control them with their voice.
In addition, he explains how to set up the development environment and how to use the Platform SDK's grammar compiler."

The MSDN pages offer video and audio downloads in all formats as well as embedded Silverlight videos.

Labels: C++, MSDN, Rob Chambers, Rob's Rhapsody, speech recognition, Vista, Will DePalo

Wednesday, April 02, 2008

Speech recognition gets increased public confidence

Callcentres.net has released a study in Australia with some quite welcome revelations..
The high points --

Overall, customers were significantly more satisfied with their speech recognition experience in 2007 than they were in 2005.

The research also shows that speech recognition is the preferred self-service interface; 66 percent of survey respondents preferred speech across the internet, & 59 percent preferred speech recognition over touch-tone (DTMF-driven) IVRs, when using the telephone.

Dr. Catriona Wallace, a director of callcentres.net said: "Confidence has emerged as a key factor influencing satisfaction with speech recognition. The research showed that frequent users of speech technology have a statistically significantly higher level of satisfaction with the experience than new or inexperienced users.

"It's also interesting to note that while men are more willing to try speech recognition, it's women who are more likely to become real advocates of the experience once they've used it," Dr. Wallace explained.

Labels: Australia, callcentres.net, confidence, remote speech recognition, study

Speech-enabled Yahoo oneSearch is released

Today at CTIA Wireless 2008, Yahoo announced version 2.0 of its oneSearch mobile search application, now including Voice-Enabled Search.

A combination of predictive query entry and speech-recognition provided by vlingo, oneSearch is only available now for select Blackberry devices including the 8800 series, Curve, and Pearl, but Yahoo stressed that other handset support would follow shortly.

"Consumers can search for anything, including flight numbers, locations, Web site names, local restaurants, and more, by simply speaking," a release from Yahoo detailed. The voice-activation software is now available for download on a number of RIM's BlackBerry devices, and Yahoo has said that over the next few months it will be compatible with more handsets.

We're thrilled to see speech recognition emerging as a driving force for mobile search. We'd hope that Yahoo! and many others will begin to use vlingo's (the only close 2nd to Microsoft's voice searching on Windows Mobile/Smartphones) technology to expand and tweak mobile-centric searches.

Even though hurdles still exist for mobile-centric speech searching (noise canceling, poor voice signal quality) it’s begun to receive some serious integration as of late. Other outfits like Free411, Goog411 and Ask.com are using speech recognition technology; and the new ChaCha has a pretty robust speech recognition built into its new mobile-centric searching.

Labels: remote speech recognition, vlingo, Yahoo oneSearch

Microphone in your tooth? Available now!

A company called Chinavasion is now in production of, yes, you read it right - a Bluetooth microphone inside your tooth of choice.

· Click to visit the device's web page ·

Per Chinavasion's web page:
"The durable composite resin filling is designed to fit in a hole 2.2mm in diameter and 1.7 mm deep and will pick up sound and vibrations from your mouth to produce incredibly clear sound.. never forget your trusty bluetooth kit ever again, simply install and forget".

It's a unique concept, that's for sure. This mandible-to-microphone technology's been around a while, and works very, very well.
(E.G. Aliph's Jawbone headset).

Are we seeing the future of microphones .. ?

Labels: Bluetooth, Chinavasion, tooth-mounted microphone

Tuesday, April 01, 2008

Nuance challenges SpinVox in voicemail-to-text

Nuance Communications, Inc. announced "Nuance Voicemail to Text" today at CTIA Wireless 2008. Offered via wireless carriers, the service sends transcribed messages to users as SMS or email messages.

Move over, SpinVox, a big dog is headed for your porch..

“Converting voicemail to text is a powerful and simple concept. But implementing a highly scalable semi-automated service is far more complex and requires highly accurate speech recognition – technology that takes decades to develop,” said Steve Chambers, president, mobile and consumer services division, Nuance. “The Nuance Voicemail to Text Service integrates speech technology with over 3,000 Nuance transcriptionists, hosted in a Nuance-owned facility, with proven security, scalability, and reliability.”

Looks like SpinVox is about to get a real run for the money!

Labels: Nuance, SpinVox, transcription, voicemail to text

Catalog ordering, with mobile speech recognition!

France Loisirs, the French version of a mini-Amazon.com, has now selected Atos Worldline to host and operate a new automated telephone order service based on speech recognition. The service is named “Commande Flash”; it both smooths out and combines the different purchasing channels already available and reduces human-handled calls.

Thanks to speech recognition technology powered by Nuance, the caller speaks key words, a membership number and the products they wish to order.

Marc Duteil, customer relations General Manager at France Loisirs, noted, “The members quickly adopted the ‘Commande Flash’ tool which provides real comfort, especially for mobile phone users.”

Founded in the early 1970s, France Loisirs, a 100%-owned subsidiary of the Bertelsmann group (50% owner of Sony BMG Music Entertainment), now has 3.5 million members in France, one household out of five. With 24 million books sold per year, France Loisirs accounts for 8% of the French publishing market. The brand uses a multi-channel distribution network and has 208 shops in France.

Now, imagine a speech-powered ordering system for, say Amazon.com...

Labels: Bertelsmann Group, catalog ordering, France Loisirs, mobile speech recognition

Navigating with speech, a la AT&T

Via SlashPhone.com:
"AT&T announced today at CTIA Wireless 2008, the immediate availability of its company-branded GPS-enabled navigation service, AT&T Navigator. The AT&T Navigator service features audible and visual turn-by-turn driving directions, including full-color moving maps, using GPS directly from your wireless phone. Working with TeleNav, the service is available on GPS-capable PDAs and handsets."

"The service also provides integrated speech recognition for address entry and points of interest search. You simply press a single button on your phone and speak the name of the business or address you want to find, and AT&T Navigator will provide voice and on-screen turn-by-turn directions to your destination. Integrated speech recognition is currently available on BlackBerry devices, but AT&T plans to make this feature, along with other value-added services, available on more handsets in 2008."

Cool. Very cool.

Labels: ATT, Blackberry, Navigator, speech recognition