This processor represents a very significant leap forward, both for speech recognition in general and, primarily, for what we all love to hate: the infamous IVRs that answer most of our phone calls to most companies today.
IBM notes: "Speech recognition systems in telephony applications for automated call centers represent the largest segment of the speech processing market". How true, and how sad the IVR performance we often encounter really is. Who hasn't begun to pound on the "0" key and/or shout "Operator! Customer Service!" in desperation when encountering one of these infamously sad performers?
Current multi-channel speech recognition systems that use "clusters" of traditional CPUs can manage between 20 and 30 speech channels in real time.
The Cell/B.E. can handle roughly a thousand simultaneous voice channels in real time on a single processor, and IBM states "On both the Cell/B.E. processor and the software platforms, recognition accuracy was 99%".
Take a look at the tremendous difference in performance, below:
(In the Table above, 1 RTC = 1 second of audio per 1 second of processing time)
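To make the RTC unit concrete, here is a minimal sketch of the arithmetic, assuming the definition given above (seconds of audio handled per second of processing time). The function name and the sample numbers are hypothetical and are not taken from IBM's results.

```python
def real_time_channels(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many real-time channels (RTC) a system sustains.

    Assumes 1 RTC = 1 second of audio per 1 second of processing time,
    so the RTC figure is simply audio time divided by processing time.
    """
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Hypothetical illustration: 600 seconds of audio recognized in 20 seconds
# of processing time corresponds to 30 channels handled in real time.
print(real_time_channels(600.0, 20.0))
```

A value of 1.0 means the system keeps up with exactly one live channel; anything below 1.0 means it cannot keep pace with even a single caller.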
The performance measurement above is based on speaker-independent recognition of a small vocabulary drawn from the TIDIGITS corpus: the digits "zero" through "nine", plus the "oh" pronunciation for zero. It used a proprietary IBM speech recognition engine (whose recognition algorithms are explained in detail). It's obviously fairly simple recognition testing, even though it included speakers of different genders and dialects.
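For a sense of how small this test vocabulary is, the sketch below lists the eleven words described above and scores a recognized digit string against a reference. This is an illustration only: the names are hypothetical, and the scoring is a simplified position-by-position match rate, not the edit-distance word error rate that speech benchmarks normally report and not IBM's evaluation method.

```python
# The eleven-word digit vocabulary: "zero" through "nine" plus "oh".
DIGIT_WORDS = {
    "zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def words_to_digits(words):
    """Map spoken digit words to integers; "oh" and "zero" both mean 0."""
    return [DIGIT_WORDS[word.lower()] for word in words]


def match_rate(reference, hypothesis):
    """Fraction of positions where the recognized digit equals the reference.

    Simplification: requires equal-length sequences, so it ignores the
    insertions and deletions that a real word-error-rate metric counts.
    """
    ref = words_to_digits(reference)
    hyp = words_to_digits(hypothesis)
    if len(ref) != len(hyp) or not ref:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for r, h in zip(ref, hyp) if r == h)
    return matches / len(ref)


# Hypothetical utterance: the recognizer hears "zero" for "oh" (still correct)
# and "three" for "two" (an error), so two of three digits match.
print(match_rate(["oh", "one", "two"], ["zero", "one", "three"]))
```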
Nonetheless, IBM notes the "performance of our prototype speech recognition engine on the Cell/B.E. processor can be extended to production systems because the SPE kernel programs were designed to scale with model and language complexity."
They further point out that, thanks to the raw computational power and memory management of the Cell/B.E., recognition accuracy should remain much higher even when the system is put to work on large vocabularies and complex grammars.
Additionally, they plan to add in models built on the Texas Instruments/Massachusetts Institute of Technology ("TIMIT") corpus, which is designed specifically for developing and evaluating automated speech recognition systems.
Also interesting are IBM's plans to optimize the Cell/B.E. for recognition of compressed speech signals. They also mention pursuing the proverbial Holy Grail, software-based noise canceling: "...we are trying to classify speech from background"!
The research team goes on to say in their conclusion:
"We have implemented and demonstrated a prototype speech recognition engine that is capable of processing approximately 1,000 speech channels on a single Cell/B.E. processor. The kernel computations are designed to be highly scalable, and we expect this performance result to generalize well to commercial speech systems".
What a terrific development. IBM's embedded ViaVoice won Speech Technology Magazine's 2007 Market Leader award, and some of us remember when ViaVoice for continuous speech recognition (circa 1997-1998) was the best of the best. We admit we are definitely looking forward to seeing this technology emerge into the mainstream and push speech recognition to new limits!