Re: Speech recognition system for Home automation.
"Soren" <soren.skou.nielsen@xxxxxxxxx> wrote in message
> Research has already reached a point where you can get very good
> speech recognition down to a SNR of 10 dB. This article got around 80%
> correct for a 10 dB SNR (http://www.ee.ucla.edu/~spapl/paper/cui_fcpoly.pdf),
> ok it's in the lab, but they also tested different
> noise types, like restaurants, airports, etc. Now, noise at home is
> rarely at the level of a restaurant.. or an airport :) (at least if
> you don't have kids)
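For anyone who hasn't worked with the figure: SNR in decibels is 10*log10 of signal power over noise power, so 10 dB means the noise carries one tenth of the speech's power. Noisy test sets are commonly built by scaling a noise recording to a target SNR and adding it to clean speech. The Python sketch below shows only that arithmetic. It is not taken from the UCLA paper; the 440 Hz test tone, the white noise and the 8 kHz sample rate are stand-ins assumed purely for illustration.

```python
import math
import random

def mean_power(samples):
    """Average power: the mean of the squared samples."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

def mix_at_snr(clean, noise, target_db):
    """Scale `noise` so the clean-to-noise power ratio equals target_db, then add it.

    Returns (noisy_mix, scaled_noise).
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    # At 10 dB the noise power must be 1/10 of the clean signal's power.
    wanted_noise_power = mean_power(clean) / (10 ** (target_db / 10))
    gain = math.sqrt(wanted_noise_power / mean_power(noise))
    scaled = [n * gain for n in noise]
    return [c + n for c, n in zip(clean, scaled)], scaled

# Illustrative stand-ins (assumptions): 1 s of a tone instead of speech,
# Gaussian white noise instead of restaurant or airport noise.
rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]

noisy, scaled = mix_at_snr(clean, noise, 10.0)
print(f"SNR of the mix: {snr_db(clean, scaled):.2f} dB")
```

Real evaluations usually measure speech power only over active speech rather than the whole file, so treat this as the basic idea, not a benchmark recipe.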
Lab results for SR or VR have to be taken with a grain of salt. In the real
world, people slur their speech far more than they realize, probably a
lot more than researchers do. I doubt these are double-blind tests, either.
The researchers are often on the development team and have a bit of a bias.
But I don't think that's news to you.
I've used at least 5 different incarnations of Dragon Dictate and other
similar programs over the years and the improvement has been phenomenal, but
I'm also aware that I speak a very different way to the dictation mike than
I do to another human being. The - spacing - between - each - word - is -
very - pronounced - and - my - wife - certainly - is - not - pleased -
when - I - use - the - same - halting - speech - on - her. Why do I talk to
DD that way? Well, while I was training DD, it was also training me! I
learned that the best way to avoid misrecognition was to - speak - like -
this - and - enunciate - very - clearly.
The 80% figure they cite means one out of every five words is misheard. For
dictation, that's not even close to usable. Even at 95+% recognition for Dragon, I
still find myself unhappy at the amount of manual correcting I have to do.
For home automation, I don't think I'd accept every 5th command going
unheard, or worse, misheard and the wrong action taken.
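To put rough numbers on that, here is a small Python sketch. It assumes, simplistically, that every word in a command is recognized independently at the same accuracy, and the 20 commands a day is a made-up household figure. Under that assumption a three-word command at 80% per-word accuracy succeeds only 0.8^3 = 51.2% of the time, and even at 95% it succeeds only about 86% of the time.

```python
def command_success_rate(word_accuracy, words_per_command):
    """Chance that every word of a command is recognized correctly.

    Assumes each word is recognized independently with the same accuracy,
    which is a simplification of how real recognizers fail.
    """
    return word_accuracy ** words_per_command

commands_per_day = 20  # hypothetical household usage, for illustration only

for accuracy in (0.80, 0.95):
    for words in (1, 3):
        ok = command_success_rate(accuracy, words)
        failures = commands_per_day * (1 - ok)
        print(f"{accuracy:.0%} per word, {words}-word command: "
              f"{ok:.1%} succeed, ~{failures:.1f} failures/day")
```

Errors in practice tend to cluster (a noisy moment garbles several words at once), so the independence assumption is only a first approximation.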
> There's no doubt in my mind that signal processing
> algorithms will reach a level where speech recognition will even work
> well in heavy noise, and it's not in the too distant future. Of
> course, beating the human hearing is not around the corner.. if you
> can't understand what a person is saying while the TV is on, then most
> likely neither can the computer.
I agree wholeheartedly. I think home SR and VR (word recognition vs.
actually ID'ing the speaker's voice) will both be along soon, but the
improvements will come from better, faster, smarter, lower-power CPUs,
cheaper memory and algorithm improvements based on the feedback developers
gain from studying large corporate systems.
I use a number of SR-based services, and some of them are quite good at
natural speech processing. But most of them, like my pharmacy refill
system, are restricted to a very, very narrow set of commands, usually 0
through 9, the pound and star keys, and sometimes "yes" or "no." Whether
that's because these were simply fast ports from touch tone customer
response systems or because recognizing only those commands boosts
reliability tremendously, I can't say.
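One plausible reason a tiny vocabulary helps: with only a handful of legal answers, a garbled result can be snapped to the nearest allowed phrase, and anything too far from all of them can be rejected instead of acted on. The Python sketch below does this with the standard library's difflib.get_close_matches. It is a simplified, hypothetical stand-in: grammar-constrained recognizers typically restrict the search inside the decoder rather than post-processing text, and the command list and 0.8 cutoff here are invented for illustration.

```python
import difflib

# Hypothetical command set, for illustration only.
COMMANDS = ("lights on", "lights off", "thermostat up", "thermostat down", "yes", "no")

def match_command(heard, commands=COMMANDS, cutoff=0.8):
    """Snap raw recognizer text to the closest allowed command.

    Returns the best match whose similarity ratio is at least `cutoff`,
    or None so the caller can ask again instead of guessing.
    """
    matches = difflib.get_close_matches(heard.lower().strip(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("Lights of"))  # close enough to an allowed command
print(match_command("pizza"))      # nothing close: rejected
```

Rejecting rather than guessing targets the "misheard and the wrong action taken" failure: the system can prompt again instead of switching off the wrong lights.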
> There are already home speech
> recognition systems, like the butler-in-a-box, that provide noise
> robust speech recognition to some extent. I'm certain that the
> algorithms today have reached a level that makes robust speech
> recognition in your home, with moderate noise possible, and I wouldn't be
> surprised if such systems already exist and work well. I just want
> to see how good/bad a REALLY cheap DIY system works :)
I'd be more surprised than you to find such systems working well outside the
lab or without the kind of computing horsepower that puts the cost or size
or complexity outside the reach of your requirements.
There are a lot of reasons working against it, and those reasons are pretty
common in the tech arena. When early adopters take on a technology and it
seems "almost there" but never quite "all the way" those technology leaders
can actually become brakes on large-scale acceptance of the technology.
It's very similar to LCD TVs and CFL bulbs. The earliest revisions of
these two technologies worked, but with lots of warts. Early LCD TVs were
dim, had refresh rate issues, low contrast and narrow viewing angles. So
the "stink" got on LCD TVs and the better, but in their own way troublesome,
plasma TVs surged ahead. The newest LCD panels coming out of equally new
huge factories
http://seekingalpha.com/article/5698-all-out-war-sharp-vs-matsushita-how-to-invest-in-related-japanese-south-korean-stocks-mc-sne-pio-ewy
are light years ahead of those produced just two or three years ago. But
the stink will stay on them for quite a bit longer because of the poor
performance of the first efforts. Part of the LCD problem was that their
performance was tied to another relatively new technology, the CFL. Both
technologies have matured quite a bit. Even so, some LCD TV makers have
switched to LED backlighting to overcome the few, though gnarly, remaining
issues with CFLs.
http://www.engadget.com/2007/05/14/samsung-poised-to-introduce-white-led-backlit-displays/
What does this have to do with speech recognition? SR still isn't mature
enough to shake off the "stink" of its early failures. Yes, you can make it
work, but you have to "want to" - it's not going to
be bulletproof out of the box without some adjustment on the part of the
user.
I hope you find what you're looking for and if it works, share the results
with us. The time for cheap, reliable and standalone SR is fast
approaching. My new $400 HD LCD TV with 7 inputs, including VGA, has
convinced me that LCD HD TVs have arrived at a price point that will make
the abandonment of old CRT TVs pretty painless. Driven by a fairly new PC
with an ATI Radeon 7000 with a puny 32 MB, it produces some of the best still
and video images I've ever seen on a large screen. It wasn't until I went
out and actually looked at the very newest models that I overcame my own
prejudice about LCDs based on recent, but not absolutely current,
experience.
LCD technology has mostly cleared the hurdles that plagued early products;
I'm hoping home SR will get there, too, without requiring the use of tiny
tracking shotgun microphones in every room or a permanent Star Trek
communicator badge. Ironically, though, I think the resolution of the
problem *will* be the badge because so many other technologies are
converging on the endpoint of wireless connection between the electronic
world and a person's eyes, ears and mouth.
--
Bobby G.