
Re: Speech recognition system for Home automation.



On Sep 18, 3:03 pm, "Robert Green" <ROBERT_GREEN1...@xxxxxxxxx> wrote:

> Lab results for SR or VR have to be taken with a grain of sand.  In the real
> world, people slur their speech far more than they ever realize.  Probably a
> lot more than researchers do.  I doubt these are double blind tests, either.
> The researchers are often on the development team and have a bit of a bias.

I actually worked on some SR for my thesis. Usually you have
large databases with many people saying the same words; some
researchers record their own, others download the ones available on
the net. I've read a few papers that used only a single database, but
the serious ones use utterances from several databases to show the
robustness of the algorithm. Slurring of speech is really common and
it can really be a problem. I've listened to many utterances of
different sentences, and if you did not know the exact words that were
spoken, you could actually be in doubt yourself, especially if the
sentence was just random words. That's a huge problem with SR: when we
as humans hear a mumbled word, we might not notice it at all, since
the brain understands the context of what was said and can
"guess" the correct word, even though it actually sounds like 5-10
similar words. It's a bit like the old famous "you dnot hvae to wirte
the ltteres in the crroect oderr, the bairn sees the wrod as a wolhe".
Teaching a computer to understand context is an enormous task. Maybe
possible in a short sentence, but in an entire conversation!? Not
today. That's the next step :)

As for bias, yes it certainly happens. One should always make an
effort to spot weaknesses in articles, but it can be very tricky
unless you've actually done the work. I've been fooled a couple of
times. :)

> I've used at least 5 different incarnations of  Dragon Dictate and other
> similar programs over the years and the improvement has been phenomenal, but
> I'm also aware that I speak a very different way to the dictation mike than
> I do to another human being.  The - spacing - between - each - word - is -
> very - pronounced - and - my - wife - certainly - is - not - pleased -
> when - I - use - the - same - halting - speech - on - her.  Why do I talk to
> DD that way?  Well, while I was training DD, it was also training me!   I
> learned that the best way to avoid misrecognition was to - speak - like -
> this - and - enunciate - very - clearly.

I've read somewhere (sorry, no references) that some of the newest SR
software is able to detect words in fluent speech. I bought the HM2007
chip, which is old and discontinued, so I will definitely have to speak
very clearly and with "large" spacing if I want to say full
sentences; 1-2 secs between words, I believe the datasheet said. But
what I am after in the beginning is really just a robust on/off SR
switch. The HM2007 has a 40 word memory, so you could train "on" and
"off" 20 times each. The more the better: as is also shown in the
article, the more utterances they average over, the better the
error rate becomes. They averaged over 300 utterances and got
99% accuracy in some cases.
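
A rough sketch of the "train each word many times" idea, in Python.
This is not how the HM2007 works internally (I don't know its
algorithm). It assumes each utterance has already been reduced to a
fixed-length feature vector, and the names slots_per_word, classify and
reject_above are made up for illustration.

```python
import math

SLOTS = 40  # word memory size mentioned above


def slots_per_word(words, slots=SLOTS):
    """How many training examples each word can get if slots are shared evenly."""
    return slots // len(words)


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, templates, reject_above=1.0):
    """Pick the word whose stored examples are closest on average.

    templates maps a word to a list of feature vectors (its training
    utterances). Returns None when even the best word is too far away,
    so unknown sounds are ignored rather than forced into "on" or "off".
    """
    best_word, best_score = None, float("inf")
    for word, examples in templates.items():
        score = sum(distance(features, e) for e in examples) / len(examples)
        if score < best_score:
            best_word, best_score = word, score
    return best_word if best_score <= reject_above else None
```

Averaging over several stored examples per word means one badly spoken
training utterance has less influence, and the reject threshold is the
knob for trading missed commands against wrong ones.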

> For home automation, I don't think I'd accept every 5th command going
> unheard, or worse, misheard and the wrong action taken.

Exactly, for a robust system that people would actually want to use in
their daily lives you'd need close to 99% accuracy. Not there yet, but
getting there :) False triggering can be avoided somewhat by using
special sequences of words that are not too similar. That'll be my
initial approach: something along the lines of a trigger word, and then
the command.
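
The trigger-word-then-command idea could look roughly like this in
Python. It is only a sketch: the trigger word "computer", the 3 second
window and the class name TriggerGate are assumptions for illustration,
and the recognizer is assumed to hand over one recognized word at a time.

```python
import time


class TriggerGate:
    """Only pass a command through if it follows the trigger word in time."""

    def __init__(self, trigger="computer", commands=("on", "off"), window=3.0):
        self.trigger = trigger          # hypothetical trigger word
        self.commands = set(commands)   # the only commands accepted
        self.window = window            # seconds allowed after the trigger
        self.armed_at = None            # time the trigger was last heard

    def hear(self, word, now=None):
        """Feed one recognized word; return a command or None."""
        now = time.monotonic() if now is None else now
        # Disarm if the trigger was heard too long ago.
        if self.armed_at is not None and now - self.armed_at > self.window:
            self.armed_at = None
        if word == self.trigger:
            self.armed_at = now
            return None
        if self.armed_at is not None:
            self.armed_at = None        # one attempt per trigger
            if word in self.commands:
                return word
        return None
```

A stray "on" from the TV is ignored unless the trigger word came right
before it, so a false action needs two misrecognitions in a row instead
of one.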

> I use a number of SR-based services, and some of them are quite good at
> natural speech processing.  But most of them, like my pharmacy refill
> system, are restricted to a very, very narrow set of commands, usually 0
> through 9, the pound and the star key, and sometimes "yes" or "no."  Whether
> that's because these were simply fast ports from touch tone customer
> response systems or because recognizing only those commands boosts
> reliability tremendously, I can't say.

Recognizing only a very limited set of commands does boost
reliability, and keeps the cost of standalone systems to a minimum.

> I'd be more surprised than you to find such systems working well outside the
> lab or without the kind of computing horsepower that puts the cost or size
> or complexity outside the reach of your requirements.

I believe that any really robust system would probably be very
complex and expensive. I don't know how well the Mastervoice
(Butler-in-a-Box) works, but at a price tag of $3000, I hope it works
very well. It's probably also a full PC with some attachments. A cheap
robust system? Robustness is a highly valued quality in SR or VR, and
people would be willing to pay big bucks for an extra 5-15%
improvement in error rate. My hope is that you could construct a simple
system with very few features that works reasonably well. I'll share
my error rate when I get there :) A bit like the "clapper" on/off
switch that was so popular in the 80s, only more robust :)

> What does this have to do with sound recognition?  It's still not mature
> enough to reach the point where the "stink" of early failures doesn't follow
> it.  Yes, you can make it work but you have to "want to" - it's not going to
> be bulletproof out of the box without some adjustment on the part of the
> user.

I think that's a good comparison, and I couldn't agree more.

> I hope you find what you're looking for and if it works, share the results
> with us.  The time for cheap, reliable and standalone SR is fast
> approaching.

Thanks :)

> LCD technology has mostly cleared the hurdles that plagued early products,
> I'm hoping home SR will get there, too, without requiring the use of tiny
> tracking shotgun microphones in every room or a permanent Star Trek
> communicator badge.  Ironically, though, I think the resolution of the
> problem *will* be the badge because so many other technologies are
> converging on the endpoint of wireless connection between the electronic
> world and a person's eyes, ears and mouths.

Interesting point.

As a matter of fact, I'm still not really decided on LCD technology
yet. OK, I switched from my old CRT to a 22" wide LCD for my computer,
and it looks great! But I'm kind of wary of buying a large LCD TV. I
still see the old "LCD effects" in rapid camera movement, even on some
of the new LCDs. And of course, to make matters worse, HDTV has not
reached Danish broadcasters yet. I think SEDs look promising, but
LCDs can probably improve tremendously in the time it takes for SED
TVs to reach today's LCD prices.


Regards,
Soren




