Voice Recognition?
Saturday, October 25th, 2008
Last week I had the chance to visit two impressive exhibitions in Redmond where Microsoft displays its visions for future technological scenarios at both home and work. I was delighted to experience in person many real implementations of futuristic ideas and concepts that I had previously seen only in science fiction movies. The only words that came to my mind while I was taking part in the demo were cool, awesome, incredible, fantastic, plus a couple of “I definitely want that for my house.”
But (there is always a “but”) in addition to the prices that all those gizmos and systems would presumably command on the market today, there was something else kind of discouraging about all those future technological scenarios: the majority of them were driven by the human voice. Do you guys remember how astronauts Dave and Frank gave spoken orders to HAL 9000, the control system in 2001: A Space Odyssey? Or how Ripley talks to Mother as though that machine were a person under her command in Alien? That is the way you will talk to your home or your office in the future. And that would be perfectly okay if it weren’t for something called accent.
If you have ever called a hotline in a foreign country and, to your dismay, the system responded with an electronic voice saying “if you want soandso, SAY soandso; if you want blablabla, SAY blablabla”, then you know what I am talking about. The main problem here is that the soandsos and the blablablas coming out of your mouth are different from those the machine is expecting (not necessarily because your pronunciation is bad, but just because your accent makes them sound different), and there is no way the machine can understand what you want. Note that I am not even talking about different languages, just different accents. Who is to say that an Indian accent is not appropriate to command a machine in the US, for example?
Unfortunately that is just the first aspect of the whole issue, since things can get more complicated if you also take into account the pronunciation of personal names. Let me give you an example:
Microsoft has a machine-operated telephone number that you can call to be connected to a Microsoft employee, or to leave that person a message, without knowing their number. It is a cool and useful system… but only if you know how to pronounce the name of the person you want to talk to in a way that an American voice recognition system can understand. “Oh, come on, that’s easy”, I bet you are thinking… Right… then tell me how to pronounce my last name with an American accent, because I have been unable to find myself using that damn system. Well, myself or any non-American person, for that matter. Germans, Indians, Frenchies, Chinese, Spaniards…, they are all totally unidentifiable for that machine no matter what I say. Really annoying, considering that Microsoft is the closest thing to the United Nations I know when it comes to nationalities. How do you pronounce Punjiamurthula, or Tordable, or Tzongjhy, or VondenBospoort?
So just imagine that I am heading out to Redmond to attend a meeting. The rain (hey, this is Seattle after all, don’t forget that) has caused a big accident on the 405 and I am stuck in traffic. I need to call my contact to let him know I will be late, but I don’t have his telephone number handy. Well, no problem: call the magic number and let the artificial intelligence help you.
- tuuuut, tuuut,…
- Hello. Welcome to the Microsoft directory. Please say “connect with” if you want to talk to someone, or “message for” if you want to leave a message, and then the name of the person you would like to be connected to.
- Connect with Eugenio Pace
- Sorry, I did not hear you. Please say “connect with” if you want to talk to someone, or “message for” if you want to leave a message, and then the name of the person you would like to be connected to.
- C o n n e c t w i t h E u g e n i o P a c e
- Sorry, I did not understand you. Please say “connect with” if you want to talk to someone, or “message for” if you want to leave a message, and then the name of the person you would like to be connected to.
- Bueeeno, ya empezamos…
- Did you say Connect with Mike Manos?
- No, I said Connect with Eugenio Pace
- I am sorry, I cannot understand you. You need to speak more clearly. Please say the name of the person you would like to be connected to.
- Iuginiou Pahzeh
- I am sorry, but I cannot find that name in the directory…
- Joder…
- Did you say Godert? I have two people by that name located in the Netherlands and…
- Shit!
- Oh, you are looking for the closest bathroom on campus? Give me your location and…
- NO! It’s okay, never mind, I’ll hang up…
- Do you really want to hang up? You don’t want to be connected to Eeuginiou Peiz?
- YES! That’s him, Iuginio Peis
- Sorry, I cannot find that person in the directory. Try spitting out the gum you are chewing…
- WTF!
- Ooops, sorry, no services for that on campus. But I can give you a few numbers in Seattle…
Well, maybe that is NOT exactly what the machine really says on the phone… But the result of the conversation is pretty much the same: I always end up having the feeling that the system is pulling my leg since I don’t find my Spanish accent that difficult to understand.
So my bottom line here is that if you want voice recognition systems to really drive our globalized world, you had better make them accent-proof. Of course, you can always personalize those systems that are going to be used just by you… but if they become as widespread as people expect, personalization will not be possible and they will have to deal with dozens of different accents. And if a communication breakdown between humans can be fatal (check this video if you don’t believe me)… what could miscommunication between machines and humans trigger in the future?
PA.