Why voice assistants aren't as useful as they could/should be

This puts into words why I have been, and continue to be, disappointed in what should be a terrific technology.

I use either Google Assistant or Siri, but my usage is pretty limited, e.g. “call Dave Smith,” “play Foo Fighters,” “what is the weather,” and that’s about it.

And I do find it funny that both Alexa and Siri will admonish you if you swear at them, but their makers haven’t really done much to address why you might do that in the first place.
Siri and Alexa are getting on their owners’ last nerves - The Washington Post

2 Likes

Thanks for that link @Desertlap.

There is some larger issue going on here that end users can’t see. I didn’t really appreciate this until:

  • M$ pulled Cortana from the mobile space and locked most of her features to school and work accounts; and
  • I learned from my time with the SDuo that some Google Assistant features available on Android phones are NOT AVAILABLE on tablets (discovered when I changed the SDuo’s screen resolution from “default” to “small”. That made the SDuo a “tablet” as far as GA was concerned, and GA stopped calling contacts by voice command, for example.)

GA is still locked to Google’s default apps. Siri has the same limiting “feature” but tends to play better with others on iOS. I admit I haven’t tried Alexa, but I trust Bezos slightly less than I trust Zuck, which is zero.

So what’s the REAL issue hampering progress here? Is it a control thing? A marketing fear? I don’t believe it’s a limitation of handset technology. I was impressed when Google brought foreign-language translation voice processing back to the handset on the Pixel 6 - granted, with a special chip. If we can have real-time foreign-language translation processed on local devices, the REAL personal assistant capability exists but is being held back.

WHY?

2 Likes

A huge problem is that they are incapable of understanding more than one language at a time. Try inserting a foreign name into a query and it goes haywire in an instant.

3 Likes

Regardless of the assistant, one of the first things I do when I get a new device is turn the assistant off. I do use dictation, however. Siri wrote this post for me.

2 Likes

Interesting. My brother does as well, but for me personally it seems to be more trouble than it’s worth, as it makes even more typos than I do when I type.

I realize, though, that says more about me than Siri 🙂

1 Like

Can’t even get the weather.

I use my Echos for lights, alarms, calculations/conversions I’m too lazy to do, and the odd bit of trivia that occasionally comes to mind (most recently, how many species of penguin there are).

I’d be glad of a replacement for my Newton, which would allow writing such requests in an easy and natural fashion, w/ suitable integration to the system (calendar, &c.)

1 Like

I used to use Alexa; it was better than Google at a lot of things (like whisper mode, which gives quiet responses if you whisper instead of speaking at normal volume, and brief mode, which gives just a beep acknowledging that I told it to turn on an outlet), but it also wasn’t as good for the services I used.

Now I use Google Home, which is overly chatty, and on top of that it also does some really dumb things. If I’m in my living room, talk to my speaker, and tell it to “turn off the lights,” it doesn’t give a long response and knows I mean the ones in the living room (based on the speaker’s location being tagged). But if I’m in the same room and it thinks I’m talking to the phone in my pocket instead of my speaker, it tells me it can’t do that unless I unlock my phone. And if I then unlock my phone, it turns off all the lights with “Okay, turning off 19 lights,” because the phone doesn’t know where in the house I am.
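Presumably something like this is going on under the hood. Here’s a hypothetical sketch (device names and fallback logic made up by me, obviously not Google’s actual code) of why a room-tagged speaker and an untagged phone behave so differently:

```python
# Hypothetical sketch of room-scoped command handling. The device names
# and the fallback behavior are invented for illustration.

LIGHTS = {
    "living room": ["lr-lamp", "lr-ceiling"],
    "bedroom": ["br-lamp"],
    "kitchen": ["k-ceiling", "k-counter", "k-sink"],
}

def lights_to_turn_off(origin_room):
    """Scope 'turn off the lights' to the room of the device that heard it."""
    if origin_room in LIGHTS:
        # A speaker tagged "living room" only touches that room's lights.
        return LIGHTS[origin_room]
    # A phone has no room tag, so the assistant can't localize you and
    # falls back to every light in the home: "Okay, turning off 19 lights."
    return [light for room_lights in LIGHTS.values() for light in room_lights]

print(lights_to_turn_off("living room"))  # ['lr-lamp', 'lr-ceiling']
print(lights_to_turn_off(None))           # every light in this toy home
```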

Or another really dumb one. Like the majority of business users (Exchange probably beats Google by a 70-30 margin), I use Exchange for work email, so that’s where the majority of my calendar lives. And that’s fine; my Exchange calendar is downloaded into my phone’s calendar storage. And yet you’re telling me that Google Assistant (even on the phone) can’t read back to me what’s on my calendar? Or ideally, take that data and share it with my other devices signed in to the same Google account, so that all of my devices know my calendar? Or hell, just let me add an Exchange account to Google Assistant directly from the cloud.

2 Likes

So, to answer the original question (as someone who has worked in this field, which is the reason I’ve gone to CES): to put it simply, words are difficult to deal with.

A voice AI can be trained well for a specific domain and specific tasks, like one would see with Alexa. But once you begin combining tasks, with all the intricacies inherent in language, things can get crossed up exponentially. Giving the AI more training, more data, doesn’t necessarily help. I’ve had instances where the AI was trained well, and then adding one little additional piece of training would render it completely useless. It’s exactly like the issue of overtraining a neural network.

To give a super simplified example, imagine I had an AI perfectly trained to take pictures on command. And then, let’s say I also want to train it to play basketball. So what happens when I ask it to “shoot”? Now I’ll need to train it to understand that if I was having a conversation about sports beforehand, it should assume that I want it to shoot the ball. Now what happens if I want to train this AI to fire a gun? Or play billiards? Or to understand that “shoot” can be an expression of disappointment?
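To make that concrete, here’s a toy resolver (nothing like production code; the intent names and context rule are invented) showing how every new domain that claims the word “shoot” forces another tie-breaking rule:

```python
# Toy illustration: each new domain claiming "shoot" adds another
# candidate intent, and the resolver has to break ties from context.

SHOOT_INTENTS = {
    "camera": "take_photo",
    "basketball": "shoot_ball",
    "billiards": "take_shot",
    "firearms": "fire_gun",
}

def resolve_shoot(recent_topics):
    """Disambiguate the bare command 'shoot' using conversational context."""
    for topic in reversed(recent_topics):  # most recent topic wins
        if topic in SHOOT_INTENTS:
            return SHOOT_INTENTS[topic]
    return "take_photo"  # arbitrary default when there is no context

print(resolve_shoot(["weather", "basketball"]))  # -> 'shoot_ball'
print(resolve_shoot([]))                         # -> 'take_photo' (a guess)
```

Even this toy version already has to pick an arbitrary default and an arbitrary tie-breaking order, and it hasn’t begun to handle “shoot” as an interjection.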

Then, when people start talking normally, using slang, making jokes, jumping around and coming back to topics, and assuming things from context and situation, usability can easily go awry.

The other challenge is that a lot of the tech (the speech recognition, the language processing, and the speech synthesis) is mostly cloud-based. The tech wants a continuous feed of data to learn from… But having your words and data go through cloud servers presents a big logistical and legal privacy challenge, with the additional constraint of having to scrub the data. That’s where, from my experience, corporations get stuck on these implementations, slowing down progress in this field.
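To give an idea of what “scrubbing” can involve, here’s a minimal hypothetical pass that redacts obvious identifiers from a transcript before it’s stored. Real pipelines rely on trained entity recognizers rather than regexes, but the constraint is the same:

```python
import re

# Minimal hypothetical scrubbing pass. Naive patterns like these miss
# plenty (names, addresses, numbers spoken as words), which is exactly
# why the legal and logistical burden around cloud transcripts is heavy.

PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "CARD":  re.compile(r"\b(?:\d{4}[-\s]?){3}\d{4}\b"),
}

def scrub(transcript):
    """Replace obvious identifiers with placeholder tags before storage."""
    for tag, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{tag}]", transcript)
    return transcript

print(scrub("Call Dave at 555-867-5309 or email dave@example.com"))
# -> "Call Dave at [PHONE] or email [EMAIL]"
```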

4 Likes

This is what I’m curious about with Tesla Full Self-Driving. They keep having to fix edge cases, but I wonder how they keep track of adverse side effects. Running millions of simulated real-world scenarios, I guess, but it certainly seems like a challenge!

1 Like

I have a personal “20/80” rule when it comes to training neural networks: it takes 20% of the effort to get to 95% accuracy, and 80% of the effort to chase the last 5%. And I don’t think it’s ever possible to get 100%, with edge cases like you say. I suspect that is the problem with autonomous vehicle programming: who wants to be responsible for not catching edge cases that have tragic consequences for the consumer?

Automakers can run millions of simulations to test-drive a vehicle in software that mimics real-world physics. But training an AI can be an art form of whack-a-mole. For me, it’s impossible to tangibly understand how data affects the pachinko-like pathways a neural network chooses to use. It’s a ton of trial-and-error data curation/labeling with QA testing. I will say, I personally find it easier to model visual things than words and language, which again speaks to the difficulty of voice assistant programming.

1 Like

Yeah, I imagine that’s hard when decision-making is buried in thousands (millions?) of matrix elements. Interesting challenge to ‘debug’ or assign fault! Kind of jealous; it sounds like a very intriguing field to be working in these days!

Another one that’s kind of dumb to me: my Google phone can read notifications to me, but my Google speakers can’t even give a notification tone when I’ve got an incoming phone call?

For that matter, I use Google Fi. I can make outgoing phone calls from any speaker. And I can even get incoming phone calls on any web browser. Why can’t I get incoming phone calls on the Google speakers exactly?

Not that I’d necessarily use that heavily, but I was just thinking it would have been nice to have some idea that the phone was ringing while I was in the shower, with music playing over the Google speaker in the bathroom.

3 Likes

You know, that’s one of the things I’ve pondered. Both Siri and Google Assistant have their dumb and smart areas. I wonder how much that has to do with the differing approaches of Apple and Google.

In other words, the majority of processing is on the phone with Siri and is cloud-based with Android phones. I’m not arguing one is superior; to my mind, in aggregate the two are a wash. I’m just wondering how those choices play out in the real world.

1 Like

We have both - I got a Home for £10 when I bought a present for my partner. The Google speaker tends to be far more accurate for weather, but it’s nowhere near as entertaining as Alexa (my 12-year-old daughter has one and we have one downstairs). I’m learning French using Alexa’s “translate to” command, and my daughter uses the German and French translations to help with her accent in those languages.

My daughter and I got into interactive stories by playing several of the free story games on Alexa, and we’re looking to self-publish one on the Amazon store sometime this year.

1 Like

Oh, I do like Alexa. I don’t play the games anymore, though they are fun.

It’s just that using them in English in Japan means they don’t work for things like weather or traffic, as they butcher place names.

But I don’t really need them for that. Alarms, timers, trivia, linked speakers for music (Spotify has fantastic integration between platforms), radio, and podcasts are all great.