There’s a larger issue going on here that end users can’t see. I didn’t really appreciate this until:
M$ pulled Cortana from the mobile space and locked most of her features to school and work accounts; and
I learned from my time with the SDuo that some Google Assistant features are limited to Android phones but NOT AVAILABLE on tablets. (I discovered this when I changed the SDuo screen resolution to “small” rather than “default”; that made the SDuo a “tablet” as far as GA was concerned, and GA stopped calling contacts by voice command, for example.)
GA is still locked to Google default apps. Siri has the same limiting “feature” but tends to play better with others in iOS. I admit I haven’t tried Alexa, but I trust Bezos slightly less than I trust Zuck, which is zero.
So what’s the REAL issue hampering progress here? Is it a control thing? A marketing fear? I don’t believe it’s a limitation of handset technology. I was impressed when Google brought foreign language translation voice processing back to the handset on Pixel 6 - granted with a special chip. If we can have real time foreign language translation processed on local devices, the REAL personal assistant capability exists but is being held back.
I used to use Alexa; it was better than Google at a lot of things (like whisper mode, which gives quiet responses if you whisper instead of speaking at normal volume, and brief mode, which gives just a beep acknowledging that I told it to turn on an outlet), but it wasn’t as good for the services I used.
Now I use Google Home, which is overly chatty, but on top of that also does a lot of really dumb things sometimes. Like if I’m in my living room and talk to my speaker and tell it to ‘turn off the lights’, it doesn’t give a long response and knows I mean the ones in the living room (based on the speaker location being tagged). But if I’m in the same room and it thinks I’m talking to my phone in my pocket, not my speaker, it will tell me that it can’t do that unless I unlock my phone. And then if I unlock my phone it’ll turn off all the lights with “Okay, turning off 19 lights” because the phone doesn’t know where in the house I am.
Or another really dumb one. Like the majority of business users (probably by a 70-30 margin over Google), I use Exchange for work email, so that’s where the majority of my calendar is. And that’s fine, my Exchange calendar is downloaded into my phone’s calendar storage, and yet you’re telling me that Google Assistant (even on the phone) can’t read back to me what’s on my calendar? Or ideally, take that data and share it with my other devices that are signed in with the same Google account, so that all of my devices know my calendar? Or ■■■■, just let me add an Exchange account to Google Assistant directly from the cloud.
So, to answer the original question (as someone who worked in this field, and the reason I’ve gone to CES): to put it simply, words are difficult to deal with.
A voice AI can be trained well for a specific domain and specific tasks, like one would see with Alexa. But once you begin combining tasks, with all the intricacies inherent in language, things can get crossed up exponentially. Giving the AI more training, more data, doesn’t necessarily help. I’ve had instances where the AI was trained well, and then adding one little additional piece of training would render it completely useless. It’s exactly like the issue of overtraining a neural network.
To give a super simplified example, imagine if I had an AI perfectly trained to take pictures on command. And then, let’s say I want to also train it to play basketball. So then what happens when I ask it to “shoot”? Now I’ll need to train it to understand that if I was having a conversation about sports beforehand, it should assume that I want it to shoot the ball. Now what happens if I want to train this AI to fire a gun? Or play billiards? Or to understand that “shoot” is synonymous with disappointment?
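The “shoot” problem above can be sketched as a toy context-aware intent resolver. Everything here (the intent names, the topic-to-intent table, the fallback behavior) is invented purely to illustrate the shape of the problem, not how any real assistant actually works:

```python
# Toy illustration: one word ("shoot") maps to different intents depending
# on conversational context. All names here are hypothetical.

CONTEXT_INTENTS = {
    "camera": "take_photo",
    "basketball": "shoot_ball",
    "firearms": "fire_gun",
    "billiards": "strike_cue_ball",
}

def resolve(command, context_topic=None):
    """Resolve an ambiguous command using the prior conversation topic."""
    if command != "shoot":
        return "unknown_command"
    if context_topic is None:
        # No context at all: the assistant has to ask a follow-up question.
        return "clarification_needed"
    # Unrecognized topic: fall back to treating "shoot" as an exclamation.
    return CONTEXT_INTENTS.get(context_topic, "expression_of_disappointment")

print(resolve("shoot", "camera"))      # take_photo
print(resolve("shoot", "basketball"))  # shoot_ball
print(resolve("shoot"))                # clarification_needed
```

Note how every new domain added to the table changes what *every* existing utterance can mean, which is why one extra piece of training can degrade previously working behavior.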
Then when people start talking normally, using slang, jokes, jumping around and coming back to topics, assuming things from context and situation, usability can easily go awry.
The other challenge is that a lot of the tech (the speech recognition, the language processing, and the speech synthesis) is mostly cloud based. The tech wants to keep having data fed to it to learn… but having your words and data go through cloud servers presents a big logistical and legal privacy challenge, with the additional constraint of having to scrub it. That’s where, in my experience, corporations get stuck on these implementations, slowing down progress in this field.
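The “scrubbing” constraint mentioned above can be sketched minimally: redact obvious personal identifiers from a transcript before it is ever logged to the cloud. The two regex patterns below are illustrative only, nowhere near a complete PII solution:

```python
import re

# Minimal sketch of transcript scrubbing before cloud logging.
# These two patterns are purely illustrative; real PII scrubbing
# is a much larger problem (names, addresses, accents, context...).
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(transcript):
    """Replace matched identifiers with placeholder labels."""
    for label, pattern in PATTERNS.items():
        transcript = pattern.sub(f"[{label}]", transcript)
    return transcript

print(scrub("Call 555-123-4567 and email bob@example.com"))
# Call [PHONE] and email [EMAIL]
```

The tension is exactly the one described: the more aggressively you scrub, the less useful the data is for further training.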
This is what I’m curious about with Tesla Full Self-Driving. They keep having to fix edge cases, but I wonder how they keep track of adverse side effects. Running millions of real-world simulations, I guess, but it certainly seems like a challenge!
I have a personal “20/80” rule when it comes to training neural networks, as in: it takes 20% of the effort to get 95% accuracy, and 80% of the effort to try to get that last 5%. And I don’t think it’s ever possible to get 100%, with edge cases like you say. I suspect that’s the problem with autonomous vehicle programming: who wants to be responsible for not catching edge cases that have tragic consequences for the consumer?
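That 20/80 shape matches the diminishing-returns learning curves often seen when training models. Here’s a hedged sketch of the idea; the power-law formula and its constants are invented purely to show the curve’s shape, not fitted to any real model:

```python
# Illustrative diminishing-returns curve: accuracy vs. training effort.
# acc(e) = 1 - 0.5 * e**(-0.7) is an invented power law, chosen only
# to show that early effort buys a lot and later effort buys very little.

def accuracy(effort):
    return 1.0 - 0.5 * effort ** -0.7

for effort in (1, 2, 5, 10, 50, 100):
    print(f"effort {effort:>3}: accuracy {accuracy(effort):.3f}")
```

With these made-up constants, going from effort 1 to 10 gains roughly 40 points of accuracy, while going from 10 to 100 gains only about 8, the “last 5%” eating most of the budget.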
Automakers can run millions of simulations to test drive a vehicle in software that mimics real-world physics. But training an AI can be an art form, in whack-a-mole fashion. For me, it’s impossible to tangibly understand how data affects the pachinko-like pathways a neural network chooses to use. It’s a ton of trial-and-error data curation/labeling with QA testing. I will say, I personally find it easier to model visual things compared to words and language, which again speaks to the difficulty of voice assistant programming.
Yeah I imagine that’s hard when decision making is buried in thousands (millions?) of matrix elements. Interesting challenge to ‘debug’ or assign fault! Kind of jealous, it sounds like a very intriguing field to be working in these days!
Another one that’s kind of dumb to me, my Google phone can read notifications to me. But my Google speakers can’t even give a notification tone that I’ve got an incoming phone call?
For that matter, I use Google Fi. I can make outgoing phone calls from any speaker. And I can even get incoming phone calls on any web browser. Why can’t I get incoming phone calls on the Google speakers exactly?
Not that I’d even necessarily heavily use that, but I was just thinking that it’d have been nice to have had some idea that the phone was ringing while I was in the shower, with music playing over my Google speaker that is in the bathroom.
You know, that’s one of the things I’ve pondered. Both Siri and Google have their dumb and smart areas. I wonder how much that has to do with the differing approaches of Apple and Google.
In other words, the majority of processing is on the phone with Siri and is cloud based with Android phones. I’m not arguing one is superior, to my mind in aggregate the two are a wash. I’m just wondering how those choices play out real world.
We have both - I got a Home for £10 when I bought a present for my partner. The Google speaker tends to be far more accurate for weather, but it’s nowhere near as entertaining as Alexa (my 12 year old daughter has one and we have one downstairs). I’m learning French using the Alexa “translate to” command, and my daughter uses the German and French ones to help with her language accents.
My daughter and I got into interactive stories playing several of the free story games on Alexa and we’re looking to self publish one on the Amazon store this year sometime.